chapters/sec1/1-2-proj-arch.qmd

---
title: "Data Project Architecture"
execute:
    echo: true
    output: true
    message: false
    warning: false
---

## Key Takeaways

This chapter gives an opinionated overview of good design and conceptual layout practices in regards to   a data project. The areas of responsibility within the project are broken out into

  1) _Presentation_,
  2) _Processing_, and
  3) _Data_ layers.

The categories that a given data project may fall into our further divided into

  1) _jobs_,
  2) _apps_,
  3) _reports_ and
  4) _API's_.

The rest of the chapter discusses how to break a project down into the previously mentioned layers, as well as considerations for optimizing the Processing and Data layers.


## Lab / Project

### Initial Setup

The last chapter, [Environments as Code](./1-1-env-as-code.qmd), introduced the example project that we will use throughout the book. You can either ~~clone a starter template for~~ fork the project repo from <a href="https://github.com/ruralinnovation/do4ds_project/fork" target="_blank">do4ds_project</a> or create the project from scratch yourself using the following Quarto CLI commands (taken from the <a href="https://quarto.org/docs/websites/" target="_blank">Quarto documentation</a>):
```
quarto create project website do4ds_project
# Choose (don't open) when prompted

quarto preview do4ds_project
```

... if the `quarto preview` command loads a new website in your web browser, go back to the terminal and use `Ctrl+C` to terminate the preview server. Change to the project directory and setup a local python virtual environment (you can grab the <a href="https://raw.githubusercontent.com/ruralinnovation/do4ds_project/main/requirements.txt" target="_blank">`requirements.txt` file from here</a>, if needed):
```
cd do4ds_project
# If using python, create and activate a local virtual environment
python -m venv ./venv
source venv/bin/activate
venv/bin/python -m pip install -r requirements.txt
```

Now that you are in the local project directory you can use the `quarto preview` command without arguments to continue seeing updates to the local project in your browser:
```
quarto preview

# Alternately, if you forked the project sample from Github, you can use npm...
npm run preview
```

If you did not fork the project sample, make sure to create the [`eda.qmd`](https://do4ds.com/chapters/sec1/1-1-env-as-code.html#eda-in-r) and [`model.qmd`](https://do4ds.com/chapters/sec1/1-1-env-as-code.html#modeling-in-python) files from chapter 1 and add them to the sidebar section of `_quarto.yml`:
```
project:
  type: website

website:
  title: "do4ds_project"
  navbar:
    left:
      - href: index.qmd
        text: Home
  sidebar:
    style: "docked"
    search: true
    contents:
      - eda.qmd
      - model.qmd

```

![](../../img/do4ds_project.png)


### Updates

To complete part 1 of the lab, I had to modify the example code. First, I added a line that would generate a `vetiver` model and assign it to `v` and then I changed the path to the local folder where the model could be stored:
```{.python .cell-code}
from pins import board_folder
from vetiver import vetiver_pin_write
from vetiver import VetiverModel

v = VetiverModel(model, model_name = "penguin_model")

model_board = board_folder(
  "data/model",
  allow_pickle_read = True
)
vetiver_pin_write(model_board, v)
```

In addition to these changes, I created a separate Python file with the code to run the `vetiver` API, called `api.py`, which also required updates to the `VetiverApi` call to ensure that the API server had the correct input params in order to process the prediction:
```{.python .cell-code}
from palmerpenguins import penguins
from pandas import get_dummies
from sklearn.linear_model import LinearRegression
from pins import board_folder
from vetiver import VetiverModel
from vetiver import VetiverAPI

# This is how you would reload the model from disk...
b = board_folder('data/model', allow_pickle_read = True)
v = VetiverModel.from_pin(b, 'penguin_model')

# ... however VertiverAPI also uses the model inputs to define params from the prototype
df = penguins.load_penguins().dropna()
df.head(3)
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']

model = LinearRegression().fit(X, y)

v = VetiverModel(model, model_name = "penguin_model", prototype_data = X)

app = VetiverAPI(v, check_prototype = True)
app.run(port = 8000)
```

... and then used `python api.py` to run the API. Once running, you can navigate to [http://127.0.0.1:8000/__docs__]() in a web browser to see the autogenerated API documentation

![](../../img/vetiver_api.png)