---
title: "Data Project Architecture"
execute:
  echo: true
  output: true
  message: false
  warning: false
---
## Key Takeaways
This chapter gives an opinionated overview of good design and conceptual layout practices for a data project. The areas of responsibility within the project are broken out into
1) _Presentation_,
2) _Processing_, and
3) _Data_ layers.
The categories that a given data project may fall into are further divided into
1) _jobs_,
2) _apps_,
3) _reports_, and
4) _APIs_.
The rest of the chapter discusses how to break a project down into the previously mentioned layers, as well as considerations for optimizing the Processing and Data layers.
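
As a rough illustration of the three layers, here is a minimal sketch in plain Python. The function names (`load_penguins`, `mean_mass_by_species`, `render_report`) and the three sample rows are made up for this example and do not come from the book; the point is only that each layer can be swapped out without touching the others.

```python
from statistics import mean


# Data layer: the only code that knows where the data lives.
# (Hypothetical in-memory sample; a real project might read a file or database.)
def load_penguins():
    return [
        {"species": "Adelie", "body_mass_g": 3750},
        {"species": "Adelie", "body_mass_g": 3800},
        {"species": "Gentoo", "body_mass_g": 5000},
    ]


# Processing layer: pure logic that takes rows in and returns a summary.
def mean_mass_by_species(rows):
    groups = {}
    for row in rows:
        groups.setdefault(row["species"], []).append(row["body_mass_g"])
    return {species: mean(masses) for species, masses in groups.items()}


# Presentation layer: turns the summary into something a person reads.
def render_report(summary):
    lines = [f"{species}: {mass:.0f} g" for species, mass in sorted(summary.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report(mean_mass_by_species(load_penguins())))
```

Because the processing function never touches files or formatting, the same logic could sit behind a report, an app, or an API.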
## Lab / Project
### Initial Setup
The last chapter, [Environments as Code](./1-1-env-as-code.qmd), introduced the example project that we will use throughout the book. You can either fork the project repo from <a href="https://github.com/ruralinnovation/do4ds_project/fork" target="_blank">do4ds_project</a> or create the project from scratch using the following Quarto CLI commands (taken from the <a href="https://quarto.org/docs/websites/" target="_blank">Quarto documentation</a>):
```
quarto create project website do4ds_project
# Choose (don't open) when prompted
quarto preview do4ds_project
```
If the `quarto preview` command loads a new website in your web browser, go back to the terminal and press `Ctrl+C` to terminate the preview server. Then change to the project directory and set up a local Python virtual environment (you can grab the <a href="https://raw.githubusercontent.com/ruralinnovation/do4ds_project/main/requirements.txt" target="_blank">`requirements.txt` file from here</a>, if needed):
```
cd do4ds_project
# If using Python, create and activate a local virtual environment
python -m venv ./venv
source venv/bin/activate
venv/bin/python -m pip install -r requirements.txt
```
Now that you are in the local project directory you can use the `quarto preview` command without arguments to continue seeing updates to the local project in your browser:
```
quarto preview
# Alternatively, if you forked the project sample from GitHub, you can use npm...
npm run preview
```
If you did not fork the project sample, make sure to create the [`eda.qmd`](https://do4ds.com/chapters/sec1/1-1-env-as-code.html#eda-in-r) and [`model.qmd`](https://do4ds.com/chapters/sec1/1-1-env-as-code.html#modeling-in-python) files from chapter 1 and add them to the sidebar section of `_quarto.yml`:
```
project:
  type: website
website:
  title: "do4ds_project"
  navbar:
    left:
      - href: index.qmd
        text: Home
  sidebar:
    style: "docked"
    search: true
    contents:
      - eda.qmd
      - model.qmd
```
![](../../img/do4ds_project.png)
### Updates
To complete part 1 of the lab, I had to modify the example code. First, I added a line that creates a `VetiverModel` and assigns it to `v`; then I changed the board path to a local folder where the model can be stored:
```{.python .cell-code}
from pins import board_folder
from vetiver import vetiver_pin_write
from vetiver import VetiverModel
v = VetiverModel(model, model_name = "penguin_model")
model_board = board_folder(
    "data/model",
    allow_pickle_read = True
)
vetiver_pin_write(model_board, v)
```
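
One way to confirm that `vetiver_pin_write` actually stored something is to look inside the board folder. The sketch below uses only the standard library and assumes the layout that `pins` folder boards typically use (`<board>/<pin name>/<version>/data.txt`). `list_pin_versions` is a hypothetical helper written for this note, not part of `pins` or `vetiver`.

```python
from pathlib import Path


def list_pin_versions(board_dir, pin_name):
    """Return the version directory names found for a pin, sorted by name.

    Assumes each version is a subdirectory containing a `data.txt`
    metadata file (an assumption about the board's on-disk layout).
    """
    pin_dir = Path(board_dir) / pin_name
    if not pin_dir.is_dir():
        return []
    # Skip subdirectories without metadata, e.g. an interrupted write.
    return sorted(p.name for p in pin_dir.iterdir() if (p / "data.txt").is_file())


if __name__ == "__main__":
    versions = list_pin_versions("data/model", "penguin_model")
    print(versions or "no versions found; did vetiver_pin_write run?")
```

If the list comes back empty, the most likely cause is that the script ran from a different working directory, since `"data/model"` is a relative path.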
In addition to these changes, I created a separate Python file, `api.py`, with the code to run the `vetiver` API. This also required updates to the `VetiverAPI` call to ensure that the API server had the correct input params to process the prediction:
```{.python .cell-code}
from palmerpenguins import penguins
from pandas import get_dummies
from sklearn.linear_model import LinearRegression
from pins import board_folder
from vetiver import VetiverModel
from vetiver import VetiverAPI
# This is how you would reload the model from disk...
b = board_folder('data/model', allow_pickle_read = True)
v = VetiverModel.from_pin(b, 'penguin_model')
# ... however VetiverAPI also uses the model inputs to define params from the prototype
df = penguins.load_penguins().dropna()
df.head(3)
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']
model = LinearRegression().fit(X, y)
v = VetiverModel(model, model_name = "penguin_model", prototype_data = X)
app = VetiverAPI(v, check_prototype = True)
app.run(port = 8000)
```
I then ran the API with `python api.py`. Once it is running, you can navigate to <http://127.0.0.1:8000/__docs__> in a web browser to see the autogenerated API documentation:
![](../../img/vetiver_api.png)
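
To try the running API from Python rather than the docs page, something like the following sketch may help. It assumes the server exposes a `/predict` endpoint that accepts a JSON list of records, and that the feature columns are the ones `get_dummies(..., drop_first = True)` produces for this data (`species_Chinstrap`, `species_Gentoo`, `sex_male`). `build_payload` and `predict` are hypothetical helper names; check the autogenerated docs for the exact schema your server reports.

```python
import json
from urllib import request


def build_payload(bill_length_mm, species, sex):
    """One-hot encode a single penguin the way the training data was encoded.

    The column names are assumptions based on get_dummies with
    drop_first=True; Adelie and female are the dropped baseline levels.
    """
    return [{
        "bill_length_mm": float(bill_length_mm),
        "species_Chinstrap": int(species == "Chinstrap"),
        "species_Gentoo": int(species == "Gentoo"),
        "sex_male": int(sex == "male"),
    }]


def predict(payload, url="http://127.0.0.1:8000/predict"):
    """POST the payload as JSON and return the decoded response body."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `api.py` running in another terminal, `predict(build_payload(45.0, "Gentoo", "male"))` should return the decoded JSON response. The exact shape of that response is worth confirming against the docs page shown above.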