syncing vscode #16

Closed
wants to merge 77 commits into from
77 commits
e4ac29b
changing to match the online code
JLJ55 Nov 8, 2024
0a654e4
Update main.py
JLJ55 Nov 11, 2024
a940502
Update config.yaml
JLJ55 Nov 11, 2024
c1d0258
Update MLproject
JLJ55 Nov 11, 2024
35246c1
Update MLproject
JLJ55 Nov 11, 2024
549eab3
Update conda.yml
JLJ55 Nov 11, 2024
0b1b32b
Create run_mlflow.py
JLJ55 Nov 11, 2024
1a90e22
Update MLproject
JLJ55 Nov 11, 2024
32380b5
Update MLproject
JLJ55 Nov 11, 2024
953fe85
Changing packages
JLJ55 Nov 11, 2024
395fdd2
changing packages
JLJ55 Nov 11, 2024
eb9bda1
changing hydra to get mlflow to run
JLJ55 Nov 11, 2024
9ad6683
changing steps to download
JLJ55 Nov 11, 2024
6091049
changing hydra
JLJ55 Nov 11, 2024
cae3a22
changing steps to fix error message
JLJ55 Nov 11, 2024
d9821ee
changing steps
JLJ55 Nov 11, 2024
fcac3ba
changing things to get mlflow to work correctly
JLJ55 Nov 11, 2024
93499b5
mlflow not working
JLJ55 Nov 11, 2024
216fa04
making changes to the code to get mlflow to run
JLJ55 Nov 11, 2024
1b7512f
changing steps
JLJ55 Nov 11, 2024
d6d8f4c
making changes to packages
JLJ55 Nov 11, 2024
1c5611c
making changes to hydra
JLJ55 Nov 11, 2024
56cf278
changing steps to all
JLJ55 Nov 11, 2024
70453c3
updating
JLJ55 Nov 12, 2024
20967b7
updating MLproject to fix mlflow error
JLJ55 Nov 12, 2024
4af1bac
editing the repository code line
JLJ55 Nov 12, 2024
b972783
updating pyarrow
JLJ55 Nov 12, 2024
39beef6
Making changes per announcements on WGU
JLJ55 Nov 12, 2024
2c88325
editing github link
JLJ55 Nov 12, 2024
d51c2e7
Update conda.yml
JLJ55 Nov 15, 2024
899e2f1
Update conda.yml
JLJ55 Nov 15, 2024
bc8a625
Update config.yaml
JLJ55 Nov 15, 2024
dc03024
Update conda.yml
JLJ55 Nov 15, 2024
d84e8fd
Update conda.yml
JLJ55 Nov 15, 2024
66e8549
Update conda.yml
JLJ55 Nov 15, 2024
72a7bc1
Update run.py
JLJ55 Nov 15, 2024
0efa9b5
jupyter notebooks file update
JLJ55 Nov 15, 2024
31e3268
eda file updated
JLJ55 Nov 15, 2024
074a66f
editing main.py per step 2
JLJ55 Nov 15, 2024
f3af590
Update main.py
JLJ55 Nov 15, 2024
58c2761
Update main.py
JLJ55 Nov 15, 2024
9f5ade9
Update test_data.py
JLJ55 Nov 15, 2024
7ff4c84
Update main.py
JLJ55 Nov 17, 2024
bab81b3
updating results after running mlflow
JLJ55 Nov 17, 2024
4a3b94f
Update conda.yml
JLJ55 Nov 17, 2024
7086c11
Update conda.yml
JLJ55 Nov 17, 2024
cdfbd36
Update conda.yml
JLJ55 Nov 17, 2024
9581da9
Update main.py
JLJ55 Nov 17, 2024
6355046
Update main.py
JLJ55 Nov 17, 2024
8b1ee03
Update run.py
JLJ55 Nov 17, 2024
7f3f781
Update main.py
JLJ55 Nov 17, 2024
ca5df74
Update main.py
JLJ55 Nov 17, 2024
6296a16
Update conda.yml
JLJ55 Nov 17, 2024
b2cd812
Update main.py
JLJ55 Nov 17, 2024
8a4bd25
Update main.py
JLJ55 Nov 17, 2024
757645e
Update conda.yml
JLJ55 Nov 17, 2024
60616c2
Update conda.yml
JLJ55 Nov 17, 2024
6fc17eb
Update main.py
JLJ55 Nov 17, 2024
5c462c6
eda-checkpoint.ipynb
JLJ55 Nov 17, 2024
db2d2b2
Update main.py
JLJ55 Nov 17, 2024
2b9e33b
Update main.py
JLJ55 Nov 17, 2024
4b55d56
Update run.py
JLJ55 Nov 18, 2024
4a7fcb2
Update run.py
JLJ55 Nov 18, 2024
23093e8
Update run.py
JLJ55 Nov 18, 2024
cc58667
Update run.py
JLJ55 Nov 18, 2024
1561fd9
Update main.py
JLJ55 Nov 18, 2024
694e817
Update main.py
JLJ55 Nov 18, 2024
38c8578
Add test_regression_model component
JLJ55 Nov 18, 2024
62a14a2
Update main.py
JLJ55 Nov 18, 2024
2e0b79c
Update main.py
JLJ55 Nov 18, 2024
8e3dd62
Update conda.yml
JLJ55 Nov 18, 2024
8964c1c
Update conda.yml
JLJ55 Nov 18, 2024
dbdaa46
Update conda.yml
JLJ55 Nov 18, 2024
57a4b3a
Update conda.yml
JLJ55 Nov 18, 2024
69b3017
changes made to get mlflow to work
JLJ55 Nov 18, 2024
f63730f
Resolved merge conflicts in conda files
JLJ55 Nov 18, 2024
062a961
Update run.py
JLJ55 Nov 18, 2024
11 changes: 5 additions & 6 deletions MLproject
@@ -4,15 +4,14 @@ conda_env: conda.yml
entry_points:
main:
parameters:

steps:
description: Comma-separated list of steps to execute (useful for debugging)
description: "Comma-separated list of steps to execute (useful for debugging)"
type: str
default: all

hydra_options:
description: Other configuration parameters to override
description: "Other configuration parameters to override"
type: str
default: ''
default: ""
command: "python main.py main.steps='{steps}' {hydra_options}"


command: "python main.py main.steps=\\'{steps}\\' $(echo {hydra_options})"
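The hunk above contains two forms of the `command` line, one that passes `{hydra_options}` directly and one that wraps it in `$(echo ...)`. As a rough sketch of how the first template might expand: the helper name `render_command` is hypothetical, and the shell-quoting step is an assumption about how MLflow prepares parameter values before substituting them, not something the diff confirms.

```python
import shlex

# Command template copied from the MLproject hunk (the variant without $(echo ...)).
TEMPLATE = "python main.py main.steps='{steps}' {hydra_options}"


def render_command(steps: str = "all", hydra_options: str = "") -> str:
    """Approximate the expanded entry-point command.

    Assumption: each parameter value is shell-quoted and then substituted
    into the template with ordinary str.format.
    """
    params = {
        "steps": shlex.quote(steps),
        "hydra_options": shlex.quote(hydra_options),
    }
    return TEMPLATE.format(**params)
```

Under that assumption, an empty `hydra_options` renders as a literal `''` argument at the end of the command, which may be why the other variant routes the value through `$(echo ...)`.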
13 changes: 13 additions & 0 deletions components/conda.yml
@@ -3,4 +3,17 @@ channels:
- conda-forge
- defaults
dependencies:
<<<<<<< HEAD
- mlflow=2.8.1
- python=3.10
- numpy = 1.24
=======
- python=3.10
- pyyaml
- hydra-core=1.3.2
- pip=23.3.1
- numpy = 1.24
- pip:
- mlflow==2.8.1
- wandb==0.16.0
>>>>>>> 57a4b3afcce7caa68c2a4421e4eb0fd0c923ecb9
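The hunk above commits Git conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) into `components/conda.yml`, which would likely stop the file from parsing as YAML. A minimal sketch of a pre-commit style check follows; the function name `find_conflict_markers` is hypothetical and not part of this repository.

```python
import re

# A conflict marker is exactly seven '<', '=' or '>' characters at the start
# of a line, followed by whitespace or the end of the line.
_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(text: str) -> list[int]:
    """Return the 1-based line numbers that look like Git conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if _MARKER.match(line)
    ]
```

Running it over each changed file and failing on a non-empty result would flag a file like this one before it is committed.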
4 changes: 3 additions & 1 deletion components/get_data/conda.yml
@@ -6,7 +6,9 @@ dependencies:
- pip=23.3.1
- requests=2.24.0
- pyarrow
- numpy = 1.24
- python=3.10
- pip:
- mlflow==2.8.1
- wandb==0.16.0
- git+https://github.com/udacity/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
- git+https://github.com/JLJ55/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
6 changes: 3 additions & 3 deletions components/test_regression_model/conda.yml
@@ -3,12 +3,12 @@ channels:
- conda-forge
- defaults
dependencies:
- python=3.10.0
- python=3.10
- pip=23.3.1
- requests=2.24.0
- scikit-learn=1.3.2
- pandas=2.1.3
- numpy=1.24
- pip:
- mlflow==2.8.1
- wandb==0.16.0
- git+https://github.com/udacity/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
- git+https://github.com/JLJ55/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
5 changes: 3 additions & 2 deletions components/train_val_test_split/conda.yml
@@ -3,11 +3,12 @@ channels:
- conda-forge
- defaults
dependencies:
- python=3.10.0
- python=3.10
- pip=23.3.1
- requests=2.24.0
- scikit-learn=1.3.2
- numpy=1.24
- pip:
- mlflow==2.8.1
- wandb==0.16.0
- git+https://github.com/udacity/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
- git+https://github.com/JLJ55/Project-Build-an-ML-Pipeline-Starter.git#egg=wandb-utils&subdirectory=components
2 changes: 1 addition & 1 deletion components/train_val_test_split/run.py
@@ -69,4 +69,4 @@ def go(args):

args = parser.parse_args()

go(args)
go(args)
6 changes: 6 additions & 0 deletions conda.yml
@@ -4,9 +4,15 @@ channels:
- defaults
dependencies:
- python=3.10
- numpy=1.24
- pyyaml
- hydra-core=1.3.2
- pip=23.3.1
- numpy = 1.24
- pip:
- mlflow==2.8.1
- wandb==0.16.0
- pyarrow==7.0.0
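This hunk ends up listing `numpy` twice, once as `numpy=1.24` and once as `numpy = 1.24`. A small sketch of how such duplicates could be spotted in an already-parsed `dependencies` list; `duplicate_packages` is a hypothetical helper, and the name-splitting rule is a simplification rather than conda's real spec parser.

```python
import re
from collections import Counter


def duplicate_packages(dependencies: list) -> list[str]:
    """Return package names that appear more than once among the string entries.

    Nested entries such as the {"pip": [...]} mapping are skipped.
    """
    names = []
    for dep in dependencies:
        if isinstance(dep, str):
            # Simplification: the name is whatever precedes the first
            # whitespace or version operator character.
            names.append(re.split(r"[\s=<>!~]", dep.strip(), maxsplit=1)[0].lower())
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```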



6 changes: 3 additions & 3 deletions config.yaml
@@ -1,10 +1,10 @@
main:
components_repository: "https://github.com/udacity/Project-Build-an-ML-Pipeline-Starter.git#components"
components_repository: "https://github.com/JLJ55/Project-Build-an-ML-Pipeline-Starter.git#components"
# All the intermediate files will be copied to this directory at the end of the run.
# Set this to null if you are running in prod
project_name: nyc_airbnb
experiment_name: development
steps: all
steps: download
etl:
sample: "sample1.csv"
min_price: 10 # dollars
@@ -35,4 +35,4 @@ modeling:
criterion: squared_error
max_features: 0.5
# DO not change the following
oob_score: true
oob_score: true
100 changes: 60 additions & 40 deletions main.py
@@ -1,5 +1,4 @@
import json

import mlflow
import tempfile
import os
@@ -13,84 +12,105 @@
"data_check",
"data_split",
"train_random_forest",
# NOTE: We do not include this in the steps so it is not run by mistake.
# You first need to promote a model export to "prod" before you can run this,
# then you need to run this step explicitly
# "test_regression_model"
"test_regression_model",
]


# This automatically reads in the configuration
@hydra.main(config_name='config')
@hydra.main(config_name="config")
def go(config: DictConfig):

# Setup the wandb experiment. All runs will be grouped under this name
os.environ["WANDB_PROJECT"] = config["main"]["project_name"]
os.environ["WANDB_RUN_GROUP"] = config["main"]["experiment_name"]

# Steps to execute
steps_par = config['main']['steps']
steps_par = config["main"]["steps"]
active_steps = steps_par.split(",") if steps_par != "all" else _steps

# Move to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:

if "download" in active_steps:
# Download file and load in W&B
_ = mlflow.run(
f"{config['main']['components_repository']}/get_data",
"main",
version='main',
version="main",
env_manager="conda",
parameters={
"sample": config["etl"]["sample"],
"artifact_name": "sample.csv",
"artifact_type": "raw_data",
"artifact_description": "Raw file as downloaded"
"artifact_description": "Raw file as downloaded",
},
)

if "basic_cleaning" in active_steps:
##################
# Implement here #
##################
pass
_ = mlflow.run(
os.path.join(hydra.utils.get_original_cwd(), "src", "basic_cleaning"),
"main",
parameters={
"input_artifact": "sample.csv:latest",
"output_artifact": "clean_sample.csv",
"output_type": "clean_sample",
"output_description": "Data with outliers and null values removed",
"min_price": config["etl"]["min_price"],
"max_price": config["etl"]["max_price"],
},
)

if "data_check" in active_steps:
##################
# Implement here #
##################
pass
_ = mlflow.run(
os.path.join(hydra.utils.get_original_cwd(), "src", "data_check"),
"main",
parameters={
"csv": "clean_sample.csv:latest",
"ref": "clean_sample.csv:reference",
"kl_threshold": config["data_check"]["kl_threshold"],
"min_price": config["etl"]["min_price"],
"max_price": config["etl"]["max_price"],
},
)

if "data_split" in active_steps:
##################
# Implement here #
##################
pass
_ = mlflow.run(
f"{config['main']['components_repository']}/train_val_test_split",
"main",
parameters={
"input": "clean_sample.csv:latest",
"test_size": config["modeling"]["test_size"],
"random_seed": config["modeling"]["random_seed"],
"stratify_by": config["modeling"]["stratify_by"],
},
)

if "train_random_forest" in active_steps:

# NOTE: we need to serialize the random forest configuration into JSON
rf_config = os.path.abspath("rf_config.json")
with open(rf_config, "w+") as fp:
json.dump(dict(config["modeling"]["random_forest"].items()), fp) # DO NOT TOUCH

# NOTE: use the rf_config we just created as the rf_config parameter for the train_random_forest
# step

##################
# Implement here #
##################
json.dump(dict(config["modeling"]["random_forest"].items()), fp)

pass
_ = mlflow.run(
os.path.join(hydra.utils.get_original_cwd(), "src", "train_random_forest"),
"main",
parameters={
"trainval_artifact": "trainval_data.csv:latest",
"val_size": config["modeling"]["val_size"],
"random_seed": config["modeling"]["random_seed"],
"stratify_by": config["modeling"]["stratify_by"],
"rf_config": rf_config,
"max_tfidf_features": config["modeling"]["max_tfidf_features"],
"output_artifact": "random_forest_export",
},
)

if "test_regression_model" in active_steps:

##################
# Implement here #
##################

pass
_ = mlflow.run(
os.path.join(hydra.utils.get_original_cwd(), "components", "test_regression_model"),
"main",
parameters={
"mlflow_model": "random_forest_export:prod",
"test_dataset": "test_data.csv:latest",
},
)


if __name__ == "__main__":
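The step-selection logic in `main.py` can be sketched in isolation as below. The first two entries of the list are an assumption, since the diff only shows the tail of `_steps`, and `resolve_steps` is a hypothetical name for the one-line expression used in `go`.

```python
# Assumed full step list; only the last four entries are visible in the diff.
_steps = [
    "download",
    "basic_cleaning",
    "data_check",
    "data_split",
    "train_random_forest",
    "test_regression_model",
]


def resolve_steps(steps_par: str) -> list[str]:
    """Mirror `steps_par.split(",") if steps_par != "all" else _steps`."""
    return steps_par.split(",") if steps_par != "all" else list(_steps)
```

One consequence of the change: with `test_regression_model` now uncommented in `_steps`, running with `steps=all` would also trigger the test step, which expects a model already tagged `prod`.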
Empty file added mlflow
Empty file.
Empty file added pytest_results.txt
Empty file.
2 changes: 2 additions & 0 deletions run_mlflow.py
@@ -0,0 +1,2 @@
import os
os.system("mlflow run . -P steps=download")
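`run_mlflow.py` shells out with `os.system`, which ignores a failing exit status unless the return value is checked. A hedged alternative sketch using `subprocess.run` with an argument list is shown here; `build_command` and `run_pipeline` are hypothetical names, not part of this PR.

```python
import subprocess


def build_command(steps: str = "download") -> list[str]:
    """Build the same `mlflow run` invocation as an argument list (no shell parsing)."""
    return ["mlflow", "run", ".", "-P", f"steps={steps}"]


def run_pipeline(steps: str = "download") -> int:
    """Run the pipeline; check=True raises CalledProcessError on a non-zero exit."""
    completed = subprocess.run(build_command(steps), check=True)
    return completed.returncode
```

Calling `run_pipeline("download")` from the project root would be the equivalent of the two-line script, assuming the `mlflow` executable is on `PATH`.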