Update API to deploy separate endpoints by run ID #2

jeancochrane · 2024-04-23T17:19:57Z

This PR updates api.R to deploy multiple /predict endpoints, one for each of a set of valid run IDs. The new API structure looks like this:

POST /predict: Predicts with a default model, configured via the AWS_S3_MODEL_RUN_ID environment variable
POST /predict/<run_id>: Predicts with a model determined by the run ID passed to the endpoint

Maintaining a POST /predict endpoint that falls back to a default model should allow us to deploy this change without disrupting the operation of the workbooks that Valuations is currently using for desk review.
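
A minimal sketch of the route layout described above, assuming raw plumber and a placeholder handler. `make_handler` is a hypothetical helper that only echoes the run ID rather than loading a model and predicting, and the choice of default run here is purely illustrative:

```r
library(plumber)

# Run IDs taken from the diff in this PR; the default chosen here is illustrative
run_ids <- c("2024-02-06-relaxed-tristan", "2024-03-17-stupefied-maya")
default_run_id <- run_ids[[2]]

# Hypothetical handler factory: a real handler would predict with the model for
# `run_id`; this one only reports which run would serve the request
make_handler <- function(run_id) {
  force(run_id) # pin the value so each closure keeps its own run ID
  function(req, res) {
    list(run_id = run_id)
  }
}

# POST /predict falls back to the default run
router <- pr()
router <- pr_post(router, "/predict", make_handler(default_run_id))

# POST /predict/<run_id> for each valid run
for (run_id in run_ids) {
  router <- pr_post(router, paste0("/predict/", run_id), make_handler(run_id))
}

# pr_run(router, port = 3636)  # uncomment to serve locally
```

Keeping the bare `/predict` route registered alongside the per-run routes is what lets existing clients keep working unchanged.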

Closes #1.

Deployment steps

jeancochrane · 2024-04-23T17:21:51Z

api.R

-} else {
+} else if (file.exists("secrets/ENV_FILE")) {
  readRenviron("secrets/ENV_FILE")
+} else {
+  readRenviron(".env")
 }
-readRenviron(".env")


The logic here was a bit confusing for local development, where neither /run/secrets/ENV_FILE nor secrets/ENV_FILE exists. I think the new conditional structure should make local development easier, but let me know if I'm misinterpreting something.

I agree this is pretty confusing and underdocumented. If I recall, the file secrets/ENV_FILE is mounted to /run/secrets/ENV_FILE when using compose. This file contains AWS creds specific to the API. During development, these creds would be unnecessary as the user would likely have active AWS creds via aws-mfa.

The .env file is separate and not related. It contains the rest of the setup vars used by compose (API_PORT, CCAO_REGISTRY_URL, etc.) and the final model ID and year. In other words, it's just config stuff, not actually secret. This file is necessary to load during development, but NOT when deployed via compose (compose adds all vars in a .env file to the deployed container). So, the logic here makes sense if you remove the change on line 15.

That makes sense, I refactored in abe5403 to always load this file and only load secrets/ENV_FILE if it exists (similar to /run/secrets/ENV_FILE).
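
The loading order agreed on in this thread could be sketched roughly as follows. `load_env_files` is a hypothetical helper, not the code from abe5403, and it guards every file with `file.exists()` (including `.env`) so that the sketch runs in any directory:

```r
# Load KEY=VALUE pairs from each file that exists and return the paths loaded.
# Missing files are skipped silently.
load_env_files <- function(paths) {
  loaded <- character(0)
  for (path in paths) {
    if (file.exists(path)) {
      readRenviron(path)
      loaded <- c(loaded, path)
    }
  }
  loaded
}

# Secrets mounted by compose, secrets in the working tree, then plain config
env_paths <- c("/run/secrets/ENV_FILE", "secrets/ENV_FILE", ".env")
loaded_env_files <- load_env_files(env_paths)
```

During local development only `.env` would typically be present, so `loaded_env_files` would contain just that path.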

jeancochrane · 2024-04-23T17:24:34Z

api.R

 api_port <- as.numeric(Sys.getenv("API_PORT", unset = "3636"))
+default_run_id_var_name <- "AWS_S3_MODEL_RUN_ID"


It might be nice to change the name of this env var to something that more clearly marks it as the default (like AWS_S3_DEFAULT_MODEL_RUN_ID), but keeping the same name for now means one fewer thing we have to change during deployment. I'm open to changing it now if you feel strongly about it, however.

Let's change it now and make a list of things we actually need to change when redeploying.

Done in abe5403, with an updated list of deploy steps in the PR body.
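
For illustration, reading the renamed variable with a fail-fast check might look like the sketch below. The variable name follows the suggestion in this thread and may not match the final name exactly; `get_default_run_id` is a hypothetical helper:

```r
# Assumed name, following the rename suggested in this thread
default_run_id_var_name <- "AWS_S3_DEFAULT_MODEL_RUN_ID"

# Hypothetical helper: read the default run ID and stop early if it is unusable
get_default_run_id <- function(valid_run_ids,
                               var_name = default_run_id_var_name) {
  run_id <- Sys.getenv(var_name, unset = "")
  if (!nzchar(run_id)) {
    stop("Environment variable ", var_name, " is not set")
  }
  if (!run_id %in% valid_run_ids) {
    stop("Default run ID '", run_id, "' is not one of the valid run IDs")
  }
  run_id
}

# Usage with the run IDs from this PR
Sys.setenv(AWS_S3_DEFAULT_MODEL_RUN_ID = "2024-03-17-stupefied-maya")
default_run_id <- get_default_run_id(
  c("2024-02-06-relaxed-tristan", "2024-03-17-stupefied-maya")
)
```

Failing at startup rather than at request time makes a missed rename during deployment visible immediately.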

jeancochrane · 2024-04-23T17:25:03Z

api.R

+  "2024-02-06-relaxed-tristan",
+  "2024-03-17-stupefied-maya"


Any other run IDs that we should include in this vector for now?

Let's actually include the final 2022 and 2023 models. We can just reproduce and replace the old workbooks for those years.

Cool, done in abe5403. Are the old workbooks even still operational, given that the model version in the (currently static) API has changed since 2022 and 2023?

dfsnow

@jeancochrane I like the simplicity here; this looks good apart from the env management stuff. Let's add an endpoint for every final model we have the necessary artifacts for, which should be 2022 onward.

We'll also need to update the API workbooks to handle the change (pointing to the endpoint with run_id as a param, rather than the default).

dfsnow · 2024-04-23T19:04:02Z

docker-compose.yaml

I don't remember why we're running privileged: true here, but we should try to remove it.

Done in d8bf46f, let's see what happens!

jeancochrane · 2024-04-25T22:09:30Z

api.R

+  all_endpoints[[i]] <- list(
+    path = glue::glue("{base_url_prefix}/{run$run_id}"),
+    model = model,
+    is_default = run$run_id == default_run$run_id


In a similar vein, it would make a lot more sense to just append another entry to all_endpoints for the default run rather than have to check the is_default bool in all of the iteration blocks that follow, but I couldn't figure out a good way to do this given the way append operations in R seem to require index references (i.e. the all_endpoints[[i]] assignment on line 167).

You should be able to use

all_endpoints <- c(all_endpoints, default_endpoint)

or:

all_endpoints <- append(all_endpoints, default_endpoint)

If that simplifies things.

Very nice, that's much clearer! I made this change in 2a770c0. Sadly I needed to upgrade the entire set of dependencies in order to get the environment working with R 4.4.0 and test this, which blew up the diff in bc8c7f5; I'll go in and comment the extra changes I made to make it a bit clearer.
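
One caveat worth checking when appending this way: if `default_endpoint` is itself a single endpoint (a list with `path` and `model` fields), `c()` and `append()` splice its fields in as separate elements unless it is wrapped in `list()`. A small sketch with placeholder values (the model strings below are stand-ins, not real model objects):

```r
all_endpoints <- list(
  list(path = "/predict/2024-02-06-relaxed-tristan", model = "placeholder-a"),
  list(path = "/predict/2024-03-17-stupefied-maya", model = "placeholder-b")
)
default_endpoint <- list(path = "/predict", model = "placeholder-b")

# Wrapping in list() appends the endpoint as one element
appended <- c(all_endpoints, list(default_endpoint))
length(appended) # 3

# Without the wrapper, the two fields are added as two separate elements
spliced <- c(all_endpoints, default_endpoint)
length(spliced) # 4
```

If `default_endpoint` is already a list of endpoints, the unwrapped form is the right one.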

jeancochrane · 2024-04-25T22:19:18Z

Ready for another look @dfsnow!

dfsnow

Looks great @jeancochrane. One potential simplification and then this is set to deploy.

jeancochrane · 2024-05-01T18:15:44Z

api.R

+    use_path_in_nav_bar = TRUE,
+    show_method_in_nav_bar = "as-plain-text"


It seems like the rapidoc UI layout changed with the version update, so we need these two params in order to properly display all of the available endpoints (docs).

jeancochrane · 2024-05-01T18:17:03Z

generics.R

-handler_startup._lgb.Booster <- function(vetiver_model) {
-  attach_pkgs(vetiver_model$metadata$required_pkgs)
-}


Turns out vetiver_create_description and vetiver_create_meta are in fact necessary for calling vetiver_model on the model, so I added them back in and only removed handler_startup, which we no longer call after moving from vetiver to raw Plumber.

jeancochrane · 2024-05-01T18:18:53Z

renv/.gitignore

@@ -1,7 +1,7 @@
-sandbox/
+library/


renv/.gitignore, renv/activate.R, and renv/settings.dcf all got automatically updated as part of the upgrade to renv 1.0.x and R 4.4.x.

jeancochrane · 2024-05-01T20:33:52Z

@dfsnow I think we should get one last look at this; unfortunately, I had to make a lot of changes since the last round of review!

dfsnow

Great work @jeancochrane. I'm excited to see if we can recreate workbooks/create tools to cover all the previous years.

dfsnow · 2024-05-02T15:54:14Z

api.R

+dvc_bucket_pre_2024 <- "s3://ccao-data-dvc-us-east-1"
+dvc_bucket_post_2024 <- "s3://ccao-data-dvc-us-east-1/files/md5"


Whoa, I didn't realize that DVC had changed the file locations :/
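
To make the layout difference concrete, here is a rough sketch of how an object URI could be built from an MD5 hash under each prefix. `dvc_object_uri` is a hypothetical helper; it assumes that 2024 itself uses the newer prefix and that objects are keyed by a two-character directory followed by the rest of the hash, as in DVC's cache layout:

```r
dvc_bucket_pre_2024 <- "s3://ccao-data-dvc-us-east-1"
dvc_bucket_post_2024 <- "s3://ccao-data-dvc-us-east-1/files/md5"

# Hypothetical helper: pick the prefix by model year, then split the hash into
# "<first two characters>/<remaining characters>"
dvc_object_uri <- function(md5, year) {
  prefix <- if (as.integer(year) >= 2024) {
    dvc_bucket_post_2024
  } else {
    dvc_bucket_pre_2024
  }
  paste(prefix, substr(md5, 1, 2), substring(md5, 3), sep = "/")
}

# Example with a made-up hash value
dvc_object_uri("d41d8cd98f00b204e9800998ecf8427e", 2023)
dvc_object_uri("d41d8cd98f00b204e9800998ecf8427e", 2024)
```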

dfsnow · 2024-05-02T16:01:18Z

api.R

+default_run_id <- Sys.getenv(default_run_id_var_name)
+
+# The list of runs that will be deployed as possible model endpoints
+valid_runs <- rbind(


suggestion (non-blocking): It might be worth updating the model.final_model dbt seed to capture this list and its attributes. That seed only gets updated at the end of each year, so we could just bring the compose stack down and back up to get the new endpoint, rather than modifying the code.

If this sounds like a good idea, let's push it to a separate PR.

Good call, issue opened here: ccao-data/data-architecture#428
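
For context, a hard-coded run table of the kind discussed here might look like the sketch below. The `year` column and its values are hypothetical attributes added for illustration; only the run IDs come from this PR:

```r
# Each row is one deployable run; adding a model means adding a row
valid_runs <- rbind(
  data.frame(run_id = "2024-02-06-relaxed-tristan", year = "2024"),
  data.frame(run_id = "2024-03-17-stupefied-maya", year = "2024")
)

# Derive one endpoint path per run
base_url_prefix <- "/predict"
endpoint_paths <- paste0(base_url_prefix, "/", valid_runs$run_id)
```

Sourcing the same rows from a seed instead would leave the path derivation untouched and only change where `valid_runs` comes from.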

jeancochrane added 3 commits April 23, 2024 17:10

Update R, assessr, ccao, and tibble versions

eb1f53e

Update api.R to deploy multiple models by run ID

472c528

Remove deprecated AWS_S3_MODEL_YEAR env var from docker-compose.yaml

7518222

jeancochrane linked an issue Apr 23, 2024 that may be closed by this pull request

Fixup existing res API backend #1

Closed

jeancochrane commented Apr 23, 2024

Appease pre-commit

d5f943a

jeancochrane marked this pull request as ready for review April 23, 2024 17:54

jeancochrane requested a review from dfsnow April 23, 2024 17:54

dfsnow reviewed Apr 23, 2024

jeancochrane added 4 commits April 25, 2024 21:23

Support multiple years of models by switching from vetiver to plumber

abe5403

Remove privileged: true and deprecated env vars from docker-compose.yaml

d8bf46f

Appease pre-commit

67e36ab

Clean up comments for api.R

f23e58b

jeancochrane commented Apr 25, 2024

jeancochrane requested a review from dfsnow April 25, 2024 22:19

dfsnow approved these changes Apr 30, 2024

jeancochrane mentioned this pull request May 1, 2024

Update lgbm_load() to only load record_evals if they exist ccao-data/lightsnip#13

Merged

jeancochrane added 3 commits May 1, 2024 18:02

Update dependencies for renv 1.x and R 4.4.x

bc8c7f5

Simplify endpoint creation and clean up docs styles in api.R

2a770c0

Better function spacing in generics.R

ec7c692

jeancochrane commented May 1, 2024

jeancochrane requested a review from dfsnow May 1, 2024 20:33

dfsnow approved these changes May 2, 2024

jeancochrane mentioned this pull request May 2, 2024

Update model.final_model seed to include API metadata ccao-data/data-architecture#428

Open

jeancochrane and others added 2 commits May 2, 2024 17:09

Merge branch 'master' into jeancochrane/1-fixup-existing-res-api-backend

723e9ef

Reset lightsnip to master

e6a69a9

jeancochrane merged commit 36b21d4 into master May 3, 2024
2 checks passed

jeancochrane deleted the jeancochrane/1-fixup-existing-res-api-backend branch May 3, 2024 17:41

jeancochrane mentioned this pull request May 3, 2024

Generate historical model API workbooks ccao-data/model-res-avm#238

Open

dfsnow added a commit to ccao-data/model-res-avm that referenced this pull request May 14, 2024

Update API WB VBA to point to new endpoints

7b36c7b

See ccao-data/api-res-avm#2 for actual API refactor. Workbooks now select their run ID/target endpoint from cell B1.

dfsnow mentioned this pull request May 14, 2024

Update API workbook VBA to point to new endpoints ccao-data/model-res-avm#241

Merged

dfsnow added a commit to ccao-data/model-res-avm that referenced this pull request May 14, 2024

Update API WB VBA to point to new endpoints (#241)

1738017

See ccao-data/api-res-avm#2 for actual API refactor. Workbooks now select their run ID/target endpoint from cell B1.
