Which AWS account should we deploy to (FireCARES/StatEngine)?
What is the preferred dump format for training data from Elasticsearch? The easiest is a line-delimited JSON file via `elasticdump`. It would be an entire dump (all fields, all departments). In your notebook (in the "Upload the data for training" section), you could then retrieve this data dump and do pre-processing/subsetting for the model in question (see the loading sketch after these questions).
Do you want continuously updated training data? If you'll be tweaking models frequently, is it best practice to use the same static training dataset or to continuously add to it? Doesn't matter to me, we can export up to daily, but that might be overkill.
Do you plan on using different models for each department, or a single model that takes the FireCARES ID as a heavily weighted feature? A single model obviously makes deployment easier, but probably complicates the model significantly (I don't know enough about ML).
For deployment, it's cheaper to do batch predictions, but easier to do on-demand (especially for future models). Not a question, just something we should chat about.
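For the dump/pre-processing question, here's a rough sketch of what the notebook side could look like, assuming a newline-delimited JSON dump from `elasticdump` already sitting in S3. The bucket path and field names (`fire_department.firecares_id`, etc.) are placeholders; the real names would come from the StatEngine index mapping. It also shows keeping the FireCARES ID as a categorical feature, per the single-model question:

```python
import pandas as pd

# Placeholder paths/field names -- adjust to the real dump location and index mapping.
DUMP_PATH = "s3://statengine-training-data/fire-incidents.jsonl"  # newline-delimited JSON from elasticdump
DEPARTMENT_FIELD = "fire_department.firecares_id"                  # assumed field name

# elasticdump writes one JSON document per line, so lines=True parses it directly.
df = pd.read_json(DUMP_PATH, lines=True)

# elasticdump wraps each hit in Elasticsearch metadata; the document itself lives under "_source".
if "_source" in df.columns:
    df = pd.json_normalize(df["_source"].tolist())

# Subset to the fields a given model needs (and optionally to one department).
columns_of_interest = [DEPARTMENT_FIELD, "description.event_opened", "durations.total_event.seconds"]
subset = df[[c for c in columns_of_interest if c in df.columns]].copy()

# For a single shared model, keep the FireCARES ID as a categorical feature
# (one-hot encoded here); for per-department models, filter on it instead.
subset[DEPARTMENT_FIELD] = subset[DEPARTMENT_FIELD].astype("category")
features = pd.get_dummies(subset, columns=[DEPARTMENT_FIELD])
```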
We probably want to think about how to manage multiple experiments sooner rather than later. See example here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb
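As a very rough illustration of the idea in that notebook, tracking experiments could look something like this via boto3's SageMaker `search` call. The `experiment` tag name and its value are placeholders we would define ourselves:

```python
import boto3

sm = boto3.client("sagemaker")

# Find training jobs carrying a hypothetical "experiment" tag so runs belonging
# to the same experiment can be listed and compared side by side.
response = sm.search(
    Resource="TrainingJob",
    SearchExpression={
        "Filters": [
            {"Name": "Tags.experiment", "Operator": "Equals", "Value": "fire-spread-v1"},
        ]
    },
)

for result in response["Results"]:
    job = result["TrainingJob"]
    print(job["TrainingJobName"], job.get("FinalMetricDataList"))
```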
This was a great example, and we could literally have this in production tomorrow!
Deploying your custom model is going to take a bit of lifting (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html), but should be doable. It may be easier to translate your model to a supported framework like scikit-learn/TensorFlow, instead of building a custom container?
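If the model does translate cleanly to scikit-learn, the framework route could look roughly like this with the SageMaker Python SDK's `SKLearn` estimator. The `train.py` script, role ARN, bucket paths, and instance types are placeholders, and argument names vary a bit between SDK versions:

```python
from sagemaker.sklearn.estimator import SKLearn

# Placeholder role/bucket for whichever account we settle on (FireCARES vs StatEngine).
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

estimator = SKLearn(
    entry_point="train.py",          # script containing the scikit-learn training code
    role=role,
    instance_type="ml.m5.large",
    framework_version="0.23-1",
)
estimator.fit({"train": "s3://statengine-training-data/train/"})

# On-demand: a persistent HTTPS endpoint (easier to call, always running, costs more).
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

# Batch: a transform job that runs only when there is new data to score (cheaper).
transformer = estimator.transformer(instance_count=1, instance_type="ml.m5.large")
transformer.transform("s3://statengine-training-data/to-score/", content_type="text/csv")
```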