ML Questions and Thoughts #3

Open
chopchop505 opened this issue Sep 6, 2019 · 1 comment

chopchop505 commented Sep 6, 2019

  1. Which AWS account should we deploy to (FireCARES/StatEngine)?

  2. What is the preferred dump format for training data from Elasticsearch? The easiest is a line-delimited JSON file via elasticdump. It would be an entire dump (all fields, all departments). In your notebook (in the "Upload the data for training" section), you could then retrieve this data dump and do the pre-processing/subsetting for the model in question.

  3. Do you want continuously updated training data? If you'll be tweaking models frequently, is it best practice to use the same static training dataset or to continuously add to it? It doesn't matter to me; we can export as often as daily, but that might be overkill.

  4. Do you plan on using a different model for each department, or a single model that takes the FireCARES ID as a heavily weighted feature? A single model obviously makes deployment easier, but probably complicates the model significantly (I don't know enough about ML).

  5. For deployment, it's cheaper to do batch predictions, but easier to do on-demand (especially for future models). Not a question, just something we should chat about.

  6. We probably want to think about how to manage multiple experiments sooner rather than later. See example here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb

This was a great example, and we could literally have this in production tomorrow!

Deploying your custom model is going to take a bit of lifting (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html), but should be doable. Would it be easier to translate your model to a supported framework like scikit-learn or TensorFlow instead of building a custom container?
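
For the dump format in question 2, here is a rough sketch of how the notebook could read a line-delimited dump and subset it by department. It assumes each line is an Elasticsearch hit with the document under `_source`, and it uses a hypothetical `firecares_id` field and made-up sample records; the real field names in our index may differ.

```python
import json
from collections import defaultdict


def read_elasticdump(lines):
    """Yield one document per non-empty line of a line-delimited dump."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hit = json.loads(line)
        # Assumption: the document sits under "_source"; fall back to the
        # whole object if the dump was written without the hit wrapper.
        yield hit.get("_source", hit)


def group_by_department(records, key="firecares_id"):
    """Group records by a department identifier (hypothetical field name)."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key)].append(record)
    return dict(groups)


# Made-up sample lines standing in for a real dump file.
sample_dump = [
    '{"_index": "incidents", "_id": "1", "_source": {"firecares_id": "A", "units": 3}}',
    '{"_index": "incidents", "_id": "2", "_source": {"firecares_id": "B", "units": 1}}',
    '',
    '{"_index": "incidents", "_id": "3", "_source": {"firecares_id": "A", "units": 2}}',
]
records = list(read_elasticdump(sample_dump))
by_department = group_by_department(records)
```

With a real dump, an open file handle can be passed in place of `sample_dump`, and the resulting list of dicts can be handed to `pandas.DataFrame(records)` for the pre-processing step.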

@chopchop505

  1. @garnertb

  2. Joe to load into a pandas DataFrame

  3. When enough new data is available. For incidents per day, maybe every couple of months; it depends on the model.

  4. Assume 1 model per department for simple models like this one

  5. Probably best to assume batch jobs to save $
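
To make answers 4 and 5 concrete, here is a minimal sketch of the "one model per department, scored in batch" shape. The model is a stand-in mean baseline, not Joe's actual model, and the function names, department IDs, and counts are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_per_department(rows):
    """Fit one baseline per department: the mean of its daily incident counts."""
    counts = defaultdict(list)
    for department_id, incidents_per_day in rows:
        counts[department_id].append(incidents_per_day)
    return {dept: mean(values) for dept, values in counts.items()}


def batch_predict(models, department_ids):
    """Score many departments in one pass; departments without a model map to None."""
    return {dept: models.get(dept) for dept in department_ids}


# Made-up (department_id, incidents_per_day) training rows.
training_rows = [("A", 10), ("A", 14), ("B", 3), ("B", 5), ("B", 4)]
models = fit_per_department(training_rows)
predictions = batch_predict(models, ["A", "B", "C"])
```

Keeping the per-department models in a plain mapping means a scheduled batch job only needs the mapping and a list of department IDs, and retraining "when enough new data is available" amounts to rebuilding the mapping.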
