This repository creates an ML pipeline to learn gravity wave drag from MiMA data and couples the deployed model with MiMA simulations.
- Create virtual environment
$ python -m venv env
- Clone repo into env
$ git clone https://github.com/zacespinosa/Learning-GWD-with-MIMA.git
- Install lrgwd
$ pip install lrgwd
- Install dependencies as needed (I haven't gotten around to updating setup.py with all requirements yet)
This outline does not list every flag for each command. To see all flags, use --help.
$ python lrgwd <command> --help
$ python lrgwd --help
ingestor:
The ingestor consumes raw CDF data from MiMA and converts it to a compressed npz file.
If the visualize flag is set to true, ingestor also creates histograms of each feature and of the gwd at each pressure level.
$ python lrgwd ingestor \
--source-path <File Path to raw CDF data> \
--save-path <File Path to save NPZ data> \
--convert <Bool to convert from CDF to NPZ> \
--visualize <Bool to create visualizations>
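The two things the ingestor is described as doing, writing a compressed npz file and building per-pressure histograms, can be sketched with NumPy alone. This is only an illustration under assumptions, not the ingestor's actual code: the variable names `temp` and `gwd`, the `(n_samples, n_plevels)` array layout, and both helper functions are hypothetical, and synthetic arrays stand in for data read from a MiMA CDF file.

```python
import io

import numpy as np


def to_compressed_npz(variables, target):
    """Write a dict of named arrays to a compressed .npz archive."""
    np.savez_compressed(target, **variables)


def histograms_per_level(field, bins=10):
    """Return one (counts, bin_edges) pair per pressure level.

    `field` is assumed to have shape (n_samples, n_plevels).
    """
    return [np.histogram(field[:, k], bins=bins) for k in range(field.shape[1])]


# Synthetic stand-in for variables read from a MiMA file (hypothetical names).
rng = np.random.default_rng(0)
raw = {
    "temp": rng.normal(250.0, 20.0, size=(100, 4)),
    "gwd": rng.normal(0.0, 1e-5, size=(100, 4)),
}

buffer = io.BytesIO()  # in-memory file; a path string works the same way
to_compressed_npz(raw, buffer)
buffer.seek(0)
restored = np.load(buffer)

gwd_hists = histograms_per_level(restored["gwd"])
```

Here `gwd_hists` holds one histogram per pressure level, which is the kind of data a plotting step would consume.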
extractor:
The extractor consumes npz files and creates feature tensors and labels for the full dataset and saves them in csv files.
$ python lrgwd extractor \
--source-path <File path to raw dataset as npz> \
--save-path <File path to save extracted dataset> \
--plevels-included <Number of top plevels to use in feature tensors> \
--num-samples <Number of samples to extract from the source-path>
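As a rough picture of what --plevels-included and --num-samples might control, the sketch below trims each variable to its first few pressure levels, stacks the variables into one feature matrix, and writes the result as CSV. The function `build_features`, the variable names, and the assumption that "top" levels are the first indices along the level axis are all hypothetical; the real extractor may lay out its tensors differently.

```python
import io

import numpy as np


def build_features(variables, plevels_included, num_samples):
    """Stack the first `plevels_included` levels of each variable side by side.

    Each array in `variables` is assumed to have shape (n_samples, n_plevels).
    """
    columns = [v[:num_samples, :plevels_included] for v in variables]
    return np.concatenate(columns, axis=1)


# Two hypothetical input variables: 6 samples on 5 pressure levels each.
u = np.arange(30, dtype=float).reshape(6, 5)
t = np.arange(30, dtype=float).reshape(6, 5) + 100.0

features = build_features([u, t], plevels_included=3, num_samples=4)

# Write the feature matrix as CSV (to memory here; a file path also works).
out = io.StringIO()
np.savetxt(out, features, delimiter=",", fmt="%.6g")
csv_text = out.getvalue()
```

Each row of `features` is one sample, with three columns from `u` followed by three columns from `t`.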
split:
Splits raw csv data into train, validation, and test datasets and creates StandardScalers to use when training the model.
$ lrgwd split \
--source-path <File Path to extracted tensors> \
--save-path <File Path to save splits> \
--val-split <Float determining how much to allocate to validation> \
--test-split <Float determining how much to allocate to test>
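The split step can be pictured as follows: shuffle the samples, carve off the validation and test fractions, and fit the scaling statistics on the training rows only so that no information from the held-out sets leaks into training. This is a minimal NumPy sketch under assumptions, not the repository's implementation. The function `split_and_scale` is hypothetical, and the mean and population standard deviation computed here stand in for a scikit-learn StandardScaler, which standardizes in the same way.

```python
import numpy as np


def split_and_scale(features, val_split, test_split, seed=0):
    """Shuffle rows, split them three ways, and standardize with train statistics."""
    n = features.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    n_val = int(n * val_split)
    n_test = int(n * test_split)

    val_idx = order[:n_val]
    test_idx = order[n_val:n_val + n_test]
    train_idx = order[n_val + n_test:]

    # Fit the scaler on the training rows only.
    train = features[train_idx]
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0.0] = 1.0  # avoid dividing by zero for constant columns

    def scale(x):
        return (x - mean) / std

    return scale(train), scale(features[val_idx]), scale(features[test_idx]), (mean, std)


data = np.random.default_rng(1).normal(5.0, 3.0, size=(200, 3))
train, val, test, (mean, std) = split_and_scale(data, val_split=0.1, test_split=0.1)
```

The returned `(mean, std)` pair plays the role of the saved scaler: the same statistics must be applied to any data later fed to the trained model.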
train:
Train pulls from the models folder. In this step, the given model is trained and hyperparameter tuning is done using the validation dataset.
$ lrgwd train \
--save-path <File Path to save trained data> \
--source-path <File Path to train and validate datasets> \
--model <Name of model to train>
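To illustrate the idea of tuning a hyperparameter against the validation dataset, the sketch below fits a closed-form ridge regression for several regularization strengths and keeps the one with the lowest validation error. Ridge regression is only a stand-in chosen for brevity; it is not one of the models in the models folder as far as this outline says, and `fit_ridge`, `tune_alpha`, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ridge(x, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for the ridge weights w."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_features), x.T @ y)


def tune_alpha(x_train, y_train, x_val, y_val, alphas):
    """Return the alpha with the lowest validation MSE, plus all scores."""
    scores = {}
    for alpha in alphas:
        weights = fit_ridge(x_train, y_train, alpha)
        scores[alpha] = float(np.mean((x_val @ weights - y_val) ** 2))
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic linear data standing in for the train and validation splits.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
x_train = rng.normal(size=(80, 3))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=80)
x_val = rng.normal(size=(40, 3))
y_val = x_val @ true_w + rng.normal(scale=0.1, size=40)

best_alpha, val_scores = tune_alpha(
    x_train, y_train, x_val, y_val, alphas=[0.01, 1.0, 100.0]
)
```

The test dataset is deliberately untouched here, so it remains an unbiased check for the evaluate step.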
evaluate:
Evaluates the trained model and produces a performance report.
$ lrgwd evaluate \
--save-path <File Path to save performance report> \
--source-path <File Path to test and labels datasets> \
--model-path <File Path to trained model>
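This outline does not say which metrics the performance report contains. As one plausible example, the sketch below computes root-mean-square error and the coefficient of determination (R^2) separately for each output column, which here is assumed to correspond to one pressure level. The function `performance_report` and the choice of metrics are assumptions for illustration only.

```python
import numpy as np


def performance_report(targets, predictions):
    """Return per-column RMSE and R^2 for arrays of shape (n_samples, n_outputs)."""
    errors = predictions - targets
    rmse = np.sqrt(np.mean(errors ** 2, axis=0))

    ss_res = np.sum(errors ** 2, axis=0)  # residual sum of squares
    ss_tot = np.sum((targets - targets.mean(axis=0)) ** 2, axis=0)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "r2": r2}


targets = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
perfect = performance_report(targets, targets)
shifted = performance_report(targets, targets + 1.0)
```

A perfect prediction gives an RMSE of 0 and an R^2 of 1 in every column, while a constant offset raises the RMSE and lowers R^2, more strongly for columns with little variance.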
This section outlines the API that Fortran will use to interact with this Python model.