This repository contains the analysis of the ECMWF seasonal forecasts, done in collaboration with MapAction.
This project's pipeline, centred around European Centre for Medium-Range Weather Forecasts (ECMWF) data, involves three main stages tailored to climate forecasting: Data Acquisition, Data Processing, and Data Analysis.
We automate the fetching of climate datasets, such as monthly seasonal forecasts, through APIs like the Climate Data Store (CDS), with flexible data storage options including local storage and Azure Blob Storage.
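For illustration, the sketch below shows how such a request could look with the `cdsapi` client. The dataset name follows the CDS seasonal-forecast catalogue, but the parameter values (system, year, months, lead times) and the output file name are placeholders rather than the project's actual configuration.

```python
import cdsapi

# Minimal sketch of a CDS request for ECMWF monthly seasonal forecasts.
# Credentials are read from ~/.cdsapirc; all request values below are
# illustrative placeholders, not the project's real settings.
client = cdsapi.Client()

client.retrieve(
    "seasonal-monthly-single-levels",
    {
        "originating_centre": "ecmwf",
        "system": "51",
        "variable": "total_precipitation",
        "product_type": "monthly_mean",
        "year": "2023",
        "month": ["01", "02", "03"],
        "leadtime_month": ["1", "2", "3"],
        "format": "grib",
    },
    "ecmwf_seasonal_forecast.grib",
)
```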
Data processing uses Python scripts for geospatial matching, re-gridding, ensemble handling to optimise memory use, and other pre-processing steps. Bias-correction mechanisms are also employed to adjust the ECMWF forecasts, both with respect to different lead times and against ERA5 datasets, improving the accuracy of the precipitation forecasts. A Jupyter notebook is provided that can be used to run the pipeline.
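As a rough, hypothetical sketch (not the project's actual implementation), a lead-time-dependent mean-bias correction against ERA5 could look like the following with `xarray`. The file paths, variable names (`tprate`, `tp`) and dimension names (`member`, `lead_time`, `time`) are assumptions about the data layout.

```python
import xarray as xr

# Hypothetical layout: forecast with (time, lead_time, member, lat, lon),
# ERA5 reanalysis with (time, lat, lon); both monthly precipitation.
ecmwf = xr.open_dataset("ecmwf_seasonal_forecast.nc")["tprate"]
era5 = xr.open_dataset("era5_monthly_precipitation.nc")["tp"]

# Re-grid the forecast onto the ERA5 grid with simple linear interpolation
# (the pipeline may use a more sophisticated re-gridding scheme).
ecmwf = ecmwf.interp(latitude=era5.latitude, longitude=era5.longitude)

# Climatological mean bias per lead time (forecast minus reanalysis).
bias = ecmwf.mean(dim=("member", "time")) - era5.mean(dim="time")

# Remove the lead-time-dependent bias from every member and start date.
ecmwf_corrected = ecmwf - bias
```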
After processing, the data analysis phase computes probabilistic forecasts and performance metrics. The focus is on quantile probabilities and bias metrics between the ECMWF and ERA5 datasets, with in-depth analyses such as forecast skill maps and evaluations of how errors depend on lead time.
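A minimal sketch of one such quantile probability, the chance of below-normal precipitation, is shown below. The file path and dimension names are assumptions, and the project's analysis code may compute this differently.

```python
import xarray as xr

# Illustrative only: probability of below-normal precipitation as the share
# of ensemble members falling under the climatological lower-tercile threshold.
forecast = xr.open_dataset("ecmwf_bias_corrected.nc")["tprate"]

# Lower-tercile threshold estimated from the hindcast climatology.
lower_tercile = forecast.quantile(1 / 3, dim=("member", "time"))

# Fraction of members below the threshold, per grid cell and lead time.
prob_below_normal = (forecast < lower_tercile).mean(dim="member")
```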
We use Poetry for package management. Poetry is a production-tested dependency management tool with exact version locking and support for packaging and virtual environments.
📖 Install Poetry on Linux, macOS, or Windows (WSL) using the official installer:

```shell
curl -sSL https://install.python-poetry.org | python3 -

# if necessary, add the Poetry location to PATH
echo 'export POETRY_HOME="$HOME/.local/bin"' >> ~/.bashrc
echo 'export PATH="$POETRY_HOME:$PATH"' >> ~/.bashrc
. ~/.bashrc

# test that the installation was successful
poetry --version
```
Before you start developing in this repository,
you will need to install project dependencies and pre-commit Git hooks.
Navigate to the project directory

```shell
cd ds-mapaction-ecmwf
```

and run

```shell
make .venv hooks
```

or, if you do not have `make` on your OS (e.g. Windows), you can run

```shell
# first install all dependencies
poetry install --no-root

# then install Git hooks
poetry run pre-commit install
```
NOTE: any new package can be added to the project by running

```shell
poetry add [package-name]
```
All code is formatted according to black, flake8, and PyMarkdown guidelines.
The repo is set up to trigger lint tests automatically on each commit using `pre-commit`.
You can also run lint tests manually using

```shell
make lint
```

or, if you do not have `make` on your OS (e.g. Windows), you can run

```shell
poetry run pre-commit run --all-files
```
This is especially useful when you are trying to resolve a failed test.
Once all tests pass, you should see something like this:
```shell
$ make lint
Running lint tests..
black....................................................................Passed
isort....................................................................Passed
flake8...................................................................Passed
pymarkdown...............................................................Passed
```
Below is the directory and file structure of the project, providing a quick overview of the key components:
- `.github/`: A special folder used by GitHub to store GitHub Actions workflows and other GitHub-specific configuration files.
- `.flake8`: A configuration file for the `flake8` tool, which is used to enforce coding style and standards in Python projects.
- `.gitignore`: This file is comprehensive and covers a wide range of files that are typically not needed in version control for Python projects, ensuring that only relevant source code and resources are included in the repository.
- `.pre-commit-config.yaml`: Configures pre-commit hooks for the project, specifying tools like `black`, `isort`, `flake8`, and `pymarkdown` for code formatting and linting, as well as `nbqa-black` for Jupyter Notebook formatting. It is set to fail fast, stopping at the first encountered error.
- `docs/`: Houses documentation for the project. This includes detailed information on various aspects of the project, including data storage, processing, and retrieval methods.
- `notebooks/`: Contains Jupyter notebooks such as `ecmwf_pipeline.ipynb`, `ecmwf_analysis.ipynb`, and others, which are used for interactive data analysis, visualisation, and demonstrating the project's results.
- `src/`: The source code directory where the project's main Python code resides. It is organised into subdirectories, each focusing on different aspects of data handling and analysis.
  - `data_retrieval/`: Features scripts like `azure_blob_utils.py` for interacting with Azure Blob Storage (see the upload sketch after the directory tree), a `cds/` subdirectory with modules (`common.py`, `ecmwf.py`, `era5.py`, `mars.py`) for retrieving climate data from different sources, and `util.py` for utility functions. `main.py` serves as the entry point for data retrieval operations, and `static_data/country_bbox.csv` stores geographical bounding boxes for countries.
  - `data_processing/`: Includes `custom_python_package.py` for custom data processing tasks and an `__init__.py` file indicating this directory is a Python package.
  - `data_analysis/`: Contains `ecmwf_data_analysis.py` for analysing data from the European Centre for Medium-Range Weather Forecasts (ECMWF).
- `tests/`: Contains test code for the project, ensuring that the software functions as expected.
- `Makefile`: Designed to facilitate various operations such as dependency management, testing, linting, and data retrieval in a consistent and reproducible manner.
```
.
├── .flake8
├── .github/
│   └── workflows/ci-test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── docs/
│   ├── azure-blob-storage.md
│   ├── copernicus-cds.md
│   ├── data-processing.md
│   ├── ecmwf-mars.md
│   ├── images/
│   │   └── ...
│   └── README.md
├── LICENSE
├── Makefile
├── notebooks/
│   ├── .ipynb_checkpoints/
│   │   └── ...
│   ├── bounding-box-chad.ipynb
│   ├── bounding-box-tool.ipynb
│   ├── ecmwf_analysis.ipynb
│   ├── ecmwf_pipeline.ipynb
│   └── ecmwf_sandbox.ipynb
├── poetry.lock
├── pyproject.toml
├── README.md
├── src/
│   ├── data_analysis/
│   │   ├── ecmwf_data_analysis.py
│   │   └── __pycache__/
│   ├── data_processing/
│   │   ├── custom_python_package.py
│   │   ├── __init__.py
│   │   └── __pycache__/
│   └── data_retrieval/
│       ├── azure_blob_utils.py
│       ├── cds/
│       │   ├── common.py
│       │   ├── ecmwf.py
│       │   ├── era5.py
│       │   ├── __init__.py
│       │   └── mars.py
│       ├── __init__.py
│       ├── main.py
│       ├── __pycache__/
│       ├── static_data/country_bbox.csv
│       └── util.py
└── tests/
    ├── __init__.py
    └── data_retrieval/
        ├── __init__.py
        └── cds/
            ├── __init__.py
            ├── test_common.py
            ├── test_ecmwf.py
            ├── test_era5.py
            └── test_mars.py
```
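As referenced in the `data_retrieval/` description above, the sketch below shows one way a retrieved dataset could be pushed to Azure Blob Storage using the `azure-storage-blob` SDK. The connection string, container and blob names are placeholders, and the actual interface of `azure_blob_utils.py` may differ.

```python
from azure.storage.blob import BlobServiceClient

# Hypothetical upload of a downloaded forecast file to Azure Blob Storage;
# replace the connection string and names with real values.
service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(
    container="ecmwf-data", blob="ecmwf_seasonal_forecast.grib"
)

with open("ecmwf_seasonal_forecast.grib", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```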