ECMWF Seasonal Forecast Historical Analysis

Maintainers: UN-OCHA, MapAction

Overview

This repository contains the analysis of the ECMWF seasonal forecasts, done in collaboration with MapAction.

Pipeline Summary

This project's pipeline is built around data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and is tailored to seasonal climate forecasting. It has three main stages: Data Acquisition, Data Processing, and Data Analysis.

Data Acquisition

We automate the retrieval of climate datasets, such as monthly seasonal forecasts, through APIs like the Copernicus Climate Data Store (CDS). Retrieved data can be stored either locally or in Azure Blob Storage.
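To illustrate what a CDS retrieval involves, the sketch below builds a request for ECMWF monthly mean precipitation over a bounding box. The helper `build_seasonal_request` is hypothetical and is not the API of `src/data_retrieval/cds/ecmwf.py`. The dataset name `seasonal-monthly-single-levels`, the request keys, and the default system version `"51"` are assumptions based on the public CDS catalogue and should be checked against the CDS web form; other keys, such as the output format, are left out. Submitting the request needs the `cdsapi` package and a CDS account, so that call is only shown as a comment.

```python
from __future__ import annotations

from typing import Any


def build_seasonal_request(
    year: int,
    month: int,
    area: tuple[float, float, float, float],
    leadtimes: range = range(1, 7),
    system: str = "51",
) -> dict[str, Any]:
    """Build a hypothetical CDS request for ECMWF monthly mean precipitation.

    `area` is (north, west, south, east) in degrees, the order CDS expects.
    The keys and values below are assumptions; verify them on the CDS catalogue.
    """
    north, west, south, east = area
    if north <= south:
        raise ValueError("north must be greater than south")
    return {
        "originating_centre": "ecmwf",
        "system": system,  # assumed SEAS5 system version
        "variable": "total_precipitation",
        "product_type": "monthly_mean",
        "year": str(year),
        "month": f"{month:02d}",  # initialisation month, zero-padded
        "leadtime_month": [str(lt) for lt in leadtimes],
        "area": [north, west, south, east],
    }


# Example: forecasts initialised in May 2023 over a box roughly covering Chad.
request = build_seasonal_request(2023, 5, area=(24.0, 13.0, 7.0, 24.0))

# Submitting requires `pip install cdsapi` and a configured ~/.cdsapirc:
# import cdsapi
# cdsapi.Client().retrieve("seasonal-monthly-single-levels", request, "forecast.grib")
```

Keeping request construction separate from submission makes the request easy to unit-test without network access or CDS credentials.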

Data Processing

Data processing is done with Python scripts and covers geospatial matching, re-gridding, ensemble handling optimised for memory use, and other pre-processing steps. Bias correction is applied to the ECMWF forecasts, both across lead times and against ERA5 data, to improve the accuracy of the precipitation forecasts. A Jupyter notebook is provided for running the pipeline.
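The README does not say which bias-correction method the pipeline uses. As a stand-in, the sketch below applies simple multiplicative scaling per lead time, a common approach for precipitation: each lead time's forecast is rescaled so that its climatological mean matches the ERA5 mean. The function names and array layout are hypothetical. Real data would also carry latitude/longitude dimensions, and ERA5 would first be re-gridded to the forecast grid; both steps are omitted here.

```python
import numpy as np


def scaling_factors(forecast: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Per-lead-time ratio of observed (e.g. ERA5) to forecast climatological means.

    forecast: shape (years, leadtimes, members); observed: shape (years, leadtimes).
    """
    fc_clim = forecast.mean(axis=(0, 2))  # mean over years and ensemble members
    obs_clim = observed.mean(axis=0)  # mean over years
    # Leave a factor of 1 where the forecast climatology is zero (avoids dividing by 0).
    return np.divide(
        obs_clim, fc_clim, out=np.ones_like(obs_clim, dtype=float), where=fc_clim > 0
    )


def bias_correct(forecast: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Scale every member in every year by the factor for its lead time."""
    return forecast * factors[np.newaxis, :, np.newaxis]


# Synthetic data: 10 years, 3 lead times, 5 ensemble members.
rng = np.random.default_rng(0)
fc = rng.gamma(2.0, 2.0, size=(10, 3, 5))
obs = rng.gamma(2.0, 3.0, size=(10, 3))

factors = scaling_factors(fc, obs)
corrected = bias_correct(fc, factors)
```

Computing one factor per lead time is what lets the correction account for forecast drift: a forecast that becomes drier the further it runs gets a larger factor at longer lead times.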

Data Analysis

After processing, the data analysis stage computes probabilistic forecasts and performance metrics. It focuses on quantile probabilities and on bias metrics between the ECMWF and ERA5 datasets, and includes in-depth analyses such as forecast skill maps and evaluations of how errors depend on lead time.
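As a rough illustration of the two kinds of metric mentioned above, the sketch below treats a quantile probability as the share of ensemble members that fall below a given quantile of a reference climatology (for example, ERA5). It also computes a mean bias as the average forecast-minus-observation difference. This interpretation, the function names, and all the numbers are assumptions made for illustration and are not taken from `ecmwf_data_analysis.py`.

```python
import numpy as np


def prob_below_quantile(members: np.ndarray, climatology: np.ndarray, q: float) -> float:
    """Fraction of ensemble members below the q-quantile of a reference climatology."""
    threshold = np.quantile(climatology, q)
    return float(np.mean(members < threshold))


def mean_bias(forecast: np.ndarray, observed: np.ndarray) -> float:
    """Mean forecast-minus-observation difference (positive means the forecast is too wet)."""
    return float(np.mean(forecast - observed))


# Toy values (mm/month): a 10-year reference climatology and a 5-member forecast.
climatology = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
members = np.array([15.0, 25.0, 45.0, 55.0, 95.0])

# Probability of a "below normal" (lower tercile) outcome: 2 of 5 members.
p_dry = prob_below_quantile(members, climatology, q=1 / 3)

# Bias over three toy forecast/observation pairs.
bias = mean_bias(np.array([12.0, 18.0, 30.0]), np.array([10.0, 20.0, 27.0]))
```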

Development

We use Poetry for package management. Poetry is a production-tested dependency management tool that offers exact version locking and supports packaging and virtual environments.

Installing Poetry

📖 Install Poetry on Linux, macOS, or Windows (WSL) using the official installer:

curl -sSL https://install.python-poetry.org | python3 -

If necessary, add the Poetry location ($HOME/.local/bin) to your PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

. ~/.bashrc

Test that the installation was successful:

poetry --version

Installing dependencies

Before you start developing in this repository,
you will need to install project dependencies and pre-commit Git hooks.

Navigate to the project directory:

cd ds-mapaction-ecmwf

and run:

make .venv hooks

Or, if make is not available on your OS (e.g. Windows), run:

# first install all dependencies
poetry install --no-root

# then install Git hooks
poetry run pre-commit install

NOTE: to add a new package to the project, run:

poetry add [package-name]

Lint and format

All code is formatted and linted with black, isort, flake8, and PyMarkdown.
The repository is set up to run the lint checks automatically on each commit using pre-commit.

You can also run the lint checks manually:

make lint

Or, if make is not available on your OS (e.g. Windows), run:

poetry run pre-commit run --all-files

This is especially useful when you are trying to fix a failing check.
Once all checks pass, you should see output similar to this:

$ make lint
Running lint tests..
black....................................................................Passed
isort....................................................................Passed
flake8...................................................................Passed
pymarkdown...............................................................Passed

Project Structure

Below is the directory and file structure of the project, providing a quick overview of the key components:

Directory Structure

  • .github/: A special directory used by GitHub to store GitHub Actions workflows and other GitHub-specific configuration files.

  • .flake8: The .flake8 file is a configuration file for the flake8 tool, which is used to enforce coding style and standards in Python projects.

  • .gitignore: Lists the files that are typically not needed in version control for Python projects, so that only relevant source code and resources are committed to the repository.

  • .pre-commit-config.yaml: The file .pre-commit-config.yaml configures pre-commit hooks for the project, specifying tools like black, isort, flake8, and pymarkdown for code formatting and linting, as well as nbqa-black for Jupyter Notebook formatting. It is set to fail fast, stopping at the first encountered error.

  • docs/: Houses documentation for the project. This includes detailed information on various aspects of the project, including data storage, processing, and retrieval methods.

  • notebooks/: Contains Jupyter notebooks such as ecmwf_pipeline.ipynb, ecmwf_analysis.ipynb, and others, which are used for interactive data analysis, visualisation, and demonstrating the project's results.

  • src/: The source code directory where the project's main Python code resides. It is organised into subdirectories, each focusing on a different aspect of data handling and analysis:
      ◦ data_retrieval/: Contains azure_blob_utils.py for interacting with Azure Blob Storage; a cds/ subdirectory with modules (common.py, ecmwf.py, era5.py, mars.py) for retrieving climate data from different sources; and util.py for utility functions. main.py is the entry point for data retrieval operations, and static_data/country_bbox.csv stores geographical bounding boxes for countries.
      ◦ data_processing/: Contains custom_python_package.py for custom data processing tasks and an __init__.py file that marks the directory as a Python package.
      ◦ data_analysis/: Contains ecmwf_data_analysis.py for analysing ECMWF data.

  • tests/: Contains test code for the project, ensuring that the software functions as expected.

  • Makefile: This Makefile is designed to facilitate various operations such as dependency management, testing, linting, and data retrieval in a consistent and reproducible manner.
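The bounding boxes in static_data/country_bbox.csv are ultimately needed in the (north, west, south, east) order that CDS area requests use. The sketch below shows one way to load such a file into that order. The column names (iso3, lat_min, lat_max, lon_min, lon_max) and the function name are hypothetical, because the README does not document the file's layout. The sample row is illustrative only.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def load_bboxes(csv_file: TextIO) -> dict[str, list[float]]:
    """Map a country code to a CDS-style area list [north, west, south, east].

    Assumes hypothetical columns: iso3, lat_min, lat_max, lon_min, lon_max.
    """
    boxes = {}
    for row in csv.DictReader(csv_file):
        boxes[row["iso3"]] = [
            float(row["lat_max"]),  # north
            float(row["lon_min"]),  # west
            float(row["lat_min"]),  # south
            float(row["lon_max"]),  # east
        ]
    return boxes


# Toy stand-in for static_data/country_bbox.csv (values are illustrative only).
sample = io.StringIO(
    "iso3,lat_min,lat_max,lon_min,lon_max\n"
    "TCD,7.0,24.0,13.0,24.0\n"
)
areas = load_bboxes(sample)
```

If the real file uses these columns, it could be read with `open("src/data_retrieval/static_data/country_bbox.csv", newline="")` in place of the StringIO sample.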

Directory Tree

.
├── .flake8
├── .github/
│   └── workflows/ci-test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── docs/
│   ├── azure-blob-storage.md
│   ├── copernicus-cds.md
│   ├── data-processing.md
│   ├── ecmwf-mars.md
│   ├── images/
│   │   └── ...
│   └── README.md
├── LICENSE
├── Makefile
├── notebooks/
│   ├── .ipynb_checkpoints/
│   │   └── ...
│   ├── bounding-box-chad.ipynb
│   ├── bounding-box-tool.ipynb
│   ├── ecmwf_analysis.ipynb
│   ├── ecmwf_pipeline.ipynb
│   └── ecmwf_sandbox.ipynb
├── poetry.lock
├── pyproject.toml
├── README.md
├── src/
│   ├── data_analysis/
│   │   ├── ecmwf_data_analysis.py
│   │   └── __pycache__/
│   ├── data_processing/
│   │   ├── custom_python_package.py
│   │   ├── __init__.py
│   │   └── __pycache__/
│   └── data_retrieval/
│       ├── azure_blob_utils.py
│       ├── cds/
│       │   ├── common.py
│       │   ├── ecmwf.py
│       │   ├── era5.py
│       │   ├── __init__.py
│       │   └── mars.py
│       ├── __init__.py
│       ├── main.py
│       ├── __pycache__/
│       ├── static_data/country_bbox.csv
│       └── util.py
└── tests/
    ├── __init__.py
    └── data_retrieval/
        ├── __init__.py
        └── cds/
            ├── __init__.py
            ├── test_common.py
            ├── test_ecmwf.py
            ├── test_era5.py
            └── test_mars.py