exhaustive user documentation (#441)
* fix headers

* add cli to doc

* modify scripts to build cli doc

* use book theme

* order most recent prods first

* fix issue doc requirements

* add autoprogram to requirements

* add myst-parser

* restructure user doc

* reorder doc menus
vuillaut authored Jan 26, 2024
1 parent a88da5e commit 1d628b9
Showing 5 changed files with 541 additions and 179 deletions.
182 changes: 6 additions & 176 deletions README.rst
@@ -1,7 +1,7 @@
lstMCpipe
=========

|code| |documentation| |slack| |CI| |coverage| |conda| |pypi| |zenodo| |fair|

.. |code| image:: https://img.shields.io/badge/lstmcpipe-code-green
:target: https://github.com/cta-observatory/lstmcpipe/
@@ -23,8 +23,8 @@ lstMCpipe
:alt: Static Badge


Scripts to ease the reduction of MC data on the LST cluster at La Palma.
With this package, the analysis/creation of R1/DL0/DL1/DL2/IRFs can be orchestrated.

Contact:
@@ -41,7 +41,7 @@ If lstMCpipe was used for your analysis, please cite:
.. code-block::

    @misc{garcia2022lstmcpipe,
        title={The lstMCpipe library},
        author={Enrique Garcia and Thomas Vuillaume and Lukas Nickel},
        year={2022},
        eprint={2212.00120},
@@ -51,7 +51,7 @@ If lstMCpipe was used for your analysis, please cite:
in addition to the exact lstMCpipe version used from https://doi.org/10.5281/zenodo.6460727


You may also want to include the config file with your published code for reproducibility.


@@ -82,8 +82,6 @@ This will set up a new environment with lstchain and other needed tools available
If you already have your lstchain conda environment, you may simply activate it and install lstmcpipe there using `pip install lstmcpipe`.


HIPERTA (referred to as rta in the following) support is built in, but no installation instructions can be provided as of now.

Alternatively, you can install `lstmcpipe` in your own environment to use different versions of the
analysis pipelines.
WARNING: Due to changing APIs and data models, we cannot support other versions than the ones specified in
@@ -114,7 +112,7 @@ As a LST member, you may require a MC analysis with a specific configuration, fo

To do so, please:

#. Make sure to be part of the `github cta-observatory/lst-dev team <https://github.com/orgs/cta-observatory/teams/lst-dev>`__. If not, ask one of the admins.
> Note that you can also fork the repository and open the pull request from your fork, but the tests will fail because they need the private LST test data
#. Clone the repository in the cluster at La Palma.
#. Create a new branch named with your ``prodID``
@@ -163,171 +161,3 @@ but **please note** that it still requires a lot of resources to process a full
production. Think about other LP-IT cluster users.


Stages ⚙️
--------
After launching the pipeline, all selected tasks will be performed in order.
These are referred to as *stages* and are collected in ``lstmcpipe/stages``.
The following is a short overview of each stage that can be specified in the configuration.

**r0_to_dl1**

In this stage, simtel files are processed up to data level 1 and separated into files for training
and for testing.
For efficiency reasons, files are processed in batches: N files (depending on particle type,
as that influences the average duration of the processing) are submitted as one job in a job array.
To group the files together, the paths are saved in files that are passed to
Python scripts in ``lstmcpipe/scripts``, which then call the selected pipeline's
processing tool. These are:

- lstchain: lstchain_mc_r0_to_dl1
- ctapipe: ctapipe-stage1
- rta: lstmcpipe_hiperta_r0_to_dl1lstchain (``lstmcpipe/hiperta/hiperta_r0_to_dl1lstchain.py``)
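
The batching described above can be sketched as follows. This is a simplified illustration, not lstmcpipe's actual implementation, and the per-particle batch sizes below are assumed numbers for the example:

```python
def batch_file_lists(paths, batch_size):
    """Group input file paths into batches, each submitted as one job in a job array."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]

# Illustrative batch sizes: particle types that take longer to process per file
# get smaller batches. These values are assumptions, not lstmcpipe's settings.
BATCH_SIZE = {"gamma": 50, "gamma-diffuse": 50, "proton": 20, "electron": 50}

simtel_files = [f"run{i:04d}.simtel.gz" for i in range(45)]
proton_batches = batch_file_lists(simtel_files, BATCH_SIZE["proton"])
# 45 files with batch size 20 -> batches of 20, 20 and 5 files
```

Each batch would then be written to a file-list file and handed to one job-array task.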


**dl1ab**

As an alternative to processing simtel r0 files, existing dl1 files can be reprocessed.
This can be useful to apply different cleanings or to alter the images by adding noise etc.
For this to work, the old files have to contain images, i.e. they need to have been processed
using the ``no_image: False`` flag in the config.
The config key ``dl1_reference_id`` is used to determine the input files.
Its value needs to be the full prod_id including software versions (i.e. the name of the
directories directly above the dl1 files).
For lstchain, the dl1ab script is used; ctapipe can use the same script as for simtel
processing. There is no support for hiperta!


**merge_dl1**

In this stage, the previously created dl1 files are merged so that you end up with
train and test datasets for the next stages.


**train_test_split**

Split the dataset into training and testing datasets, performing a random selection of files with the specified ratio
(default=0.5).
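
A random split with a given ratio can be sketched as below. This is a minimal illustration of the idea, not the code lstmcpipe ships:

```python
import random

def train_test_split_files(files, ratio=0.5, seed=None):
    """Randomly assign files to training and testing sets with the given ratio."""
    rng = random.Random(seed)  # seeded RNG for a reproducible split
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * ratio)
    return shuffled[:n_train], shuffled[n_train:]

files = [f"dl1_run{i}.h5" for i in range(10)]
train, test = train_test_split_files(files, ratio=0.5, seed=42)
# With ratio=0.5, 10 files split into 5 training and 5 testing files.
```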

**train_pipe**

IMPORTANT: From here on, only ``lstchain`` tools are available. More about that at the end.

In this stage, the models to reconstruct the primary particles' properties are trained
on the gamma-diffuse and proton train data.
At present this means that random forests are created using lstchain's
``lstchain_mc_trainpipe``.
Models will be stored in the ``models`` directory.


**dl1_to_dl2**

The previously trained models are evaluated on the merged dl1 files using ``lstchain_dl1_to_dl2`` from
the lstchain package.
DL2 data can be found in the ``DL2`` directory.

**dl2_to_irfs**

Point-like IRFs are produced for each set of offset gammas.
The processing is performed by calling ``lstchain_create_irf_files``.


**dl2_to_sensitivity**

A sensitivity curve is estimated using a script based on pyirf, which performs a cut optimisation
similar to EventDisplay.
The script can be found in ``lstmcpipe/scripts/script_dl2_to_sensitivity.py``.
This does not use the IRFs and cuts computed in dl2_to_irfs, so it cannot be compared to observed data.
It is a mere benchmark for the pipeline.


Logs and data output 📈
-----------------------
**NOTE**: ``lstmcpipe`` expects the data to be located in a specific structure on the cluster.
Output will be written in a standardized way next to the input data to make sure everyone can access it.
Analysing a custom dataset requires replicating parts of the directory structure and is not the
intended use case for this package.

All the ``r0_to_dl1`` stage job logs are stored in ``/fefs/aswg/data/mc/running_analysis/.../job_logs`` and later
moved to ``/fefs/aswg/data/mc/analysis_logs/.../``.

Every time a full MC production is launched, two files with logging information are created:

- ``log_reduced_Prod{3,5}_{PROD_ID}.yml``
- ``log_onsite_mc_r0_to_dl3_Prod{3,5}_{PROD_ID}.yml``

The first one contains a reduced summary of all the scheduled `job ids` (and to which particle each job corresponds),
while the second one contains the same plus all the commands passed to slurm.
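
A minimal sketch of how such a reduced log could be inspected after loading the YAML file. The particle-to-job-ids mapping shown here is an assumed illustration of the structure, not the documented schema:

```python
# Assumed (illustrative) structure of log_reduced_*.yml after yaml.safe_load:
# a mapping from particle type to the slurm job ids scheduled for it.
reduced_log = {
    "gamma": [1234501, 1234502],
    "gamma-diffuse": [1234503],
    "proton": [1234504, 1234505, 1234506],
}

def all_job_ids(log):
    """Flatten the per-particle job id lists, e.g. to poll slurm for their status."""
    return sorted(jid for jids in log.values() for jid in jids)

# all_job_ids(reduced_log) returns the six scheduled job ids in ascending order.
```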

Steps explanation 🔍
--------------------

The directory structure and the stages to run are determined by the config stages.
After that, the job dependencies between stages are handled automatically.

- If the full workflow is launched, directories will not be verified as containing data. Overwriting will only happen when an MC prod sharing the same ``prod_id`` is analysed on the same day
- If each step is launched independently (advanced users), no directory will be overwritten prior to confirmation from the user

Example of default directory structure for a prod5 MC prod:

.. code-block::

    /fefs/aswg/data/
    ├── mc/
    |   ├── DL0/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── simtel files
    |   |
    |   ├── running_analysis/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       └── temporary dir for r0_to_dl1 + merging stages
    |   |
    |   ├── analysis_logs/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       ├── file_lists_training/
    |   |       ├── file_lists_testing/
    |   |       └── job_logs/
    |   |
    |   ├── DL1/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       ├── dl1 files
    |   |       ├── training/
    |   |       └── testing/
    |   |
    |   ├── DL2/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       └── dl2 files
    |   |
    |   └── IRF/20200629_prod5_trans_80/zenith_20deg/south_pointing/
    |       └── YYYYMMDD_v{lstchain}_{prod_id}/
    |           ├── off0.0deg/
    |           ├── off0.4deg/
    |           └── diffuse/
    |
    └── models/
        └── 20200629_prod5_trans_80/zenith_20deg/south_pointing/
            └── YYYYMMDD_v{lstchain}_{prod_id}/
                ├── reg_energy.sav
                ├── reg_disp_vector.sav
                └── cls_gh.sav

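
The standardized layout above can be reproduced programmatically. The helper below is a hypothetical sketch mirroring the example tree, not part of lstmcpipe's API:

```python
from pathlib import Path

BASE = Path("/fefs/aswg/data/mc")

def dl1_dir(prod_name, particle, zenith, pointing, date, lstchain_version, prod_id):
    """Build a DL1 output directory following the standardized structure shown above."""
    return (BASE / "DL1" / prod_name / particle / zenith / pointing
            / f"{date}_v{lstchain_version}_{prod_id}")

# Hypothetical example values for every component of the path.
path = dl1_dir("20200629_prod5_trans_80", "gamma", "zenith_20deg",
               "south_pointing", "20240126", "0.10.5", "my_prod")
# -> /fefs/aswg/data/mc/DL1/20200629_prod5_trans_80/gamma/zenith_20deg/south_pointing/20240126_v0.10.5_my_prod
```
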
Real Data analysis 💀
---------------------

Real data analysis is not officially supported by these scripts. Use at your own risk.


Pipeline Support 🛠️
-------------------

So far, the reference pipeline is ``lstchain``, and only with it is a full analysis possible.
There is, however, support for ``ctapipe`` and ``hiperta`` as well.
The processing up to dl1 is relatively agnostic of the pipeline; working implementations exist for all of them.

In the case of ``hiperta``, a custom script converts the dl1 output to ``lstchain``-compatible files, and the later stages
run using ``lstchain`` scripts.

In the case of ``ctapipe``, dl1 files can be produced using ``ctapipe-stage1``. Once the dependency issues are solved and
ctapipe 0.12 is released, this will most likely switch to using ``ctapipe-process``. We currently have no plans to keep supporting older
versions longer than necessary.
Because the files are not compatible with ``lstchain`` and there is no support for higher data levels in ``ctapipe`` yet, it is not possible
to use any of the following stages. This might change in the future.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -48,6 +48,7 @@
"sphinxcontrib.mermaid",
"nbsphinx",
"sphinxcontrib.autoprogram",
"myst_parser",
]

autosummary_generate = True
Expand Down
5 changes: 2 additions & 3 deletions docs/index.rst
@@ -4,7 +4,6 @@
contain the root `toctree` directive.
.. include:: ../README.rst

lstmcpipe API documentation
@@ -15,10 +14,10 @@ lstmcpipe API documentation

productions
pipeline
lstmcpipe_user_doc.md
examples/configs_pointings
api/lstmcpipe
cli

api/lstmcpipe


Indices and tables
