exhaustive user documentation (#441)
* fix headers

* add cli to doc

* modify scripts to build cli doc

* use book theme

* order most recent prods first

* fix issue doc requirements

* add autoprogram to requirements

* add myst-parser

* restructure user doc

* reorder doc menus
vuillaut authored Jan 26, 2024
1 parent a88da5e commit 1d628b9
Showing 5 changed files with 541 additions and 179 deletions.
182 changes: 6 additions & 176 deletions README.rst
@@ -1,7 +1,7 @@
lstMCpipe
=========

|code| |documentation| |slack| |CI| |coverage| |conda| |pypi| |zenodo| |fair|

.. |code| image:: https://img.shields.io/badge/lstmcpipe-code-green
:target: https://github.com/cta-observatory/lstmcpipe/
@@ -23,8 +23,8 @@ lstMCpipe
:alt: Static Badge


Scripts to ease the reduction of MC data on the LST cluster at La Palma.
With this package, the analysis/creation of R1/DL0/DL1/DL2/IRFs can be orchestrated.

Contact:
@@ -41,7 +41,7 @@ If lstMCpipe was used for your analysis, please cite:
.. code-block::

    @misc{garcia2022lstmcpipe,
        title={The lstMCpipe library},
        author={Enrique Garcia and Thomas Vuillaume and Lukas Nickel},
        year={2022},
        eprint={2212.00120},
@@ -51,7 +51,7 @@ If lstMCpipe was used for your analysis, please cite:
in addition to the exact lstMCpipe version used from https://doi.org/10.5281/zenodo.6460727


You may also want to include the config file with your published code for reproducibility.


@@ -82,8 +82,6 @@ This will set up a new environment with lstchain and other needed tools available
If you already have your lstchain conda environment, you may simply activate it and install lstmcpipe there using `pip install lstmcpipe`.


HIPERTA (referred to as rta in the following) support is built in, but no installation instructions can be provided as of now.

Alternatively, you can install `lstmcpipe` in your own environment to use different versions of the
analysis pipelines.
WARNING: Due to changing APIs and data models, we cannot support other versions than the ones specified in
@@ -114,7 +112,7 @@ As a LST member, you may require a MC analysis with a specific configuration, fo

To do so, please:

#. Make sure to be part of the `github cta-observatory/lst-dev team <https://github.com/orgs/cta-observatory/teams/lst-dev>`__. If not, ask one of the admins.
> Note that you can also fork the repository and open the pull request from your fork, but the tests will fail because they need the private LST test data
#. Clone the repository in the cluster at La Palma.
#. Create a new branch named with your ``prodID``
@@ -163,171 +161,3 @@ but **please note** that it still requires a lot of resources to process a full
production. Think about other LP-IT cluster users.


Stages ⚙️
--------
After launching the pipeline, all selected tasks will be performed in order.
These are referred to as *stages* and are collected in ``lstmcpipe/stages``.
The following is a short overview of each stage that can be specified in the configuration.

**r0_to_dl1**

In this stage, simtel files are processed up to data level 1 and separated into files for training
and for testing.
For efficiency reasons, files are processed in batches: N files (depending on particle type,
as that influences the average duration of the processing) are submitted as one job in a job array.
To group the files together, the paths are saved in files that are passed to
Python scripts in ``lstmcpipe/scripts``, which then call the selected pipeline's
processing tool. These are:

- lstchain: lstchain_mc_r0_to_dl1
- ctapipe: ctapipe-stage1
- rta: lstmcpipe_hiperta_r0_to_dl1lstchain (``lstmcpipe/hiperta/hiperta_r0_to_dl1lstchain.py``)
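
The batching described above can be sketched as follows. This is a simplified illustration, not lstmcpipe's actual implementation, and the per-particle batch sizes below are assumed numbers for the example:

```python
def batch_file_lists(paths, batch_size):
    """Group input file paths into batches, each submitted as one job in a job array."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]

# Illustrative batch sizes: particle types that take longer to process per file
# get smaller batches. These values are assumptions, not lstmcpipe's settings.
BATCH_SIZE = {"gamma": 50, "gamma-diffuse": 50, "proton": 20, "electron": 50}

simtel_files = [f"run{i:04d}.simtel.gz" for i in range(45)]
proton_batches = batch_file_lists(simtel_files, BATCH_SIZE["proton"])
# 45 files with batch size 20 -> batches of 20, 20 and 5 files
```

Each batch would then be written to a file-list file and handed to one job-array task.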


**dl1ab**

As an alternative to processing simtel r0 files, existing dl1 files can be reprocessed.
This can be useful to apply different cleanings or to alter the images by adding noise etc.
For this to work, the old files have to contain images, i.e. they need to have been processed
using the ``no_image: False`` flag in the config.
The config key ``dl1_reference_id`` is used to determine the input files.
Its value needs to be the full prod_id including software versions (i.e. the name of the
directories directly above the dl1 files).
For lstchain, the dl1ab script is used; ctapipe can use the same script as for simtel
processing. There is no support for hiperta!


**merge_dl1**

In this stage, the previously created dl1 files are merged so that you end up with
train and test datasets for the next stages.


**train_test_split**

Split the dataset into training and testing datasets, performing a random selection of files with the specified ratio
(default=0.5).
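
A random split with a given ratio can be sketched as below. This is a minimal illustration of the idea, not the code lstmcpipe ships:

```python
import random

def train_test_split_files(files, ratio=0.5, seed=None):
    """Randomly assign files to training and testing sets with the given ratio."""
    rng = random.Random(seed)  # seeded RNG for a reproducible split
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * ratio)
    return shuffled[:n_train], shuffled[n_train:]

files = [f"dl1_run{i}.h5" for i in range(10)]
train, test = train_test_split_files(files, ratio=0.5, seed=42)
# With ratio=0.5, 10 files split into 5 training and 5 testing files.
```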

**train_pipe**

IMPORTANT: From here on, only ``lstchain`` tools are available. More about that at the end.

In this stage, the models to reconstruct the primary particles' properties are trained
on the gamma-diffuse and proton train data.
At present this means that random forests are created using lstchain's
``lstchain_mc_trainpipe``.
Models will be stored in the ``models`` directory.


**dl1_to_dl2**

The previously trained models are evaluated on the merged dl1 files using ``lstchain_dl1_to_dl2`` from
the lstchain package.
DL2 data can be found in the ``DL2`` directory.

**dl2_to_irfs**

Point-like IRFs are produced for each set of offset gammas.
The processing is performed by calling ``lstchain_create_irf_files``.


**dl2_to_sensitivity**

A sensitivity curve is estimated using a script based on pyirf, which performs a cut optimisation
similar to EventDisplay.
The script can be found in ``lstmcpipe/scripts/script_dl2_to_sensitivity.py``.
This does not use the IRFs and cuts computed in dl2_to_irfs, so it cannot be compared to observed data.
It is a mere benchmark for the pipeline.


Logs and data output 📈
-----------------------
**NOTE**: ``lstmcpipe`` expects the data to be located in a specific structure on the cluster.
Output will be written in a standardized way next to the input data to make sure everyone can access it.
Analysing a custom dataset requires replicating parts of the directory structure and is not the
intended use case for this package.

All the ``r0_to_dl1`` stage job logs are stored in ``/fefs/aswg/data/mc/running_analysis/.../job_logs`` and later
moved to ``/fefs/aswg/data/mc/analysis_logs/.../``.

Every time a full MC production is launched, two files with logging information are created:

- ``log_reduced_Prod{3,5}_{PROD_ID}.yml``
- ``log_onsite_mc_r0_to_dl3_Prod{3,5}_{PROD_ID}.yml``

The first one contains a reduced summary of all the scheduled `job ids` (and to which particle each job corresponds),
while the second one contains the same plus all the commands passed to slurm.
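
A minimal sketch of how such a reduced log could be inspected after loading the YAML file. The particle-to-job-ids mapping shown here is an assumed illustration of the structure, not the documented schema:

```python
# Assumed (illustrative) structure of log_reduced_*.yml after yaml.safe_load:
# a mapping from particle type to the slurm job ids scheduled for it.
reduced_log = {
    "gamma": [1234501, 1234502],
    "gamma-diffuse": [1234503],
    "proton": [1234504, 1234505, 1234506],
}

def all_job_ids(log):
    """Flatten the per-particle job id lists, e.g. to poll slurm for their status."""
    return sorted(jid for jids in log.values() for jid in jids)

# all_job_ids(reduced_log) returns the six scheduled job ids in ascending order.
```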

Steps explanation 🔍
--------------------

The directory structure and the stages to run are determined by the config stages.
After that, the job dependencies between stages are handled automatically.

- If the full workflow is launched, directories will not be verified as containing data. Overwriting will only happen when an MC prod sharing the same ``prod_id`` is analysed on the same day
- If each step is launched independently (advanced users), no directory will be overwritten prior to confirmation from the user

Example of default directory structure for a prod5 MC prod:

.. code-block::

    /fefs/aswg/data/
    ├── mc/
    |   ├── DL0/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── simtel files
    |   |
    |   ├── running_analysis/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       └── temporary dir for r0_to_dl1 + merging stages
    |   |
    |   ├── analysis_logs/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       ├── file_lists_training/
    |   |       ├── file_lists_testing/
    |   |       └── job_logs/
    |   |
    |   ├── DL1/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       ├── dl1 files
    |   |       ├── training/
    |   |       └── testing/
    |   |
    |   ├── DL2/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
    |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
    |   |       └── dl2 files
    |   |
    |   └── IRF/20200629_prod5_trans_80/zenith_20deg/south_pointing/
    |       └── YYYYMMDD_v{lstchain}_{prod_id}/
    |           ├── off0.0deg/
    |           ├── off0.4deg/
    |           └── diffuse/
    |
    └── models/
        └── 20200629_prod5_trans_80/zenith_20deg/south_pointing/
            └── YYYYMMDD_v{lstchain}_{prod_id}/
                ├── reg_energy.sav
                ├── reg_disp_vector.sav
                └── cls_gh.sav

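
The standardized layout above can be reproduced programmatically. The helper below is a hypothetical sketch mirroring the example tree, not part of lstmcpipe's API:

```python
from pathlib import Path

BASE = Path("/fefs/aswg/data/mc")

def dl1_dir(prod_name, particle, zenith, pointing, date, lstchain_version, prod_id):
    """Build a DL1 output directory following the standardized structure shown above."""
    return (BASE / "DL1" / prod_name / particle / zenith / pointing
            / f"{date}_v{lstchain_version}_{prod_id}")

# Hypothetical example values for every component of the path.
path = dl1_dir("20200629_prod5_trans_80", "gamma", "zenith_20deg",
               "south_pointing", "20240126", "0.10.5", "my_prod")
# -> /fefs/aswg/data/mc/DL1/20200629_prod5_trans_80/gamma/zenith_20deg/south_pointing/20240126_v0.10.5_my_prod
```
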
Real Data analysis 💀
---------------------

Real data analysis is not officially supported by these scripts. Use at your own risk.


Pipeline Support 🛠️
-------------------

So far, the reference pipeline is ``lstchain``, and only with it is a full analysis possible.
There is, however, support for ``ctapipe`` and ``hiperta`` as well.
The processing up to dl1 is relatively agnostic of the pipeline; working implementations exist for all of them.

In the case of ``hiperta``, a custom script converts the dl1 output to ``lstchain``-compatible files, and the later stages
run using ``lstchain`` scripts.

In the case of ``ctapipe``, dl1 files can be produced using ``ctapipe-stage1``. Once the dependency issues are solved and
ctapipe 0.12 is released, this will most likely switch to using ``ctapipe-process``. We currently have no plans to keep supporting older
versions longer than necessary.
Because the files are not compatible with ``lstchain`` and there is no support for higher data levels in ``ctapipe`` yet, it is not possible
to use any of the following stages. This might change in the future.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -48,6 +48,7 @@
"sphinxcontrib.mermaid",
"nbsphinx",
"sphinxcontrib.autoprogram",
"myst_parser",
]

autosummary_generate = True
Expand Down
5 changes: 2 additions & 3 deletions docs/index.rst
@@ -4,7 +4,6 @@
contain the root `toctree` directive.
.. include:: ../README.rst

lstmcpipe API documentation
@@ -15,10 +14,10 @@ lstmcpipe API documentation

productions
pipeline
lstmcpipe_user_doc.md
examples/configs_pointings
api/lstmcpipe
cli

api/lstmcpipe


Indices and tables
