Update documentation for postprocessing

bencardoen · Oct 29, 2024 · 2bf99bf · 2bf99bf
1 parent 60a0130
commit 2bf99bf
Showing 4 changed files with 50 additions and 11 deletions.
diff --git a/docs/make.jl b/docs/make.jl
@@ -1,6 +1,6 @@
 using Documenter
 push!(LOAD_PATH,"../src/")
 makedocs(sitename="SubPrecisionContactDetection Documentation",  pages=[ "Tutorial" => "tutorial.md", "Parameter selection and tuning" => "parameters.md", "Generated output" => "output.md", 
-"Cluster Usage" => "clustercomputing.md", "Installation" => "installation.md", "Postprocessing" => "Postprocessing.md", "Help and FAQ" => "faq.md"])
+"Cluster Usage" => "clustercomputing.md", "Installation" => "installation.md", "Postprocessing" => "postprocessing.md", "Help and FAQ" => "faq.md"])
 
 deploydocs(repo = "github.com/bencardoen/SubPrecisionContactDetection.jl.git")
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -23,4 +23,4 @@ The below 3D rendering shows the software predicting ER-Mitochondria contacts in
 
 ![example.png](./assets/example.png)
 
-Mitochondria are in red, ER in green translucent, the contact zones in white.
+Mitochondria are in red, ER in green translucent, the contact zones in white.
diff --git a/docs/src/installation.md b/docs/src/installation.md
@@ -27,4 +27,7 @@ See the [build](https://github.com/bencardoen/SubPrecisionContactDetection.jl/bu
 ```bash
 export PYTHON=""
 julia --project=. -e 'using Pkg; Pkg.build'
-```
+```
+
+!!! note "Attention"
+    For the remainder of this document we assume all commands are run inside the cloned directory, e.g. `SubPrecisionContactDetection.jl`.
diff --git a/docs/src/postprocessing.md b/docs/src/postprocessing.md
@@ -3,17 +3,20 @@
 Once the contact maps have been computed, you often need quantification and additional filtering. 
 For example, coverage, feature descriptors, and so forth.
 
-## Aggregating CSV files
+There are three key postprocessing steps, separate from the algorithm's own output:
+- Bleedthrough filter
+- CSV curation
+- Sampling
 
+In the remainder of this document, we assume `DIR` is the directory where the algorithm saved its output on a full dataset.
 
-## Sampling contacts
-In [scripts/run_cube_sampling_on_dataset.jl](https://github.com/bencardoen/SubPrecisionContactDetection.jl/scripts/run_cube_sampling_on_dataset.jl) you'll find a script that samples contacts with a sliding window, to avoid long tail statistics dominating the conclusion of any analysis. The paper goes into more depth why this is beneficial.
-
-
-
-## Preprocessing and filtering
+## Bleedthrough filter
 The background filter removes ghost effects (bleedthrough).
-If you want to tune this without invoking the full pipeline, you can do so:
+It is run as part of the pipeline, but you can invoke it separately.
+
+!!! note "This is Optional"
+    This is entirely optional, but useful if you want to optimize this filter independently.
+```
 
 Suppose we want to filter all tif files ending with "1.tif" or "2.tif" , for z=1 to 1.1 in 0.25 steps, and then compute the object properties.
 ```julia
@@ -43,4 +46,37 @@ For all the files, it will generate a CSV with columns, where each row is an obj
 
 !!! warning "Shape features"
     The ``\lambda`` values are disabled by default because for very large objects (1e6 voxels) they can stall the pipeline.
+
+## CSV Curation
+You can run our Python script to aggregate and curate the processed CSV files.
+
+```bash
+python3 scripts/csvcuration.py --inputdirectory <where you saved the output> --outputdirectory <where you want the new CSV files saved>
+```
+By default this looks for output produced with ``\alpha`` = 0.05; you can override this as needed, for example with `--alpha 0.01`.
+
+This will produce:
+
+```
+contacts_aggregated.csv             # Contacts aggregated per cell, so 1 row = 1 cell, use this for e.g. mean height, Q95 Volume
+contacts_filtered_novesicles.csv    # All contacts, without vesicles
+contacts_unfiltered.csv             # All contacts, no filtering
+```
+
+## Sampling contacts
+In [scripts/run_cube_sampling_on_dataset.jl](https://github.com/bencardoen/SubPrecisionContactDetection.jl/scripts/run_cube_sampling_on_dataset.jl) you'll find a script that samples contacts with a sliding window, so that long-tail statistics do not dominate the conclusions of an analysis. The paper explains in more depth why this is beneficial.
+
+```bash
+julia --project=. scripts/run_cube_sampling_on_dataset.jl  --inpath DIR --outpath  <where to save your output>
+```
+
+A convenience script is provided to further aggregate the output of this stage.
+
+```bash
+python3 scripts/coverage.py --inputdirectory DIR --outputdirectory <where to save your output>
+```
+
+This prints summary output and saves a file `coverage_aggregated.csv`. The columns you'll be most interested in are `Coverage % mito by contacts, mean per cell` and `ncontacts mean`.
+
+They report the coverage of contacts on mitochondria (minus MDVs), and the number of contacts per sliding window of 5x5x5 voxels.
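
Review note: the sampling section describes per-window statistics without showing what a window computes. The sketch below is a toy illustration of the idea, not the code in `scripts/run_cube_sampling_on_dataset.jl`. It assumes binary voxel masks stored as nested lists, a window stride equal to the window size, and coverage defined as the fraction of mitochondria voxels in a window that are also contact voxels. The names `window_coverage` and `toy_volume` are hypothetical.

```python
from itertools import product


def window_coverage(mito, contact, size=5, stride=5):
    """Yield (origin, coverage, n_mito) for each cubic window containing mitochondria.

    mito and contact are equally shaped nested lists indexed [z][y][x] holding 0/1.
    Coverage here is the fraction of mitochondria voxels in the window that are
    also contact voxels (an assumed definition, for illustration only).
    """
    nz, ny, nx = len(mito), len(mito[0]), len(mito[0][0])
    for z0, y0, x0 in product(
        range(0, nz - size + 1, stride),
        range(0, ny - size + 1, stride),
        range(0, nx - size + 1, stride),
    ):
        n_mito = 0
        n_covered = 0
        for dz, dy, dx in product(range(size), repeat=3):
            z, y, x = z0 + dz, y0 + dy, x0 + dx
            if mito[z][y][x]:
                n_mito += 1
                if contact[z][y][x]:
                    n_covered += 1
        if n_mito:  # skip windows without any mitochondria voxels
            yield (z0, y0, x0), n_covered / n_mito, n_mito


def toy_volume(shape, filled):
    """Build a nested-list volume of zeros with the voxels in `filled` set to 1."""
    nz, ny, nx = shape
    vol = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z, y, x in filled:
        vol[z][y][x] = 1
    return vol


# Toy 5x5x10 volume, which holds exactly two 5x5x5 windows along x.
mito_voxels = [(0, 0, x) for x in range(10)]         # a line of 10 mitochondria voxels
contact_voxels = [(0, 0, 0), (0, 0, 1), (0, 0, 7)]   # 2 in the first window, 1 in the second
mito = toy_volume((5, 5, 10), mito_voxels)
contact = toy_volume((5, 5, 10), contact_voxels)

results = list(window_coverage(mito, contact))
for origin, coverage, n_mito in results:
    print(origin, f"{100 * coverage:.0f}% of {n_mito} mito voxels covered")
```

Because every window contributes a value bounded between 0 and 1, a single very large contact cannot dominate the summary the way it would in a per-contact volume distribution.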