v1.0.0 - stable, refactored, and enhanced version with complete docs and examples
Enhancements
- Refactor the pipeline, reduce its scope by moving redundant functionality to dedicated downstream analysis modules, and integrate it into MR.PARETO for v1.0.0.
- Abstract everything CeMM/BSF specific to the configuration level to make the pipeline more accessible.
- Revamp the MultiQC report to improve clarity and utility.
- Adapt result folder structure and reporting to conform with MR.PARETO standards.
- Simplify pipeline configuration to only two files (instead of four) for ease of use.
- Ensure compatibility with Snakemake version 7.
- Implement quick aggregation of all quantifications, reducing processing time from multiple hours to a few minutes.
- Make the slop_extension parameter configurable to accommodate different user needs.
- Aggregate sample-wise HOMER known motif enrichment results for easier downstream analysis.
- Add promoter and TSS region quantification features to the pipeline.
- Extract and present MultiQC statistics in a more accessible format.
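As an illustration of the aggregation of quantifications mentioned above, the following is a minimal sketch of merging per-sample counts into a single region-by-sample matrix. It is not the pipeline's actual implementation: the function names, the region IDs, and the choice to fill regions missing from a sample with 0 are all assumptions made for this example.

```python
import csv
import io


def aggregate_counts(sample_tables):
    """Merge per-sample {region_id: count} mappings into one matrix.

    sample_tables maps a sample name to a dict of region ID -> count.
    Returns (header, rows); rows are sorted by region ID.
    Assumption: a region absent from a sample is reported as 0.
    """
    samples = sorted(sample_tables)
    # Union of all region IDs seen in any sample.
    regions = sorted({r for counts in sample_tables.values() for r in counts})
    header = ["region"] + samples
    rows = [[r] + [sample_tables[s].get(r, 0) for s in samples] for r in regions]
    return header, rows


def to_csv(header, rows):
    """Serialize the matrix as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical input: two samples with partially overlapping regions.
example = {
    "sampleA": {"region_1": 12, "region_2": 0},
    "sampleB": {"region_1": 7, "region_3": 3},
}
header, rows = aggregate_counts(example)
print(to_csv(header, rows), end="")
```

Building the matrix in a single pass over in-memory tables, rather than joining files pairwise, is one common way such an aggregation step is kept fast.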
Documentation
- Provide a tutorial and tips for quality control (QC).
- Add example microdata for human (hg38) and mouse (mm10) genomes to assist new users.
- Include a guide for resource downloading for required hg38 and mm10 data.
- Enable sharing of the genome browser track hub for collaborative work.
- Document the configuration process for UROPA within the pipeline.
- Point to downstream analysis modules.
- Adapt and extend the Methods section to reflect the refactoring, reduced scope, and new features.
Small improvements
- Fix the mitochondrial fraction metric in the pipeline report.
- Create and add versioning to all environment YAML files.
- Make the installation of HOMER a dedicated rule/job within the pipeline.
- Fix the multiqc.yaml installation error to ensure smooth setup.
- Remove temporary BAM files before Bowtie execution to save disk space and improve performance.
- Provide a consensus region annotation file, extended with nucleotide content information.
- Clean up bash commands in all rules to improve code quality and maintainability.
- Split the UROPA region annotation rule to allow for parallel processing and increased efficiency.
- Test the pipeline on 100 ATAC-seq samples as a loaded module within another Snakemake workflow project to ensure compatibility and robustness.
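To give a sense of what nucleotide content information for a region can look like, here is a minimal sketch that summarizes a region's sequence. The function name and the reported fields (length, GC fraction, N fraction) are assumptions for illustration; the columns in the pipeline's actual annotation file may differ.

```python
def nucleotide_content(sequence):
    """Summarize the nucleotide content of one region's sequence.

    Hypothetical helper: returns the sequence length plus the fractions
    of G/C bases and of ambiguous N bases (case-insensitive).
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        # Avoid division by zero for empty sequences.
        return {"length": 0, "gc_fraction": 0.0, "n_fraction": 0.0}
    gc = seq.count("G") + seq.count("C")
    n = seq.count("N")
    return {
        "length": length,
        "gc_fraction": gc / length,
        "n_fraction": n / length,
    }


# Hypothetical region sequence with soft-masked (lowercase) and N bases.
stats = nucleotide_content("ACGTacgtNN")
print(stats)
```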
Full Changelog: v0.1.2...v1.0.0