This pipeline is designed to download .fastq.gz
files from a Synapse repository, run quality control (QC) using FastQC, and generate a combined report using MultiQC.
- Nextflow: Install from nextflow.io.
- Python: Ensure Python 3.x is installed.
- SynapseClient: Install using
pip install synapseclient
. - FastQC and MultiQC: Ensure these tools are installed and available in your
PATH
, or use containers if preferred. - Docker/Singularity (optional): To use containers for reproducibility.
-
Clone this repository:
git clone https://github.com/ncihtan/nf-htan-qc.git cd project
-
Ensure dependencies are installed:
pip install synapseclient
-
Log in to Synapse:
- Ensure you are logged in using
synapse login
or have a.synapseConfig
file set up in your home directory.
- Ensure you are logged in using
Run the pipeline with a prepared sample sheet (csv):
nextflow run main.nf --input_csv /absolute/path/to/input.csv
input_csv: Absolute path to the input CSV file. This file should have two columns:
- filename: The exact name of the file to download (e.g., file1.fastq.gz).
- synapse_id: The Synapse ID containing the file (e.g., syn12345678)
- The pipeline generates a
multiqc_report.html
file that combines all the QC outputs in a single report. - Individual FastQC result files are stored in the
fastqc_results/
directory.
- The pipeline only processes
.fastq.gz
files. Ensure your Synapse data folder contains these files. - If using Docker or Singularity, adjust the
nextflow.config
to include container paths.
- Ensure Synapse authentication is set up properly.
- Verify that
FastQC
andMultiQC
are installed or available in your container.
Use input.csv'
in /test
to test the pipeline
nextflow run main.nf --input_csv /path/to/input.csv
This will download all .fastq.gz
files from the specified Synapse ID, run QC on them, and produce a multiqc_report.html
summarizing the QC results.