- Introduction
- Install the pipeline
- Running the pipeline
- Main arguments
- Mandatory arguments
- Job resources
- Other command line parameters
Nextflow handles job submission on SLURM or other environments and supervises the running jobs, so the Nextflow process must keep running until the pipeline is finished. We recommend running it in the background through screen, tmux or a similar tool. Alternatively, you can run Nextflow within a cluster job submitted to your job scheduler.
It is recommended to limit the memory of the Nextflow Java virtual machine. We recommend adding the following line to your environment (typically in ~/.bashrc or ~/.bash_profile):
export NXF_OPTS='-Xms1g -Xmx4g'
How to install FLORA:
git clone https://gitlab.ifremer.fr/cn7ab95/FLORA.git
The simplest command for running the workflow is:
nextflow run main.nf
This will launch the workflow using local configurations.
For our own usage, we adapt the configuration to our supercomputer and launch the workflow through our scheduler with the provided PBS script:
qsub run-main.nf
Note that the pipeline will create the following files in your working directory:
$SCRATCH/flora_workdir # Directory containing the nextflow working files
$PWD/results # Finished results (configurable, see below)
$PWD/.nextflow.log # Log file from Nextflow
# Other nextflow hidden files, eg. history of pipeline runs and old logs.
When you run the above command, Nextflow runs the pipeline code from your local git clone, even if the pipeline has been updated upstream since you cloned it. To make sure that you're running the latest version, update your clone regularly:
cd FLORA
git pull
It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
First, go to the FLORA releases page and find the latest version number (e.g. v1.0.0), then check out the corresponding tag:
cd FLORA
git checkout v1.0.0
Path to the RNAseq raw data files in FASTQ format.
Path to the Bowtie2 index of the SILVA rRNA database.
Path to a text file that describes the data (condition, replicate), as in the following example:
cond_A cond_A_rep1 reads_A_rep1_R1.fq reads_A_rep1_R2.fq
cond_A cond_A_rep2 reads_A_rep2_R1.fq reads_A_rep2_R2.fq
cond_B cond_B_rep1 reads_B_rep1_R1.fq reads_B_rep1_R2.fq
cond_B cond_B_rep2 reads_B_rep2_R1.fq reads_B_rep2_R2.fq
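A quick sanity check of the design file before launching can catch malformed rows early. The sketch below is not part of FLORA. It assumes whitespace-separated columns in the order suggested by the example above (condition, replicate name, R1 file, R2 file), and the helper name check_design is hypothetical.

```shell
# Hypothetical helper: verify that every non-empty row of a design file has
# exactly four whitespace-separated fields and a unique replicate name.
# Assumes the column layout of the example above; not part of FLORA itself.
check_design() {
    awk '
        BEGIN { bad = 0 }
        NF == 0 { next }    # skip blank lines
        NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1 }
        seen[$2]++ { printf "line %d: duplicate replicate name %s\n", NR, $2; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Example: write the four-row design shown above to a temporary file and check it.
design_file="$(mktemp)"
cat > "$design_file" <<'EOF'
cond_A cond_A_rep1 reads_A_rep1_R1.fq reads_A_rep1_R2.fq
cond_A cond_A_rep2 reads_A_rep2_R1.fq reads_A_rep2_R2.fq
cond_B cond_B_rep1 reads_B_rep1_R1.fq reads_B_rep1_R2.fq
cond_B cond_B_rep2 reads_B_rep2_R1.fq reads_B_rep2_R2.fq
EOF

if check_design "$design_file"; then
    echo "design file OK"
else
    echo "design file has errors" >&2
fi
```

The function returns a non-zero status when any row is malformed, so it can be used as a guard in front of the launch command.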
The minimum length of kept reads after quality trimming.
The minimum quality of bases in each read.
The overlap with adapter sequence required to trim a sequence.
The maximum allowed error rate.
Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of 143 (exceeded requested resources), it is automatically resubmitted with higher requests (2 x original, then 3 x original). If it still fails on the third attempt, the pipeline is stopped.
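To make the escalation concrete, the arithmetic can be sketched as follows. The base values (4 GB, 2 hours) are made-up illustrations rather than FLORA defaults, and the loop only prints what each attempt would request.

```shell
# Illustrative only: base_mem_gb and base_time_h are hypothetical values,
# not the defaults of any FLORA process.
base_mem_gb=4
base_time_h=2
max_attempts=3

attempt=1
while [ "$attempt" -le "$max_attempts" ]; do
    # Attempt 1 uses the original request, attempt 2 doubles it,
    # attempt 3 triples it.
    mem_gb=$((base_mem_gb * attempt))
    time_h=$((base_time_h * attempt))
    echo "attempt $attempt: ${mem_gb} GB, ${time_h} h"
    attempt=$((attempt + 1))
done
```

With these example values, the three attempts request 4 GB, 8 GB and 12 GB of memory respectively.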
The output directory where the results will be published.
The temporary directory where intermediate data will be written.
Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic.
Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it previously stopped.
You can also supply a run name to resume a specific run: -resume [run-name]. Use the nextflow log command to show previous run names.
NB: Single hyphen (core Nextflow option)
Use this to set an upper limit on the default memory requirement of each process.
Should be a string in the format integer-unit, e.g. --max_memory '8.GB'
Use this to set an upper limit on the default time requirement of each process.
Should be a string in the format integer-unit, e.g. --max_time '2.h'
Use this to set an upper limit on the default CPU requirement of each process.
Should be an integer, e.g. --max_cpus 1
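The effect of these limits can be pictured as a simple cap: a process never requests more than the configured maximum. The following sketch illustrates that idea only. The function name cap and the numbers are hypothetical, and the pipeline's actual implementation may differ.

```shell
# Hypothetical helper: print the smaller of a requested value and a limit.
cap() {
    requested=$1
    limit=$2
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

max_memory_gb=8   # as if launched with --max_memory '8.GB'
max_cpus=1        # as if launched with --max_cpus 1

# A 12 GB request exceeds the limit and is reduced to it.
echo "memory: $(cap 12 "$max_memory_gb") GB"
# A request within the limit is left unchanged.
echo "cpus: $(cap 1 "$max_cpus")"
```

Under this reading, the automatic resubmission with higher requests stops growing once a request reaches the configured limit.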