VanishingGlacierMAGs generation and analysis

This pipeline generates metagenome-assembled genomes (MAGs) from individual assemblies and their respective reads for the VanishingGlaciers project. It is built on the Snakemake workflow management system and is designed to run on a high-performance computing cluster.

Pipeline description

  • This pipeline starts with different individual assemblies (fasta files) and their respective reads (mg.r{1,2}.preprocessed.fq files).
  • To reduce computational time, the reads are subsampled to 10% of the reads per sample, and contigs shorter than 1.5 kbp are removed.
  • The subsampled reads are then mapped against the assemblies using BWA.
  • The mapped reads are then used to bin the contigs using MetaBAT2, CONCOCT and MetaBinner.
  • The bins are then optimized using DAS_Tool.
  • CheckM2 is used to estimate the quality of the bins, and only bins that are at least 50% complete are kept.
  • MDMcleaner is then used to reduce contamination in those bins.
  • Next, the bins are dereplicated with dRep to form MAGs; only bins with >70% completeness and <10% contamination are kept.
  • Read mapping against all the MAGs is done using BWA.
  • GTDB-Tk is used for taxonomic classification.
  • MGThermometer is used to estimate the optimal growth temperature (OGT, in °C) from the fraction of IVYWREL amino acids (I, V, Y, W, R, E and L), $F_{IVYWREL}$
    • OGT is computed as follows:
$$OGT = 937 \times F_{IVYWREL} - 335$$
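
As a rough illustration of the OGT formula above, the sketch below counts IVYWREL residues in a set of protein sequences and applies the linear relation. It is not MGThermometer's code: the function names are hypothetical, and pooling residues across all proteins of a genome before taking the fraction is an assumption that may differ from what the tool does.

```python
# Amino acids counted in F_IVYWREL (one-letter codes).
IVYWREL = frozenset("IVYWREL")


def ivywrel_fraction(proteins):
    """Return the fraction of IVYWREL residues over all residues in `proteins`.

    Assumption: residues are pooled across all sequences before dividing.
    """
    total = 0
    hits = 0
    for seq in proteins:
        seq = seq.upper().rstrip("*")  # drop a trailing stop-codon marker, if any
        total += len(seq)
        hits += sum(1 for aa in seq if aa in IVYWREL)
    if total == 0:
        raise ValueError("no residues to count")
    return hits / total


def ogt_from_fraction(f_ivywrel):
    """Apply OGT = 937 * F_IVYWREL - 335."""
    return 937 * f_ivywrel - 335


# Hypothetical toy input: 4 of the 10 residues (I, V, L, E) are in IVYWREL.
toy_proteins = ["MIVAG", "KLEDS"]
toy_ogt = ogt_from_fraction(ivywrel_fraction(toy_proteins))
```

With the toy input, the fraction is 0.4 and the formula gives an OGT of about 39.8. Real proteomes would be read from the predicted protein FASTA files of each MAG.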

Setup

Conda

Conda user guide

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions

Getting the repository including sub-modules

git clone --recurse-submodules git@github.com:michoug/SnakemakeBinning.git
cd SnakemakeBinning
git checkout busi

Create the main snakemake environment

# create and activate the conda environment
conda env create -f requirements.yaml -n "snakemake"
conda activate snakemake

Run Setup

  • Place your preprocessed/trimmed reads (e.g. sample_r1.fastq.gz and sample_r2.fastq.gz files) in a reads folder
  • Place the individual assemblies (e.g. sample.fa) into an assembly folder
  • Modify the config/config.yaml file to set the different paths and, if needed, the different options
  • Modify the config/all_samples.txt file to include your samples
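
Before launching the workflow, it can help to check that every listed sample has its reads and assembly in place. The sketch below is a hypothetical helper, not part of the repository. It assumes that config/all_samples.txt holds one sample name per line, that the folders are named reads and assembly, and that files follow the example names given above. Adjust these to match your config/config.yaml.

```python
from pathlib import Path


def expected_inputs(sample, reads_dir="reads", assembly_dir="assembly"):
    """Return the three input paths expected for one sample (assumed naming)."""
    reads = Path(reads_dir)
    return [
        reads / f"{sample}_r1.fastq.gz",
        reads / f"{sample}_r2.fastq.gz",
        Path(assembly_dir) / f"{sample}.fa",
    ]


def read_samples(sample_file):
    """Read sample names, one per line, skipping blank lines (assumed format)."""
    lines = Path(sample_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_inputs(samples, reads_dir="reads", assembly_dir="assembly"):
    """List the expected input files that do not exist on disk."""
    return [
        path
        for sample in samples
        for path in expected_inputs(sample, reads_dir, assembly_dir)
        if not path.is_file()
    ]
```

For example, `missing_inputs(read_samples("config/all_samples.txt"))` returns the paths that still need to be added; an empty list means all expected inputs were found.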

Without Slurm

snakemake -s workflow/Snakefile --configfile config/config.yaml --cores 28 --use-conda -rp

With Slurm

This part was mainly taken from @susheelbhanu's nomis_pipeline.

  • Modify the slurm.yaml file: check the partition, qos and account settings, which depend heavily on your system
  • Modify the sbatch.sh file: check the #SBATCH -p, #SBATCH --qos= and #SBATCH -A options, which depend heavily on your system

sbatch config/sbatch.sh