Commit
Updated to containerized nextflow pipeline
Showing 8 changed files with 410 additions and 30 deletions.
@@ -0,0 +1,32 @@
# Build dockerfile on change
name: Build Docker (env/beagle.Dockerfile)

on:
  push:
    paths:
      - 'env/beagle.Dockerfile'
      - '.github/workflows/build_docker.yml'
  pull_request:
    paths:
      - 'env/beagle.Dockerfile'
      - '.github/workflows/build_docker.yml'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Build Tools
      - name: Build and Publish
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          name: andersenlab/beagle
          tag: "${{ steps.current-time.formattedTime }}"
          username: ${{ secrets.KSE_DOCKER_USER }}
          password: ${{ secrets.KSE_DOCKER_PASS }}
          snapshot: true
          dockerfile: beagle.Dockerfile
          workdir: "env"
          tags: "latest"
          cache: true
@@ -1 +1,135 @@
# impute-nf
# VCF Imputation

The [impute-nf](https://github.com/AndersenLab/impute-nf) pipeline subsets isotype reference strains from a hard-filtered VCF file, creates an SNV-only VCF, and imputes missing genotypes to produce a new VCF. This step is required for fine-mapping with NemaScan.
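
To make the SNV-only step concrete, here is a minimal sketch that works on an uncompressed VCF read from standard input. The function name `snv_only` is hypothetical and this is not how impute-nf implements the step; on compressed files a dedicated tool such as `bcftools` would normally be used. It keeps every header line plus the biallelic records whose REF and ALT are both a single base.

```shell
# Hypothetical illustration of an "SNV-only" filter on plain-text VCF.
# Keeps header lines and records whose REF ($4) and ALT ($5) are each one
# base (A, C, G or T); indels and multiallelic sites (ALT like "T,G") are dropped.
snv_only() {
  awk -F'\t' '
    /^#/ { print; next }
    $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ { print }
  ' "$@"
}
```

Example: `zcat input.vcf.gz | snv_only > input.SNV.vcf`.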

This page details how to run the pipeline.

# Pipeline overview

```
########## # ##
## ##### # #
## # #
## # ## ## ### # # # ## # ## ####
## # # # # # # # # # # ### # # #
## # # # # # # # # ### # # #
## # # # # # # # # # # # #
########## # # # ### ### # ### # # #
#
#
#
parameters                  description                                              Set/Default
==========                  ===========                                              ========================
--debug                     Set to 'true' to test                                    (optional)
--species                   Species: 'c_elegans', 'c_tropicalis' or 'c_briggsae'     (required)
--vcf                       hard filtered vcf to calculate variant density           (required)
--out                       output folder name                                       (optional)
--chrI | chrII | chrIII...  Window and overlap for each chromosome                   (optional)
```

## Software Requirements

* The latest update requires Nextflow version 23+. On Rockfish, you can access this version by loading the `nf23_env` conda environment before running the pipeline command:

```
module load python/anaconda
source activate /data/eande106/software/conda_envs/nf23_env
```

### Relevant Docker Images

*Note: Before 20220301, this pipeline was run using existing conda environments on QUEST. These have since been migrated to Docker images to allow for better control and reproducibility across platforms. If you need to access the conda version, you can always run an old commit with `nextflow run andersenlab/post-gatk-nf -r 20220216-Release`*

* `andersenlab/beagle:5.2` ([link](https://hub.docker.com/r/andersenlab/beagle)): This Docker image is created within this pipeline using GitHub Actions. Whenever a change is made to `env/beagle.Dockerfile` or `.github/workflows/build_beagle_docker.yml`, GitHub Actions builds a new Docker image and pushes it if the build succeeds.

Add the following line to your `~/.bash_profile`. It ensures that any Singularity images you download are stored in a shared location on `/vast/eande106`, so other users can reuse them without downloading the same image again.

```
# add singularity cache
export SINGULARITY_CACHEDIR='/vast/eande106/singularity/'
```

>[!Note]
>If you need to work with the Docker container, start an interactive session first, because Singularity cannot be run on Rockfish login nodes.
>
>```
>interact -n1 -pexpress
>module load singularity
>singularity shell [--bind local_dir:container_dir] /vast/eande106/singularity/<image_name>
>```

# Usage

*Note: if you are having issues running Nextflow or need reminders, check out the [Nextflow](http://andersenlab.org/dry-guide/rockfish/rf-nextflow/) page.*

## Testing on Rockfish

*This command uses a test dataset.*

```
nextflow run -latest andersenlab/impute-nf --debug
```

## Running on Rockfish

You should run this in a screen or tmux session.

```
nextflow run -latest andersenlab/impute-nf --vcf <path_to_vcf> --species <species>
```

# Parameters

## -profile

There are three configuration profiles for this pipeline.

* `rockfish` - Used for running on Rockfish (default).
* `quest` - Used for running on Quest.
* `local` - Used for local development.

>[!Note]
>If you omit `-profile`, the `rockfish` profile is used by default.

## --debug

Use `--debug` for testing and debugging. This runs the debug test set (located in the `test_data` folder).

For example:

```
nextflow run -latest andersenlab/impute-nf --debug
```

## --species

Options: c_elegans, c_briggsae, or c_tropicalis

## --vcf

Path to the hard-filtered VCF output from [`wi-gatk`](https://github.com/AndersenLab/wi-gatk). The VCF should contain **ALL** strains.

## --chrI|chrII|chrIII|chrIV|chrV|chrX|MtDNA (optional)

The window size and overlap to use as inputs to Beagle. These values were checked and chosen by previous lab members and Erik. Some chromosomes may require a window size of 3 and an overlap of 1. In a recent conversation, the maintainer of Beagle suggested that we use the default values unless simulations show that other values perform better; this may be worth revisiting in the future.
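
For illustration, a minimal sketch of how a per-chromosome window and overlap could be passed to Beagle. It assumes the `beagle` wrapper installed in the Beagle Docker image and uses Beagle 5's `gt=`, `chrom=`, `out=`, `window=` and `overlap=` arguments (Beagle 5 measures `window` and `overlap` in cM). The function `beagle_cmd`, the chromosome name `III` and the `out=` prefix are hypothetical, and the function only prints the command rather than running it. When no window and overlap are given, both arguments are left out so Beagle falls back to its defaults, in line with the maintainer's advice above.

```shell
# Hypothetical helper: print (do not run) a Beagle command for one chromosome.
# Arguments: VCF path, chromosome name, optional window and overlap (cM).
beagle_cmd() {
  local vcf="$1" chrom="$2" window="${3:-}" overlap="${4:-}"
  local cmd="beagle gt=${vcf} chrom=${chrom} out=${chrom}.imputed"
  # Only pass window/overlap when both are set; otherwise keep Beagle defaults.
  if [ -n "$window" ] && [ -n "$overlap" ]; then
    cmd="${cmd} window=${window} overlap=${overlap}"
  fi
  printf '%s\n' "$cmd"
}
```

For example, `beagle_cmd in.vcf.gz III 3 1` prints `beagle gt=in.vcf.gz chrom=III out=III.imputed window=3 overlap=1`.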

## --out (optional)

__default__ - `impute-YYYYMMDD`

A directory in which to output results. If you have set `--debug`, the default output directory will be `impute-YYYYMMDD-debug`.
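
The documented naming rule can be reproduced in shell. A minimal sketch: the hypothetical function `default_outdir` builds `impute-YYYYMMDD` from today's date and appends `-debug` when its first argument is `true`. It mirrors the rule described above, not the pipeline's own code.

```shell
# Hypothetical helper mirroring the documented --out default.
# Usage: default_outdir [true|false]
default_outdir() {
  local debug="${1:-false}"
  local out
  out="impute-$(date +%Y%m%d)"
  if [ "$debug" = true ]; then
    out="${out}-debug"
  fi
  printf '%s\n' "$out"
}
```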

# Output

```
└── variation
    ├── WI.20240718.hard-filter.isotype.SNV.vcf.gz
    ├── WI.20240718.hard-filter.isotype.SNV.vcf.gz.tbi
    ├── WI.20240718.impute.isotype.SNV.vcf.gz
    └── WI.20240718.impute.isotype.SNV.vcf.gz.tbi
```
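
After a run, a quick sanity check can confirm that the expected files were written. A minimal sketch, assuming the `variation` folder sits directly inside the directory given by `--out` and that you know the date stamp used in the file names; the function name `check_outputs` is hypothetical. It only checks that each file exists and is non-empty, not that the VCFs are valid.

```shell
# Hypothetical post-run check. Arguments: the pipeline output directory and
# the date stamp used in the file names (e.g. 20240718 in the tree above).
check_outputs() {
  local dir="$1" stamp="$2" status=0 kind vcf f
  for kind in hard-filter impute; do
    vcf="${dir}/variation/WI.${stamp}.${kind}.isotype.SNV.vcf.gz"
    # Each VCF must exist, be non-empty, and have a non-empty .tbi index.
    for f in "$vcf" "${vcf}.tbi"; do
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```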
@@ -0,0 +1,44 @@

/*
    LOCAL

    For running the pipeline locally
*/

process {

    withLabel: xs {
        cpus = 2
        memory = 1.GB
    }

    withLabel: sm {
        cpus = 2
        memory = 2.GB
    }

    withLabel: md {
        cpus = 4
        memory = 2.GB
    }

    withLabel: lg {
        cpus = 4
        memory = 2.GB
    }

    withLabel: ml {
        cpus = 4
        memory = 2.GB
    }

    withLabel: xl {
        cpus = 4
        memory = 4.GB
    }

}

docker {
    enabled = true
}
@@ -0,0 +1,67 @@
/*
    Quest Configuration
*/

params{
    baseDir = '/projects/b1042/AndersenLab'
    workDir = '/projects/b1042/AndersenLab/work'
    dataDir = '/projects/b1042/AndersenLab/data'
    softwareDir = '/projects/b1042/AndersenLab/software'
}

process {
    executor = 'slurm'
    queue = 'genomicsguestA'
    errorStrategy='retry'
    maxRetries=3

    withLabel: xs {
        clusterOptions = '-A b1042 -t 4:00:00 -e errlog.txt'
        cpus = 1
        memory = "4.GB"
    }

    withLabel: sm {
        clusterOptions = '-A b1042 -t 4:00:00 -e errlog.txt'
        cpus = 2
        memory = "8.GB"
    }

    withLabel: md {
        clusterOptions = '-A b1042 -t 4:00:00 -e errlog.txt'
        cpus = 4
        memory = "16.GB"
    }

    withLabel: ml {
        clusterOptions = '-A b1042 -t 12:00:00 -e errlog.txt'
        cpus = 16
        memory = "64.GB"
    }

    withLabel: lg {
        clusterOptions = '-A b1042 -t 24:00:00 -e errlog.txt'
        cpus = 48
        memory = "190.GB"
    }

    withLabel: xl {
        clusterOptions = '-A b1042 -t 24:00:00 -e errlog.txt'
        cpus = 48
        memory = "1500.GB"
    }
}

executor {
    queueSize=500
    submitRateLimit=10
}

singularity {
    enabled = true
    autoMounts = true
    cacheDir = "${params.baseDir}/singularity"
    pullTimeout = '20 min'
}
@@ -0,0 +1,72 @@
/*
    Rockfish Configuration
*/

params {
    baseDir = '/vast/eande106'
    workDir = '/vast/eande106/work'
    dataDir = '/vast/eande106/data'
    softwareDir = '/data/eande106/software'
}

process {
    executor = 'slurm'
    queueSize = 100

    withLabel: xs {
        clusterOptions = '-A eande106 -t 2:00:00 -e errlog.txt -N 1'
        cpus = 1
        memory = "4G"
        queue = "shared"
    }

    withLabel: sm {
        clusterOptions = '-A eande106 -t 2:00:00 -e errlog.txt -N 1'
        cpus = 2
        memory = "8G"
        queue = "shared"
    }

    withLabel: md {
        clusterOptions = '-A eande106 -t 2:00:00 -e errlog.txt -N 1'
        cpus = 4
        memory = "16G"
        queue = "shared"
    }

    withLabel: ml {
        clusterOptions = '-A eande106 -t 30:00:00 -e errlog.txt -N 1'
        cpus = 16
        memory = "64G"
        queue = "shared"
    }

    withLabel: lg {
        clusterOptions = '-A eande106 -t 2:00:00 -e errlog.txt -N 1 --ntasks-per-node 1 --cpus-per-task 48'
        // cpus = 48
        //memory = "190G"
        queue = "parallel"
    }

    withLabel: xl {
        clusterOptions = '-A eande106_bigmem -t 4:00:00 -e errlog.txt -N 1'
        cpus = 48
        memory = "1500G"
        queue = "bigmem"
    }

}

executor {
    queueSize=500
    submitRateLimit=10
}

singularity {
    enabled = true
    autoMounts = true
    cacheDir = "${params.baseDir}/singularity"
    pullTimeout = '20 min'
}
@@ -0,0 +1,27 @@
FROM openjdk:8-jre
MAINTAINER Mike Sauria <[email protected]>

# Install the Beagle jar and a small wrapper script; "$@" forwards every
# argument to Beagle unchanged (including arguments that contain spaces).
RUN wget https://faculty.washington.edu/browning/beagle/beagle.28Jun21.220.jar -O /beagle.28Jun21.220.jar && \
    echo "#!/bin/bash" > /usr/local/sbin/beagle && \
    echo "exec java -Xmx98g -jar /beagle.28Jun21.220.jar \"\$@\"" >> /usr/local/sbin/beagle && \
    chmod a+rx /usr/local/sbin/beagle

RUN apt-get --allow-releaseinfo-change update && \
    apt-get install -y libbz2-dev libvcflib-tools libvcflib-dev procps autoconf automake make gcc \
    perl zlib1g-dev libbz2-dev liblzma-dev libcurl4-gnutls-dev libssl-dev libncurses5-dev && \
    rm -rf /var/lib/apt/lists/* && \
    wget https://github.com/samtools/bcftools/releases/download/1.3.1/bcftools-1.3.1.tar.bz2 -O bcftools.tar.bz2 && \
    tar -xjvf bcftools.tar.bz2 && \
    cd bcftools-1.3.1 && \
    make && \
    make prefix=/usr/local/bin install && \
    mv /usr/local/bin/bin/bcftools /usr/bin/bcftools && \
    rm -rf /usr/local/bin/bin && \
    cd /usr/local/bin && \
    wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2 && \
    tar -vxjf htslib-1.9.tar.bz2 && \
    cd htslib-1.9 && \
    make && \
    mv bgzip ../ && \
    cd ../ && \
    rm -rf htslib-1.9