New monitoring framework #3

Open
wants to merge 9 commits into
base: Smooth-installation-procedure
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
annotations/*
23 changes: 11 additions & 12 deletions README.md
@@ -1,6 +1,6 @@
# nanocrop

This is a helper repository supporting a toolchain used for real-time monitoring of sequencing runs. The toolchain consists of [deepnano-blitz](https://github.com/fmfi-compbio/deepnano-blitz) basecaller used for MinKnow-compatible `.fastq` files production and [RAMPART](https://artic.network/rampart) for sequencing runs analysis. The repository contains RAMPART protocol and configuration for SARS-CoV-2 virus sequencing as well as some helper scripts. RAMPART protocol is slightly adjusted version of the one located [here](https://github.com/artic-network/artic-ncov2019), although another primer scheme using 2000bp long amplicons is provided in nanocrop and available for use.
This is a helper repository supporting a toolchain used for real-time monitoring of sequencing runs. The toolchain consists of the [deepnano-blitz](https://github.com/fmfi-compbio/deepnano-blitz) basecaller, which produces MinKnow-compatible `.fastq` files, and [RAMPART](https://artic.network/rampart) for sequencing run analysis. The repository contains a RAMPART protocol and configuration for SARS-CoV-2 virus sequencing as well as some helper scripts. The RAMPART protocol is a slightly adjusted version of the one located [here](https://github.com/artic-network/artic-ncov2019); additional primer schemes using 2000bp long amplicons and 2500bp long amplicons are provided in nanocrop and available for use.

## Toolchain installation

@@ -28,26 +28,25 @@ Web browser needs to be installed on workstation performing RAMPART analysis.

## Toolchain execution

Toolchain will use `basecall_continuous_reads.sh` script to watch a designated input folder for `.fast5` files being continuously created during sequencing run. For every `.fast5` file `deepnano-blitz` basecaller is invoked. Basecaller parameters in the script are fixed and set rather for short basecalling times than output accuracy. Please adjust basecaller parameters in the script manually according to your needs. Basecaller output is stored in `output folder`, which should be also an input folder for RAMPART pipeline. This is by default `rampart/SARS-CoV-2/data/fastq/pass/`. RAMPART will watch input folder configured for its pipeline and process `.fastq` files as they are created. Using its configuration and protocol RAMPART will demultiplex obtained reads and align them to the reference sequence provided. Thus monitoring current results of sequencing run such as reference genome coverage in real time.
The toolchain watches a designated input folder for `fast5` files that are continuously created during a sequencing run. For every `fast5` file, the `deepnano-blitz` basecaller is invoked. If more than one `fast5` file is created while the basecaller is busy, those files form the next batch, which is processed in parallel provided that more than one CPU core is enabled for basecalling in the configuration. Basecaller output is stored in the `output folder`, which must also be the input folder of the RAMPART pipeline. By default this is `rampart/SARS-CoV-2/SARS-CoV-2-400bp/data/fastq/pass/`. RAMPART watches the input folder configured for its pipeline and processes any `fastq` files as they are created. Using its configuration and protocol, RAMPART demultiplexes the obtained reads and aligns them to the provided reference sequence, so that current results of the sequencing run, such as reference genome coverage per barcoded sample, can be monitored in real time.

Start the basecaller watchdog:
General toolchain parameters and basecaller parameters are specified in a configuration file. The default configuration is stored in `config/run_configuration.cfg`. RAMPART is configured via its protocol and run configuration, both found in the `rampart` directory, which contains a separate configuration per experiment.

Initialize the toolchain:

```
cd <nanocrop-project-dir>

conda activate deepnano-blitz
./basecall_continuous_reads <input-directory> rampart/SARS-CoV-2/data/fastq/pass/
conda activate nanocrop
./scripts/monitor-start.sh config/run_configuration.cfg
```

Start RAMPART analysis in new terminal window:
The `monitor-start.sh` initializer starts the toolchain components in the background and returns. To terminate the toolchain once the experiment is over, run:

```
cd <nanocrop-project-dir>/rampart/SARS-CoV-2/data/

conda activate artic-rampart
rampart --protocol ../protocol/
scripts/monitor-stop.sh
```

RAMPART graphical output is available at `http://localhost:3000`.
Sequencing-run monitoring is visualized by the RAMPART graphical output, which is available at `http://localhost:3000`.

Now the toolchain is prepared for the sequencing run.
Report issues at `<[email protected]>`.
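
Note on the batching described under "Toolchain execution": the scripts that implement it are not shown in this diff, so the following is only a minimal sketch of the idea, assuming a simple polling loop. The names `pending_fast5`, `poll_once`, and `fake_basecall` are hypothetical, and `fake_basecall` stands in for the real basecaller call. Files that accumulate while one batch is running are picked up together on the next poll and processed with at most `cpu_cores` calls in flight.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def pending_fast5(input_dir, done):
    """Return unprocessed .fast5 files in input_dir, sorted by name."""
    return sorted(p for p in Path(input_dir).glob("*.fast5") if p.name not in done)


def poll_once(input_dir, done, basecall, cpu_cores=1):
    """Basecall everything that arrived since the last poll as one batch.

    Files that piled up while the previous batch was running are handled
    together, with at most cpu_cores calls in flight at once.
    """
    batch = pending_fast5(input_dir, done)
    if not batch:
        return []
    with ThreadPoolExecutor(max_workers=max(1, cpu_cores)) as pool:
        results = list(pool.map(basecall, batch))  # map() keeps input order
    done.update(p.name for p in batch)
    return results


def fake_basecall(fast5_path):
    """Hypothetical stand-in for the real basecaller: only names the output file."""
    return fast5_path.stem + ".fastq"


# Demo: two .fast5 files and one unrelated file appear in the watched folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run_0.fast5", "run_1.fast5", "notes.txt"):
        (Path(tmp) / name).touch()
    seen = set()
    first = poll_once(tmp, seen, fake_basecall, cpu_cores=2)
    second = poll_once(tmp, seen, fake_basecall, cpu_cores=2)

print(first)   # both .fast5 files, processed as one batch
print(second)  # nothing new on the second poll
```

A thread pool is used here only because a real basecaller would most likely be launched as an external process; a process pool would be the alternative if the work ran inside Python itself.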
26 changes: 0 additions & 26 deletions basecall_continuous_reads.sh

This file was deleted.

55 changes: 55 additions & 0 deletions config/run_configuration.cfg
@@ -0,0 +1,55 @@
#########################
# GENERAL CONFIGURATION #
#########################
#
# This is a central configuration of sequencing-run monitoring.
# It provides a way to specify general parameters and basecaller parameters.
# In order to configure RAMPART monitoring pipeline, please refer to
# SARS-CoV-2 protocol and run configuration.
#
# Absolute path to directory containing MinKnow sequencing runs
seq_dir=/var/lib/minknow/data/

# Absolute path to basecaller input directory being watched for .fast5 files to be processed.
# If not set, input directory will be determined automatically during initialization.
input_dir=

# Absolute path to basecaller output directory;
# (will be a RAMPART pipeline input directory at the same time).
output_dir=

# Relative path to directory containing RAMPART protocols
protocol_dir=../rampart/SARS-CoV-2/

# Relative path to RAMPART protocol configuration directory.
# If not set, user will be prompted with options during monitoring setup.
# Selected directory should contain 'run/' folder with RAMPART run_configuration.json
# and 'protocol' folder with RAMPART protocol.
protocol_conf_dir=

# Relative path to RAMPART annotations. RAMPART annotations are by default stored
# in 'annotations/' folder and are cleared before every RAMPART analysis.
# Annotations can be stored and used to display sequencing stats of a particular run
# without any computation necessary on the RAMPART side.
annotations_dir=

############################
# BASECALLER CONFIGURATION #
############################
#
# Number of CPU cores available for basecalling
cpu_cores=1

# The size of RNN model used for raw signal classification {48,56,64,80,96,256}
# Select small size for high speed and acceptable precision; maximum size for
# best precision but significantly reduced speed
network_type=48

# Default is 5 for network sizes {48,56,64,80,96} and 20 for network size {256}
beam_size=5

# Default is 0.1 for network sizes {48,56,64,80,96} and 0.0001 for network size {256}
beam_cut_threshold=0.1

# RAMPART pipeline can process both "fasta" and "fastq" formats
output_format="fastq"
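
Note on `config/run_configuration.cfg`: how `monitor-start.sh` reads this file is not shown in the diff (a shell script may simply source it). As an illustration only, the sketch below parses the same `key=value` layout and checks the beam settings against the defaults quoted in the comments above. `parse_cfg` and `beam_defaults` are hypothetical helper names, and `SAMPLE_CFG` is a shortened copy of the file.

```python
SAMPLE_CFG = """\
# Number of CPU cores available for basecalling
cpu_cores=1
input_dir=
network_type=48
beam_size=5
beam_cut_threshold=0.1
output_format="fastq"
"""


def parse_cfg(text):
    """Parse KEY=VALUE lines; skip blanks and '#' comments; strip double quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def beam_defaults(network_type):
    """Return (beam_size, beam_cut_threshold) as documented in the cfg comments."""
    if network_type == 256:
        return 20, 0.0001
    if network_type in {48, 56, 64, 80, 96}:
        return 5, 0.1
    raise ValueError(f"unsupported network_type: {network_type}")


settings = parse_cfg(SAMPLE_CFG)
expected = beam_defaults(int(settings["network_type"]))
configured = (int(settings["beam_size"]), float(settings["beam_cut_threshold"]))
uses_defaults = configured == expected

print(settings["output_format"], uses_defaults)
```

An empty value such as `input_dir=` comes back as an empty string, which matches the "if not set" wording in the config comments.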
22 changes: 22 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-2000bp/protocol/primers.json
@@ -0,0 +1,22 @@
{
"name": "SARS-CoV-2 primer scheme 2000bp",
"amplicons": [
[30, 2079],
[1925, 3737],
[3580, 5548],
[5394, 7255],
[7092, 9123],
[8976, 10837],
[10676, 12679],
[12519, 14328],
[14176, 15978],
[15827, 17754],
[17571, 19485],
[19310, 21241],
[21075, 22996],
[22850, 24812],
[24649, 26542],
[26386, 28351],
[27914, 29790]
]
}
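
Note on the 2000bp scheme: a quick sanity check for a primer scheme like this is that consecutive amplicons overlap, so that no part of the genome between the first and last amplicon is left uncovered. The sketch below runs that check on the coordinates from the file above. It assumes each pair is `[start, end]` on the reference and treats lengths and overlaps as plain coordinate differences; the helper names are hypothetical.

```python
import json

SCHEME_2000BP = json.loads("""
{
  "name": "SARS-CoV-2 primer scheme 2000bp",
  "amplicons": [
    [30, 2079], [1925, 3737], [3580, 5548], [5394, 7255],
    [7092, 9123], [8976, 10837], [10676, 12679], [12519, 14328],
    [14176, 15978], [15827, 17754], [17571, 19485], [19310, 21241],
    [21075, 22996], [22850, 24812], [24649, 26542], [26386, 28351],
    [27914, 29790]
  ]
}
""")


def amplicon_lengths(amplicons):
    """Length of each amplicon as end minus start."""
    return [end - start for start, end in amplicons]


def neighbour_overlaps(amplicons):
    """Overlap between each amplicon and the next; a value <= 0 means a gap."""
    return [
        prev_end - next_start
        for (_, prev_end), (next_start, _) in zip(amplicons, amplicons[1:])
    ]


amplicons = SCHEME_2000BP["amplicons"]
lengths = amplicon_lengths(amplicons)
overlaps = neighbour_overlaps(amplicons)
has_gap = any(overlap <= 0 for overlap in overlaps)

print(len(amplicons), min(overlaps), max(overlaps), has_gap)
```

The same two functions can be pointed at any other `primers.json` with the same `amplicons` layout.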
@@ -1,6 +1,5 @@
{
"title": "SARS-CoV-2 Sequencing Run",
"basecalledPath": "fastq/pass",
"title": "SARS-CoV-2 Sequencing Run 2000bp",
"simulateRealTime": true,
"clearAnnotated": true,
"displayOptions": {
46 changes: 46 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-2500bp/protocol/genome.json

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-2500bp/protocol/primers.json
@@ -0,0 +1,19 @@
{
"name": "SARS-CoV-2 primer scheme 2500bp",
"amplicons": [
[30, 2592],
[1875, 4450],
[4294, 6873],
[6286, 8851],
[8595, 11074],
[10362, 12802],
[12710, 15246],
[14545, 17152],
[16846, 19278],
[18896, 21455],
[21357, 23847],
[23122, 25673],
[25601, 28172],
[27446, 29866]
]
}
5 changes: 5 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-2500bp/protocol/protocol.json
@@ -0,0 +1,5 @@
{
"name": "SARS-CoV-2 virus protocol v0.1",
"description": "Amplicon based sequencing of SARS-CoV-2 virus.",
"url": "http://artic.network/"
}

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-2500bp/run/run_configuration.json
@@ -0,0 +1,10 @@
{
"title": "SARS-CoV-2 Sequencing Run 2500bp",
"simulateRealTime": true,
"clearAnnotated": true,
"displayOptions": {
"coverageThresholds": {
">200x": 200, ">100x": 100, ">20x": 20, "0x": 0
}
}
}
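
Note on `coverageThresholds`: the diff does not show how RAMPART applies these values, so the following is only one plausible reading, stated as an assumption: a depth is reported under the label of the highest threshold it reaches. `coverage_label` is a hypothetical helper, and the use of `>=` rather than a strict `>` is a guess made so that a depth of 0 still maps to the `"0x"` label.

```python
import json

RUN_CONFIG = json.loads("""
{
  "title": "SARS-CoV-2 Sequencing Run 2500bp",
  "simulateRealTime": true,
  "clearAnnotated": true,
  "displayOptions": {
    "coverageThresholds": {
      ">200x": 200, ">100x": 100, ">20x": 20, "0x": 0
    }
  }
}
""")


def coverage_label(depth, thresholds):
    """Return the label of the highest threshold that depth reaches (assumed >=)."""
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for label, minimum in ordered:
        if depth >= minimum:
            return label
    return None  # depth is below every configured threshold


thresholds = RUN_CONFIG["displayOptions"]["coverageThresholds"]
labels = [coverage_label(depth, thresholds) for depth in (250, 150, 20, 5, 0)]

print(labels)
```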
46 changes: 46 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-400bp/protocol/genome.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-400bp/protocol/protocol.json
@@ -0,0 +1,5 @@
{
"name": "SARS-CoV-2 virus protocol v0.1",
"description": "Amplicon based sequencing of SARS-CoV-2 virus.",
"url": "http://artic.network/"
}
2 changes: 2 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-400bp/protocol/references.fasta

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions rampart/SARS-CoV-2/SARS-CoV-2-400bp/run/run_configuration.json
@@ -0,0 +1,10 @@
{
"title": "SARS-CoV-2 Sequencing Run 400bp",
"simulateRealTime": true,
"clearAnnotated": true,
"displayOptions": {
"coverageThresholds": {
">200x": 200, ">100x": 100, ">20x": 20, "0x": 0
}
}
}
Empty file.

This file was deleted.
