Update README.md
cheny19 authored Jun 9, 2020
1 parent f5865dc commit a640384
Showing 1 changed file with 17 additions and 7 deletions.
24 changes: 17 additions & 7 deletions README.md
@@ -1,9 +1,9 @@
[![Release](https://img.shields.io/github/v/release/bcgsc/nanosim?include_prereleases)](https://github.com/bcgsc/NanoSim/releases)
[![Downloads](https://img.shields.io/github/downloads/bcgsc/Nanosim/total?logo=github)](https://github.com/bcgsc/NanoSim/archive/v2.5.0.zip)
[![Downloads](https://img.shields.io/github/downloads/bcgsc/Nanosim/total?logo=github)](https://github.com/bcgsc/NanoSim/archive/v2.6.0.zip)
[![Conda](https://img.shields.io/conda/dn/bioconda/nanosim?label=Conda)](https://anaconda.org/bioconda/nanosim)
[![Stars](https://img.shields.io/github/stars/bcgsc/NanoSim.svg)](https://github.com/bcgsc/NanoSim/stargazers)

![NanoSim](https://github.com/bcgsc/NanoSim/blob/master/NanoSim%20logo.png)
![NanoSim](https://github.com/bcgsc/NanoSim/blob/master/NanoSim_logo.png)

NanoSim is a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology.

@@ -246,7 +246,7 @@ usage: simulator.py genome [-h] -rg REF_G [-c MODEL_PREFIX] [-o OUTPUT]
[-med MEDIAN_LEN] [-sd SD_LEN] [--seed SEED]
[-k KMERBIAS] [-b {albacore,guppy,guppy-flipflop}]
[-s STRANDNESS] [-dna_type {linear,circular}]
[--perfect] [-t NUM_THREADS]
[--perfect] [--fastq] [-t NUM_THREADS]
optional arguments:
-h, --help show this help message and exit
@@ -285,6 +285,7 @@ optional arguments:
Specify the dna type: circular OR linear (Default =
linear)
--perfect Ignore error profiles and simulate perfect reads
--fastq Output fastq files instead of fasta files
-t NUM_THREADS, --num_threads NUM_THREADS
Number of threads for simulation (Default = 1)
@@ -298,10 +299,10 @@ __transcriptome mode usage:__
usage: simulator.py transcriptome [-h] -rt REF_T [-rg REF_G] -e EXP
[-c MODEL_PREFIX] [-o OUTPUT] [-n NUMBER]
[-max MAX_LEN] [-min MIN_LEN] [--seed SEED]
[-k KMERBIAS] [-b {albacore, guppy}]
[-k KMERBIAS] [-b {albacore,guppy}]
[-r {dRNA,cDNA_1D,cDNA_1D2}] [-s STRANDNESS]
[--no_model_ir] [--perfect] [-t NUM_THREADS]
[--uracil]
[--no_model_ir] [--perfect] [--polya POLYA]
[--fastq] [-t NUM_THREADS] [--uracil]
optional arguments:
-h, --help show this help message and exit
@@ -340,14 +341,18 @@ optional arguments:
0 and 1
--no_model_ir Ignore simulating intron retention events
--perfect Ignore profiles and simulate perfect reads
--polya POLYA Simulate polyA tails for given list of transcripts
--fastq Output fastq files instead of fasta files
-t NUM_THREADS, --num_threads NUM_THREADS
Number of threads for simulation (Default = 1)
--uracil Converts the thymine (T) bases to uracil (U) in the
output fasta format
```
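
To make the effect of `--uracil` concrete, here is a minimal sketch of a T-to-U conversion on FASTA text. It is not NanoSim's code: the function name `thymine_to_uracil` is hypothetical, and leaving header lines untouched and preserving letter case are assumptions made for illustration.

```python
def thymine_to_uracil(fasta_text):
    """Replace T/t with U/u in sequence lines; '>' header lines are left as-is.

    Assumption: headers should not be rewritten and case should be preserved.
    """
    table = str.maketrans("Tt", "Uu")
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            converted.append(line)  # header line, keep unchanged
        else:
            converted.append(line.translate(table))
    return "\n".join(converted)


example = ">read_1\nACGTTGCA"
print(thymine_to_uracil(example))
```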


\* Notice: the use of `max_len` and `min_len` in genome mode will affect the read length distributions. If the range between `max_len` and `min_len` is too small, the program will run more slowly.
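
One plausible reason for this slowdown is rejection sampling: if read lengths are drawn from a fitted distribution and redrawn until they fall between `min_len` and `max_len`, a narrow window rejects most draws. The sketch below only illustrates that effect. The log-normal distribution, its parameters, and the function name `draws_needed` are assumptions for illustration, not NanoSim's actual model or code.

```python
import random


def draws_needed(min_len, max_len, n_reads=1000, seed=0):
    """Count the length draws needed to accept n_reads in-range lengths.

    Assumption: lengths follow a log-normal distribution (mu=8.5, sigma=0.8),
    used here as a stand-in for a fitted read length model, and any draw
    outside [min_len, max_len] is discarded and redrawn.
    """
    rng = random.Random(seed)
    accepted = 0
    total = 0
    while accepted < n_reads:
        total += 1
        length = int(rng.lognormvariate(8.5, 0.8))
        if min_len <= length <= max_len:
            accepted += 1
    return total


wide = draws_needed(100, 100_000)
narrow = draws_needed(5_000, 5_200)
print(f"wide range: {wide} draws, narrow range: {narrow} draws")
```

Under these assumptions the narrow window needs many more draws than the wide one for the same number of accepted reads, which is consistent with the slowdown described in the notice.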

\* Notice: the transcript names in the expression tsv file and in the polyadenylated transcript list must be consistent with the ones in the reference transcripts; otherwise the tool won't recognize them and won't know where to find them to extract reads for simulation.
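
A quick pre-flight check can catch mismatched names before a simulation run. The following is a minimal sketch, not part of NanoSim. It assumes that the expression tsv has a header row with transcript names in the first column, that the polyA list holds one name per line, and that a FASTA record's name is the first whitespace-delimited token after `>`. All function names are hypothetical.

```python
import csv


def fasta_ids(path):
    """Return the set of record names (first token after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def expression_ids(path):
    """Return names from column 1 of a TSV, skipping an assumed header row."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # skip the assumed header line
        return {row[0] for row in rows if row}


def list_ids(path):
    """Return the non-empty lines of a plain-text list, one name per line."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def missing_from_reference(reference_fasta, expression_tsv, polya_list=None):
    """Report names that do not appear in the reference transcripts."""
    reference = fasta_ids(reference_fasta)
    missing = {"expression": expression_ids(expression_tsv) - reference}
    if polya_list is not None:
        missing["polya"] = list_ids(polya_list) - reference
    return missing
```

For instance, `missing_from_reference("Homo_sapiens.GRCh38.cdna.all.fa", "abundance.tsv", "transcripts_with_polya_tails")` would return two sets of names; both should be empty before the files are passed to `simulator.py transcriptome`.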

__Example runs:__
1 If you want to simulate reads from the _E. coli_ genome, set the DNA type to circular (`-dna_type circular`) because it's a circular genome
@@ -371,6 +376,9 @@ __Example runs:__
7 If you want to simulate five thousand cDNA/directRNA reads from the mouse reference transcriptome without modeling intron retention
`./simulator.py transcriptome -rt Mus_musculus.GRCm38.cdna.all.fa -c mouse_cdna -e abundance.tsv -n 5000 --no_model_ir`

8 If you want to simulate two thousand cDNA/directRNA reads from the human reference transcriptome with polyA tails, mimicking homopolymer bias (starting from homopolymer length >= 6), with reads in fastq format
`./simulator.py transcriptome -rt Homo_sapiens.GRCh38.cdna.all.fa -c Homo_sapiens_model -e abundance.tsv -rg Homo_sapiens.GRCh38.dna.primary.assembly.fa -n 2000 --polya transcripts_with_polya_tails --fastq -k 6 --basecaller guppy -r dRNA`

## Explanation of output files
### 1. Characterization stage
#### 1.1 Characterization stage (genome)
@@ -425,6 +433,8 @@ __Example runs:__

The information in the header can help users to locate the read easily.

__Specific to transcriptome simulation__: for reads that include retained introns, the header contains additional information starting from `Retained_intron`, with each genomic interval separated by `;`.

2. `simulated_error_profile`
Contains all the information about errors introduced into each read, including error type, position, original bases and current bases.
