Skip to content

RNA-seq analysis pipeline for non-model organism

Notifications You must be signed in to change notification settings

Lisijun-m/pipeone-nm

 
 

Repository files navigation

PipeOne-NM

PipeOne-NM is a tool for genome functional annotation, non-coding RNA identification, transcripts alternative splicing analysis and gene differential expression in non-model organisms.

PipeOne-NM is applicable to species with reference genome and can provide comprehensive insights into transcriptome of non-model organisms. It integrates twenty-one different tools to extract information from RNA-seq data and utilizes five R packages to carry out downstream analysis and data visualization.

Index

Prerequisites and Configuration

  1. Docker or conda
  2. Java (version >= 1.7)
  3. Nextflow (version >= 20.07.1.5413) (can be installed via conda)
  • Operation System: Linux
  • CPU Core Number: 33
  • RAM: 55G

Installation

1. Download PipeOne-NM

git clone https://github.com/Lisijun-m/pipeone-nm.git

2. Setup

2.1 pull docker image or install conda environments

Option 1: pull docker images run by docker

docker pull sijunli/pipeone_nm:latest

Option 2: install conda environments

cd pipeone-nm/INSTALL
bash ./install.sh

To ensure a successful installation, we recommend that you run each command step by step in the shell script install.sh

After conda enviroments are settled, install Trinotate, trinityrnaseq, miRanda and CIRI2 in pipeone directory.

An example may be like this:

## Trinotate
cd pipeone
git clone https://github.com/Trinotate/Trinotate.git

## set up trinotate databases
/path/to/pipeone/Trinotate/admin/Build_Trinotate_Boilerplate_SQLite_db.pl  Trinotate

## trinityrnaseq
cd pipeone
wget https://github.com/trinityrnaseq/trinityrnaseq/archive/refs/tags/Trinity-v2.13.2.tar.gz
tar -xzvf Trinity-v2.13.2.tar.gz

## CIRI2
cd pipeone
wget https://sourceforge.net/projects/ciri/files/CIRI-full/CIRI-full_v2.0.zip
unzip CIRI-full_v2.0.zip

## miRanda
cd pipeone
wget http://cbio.mskcc.org/microrna_data/miRanda-aug2010.tar.gz
tar zxvf miRanda-aug2010.tar.gz
mv miRanda-3.3a  miRanda
./configure --prefix=$HOME/software/miRanda
make
make install
echo 'export PATH=/path/to/pipeone/miRanda-3.3a/src:$PATH' >> ~/.bashrc
source ~/.bashrc

The docker images contains Trinotate, trinityseq, miRanda and CIRI2, so the installation is needed only in option 2.

2.2 softwares need to be registerated

The above four softwares are not included in the docker container nor the conda install.sh script and need to be installed by users.

In Docker Images

Run the docker container in interaction mode, and install tmhmm, RNAMMER and signalP in directory /pipeone. The following codes can be used as a guide.

docker run -it sijunli/pipeone_nm
cd /pipeone
tar -zxvf signalp-4.1g.Linux.tar.gz
tar -zxvf rnammer-1.2.src.tar.gz
tar -zxvf tmhmm-2.0c.Linux.tar.gz
echo 'export PATH=/pipeone/signalp-4.1:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone/rnammer-1.2:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone/tmhmm-2.0c/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
exit

In conda environments

Install tmhmm, RNAMMER and signalP in /path/to/pipeone. The following codes can be used as a guide.

cd /path/to/pipeone
tar -zxvf signalp-4.1g.Linux.tar.gz
tar -zxvf rnammer-1.2.src.tar.gz
tar -zxvf tmhmm-2.0c.Linux.tar.gz
echo 'export PATH=/pipeone/signalp-4.1:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone/rnammer-1.2:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone:$PATH' >> ~/.bashrc
echo 'export PATH=/pipeone/tmhmm-2.0c/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

3. Input Preparation

sample information file

sample information (including biological conditions and repetitions) of your project in txt format, for example:

Condition_A    sample1    sample1_left.fq    sample1_right.fq
Condition_A    sample2    sample2_left.fq    sample2_right.fq
Condition_B    sample3    sample3_left.fq    sample3_right.fq
Condition_B    sample4    sample4_left.fq    sample4_right.fq

Condition_A and Condition_B stand for biological conditions (treatments), sample1 stands for sample name and sample1_left.fq stands for sequencing file in fastq format.

Alternative Splicing file

For alternative splicing analysis, provide information file for two biological conditions in txt format and name the two file as AS_con1.txt and AS_con2.txt.

For example, AS_con1.txt is file that contains sample alignment file path for the first biological condition. Each bam file path is separated by comma.

/result/hisat2/sample1.bam,/result/hisat2/SRR16151828.bam,/result/hisat2/sample2.bam

Run the Pipeline

Option 1: run with nextflow (Docker image required)

basic configuration for running with nextflow

Modify the program configuration file pipeone-nm/conf/genomes.config, change the line below and store preparation files for PipeOne-NM:

params {
  genomes {
  "test" {
	refgenome = "/path/to/genome/genomic.fna"
	sample = "/path/to/genome/samples.txt"
	ref_miRNA = '/path/to/genome/miRNA_ref.fasta'	// for miranda
	uniprot = "/path/to/genome/uniprot_sprot.pep"                                                     // for Trinotate
	pfam = "/path/to/genome/Pfam-A.hmm"                                                               // for Trinotate
	sqlite = "/path/to/genome/Trinotate.sqlite"                                         	  // for Trinotate 
	min_length = 150
    }
  }
}
  • Replace test with the name of your project
  • refgenome is the file path of reference genome (fasta format), replace it with your path of reference genome
  • ref_miRNA is the miRNA reference for non-model species. Reference from genetic similar model organisms can be used as alternative. Replace it with your path of miRNA reference. In the test data of grass carp, the miRNA reference was downloaded from fishmirna
  • uniprot is a input database, downloading of uniprot can be found in Trinotate
  • pfam is a input database, downloading of pfam can be found in Trinotate
  • sqlite is input sqlite file, downloading of sqlite can be found in Trinotate
  • sample is the file mentioned in Input Preparation part

Open the pipeone-nm/main.nf and replace line 25 with corresponding file path

as_files = tuple('/path/to/AS_con1.txt', '/path/to/AS_con2.txt')

Then run the pipeline with nextflow:

docker pull nongbaoting/pipeone_nm:latest
nextflow run /path/to/pipeOne-nm  -profile docker --sra './sra/*.sra' --genome test
  • Required Input Files
    • RNA-seq data
      • --sra, RNA-seq data in sra format downloaded from GEO, store in directory sra, e.g. SRR16151823.sra
      • or --fastq, RNA-seq data in fastq format
    • --genome, genome defined in conf/genomes.config

Option 2: run with python script (Docker image required)

docker run -it -v /Users/project:/work_dir sijunli/pipeone_nm /bin/bash 
python basicPipeline.py -i /work_dir/path/to/sra -o /work_dir/path/to/output
exit
  • directory /Users/project is the directory contains all the raw data and will store the output, it will be projected in docker images as directory /work_dir
  • After the execution is done, exit the docker images and the analysis result will be stored in directory /Users/project/

Option 2: run with python script (Conda enviroments required)

The python script of pipeline has multiple parameters, you can see a full description of the parameters typing "python basicPipeline.py -h".

An example might be this:

python /path/to/pipeone/basicPipeline.py -i /path/to/sra -o /path/to/output

Parameters

The parameters set in the pipeline can be viewed in Parameters.md. Other parameters are default parameters for each tool.

Supplementary scripts

Compared with above process, differential expression analysis needs to be tailored to different datasets. It may involve removing of batch effect, differential analysis between multiple biological conditions and retrieving differential expression genes according to individualized threshold.

Therefore, the differential analysis pipeline is not included in our PipeOne-NM, but here we present an example of performing differential analysis on grass carp circRNA host genes in folder Supplementary-example-scripts.

About

RNA-seq analysis pipeline for non-model organism

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Perl 51.0%
  • Java 34.6%
  • Nextflow 5.1%
  • Python 5.0%
  • Roff 1.7%
  • HTML 1.3%
  • Other 1.3%