forked from cobilab/RFSC

Reference-Free Sequence Classification Tool for DNA sequences in metagenomic samples


exonbits/RFSC

 
 


License: GPL v3

RFSC is a Reference-Free Sequence Classification tool that uses machine learning classifiers, combined as an ensemble of experts, to provide efficient classification in metagenomic contexts.

Installation

git clone https://github.com/cobilab/RFSC
cd RFSC
./RFSC.sh --install

Using Docker

git clone https://github.com/cobilab/RFSC
cd RFSC
docker-compose build
docker-compose up -d && docker exec -it rfsc bash && docker-compose down

Build NCBI Reference Databases

./RFSC.sh --build-ref-virus --build-ref-bacteria --build-ref-archaea --build-ref-protozoa \
  --build-ref-fungi --build-ref-plant --build-ref-mitochondrial --build-ref-plastid
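When only some of the databases are needed, the flag list can be assembled from group names. The following is a minimal sketch, not part of RFSC: the helper name `build_ref_flags` is hypothetical, and it assumes that `RFSC.sh` accepts any subset of the `--build-ref-*` flags shown above in a single call. It only prints the command so that it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: turn group names into the --build-ref-* flags shown above.
# build_ref_flags is a hypothetical helper, not an RFSC command.
build_ref_flags() {
  flags=""
  for group in "$@"; do
    case "$group" in
      virus|bacteria|archaea|protozoa|fungi|plant|mitochondrial|plastid)
        # Append the flag, separating entries with a single space.
        flags="${flags:+$flags }--build-ref-$group"
        ;;
      *)
        echo "unknown reference group: $group" >&2
        return 1
        ;;
    esac
  done
  printf '%s\n' "$flags"
}

# Print the command instead of running it (assumes subsets are accepted).
echo "./RFSC.sh $(build_ref_flags virus bacteria)"
```

Rejecting unknown group names early avoids starting a long download with a mistyped flag.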

Running Examples

✨ Generate a synthetic sequence, then run a Reference-Free Reconstruction of it:

 

./RFSC.sh --clean y
./RFSC.sh --threads 8 --gen-adapters
./RFSC.sh --efetch-fasta 155971 Input_Data/EntrezGenomes 
./RFSC.sh --efetch-fasta EF491856.1 Input_Data/EntrezGenomes 
./RFSC.sh --efetch-fasta MT682520 Input_Data/EntrezGenomes
./RFSC.sh -synt Input_Data/EntrezGenomes/155971.fna Input_Data/EntrezGenomes/EF491856.1.fna Input_Data/EntrezGenomes/MT682520.fna
./RFSC.sh -trim TT PE --run-de-novo
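
The fetch and synthesis commands above follow a pattern: one `--efetch-fasta` call per accession, then a single `-synt` call listing the resulting files. The sketch below generates those commands for an arbitrary list of accessions. It assumes, as the example paths suggest, that `--efetch-fasta <ID> <DIR>` writes `<DIR>/<ID>.fna`. The function name `plan_synthetic_run` and the `RFSC` and `OUT_DIR` variables are hypothetical.

```shell
#!/bin/sh
# Sketch: print the RFSC commands needed to fetch a set of accessions
# and build a synthetic sample from them. Nothing is executed here.
RFSC="${RFSC:-./RFSC.sh}"            # launcher path (assumed default)
OUT_DIR="Input_Data/EntrezGenomes"   # directory used in the example above

plan_synthetic_run() {
  fastas=""
  for acc in "$@"; do
    printf '%s --efetch-fasta %s %s\n' "$RFSC" "$acc" "$OUT_DIR"
    # Assumption: each download is saved as <OUT_DIR>/<accession>.fna
    fastas="${fastas:+$fastas }$OUT_DIR/$acc.fna"
  done
  printf '%s -synt %s\n' "$RFSC" "$fastas"
}

plan_synthetic_run 155971 EF491856.1 MT682520
```

Once the printed commands look right, the output can be piped to `sh` to run them in order.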
✨ Reference-Based Classification, using FALCON-meta:
(This assumes that the reference databases have already been built and that the Reference-Free Reconstruction stage has finished.)

 

./RFSC.sh --threads 8 --set-len-cov 100 3 --set-threshold-max-min 70 1 --run-falcon SO Viral
✨ Reference-Free Classification, using XGBoost:

 

./RFSC.sh --threads 8 --efetch-fasta 155971 RefFree
./RFSC.sh --run-xgboost

System Requirements

A laptop computer running Ubuntu Linux (for example, 18.04 LTS or later) with GCC (https://gcc.gnu.org), Conda (https://docs.conda.io) and CMake (https://cmake.org) installed. The hardware must have at least 8 GB of RAM and an 800 GB disk. If the reference databases are not rebuilt, only about 10 GB of disk space is needed.
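
These requirements can be checked before installing. The following is a rough sketch for Linux only and is not part of RFSC. The helper names and the `MIN_DISK_GB` variable are hypothetical. The disk threshold defaults to the 10 GB case and should be set to 800 when the databases will be rebuilt.

```shell
#!/bin/sh
# Sketch: pre-flight check for the requirements listed above (Linux only).

# Print each of the given commands that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed when the first integer is at least the second.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

MIN_RAM_GB=8
MIN_DISK_GB="${MIN_DISK_GB:-10}"   # use 800 when rebuilding the databases

missing=$(missing_tools gcc conda cmake)
[ -z "$missing" ] || echo "missing tools:" $missing

# Total RAM in GB, rounded to nearest, since the kernel reserves a little.
ram_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo 2>/dev/null)
# Free space in GB on the filesystem holding the current directory.
disk_gb=$(df -Pk . | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

meets_minimum "${ram_gb:-0}" "$MIN_RAM_GB" \
  || echo "RAM: ${ram_gb:-0} GB, below the $MIN_RAM_GB GB minimum"
meets_minimum "${disk_gb:-0}" "$MIN_DISK_GB" \
  || echo "disk: ${disk_gb:-0} GB free, below $MIN_DISK_GB GB"
```

The script prints nothing when every check passes, so any output points to something to fix before running `./RFSC.sh --install`.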

Tools Integrated in RFSC

Tool URL
Trimmomatic http://www.usadellab.org/cms/?page=trimmomatic
FASTP https://github.com/OpenGene/fastp
metaSPAdes https://cab.spbu.ru/software/meta-spades/
GTO https://cobilab.github.io/gto/
Entrez https://www.ncbi.nlm.nih.gov/genome
FALCON-meta https://github.com/cobilab/falcon
Cryfa https://github.com/cobilab/cryfa
Blastn https://blast.ncbi.nlm.nih.gov/Blast.cgi
ORFfinder https://www.ncbi.nlm.nih.gov/orffinder/
ORFM https://github.com/wwood/OrfM
GeCo3 https://github.com/cobilab/geco3
AC https://github.com/cobilab/ac

License

GNU GPL

✨Developed to make a change!✨
