Should be aware that, the index subdirectory should be set up as default or make one manually #14

Open
BioWu opened this issue Jan 15, 2018 · 3 comments

Comments

@BioWu

BioWu commented Jan 15, 2018

Linda, thanks for your effort in developing this tool. I think there is still some room to make it friendlier to use.
For instance, I tried several times to build a new reference of my own, and every attempt failed. I then looked at createJunctionIndex.sh and found that it has no command to check whether the index subdirectory exists, or whether a regular file with that name is already there. I think some additional statements are needed around lines 67-68 of the script:
# put the regular junction fasta file into the pipeline index directory
mv ${OUT_DIR}/fastas/${FILE_ID}_junctions_reg.fa ${PIPELINE_DIR}/index
Besides, I cannot tell which path is the real location of the index that your pipeline looks for. Is it circularRNApipeline/denovo_scripts/index or circularRNApipeline/index/? I think you should pick one as the default and update the README file in the createJunctionIndex directory.
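
The missing check described above could look like the following minimal sketch. It relies only on the `${PIPELINE_DIR}/index` path that createJunctionIndex.sh already uses; the function name `ensure_index_dir` is hypothetical and not part of KNIFE. Without such a guard, `mv` renames the fasta to a regular file called `index` whenever the directory does not exist yet.

```shell
#!/bin/sh
# Hypothetical helper (not part of KNIFE): make sure the index path is a
# directory before any file is written or moved into it.
ensure_index_dir() {
  dir="$1"
  # A regular file with the same name would silently break later steps.
  if [ -e "$dir" ] && [ ! -d "$dir" ]; then
    echo "error: $dir exists but is not a directory" >&2
    return 1
  fi
  # Create the directory (and any missing parents) if it is absent.
  mkdir -p "$dir"
}

# Possible use near the top of createJunctionIndex.sh (assumption):
#   ensure_index_dir "${PIPELINE_DIR}/index" || exit 1
```

Calling the helper before the first `cat ... > ${PIPELINE_DIR}/index/...` line would stop the script early with a clear message instead of failing later in bowtie2-build.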

I hope this helps make KNIFE a little better.
:)

@BioWu BioWu changed the title Is should be aware that, the index subdirectory should be set up as default Should be aware that, the index subdirectory should be set up as default or make one manually Jan 15, 2018
@Vivianstats

Vivianstats commented Oct 31, 2018

Hello BioWu,

Did you figure out the correct locations for storing the index files?

I got this error message:

Could not locate a Bowtie index corresponding to basename "mm10_junctions_scrambled"

But I stored the files as specified below:

Annotated junction indices are available for Human (hg19), Mouse (mm10), Rat (rn5) and Drosophila (dm3).
We have also packaged up all of the transcriptome, genome, and ribosomal index, fasta, and gtf files
for each of these genomes named as required for use with our scripts. You will need to unpack the Bowtie2
tar (use: tar zxvf genomeId_BT2index.tar.gz) and the fastas tar (use: tar zxvf genomeId_fastas.tar.gz)
into circularRNApipeline_*/index. You will need to unpack the Bowtie1 tar (use: tar zxvf genomeId_BT1index.tar.gz)
into circularRNApipeline_*/denovo_scripts/index. The gtf file must be downloaded and uncompressed
(gunzip genomeId_genes.gtf.gz) into the circularRNApipeline_*/denovo_scripts directory.
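
Since the instructions above split the files between two directories, one way to narrow down a "Could not locate a Bowtie index" error is to check where the files for a given basename actually are. The sketch below is only an illustration: the function name `report_index` is hypothetical, and it assumes the two index directories named in the instructions plus the standard Bowtie 1 (`.ebwt`, `.ebwtl`) and Bowtie 2 (`.bt2`, `.bt2l`) file extensions.

```shell
#!/bin/sh
# Hypothetical helper (not part of KNIFE): report which index directory holds
# files for a basename, and whether they are Bowtie 1 or Bowtie 2 files.
# Usage: report_index PIPELINE_DIR BASENAME
report_index() {
  pipeline_dir="$1"
  base="$2"
  status=1   # becomes 0 as soon as one matching index is found
  for dir in "$pipeline_dir/index" "$pipeline_dir/denovo_scripts/index"; do
    for ext in ebwt ebwtl bt2 bt2l; do
      # The first forward file (BASENAME.1.EXT) is enough to spot an index.
      if [ -f "$dir/$base.1.$ext" ]; then
        echo "$dir: $base ($ext)"
        status=0
      fi
    done
  done
  return $status
}

# Example (paths are assumptions):
#   report_index ~/software/KNIFE/circularRNApipeline_Cluster mm10_junctions_scrambled
```

If the function prints nothing, no index with that basename is present in either directory; if it prints only one index type, the other aligner's files are the ones that are missing.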

@BioWu
Author

BioWu commented Nov 1, 2018

Preparing the required files

# Be careful: the same identifier (dmel_r6.17 here, or any other name) must prefix every file name
cd ~/database/dmel/
cp dmel-all-r6.17.gtf dmel_r6.17_gene.gtf
cp dmel-all-chromosome-r6.17.fasta dmel_r6.17_genome.fasta
# Build the indices in the background; the second argument is the index basename,
# so it carries no .fasta suffix (this matches the file names listed further down)
bowtie-build dmel-all-miscRNA-r6.17.fasta dmel_r6.17_ribosomal >/dev/null &
bowtie-build dmel-all-transcript-r6.17.fasta dmel_r6.17_transcriptome >/dev/null &
# The ribosomal .bt2 files listed further down also require a bowtie2-build run
bowtie2-build dmel-all-miscRNA-r6.17.fasta dmel_r6.17_ribosomal >/dev/null &
bowtie2-build dmel-all-transcript-r6.17.fasta dmel_r6.17_transcriptome >/dev/null &
bowtie2-build dmel_r6.17_genome.fasta dmel_r6.17_genome >/dev/null &
python ~/software/KNIFE/createJunctionIndex/makeExonDB.py -f dmel_r6.17_genome.fasta -a dmel_r6.17_gene.gtf -o ./
# Wait for the background builds to finish before copying the index files
wait
cp *bt2 ~/software/KNIFE/circularRNApipeline_Cluster/index/
cp *ebwt ~/software/KNIFE/circularRNApipeline_Cluster/index/
cp dmel_r6.17_genome.fasta ~/software/KNIFE/circularRNApipeline_Cluster/index/

cd ~/software/KNIFE/createJunctionIndex
bash ./createJunctionIndex.sh ~/software/KNIFE/circularRNApipeline_Cluster/ \
  ~/software/KNIFE/circularRNApipeline_Cluster/denovo_scripts/index \
  ~/database/dmel/ dmel_r6.17

createJunctionIndex.sh

#!/bin/sh

# This assumes that makeExonDB.py has already been called, creating a directory OUT_DIR that contains exons, genes, and records directories.
# This code now parses that exon info and creates junction fasta files for scrambled and linear junctions and bowtie2 indices and places these
# into the pipeline code so they can be used by specifying ${INDEX_MODE_ID} in the MODE parameter when calling findCircularRNA.sh.

# pipelineDirectory: path to the circularRNApipeline directory. Created files will be placed inside the index directory.
# outputDirectory: the output directory specified in the call to createExonDB.sh for this species
# fileIdentifier: should be short String to help distinguish it from other genomes. This will be included in all of the fasta file names and bowtie index file names
# primaryGeneName: field in gtf used to assign gene name, default: gene_name
# secondaryGeneName: field in gtf used to assign gene name if primary does not exist, default: gene_id
# windowSize: size of sliding window to include exon pairs in junction database, default: 1000000

# example usage: ./createJunctionIndex.sh /home/linda/circularRNApipeline
#                                        /home/linda/index/pombe
#                                        ASM294v2_23_test
#                                        1000000
#                                        gene_name
#                                        gene_id
 
PIPELINE_DIR=$1
OUT_DIR=$2
FILE_ID=$3

if [ $# -ge 4 ]
then
  WINDOW=${4}  
else
  WINDOW=1000000  
fi

if [ $# -ge 5 ]
then
  GENE_NAME_1=${5}
else
  GENE_NAME_1=gene_name
fi

if [ $# -ge 6 ]
then
  GENE_NAME_2=${6}
else
  GENE_NAME_2=gene_id
fi

python makeJunctionsAndWriteFasta.py -w ${WINDOW} -e ${OUT_DIR}/exons -r ${OUT_DIR}/records -f ${OUT_DIR}/fastas -n1 ${GENE_NAME_1} -n2 ${GENE_NAME_2} -v

# combine into single file, using xargs method to avoid argument list too long error in bash

echo "find ${OUT_DIR}/fastas/ -size 0 -delete"
find ${OUT_DIR}/fastas/ -size 0 -delete

echo "cat ${OUT_DIR}/fastas/*.fa > ${OUT_DIR}/${FILE_ID}.fa"
cat ${OUT_DIR}/fastas/*.fa > ${OUT_DIR}/${FILE_ID}.fa  # combine into single file

#ls ${OUT_DIR}/fastas | xargs -n 32 -P 8 cat >> ${OUT_DIR}/${FILE_ID}.fa  

# split the junctions into files containing only reg, only rev, and only dup junctions
python limitFasta.py -s ${OUT_DIR}/${FILE_ID}.fa -o ${OUT_DIR}/fastas/ -t reg -p _junctions_reg
python limitFasta.py -s ${OUT_DIR}/${FILE_ID}.fa -o ${OUT_DIR}/fastas/ -t dup -p _junctions_dup
python limitFasta.py -s ${OUT_DIR}/${FILE_ID}.fa -o ${OUT_DIR}/fastas/ -t rev -p _junctions_rev

# combine rev and dup junctions into scrambled fasta file and put in the pipeline index
echo "cat ${OUT_DIR}/fastas/${FILE_ID}_junctions_rev.fa ${OUT_DIR}/fastas/${FILE_ID}_junctions_dup.fa > ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled.fa"
cat ${OUT_DIR}/fastas/${FILE_ID}_junctions_rev.fa ${OUT_DIR}/fastas/${FILE_ID}_junctions_dup.fa > ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled.fa

# put the regular junction fasta file into the pipeline index directory
mv ${OUT_DIR}/fastas/${FILE_ID}_junctions_reg.fa ${PIPELINE_DIR}/index

# remove the temp fastas created along the way (files will be kept and should be manually removed)
#rm ${OUT_DIR}/${FILE_ID}.fa
#rm -r ${OUT_DIR}/fastas/

# create scrambled junction bowtie2 index in index directory
echo "bowtie2-build ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled.fa ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled"
bowtie2-build ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled.fa ${PIPELINE_DIR}/index/${FILE_ID}_junctions_scrambled

# create linear junction bowtie2 index in index directory 
echo "bowtie2-build ${PIPELINE_DIR}/index/${FILE_ID}_junctions_reg.fa ${PIPELINE_DIR}/index/${FILE_ID}_junctions_reg"
bowtie2-build ${PIPELINE_DIR}/index/${FILE_ID}_junctions_reg.fa ${PIPELINE_DIR}/index/${FILE_ID}_junctions_reg

After these steps, the index directory should contain all of the following files:

dmel_r6.17_genome.1.bt2
dmel_r6.17_genome.2.bt2
dmel_r6.17_genome.3.bt2
dmel_r6.17_genome.4.bt2
dmel_r6.17_genome.fasta
dmel_r6.17_genome.rev.1.bt2
dmel_r6.17_genome.rev.2.bt2
dmel_r6.17_junctions_reg.1.bt2l
dmel_r6.17_junctions_reg.2.bt2l
dmel_r6.17_junctions_reg.3.bt2l
dmel_r6.17_junctions_reg.4.bt2l
dmel_r6.17_junctions_reg.fa
dmel_r6.17_junctions_reg.rev.1.bt2l
dmel_r6.17_junctions_reg.rev.2.bt2l
dmel_r6.17_junctions_scrambled.1.bt2l
dmel_r6.17_junctions_scrambled.2.bt2l
dmel_r6.17_junctions_scrambled.3.bt2l
dmel_r6.17_junctions_scrambled.4.bt2l
dmel_r6.17_junctions_scrambled.fa
dmel_r6.17_junctions_scrambled.rev.1.bt2l
dmel_r6.17_junctions_scrambled.rev.2.bt2l
dmel_r6.17_ribosomal.1.bt2
dmel_r6.17_ribosomal.1.ebwt
dmel_r6.17_ribosomal.2.bt2
dmel_r6.17_ribosomal.2.ebwt
dmel_r6.17_ribosomal.3.bt2
dmel_r6.17_ribosomal.3.ebwt
dmel_r6.17_ribosomal.4.bt2
dmel_r6.17_ribosomal.4.ebwt
dmel_r6.17_ribosomal.rev.1.bt2
dmel_r6.17_ribosomal.rev.1.ebwt
dmel_r6.17_ribosomal.rev.2.bt2
dmel_r6.17_ribosomal.rev.2.ebwt
dmel_r6.17_transcriptome.1.bt2
dmel_r6.17_transcriptome.1.ebwt
dmel_r6.17_transcriptome.2.bt2
dmel_r6.17_transcriptome.2.ebwt
dmel_r6.17_transcriptome.3.bt2
dmel_r6.17_transcriptome.3.ebwt
dmel_r6.17_transcriptome.4.bt2
dmel_r6.17_transcriptome.4.ebwt
dmel_r6.17_transcriptome.rev.1.bt2
dmel_r6.17_transcriptome.rev.1.ebwt
dmel_r6.17_transcriptome.rev.2.bt2
dmel_r6.17_transcriptome.rev.2.ebwt
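
A listing like the one above can be checked mechanically. The sketch below covers only the Bowtie 2 part of the list and accepts either the `.bt2` or the large-index `.bt2l` extension, as both appear above; the function name `check_index_set` is hypothetical and not part of KNIFE.

```shell
#!/bin/sh
# Hypothetical helper (not part of KNIFE): verify that the six Bowtie 2 files
# exist for each index name that appears in the listing above.
# Usage: check_index_set INDEX_DIR FILE_ID
check_index_set() {
  index_dir="$1"
  id="$2"
  missing=0
  for name in genome junctions_reg junctions_scrambled ribosomal transcriptome; do
    for part in 1 2 3 4 rev.1 rev.2; do
      prefix="$index_dir/${id}_${name}.$part"
      # Accept a small (.bt2) or a large (.bt2l) index file.
      if [ ! -f "$prefix.bt2" ] && [ ! -f "$prefix.bt2l" ]; then
        echo "missing: $prefix.bt2 (or .bt2l)"
        missing=1
      fi
    done
  done
  return $missing
}

# Example (path is an assumption):
#   check_index_set ~/software/KNIFE/circularRNApipeline_Cluster/index dmel_r6.17
```

The `.ebwt` and fasta files in the listing are not checked here; the same loop could be extended to cover them.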

@Vivianstats

Thank you very much for your reply!

I'm using the mouse indexes provided by the KNIFE authors, and I have all the files in the index/ directory, but the program still complains. Maybe I should try building a new index on my own.
