Link to correct companion fasta file #2

hiraksarkar · 2017-01-03T23:07:48Z

To simulate the reads, i.e to execute this command,
~/RSEM-1.2.29/bin/rsem-prepare-reference --bowtie2 --bowtie2-path ~/bowtie2-2.2.4 ~/annotation/refseq_hg19_unique.fa ~/tra2_simulated_reads/rsem_reference

If I ran the above with an added --gtf option and the gtf given here , then it failed with an error According to the GTF file given, transcript NR_073460 has exons from different orientations.

I tried to generate a fasta file with the refseq gtf file given here and hg19 genome file curated from ucsc site. But I am unable to reproduce the results.

Also the tpm files corresponding to simulated dataset contains 48606 transcripts, where as the gtf file contains 56443 transcripts.

The link to the correct pair (genome fasta, gtf) or link to the transcript file would be great.

The text was updated successfully, but these errors were encountered:

JLTrincado · 2017-01-09T14:10:39Z

Hi Hirak,

thanks again for your interest.

We didn't try to run RSEM with the gtf annotation, so we didn't realize this error.
We obtained the RefSeq annotation files (fasta and gtf) from the UCSC genome browser: https://genome.ucsc.edu/cgi-bin/hgTables

Since Salmon/Sailfish needs the fasta transcriptome for the alignment, we filter from the fasta all the duplicated ids, leaving just unique ids, and just transcripts falling in chr1-22, X and Y. The resulting file is refseq_hg19_unique.fa. So, for downstream analysis we have used this same annotation. I have uploaded it here (https://github.com/comprna/SUPPA_supplementary_data/blob/master/annotation/refseq_hg19_unique.fa.gz)

I tried today to generate this same table downloading again the original info from UCSC, but now there are more transcripts. It is possible that, since the last time I created this table, they have updated this info. I have uploaded the code for generating this file, if you are interested (https://github.com/comprna/SUPPA/blob/master/scripts/format_unique_fasta_RefSeq_annotation.py). In spite of the difference in the number, I would hope that the significant changes that we have generated for the 317 cassete exons for the benchmark will be similar no matter which version of the hg19 refseq annotation are you using. But, for the sake of reproducibility, I would use the initial ones (refseq_hg19_unique.fa).

Let me know if you are able to find the same results with this table,
Thanks again.

hiraksarkar · 2017-01-10T03:54:02Z

Hi Juanlu,

Thanks a lot for the file. I completely understand the frustration with reference change, which is really haphazard.

I posted an error message a while ago. That was something I did wrong.. so never mind, thanks again for the clean fasta.

I ran the whole pipeline with the clean fasta file. The pipeline is uploaded in this gist. I only experimented with 60M reads. But unfortunately the final dpsi file does not contain the 317 cassette events with significant pvalue (i.e. less than 0.05).

Can you please take a look at the gist file, and let me know I am not doing any mistake.

JLTrincado · 2017-01-10T11:45:27Z

Hi Hirak,

I've checked your pipeline and it seems to be ok.

We have generated a theoretical change in the relative abundance of the transcripts for 317 cassette events, but neither SUPPA nor any of the other methods are able to detect significance for all the events. With 60M reads, we observed that SUPPA generated a dPSI for all of them, but only significant for 302. I hope that you find the same number.

If you want to know the numbers for the rest of the executions, it's available in the supplementary data on our preprint (Table S3): http://biorxiv.org/content/early/2016/11/10/086876.

Thanks again

hiraksarkar · 2017-01-10T14:09:43Z

Hi Juanlu

we observed that SUPPA generated a dPSI for all of them, but only significant for 302. I hope that you find the same number.

What is the criteria for statistical significance here ? Is it pvalue < 0.05 !

JLTrincado · 2017-01-10T17:05:42Z

Yes, you're right. pvalue < 0.05.

hiraksarkar · 2017-01-10T17:07:28Z

Unfortunately none of my p-values are below 0.05

hiraksarkar · 2017-01-10T17:48:00Z

I emailed with the exact doubt and created a repo with the files. Hope that would help to figure out any mistakes I might be committing while running diffSplice.

EduEyras · 2018-05-30T10:04:19Z

Hi Hirak,

thanks for your message. The GTF for the RefSeq annotation is in general useless because the given the same ID, e.g. NR_073460, to different gene loci. This happens in general with non-standard chromosomes (DNA not mapped to chr1-22, chrX,Y), but it also happens in standard chromosomes. This is because RefSeq is originally a database of mRNA/protein sequences for genes, rather than a database of gene loci (like for instance Ensembl). As a consequence, when they map RefSeq sequences to the genome, they have multimappers, due to paralogy.

Removing duplicates from the Fasta file is a solution for that.
When working with the GTF, we also usually run the option -p "pool genes" with suppa.py generateEvents. This will first identify gene loci independently of the RefSeq label, and then will generate events per gene loci, and will solve this issue.

The script make_genes.pl that I put here https://github.com/comprna/humming does the same clustering but in a stand alone way. It will give you a GTF that should not give you the same type of error.

I hope this helps
best

Eduardo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Link to correct companion fasta file #2

Link to correct companion fasta file #2

hiraksarkar commented Jan 3, 2017 •

edited

Loading

JLTrincado commented Jan 9, 2017 •

edited

Loading

hiraksarkar commented Jan 10, 2017 •

edited

Loading

JLTrincado commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017 •

edited

Loading

JLTrincado commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017 •

edited

Loading

EduEyras commented May 30, 2018

Link to correct companion fasta file #2

Link to correct companion fasta file #2

Comments

hiraksarkar commented Jan 3, 2017 • edited Loading

JLTrincado commented Jan 9, 2017 • edited Loading

hiraksarkar commented Jan 10, 2017 • edited Loading

JLTrincado commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017 • edited Loading

JLTrincado commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017

hiraksarkar commented Jan 10, 2017 • edited Loading

EduEyras commented May 30, 2018

hiraksarkar commented Jan 3, 2017 •

edited

Loading

JLTrincado commented Jan 9, 2017 •

edited

Loading

hiraksarkar commented Jan 10, 2017 •

edited

Loading

hiraksarkar commented Jan 10, 2017 •

edited

Loading

hiraksarkar commented Jan 10, 2017 •

edited

Loading