tra2 analysis #3

beretta · 2018-05-30T09:51:45Z

I'm trying to replicate the analysis on the tra2 datasets with SUPPA.
I ran the commands in the "commands.txt" file, but I got an error in the psiPerEvent step.
Here's the command:
suppa.py psiPerEvent -i ./annotation/hg19_ensembl_events_all_events.ioe -e ./quantification/iso_tpm.txt -o ./quantification/events
Here's the error:
ERROR:psiCalculator:transcript ENST00000594223 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51208333:51208444-51214200:-.
ERROR:psiCalculator:transcript ENST00000596004 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599992 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599617 not found in the "expression file".

As a result, the events.psi output file contains all NA values.
We used the annotation and FASTA sequences from the "annotation" folder.

EduEyras · 2018-05-30T10:06:21Z

Hi Stefano,
thanks for your message. This must be because the IDs are not exactly the same. In one of the files the IDs may contain the version (e.g. ENST00000599992.1) but not in the other. Is that the case?
E.
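
A quick way to test the version-suffix hypothesis is to compare the two ID sets directly. The following is a minimal sketch, not part of SUPPA: the function names are hypothetical, and it assumes a tab-separated .ioe file whose fifth column holds the comma-separated total transcripts, and an expression file whose first line is a sample-name header.

```python
def ioe_transcript_ids(ioe_lines):
    """Collect every transcript ID listed in the fifth (total transcripts) column."""
    ids = set()
    for line in ioe_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header row (assumed to start with "seqname") and malformed lines.
        if len(fields) < 5 or fields[0] == "seqname":
            continue
        ids.update(fields[4].split(","))
    return ids


def expression_ids(expr_lines):
    """Collect the first-column IDs, skipping the sample-name header line."""
    rows = iter(expr_lines)
    next(rows, None)
    return {row.split("\t", 1)[0].strip() for row in rows if row.strip()}


def strip_version(transcript_id):
    """Drop a trailing version suffix: ENST00000599992.1 becomes ENST00000599992."""
    return transcript_id.split(".", 1)[0]


def compare_ids(ioe_ids, expr_ids):
    """Count IDs shared as-is and shared after dropping version suffixes."""
    exact = ioe_ids & expr_ids
    unversioned = {strip_version(i) for i in ioe_ids} & {strip_version(i) for i in expr_ids}
    return {"exact": len(exact), "without_version": len(unversioned)}
```

Both reader functions accept any iterable of lines, so an open file handle can be passed in directly. If "exact" is zero while "without_version" is large, the version suffix is the likely culprit; if both are zero, the IDs differ in some other way.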

beretta · 2018-05-30T10:19:43Z

Thanks for the fast reply.
Unfortunately, this is not the case.
I started from the two files from this repository:

annotation/hg19_EnsenmblGenes_sequence_ensenmbl.fasta.gz for the FASTA sequences of the transcripts;
annotation/Homo_sapiens.GRCh37.75.formatted.gtf.gz for the annotation.
They seem to contain the same IDs.
The problem could come from the Salmon quantification, since the iso_tpm.txt file does not include some IDs.
Could you help me understand the reason?
I'm using salmon v0.9.1.

EduEyras · 2018-05-30T10:31:00Z

In that case you should see NAs for only some of the events, not all of them. It could be that the IDs in one file include those from pseudogenes and/or non-standard chromosomes (anything other than autosomes and sex chromosomes) while the other file does not. In that case the errors are expected, and you should still see PSI values for the remaining events. E.
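
To tell "some events are NA" apart from "every event is NA", the PSI table can be summarized in a few lines. This is a minimal sketch with a hypothetical helper name; it assumes the .psi file has one header line of sample names followed by rows of an event ID and whitespace-separated values, and it only treats the literal string "NA" as missing.

```python
def summarize_psi(psi_lines):
    """Return (number of events, number of events that are NA in every sample)."""
    rows = iter(psi_lines)
    next(rows, None)  # skip the sample-name header
    total = 0
    all_na = 0
    for row in rows:
        fields = row.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        total += 1
        if all(value == "NA" for value in fields[1:]):
            all_na += 1
    return total, all_na
```

If the two numbers are equal, the problem is almost certainly a systematic ID mismatch rather than a handful of transcripts missing from the quantification.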

beretta · 2018-05-30T10:43:51Z

I thought that some of them could be missing, too.
But, I checked the psi file and all the entries in all the samples are NA.
I report, as an example, gene ENSG00000003137.
Here's the hg19_ensembl_events_all_events.ioe file:
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- ENST00000412253 ENST00000412253,ENST00000546307
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- ENST00000412253 ENST00000412253,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- ENST00000412253 ENST00000412253,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- ENST00000461519 ENST00000546307,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- ENST00000461519 ENST00000461519,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- ENST00000001146 ENST00000474509,ENST00000546307,ENST00000001146

Here's the iso_tpm.txt file:
ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding 0.266260 0.717471 0.530288 0.379208 0.265172 0.148014
ENSG00000003137|ENST00000412253|CYP26B1|protein_coding|protein_coding 2.636286 2.131173 2.585867 1.813580 2.227969 2.228912
ENSG00000003137|ENST00000461519|CYP26B1|protein_coding|protein_coding 1.890367 2.105222 2.056620 2.039359 1.757142 1.462679
ENSG00000003137|ENST00000474509|CYP26B1|protein_coding|protein_coding 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSG00000003137|ENST00000546307|CYP26B1|protein_coding|protein_coding 0.000000 0.000008 0.329767 0.000000 0.000000 0.000000

And the resulting events.psi file:
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- NA NA NA NA NA NA

EduEyras · 2018-05-30T10:46:37Z

Hi, thanks for the info. The first column of the expression file has to contain only the transcript ID that is reported in the event (ioe) file, otherwise the link cannot be made. Thus, it should have "ENST00000001146" rather than "ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding". The ID could be anything, but it must be the same in both files (and unique). Perhaps this is not clear enough from the documentation? I hope this helps! Best, E.
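
One way to apply this fix without rerunning the quantification is to rewrite the first column of the expression file so that it holds only the transcript ID. The sketch below is illustrative rather than an official SUPPA utility: the function names are hypothetical, and it assumes a tab-separated file with a header line, pipe-delimited labels, and transcript IDs that start with "ENST".

```python
def extract_transcript_id(label):
    """Return the ENST field of a pipe-delimited label, or the label itself if none is found."""
    for part in label.split("|"):
        if part.startswith("ENST"):
            return part
    return label


def fix_expression_lines(lines):
    """Rewrite the first column of every data row; the header row is kept unchanged."""
    fixed = []
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index == 0 or not line:
            fixed.append(line)  # header (sample names) or blank line
            continue
        label, sep, rest = line.partition("\t")
        fixed.append(extract_transcript_id(label) + sep + rest)
    return fixed
```

For example, extract_transcript_id("ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding") returns "ENST00000001146", which matches the ID used in the .ioe file. The header line is left alone on purpose, since it carries the sample names.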

beretta · 2018-05-30T14:21:37Z

Thanks for the suggestion.
I actually tried to change the expression file and rerun the psi step, but it didn't work.
Anyway, I changed the FASTA headers of the transcripts as you suggested and reran the whole pipeline, and it worked.
Thanks.
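
For anyone taking the same route, the FASTA headers can be shortened before building the Salmon index. This is a minimal sketch under stated assumptions, not the exact procedure used here: the helper name is hypothetical, and it assumes headers of the form ">ENSG...|ENST...|..." with transcript IDs that start with "ENST".

```python
def simplify_fasta_headers(fasta_lines):
    """Yield FASTA lines with each header reduced to its transcript ID."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            yield line  # sequence line, unchanged
            continue
        tokens = line[1:].split()
        label = tokens[0] if tokens else ""
        # Keep the first pipe-delimited field that looks like a transcript ID.
        transcript_id = next(
            (part for part in label.split("|") if part.startswith("ENST")),
            label,
        )
        yield ">" + transcript_id
```

Because the function is a generator, the rewritten lines can be written out one at a time without loading the whole FASTA file into memory.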

EduEyras · 2018-05-30T14:24:50Z

Great! Thanks for letting us know. It is strange that just changing the first column of the expression file did not work. Perhaps the headers (sample names) were missing? E.

beretta · 2018-05-30T14:33:29Z

That was one of my first attempts, so I probably messed up something else...
The result was a psi file with just one column instead of one for each sample.
Thanks again for the support.
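
A header problem like the one suspected above can be checked before running psiPerEvent. The sketch below is only a heuristic with a hypothetical function name; it assumes the layout described in the SUPPA documentation as I understand it: a tab-separated file whose first line lists only the sample names, so the header has one field fewer than each data row.

```python
def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_expression_table(lines):
    """Classify the header of an expression table as 'ok', 'header missing' or 'column mismatch'."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if len(rows) < 2:
        return "too few lines"
    header, first_row = rows[0], rows[1]
    n_values = len(first_row) - 1  # data rows are: ID followed by one value per sample
    if n_values >= 1 and len(header) == n_values + 1 and all(_is_number(v) for v in header[1:]):
        return "header missing"  # the first line looks like a data row
    if len(header) == n_values:
        return "ok"
    return "column mismatch"
```

A result other than "ok" would be consistent with a PSI output that does not have one column per sample, although other causes are possible.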
