tra2 analysis #3
Hi Stefano, |
Thanks for the fast reply.
|
In that case you should see NAs only for some of the events, not all of them.
It could be that the IDs in one file include those from pseudogenes and/or
non-standard chromosomes (other than autosomes and sex chromosomes) while the
other file does not. In that case this is expected, and you should still see
PSI values for the other events.
E.
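One way to check whether the two files really share IDs is to list the transcripts named in the .ioe file that have no row in the expression file. The sketch below assumes both files are tab-separated, that the .ioe file starts with a header line whose first field is `seqname`, and that the expression file has one header line followed by rows starting with the transcript ID. The function names are illustrative and not part of SUPPA.

```python
def ioe_transcripts(ioe_lines):
    """Collect the transcript IDs from the last column of an .ioe file."""
    ids = set()
    for line in ioe_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the (assumed) header line.
        if not fields[0] or fields[0] == "seqname":
            continue
        ids.update(t for t in fields[-1].split(",") if t)
    return ids


def expression_ids(expr_lines):
    """Collect the first-column IDs of an expression file, skipping its header."""
    lines = iter(expr_lines)
    next(lines, None)  # assumed header line with the sample names
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def missing_transcripts(ioe_lines, expr_lines):
    """Return the .ioe transcripts that have no row in the expression file."""
    return sorted(ioe_transcripts(ioe_lines) - expression_ids(expr_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly. If almost every transcript comes back as missing, the ID format probably differs between the files rather than a few transcripts being absent.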
On Wed, May 30, 2018 at 12:19 PM, Stefano Beretta wrote:
Thanks for the fast reply.
Unfortunately, this is not the case.
I started from the two files from this repository:
- annotation/hg19_EnsenmblGenes_sequence_ensenmbl.fasta.gz for the
FASTA sequences of the transcripts;
- annotation/Homo_sapiens.GRCh37.75.formatted.gtf.gz for the
annotation.
They seem to contain the same IDs.
The problem could be produced by the quantification of Salmon, since
the iso_tpm.txt file does not include some IDs.
Could you help me in understanding the reason?
I'm using salmon v0.9.1.
--
Dr E Eyras
ICREA Research Professor
Universitat Pompeu Fabra
PRBB, Dr Aiguader 88 Tel: +34 93 316 0502
E08003 Barcelona, Spain Fax: +34 93 316 0550
http://scholar.google.com/citations?user=LiojlGoAAAAJ
http://www.researcherid.com/rid/L-1053-2014
http://regulatorygenomics.upf.edu/
|
I thought that some of them could be missing, too. Here's the iso_tpm.txt file: And the resulting events.psi file: |
Hi,
thanks for the info. The expression file has to contain only the transcript
ID that is reported in the event (ioe) file, otherwise the link cannot be
made.
Thus, it should have "ENST00000001146" rather than
"ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding"
It could be anything, but it should be the same one (and unique).
Perhaps this is not clear enough from the documentation?
I hope this helps!
best
E.
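A minimal sketch of the renaming described above: it reduces each pipe-delimited name in the first column of the expression file to the transcript ID alone. It assumes a tab-separated file with one header line, and that the transcript ID is the second `|`-separated field, as in the example name quoted above. The function names are hypothetical.

```python
def to_transcript_id(name, field=1, sep="|"):
    """Return the chosen field of a delimited name, or the name if it is too short."""
    parts = name.split(sep)
    return parts[field] if len(parts) > field else name


def rewrite_expression(lines):
    """Rewrite the first column of an expression file to bare transcript IDs."""
    out = []
    seen = set()
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i == 0 or not line:
            out.append(line)  # keep the header (and any blank line) unchanged
            continue
        name, tab, rest = line.partition("\t")
        tid = to_transcript_id(name)
        if tid in seen:
            # The IDs must stay unique after shortening.
            raise ValueError("duplicate transcript ID after renaming: " + tid)
        seen.add(tid)
        out.append(tid + tab + rest)
    return out
```

The duplicate check reflects the requirement that the ID be unique; it stops before writing a file in which two rows would collapse to the same name.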
On Wed, May 30, 2018 at 12:43 PM, Stefano Beretta wrote:
I thought that some of them could be missing, too.
But, I checked the psi file and all the entries in all the samples are NA.
I report, as an example, gene ENSG00000003137.
Here's the hg19_ensembl_events_all_events.ioe file:
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- ENST00000412253 ENST00000412253,ENST00000546307
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- ENST00000412253 ENST00000412253,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- ENST00000412253 ENST00000412253,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- ENST00000461519 ENST00000546307,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- ENST00000461519 ENST00000461519,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- ENST00000001146 ENST00000474509,ENST00000546307,ENST00000001146
Here's the iso_tpm.txt file:
ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding 0.266260 0.717471 0.530288 0.379208 0.265172 0.148014
ENSG00000003137|ENST00000412253|CYP26B1|protein_coding|protein_coding 2.636286 2.131173 2.585867 1.813580 2.227969 2.228912
ENSG00000003137|ENST00000461519|CYP26B1|protein_coding|protein_coding 1.890367 2.105222 2.056620 2.039359 1.757142 1.462679
ENSG00000003137|ENST00000474509|CYP26B1|protein_coding|protein_coding 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSG00000003137|ENST00000546307|CYP26B1|protein_coding|protein_coding 0.000000 0.000008 0.329767 0.000000 0.000000 0.000000
And the resulting events.psi file:
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- NA NA NA NA NA NA
|
Thanks for the suggestion. |
Great! Thanks for letting us know.
It is strange that just changing the first column of the expression file
did not work. Perhaps the headers (sample names) were missing?
E.
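A quick check for the header question could look like the sketch below. It assumes the expression file is tab-separated and that its header holds only the sample names, so it has one field fewer than each data row. That layout is my reading of the SUPPA documentation and should be confirmed there. The function name is hypothetical.

```python
def header_looks_valid(lines):
    """Check that the header has exactly one field fewer than the first data row."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if len(rows) < 2:
        return False  # nothing to compare against
    header, first_row = rows[0], rows[1]
    return len(header) == len(first_row) - 1
```

A file whose first line is already a data row fails this check, because its first two lines then have the same number of fields.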
On Wed, May 30, 2018 at 4:21 PM, Stefano Beretta wrote:
Thanks for the suggestion.
I actually tried changing the expression file and rerunning the psi step, but
it didn't work.
Anyway, I changed the FASTA headers of the transcripts as you suggested and
reran the whole pipeline, and it worked.
Thanks.
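The FASTA header change could be scripted roughly as follows. This is only a sketch of the idea, not the commands actually used in this thread. It assumes the FASTA headers carry the same pipe-delimited names seen in iso_tpm.txt, with the transcript ID as the second field. The function name is illustrative.

```python
def simplify_fasta_headers(lines, field=1, sep="|"):
    """Yield FASTA lines with each header reduced to the transcript ID."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts = name.split(sep)
            # Fall back to the full name when the header is not delimited.
            tid = parts[field] if len(parts) > field else name
            yield ">" + tid + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

Because the quantifier takes its transcript names from the FASTA headers, renaming them before indexing means the expression file comes out with matching IDs and no post-processing is needed.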
|
That was one of my first attempts, so I probably messed up something else... |
I'm trying to replicate the analysis on the tra2 datasets with suppa.
I ran the commands in the "commands.txt" file, but I got an error in the psiPerEvent step.
Here's the command:
suppa.py psiPerEvent -i ./annotation/hg19_ensembl_events_all_events.ioe -e ./quantification/iso_tpm.txt -o ./quantification/events
Here's the error:
ERROR:psiCalculator:transcript ENST00000594223 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51208333:51208444-51214200:-.
ERROR:psiCalculator:transcript ENST00000596004 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599992 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599617 not found in the "expression file".
As a result, every value in the events.psi output file is NA.
We used the annotation and FASTA sequences from the "annotation" folder.