tra2 analysis #3

Open
beretta opened this issue May 30, 2018 · 8 comments

@beretta

beretta commented May 30, 2018

I'm trying to replicate the analysis on the tra2 datasets with SUPPA.
I ran the commands in the "commands.txt" file, but I got an error in the psiPerEvent step.
Here's the command:
suppa.py psiPerEvent -i ./annotation/hg19_ensembl_events_all_events.ioe -e ./quantification/iso_tpm.txt -o ./quantification/events
Here's the error:
ERROR:psiCalculator:transcript ENST00000594223 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51208333:51208444-51214200:-.
ERROR:psiCalculator:transcript ENST00000596004 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599992 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000269671;SE:chrHG497_PATCH:51207980-51214200:51214279-51215098:-.
ERROR:psiCalculator:transcript ENST00000599617 not found in the "expression file".

As a result, the events.psi output file contains only NA values.
We used the annotation and FASTA sequences from the "annotation" folder.

@EduEyras
Member

Hi Stefano,
thanks for your message. This must be because the IDs are not exactly the same. In one of the files the IDs may contain the version suffix (e.g. ENST00000599992.1) but not in the other. Is that the case?
E.
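
A minimal sketch of the check described in this comment, assuming the only difference between the two files is a trailing Ensembl version suffix such as ".1". The helper name strip_version is hypothetical and is not part of SUPPA.

```python
import re


def strip_version(transcript_id: str) -> str:
    """Drop a trailing version suffix such as '.1' from an Ensembl ID.

    Hypothetical helper for illustration; IDs without a suffix are
    returned unchanged.
    """
    return re.sub(r"\.\d+$", "", transcript_id)


# The ID quoted in the comment, with and without its version suffix.
print(strip_version("ENST00000599992.1"))  # ENST00000599992
print(strip_version("ENST00000599992"))    # ENST00000599992
```

Applying the same normalization to the IDs of both files before comparing them would show whether the version suffix is the only mismatch.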

@beretta
Author

beretta commented May 30, 2018

Thanks for the fast reply.
Unfortunately, this is not the case.
I started from the two files from this repository:

  • annotation/hg19_EnsenmblGenes_sequence_ensenmbl.fasta.gz for the FASTA sequences of the transcripts;
  • annotation/Homo_sapiens.GRCh37.75.formatted.gtf.gz for the annotation.
They seem to contain the same IDs.
The problem could come from the Salmon quantification, since the iso_tpm.txt file does not include some IDs.
Could you help me understand the reason?
I'm using Salmon v0.9.1.

@EduEyras
Member

EduEyras commented May 30, 2018 via email

@beretta
Author

beretta commented May 30, 2018

I thought that some of them could be missing, too.
But I checked the psi file, and every entry in every sample is NA.
As an example, take gene ENSG00000003137.
Here's the hg19_ensembl_events_all_events.ioe file:
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- ENST00000412253 ENST00000412253,ENST00000546307
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- ENST00000412253 ENST00000412253,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- ENST00000412253 ENST00000412253,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- ENST00000461519 ENST00000546307,ENST00000461519
chr2 ENSG00000003137 ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- ENST00000461519 ENST00000461519,ENST00000474509
chr2 ENSG00000003137 ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- ENST00000001146 ENST00000474509,ENST00000546307,ENST00000001146

Here's the iso_tpm.txt file:
ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding 0.266260 0.717471 0.530288 0.379208 0.265172 0.148014
ENSG00000003137|ENST00000412253|CYP26B1|protein_coding|protein_coding 2.636286 2.131173 2.585867 1.813580 2.227969 2.228912
ENSG00000003137|ENST00000461519|CYP26B1|protein_coding|protein_coding 1.890367 2.105222 2.056620 2.039359 1.757142 1.462679
ENSG00000003137|ENST00000474509|CYP26B1|protein_coding|protein_coding 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSG00000003137|ENST00000546307|CYP26B1|protein_coding|protein_coding 0.000000 0.000008 0.329767 0.000000 0.000000 0.000000

And the resulting events.psi file:
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72371118:72371544:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72370133:72370213:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374963:- NA NA NA NA NA NA
ENSG00000003137;AF:chr2:72362548-72371118:72371544:72362548-72374760:72374991:- NA NA NA NA NA NA
ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- NA NA NA NA NA NA
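
The three excerpts above can be compared with a small script. This is only a sketch: it assumes whitespace-separated columns, that the last two columns of each .ioe row hold comma-separated transcript IDs, and that the first column of the expression file is the transcript identifier. The function names are hypothetical.

```python
def ioe_transcripts(ioe_lines):
    """Collect every transcript ID listed in the last two .ioe columns."""
    ids = set()
    for line in ioe_lines:
        fields = line.split()
        # Skip blank lines and a possible header row (assumed to start
        # with 'seqname').
        if len(fields) < 5 or fields[0] == "seqname":
            continue
        ids.update(fields[3].split(","))
        ids.update(fields[4].split(","))
    return ids


def expression_ids(expr_lines):
    """Collect the first field of each non-empty expression line.

    A header line of sample names would add one sample name to the set,
    which does not affect the set difference computed below.
    """
    return {line.split()[0] for line in expr_lines if line.strip()}


# One row from each excerpt quoted in this comment.
ioe = [
    "chr2 ENSG00000003137 "
    "ENSG00000003137;SE:chr2:72362548-72371118:72371342-72374760:- "
    "ENST00000001146 ENST00000474509,ENST00000546307,ENST00000001146"
]
expr = [
    "ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding "
    "0.266260 0.717471 0.530288 0.379208 0.265172 0.148014"
]

missing = ioe_transcripts(ioe) - expression_ids(expr)
print(sorted(missing))
# ['ENST00000001146', 'ENST00000474509', 'ENST00000546307']
```

ENST00000001146 is reported as missing even though it is quantified, because the first field of the expression row is the full pipe-delimited header rather than the bare transcript ID used in the .ioe file.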

@EduEyras
Member

EduEyras commented May 30, 2018 via email

@beretta
Author

beretta commented May 30, 2018

Thanks for the suggestion.
I actually tried changing the expression file and rerunning the psi step, but that didn't work.
Anyway, I changed the FASTA headers of the transcripts as you suggested and reran the whole pipeline, and it worked.
Thanks.
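
The suggestion itself is not preserved in the thread. Assuming it was to reduce each FASTA header to the bare transcript ID, and that the headers are pipe-delimited with the transcript ID as the second field (as in the iso_tpm.txt rows quoted earlier), the change could be sketched as follows. The function names and the output path are hypothetical.

```python
import gzip


def simplify_header(line: str, field: int = 1) -> str:
    """Reduce a pipe-delimited FASTA header to a single field.

    Sequence lines are returned unchanged. 'field' is the 0-based
    position of the transcript ID (assumed to be the second field).
    """
    if not line.startswith(">"):
        return line
    parts = line[1:].strip().split("|")
    new_id = parts[field] if len(parts) > field else parts[0]
    newline = "\n" if line.endswith("\n") else ""
    return ">" + new_id + newline


def rewrite_fasta(src: str, dst: str) -> None:
    """Copy a gzipped FASTA file, simplifying every header line."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fout.write(simplify_header(line))


header = ">ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding\n"
print(simplify_header(header), end="")  # >ENST00000001146

# Example call; the output file name is an assumption:
# rewrite_fasta("annotation/hg19_EnsenmblGenes_sequence_ensenmbl.fasta.gz",
#               "annotation/transcripts_simple_ids.fasta.gz")
```

After rewriting the headers, the index and the quantification have to be regenerated, which matches the full rerun described in this comment.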

@EduEyras
Member

EduEyras commented May 30, 2018 via email

@beretta
Author

beretta commented May 30, 2018

That was one of my first attempts, so I probably messed up something else...
The result was a psi file with just one column instead of one for each sample.
Thanks again for the support.
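
For reference, editing the expression file instead of the FASTA headers could be sketched as below. The cause of the single-column output is not identified in the thread. The sketch assumes a tab-separated file whose first line is a header of sample names only, and pipe-delimited IDs with the transcript ID as the second field. The function name is hypothetical.

```python
def rewrite_expression_ids(lines, field=1, sep="\t"):
    """Replace pipe-delimited IDs with one field, keeping all columns.

    The first line is treated as the header (assumed to list sample
    names only) and is copied unchanged; only the first column of the
    remaining rows is rewritten.
    """
    if not lines:
        return []
    out = [lines[0]]
    for line in lines[1:]:
        tid, *values = line.rstrip("\n").split(sep)
        parts = tid.split("|")
        if len(parts) > field:
            tid = parts[field]
        out.append(sep.join([tid, *values]) + "\n")
    return out


# Hypothetical two-sample excerpt; sample names are placeholders.
rows = [
    "sample1\tsample2\n",
    "ENSG00000003137|ENST00000001146|CYP26B1|protein_coding|protein_coding"
    "\t0.266260\t0.717471\n",
]
for row in rewrite_expression_ids(rows):
    print(row, end="")
```

Splitting on an explicit tab and rejoining with the same separator keeps one value per sample in every row.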
