Add functionality to add more identifiers to GeneProducts in polish
#53
Comments
We already stumbled upon this in another issue (relates to #52). I think that working with the stuff that is implemented in |
Overview on Identifiers of interest:
|
If |
- Added filtering for VMH to BiGG identifier mapping
- Added filtering for NaN values in annotations
- Added additional annotation of GeneProducts with KEGG, UniProt & RefSeq identifiers
- Improved handling of RefSeq identifiers
As I had already written a script to enhance the GeneProduct annotations here, I decided not to create a table for the annotation enhancement and instead integrated my script into refineGEMs. The adjustments were tested on two VMH models created with AGORA2 and on three models which were previously polished with refineGEMs' former |
The changes described here are all merged into |
Feature:
Extend the `polish` module to add KEGG gene, UniProt as well as RefSeq identifiers.

Implementation:
- `config.yaml` (e.g. for Staphylococcus haemolyticus JCSC1435 this would be `sha`.)
  → From this code & the locus tags of the model the KEGG gene identifier can be obtained
- Retrieve ncbiprotein identifier additionally from KEGG & compare ncbiprotein identifier from KEGG with the one in the model
  → No mapping from KEGG Gene to NCBI Protein identifier possible
- `.gff` file (findable at the Genome Assembly site of NCBI; for an example implementation see: `get_ids_from_gff`)
- Maybe also get the name & locus tag from the `.gff` file 🤔
  → Not possible for lab_strain
- Get UniProt identifiers with RefSeq identifiers
  → Result of `u = UniProt(), u.mapping('RefSeq_Protein', 'UniProtKB', refseq_id)` yields no feasible results. Comparing this result with the result from the KEGG mapping for a protein of Staphylococcus haemolyticus JCSC1435 showed that the resulting identifier is not the same.
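
Two of the steps in this issue, building KEGG gene identifiers from the organism code plus the model's locus tags and reading RefSeq protein identifiers from a `.gff` file, could be sketched roughly as below. This is only an illustration and not the code in refineGEMs: the function names, the locus tag `TAG_0001`, the protein identifier, and the GFF excerpt are all made up. It assumes that the KEGG gene identifier has the form `<organism code>:<locus tag>` and that the CDS rows of the `.gff` file carry `locus_tag` and `protein_id` attributes, which should be checked against the actual file.

```python
from urllib.parse import unquote


def kegg_gene_id(organism_code: str, locus_tag: str) -> str:
    """Build an identifier of the assumed form '<organism code>:<locus tag>'."""
    return f"{organism_code}:{locus_tag}"


def parse_gff_attributes(attribute_column: str) -> dict:
    """Split the ninth GFF3 column ('key=value;key=value') into a dict."""
    attributes = {}
    for field in attribute_column.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters inside values.
            attributes[key] = unquote(value)
    return attributes


def refseq_ids_by_locus_tag(gff_lines) -> dict:
    """Map locus tags to protein identifiers found on CDS features."""
    mapping = {}
    for line in gff_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        # A GFF3 feature row has exactly nine tab-separated columns.
        if len(columns) != 9 or columns[2] != "CDS":
            continue
        attributes = parse_gff_attributes(columns[8])
        locus_tag = attributes.get("locus_tag")
        protein_id = attributes.get("protein_id")
        if locus_tag and protein_id:
            mapping[locus_tag] = protein_id
    return mapping


# Hypothetical GFF excerpt, made up for illustration only.
example_gff = [
    "##gff-version 3\n",
    "chr1\tRefSeq\tgene\t1\t300\t.\t+\t.\tID=gene-TAG_0001;locus_tag=TAG_0001\n",
    "chr1\tProtein Homology\tCDS\t1\t300\t.\t+\t0\t"
    "ID=cds-1;locus_tag=TAG_0001;protein_id=WP_000000001.1\n",
]

refseq_by_tag = refseq_ids_by_locus_tag(example_gff)
# 'sha' is the organism code given in the issue for S. haemolyticus JCSC1435.
kegg_by_tag = {tag: kegg_gene_id("sha", tag) for tag in refseq_by_tag}
```

Keying both dictionaries by locus tag keeps the lookup aligned with the locus tags stored in the model, so a GeneProduct without a match in the `.gff` file is simply absent from the result rather than annotated with a guess.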