Different interaction counts for different URIs of the same compound #376

Open
danidi opened this issue Nov 11, 2016 · 14 comments

Comments

@danidi

danidi commented Nov 11, 2016

Does the /pathways/interactions/byEntity/count API call use the IMS? The two examples mentioned here are connected in the IMS, but retrieve different counts.

@egonw
Member

egonw commented Nov 11, 2016

"You spying Basterds"? :)

@egonw
Member

egonw commented Nov 11, 2016

@danidi yeah, I'm thinking in that direction too... I will explore this before the next MSCPiLS meeting this Thursday...

Oh, BTW, I checked the map/ function in the API, and there both are given as "equivalent"... but yes, I also think it must have to do with lenses not being used correctly or so...

@danidi
Author

danidi commented Nov 11, 2016

I guess one of your students ;)
Thank you for looking into it!

@randykerber
Member

7 months ago the counts were 163 and 279. Now they are 326 and 489.

For the first query the set of URIs inserted into the SPARQL query is:

 <http://www.hmdb.ca/metabolites/HMDB01206>
 <https://www.surechembl.org/chemical/SCHEMBL6086>
 <http://info.identifiers.org/hmdb/HMDB01206>
 <http://www.chemspider.com/Chemical-Structure.392413>
 <http://bio2rdf.org/chebi:15351>
 <http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
 <http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
 <http://www.conceptwiki.org/web-ws/concept/get?uuid=25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://ops.rsc-us.org/OPS1769651>
 <http://ops.rsc.org/Compounds/Get/1769651>
 <http://www.chemspider.com/392413>
 <http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://www.chemspider.com/Chemical-Structure.392413.html>
 <http://ops.rsc.org/OPS1769651>
 <http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
 <http://ops.rsc.org/OPS1769651/rdf>
 <http://info.identifiers.org/chebi/CHEBI:15351>
 <http://purl.obolibrary.org/obo/CHEBI_15351>
 <http://www.chemspider.com/Chemical-Structure.392413.rdf>
 <http://info.identifiers.org/chemspider/392413>
 <http://identifiers.org/obo.chebi/CHEBI:15351>
 <http://purl.org/obo/owl/CHEBI#CHEBI_15351>
 <http://identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/chemspider/392413>
 <http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
 <http://rdf.chemspider.com/392413>
 <http://www.conceptwiki.org/concept/index/25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://identifiers.org/chebi/CHEBI:15351>

For the second query the list of URIs is:

 <http://identifiers.org/wikipedia.en/Acetyl-CoA>
 <http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
 <http://www.chemspider.com/Chemical-Structure.392413.rdf>
 <http://purl.obolibrary.org/obo/CHEBI_15351>
 <http://purl.org/obo/owl/CHEBI#CHEBI_15351>
 <http://dbpedia.org/page/Acetyl-CoA>
 <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=444493>
 <http://dbpedia.org/resource/Acetyl-CoA>
 <http://www.chemspider.com/Chemical-Structure.392413.html>
 <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/444493>
 <http://info.identifiers.org/kegg.compound/C00024>
 <http://identifiers.org/chemspider/392413>
 <http://info.identifiers.org/pubchem.compound/444493>
 <http://www.chemspider.com/Chemical-Structure.392413>
 <https://www.surechembl.org/chemical/SCHEMBL6086>
 <http://www.genome.jp/dbget-bin/www_bget?cpd:C00024>
 <http://identifiers.org/cas/72-89-9>
 <http://info.identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/obo.chebi/CHEBI:15351>
 <http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID444493>
 <http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
 <http://identifiers.org/pubchem.compound/444493>
 <http://ops.rsc.org/Compounds/Get/1769651>
 <http://info.identifiers.org/cas/72-89-9>
 <http://identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/kegg.compound/C00024>
 <http://info.identifiers.org/wikipedia.en/Acetyl-CoA>
 <http://en.wikipedia.org/wiki/Acetyl-CoA>
 <http://info.identifiers.org/chemspider/392413>
 <http://ops.rsc-us.org/OPS1769651>
 <http://www.chemspider.com/392413>
 <http://ops.rsc.org/OPS1769651/rdf>
 <http://info.identifiers.org/chebi/CHEBI:15351>
 <http://rdf.chemspider.com/392413>
 <http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
 <http://www.hmdb.ca/metabolites/HMDB01206>
 <http://www.kegg.jp/entry/C00024>
 <http://bio2rdf.org/cpd:C00024>
 <http://ops.rsc.org/OPS1769651>
 <http://bio2rdf.org/chebi:15351>
 <http://identifiers.org/chebi/CHEBI:15351>
 <http://commonchemistry.org/ChemicalDetail.aspx?ref=72-89-9>
 <http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
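
To see where the two expansions diverge, the lists can be compared as sets. A minimal sketch, using only a handful of the URIs quoted above (the variable names are made up for illustration; the full lists would be pasted in the same way):

```python
# A few URIs copied from the two lists above; not the complete lists.
first_query = {
    "http://identifiers.org/hmdb/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://identifiers.org/chemspider/392413",
    "http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61",
}
second_query = {
    "http://identifiers.org/hmdb/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://identifiers.org/chemspider/392413",
    "http://identifiers.org/kegg.compound/C00024",
    "http://identifiers.org/pubchem.compound/444493",
    "http://identifiers.org/cas/72-89-9",
}

# URIs that only one of the two expansions produced.
only_in_first = sorted(first_query - second_query)
only_in_second = sorted(second_query - first_query)

print("only in first query:", only_in_first)
print("only in second query:", only_in_second)
```

In this subset the second list adds the KEGG, PubChem and CAS URIs, while the ConceptWiki URI shows up only in the first.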

@egonw
Member

egonw commented Jun 24, 2017

Yes, the problem seems to be that the IMS instances do not properly handle directionality... since both input IRIs are equivalent (the IMS says so), it should not matter which one you start with, and you should get the same number of mappings.

@egonw
Member

egonw commented Jun 24, 2017

@Christian-B, can you think of any reason why the two IRIs do not give the same number of matches?

@Christian-B
Member

Without looking in any detail or at the particular example, I think this may well be related to transitive mappings and the choice of where to stop,
especially as most mappings are near mappings.

The IMS will not keep going back to the same type of URL.
For example, there are often cases like:
A1 -> B2
B2 -> A3
A3 -> B4
B4 -> A5
The IMS has to stop somewhere, otherwise you get A1 -> A5, which usually is incorrect.

So if the IMS is hit with one of the middle URLs (A3) in the above example, it may return more results than when given A1,

as A3 may be close enough to both A1 and A5 while they are not close enough to each other.
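
The asymmetry in the A1 ... A5 chain can be reproduced with a small sketch. It assumes a plain hop limit as the stopping rule, which is only a stand-in: the rule the IMS actually applies is not stated here, and `expand` and `max_hops` are names made up for this example.

```python
from collections import deque

# The chain from the example above; each link is treated as bidirectional.
LINKS = [("A1", "B2"), ("B2", "A3"), ("A3", "B4"), ("B4", "A5")]


def neighbours(links):
    """Build an adjacency map that can be walked in both directions."""
    graph = {}
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def expand(start, links, max_hops=2):
    """Return every URL reachable from start in at most max_hops links."""
    graph = neighbours(links)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # assumed stopping rule: do not follow links any further
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops) - {start}


from_a1 = expand("A1", LINKS)
from_a3 = expand("A3", LINKS)
print(sorted(from_a1), sorted(from_a3))
```

With this limit, starting at A3 reaches both ends of the chain, while starting at A1 stops before A5, so the two "equivalent" inputs yield different result sets.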

===
This gets (at least when I was in OPS) even messier when the URLs in the chain point to slightly different types of things. Again, the IMS has to choose when to stop transitivity.

@egonw
Member

egonw commented Jun 24, 2017

@Christian-B, OK, that makes a lot of sense... do you have a script that calculates all transitive link sets, so that we can reproduce that?

PS. thanks for your quick response and your response in the first place!

@Christian-B
Member

For speed, all links in the IMS were loaded unidirectionally.
This allows only one side of the mapping to be searched and indexed.

Most predicates were considered bidirectional, so each mapping was loaded twice.
But there was the ability to handle unidirectional mappings.
This was not yet used when I left three years ago.
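
A rough sketch of that loading scheme, to make the consequence for lookups concrete. `MappingStore` and its methods are hypothetical names, not IMS code, and which link is loaded one way below is purely illustrative:

```python
from collections import defaultdict


class MappingStore:
    """Toy store that indexes links by their source side only."""

    def __init__(self):
        self._by_source = defaultdict(set)

    def load(self, source, target, bidirectional=True):
        # Every link is stored source -> target.
        self._by_source[source].add(target)
        # A bidirectional predicate is simply loaded a second time, reversed.
        if bidirectional:
            self._by_source[target].add(source)

    def lookup(self, uri):
        # Only the source side is searched, so one index is enough.
        return set(self._by_source.get(uri, ()))


store = MappingStore()
hmdb = "http://identifiers.org/hmdb/HMDB01206"
chebi = "http://identifiers.org/chebi/CHEBI:15351"
kegg = "http://identifiers.org/kegg.compound/C00024"

store.load(hmdb, chebi)                       # loaded twice
store.load(hmdb, kegg, bidirectional=False)   # hypothetical one-way link

print(store.lookup(hmdb))
print(store.lookup(kegg))
```

Any link that ends up stored in only one direction is found from one side and invisible from the other, which is one way equivalent inputs could return different numbers of mappings.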

@Christian-B
Member

Sorry Egon, too long ago for me to remember.

@egonw
Member

egonw commented Jun 24, 2017

Yeah, no worries... but I had to ask :)

@danidi
Author

danidi commented Jun 24, 2017

There seem to be some mappings from HMDB to other sources, e.g. KEGG (http://alpha.openphacts.org:3004/QueryExpander/mappingSet/189), which are not created via the CRS. If HMDB is not an allowed middle source for the transitive calculation (not sure where to check that), this could explain why you find these additional mappings only when you start with HMDB directly.
I'm assuming that the KEGG URIs are used in several pathways, so this could make a difference in the pathway counts.
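
As a sketch of that hypothesis: the link sets in `DIRECT` are invented for illustration (they are not the real IMS contents), and `reachable` and `allowed_middle` are hypothetical names.

```python
# Hypothetical direct link sets between sources, keyed by the source they start from.
DIRECT = {
    "chebi": {"hmdb"},
    "hmdb": {"chebi", "kegg"},
    "kegg": {"hmdb"},
}


def reachable(start, allowed_middle):
    """Sources reachable from start when only allowed_middle may be passed through."""
    found = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for target in DIRECT.get(current, ()):
            if target == start or target in found:
                continue
            found.add(target)
            # Keep walking only through sources that may act as a middle step.
            if target in allowed_middle:
                frontier.append(target)
    return found


# HMDB is not an allowed middle source in this scenario.
from_chebi = reachable("chebi", allowed_middle=set())
from_hmdb = reachable("hmdb", allowed_middle=set())
print(sorted(from_chebi), sorted(from_hmdb))
```

Under these assumptions KEGG is found only when starting from HMDB itself; allowing HMDB as a middle source (`allowed_middle={"hmdb"}`) makes it reachable from ChEBI as well.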

@egonw
Member

egonw commented Jun 24, 2017

We're working on making proper link sets for compounds in pathways... @valt is working on (or finished) parsing the WikiPathways SDF so that we can drop the HMDB link sets.

@egonw egonw removed their assignment Sep 15, 2018
@egonw
Member

egonw commented Sep 15, 2018

There are multiple issues now... this bug depends on a redevelopment of a streamlined data loading pipeline (well, "redevelopment" is likely not the right word: Paul tried to put this on the agenda, but it never was prioritized...). For now, I'll unassign myself, as I cannot do much to fix this at the moment.
