Proof of concept: bioschemas-protein-render

Example

See the prototype in action with the default accession P12272 or with another accession, P05067. This is only a proof of concept, so not all UniProt accessions will work: we have not handled possible exceptions. We have tested it with some human proteins that have 3D structures available.

Details

One of the advantages of Bioschemas is that markup can be retrieved from multiple complementary sources to create a quick, short summary, i.e., an infobox. For proteins this is easy to achieve, because UniProt, InterPro and PDBe link to and complement each other. Ideally, a crawler would retrieve the information from these three sources. A Bioschemas crawler is under construction but not yet finished, so here we use a minimalistic approach to show how an infobox could work.
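A crawler of this kind would have to pull out the JSON-LD that pages embed in `<script type="application/ld+json">` elements. The following is a minimal sketch using only Python's standard-library `html.parser`. It is not the Bioschemas crawler, it does not fetch pages, and the class and function names (`JsonLdExtractor`, `extract_jsonld`) are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.documents: List[Any] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only JSON-LD scripts are kept.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> List[Any]:
    """Return all JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Running `extract_jsonld` over the UniProt, InterPro and PDBe pages for one protein would give the raw material for an infobox. Ordinary `<script>` elements without the JSON-LD type are ignored.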

We have developed two web components that generate Bioschemas markup from web services. The first, bioschemas-uniprot-adapter, serves Bioschemas markup for a UniProt entry, while the second, bioschemas-pdbe-adapter, does the same for a PDBe 3D structure entry. A third web component, bioschemas-protein-render, combines the two.

bioschemas-protein-render uses a hard-coded UniProt entry, P12272, by default and retrieves its markup via bioschemas-uniprot-adapter. Any other valid UniProt accession can be passed as a parameter: just append ?<accession> to the URL. The component then takes the first PDBe 3D structure referenced in this markup and uses bioschemas-pdbe-adapter to get the corresponding PDBe 3D structure markup. Finally, it renders the JSON-LD for both.
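The flow above can be sketched in Python as follows. This is an illustration, not the web component's code: the two adapters are passed in as plain functions so the sketch runs without network access, and the property name `pdbStructures`, under which the UniProt markup is assumed to list PDBe structure IDs, is hypothetical. The accession check uses the UniProtKB accession format published by UniProt, which the prototype itself does not validate.

```python
import json
import re
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse

# UniProtKB accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
DEFAULT_ACCESSION = "P12272"
# Hypothetical property holding the PDBe structure IDs in the UniProt markup.
STRUCTURES_KEY = "pdbStructures"

Markup = Dict[str, Any]


def accession_from_url(url: str) -> str:
    """Return the accession passed as ?<accession>, or the default one."""
    query = urlparse(url).query.strip()
    if not query:
        return DEFAULT_ACCESSION
    if not UNIPROT_ACCESSION.fullmatch(query):
        raise ValueError(f"not a UniProt accession: {query!r}")
    return query


def render(
    url: str,
    uniprot_adapter: Callable[[str], Markup],
    pdbe_adapter: Callable[[str], Markup],
) -> str:
    """Get UniProt markup, then markup for its first PDBe structure, as JSON-LD."""
    protein = uniprot_adapter(accession_from_url(url))
    documents: List[Markup] = [protein]
    structures = protein.get(STRUCTURES_KEY, [])
    if structures:
        # Only the first structure is used, as in the prototype.
        documents.append(pdbe_adapter(structures[0]))
    return json.dumps(documents, indent=2)
```

For example, `render("https://example.org/render?P05067", uniprot_adapter, pdbe_adapter)` would render the markup for P05067 and its first structure, while a URL without a query falls back to P12272. Keeping the adapters as injected functions mirrors how bioschemas-protein-render composes two independent components.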

A more elaborate infobox would not render the raw JSON-LD, but would use it to build a more human-readable and visually appealing summary. That would be the next step once a crawler is available, so that the markup is taken directly from the web pages.
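As a rough illustration of that next step, the sketch below reduces each JSON-LD document to a single line built from the `@type` keyword and the schema.org `name` and `identifier` properties. The function name `infobox` and the output layout are hypothetical; a real infobox would choose its fields according to the Bioschemas profiles in use.

```python
from typing import Any, Dict, List


def infobox(documents: List[Dict[str, Any]]) -> str:
    """Turn JSON-LD documents into one short, human-readable line each."""
    lines = []
    for doc in documents:
        kind = doc.get("@type", "Thing")
        if isinstance(kind, list):  # JSON-LD allows several types per node
            kind = "/".join(kind)
        line = f"{kind}: {doc.get('name', '(unnamed)')}"
        identifier = doc.get("identifier")
        if identifier:
            line += f" [{identifier}]"
        lines.append(line)
    return "\n".join(lines)
```

Fed with the protein and structure documents rendered by the prototype, this would print one line per entry instead of the full JSON-LD.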

About

This work was conducted by the Bioschemas Protein working group led by Maria Martin, with participation from UniProt (Maria Martin, Leyla Garcia), InterPro (Rob Finn, Aurelièn Luciani and Gustavo Salazar) and PDBe (Sameer Verlanka and Joseph Anyango). The Protein working group aimed to define, prototype, and test the use of schema.org markup to represent protein sequences as well as their functional annotations and structures. Its main goals included (i) defining a use case and the scope of the WP, (ii) defining a draft schema.org data model for protein annotations involving relevant protein resources, and (iii) testing and evaluating this model, including identifying the pros and cons of using schema.org and Bioschemas.