-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add new function getUnichemSources() with optional parameter all_colu…
…mns. Modify queryUnichem() to queryUnichemCompound(). Update function and variable names for consistency. Update documentation and examples.
- Loading branch information
Showing
6 changed files
with
154 additions
and
22 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
--- | ||
title: "Querying Unichem Database" | ||
author: | ||
- name: Jermiah Joseph, Shahzada Muhammad Shameel Farooq, and Christopher Eeles | ||
output: | ||
BiocStyle::html_document: | ||
self_contained: yes | ||
toc: true | ||
toc_float: true | ||
toc_depth: 2 | ||
code_folding: show | ||
date: "`r doc_date()`" | ||
package: "`r pkg_ver('AnnotationGx')`" | ||
vignette: > | ||
%\VignetteIndexEntry{Querying Unichem Database} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
<style type="text/css"> | ||
.main-container { | ||
max-width: 1800px; | ||
margin-left: auto; | ||
margin-right: auto; | ||
} | ||
</style> | ||
|
||
```{r setup, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html | ||
) | ||
``` | ||
|
||
|
||
# Introduction to the Unichem API | ||
|
||
The UniChem database provides a publicly available REST API for | ||
programmatic retrieval of mappings from standardized structural compound | ||
identifiers to unique compound IDs across a range of large online | ||
cheminformatic databases such as PubChem, ChEMBL, DrugBank and many more. | ||
The service accepts POST requests to two different end-points: | ||
`/compound` and `/connectivity`. Both endpoints accept query parameters via the | ||
POST body in JSON format. The `/compound` API returns exact matches for the | ||
queried compound, while the `/connectivity` API uses layers of the International | ||
Chemical Identifier (InChI) of the query compound to return exact matches as | ||
well as structurally related compounds such as isomers, salts, ionizations | ||
and more. | ||
[@UniChemBeta; @chambersUniChemUnifiedChemical2013] | ||
|
||
The functions in `AnnotationGx` have been designed to allow package users to | ||
easily query UniChem resources without any pre-existing knowledge of | ||
HTTP requests or the API specifications. In doing so we hope to provide an | ||
R native interface for mapping between various cheminformatic databases, | ||
accessible to anyone familar with using R functions! | ||
|
||
```{r load_pkg_example} | ||
library(AnnotationGx) | ||
``` | ||
|
||
|
||
# Available Databases | ||
|
||
To see a table of database identifiers available via UniChem, you can call | ||
the `getUniChemSources` function. | ||
By default, just the database shortname ("Name") and UniChem's ID for it ("SourceID") columns | ||
are returned. | ||
To return all columns, pass the `all_columns = TRUE` argument | ||
|
||
```{r get_sources_short_echo} | ||
getUnichemSources() | ||
``` | ||
|
||
When mapping using the `queryUnichemCompound` function, these are the sources that can be used from, | ||
and the databases to which the compound mappings will be returned. | ||
|
||
# Querying UniChem Compound API | ||
|
||
The `queryUnichemCompound` function allows you to query the UniChem Compound API | ||
to retrieve mappings for a given compound identifier. The function takes two mandatory arguments. | ||
The first is the `compound` argument which is the compound identifier to be queried. | ||
The second is the `type` argument which is the type of compound identifier to search for. | ||
Options are "uci", "inchi", "inchikey", and "sourceID". | ||
The `sourceID` argument is optional and is only required if the `type` argument is "sourceID". | ||
|
||
The function returns a list of: | ||
|
||
1. "External_Mappings" `data.table` containing the mapping to other Databases with the following headings: | ||
1. "compoundID" `character` The compound identifier | ||
2. "Name" `character` The name of the database | ||
3. "NameLong" `character` The long name of the database | ||
4. "SourceID" `character` The UniChem Source ID | ||
5. "sourceURL" `character` The URL of the source | ||
2. "UniChem_Mappings" `list` of the following six mappings: | ||
1. "UCI" `character` The UniChem Identifier | ||
2. "InchiKey" `character` The InChIKey | ||
3. "Inchi" `character` The InChI | ||
4. "formula" `character` The molecular formula | ||
5. "connections" `character` connection representation "1-6(10)13-8-5-3-2-4-7(8)9(11)12" | ||
6. "hAtoms" `character` hydrogen atom connections "2-5H,1H3,(H,11,12)" | ||
|
||
|
||
#### Example Searching using `uci` (UniChem Identifier) | ||
Note: This type of query requires you to know the UniChem Identifier for the compound. | ||
|
||
```{r uci query} | ||
queryUnichemCompound(compound = "161671", type = "uci") | ||
``` | ||
|
||
|