Merge pull request #49 from bhklab/development

docs: Started AnnotationStandards documentation
bhklab · Apr 2, 2024 · 6f1e402
2 parents e412900 + 2b99e32
commit 6f1e402
Showing 8 changed files with 1,016 additions and 3 deletions.
diff --git a/.github/ISSUE_TEMPLATE/issue_template.md b/.github/ISSUE_TEMPLATE/issue_template.md
@@ -41,7 +41,6 @@ options(width = 120)
 
 </details>
 
-- [ ] `BiocManager::valid()` is `TRUE`
 
 **Note**. To avoid potential issues with version mixing and reproducibility, do
 not install packages from `GitHub`.
@@ -56,3 +55,33 @@ Provide some additional context for the bug report. You may include web links
 * code inside a commit
 * code from an R package
 
+---
+name: Feature Request
+about: Suggest an idea for this project
+title: "[FEATURE] Brief description of the feature"
+labels: 'enhancement'
+assignees: ''
+
+---
+
+## Feature Description
+
+Please provide a clear and concise description of the feature you're proposing.
+
+## Problem it Solves
+
+Explain the problem that this feature addresses. Why is this feature necessary? How will it improve the project?
+
+## Alternatives Considered
+
+Describe any alternative solutions or features you've considered. Why were they not chosen?
+
+## Additional Context
+
+Provide any additional context about the feature request here. You may include:
+
+* Links to similar features in other projects
+* Screenshots/mockups, if applicable
+* Any other context or screenshots about the feature request
+
+**Note**: Please ensure your feature request is not already reported in the issues list before creating a new one.
diff --git a/.gitignore b/.gitignore
@@ -19,3 +19,4 @@ Treatment-Annotation*.Rmd
 ./*.csv
 CCLE_treatmentMetadata.csv
 AnnotationGx.code-workspace
+AnnotationGx.code-workspace
diff --git a/AnnotationGx.code-workspace b/AnnotationGx.code-workspace
@@ -8,6 +8,7 @@
         }
     ],
     "settings": {
-        "liveServer.settings.port": 5501
+        "liveServer.settings.port": 5501,
+        "liveServer.settings.multiRootWorkspaceName": "AnnotationGx"
     }
 }
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: AnnotationGx
 Title: AnnotationGx: A package for building, updating and querying an
     annotation database for pharmaco-genomic data
-Version: 0.0.0.9095
+Version: 0.0.0.9096
 Authors@R: c(
     person("Jermiah", "Joseph", role = c("aut", "cre"),
         email = "[email protected]"),

diff --git a/R/matchNested-methods.R → R/utils-matchNested.R b/R/matchNested-methods.R → R/utils-matchNested.R
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -118,5 +118,6 @@ articles:
 - title: Pipelines
   navbar: Pipeline Tutorials
   contents:
+  - articles/AnnotationStandards
   - articles/CTRP-Treatment-Annotation
 
diff --git a/inst/extdata/treatmentMetadata_annotated_pubchem_unichem_chembl.tsv b/inst/extdata/treatmentMetadata_annotated_pubchem_unichem_chembl.tsv
diff --git a/vignettes/articles/AnnotationStandards.Rmd b/vignettes/articles/AnnotationStandards.Rmd
@@ -0,0 +1,138 @@
+---
+title: "A Standard for Annotations"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{A Standard for Annotations}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{css, echo=FALSE}
+.main-point strong {
+  color: #4CAF50; /* Green color for the title */
+  font-size: 120%;
+  font-weight: bold;
+}
+.main-point {
+  background-color: #f0f0f0;
+  border-left: 6px solid #4CAF50; /* Green border */
+  padding: 10px;
+  margin-bottom: 20px;
+}
+
+/* Red color for emphasis */
+.emphasis {
+  font-weight: bold; 
+  color: #f44336;
+}
+```
+
+```{r setup}
+library(AnnotationGx)
+```
+
+
+# Introduction
+The goal of AnnotationGx is to provide tools that help annotate cheminformatic and bioinformatic data.
+While the package is still in its early stages, it already provides a number of functions that may be useful for the annotation of data.
+In the interest of standardizing the annotation process, we propose a standard for annotations that may be used in the future.
+
+
+# Starting Point
+
+The starting point of any annotation process might be a table with a number of columns or a list of identifiers that need to be annotated.
+
+For example, we might have a data frame with a column of cell line names that we would like to annotate with information about the cell lines,
+or a list of drugs that we would like to annotate with information about the drugs.
+
+
+```{r}
+# "sample" refers to the cell line names
+data(CCLE_sampleMetadata)
+head(CCLE_sampleMetadata)
+
+
+# "treatment" refers to the drug names
+data(CCLE_treatmentMetadata)
+head(CCLE_treatmentMetadata)
+```
+
+## Generic names for classes of data
+
+The first standard is to use generic names for different classes so that the annotation process can be generalized.
+When referring to cell lines, patients, or other biological entities that are being studied, we should use the name "sample".
+When referring to drugs, chemicals, or other treatments that are being applied to the samples, we should use the name "treatment".
+
+<div class="main-point">
+  <strong>Standard 1</strong>
+  <p>
+    Use the name "sample" for the biological entities being studied, such as cell lines or patients.
+    <br>
+    Use the name "treatment" for drugs, chemicals, radiological treatments, etc. that are applied to the samples.
+  </p>
+</div>
+
+In the CCLE example data provided, the data frames are named `CCLE_sampleMetadata` and `CCLE_treatmentMetadata`, which already follows this standard.
+
+However, within the data frames, the column names are not standardized. `CCLE_treatmentMetadata` correctly identifies the column with the treatment
+names as "treatmentid", but `CCLE_sampleMetadata` uses varying names.
+
+Before we rename the columns, we introduce the second standard for column names.
+
+## Standardized column names
+
+Throughout the annotation process, many sources might be used for generating metadata. 
+
+For example, in annotating treatments, one might use the DrugBank database, the PubChem database, and the ChEMBL database.
+
+For transparency and reproducibility, we should use standardized column names for the metadata that we collect from these sources.
+
+
+<div class="main-point">
+  <strong>Standard 2</strong>
+  <p>
+    Use the format <span class="emphasis">{SOURCE}.{COLUMN_NAME}</span> for column names in the metadata.
+  </p>
+</div>
+
+
+For example, if we are using the PubChem and DrugBank databases, we might have columns like "pubchem.CID", "pubchem.SMILES", "drugbank.ID", "drugbank.SMILES", etc.
+
+
+This also applies to the data we start with. Take for example the GDSC example data provided:
+
+```{R}
+head(GDSC_sampleMetadata)
+```
+
+The column names above follow both Standard 1 and Standard 2. They tell us that the data in the two columns come from the GDSC database.
+
+This is especially useful when combining data from multiple sources that might have columns with the same name.
+Additionally, it gives users a level of confidence when comparing data from different sources.
+
+## Example Annotated Data
+
+In the example below, we have a data frame annotating the treatments for the four datasets CCLE, GDSC, CTRP, and gCSI.
+
+
+
+```{R}
+treatmentMetadata <- data.table::fread(system.file("extdata", "treatmentMetadata_annotated_pubchem_unichem_chembl.tsv", package = "AnnotationGx"))
+
+# two drugs: Erlotinib and Tanespimycin
+str(treatmentMetadata[pubchem.CID %in% c("6505803", "176870"),])
+
+```
+
+We can see above how the dataset sources appear as prefixes in the column names ('CCLE.treatmentid', 'GDSC.treatmentid', 'CTRP.treatmentid', 'gCSI.treatmentid').
+
+If a user wanted to get the InChIKey, they would use the "pubchem.InChiKey" column and understand that these InChIKeys come from the PubChem database.
+
+Similarly, they have access to mechanism-of-action data in the "chembl.mechanism_of_action" column and understand that these mechanisms come from the ChEMBL database.
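
The `{SOURCE}.{COLUMN_NAME}` convention from Standard 2 in the vignette can be illustrated with a minimal sketch in base R. This snippet is not part of the commit: `prefix_source()` is a hypothetical helper, not an AnnotationGx function, and the SMILES and DrugBank ID values are placeholders rather than real annotations. It only shows how prefixing keeps two same-named columns apart after a merge.

```r
# Hypothetical helper (not part of AnnotationGx): prefix every column of a
# metadata table with the name of the source it came from.
prefix_source <- function(df, source, keep = character()) {
  cols <- names(df)
  # Columns listed in `keep` (e.g. a shared join key) are left unprefixed
  rename <- !(cols %in% keep)
  cols[rename] <- paste(source, cols[rename], sep = ".")
  names(df) <- cols
  df
}

# Toy tables standing in for results from two annotation sources.
# SMILES and DrugBank ID values are placeholders, not real data.
pubchem <- data.frame(
  treatmentid = c("Erlotinib", "Tanespimycin"),
  CID = c(176870L, 6505803L),
  SMILES = c("<pubchem SMILES 1>", "<pubchem SMILES 2>"),
  stringsAsFactors = FALSE
)
drugbank <- data.frame(
  treatmentid = c("Erlotinib", "Tanespimycin"),
  ID = c("<drugbank id 1>", "<drugbank id 2>"),
  SMILES = c("<drugbank SMILES 1>", "<drugbank SMILES 2>"),
  stringsAsFactors = FALSE
)

# Prefix each table, then join on the shared key; the two SMILES columns
# no longer collide because each carries its source name.
annotated <- merge(
  prefix_source(pubchem, "pubchem", keep = "treatmentid"),
  prefix_source(drugbank, "drugbank", keep = "treatmentid"),
  by = "treatmentid"
)

names(annotated)
```

Without the prefixes, `merge()` would disambiguate the two `SMILES` columns with generic `.x` and `.y` suffixes, which say nothing about where each value came from.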