Commit

Merge pull request #9 from jen-dfci/main
Merge New Manual Edits to Main
jen-dfci authored Apr 2, 2024
2 parents 7b68b94 + 9bea96b commit 0bf6adb
Showing 38 changed files with 394 additions and 146 deletions.
Binary file modified .DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion access_controlled/introduction.md
Original file line number Diff line number Diff line change
@@ -4,4 +4,4 @@ order: 1000

# Access-Controlled Data

Access-controlled HTAN data requires dbGaP access approval for study [phs002371](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002371.v3.p1), and is currently only available via the National Cancer Institute's Cancer Data Services (CDS).
Access-controlled HTAN data requires dbGaP access approval for study [phs002371](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002371.v3.p1), and is currently only available via the [National Cancer Institute's Cancer Data Services (CDS)](https://dataservice.datacommons.cancer.gov/#/home).
38 changes: 38 additions & 0 deletions addtnl_info/RFC.md
@@ -0,0 +1,38 @@
---
order: 996
---

# The RFC Process and Data Model Changes

## RFC Overview
The HTAN Data Model is expected to evolve with advances in science. This evolution is a community-driven, peer-reviewed process, where members of a working group will first assess established community data standards and create a request for comment (RFC) document soliciting community feedback.

The status of current RFCs is provided in the [RFC Overview](https://docs.google.com/document/d/1dJ7NUoVCtewdtny8bITwtWnzItB4IibL5kJO3ZNh0go/edit?usp=sharing) document. The RFC Overview can be used to:

- Get a sense of what is available in DCA.
- Get a sense of new assays being considered.
- Look at old RFCs & get a sense of past discussions/considerations.

!!! Note:
The links to specific RFC documents within the RFC Overview do **not** represent the final data model. Once an RFC is closed and an assay is available on the Data Curator App (DCA), the metadata template on the DCA represents the final data model. Details regarding the data model are also available on HTAN's [Data Standards page](https://humantumoratlas.org/standards) and HTAN's [data-models repository](https://github.com/ncihtan/data-models) on GitHub.
!!!

## Data Model Changes
The following requests require changes to the Data Model and may result in the initiation of an RFC:

- New assay types which are expected to be used frequently by multiple centers.
- New metadata templates or additional required metadata fields which should be validated.

HTAN members should contact their [data liaison](../data_submission/Data_Liaisons.md) for help determining whether a Data Model change is needed and how to make a Data Model change request.

## RFC Process
Once a new assay type or a set of needed Data Model changes is identified, the following steps are taken:

1. **A working group is organized** by the Data Coordinating Center (DCC). As a part of this process, the following people are also designated:
* A **DCC Owner**, who is responsible for finalizing the RFC and overall accepting/rejecting/integrating community feedback. The DCC Owner is also the primary point of contact for the specified RFC.
* A **single DCC PI**, to monitor progress towards completion.
* One or more **Co-Authors** from one or more HTAN centers, to help draft the RFC. Representatives from each HTAN center help identify individuals at their center who can contribute to a particular RFC.
2. **A first draft of an RFC Google Document is created** based upon feedback from the working group.
3. **The RFC is open for public comment**. All HTAN members can provide suggestions by adding comments directly to the document.
4. After a designated period of time, the **RFC is closed**. Feedback from the HTAN community is no longer accepted. The content of the RFC will be reflected in the respective version of the HTAN Data Model used for validating metadata files uploaded to the DCC.
5. **The metadata template is available on the [Data Curator App (DCA)](https://dca.app.sagebionetworks.org/).**
12 changes: 12 additions & 0 deletions addtnl_info/WG_internal.md
@@ -0,0 +1,12 @@
---
order: 997
---

# Working Groups and Internal Communications

Information regarding Network Working Groups and Internal Communications can be found on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990). Access to the HTAN Wiki is restricted to HTAN Members.

!!! Note

The HTAN Synapse Wiki page is restricted to HTAN members. Please contact [email protected] if you are a member of HTAN and need access to the wiki.
!!!
12 changes: 12 additions & 0 deletions addtnl_info/data_release.md
@@ -0,0 +1,12 @@
---
order: 998
---

# Data Release

The Data Coordinating Center (DCC) prepares major data releases every 4-6 months. HTAN Centers are notified of the data submission deadline for an upcoming data release. After that deadline, the pre-release process involves a number of data processing and metadata verification steps. Data is released via the HTAN Data Portal, and then disseminated to various Cancer Research Data Commons (CRDC) nodes, including the Cancer Data Service (CDS) and the Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC), to enable download of controlled-access data and long-term cloud access.

![The HTAN Data Release Process](../img/Data_release.svg)

Please see [HTAN Data Release Process](https://docs.google.com/document/d/15xvIbfyQmgbMD_uB2e0SwPFw67_AePB5YspF4dsilCA/edit#heading=h.tddsmkcn4p1p) for more information regarding the data release process.

2 changes: 2 additions & 0 deletions addtnl_info/index.yml
@@ -0,0 +1,2 @@
label: Additional Information
order: 995
12 changes: 12 additions & 0 deletions addtnl_info/publications.md
@@ -0,0 +1,12 @@
---
order: 999
---

# Submitting Publications

To facilitate data sharing and adherence to FAIR (Findability, Accessibility, Interoperability, and Reusability) principles, the HTAN portal provides links to specimen files used in publications. Currently, the HTAN Data Coordinating Center (DCC) facilitates this linking once HTAN Centers provide the appropriate information. To submit publication information, HTAN Centers should contact Alex Lash at [email protected].

!!! *In order to support data sharing and public data access, the DCC encourages authors using HTAN data to either:*
* *use HTAN identifiers in their publication; or*
* *provide a lookup table in the publication to map publication identifiers to HTAN identifiers.*
!!!
21 changes: 21 additions & 0 deletions addtnl_info/tnps.md
@@ -0,0 +1,21 @@
---
order: 995
---

# Trans-Network Projects (TNPs)
Trans-Network Projects are multi-center projects created to facilitate collaborative research. Examples include cross-testing experimental and analytical protocols, exchange of personnel to disseminate SOPs, or pursuit of additional HTAN-critical methods or technologies. Specific information about each TNP is available to HTAN members on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990).

!!! Note

The HTAN Synapse Wiki page is restricted to HTAN members. Please contact [email protected] if you are a member of HTAN and need access to the wiki.
!!!


Current Trans-Network Projects

| Code | Name | Description |
|------|------|-------------|
| HTA13 | TNP SARDANA | The **S**h**a**red **R**epositories, **D**ata, **An**alysis and **A**ccess TNP focuses on optimizing the repeatability, interpretability and accessibility of HTAN characterization methods and the data they generate. |
| HTA14 | TNP TMA | The **T**issue **M**icro**A**rray TNP extends the TNP SARDANA characterization and analytics methodologies for evaluation and validation to a large array of breast tumor TMA samples that provide a broad spectrum of disease states and subtypes. |
| HTA15 | TNP SRRS | The **S**tandardized **R**epository of **R**eference **S**pecimens TNP's mission is to assemble an extensive catalogue of cases from premalignant lesions, pre- and post-treatment tumor tissue and metastatic tumor tissue for protocol optimization and validation. |
| HTA16 | TNP CASI | The goal of the **C**ell **A**nnotations and **S**ignatures **I**nitiative TNP is to provide robust and accurate tools for cell type annotation from single-cell data. |
18 changes: 18 additions & 0 deletions addtnl_info/tool_protocol.md
@@ -0,0 +1,18 @@
---
order: 1000
---

# Tool and Protocol Curation

Computational tools developed or used to support HTAN research projects can be added to the HTAN tool catalog by filling out the tool curation form available on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990).


Information regarding how protocols are developed/shared is also available on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990).


!!! Note

The HTAN Synapse Wiki page is restricted to HTAN members. Please contact [email protected] if you are a member of HTAN and need access to the wiki.
!!!


File renamed without changes.
17 changes: 0 additions & 17 deletions data_model/biospecimens.md

This file was deleted.

29 changes: 0 additions & 29 deletions data_model/clinical.md

This file was deleted.

11 changes: 11 additions & 0 deletions data_model/data_standards.md
@@ -0,0 +1,11 @@
---
order: 997
---

# Data Standards

This page is a placeholder for a data standards page, or set of data standards pages, similar to the [MC2 Center Data Model](https://mc2-center.github.io/data-models/). The HTAN version of the MC2 tables will include additional columns such as "required_if_component" and "required_if_value". Until the new pages are constructed, please see the information on the [Data Standards](https://humantumoratlas.org/standards) page of the HTAN Data Portal.

!!! Note
Once these pages are added, the [Data Standards](https://humantumoratlas.org/standards) page will be removed from the data portal. All links to "Data Standards" throughout this manual will need to be updated.
!!!
41 changes: 22 additions & 19 deletions data_model/identifiers.md
@@ -15,45 +15,48 @@ Research participants are identified with the following pattern:
<participant_id> ::= <htan_center_id>_integer
```

Where the `htan_center_id` is derived from the identifier prefix table below.

| HTAN Center ID | Pilot Project or Contact PI Institution |
| -------------- | --------------------------------------- |
| HTA1 | HTAPP Pilot Project |
| HTA2 | PCAPP Pilot Project |
| HTA3 | Boston University |
| HTA4 | Children's Hospital of Philadelphia |
| HTA5 | Dana-Farber Cancer Institute |
| HTA6 | Duke University |
| HTA7 | Harvard Medical School |
| HTA8 | Memorial Sloan Kettering Cancer Center |
| HTA9 | Oregon Health Sciences University |
| HTA10 | Stanford University |
| HTA11 | Vanderbilt University |
| HTA12 | Washington University |
| HTA13 | TNP SARDANA |
| HTA14 | TNP TMA |
Where the `htan_center_id` is the HTAN Center prefix (e.g. HTA1, HTA2). Please see [HTAN Centers](../overview/centers.md) for a full list of HTAN Center prefixes.


Derivative data includes anything derived from a research participant, including biospecimens such as samples, tissue blocks, slides, aliquots, analytes, and data files that result from assaying those biospecimens. These identifiers follow the pattern:

```
<derivative_entity_id> ::= <participant_id>_integer
```

For example, if research participant 1 within the CHOP project has provided three samples, you would have three HTAN IDs, such as:
For example, if research participant 1 within the CHOP project (HTA4) has provided three samples, you would have three HTAN IDs, such as:

```
HTA4_1_1
HTA4_1_3
HTA4_1_8
```
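
The two grammar rules above can be checked mechanically. The following is a minimal sketch, not an official HTAN validator: the function name `parse_htan_id` and the returned dictionary layout are hypothetical, and it assumes every center prefix has the form `HTA` followed by digits. It does not distinguish any special identifier forms.

```python
import re

# Grammar from the text:
#   <participant_id>       ::= <htan_center_id>_integer
#   <derivative_entity_id> ::= <participant_id>_integer
# Assumption: center prefixes look like "HTA" plus digits (HTA1, HTA4, ...).
PARTICIPANT_RE = re.compile(r"(HTA[0-9]+)_([0-9]+)")
DERIVATIVE_RE = re.compile(r"(HTA[0-9]+)_([0-9]+)_([0-9]+)")


def parse_htan_id(htan_id: str) -> dict:
    """Split a standard HTAN ID into its parts, or raise ValueError."""
    m = DERIVATIVE_RE.fullmatch(htan_id)
    if m:
        center, participant, entity = m.groups()
        return {
            "kind": "derivative",
            "center": center,
            "participant_id": f"{center}_{participant}",
            "entity": int(entity),
        }
    m = PARTICIPANT_RE.fullmatch(htan_id)
    if m:
        return {
            "kind": "participant",
            "center": m.group(1),
            "participant_id": htan_id,
            "entity": None,
        }
    raise ValueError(f"not a recognized HTAN ID: {htan_id!r}")


if __name__ == "__main__":
    for example in ("HTA4_1", "HTA4_1_1", "HTA4_1_3", "HTA4_1_8"):
        print(example, parse_htan_id(example))
```

Using `fullmatch` rather than `match` rejects trailing characters, so a string such as `HTA4_1_3_extra` is not silently accepted.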
## Special Identifiers

If a single data file is generated from one of those samples, that file could have an HTAN ID such as:

```
HTA4_1_42
```

If a single data file is derived from more than one participant, the file identifier may contain a wildcard string, e.g. ‘0000’, after the HTAN center identifier. For example:

```
HTA4_0000_1
HTA4_0000_2
HTA4_0000_3
```

If a data file is derived from an external control participant, the biospecimen and file identifiers will contain the string ‘EXT’ before the external control participant integer. For example:

```
HTA4_EXT1_1
HTA4_EXT2_2
HTA4_EXT3_3
```
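
The three identifier forms described in this section can be told apart by their middle segment. The sketch below is illustrative only: the function name `classify_htan_id` and the category labels are hypothetical, and it assumes the wildcard is exactly `0000` (the text gives it only as an example) and that external controls are written as `EXT` followed by an integer.

```python
import re

# Assumptions (not confirmed by the SOP): the wildcard is exactly "0000",
# and external control participants are "EXT" followed by an integer.
HTAN_ID_RE = re.compile(
    r"(?P<center>HTA[0-9]+)_"
    r"(?P<participant>0000|EXT[0-9]+|[0-9]+)_"
    r"(?P<entity>[0-9]+)"
)


def classify_htan_id(htan_id: str) -> str:
    """Return a label describing which participant form the ID uses."""
    m = HTAN_ID_RE.fullmatch(htan_id)
    if m is None:
        raise ValueError(f"not a recognized derivative HTAN ID: {htan_id!r}")
    participant = m.group("participant")
    if participant == "0000":
        return "multi_participant"   # file derived from more than one participant
    if participant.startswith("EXT"):
        return "external_control"    # external control participant
    return "single_participant"      # ordinary <participant_id>_integer form


if __name__ == "__main__":
    for example in ("HTA4_1_42", "HTA4_0000_1", "HTA4_EXT1_1"):
        print(example, "->", classify_htan_id(example))
```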

More detailed information about HTAN Identifiers may be found in the [HTAN Identifiers SOP](https://docs.google.com/document/d/1podtPP8L1UNvVxx9_c_szlDcU1f8n7bige6XA_GoRVM/edit#heading=h.768a6pngjha3).

## ID to ID linkages

Note that the explicit linking of participants to biospecimens to assays is not encoded in the HTAN Identifier. Rather, the linking is encoded in explicit metadata elements (see [Relationship Model](relationships.md)).
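
Because the identifier itself does not encode lineage, tracing a file back to its participant means following metadata links rather than parsing the ID. The sketch below illustrates the idea with made-up records; the field names `id` and `parent_id` are placeholders and are not the actual HTAN metadata attribute names, which are defined in the Relationship Model.

```python
# Illustrative records only; "id" and "parent_id" are hypothetical field names.
records = [
    {"id": "HTA4_1", "parent_id": None},          # participant
    {"id": "HTA4_1_1", "parent_id": "HTA4_1"},    # biospecimen
    {"id": "HTA4_1_42", "parent_id": "HTA4_1_1"}, # data file
]


def lineage(entity_id: str, records: list) -> list:
    """Follow parent links from an entity up to its root."""
    parent_of = {r["id"]: r["parent_id"] for r in records}
    if entity_id not in parent_of:
        raise KeyError(f"unknown entity: {entity_id!r}")
    chain = [entity_id]
    seen = {entity_id}
    while parent_of.get(chain[-1]) is not None:
        parent = parent_of[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


if __name__ == "__main__":
    print(lineage("HTA4_1_42", records))
```

Note that nothing in `HTA4_1_42` says which biospecimen it came from; only the metadata link does.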