\subsection{Sequence}
\label{sec:Sequence}
The purpose of the \sbol{Sequence} class is to represent the primary structure of a \sbol{Component} object and the manner in which it is encoded. This representation is accomplished by means of the \sbol{elements} property and \sbol{encoding} property (\ref{uml:sequence}).
\begin{figure}[ht]
\begin{center}
\includegraphics[scale=0.6]{uml/sequence}
\caption[]{Diagram of the \sbol{Sequence} class and its associated properties.}
\label{uml:sequence}
\end{center}
\end{figure}
\subparagraph{The \sbolheading{elements} property}
\label{sec:elements}
The \sbol{elements} property is an OPTIONAL \sbol{String} of characters that represents the constituents of a biological or chemical molecule.
For example, these characters could represent the nucleotide bases of a molecule of DNA, the amino acid residues of a protein, or the atoms and chemical bonds of a small molecule.
If the \sbol{elements} property is not set, the particulars of this \sbol{Sequence} have not yet been determined.
\subparagraph{The \sbolheading{encoding} property}
\label{sec:encoding}
The \sbol{encoding} property has a data type of \sbol{IRI}, and is OPTIONAL unless \sbol{elements} is set, in which case it is REQUIRED.
This property MUST indicate how the \sbol{elements} property of a \sbol{Sequence} is formed and interpreted.
The \sbol{encoding} property SHOULD contain an \sbol{IRI} identifying a term from the textual format (\url{https://identifiers.org/edam:format_2330}) branch of the EDAM ontology.
For example, the \sbol{elements} property of a \sbol{Sequence} with an IUPAC DNA encoding property MUST contain characters that represent nucleotide bases, such as {\tt a}, {\tt t}, {\tt c}, and {\tt g}. The \sbol{elements} property of a \sbol{Sequence} with a Simplified Molecular-Input Line-Entry System (SMILES) encoding, on the other hand, MUST contain characters that represent atoms and chemical bonds, such as {\tt C}, {\tt N}, {\tt O}, and {\tt =}.
\ref{tbl:sequence_encodings} contains a partial list of possible \sbol{IRI} values for the \sbol{encoding} property.
These terms are organized by the type of \sbol{Component} (see \ref{tbl:component_types}) that typically refers to a \sbol{Sequence} with such an \sbol{encoding}.
It is RECOMMENDED that the \sbol{encoding} property of a \sbol{Sequence} contain an \sbol{IRI} from \ref{tbl:sequence_encodings}.
When the \sbol{encoding} of a \sbol{Sequence} is well described by one of the \sbol{IRI}s in \ref{tbl:sequence_encodings}, it MUST contain that \sbol{IRI}.
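
The constraints above can be sketched in code.
The following Python fragment is a minimal, non-normative illustration: it checks that an \sbol{encoding} is present whenever \sbol{elements} is set, and that the characters of \sbol{elements} fall within an alphabet associated with that \sbol{encoding}.
The function name, the alphabet tables, and the case-insensitive comparison are assumptions of this sketch and are not defined by this specification.
Only the two IUPAC encodings are covered; any other \sbol{encoding} is left unchecked.

```python
from typing import List, Optional

# Encoding IRIs taken from the table of sequence encodings.
IUPAC_DNA_RNA = "https://identifiers.org/edam:format_1207"
IUPAC_PROTEIN = "https://identifiers.org/edam:format_1208"

# Assumed alphabets (lowercase) for this sketch; the authoritative
# character sets are those defined by the IUPAC recommendations.
ALPHABETS = {
    IUPAC_DNA_RNA: set("acgturyswkmbdhvn"),
    IUPAC_PROTEIN: set("acdefghiklmnpqrstvwybzx"),
}


def check_sequence(elements: Optional[str], encoding: Optional[str]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    if elements is None:
        # elements is OPTIONAL, and encoding is only required alongside it.
        return problems
    if encoding is None:
        problems.append("encoding is REQUIRED when elements is set")
        return problems
    alphabet = ALPHABETS.get(encoding)
    if alphabet is None:
        # Encoding not covered by this sketch (e.g. SMILES or InChI).
        return problems
    # Case-insensitive comparison is an assumption of this sketch.
    unexpected = sorted(set(elements.lower()) - alphabet)
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems
```

For instance, under these assumptions {\tt check\_sequence("atcg", IUPAC\_DNA\_RNA)} reports no problems, whereas passing {\tt "atcg"} with no \sbol{encoding} reports the missing \sbol{encoding}.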
%A Summary of letters for nucleic acids and aminoacids
\begin{table}[ht]
\begin{edtable}{tabular}{lll}
\toprule
\textbf{Encoding} & \textbf{IRI} & \textbf{Component Type} \\
\midrule
IUPAC DNA, RNA & \url{https://identifiers.org/edam:format_1207} & DNA, RNA \\
IUPAC Protein & \url{https://identifiers.org/edam:format_1208} & Protein\\
InChI & \url{https://identifiers.org/edam:format_1197} & Simple Chemical \\
SMILES & \url{https://identifiers.org/edam:format_1196} & Simple Chemical \\
\bottomrule
\end{edtable}
\caption{\sbol{IRI}s for specifying the \sbol{encoding} property of a \sbol{Sequence}, organized by the type of \sbol{Component} (see \ref{tbl:component_types}) that typically refers to a \sbol{Sequence} with such an \sbol{encoding}.}
\label{tbl:sequence_encodings}
\end{table}
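
As a convenience for tool authors, the contents of \ref{tbl:sequence_encodings} can be held in a simple lookup structure.
The Python sketch below is one possible arrangement, not a normative one.
The plain-text component type labels used as keys and the function name are hypothetical; an actual tool would more likely key on the \sbol{Component} type \sbol{IRI}s listed in \ref{tbl:component_types}.

```python
from typing import Dict, List

# Encoding IRIs copied from the table of sequence encodings.
IUPAC_DNA_RNA = "https://identifiers.org/edam:format_1207"
IUPAC_PROTEIN = "https://identifiers.org/edam:format_1208"
INCHI = "https://identifiers.org/edam:format_1197"
SMILES = "https://identifiers.org/edam:format_1196"

# Hypothetical keys: the component type labels as printed in the table.
ENCODINGS_BY_COMPONENT_TYPE: Dict[str, List[str]] = {
    "DNA": [IUPAC_DNA_RNA],
    "RNA": [IUPAC_DNA_RNA],
    "Protein": [IUPAC_PROTEIN],
    "Simple Chemical": [INCHI, SMILES],
}


def encodings_for(component_type: str) -> List[str]:
    """Return the encodings listed for a component type (empty if unlisted)."""
    return list(ENCODINGS_BY_COMPONENT_TYPE.get(component_type, []))
```

Because the table is only a partial list, an empty result here means that the table lists no encoding for that type, not that no suitable \sbol{encoding} exists.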