Deployed 3e16870 with MkDocs version: 1.5.3

COMBINE-lab · Feb 11, 2024 · 172958b
1 parent a7f9db6

Showing 3 changed files with 4 additions and 5 deletions.
diff --git a/index.html b/index.html
@@ -420,11 +420,10 @@ <h3 id="basic-usage">Basic usage</h3>
 </code></pre>
 <p>The input should be a <code>bam</code> format file, with reads aligned using <a href="https://github.com/lh3/minimap2"><code>minimap2</code></a> against the <em>transcriptome</em>. That is, <code>oarfish</code> does not currently handle spliced alignment to the genome.  Further, the output alignments should be name sorted (the default order produced by <code>minimap2</code> should be fine). <em>Specifically</em>, <code>oarfish</code> relies on the existence of the <code>AS</code> tag in the <code>bam</code> records that encodes the alignment score in order to obtain the score for each alignment (which is used in probabilistic read assignment), and the score of the best alignment, overall, for each read.</p>
 <h3 id="output">Output</h3>
-<p>The <code>--output</code> option passed to <code>oarfish</code> corresponds to a directory (that will be created if it doesn't exist), under which the relevant output files will be placed.
-The output of <code>oarfish</code> constist of 2 files:</p>
+<p>The <code>--output</code> option passed to <code>oarfish</code> corresponds to a path prefix (this prefix can contain the path separator character, and if it refers to a directory that does not yet exist, that directory will be created). Based on this path prefix, say <code>P</code>, <code>oarfish</code> will create 2 files:</p>
 <ul>
-<li><code>info.json</code> - a JSON format file containing information about relevant parameters with which <code>oarfish</code> was run, and other relevant inforamtion from the processed sample apart from the actual transcript quantifications.</li>
-<li><code>quant.tsv</code> - a tab separated file listing the quantified targets, as well as information about their length and other metadata. The <code>num_reads</code> column provides the estimate of the number of reads originating from each target.</li>
+<li><code>P.meta_info.json</code> - a JSON format file containing information about relevant parameters with which <code>oarfish</code> was run, and other relevant information from the processed sample apart from the actual transcript quantifications.</li>
+<li><code>P.quant</code> - a tab-separated file listing the quantified targets, as well as information about their length and other metadata. The <code>num_reads</code> column provides the estimate of the number of reads originating from each target.</li>
 </ul>
 <h3 id="references">References</h3>
 <div class="footnote">
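The updated Output section treats <code>--output</code> as a path prefix <code>P</code> rather than a directory. The Python sketch below illustrates how the two output paths follow from such a prefix and how the <code>num_reads</code> column of <code>P.quant</code> could be summed downstream. It is not code from <code>oarfish</code>: the helper names <code>output_paths</code> and <code>total_reads</code> are hypothetical. It also assumes that the suffixes are appended verbatim to the prefix and that <code>P.quant</code> has a header row naming a <code>num_reads</code> column.

```python
import csv
import io
from pathlib import Path


def output_paths(prefix: str) -> tuple[Path, Path, Path]:
    """Return (parent_dir, meta_info_path, quant_path) for an --output prefix P.

    Hypothetical helper. The suffixes are appended to the prefix as plain
    strings, so a prefix like "out/sample.v1" keeps its dot (Path.with_suffix
    would have replaced ".v1" instead).
    """
    parent = Path(prefix).parent  # the directory that would be created if missing
    meta_info = Path(prefix + ".meta_info.json")
    quant = Path(prefix + ".quant")
    return parent, meta_info, quant


def total_reads(quant_tsv: str) -> float:
    """Sum the num_reads column of the contents of a tab-separated P.quant file.

    Assumes a header row that names a num_reads column; other columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(quant_tsv), delimiter="\t")
    return sum(float(row["num_reads"]) for row in reader)
```

For example, the prefix <code>results/sample1</code> maps to <code>results/sample1.meta_info.json</code> and <code>results/sample1.quant</code>, with <code>results</code> as the directory to create. To inspect a real run, pass <code>quant.read_text()</code> to <code>total_reads</code>.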

diff --git a/search/search_index.json b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"oarfish: transcript quantification from long-read RNA-seq data","text":""},{"location":"#basic-usage","title":"Basic usage","text":"<p><code>oarfish</code> is a program, written in <code>rust</code>, for quantifying transcript-level expression from long-read (i.e. Oxford nanopore cDNA and direct RNA and PacBio) sequencing technologies. <code>oarfish</code> requires a sample of sequencing reads aligned to the transcriptome (currntly not to the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm.  </p> <p>It optionally employs many filters to help discard alignments that may reduce quantification accuracy.  Currently, the set of filters applied in <code>oarfish</code> are directly derived from the <code>NanoCount</code><sup>1</sup> tool; both the filters that exist, and the way their values are set (with the exception of the <code>--three-prime-clip</code> filter, which is not set by default in <code>oarfish</code> but is in <code>NanoCount</code>).</p> <p>Additionally, <code>oarfish</code> provides options to make use of coverage profiles derived from the aligned reads to improve quantification accuracy.  The use of this coverage model is enabled with the <code>--model-coverage</code> flag.</p> <p>The usage can be provided by passing <code>-h</code> at the command line.</p> <pre><code>accurate transcript quantification from long-read RNA-seq data\n\nUsage: oarfish [OPTIONS] --alignments &lt;ALIGNMENTS&gt; --output &lt;OUTPUT&gt;\n\nOptions:\n      --quiet                    be quiet (i.e. don't output log messages that aren't at least warnings)\n      --verbose                  be verbose (i.e. 
output all non-developer logging messages)\n  -a, --alignments &lt;ALIGNMENTS&gt;  path to the file containing the input alignments\n  -o, --output &lt;OUTPUT&gt;          location where output quantification file should be written\n  -t, --threads &lt;THREADS&gt;        maximum number of cores that the oarfish can use to obtain binomial probability [default: 1]\n  -h, --help                     Print help\n  -V, --version                  Print version\n\nfilters:\n      --filter-group &lt;FILTER_GROUP&gt;\n          [possible values: no-filters, nanocount-filters]\n  -t, --three-prime-clip &lt;THREE_PRIME_CLIP&gt;\n          maximum allowable distance of the right-most end of an alignment from the 3' transcript end [default: 4294967295]\n  -f, --five-prime-clip &lt;FIVE_PRIME_CLIP&gt;\n          maximum allowable distance of the left-most end of an alignment from the 5' transcript end [default: 4294967295]\n  -s, --score-threshold &lt;SCORE_THRESHOLD&gt;\n          fraction of the best possible alignment score that a secondary alignment must have for consideration [default: 0.95]\n  -m, --min-aligned-fraction &lt;MIN_ALIGNED_FRACTION&gt;\n          fraction of a query that must be mapped within an alignemnt to consider the alignemnt valid [default: 0.5]\n  -l, --min-aligned-len &lt;MIN_ALIGNED_LEN&gt;\n          minimum number of nucleotides in the aligned portion of a read [default: 50]\n  -n, --allow-negative-strand\n          allow both forward-strand and reverse-complement alignments\n\ncoverage model:\n      --model-coverage  apply the coverage model\n  -b, --bins &lt;BINS&gt;     number of bins to use in coverage model [default: 10]\n\nEM:\n      --max-em-iter &lt;MAX_EM_ITER&gt;\n          maximum number of iterations for which to run the EM algorithm [default: 1000]\n      --convergence-thresh &lt;CONVERGENCE_THRESH&gt;\n          maximum number of iterations for which to run the EM algorithm [default: 0.001]\n  -q, --short-quant &lt;SHORT_QUANT&gt;\n     
     location of short read quantification (if provided)\n</code></pre> <p>The input should be a <code>bam</code> format file, with reads aligned using <code>minimap2</code> against the transcriptome. That is, <code>oarfish</code> does not currently handle spliced alignment to the genome.  Further, the output alignments should be name sorted (the default order produced by <code>minimap2</code> should be fine). Specifically, <code>oarfish</code> relies on the existence of the <code>AS</code> tag in the <code>bam</code> records that encodes the alignment score in order to obtain the score for each alignment (which is used in probabilistic read assignment), and the score of the best alignment, overall, for each read.</p>"},{"location":"#output","title":"Output","text":"<p>The <code>--output</code> option passed to <code>oarfish</code> corresponds to a directory (that will be created if it doesn't exist), under which the relevant output files will be placed. The output of <code>oarfish</code> constist of 2 files:</p> <ul> <li><code>info.json</code> - a JSON format file containing information about relevant parameters with which <code>oarfish</code> was run, and other relevant inforamtion from the processed sample apart from the actual transcript quantifications.</li> <li><code>quant.tsv</code> - a tab separated file listing the quantified targets, as well as information about their length and other metadata. The <code>num_reads</code> column provides the estimate of the number of reads originating from each target.</li> </ul>"},{"location":"#references","title":"References","text":"<ol> <li> <p>Josie Gleeson, Adrien Leger, Yair D J Prawer, Tracy A Lane, Paul J Harrison, Wilfried Haerty, Michael B Clark, Accurate expression quantification from nanopore direct RNA sequencing with NanoCount, Nucleic Acids Research, Volume 50, Issue 4, 28 February 2022, Page e19, https://doi.org/10.1093/nar/gkab1129 \u21a9</p> </li> </ol>"}]}
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"oarfish: transcript quantification from long-read RNA-seq data","text":""},{"location":"#basic-usage","title":"Basic usage","text":"<p><code>oarfish</code> is a program, written in <code>rust</code>, for quantifying transcript-level expression from long-read (i.e. Oxford nanopore cDNA and direct RNA and PacBio) sequencing technologies. <code>oarfish</code> requires a sample of sequencing reads aligned to the transcriptome (currently not to the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm.  </p> <p>It optionally employs many filters to help discard alignments that may reduce quantification accuracy.  Currently, the set of filters applied in <code>oarfish</code> are directly derived from the <code>NanoCount</code><sup>1</sup> tool; both the filters that exist, and the way their values are set (with the exception of the <code>--three-prime-clip</code> filter, which is not set by default in <code>oarfish</code> but is in <code>NanoCount</code>).</p> <p>Additionally, <code>oarfish</code> provides options to make use of coverage profiles derived from the aligned reads to improve quantification accuracy.  The use of this coverage model is enabled with the <code>--model-coverage</code> flag.</p> <p>The usage can be provided by passing <code>-h</code> at the command line.</p> <pre><code>accurate transcript quantification from long-read RNA-seq data\n\nUsage: oarfish [OPTIONS] --alignments &lt;ALIGNMENTS&gt; --output &lt;OUTPUT&gt;\n\nOptions:\n      --quiet                    be quiet (i.e. don't output log messages that aren't at least warnings)\n      --verbose                  be verbose (i.e. 
output all non-developer logging messages)\n  -a, --alignments &lt;ALIGNMENTS&gt;  path to the file containing the input alignments\n  -o, --output &lt;OUTPUT&gt;          location where output quantification file should be written\n  -t, --threads &lt;THREADS&gt;        maximum number of cores that the oarfish can use to obtain binomial probability [default: 1]\n  -h, --help                     Print help\n  -V, --version                  Print version\n\nfilters:\n      --filter-group &lt;FILTER_GROUP&gt;\n          [possible values: no-filters, nanocount-filters]\n  -t, --three-prime-clip &lt;THREE_PRIME_CLIP&gt;\n          maximum allowable distance of the right-most end of an alignment from the 3' transcript end [default: 4294967295]\n  -f, --five-prime-clip &lt;FIVE_PRIME_CLIP&gt;\n          maximum allowable distance of the left-most end of an alignment from the 5' transcript end [default: 4294967295]\n  -s, --score-threshold &lt;SCORE_THRESHOLD&gt;\n          fraction of the best possible alignment score that a secondary alignment must have for consideration [default: 0.95]\n  -m, --min-aligned-fraction &lt;MIN_ALIGNED_FRACTION&gt;\n          fraction of a query that must be mapped within an alignemnt to consider the alignemnt valid [default: 0.5]\n  -l, --min-aligned-len &lt;MIN_ALIGNED_LEN&gt;\n          minimum number of nucleotides in the aligned portion of a read [default: 50]\n  -n, --allow-negative-strand\n          allow both forward-strand and reverse-complement alignments\n\ncoverage model:\n      --model-coverage  apply the coverage model\n  -b, --bins &lt;BINS&gt;     number of bins to use in coverage model [default: 10]\n\nEM:\n      --max-em-iter &lt;MAX_EM_ITER&gt;\n          maximum number of iterations for which to run the EM algorithm [default: 1000]\n      --convergence-thresh &lt;CONVERGENCE_THRESH&gt;\n          maximum number of iterations for which to run the EM algorithm [default: 0.001]\n  -q, --short-quant &lt;SHORT_QUANT&gt;\n     
     location of short read quantification (if provided)\n</code></pre> <p>The input should be a <code>bam</code> format file, with reads aligned using <code>minimap2</code> against the transcriptome. That is, <code>oarfish</code> does not currently handle spliced alignment to the genome.  Further, the output alignments should be name sorted (the default order produced by <code>minimap2</code> should be fine). Specifically, <code>oarfish</code> relies on the existence of the <code>AS</code> tag in the <code>bam</code> records that encodes the alignment score in order to obtain the score for each alignment (which is used in probabilistic read assignment), and the score of the best alignment, overall, for each read.</p>"},{"location":"#output","title":"Output","text":"<p>The <code>--output</code> option passed to <code>oarfish</code> corresponds to a path prefix (this prefix can contain the path separator character, and if it refers to a directory that does not yet exist, that directory will be created). Based on this path prefix, say <code>P</code>, <code>oarfish</code> will create 2 files:</p> <ul> <li><code>P.meta_info.json</code> - a JSON format file containing information about relevant parameters with which <code>oarfish</code> was run, and other relevant information from the processed sample apart from the actual transcript quantifications.</li> <li><code>P.quant</code> - a tab-separated file listing the quantified targets, as well as information about their length and other metadata. 
The <code>num_reads</code> column provides the estimate of the number of reads originating from each target.</li> </ul>"},{"location":"#references","title":"References","text":"<ol> <li> <p>Josie Gleeson, Adrien Leger, Yair D J Prawer, Tracy A Lane, Paul J Harrison, Wilfried Haerty, Michael B Clark, Accurate expression quantification from nanopore direct RNA sequencing with NanoCount, Nucleic Acids Research, Volume 50, Issue 4, 28 February 2022, Page e19, https://doi.org/10.1093/nar/gkab1129 \u21a9</p> </li> </ol>"}]}
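The Basic usage text in the search index says <code>oarfish</code> reads the <code>AS</code> tag of each record to get every alignment's score and the best score for each read. The help text describes <code>--score-threshold</code> as the fraction of the best possible alignment score that a secondary alignment must have (default 0.95). The minimal Python sketch below shows how that check could work. It is not <code>oarfish</code>'s Rust implementation, and it rests on several assumptions: alignments arrive as already-parsed (read, target, AS) tuples in name-sorted order, "best possible" means the best <code>AS</code> observed for that read, and <code>AS</code> values are non-negative. <code>Aln</code> and <code>filter_by_score</code> are hypothetical names.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Aln(NamedTuple):
    read: str    # query (read) name
    target: str  # transcript the read aligned to
    score: int   # value of the AS tag (assumed non-negative)


def filter_by_score(
    alns: Iterable[Aln], threshold: float = 0.95
) -> Iterator[Tuple[str, List[Aln]]]:
    """Yield (read, kept_alignments) for each run of records from the same read.

    groupby only merges *consecutive* records with the same key, which is why
    the input must be name sorted (all alignments of a read adjacent).
    """
    for read, group in groupby(alns, key=lambda a: a.read):
        records = list(group)
        best = max(a.score for a in records)  # best alignment score for this read
        # keep alignments scoring at least `threshold` times the best score
        kept = [a for a in records if a.score >= threshold * best]
        yield read, kept
```

With non-negative scores, the best alignment always passes its own threshold, so each read keeps at least one alignment. If the input is not grouped by read name, the same read shows up as several separate groups, each with its own "best" score. This is one reason name-sorted input matters.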
diff --git a/sitemap.xml.gz b/sitemap.xml.gz