
<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <title>BarcodedIgReC Manual</title>
    <style type="text/css">
        .code {
            background-color: lightgray;
        }
    </style>
</head>

<h1>BarcodedIgReC manual</h1>
1. <a href="#intro">What is BarcodedIgReC?</a><br>

2. <a href="#install">Installation</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;2.1. <a href="#test_datasets">Verifying your installation</a><br>

3. <a href="#usage">BarcodedIgReC usage</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.1. <a href="#basic_opts">Basic options</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.2. <a href="#advanced_opts">Advanced options</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.3. <a href="#run_examples">Examples</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.4. <a href="#output">Output files</a><br>

4. <a href="#citation">Citation</a><br>

5. <a href="#feedback">Feedback and bug reports</a><br>

<!--- ----------------- -->

<h2 id="intro">1. What is BarcodedIgReC?</h2>
<a href="barcodedIgReC.html">BarcodedIgReC</a> is a modification of the <a href="index.html">IgReC</a> full-length antibody repertoire construction tool, adapted for barcoded datasets.
The BarcodedIgReC pipeline is shown below:<br>
<p align = center>
    <img src="barigrec_figures/BarcodedIgReC_pipeline.png" alt="BarcodedIgReC_pipeline" width = 70%>
</p>

<h4>Input:</h4>
<p align="justify">
    BarcodedIgReC takes as input demultiplexed paired-end or single reads with unique molecular identifiers (UMIs).
    <b>Please note that IgRepertoireConstructor constructs a full-length repertoire and
        expects input reads to cover the variable region of the antibody/TCR.</b>
</p>

<h4>Output:</h4>
<p align = justify>
    BarcodedIgReC corrects sequencing and amplification errors and groups together reads corresponding to identical antibodies.
    The constructed repertoire is thus a set of <i>antibody clusters</i>, each characterized by its
    <i>sequence</i>, <i>read multiplicity</i> and <i>molecule multiplicity</i>.
    Read multiplicity is the number of reads in an antibody cluster, whereas
    molecule multiplicity is an estimate of the number of RNA molecules that gave rise to the cluster.
    BarcodedIgReC provides the user with the following information about the constructed repertoire:
</p>

<ul>
    <li>antibody clusters: antibody sequences with read multiplicities,</li>
    <li>antibody clusters: antibody sequences with molecule multiplicities,</li>
    <li>read-cluster map: information about antibody clusters contents.</li>
    <!--<li>highly abundant antibody clusters,</li>-->
    <!--<li>super reads: groups of identical input reads with high coverage.</li>-->
</ul>
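<p align="justify">
    The difference between the two multiplicities can be illustrated with a toy sketch.
    It assumes that every read is already assigned to a cluster and carries an error-free UMI,
    so that the number of distinct UMIs in a cluster approximates its molecule multiplicity.
    The function name <code>multiplicities</code> and the input layout are illustrative only;
    BarcodedIgReC itself also corrects barcode errors before counting molecules.
</p>
<pre class="code"><code>from collections import defaultdict

def multiplicities(assignments):
    """Return {cluster_id: (read multiplicity, molecule multiplicity)}.

    assignments: iterable of (cluster_id, umi) pairs, one pair per read.
    Assumption: distinct UMIs within a cluster stand for distinct RNA molecules.
    """
    reads = defaultdict(int)
    umis = defaultdict(set)
    for cluster_id, umi in assignments:
        reads[cluster_id] += 1        # every read counts
        umis[cluster_id].add(umi)     # every UMI counts once
    return {c: (reads[c], len(umis[c])) for c in reads}

toy = [(0, "ACGT"), (0, "ACGT"), (0, "TTGA"), (1, "GGCC")]
print(multiplicities(toy))  # {0: (3, 2), 1: (1, 1)}
</code></pre>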

<h4>Stages:</h4>
The BarcodedIgReC pipeline consists of the following steps:
<ol>
    <li><b>VJ Finder</b>: cleaning input reads using alignment against Ig germline genes</li>
    <li><b>Barcode clustering</b>: correcting errors in reads sharing a barcode,
        correcting barcode errors, and handling identical or close barcodes assigned to unrelated molecules.
        Chimeric reads are also discarded at this step.
        As a result, all reads are grouped by their original molecules.
    </li>
    <li><b>IgReC</b>: grouping very close molecules to correct the remaining minor errors.
        This step also produces the final output.
    </li>
</ol>
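<p align="justify">
    The idea behind the barcode clustering step can be sketched as follows.
    This is a deliberately simplified illustration, not the algorithm used by BarcodedIgReC:
    it assumes error-free barcodes and equal-length, already aligned reads, and it ignores
    chimeric reads and barcode collisions. The names <code>consensus</code> and
    <code>group_by_barcode</code> are hypothetical.
</p>
<pre class="code"><code>from collections import Counter, defaultdict

def consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def group_by_barcode(reads):
    """Group (barcode, sequence) pairs by barcode and return one consensus per barcode."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return {barcode: consensus(seqs) for barcode, seqs in groups.items()}

toy = [
    ("AAC", "ACGTAC"),
    ("AAC", "ACGTAC"),
    ("AAC", "ACTTAC"),  # one sequencing error, outvoted by the other two reads
    ("GGT", "TTGACA"),
]
print(group_by_barcode(toy))  # {'AAC': 'ACGTAC', 'GGT': 'TTGACA'}
</code></pre>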

<!--<p align="justify">-->
    <!--You can find details of IgReC algorithm in <a href = "http://bioinformatics.oxfordjournals.org/content/31/12/i53.long">our paper</a>.-->
<!--</p>-->

<!--- ---------------------------------------------------------------- -->
<!--- ---------------------------------------------------------------- -->

<a id="install"></a>
<h2>2. Installation</h2>

For installation instructions, please refer to the <a href = "manual.html#install">IgRepertoireConstructor installation guide</a>.

<a id="test_datasets"></a>
<h3>2.1. Verifying your installation</h3>
For testing purposes, BarcodedIgReC comes with a toy dataset. <br><br>

To try BarcodedIgReC on the test dataset, run:
<pre class="code"><code>
    ./barcoded_igrec.py --test
</code>
</pre>

If the installation of BarcodedIgReC is successful, you will find the following information at the end of the log:

<pre class="code">
    <code>
    Thank you for using BarcodedIgReC!
    Log was written to barigrec_test/igrec.log
    </code>
</pre>
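<p align="justify">
    When the test is run from a script, the same check can be automated.
    The sketch below only looks for the closing message quoted above in the log file;
    the helper name <code>run_succeeded</code> is an assumption made for illustration,
    and the log path is the one reported by the test run.
</p>
<pre class="code"><code>from pathlib import Path

SUCCESS_MESSAGE = "Thank you for using BarcodedIgReC!"

def run_succeeded(log_path):
    """Return True if the log file exists and contains the closing message."""
    log = Path(log_path)
    if not log.is_file():
        return False
    return SUCCESS_MESSAGE in log.read_text(errors="replace")

print(run_succeeded("barigrec_test/igrec.log"))
</code></pre>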

<!--- ---------------------------------------------------------------- -->
<!--- ---------------------------------------------------------------- -->

<a id="usage"></a>
<h2>3. BarcodedIgReC usage</h2>
<p>
    BarcodedIgReC takes as input demultiplexed barcoded Illumina reads covering the variable region of an antibody and constructs a repertoire
    in <a href = "ig_repertoire_constructor_manual.html#repertoire_files">CLUSTERS.FA and RCM format</a>.
</p>

To run BarcodedIgReC, type:
<pre class="code">
    <code>
    ./barcoded_igrec.py [options] -s &lt;single_reads.fastq&gt; -o &lt;output_dir&gt;
    </code>
</pre>

OR

<pre class="code">
    <code>
    ./barcoded_igrec.py [options] -1 &lt;left_reads.fastq&gt; -2 &lt;right_reads.fastq&gt; -o &lt;output_dir&gt;
    </code>
</pre>

<!--- --------------------- -->

<a id="basic_opts"></a>
<h3>3.1. Basic options:</h3>

<code>-s &lt;single_reads.fastq&gt;</code><br>
FASTQ file with single Illumina reads (required unless paired-end reads are given with <code>-1</code> and <code>-2</code>).

<br><br>

<code>-1 &lt;left_reads.fastq&gt; -2 &lt;right_reads.fastq&gt;</code><br>
FASTQ files with paired-end Illumina reads (required unless single reads are given with <code>-s</code>).

<br><br>

<code>-o / --output &lt;output_dir&gt;</code><br>
Output directory (required).

<br><br>

<code>-t / --threads &lt;int&gt;</code><br>
The number of parallel threads. The default value is <code>16</code>.

<br><br>

<code>--test</code><br>
Run BarcodedIgReC on the toy test dataset. The test run is equivalent to the following command:
<pre class = "code">
    <code>
    ./barcoded_igrec.py -s test_dataset/barcodedIgReC_test.fasta -l all -o barigrec_test
    </code>
</pre>

<code>-l / --loci &lt;str&gt;</code><br>
Immunological loci against which input reads are aligned; reads with a low alignment score are discarded (required). <br>
Available values are <code>IGH</code> / <code>IGL</code> / <code>IGK</code> / <code>IG</code> (for all BCRs) /
<code>TRA</code> / <code>TRB</code> / <code>TRG</code> / <code>TRD</code> / <code>TR</code> (for all TCRs) or <code>all</code>.

<br><br>

<code>--help</code><br>
Print the help message.

<br><br>

<!--- --------------------- -->

<a id="advanced_opts"></a>
<h3>3.2. Advanced options:</h3>

<code>--organism &lt;str&gt;</code><br>
Organism for which the germline is taken.
Available values are <code>human</code>, <code>mouse</code>, <code>pig</code>,
<code>rabbit</code>, <code>rat</code> and <code>rhesus_monkey</code>.
The default value is <code>human</code>.
<br><br>

<code>--igrec-tau &lt;int&gt;</code><br>
Maximal allowed number of mismatches between two barcode cluster consensuses that are treated as the same antibody.
The default (and recommended) value is 2.
This value allows each barcode cluster consensus to contain a single error.
Higher values can reduce the advantage of barcoding.
Lower values may give better results for large clusters, because close sequences are not glued together.
Small clusters, however, may then be undercorrected.
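<p align="justify">
    The reasoning behind the default can be made concrete: if two consensuses originate from the same antibody
    and each contains at most one error, they differ in at most two positions.
    The sketch below assumes that mismatches are counted as the Hamming distance between equal-length sequences;
    the manual does not specify the exact comparison BarcodedIgReC performs, and the names
    <code>hamming</code> and <code>same_antibody</code> are illustrative.
</p>
<pre class="code"><code>def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def same_antibody(a, b, tau=2):
    """True if the two consensuses differ in at most tau positions."""
    return tau >= hamming(a, b)

first = "ACGAACGT"   # true sequence ACGTACGT with one error at position 4
second = "ACGTACCT"  # true sequence ACGTACGT with one error at position 7
print(hamming(first, second))               # 2
print(same_antibody(first, second))         # True
print(same_antibody(first, second, tau=1))  # False
</code></pre>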

<br><br>

<code>--clustering-thr &lt;int&gt;</code><br>
Maximal allowed distance between reads sharing a barcode for them to be placed into one cluster.
The default value is 20.
Our analysis shows that this value keeps unrelated antibodies out of the same cluster while still correcting all the amplification errors.
You can increase this value for overamplified datasets to ensure better error correction.
You can decrease this value for datasets with high clonality to better distinguish close antibodies.
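<p align="justify">
    The effect of the threshold can be illustrated with a toy greedy clustering of the reads that share one barcode.
    This is only a sketch: the distance used here (mismatches plus length difference) and the greedy strategy are
    assumptions chosen for brevity, not the procedure implemented in BarcodedIgReC, and the function names are hypothetical.
</p>
<pre class="code"><code>def mismatches(a, b):
    """Differing positions plus the length difference (a simplistic distance)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def split_barcode_group(reads, threshold=20):
    """Greedily put each read into the first cluster whose first read is close enough."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if threshold >= mismatches(read, cluster[0]):
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no close cluster found, start a new one
    return clusters

reads = ["AAAAAAAA", "AAAAAAAT", "CCCCCCCC"]
print(split_barcode_group(reads, threshold=2))  # [['AAAAAAAA', 'AAAAAAAT'], ['CCCCCCCC']]
print(split_barcode_group(reads, threshold=8))  # [['AAAAAAAA', 'AAAAAAAT', 'CCCCCCCC']]
</code></pre>
<p align="justify">
    With the larger threshold the unrelated third read is merged into the first cluster,
    which is the risk of raising the value on datasets that contain close antibodies.
</p>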


<a id="run_examples"></a>
<h3>3.3. Examples</h3>
To construct an antibody repertoire from single reads <code>reads.fastq</code>, type:
<pre class="code">
    <code>
    ./barcoded_igrec.py -s reads.fastq -o output_dir -l all
    </code>
</pre>
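<p align="justify">
    When runs are launched from a script, the command line can be assembled programmatically.
    The sketch below uses only the options described in this manual
    (<code>-s</code>, <code>-1</code>, <code>-2</code>, <code>-o</code>, <code>-l</code>, <code>-t</code>);
    the helper <code>build_command</code> and the file names are hypothetical.
</p>
<pre class="code"><code>def build_command(output_dir, loci, single=None, left=None, right=None, threads=None):
    """Build the argument list for either single or paired-end input."""
    cmd = ["./barcoded_igrec.py"]
    if single is not None and left is None and right is None:
        cmd += ["-s", single]
    elif single is None and left is not None and right is not None:
        cmd += ["-1", left, "-2", right]
    else:
        raise ValueError("give either single reads or both paired-end files")
    cmd += ["-o", output_dir, "-l", loci]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd

cmd = build_command("output_dir", "IGH", left="left.fastq", right="right.fastq", threads=8)
print(" ".join(cmd))
# ./barcoded_igrec.py -1 left.fastq -2 right.fastq -o output_dir -l IGH -t 8
# The list can then be passed to subprocess.run(cmd, check=True) from the standard library.
</code></pre>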
<!--- --------------------- -->

<a id="output"></a>
<h3>3.4. Output files</h3>
BarcodedIgReC creates a working directory (whose name is specified with the <code>-o</code> option)
and writes the following files there:

<ul>
  <li>Final repertoire files:</li>
  <ul>
    <li><b>final_repertoire/final_repertoire.fa</b> &mdash; CLUSTERS.FASTA file for all antibody clusters of the constructed repertoire, with cluster sizes given in reads
        (details in <a href= "ig_repertoire_constructor_manual.html#repertoire_files">Antibody repertoire representation</a>).</li>

    <li><b>final_repertoire/final_repertoire_umi.fa.gz</b> &mdash; CLUSTERS.FASTA file for all antibody clusters of the constructed repertoire, with cluster sizes given in RNA molecules.</li>

    <li><b>final_repertoire/final_repertoire.rcm</b> &mdash; RCM file for the constructed repertoire
        (details in <a href = "ig_repertoire_constructor_manual.html#repertoire_files">Antibody repertoire representation</a>).</li>
  </ul><br>

  <li>VJ finder output:</li>
    <ul>
      <li>
          <b>vj_finder/cleaned_reads.fa</b> &mdash; FASTA file with cleaned reads constructed at the VJ Finder stage.
          Cleaned reads have forward direction (from V to J),
          contain V and J gene segments and are cropped by the left bound of V gene segment.
      </li>

      <li>
          <b>vj_finder/filtered_reads.fa</b> &mdash; FASTA file with filtered reads.
          Filtered reads align poorly to Ig germline gene segments and are likely to represent contamination.
      </li>

      <li>
          <b>vj_finder/alignment_info.csv</b> &mdash; CSV file containing information about alignment of cleaned reads to
          V and J gene segments.
          Details of <b>alignment_info.csv</b> format are given in <a href="ig_repertoire_constructor_manual.html#alignment_info">Alignment Info file format</a>.
      </li>
    </ul><br>

    <li><b>igrec.log</b> &mdash; full log of BarcodedIgReC run.</li>
</ul>
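<p align="justify">
    After a run, a script can verify that all the files listed above are present.
    The sketch below takes the relative paths from this section; the helper name
    <code>missing_outputs</code> and the directory name <code>output_dir</code> are illustrative.
</p>
<pre class="code"><code>from pathlib import Path

# Relative paths as listed in this section of the manual.
EXPECTED_OUTPUTS = [
    "final_repertoire/final_repertoire.fa",
    "final_repertoire/final_repertoire_umi.fa.gz",
    "final_repertoire/final_repertoire.rcm",
    "vj_finder/cleaned_reads.fa",
    "vj_finder/filtered_reads.fa",
    "vj_finder/alignment_info.csv",
    "igrec.log",
]

def missing_outputs(output_dir):
    """Return the expected output files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]

print(missing_outputs("output_dir"))
</code></pre>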
<br>

<!--- -------------------------------------------------------------------- --->
<h2 id = "citation">4. Citation</h2>
If you use BarcodedIgReC in your research, please cite
<p align = "justify">
    Alexander Shlemov, Sergey Bankevich, Andrey Bzikadze,
    Dmitriy M. Chudakov,
    Yana Safonova,
    and Pavel A. Pevzner.
    Reconstructing antibody repertoires from error-prone immunosequencing datasets <i>(submitted)</i>
</p>

<a id="feedback"></a>
<h2>5. Feedback and bug reports</h2>
Your comments, bug reports, and suggestions are very welcome.
They will help us to further improve BarcodedIgReC.
<br><br>
If you have any trouble running BarcodedIgReC, please send us the log file from the output directory.
<br><br>
Address for communications: <a href="mailto:igtools_support@googlegroups.com">igtools_support@googlegroups.com</a>.

</body></html>