diff --git a/docs/src/guides/data-prep/local-data.rst b/docs/src/guides/data-prep/local-data.rst
index f74b1771f..823d5fdeb 100644
--- a/docs/src/guides/data-prep/local-data.rst
+++ b/docs/src/guides/data-prep/local-data.rst
@@ -18,7 +18,7 @@ The first 2 lines in ``data/sequences.fasta`` look like this:
 
 **The first line is the ``strain`` or ``name`` of the sequence.** Lines with names in FASTA files always start with the ``>`` character (this is not part of the name), and may not contain spaces or ``()[]{}|#><``. Note that “strain” here carries no biological or functional significance and should largely be thought of as synonymous with “sample.”
 
-The sequence itself is a `consensus genome `__.
+The sequence itself is a `consensus genome `__.
 
 **By default, sequences less than 27,000 bases in length or with more than 3,000 ``N`` (unknown) bases are omitted from the analysis.** For a basic QC and preliminary analysis of your sequence data, you can use `clades.nextstrain.org `__. This tool will check your sequences for excess divergence, clustered differences from the reference, and missing or ambiguous data. In addition, it will assign nextstrain clades and call mutations relative to the reference.
 
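
The default thresholds quoted in the context paragraph (at least 27,000 bases, at most 3,000 ``N`` bases) can be checked locally before running the workflow. The following is a minimal sketch, not the workflow's own filtering code: the helper names ``read_fasta`` and ``passes_default_filters`` are hypothetical, and it assumes that length means the total number of characters in the sequence and that ``N`` is counted case-insensitively. The actual workflow may count bases differently.

```python
from typing import Iterable, Iterator, Tuple

# Thresholds taken from the documentation text above.
MIN_LENGTH = 27_000
MAX_N = 3_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the name is everything after the leading ">".
    """
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def passes_default_filters(
    sequence: str, min_length: int = MIN_LENGTH, max_n: int = MAX_N
) -> bool:
    """Return True if the sequence would be kept under the assumed rules.

    Assumption: length is the total character count, including N bases.
    """
    n_count = sequence.upper().count("N")
    return len(sequence) >= min_length and n_count <= max_n
```

To list the names that pass, open ``data/sequences.fasta`` and keep each ``name`` for which ``passes_default_filters(seq)`` is true while iterating over ``read_fasta(handle)``.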