From e14e013866e0b9ab82e2f37eae543c8a0ba93c8c Mon Sep 17 00:00:00 2001
From: Andrea Telatin <andrea@telatin.com>
Date: Wed, 24 Feb 2021 16:59:50 +0000
Subject: [PATCH] docs

---
 docs/1_install.md                    | 22 +++++--------
 docs/2_usage.md                      | 37 ++++++++++++++++++----
 docs/3_examples.md                   | 11 -------
 docs/4_notes.md                      | 10 +++---
 docs/tools/2.1_usage_interleave.md   | 32 ++++++++++++++-----
 docs/tools/2.2_usage_deinterleave.md | 36 ++++++++++++++++++----
 docs/tools/2.3_usage_count.md        | 46 ++++++++++++++++++++++++++++
 docs/tools/2.4_usage_stats.md        | 41 +++++++++++++++++++++++++
 8 files changed, 186 insertions(+), 49 deletions(-)
 delete mode 100644 docs/3_examples.md
 create mode 100644 docs/tools/2.3_usage_count.md
 create mode 100644 docs/tools/2.4_usage_stats.md

diff --git a/docs/1_install.md b/docs/1_install.md
index adad57f..0308bc3 100644
--- a/docs/1_install.md
+++ b/docs/1_install.md
@@ -5,26 +5,18 @@ permalink: /installation
 
 # Installation
 
-## Pre-compiled binaries
 
-Pre-compiled binaries are the fastest and easiest way to get _qax_. To get the latest version,
-use the following command, otherwise check the [stable releases](https://github.com/telatin/qax/releases).  
+## Install via Miniconda
 
+The recommended installation method is via BioConda. If you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)), run:
 
 ```
-# From linux
-wget "https://github.com/telatin/seqfu2/raw/main/bin/seqfu"
-chmod +x seqfu
-
-# From macOS
-wget -O seqfu "https://github.com/telatin/seqfu2/raw/main/bin/seqfu_mac"
-chmod +x seqfu
+conda install -c conda-forge -c bioconda seqfu
 ```
 
-## Install via Miniconda
 
-Alternatively, you can install **seqfu** from BioConda, if you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)):
+## Pre-compiled binaries
 
-```
-conda install -c conda-forge -c bioconda seqfu
-```
+Pre-compiled binaries are distributed with the [stable releases](https://github.com/telatin/seqfu2/releases).
+
+ 
\ No newline at end of file
diff --git a/docs/2_usage.md b/docs/2_usage.md
index dbb274f..e029b99 100644
--- a/docs/2_usage.md
+++ b/docs/2_usage.md
@@ -4,12 +4,37 @@ permalink: /usage
 ---
 # Short guide
 
-```note
-This page is a stub
-```
+*SeqFu* is composed of a main program with multiple subprograms, and a set of utilities.
+See the [tools]({{site.baseurl}}/tools) section for the detailed
+documentation of each subprogram.
 
-`seqfu` is composed by several subprogram, and the general syntax is:
 
+## Main program
+
+When invoked without arguments, *SeqFu* prints the list of available subprograms:
+
+```text
+SeqFU - Sequence Fastx Utilities
+version: 0.8.8
+
+	• count [cnt]         : count FASTA/FASTQ reads, pair-end aware
+	• deinterleave [dei]  : deinterleave FASTQ
+	• derep [der]         : feature-rich dereplication of FASTA/FASTQ files
+	• interleave [ilv]    : interleave FASTQ pair ends
+	• sort [srt]          : sort sequences by size (uniques)
+	• stats [st]          : statistics on sequence lengths
+
+	• grep                : select sequences with patterns
+	• head                : print first sequences
+	• tail                : view last sequences
+	• view                : view sequences with colored quality and oligo matches
+
+Add --help after each command to print usage
 ```
-qax program parameters
-```
+
+## Utilities
+
+*SeqFu* is bundled with a growing set of utilities that share the same FASTX parsing library:
+* **fu-primers** to remove amplification primers from sequencing datasets
+* **fu-orf** to extract ORFs from Paired-End libraries
+* **fu-cov** to extract contigs produced by the most commonly used assemblers, based on the coverage information stored in their headers
diff --git a/docs/3_examples.md b/docs/3_examples.md
deleted file mode 100644
index 91ef9bb..0000000
--- a/docs/3_examples.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-sort: 3
-permalink: /examples
----
-
-# Usage examples
-
-This page contains a small selection of examples for getting started using **seqfu**.
-
-Check the complete documentation for each [tool]({{site.baseurl}}/tools), that contains the detailed
-documentation.
diff --git a/docs/4_notes.md b/docs/4_notes.md
index c351faa..ca867b5 100644
--- a/docs/4_notes.md
+++ b/docs/4_notes.md
@@ -1,11 +1,13 @@
 ---
-sort: 6
-permalink: /about
+sort: 3
+permalink: /notes
 ---
 
 # About SeqFu
 
-This page contains a small selection of examples for getting started using **seqfu**.
+This page collects notes about the libraries used by **SeqFu**.
 
-Check the complete documentation for each [tool]({{site.baseurl}}/tools), that contains the detailed
-documentation.
+The main parsing library is `klib.nim` by Heng Li ([lh3/biofast](https://github.com/lh3/biofast)), which provides good performance.
+
+Some utilities use the *readfq* library ([andreas-wilm/nimreadfq](https://github.com/andreas-wilm/nimreadfq)), which wraps
+the C version of Heng Li's parser in an object-oriented module.
\ No newline at end of file
diff --git a/docs/tools/2.1_usage_interleave.md b/docs/tools/2.1_usage_interleave.md
index e628a7b..f0e1af7 100644
--- a/docs/tools/2.1_usage_interleave.md
+++ b/docs/tools/2.1_usage_interleave.md
@@ -3,12 +3,30 @@ sort: 1
 ---
 # seqfu interleave
 
-```note
-This page is a stub
-```
+*interleave* (or *ilv*) is one of the core subprograms of *SeqFu*.
+It produces an _interleaved FASTQ file_ from two separate
+files containing the forward and the reverse reads of paired-end
+fragments.
 
-`seqfu` is composed by several subprogram, and the general syntax is:
+```text
+ilv: interleave FASTQ files
 
-```
-qax program parameters
-```
+  Usage: ilv [options] -1 <forward-pair> [-2 <reverse-pair>]
+
+  -f --for-tag <tag-1>       string identifying forward files [default: auto]
+  -r --rev-tag <tag-2>       string identifying reverse files [default: auto]
+  -o --output <outputfile>   save file to <out-file> instead of STDOUT
+  -c --check                 enable careful mode (check sequence names and numbers)
+  -v --verbose               print verbose output
+
+  -s --strip-comments        skip comments
+  -p --prefix "string"       rename sequences (append a progressive number)
+
+guessing second file:
+  by default <forward-pair> is scanned for _R1., which is substituted with _R2.
+  if this fails, the patterns _1. and _2. are tested.
+
+example:
+
+    ilv -1 file_R1.fq > interleaved.fq
+```
\ No newline at end of file
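The `ilv` help text above describes two steps: guessing the reverse file when `-2` is omitted, and alternating forward and reverse records. The Python sketch below illustrates both under stated assumptions: it is not SeqFu's Nim code, the helper names `guess_reverse` and `interleave` are hypothetical, and records are assumed to be plain four-line FASTQ (no wrapped sequences, no compression).

```python
import os
from itertools import islice
from typing import Iterable, List, Optional

# Tag pairs tried in order, as described in the ilv help text
TAG_PAIRS = [("_R1.", "_R2."), ("_1.", "_2.")]


def guess_reverse(forward: str) -> Optional[str]:
    """Guess the reverse-read file name from the forward one (hypothetical helper).

    Only the file name (not the directory) is rewritten, replacing the last
    occurrence of the tag. Returns None if no tag matches. Whether the guessed
    file exists is not checked.
    """
    directory, name = os.path.split(forward)
    for tag_for, tag_rev in TAG_PAIRS:
        pos = name.rfind(tag_for)
        if pos != -1:
            name = name[:pos] + tag_rev + name[pos + len(tag_for):]
            return os.path.join(directory, name)
    return None


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> List[str]:
    """Alternate one 4-line record from R1 with one 4-line record from R2."""
    it1, it2 = iter(r1_lines), iter(r2_lines)
    out: List[str] = []
    while True:
        rec1 = list(islice(it1, 4))
        rec2 = list(islice(it2, 4))
        if not rec1 and not rec2:
            return out  # both inputs ended at the same time
        if len(rec1) != 4 or len(rec2) != 4:
            # One file ended early, or a record is incomplete
            raise ValueError("different number of records, or truncated record")
        out.extend(rec1)
        out.extend(rec2)
```

For instance, `guess_reverse("file_R1.fq")` returns `"file_R2.fq"`, matching the `ilv -1 file_R1.fq` example. This sketch keeps the output in memory for simplicity; a streaming tool would write each pair as soon as it is read.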
diff --git a/docs/tools/2.2_usage_deinterleave.md b/docs/tools/2.2_usage_deinterleave.md
index bb72274..83cca60 100644
--- a/docs/tools/2.2_usage_deinterleave.md
+++ b/docs/tools/2.2_usage_deinterleave.md
@@ -3,12 +3,36 @@ sort: 2
 ---
 # seqfu deinterleave
 
-```note
-This page is a stub
-```
+*deinterleave* (or *dei*) is one of the core subprograms of *SeqFu*.
+It splits an interleaved FASTQ file into two separate files, one for the forward and one for the reverse reads.
 
-`seqfu` is composed by several subprogram, and the general syntax is:
+```text
+dei: deinterleave FASTQ files
 
+  Usage: dei [options] -o basename <interleaved-fastq>
+
+  -o --output-basename "str"     save output to output_R1.fq and output_R2.fq
+  -f --for-ext "R1"              extension for R1 file [default: _R1.fq]
+  -r --rev-ext "R2"              extension for R2 file [default: _R2.fq]
+  -c --check                     enable careful mode (check sequence names and numbers)
+  -v --verbose                   print verbose output
+
+  -s --strip-comments            skip comments
+  -p --prefix "string"           rename sequences (append a progressive number)
+ 
+notes:
+    use "-" as input filename to read from STDIN
+
+example:
+
+    dei -o newfile file.fq
 ```
-qax program parameters
-```
+
+
+### Streaming
+
+If a program produces interleaved output, `seqfu deinterleave` can read it from a pipe (use "-" as the input filename):
+
+```bash
+fu-primers -1 file_R1.fq -2 file_R2.fq | seqfu deinterleave -o fileNoPrimers -
+```
\ No newline at end of file
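Conceptually, `dei` undoes `ilv`: records alternate forward and reverse, so odd records go to the first file and even records to the second. A minimal Python sketch of that split, assuming plain four-line FASTQ and ignoring the name checks of `--check` mode; `fastq_records` and `split_interleaved` are hypothetical names, not SeqFu functions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (no line-wrapped sequences)."""
    it = iter(lines)  # a single iterator, so islice resumes where it stopped
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_interleaved(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Send odd records to the forward list and even records to the reverse list."""
    forward: List[str] = []
    reverse: List[str] = []
    records = fastq_records(lines)
    for rec1 in records:
        rec2 = next(records, None)  # the mate of rec1
        if rec2 is None:
            raise ValueError("odd number of records in interleaved input")
        forward.extend(rec1)
        reverse.extend(rec2)
    return forward, reverse
```

Writing `forward` to `newfile_R1.fq` and `reverse` to `newfile_R2.fq` would mirror `dei -o newfile file.fq`; passing `sys.stdin` as `lines` corresponds to reading from `-`. The sketch collects both outputs in memory, whereas a streaming tool would write each record as it goes.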
diff --git a/docs/tools/2.3_usage_count.md b/docs/tools/2.3_usage_count.md
new file mode 100644
index 0000000..8868355
--- /dev/null
+++ b/docs/tools/2.3_usage_count.md
@@ -0,0 +1,46 @@
+---
+sort: 3
+---
+# seqfu count
+
+*count* (or *cnt*) is one of the core subprograms of *SeqFu*.
+It counts the sequences in FASTA/FASTQ files and is _paired-end_ aware:
+for a pair of files it prints a single line with the count, after checking
+that both files contain the same number of sequences.
+
+```text
+Usage: count [options] [<inputfile> ...]
+
+Options:
+  -a, --abs-path         Print absolute paths
+  -b, --basename         Print only filenames
+  -u, --unpair           Print separate records for paired end files
+  -f, --for-tag R1       Forward tag [default: auto]
+  -r, --rev-tag R2       Reverse tag [default: auto]
+  -v, --verbose          Verbose output
+  -h, --help             Show this help
+```
+
+
+### Streaming
+
+Input from a stream (STDIN) is supported.
+
+### Example output
+
+The output is tab-separated text with three columns: file name, number of reads and type ("SE" for single-end, "Paired" for paired-end files).
+
+```text
+data/test.fastq       3  SE
+data/comments.fastq   5  SE
+data/test2.fastq      3  SE
+data/qualities.fq     5  SE
+data/illumina_1.fq.gz 7  Paired
+```
+
+In case of errors (for example, paired files with different numbers of reads), an error message is printed:
+```text
+ERROR: Different counts in data/longerone_R1.fq.gz and data/longerone_R2.fq.gz
+# data/longerone_R1.fq.gz: 7
+# data/longerone_R2.fq.gz: 2
+```
\ No newline at end of file
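The paired-end check described above amounts to counting the records in both files and comparing the totals. A minimal Python sketch, assuming uncompressed four-line FASTQ only (SeqFu itself also reads FASTA and gzipped files through its parser); `count_fastq` and `count_row` are hypothetical names, and the row layout simply follows the example output.

```python
from typing import Iterable, Optional


def count_fastq(lines: Iterable[str]) -> int:
    """Count records assuming plain 4-line FASTQ (no wrapped sequences)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_row(name: str, r1: Iterable[str], r2: Optional[Iterable[str]] = None) -> str:
    """Return a 'name<TAB>count<TAB>type' row, checking that pairs match."""
    n1 = count_fastq(r1)
    if r2 is None:
        return f"{name}\t{n1}\tSE"
    n2 = count_fastq(r2)
    if n1 != n2:
        # Corresponds to the "Different counts" error shown above
        raise ValueError(f"different counts: {n1} (R1) vs {n2} (R2)")
    return f"{name}\t{n1}\tPaired"
```

Raising an exception on a mismatch keeps the sketch short; a command-line tool would rather report the error and continue with the remaining files.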
diff --git a/docs/tools/2.4_usage_stats.md b/docs/tools/2.4_usage_stats.md
new file mode 100644
index 0000000..02c5796
--- /dev/null
+++ b/docs/tools/2.4_usage_stats.md
@@ -0,0 +1,41 @@
+---
+sort: 4
+---
+# seqfu stats
+
+*stats* (or *st*) is one of the core subprograms of *SeqFu*: it prints statistics on sequence lengths.
+
+```text
+Usage: stats [options] [<inputfile> ...]
+
+Options:
+  -a, --abs-path         Print absolute paths
+  -b, --basename         Print only filenames
+  -n, --nice             Print nice terminal table
+  --csv                  Separate with commas (default: tabs)
+  -v, --verbose          Verbose output
+  -h, --help             Show this help
+```
+
+ 
+### Example output
+
+By default the output is tab-separated (TSV) with nine columns; `--csv` switches to comma-separated output, as in this example:
+```text
+File,#Seq,Sum,Avg,N50,N75,N90,Min,Max
+data/filt.fa.gz,78730,24299931,308.6,316,316,220,180,485
+```
+
+### Screen friendly output
+
+With `-n` (`--nice`), the output is formatted as a table for the terminal:
+
+```text
+seqfu stats data/filt.fa.gz  -n
+┌─────────────────┬───────┬──────────┬───────┬─────┬─────┬─────┬─────┬─────┐
+│ File            │ #Seq  │ Total bp │ Avg   │ N50 │ N75 │ N90 │ Min │ Max │
+├─────────────────┼───────┼──────────┼───────┼─────┼─────┼─────┼─────┼─────┤
+│ data/filt.fa.gz │ 78730 │ 24299931 │ 308.6 │ 316 │ 316 │ 220 │ 180 │ 485 │
+└─────────────────┴───────┴──────────┴───────┴─────┴─────┴─────┴─────┴─────┘
+```
+ 
\ No newline at end of file
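Assuming the usual definition of N50, N75 and N90 (sort lengths from longest to shortest and report the length at which the running total first reaches 50%, 75% or 90% of all bases), the columns above can be reproduced with this minimal Python sketch. It is not SeqFu's implementation, `nx` and `length_stats` are hypothetical names, and SeqFu's exact rounding of `Avg` may differ.

```python
from typing import Dict, List


def nx(lengths: List[int], percent: int) -> int:
    """Return Nx: the length at which the running sum of lengths, sorted
    from longest to shortest, first reaches `percent`% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold
        if running * 100 >= percent * total:
            return length
    raise ValueError("no sequences")


def length_stats(lengths: List[int]) -> Dict[str, float]:
    """Compute one row of statistics, using the column names of the example output."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    return {
        "#Seq": len(lengths),
        "Sum": total,
        "Avg": round(total / len(lengths), 1),
        "N50": nx(lengths, 50),
        "N75": nx(lengths, 75),
        "N90": nx(lengths, 90),
        "Min": min(lengths),
        "Max": max(lengths),
    }
```

With this definition N90 can never exceed N75, which in turn can never exceed N50, consistent with the example output (316, 316, 220).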