From e14e013866e0b9ab82e2f37eae543c8a0ba93c8c Mon Sep 17 00:00:00 2001
From: Andrea Telatin
Date: Wed, 24 Feb 2021 16:59:50 +0000
Subject: [PATCH] docs

---
 docs/1_install.md                    | 22 +++++--------
 docs/2_usage.md                      | 37 ++++++++++++++++++----
 docs/3_examples.md                   | 11 -------
 docs/4_notes.md                      | 10 +++---
 docs/tools/2.1_usage_interleave.md   | 32 ++++++++++++++-----
 docs/tools/2.2_usage_deinterleave.md | 36 ++++++++++++++++++----
 docs/tools/2.3_usage_count.md        | 46 ++++++++++++++++++++++++++++
 docs/tools/2.4_usage_stats.md        | 41 +++++++++++++++++++++++++
 8 files changed, 186 insertions(+), 49 deletions(-)
 delete mode 100644 docs/3_examples.md
 create mode 100644 docs/tools/2.3_usage_count.md
 create mode 100644 docs/tools/2.4_usage_stats.md

diff --git a/docs/1_install.md b/docs/1_install.md
index adad57f..0308bc3 100644
--- a/docs/1_install.md
+++ b/docs/1_install.md
@@ -5,26 +5,18 @@ permalink: /installation

 # Installation

-## Pre-compiled binaries
-Pre-compiled binaries are the fastest and easiest way to get _qax_. To get the latest version,
-use the following command, otherwise check the [stable releases](https://github.com/telatin/qax/releases).
+## Install via Miniconda
+The recommended installation method is via BioConda, if you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)):

 ```
-# From linux
-wget "https://github.com/telatin/seqfu2/raw/main/bin/seqfu"
-chmod +x seqfu
-
-# From macOS
-wget -O seqfu "https://github.com/telatin/seqfu2/raw/main/bin/seqfu_mac"
-chmod +x seqfu
+conda install -c conda-forge -c bioconda seqfu
 ```

-## Install via Miniconda
-Alternatively, you can install **seqfu** from BioConda, if you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)):
+## Pre-compiled binaries

-```
-conda install -c conda-forge -c bioconda seqfu
-```
+Pre-compiled binaries are distributed with the [stable releases](https://github.com/telatin/qax/releases).
+
+
\ No newline at end of file
diff --git a/docs/2_usage.md b/docs/2_usage.md
index dbb274f..e029b99 100644
--- a/docs/2_usage.md
+++ b/docs/2_usage.md
@@ -4,12 +4,37 @@ permalink: /usage
 ---

 # Short guide
-```note
-This page is a stub
-```
+*SeqFu* is composed by a main program with multiple subprograms, and a set of utilities.
+Check the complete documentation for each [tool]({{site.baseurl}}/tools), that contains the detailed
+documentation.

-`seqfu` is composed by several subprogram, and the general syntax is:
+## Main program
+
+If invoked without parameters, *SeqFu* will print the list of subprograms:
+
+```text
+SeqFU - Sequence Fastx Utilities
+version: 0.8.8
+
+  • count [cnt]          : count FASTA/FASTQ reads, pair-end aware
+  • deinterleave [dei]   : deinterleave FASTQ
+  • derep [der]          : feature-rich dereplication of FASTA/FASTQ files
+  • interleave [ilv]     : interleave FASTQ pair ends
+  • sort [srt]           : sort sequences by size (uniques)
+  • stats [st]           : statistics on sequence lengths
+
+  • grep                 : select sequences with patterns
+  • head                 : print first sequences
+  • tail                 : view last sequences
+  • view                 : view sequences with colored quality and oligo matches
+
+Add --help after each command to print usage
 ```
-qax program parameters
-```
+
+## Subprograms
+
+*SeqFu* is bundled with an (increasing) set of utilities sharing the FASTX parsing library:
+* **fu-primers** to remove amplification primers from sequencing datasets
+* **fu-orf** to extract ORFs from Paired-End libraries
+* **fu-cov** to extract contigs from the most commonly used assembly programs using the coverage information printed in the headers
diff --git a/docs/3_examples.md b/docs/3_examples.md
deleted file mode 100644
index 91ef9bb..0000000
--- a/docs/3_examples.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-sort: 3
-permalink: /examples
----
-
-# Usage examples
-
-This page contains a small selection of examples for getting started using **seqfu**.
-
-Check the complete documentation for each [tool]({{site.baseurl}}/tools), that contains the detailed
-documentation.
diff --git a/docs/4_notes.md b/docs/4_notes.md
index c351faa..ca867b5 100644
--- a/docs/4_notes.md
+++ b/docs/4_notes.md
@@ -1,11 +1,13 @@
 ---
-sort: 6
-permalink: /about
+sort: 3
+permalink: /notes
 ---

 # About SeqFu

 This page contains a small selection of examples for getting started using **seqfu**.

-Check the complete documentation for each [tool]({{site.baseurl}}/tools), that contains the detailed
-documentation.
+The main parsing library is `klib.nim` by Heng Li ([lh3/biofast](https://github.com/lh3/biofast)), that provides good performances.
+
+For some utilities the *readfq* library has been used ([andreas-wilm/nimreadfq](https://github.com/andreas-wilm/nimreadfq)). This is based on the
+C version of Heng Li's parsed, wrapped in an object oriented module.
\ No newline at end of file
diff --git a/docs/tools/2.1_usage_interleave.md b/docs/tools/2.1_usage_interleave.md
index e628a7b..f0e1af7 100644
--- a/docs/tools/2.1_usage_interleave.md
+++ b/docs/tools/2.1_usage_interleave.md
@@ -3,12 +3,30 @@ sort: 1
 ---
 # seqfu interleave

-```note
-This page is a stub
-```
+*interleave* (or *ilv*) is one of the core subprograms of *SeqFu*.
+It's used to produce an _interleaved FASTQ file_ from two separate
+files containing the forward and the reverse read of a paired-end
+fragment.
-`seqfu` is composed by several subprogram, and the general syntax is:
+```text
+ilv: interleave FASTQ files

-```
-qax program parameters
-```
+  Usage: ilv [options] -1 [-2 ]
+
+  -f --for-tag string identifying forward files [default: auto]
+  -r --rev-tag string identifying forward files [default: auto]
+  -o --output save file to instead of STDOUT
+  -c --check enable careful mode (check sequence names and numbers)
+  -v --verbose print verbose output
+
+  -s --strip-comments skip comments
+  -p --prefix "string" rename sequences (append a progressive number)
+
+guessing second file:
+  by default is scanned for _R1. and substitute with _R2.
+  if this fails, the patterns _1. and _2. are tested.
+
+example:
+
+  ilv -1 file_R1.fq > interleaved.fq
+```
\ No newline at end of file
diff --git a/docs/tools/2.2_usage_deinterleave.md b/docs/tools/2.2_usage_deinterleave.md
index bb72274..83cca60 100644
--- a/docs/tools/2.2_usage_deinterleave.md
+++ b/docs/tools/2.2_usage_deinterleave.md
@@ -3,12 +3,36 @@ sort: 2
 ---
 # seqfu deinterleave

-```note
-This page is a stub
-```
+*deinterleave* (or *dei*) is one of the core subprograms of *SeqFu*.
+It's used to produce two separate FASTQ files from an interleaved file.
-`seqfu` is composed by several subprogram, and the general syntax is:
+```text
+ilv: interleave FASTQ files

+  Usage: dei [options] -o basename
+
+  -o --output-basename "str" save output to output_R1.fq and output_R2.fq
+  -f --for-ext "R1" extension for R1 file [default: _R1.fq]
+  -r --rev-ext "R2" extension for R2 file [default: _R2.fq]
+  -c --check enable careful mode (check sequence names and numbers)
+  -v --verbose print verbose output
+
+  -s --strip-comments skip comments
+  -p --prefix "string" rename sequences (append a progressive number)
+
+notes:
+  use "-" as input filename to read from STDIN
+
+example:
+
+  dei -o newfile file.fq
 ```
-qax program parameters
-```
+
+
+### Streaming
+
+If a program produce an interleaved output, `seqfu deinterleave` can be used in a pipe (specifying "-" as input):
+
+```bash
+fu-primers -1 file_R1.fq -2 file_R2.fq | seqfu deinterleave -o fileNoPrimers -
+```
\ No newline at end of file
diff --git a/docs/tools/2.3_usage_count.md b/docs/tools/2.3_usage_count.md
new file mode 100644
index 0000000..8868355
--- /dev/null
+++ b/docs/tools/2.3_usage_count.md
@@ -0,0 +1,46 @@
+---
+sort: 3
+---
+# seqfu count
+
+*count* (or *cnt*) is one of the core subprograms of *SeqFu*.
+It's used to count the sequences in FASTA/FASTQ files, and it's _paired-end_ aware so
+it will print the count of both files in a single line, but checking that both
+files have the same number of sequences.
+
+```text
+Usage: count [options] [ ...]
+
+Options:
+  -a, --abs-path Print absolute paths
+  -b, --basename Print only filenames
+  -u, --unpair Print separate records for paired end files
+  -f, --for-tag R1 Forward tag [default: auto]
+  -r, --rev-tag R2 Reverse tag [default: auto]
+  -v, --verbose Verbose output
+  -h, --help Show this help
+```
+
+
+### Streaming
+
+Input from stream is supported.
+
+### Example output
+
+Output is a TSV text with three columns: sample name, number of reads and type ("SE" for Single End, "Paired" for Paired End)
+
+```text
+data/test.fastq 3 SE
+data/comments.fastq 5 SE
+data/test2.fastq 3 SE
+data/qualities.fq 5 SE
+data/illumina_1.fq.gz 7 Paired
+```
+
+In case of errors will print a warning:
+```text
+ERROR: Different counts in data/longerone_R1.fq.gz and data/longerone_R2.fq.gz
+# data/longerone_R1.fq.gz: 7
+# data/longerone_R2.fq.gz: 2
+```
\ No newline at end of file
diff --git a/docs/tools/2.4_usage_stats.md b/docs/tools/2.4_usage_stats.md
new file mode 100644
index 0000000..02c5796
--- /dev/null
+++ b/docs/tools/2.4_usage_stats.md
@@ -0,0 +1,41 @@
+---
+sort: 4
+---
+# seqfu stats
+
+*stats* is one of the core subprograms of *SeqFu*.
+
+```text
+Usage: stats [options] [ ...]
+
+Options:
+  -a, --abs-path Print absolute paths
+  -b, --basename Print only filenames
+  -n, --nice Print nice terminal table
+  --csv Separate with commas (default: tabs)
+  -v, --verbose Verbose output
+  -h, --help Show this help
+```
+
+
+### Example output
+
+Output is a TSV text with three columns (or CSV using `--csv`):
+```text
+File,#Seq,Sum,Avg,N50,N75,N90,Min,Max
+data/filt.fa.gz,78730,24299931,308.6,316,316,220,180,485
+```
+
+### Screen friendly output
+
+When using `-n` (`--nice`) output:
+
+```text
+seqfu stats data/filt.fa.gz -n
+┌─────────────────┬───────┬──────────┬───────┬─────┬─────┬─────┬─────┬─────┐
+│ File            │ #Seq  │ Total bp │ Avg   │ N50 │ N75 │ N90 │ Min │ Max │
+├─────────────────┼───────┼──────────┼───────┼─────┼─────┼─────┼─────┼─────┤
+│ data/filt.fa.gz │ 78730 │ 24299931 │ 308.6 │ 316 │ 316 │ 220 │ 180 │ 485 │
+└─────────────────┴───────┴──────────┴───────┴─────┴─────┴─────┴─────┴─────┘
+```
+
\ No newline at end of file
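
The CSV layout that `docs/tools/2.4_usage_stats.md` documents for `seqfu stats --csv` can be read with any CSV parser. The following is a minimal sketch in Python, not part of this patch or of SeqFu: the sample text is copied from the example output in that file, the function name `parse_stats_csv` is hypothetical, and the code assumes the nine-column header shown in that example.

```python
import csv
import io

# Sample text copied from the `--csv` example in docs/tools/2.4_usage_stats.md.
SAMPLE = """File,#Seq,Sum,Avg,N50,N75,N90,Min,Max
data/filt.fa.gz,78730,24299931,308.6,316,316,220,180,485
"""


def parse_stats_csv(text):
    """Parse CSV text in the documented layout into a list of dicts.

    Hypothetical helper: "File" stays a string, "Avg" becomes a float,
    and every other column is converted to an int.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"File": record["File"]}
        for key, value in record.items():
            if key == "File":
                continue
            # "Avg" is the only fractional column in the documented example.
            row[key] = float(value) if key == "Avg" else int(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_stats_csv(SAMPLE):
        print(row["File"], row["#Seq"], row["N50"])
```

In a real pipeline the text would come from the program's standard output rather than a string constant; the sketch leaves out how that output is captured.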