docs
telatin committed Feb 24, 2021
1 parent b416789 commit e14e013
Showing 8 changed files with 186 additions and 49 deletions.
22 changes: 7 additions & 15 deletions docs/1_install.md
@@ -5,26 +5,18 @@ permalink: /installation

# Installation

## Pre-compiled binaries

Pre-compiled binaries are the fastest and easiest way to get _qax_. To get the latest version,
use the following command, otherwise check the [stable releases](https://github.com/telatin/qax/releases).
## Install via Miniconda

The recommended installation method is via BioConda, if you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)):

```
# From linux
wget "https://github.com/telatin/seqfu2/raw/main/bin/seqfu"
chmod +x seqfu
# From macOS
wget -O seqfu "https://github.com/telatin/seqfu2/raw/main/bin/seqfu_mac"
chmod +x seqfu
conda install -c conda-forge -c bioconda seqfu
```

## Install via Miniconda

Alternatively, you can install **seqfu** from BioConda, if you have _conda_ installed ([how to install it](https://docs.conda.io/en/latest/miniconda.html)):
## Pre-compiled binaries

```
conda install -c conda-forge -c bioconda seqfu
```
Pre-compiled binaries are distributed with the [stable releases](https://github.com/telatin/qax/releases).

37 changes: 31 additions & 6 deletions docs/2_usage.md
@@ -4,12 +4,37 @@ permalink: /usage
---
# Short guide

```note
This page is a stub
```
*SeqFu* is composed of a main program with multiple subprograms, and a set of utilities.
See the page of each [tool]({{site.baseurl}}/tools) for its detailed documentation.

`seqfu` is composed of several subprograms, and the general syntax is:

## Main program

If invoked without parameters, *SeqFu* will print the list of subprograms:

```text
SeqFU - Sequence Fastx Utilities
version: 0.8.8
• count [cnt] : count FASTA/FASTQ reads, pair-end aware
• deinterleave [dei] : deinterleave FASTQ
• derep [der] : feature-rich dereplication of FASTA/FASTQ files
• interleave [ilv] : interleave FASTQ pair ends
• sort [srt] : sort sequences by size (uniques)
• stats [st] : statistics on sequence lengths
• grep : select sequences with patterns
• head : print first sequences
• tail : view last sequences
• view : view sequences with colored quality and oligo matches
Add --help after each command to print usage
```
qax program parameters
```

## Subprograms

*SeqFu* is bundled with a growing set of utilities sharing the FASTX parsing library:
* **fu-primers** to remove amplification primers from sequencing datasets
* **fu-orf** to extract ORFs from Paired-End libraries
* **fu-cov** to extract contigs from the most commonly used assembly programs using the coverage information printed in the headers
11 changes: 0 additions & 11 deletions docs/3_examples.md

This file was deleted.

10 changes: 6 additions & 4 deletions docs/4_notes.md
@@ -1,11 +1,13 @@
---
sort: 6
permalink: /about
sort: 3
permalink: /notes
---

# About SeqFu

This page contains a small selection of examples for getting started using **seqfu**.

See the page of each [tool]({{site.baseurl}}/tools) for its detailed documentation.
The main parsing library is `klib.nim` by Heng Li ([lh3/biofast](https://github.com/lh3/biofast)), which provides good performance.

Some utilities use the *readfq* library instead ([andreas-wilm/nimreadfq](https://github.com/andreas-wilm/nimreadfq)). It is based on the
C version of Heng Li's parser, wrapped in an object-oriented module.
32 changes: 25 additions & 7 deletions docs/tools/2.1_usage_interleave.md
@@ -3,12 +3,30 @@ sort: 1
---
# seqfu interleave

```note
This page is a stub
```
*interleave* (or *ilv*) is one of the core subprograms of *SeqFu*.
It's used to produce an _interleaved FASTQ file_ from two separate
files containing the forward and the reverse read of a paired-end
fragment.
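
To make the idea of an interleaved file concrete, here is a minimal Python sketch of the operation. It is an illustration only, not SeqFu's implementation (SeqFu is written in Nim); the function names `fastq_records` and `interleave` are hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Alternate records: R1 of pair 1, R2 of pair 1, R1 of pair 2, and so on."""
    rev = fastq_records(reverse)
    for r1 in fastq_records(forward):
        r2 = next(rev, None)
        if r2 is None:
            raise ValueError("reverse input has fewer records than forward input")
        yield from r1
        yield from r2
    if next(rev, None) is not None:
        raise ValueError("reverse input has more records than forward input")


# Toy data standing in for the contents of file_R1.fq and file_R2.fq
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
r2 = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]
print("\n".join(interleave(r1, r2)))
```

Each forward record is immediately followed by its mate, so a downstream tool can read pairs from a single stream.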

`seqfu` is composed of several subprograms, and the general syntax is:
```text
ilv: interleave FASTQ files
```
qax program parameters
```
Usage: ilv [options] -1 <forward-pair> [-2 <reverse-pair>]
-f --for-tag <tag-1> string identifying forward files [default: auto]
-r --rev-tag <tag-2> string identifying reverse files [default: auto]
-o --output <outputfile> save file to <out-file> instead of STDOUT
-c --check enable careful mode (check sequence names and numbers)
-v --verbose print verbose output
-s --strip-comments skip comments
-p --prefix "string" rename sequences (append a progressive number)
guessing second file:
by default <forward-pair> is scanned for _R1. and substitute with _R2.
if this fails, the patterns _1. and _2. are tested.
example:
ilv -1 file_R1.fq > interleaved.fq
```
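
The "guessing second file" rule in the help text can be sketched as follows. This is a hedged illustration of the documented behaviour, not the code SeqFu runs: the function name `guess_reverse` is hypothetical, and applying the substitution to the file name only (not the directory part) is an assumption of this sketch.

```python
import os
from typing import Optional

# Tag pairs tried in order, following the help text: first _R1./_R2., then _1./_2.
TAG_PAIRS = (("_R1.", "_R2."), ("_1.", "_2."))


def guess_reverse(forward_path: str) -> Optional[str]:
    """Return the guessed reverse file name, or None if no known tag is found."""
    directory, name = os.path.split(forward_path)
    for for_tag, rev_tag in TAG_PAIRS:
        if for_tag in name:
            # Assumption: only the first occurrence in the file name is replaced
            return os.path.join(directory, name.replace(for_tag, rev_tag, 1))
    return None


print(guess_reverse("file_R1.fq"))
print(guess_reverse("illumina_1.fq.gz"))
print(guess_reverse("reads.fq"))
```

When no tag matches, the sketch returns `None`; in that situation the reverse file would have to be given explicitly with `-2`.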
36 changes: 30 additions & 6 deletions docs/tools/2.2_usage_deinterleave.md
@@ -3,12 +3,36 @@ sort: 2
---
# seqfu deinterleave

```note
This page is a stub
```
*deinterleave* (or *dei*) is one of the core subprograms of *SeqFu*.
It's used to produce two separate FASTQ files from an interleaved file.

`seqfu` is composed of several subprograms, and the general syntax is:
```text
ilv: interleave FASTQ files
Usage: dei [options] -o basename <interleaved-fastq>
-o --output-basename "str" save output to output_R1.fq and output_R2.fq
-f --for-ext "R1" extension for R1 file [default: _R1.fq]
-r --rev-ext "R2" extension for R2 file [default: _R2.fq]
-c --check enable careful mode (check sequence names and numbers)
-v --verbose print verbose output
-s --strip-comments skip comments
-p --prefix "string" rename sequences (append a progressive number)
notes:
use "-" as input filename to read from STDIN
example:
dei -o newfile file.fq
```
qax program parameters
```


### Streaming

If a program produces interleaved output, `seqfu deinterleave` can be used in a pipe (specifying "-" as input):

```bash
fu-primers -1 file_R1.fq -2 file_R2.fq | seqfu deinterleave -o fileNoPrimers -
```
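
For readers unfamiliar with the format, the following Python sketch shows what deinterleaving does to such a stream. It is a simplified illustration rather than SeqFu's implementation: the name `deinterleave` is hypothetical, records are assumed to be four lines each, and the result is kept in memory instead of being written to `<basename>_R1.fq` and `<basename>_R2.fq`.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def deinterleave(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split an interleaved FASTQ stream into forward and reverse record lines."""
    it = iter(lines)
    forward: List[str] = []
    reverse: List[str] = []
    while True:
        # One pair = two consecutive four-line records (R1 followed by R2)
        pair = [line.rstrip("\n") for line in islice(it, 8)]
        if not pair:
            break
        if len(pair) != 8:
            raise ValueError("incomplete pair at end of interleaved input")
        forward.extend(pair[:4])
        reverse.extend(pair[4:])
    return forward, reverse


interleaved = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TTAA", "+", "IIII",
    "@read2/1", "GGCC", "+", "IIII",
    "@read2/2", "CCGG", "+", "IIII",
]
fwd, rev = deinterleave(interleaved)
print("\n".join(fwd))
print("\n".join(rev))
```

Because the input is consumed sequentially, the same logic works on a pipe (for instance `sys.stdin`) as well as on a file.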
46 changes: 46 additions & 0 deletions docs/tools/2.3_usage_count.md
@@ -0,0 +1,46 @@
---
sort: 3
---
# seqfu count

*count* (or *cnt*) is one of the core subprograms of *SeqFu*.
It counts the sequences in FASTA/FASTQ files and is _paired-end_ aware:
the two files of a pair are reported on a single line, after checking that both
files contain the same number of sequences.

```text
Usage: count [options] [<inputfile> ...]
Options:
-a, --abs-path Print absolute paths
-b, --basename Print only filenames
-u, --unpair Print separate records for paired end files
-f, --for-tag R1 Forward tag [default: auto]
-r, --rev-tag R2 Reverse tag [default: auto]
-v, --verbose Verbose output
-h, --help Show this help
```


### Streaming

Input from stream is supported.

### Example output

The output is tab-separated text with three columns: sample name, number of reads and type ("SE" for Single End, "Paired" for Paired End).

```text
data/test.fastq 3 SE
data/comments.fastq 5 SE
data/test2.fastq 3 SE
data/qualities.fq 5 SE
data/illumina_1.fq.gz 7 Paired
```

If the two files of a pair contain a different number of sequences, an error is printed:
```text
ERROR: Different counts in data/longerone_R1.fq.gz and data/longerone_R2.fq.gz
# data/longerone_R1.fq.gz: 7
# data/longerone_R2.fq.gz: 2
```
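
The paired-end check described above can be sketched in Python as follows. This is a hedged illustration of the behaviour shown in the example output, not SeqFu's code: `count_fastq` and `count_line` are hypothetical names, only four-line FASTQ is handled, and raising an exception on a mismatch is a choice made for this sketch.

```python
from typing import Iterable, Optional


def count_fastq(lines: Iterable[str]) -> int:
    """Count records in a four-line FASTQ stream (blank lines are ignored)."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("number of lines is not a multiple of four")
    return n_lines // 4


def count_line(name: str, forward: Iterable[str],
               reverse: Optional[Iterable[str]] = None) -> str:
    """Build one tab-separated output line: name, read count, SE or Paired."""
    n_forward = count_fastq(forward)
    if reverse is None:
        return f"{name}\t{n_forward}\tSE"
    n_reverse = count_fastq(reverse)
    if n_forward != n_reverse:
        raise ValueError(f"Different counts: {n_forward} and {n_reverse}")
    return f"{name}\t{n_forward}\tPaired"


r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "GGCC", "+", "IIII"]
r2 = ["@a/2", "TTAA", "+", "IIII", "@b/2", "CCGG", "+", "IIII"]
print(count_line("sample_R1.fq", r1))
print(count_line("sample_R1.fq", r1, r2))
```

Reporting a pair on one line keeps the table compact while still catching truncated or mismatched files.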
41 changes: 41 additions & 0 deletions docs/tools/2.4_usage_stats.md
@@ -0,0 +1,41 @@
---
sort: 4
---
# seqfu stats

*stats* is one of the core subprograms of *SeqFu*.

```text
Usage: stats [options] [<inputfile> ...]
Options:
-a, --abs-path Print absolute paths
-b, --basename Print only filenames
-n, --nice Print nice terminal table
--csv Separate with commas (default: tabs)
-v, --verbose Verbose output
-h, --help Show this help
```


### Example output

The output is tab-separated text (or CSV using `--csv`) with nine columns:
```text
File,#Seq,Sum,Avg,N50,N75,N90,Min,Max
data/filt.fa.gz,78730,24299931,308.6,316,316,220,180,485
```
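
The N50, N75 and N90 columns follow the usual assembly-statistics definition: with sequences sorted from longest to shortest, Nx is the length of the sequence at which the running total first reaches x% of all bases. The Python sketch below implements that common definition; whether SeqFu handles ties and rounding in exactly the same way is an assumption, and the function name `nx` is hypothetical.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx value (fraction=0.5 for N50) of a set of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First sequence whose cumulative length covers the requested share
        if running >= total * fraction:
            return length
    raise ValueError("no sequences given")


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
print("N50:", nx(lengths, 0.50))  # 10 + 9 + 8 = 27 covers half of 54, so N50 is 8
print("N75:", nx(lengths, 0.75))
print("N90:", nx(lengths, 0.90))
```

With this definition Nx can only stay equal or decrease as x grows, which matches the example row above (N50 = 316, N75 = 316, N90 = 220).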

### Screen friendly output

With `-n` (`--nice`) the output is printed as a terminal-friendly table:

```text
seqfu stats data/filt.fa.gz -n
┌─────────────────┬───────┬──────────┬───────┬─────┬─────┬─────┬─────┬─────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ Min │ Max │
├─────────────────┼───────┼──────────┼───────┼─────┼─────┼─────┼─────┼─────┤
│ data/filt.fa.gz │ 78730 │ 24299931 │ 308.6 │ 316 │ 316 │ 220 │ 180 │ 485 │
└─────────────────┴───────┴──────────┴───────┴─────┴─────┴─────┴─────┴─────┘
```
