Commit

Merge pull request #14 from COMBINE-lab/develop
Develop
rob-p authored Jan 7, 2023
2 parents 39e552b + 815def0 commit fa48fe6
Showing 7 changed files with 1,276 additions and 10 deletions.
51 changes: 46 additions & 5 deletions README.md
@@ -13,20 +13,20 @@ The `pyroe` package provides useful functions for analyzing single-cell or singl
## Installation
The `pyroe` package can be accessed from its [github repository](https://github.com/COMBINE-lab/pyroe), installed via [`pip`](https://pip.pypa.io/en/stable/). To install the `pyroe` package via `pip` use the command:

```
```sh
pip install pyroe
```

To make use of the `load_fry` function (which, itself, installs [scanpy](https://scanpy.readthedocs.io/en/stable/)), you should also be sure to install the package with the `scanpy` extra:

```
```sh
pip install pyroe[scanpy]
```

Alternatively, `pyroe` can be installed via `bioconda`, which will automatically install the variant of the package including `load_fry`, and will
also install `bedtools` to enable faster construction of the *splici* reference (see below). This installation can be performed with the command:

```
```sh
conda install pyroe
```

@@ -39,7 +39,7 @@ The USA mode in alevin-fry requires a special index reference, which is called t

Following is an example of calling the `pyroe` to make the *splici* index reference. The final flank length is calculated as the difference between the read length and the flank_trim_length, i.e., 5-2=3. This function allows you to add extra spliced and unspliced sequences to the *splici* index, which will be useful when some unannotated sequences, such as mitochondrial genes, are important for your experiment. **Note** : to make `pyroe` work more quickly, it is recommended to have the latest version of [`bedtools`](https://bedtools.readthedocs.io/en/latest/) ([Aaron R. Quinlan and Ira M. Hall, 2010](https://doi.org/10.1093/bioinformatics/btq033)) installed.

```
```sh
pyroe make-splici extdata/small_example_genome.fa extdata/small_example.gtf 5 splici_txome \
--flank-trim-length 2 --filename-prefix transcriptome_splici --dedup-seqs
```
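
The flank-length arithmetic described above (read length minus `--flank-trim-length`) can be sketched in a few lines of Python. The function name `flank_length` and the range check are illustrative assumptions, not part of `pyroe`; only the subtraction itself comes from the text.

```python
def flank_length(read_length: int, flank_trim_length: int = 0) -> int:
    """Return the intron flank length: read length minus the trim length.

    Hypothetical helper for illustration; the range check is an assumption,
    not documented pyroe behaviour.
    """
    if flank_trim_length < 0 or flank_trim_length > read_length:
        raise ValueError("flank_trim_length must be between 0 and read_length")
    return read_length - flank_trim_length


# Values from the example command: read length 5, --flank-trim-length 2.
print(flank_length(5, 2))  # 3
```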
@@ -90,6 +90,47 @@ optional arguments:

The *splici* index of a given species consists of the transcriptome of the species, i.e., the spliced transcripts, and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences. For more detailed information, please check the section S2 in the supplementary file of [alevin-fry manuscript](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2).
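
The collapsing step described above amounts to a standard interval merge. The following is a minimal sketch of that idea, assuming half-open `(start, end)` coordinates on a single gene; the helper names `add_flanks` and `collapse_overlaps` are hypothetical, and this is not `pyroe`'s actual implementation (which can delegate the work to `bedtools`).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def add_flanks(introns: List[Interval], flank: int) -> List[Interval]:
    """Extend each intron by `flank` bases on both sides, clamping at 0."""
    return [(max(0, start - flank), end + flank) for start, end in introns]


def collapse_overlaps(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping (or touching) intervals so each base appears once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of appending.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Three introns of one gene, flank length 3 (as in the example above).
introns = [(100, 200), (150, 300), (400, 500)]
print(collapse_overlaps(add_flanks(introns, 3)))  # [(97, 303), (397, 503)]
```

The first two flanked introns overlap and are collapsed into one sequence; the third stays separate.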

## Prepare spliceu index for quantification with alevin-fry

Recently, [He et al.](https://www.biorxiv.org/content/10.1101/2023.01.04.522742v1) introduced the <ins>*splice*</ins>d+<ins>*u*</ins>nspliced (_spliceu_) index in alevin-fry. This requires the _spliceu_ transcriptome. The command for making a *spliceu* transcriptome reference is similar to the one for making a _splici_ reference:

```sh
pyroe make-spliceu extdata/small_example_genome.fa extdata/small_example.gtf spliceu_txome \
--filename-prefix transcriptome_spliceu
```
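
When the command above has to be launched from a Python script, one option is to assemble the argument list programmatically. The sketch below is a hypothetical helper (`make_spliceu_argv` is not part of `pyroe`); it only uses the positional arguments and flags listed in the `make-spliceu` usage text, and it builds the command without running it.

```python
import shlex
from typing import List, Optional


def make_spliceu_argv(
    genome_path: str,
    gtf_path: str,
    output_dir: str,
    filename_prefix: Optional[str] = None,
    dedup_seqs: bool = False,
    no_bt: bool = False,
    write_clean_gtf: bool = False,
) -> List[str]:
    """Build the argv list for `pyroe make-spliceu` (illustrative helper)."""
    argv = ["pyroe", "make-spliceu", genome_path, gtf_path, output_dir]
    if filename_prefix is not None:
        argv += ["--filename-prefix", filename_prefix]
    # Boolean flags are appended only when requested.
    for enabled, flag in (
        (dedup_seqs, "--dedup-seqs"),
        (no_bt, "--no-bt"),
        (write_clean_gtf, "--write-clean-gtf"),
    ):
        if enabled:
            argv.append(flag)
    return argv


argv = make_spliceu_argv(
    "extdata/small_example_genome.fa",
    "extdata/small_example.gtf",
    "spliceu_txome",
    filename_prefix="transcriptome_spliceu",
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where `pyroe` is installed.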

### Full usage
```
usage: pyroe make-spliceu [-h] [--filename-prefix FILENAME_PREFIX]
                          [--extra-spliced EXTRA_SPLICED]
                          [--extra-unspliced EXTRA_UNSPLICED]
                          [--bt-path BT_PATH] [--no-bt] [--dedup-seqs]
                          [--write-clean-gtf]
                          genome-path gtf-path output-dir
positional arguments:
  genome-path           The path to a genome fasta file.
  gtf-path              The path to a gtf file.
  output-dir            The output directory where Spliceu reference files
                        will be written.
options:
  -h, --help            show this help message and exit
  --filename-prefix FILENAME_PREFIX
                        The file name prefix of the generated output files.
  --extra-spliced EXTRA_SPLICED
                        The path to an extra spliced sequence fasta file.
  --extra-unspliced EXTRA_UNSPLICED
                        The path to an extra unspliced sequence fasta file.
  --bt-path BT_PATH     The path to bedtools v2.30.0 or greater.
  --no-bt               A flag indicates whether bedtools will be used for
                        generating Spliceu reference files.
  --dedup-seqs          A flag indicates whether identical sequences will be
                        deduplicated.
  --write-clean-gtf     A flag indicates whether a clean gtf will be written
                        if encountered invalid records.
```

## Processing alevin-fry quantification result

The quantification result of alevin-fry can be loaded into Python with the `load_fry()` function. This function takes an output directory produced by the `alevin-fry quant` command as its minimum input and loads the quantification result as an `AnnData` object. When processing a USA-mode result, it assumes that the data come from a single-cell RNA-sequencing experiment. To process single-nucleus RNA-sequencing data, or to prepare single-cell data for RNA-velocity analysis, set the `output_format` argument to `snRNA` or `velocity`, respectively. One can also define a custom output format; see the Full Usage section for details.
@@ -176,7 +217,7 @@ We provide two python functions:
- `load_processed_quant()` can fetch the quantification result of one or more available dataset as `fetch_processed_quant()`, and load them into python as `AnnData` objects. We also provide a CLI for fetching quantification results.


```bash
```sh
pyroe fetch-quant 1 3 6
```

75 changes: 74 additions & 1 deletion bin/pyroe
100644 → 100755
@@ -2,7 +2,7 @@

import logging

from pyroe import make_splici_txome
from pyroe import make_splici_txome, make_spliceu_txome
from pyroe import fetch_processed_quant
from pyroe import convert
from pyroe import id_to_name
@@ -22,12 +22,15 @@ if __name__ == "__main__":
    parser.add_argument(
        "-v", "--version", action="version", version=f"pyroe {__version__}"
    )

    subparsers = parser.add_subparsers(
        title="subcommands",
        dest="command",
        description="valid subcommands",
        help="additional help",
    )

    # make-splici
    parser_makeSplici = subparsers.add_parser(
        "make-splici", help="Make splici reference"
    )
@@ -101,6 +104,63 @@ if __name__ == "__main__":
        help="A flag indicates whether a clean gtf will be written if encountered invalid records.",
    )

    # make-spliceu
    parser_makeSpliceu = subparsers.add_parser(
        "make-spliceu", help="Make spliceu reference"
    )
    parser_makeSpliceu.add_argument(
        "genome_path",
        metavar="genome-path",
        type=str,
        help="The path to a genome fasta file.",
    )
    parser_makeSpliceu.add_argument(
        "gtf_path", metavar="gtf-path", type=str, help="The path to a gtf file."
    )
    parser_makeSpliceu.add_argument(
        "output_dir",
        metavar="output-dir",
        type=str,
        help="The output directory where Spliceu reference files will be written.",
    )
    parser_makeSpliceu.add_argument(
        "--filename-prefix",
        type=str,
        default="spliceu",
        help="The file name prefix of the generated output files.",
    )
    parser_makeSpliceu.add_argument(
        "--extra-spliced",
        type=str,
        help="The path to an extra spliced sequence fasta file.",
    )
    parser_makeSpliceu.add_argument(
        "--extra-unspliced",
        type=str,
        help="The path to an extra unspliced sequence fasta file.",
    )
    parser_makeSpliceu.add_argument(
        "--bt-path",
        type=str,
        default="bedtools",
        help="The path to bedtools v2.30.0 or greater.",
    )
    parser_makeSpliceu.add_argument(
        "--no-bt",
        action="store_true",
        help="A flag indicates whether bedtools will be used for generating Spliceu reference files.",
    )
    parser_makeSpliceu.add_argument(
        "--dedup-seqs",
        action="store_true",
        help="A flag indicates whether identical sequences will be deduplicated.",
    )
    parser_makeSpliceu.add_argument(
        "--write-clean-gtf",
        action="store_true",
        help="A flag indicates whether a clean gtf will be written if encountered invalid records.",
    )

    # parse available datasets
    available_datasets = fetch_processed_quant()
    epilog = "\n".join(
@@ -212,6 +272,19 @@ if __name__ == "__main__":
            no_flanking_merge=args.no_flanking_merge,
            write_clean_gtf=args.write_clean_gtf,
        )
    elif args.command == "make-spliceu":
        make_spliceu_txome(
            genome_path=args.genome_path,
            gtf_path=args.gtf_path,
            output_dir=args.output_dir,
            filename_prefix=args.filename_prefix,
            extra_spliced=args.extra_spliced,
            extra_unspliced=args.extra_unspliced,
            dedup_seqs=args.dedup_seqs,
            no_bt=args.no_bt,
            bt_path=args.bt_path,
            write_clean_gtf=args.write_clean_gtf,
        )
    elif args.command == "fetch-quant":
        fetch_processed_quant(
            dataset_ids=args.dataset_ids,
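
For readers unfamiliar with the pattern used in `bin/pyroe`, the following standalone sketch shows how an `argparse` subparser with `dest="command"` and hyphenated `metavar` names ends up exposing underscore attributes such as `args.genome_path`, which is what the `elif args.command == "make-spliceu"` branch relies on. The program name `demo` and the reduced set of arguments are assumptions made for brevity; this is not the full `pyroe` parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a cut-down parser that mirrors the make-spliceu subcommand."""
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(title="subcommands", dest="command")

    p = subparsers.add_parser("make-spliceu", help="Make spliceu reference")
    # The first string is the attribute name; metavar only changes the help text.
    p.add_argument("genome_path", metavar="genome-path", type=str)
    p.add_argument("gtf_path", metavar="gtf-path", type=str)
    p.add_argument("output_dir", metavar="output-dir", type=str)
    # Hyphens in optional flags are converted to underscores automatically.
    p.add_argument("--filename-prefix", type=str, default="spliceu")
    p.add_argument("--no-bt", action="store_true")
    return parser


args = build_parser().parse_args(
    ["make-spliceu", "genome.fa", "genes.gtf", "out", "--no-bt"]
)
print(args.command, args.genome_path, args.filename_prefix, args.no_bt)
```

Dispatching on `args.command` then reduces to a chain of string comparisons, as in the diff above.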
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = pyroe
version = 0.6.4
version = 0.7.0
author = Dongze He, Rob Patro
author_email = [email protected], [email protected]
description = utilities of alevin-fry
4 changes: 2 additions & 2 deletions src/pyroe/__init__.py
@@ -1,7 +1,7 @@
__version__ = "0.6.4"
__version__ = "0.7.0"

from pyroe.load_fry import load_fry
from pyroe.make_splici_txome import make_splici_txome
from pyroe.make_txome import make_splici_txome, make_spliceu_txome
from pyroe.fetch_processed_quant import fetch_processed_quant
from pyroe.load_processed_quant import load_processed_quant
from pyroe.ProcessedQuant import ProcessedQuant