diff --git a/README.md b/README.md index 594eb89..0cc7d6c 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ The `pyroe` package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`, which consists of -1. Preparing the *splici* reference for the `USA` mode of `alevin-fry`, which will export a unspliced, a spliced, and an ambiguous molecule count for each gene within each cell. +1. Preparing the spliced + intronic (*splici*) or spliced + unspliced (*spliceu*) reference for the `USA` mode of `alevin-fry`, which will export a unspliced, a spliced, and an ambiguous molecule count for each gene within each cell. 2. Fetching and loading the preprocessed quantification results of `alevin-fry` into python as an [`AnnData`](https://anndata.readthedocs.io/en/latest/) object. 3. Converting the `mtx` format output of `alevin-fry` (specifically in USA mode) to other formats, such as the `AnnData` native [`h5ad` format](https://anndata.readthedocs.io/en/latest/generated/anndata.read_h5ad.html#anndata.read_h5ad). @@ -24,7 +24,7 @@ pip install pyroe[scanpy] ``` Alternatively, `pyroe` can be installed via `bioconda`, which will automatically install the variant of the package including `load_fry`, and will -also install `bedtools` to enable faster construction of the *splici* reference (see below). This installation can be performed with the command: +also install `bedtools` to enable faster construction of the ** reference (see below). This installation can be performed with the command: ```sh conda install pyroe @@ -33,25 +33,25 @@ conda install pyroe with the appropriate bioconda channel in the conda channel list. -## Preparing a splici index for quantification with alevin-fry +## Preparing a *spliced + intronic (_splici_)* index for quantification with alevin-fry -The USA mode in alevin-fry requires a special index reference, which is called the *splici* reference. 
The *splici* reference contains the spliced transcripts plus the intronic sequences of each gene. The `make_splici_txome()` function is designed to make the *splici* reference by taking a genome FASTA file and a gene annotation GTF file as the input. Details about the *splici* can be found in Section S2 of the supplementary file of the [alevin-fry paper](https://www.nature.com/articles/s41592-022-01408-3). To run pyroe, you also need to specify the read length argument `read_length` of the experiment you are working on and the flank trimming length `flank_trim_length`. A final flank length will be computed as the difference between the read_length and flank trimming length and will be attached to the ends of each intron to absorb the intron-exon junctional reads. +The USA mode in alevin-fry requires a special index reference. Specifically, it requires either a spliced + intronic (*splici*)reference or a spliced + unspliced (*spliceu*) reference. The spliced + intronic (*splici*) reference contains the spliced transcripts plus the (merged and collapsed) intronic sequences of each gene. The `make_splici_txome()` function is designed to make the spliced + intronic reference by taking a genome FASTA file and a gene annotation GTF file as the input. Details about the spliced + intronic can be found in Section S2 of the supplementary file of the [alevin-fry paper](https://www.nature.com/articles/s41592-022-01408-3). To run pyroe, you also need to specify the read length argument `read_length` of the experiment you are working on and the flank trimming length `flank_trim_length`. A final flank length will be computed as the difference between the read_length and flank trimming length and will be attached to the ends of each intron to absorb the intron-exon junctional reads. To make the splici index using `pyroe`, one can call `pyroe make-spliced+intronic` or its alias `pyroe make-splici`. 
-Following is an example of calling the `pyroe` to make the *splici* index reference. The final flank length is calculated as the difference between the read length and the flank_trim_length, i.e., 5-2=3. This function allows you to add extra spliced and unspliced sequences to the *splici* index, which will be useful when some unannotated sequences, such as mitochondrial genes, are important for your experiment. **Note** : to make `pyroe` work more quickly, it is recommended to have the latest version of [`bedtools`](https://bedtools.readthedocs.io/en/latest/) ([Aaron R. Quinlan and Ira M. Hall, 2010](https://doi.org/10.1093/bioinformatics/btq033)) installed. +Following is an example of calling the `pyroe` to make the *splici* index reference. The final flank length is calculated as the difference between the read length and the flank_trim_length, i.e., 5-2=3. This function allows you to add extra spliced and unspliced sequences to the spliced + intronic index, which will be useful when some unannotated sequences, such as mitochondrial genes, are important for your experiment. **Note** : to make `pyroe` work more quickly, it is recommended to have the latest version of [`bedtools`](https://bedtools.readthedocs.io/en/latest/) ([Aaron R. Quinlan and Ira M. Hall, 2010](https://doi.org/10.1093/bioinformatics/btq033)) installed. ```sh -pyroe make-splici extdata/small_example_genome.fa extdata/small_example.gtf 5 splici_txome \ +pyroe make-spliced+intronic extdata/small_example_genome.fa extdata/small_example.gtf 5 splici_txome \ --flank-trim-length 2 --filename-prefix transcriptome_splici --dedup-seqs ``` The `pyroe` program writes two files to your specified output directory `output_dir`. They are - A FASTA file that stores the extracted splici sequences. 
-- A three columns' transcript-name-to-gene-name file that stores the name of each transcript in the splici index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts. +- A three columns' transcript-name-to-gene-name file that stores the name of each transcript in the spliced + intronic index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts. ### Full usage ``` -usage: pyroe make-splici [-h] [--filename-prefix FILENAME_PREFIX] +usage: pyroe make-spliced+intronic [-h] [--filename-prefix FILENAME_PREFIX] [--flank-trim-length FLANK_TRIM_LENGTH] [--extra-spliced EXTRA_SPLICED] [--extra-unspliced EXTRA_UNSPLICED] @@ -86,22 +86,22 @@ optional arguments: adding flanking length. ``` -### the *splici* index +### the *spliced + intronic (splici)* index -The *splici* index of a given species consists of the transcriptome of the species, i.e., the spliced transcripts, and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences. For more detailed information, please check the section S2 in the supplementary file of [alevin-fry manuscript](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2). +The spliced + intronic index of a given species consists of the transcriptome of the species, i.e., the spliced transcripts, and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences. 
For more detailed information, please check the section S2 in the supplementary file of [alevin-fry manuscript](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2). -## Prepare spliceu index for quantification with alevin-fry +## Prepare *spliced + unspliced (_spliceu_)* index for quantification with alevin-fry -Recently, [He et al.](https://www.biorxiv.org/content/10.1101/2023.01.04.522742v1) introduced the *splice*d+*u*nspliced (_spliceu_) index in alevin-fry. This requires the _spliceu_ transcriptome. The command of making an *spliceu* transcriptome reference is similar to making a _splici_ reference: +Recently, [He et al.](https://www.biorxiv.org/content/10.1101/2023.01.04.522742v1) introduced the *splice*d+*u*nspliced (_spliceu_) index in alevin-fry. This requires the spliced + unspliced transcriptome. The command to make an spliced + unspliced transcriptome reference is similar to making a spliced + intronic reference. To make the splici index using `pyroe`, one can call `pyroe make-spliced+unspliced` or its alias `pyroe make-spliceu` ```sh -pyroe make-spliceu extdata/small_example_genome.fa extdata/small_example.gtf spliceu_txome \ +pyroe make-spliced+unspliced extdata/small_example_genome.fa extdata/small_example.gtf spliceu_txome \ --filename-prefix transcriptome_spliceu ``` ### Full usage ``` -usage: pyroe make-spliceu [-h] [--filename-prefix FILENAME_PREFIX] +usage: pyroe make-spliced+unspliced [-h] [--filename-prefix FILENAME_PREFIX] [--extra-spliced EXTRA_SPLICED] [--extra-unspliced EXTRA_UNSPLICED] [--bt-path BT_PATH] [--no-bt] [--dedup-seqs] @@ -350,4 +350,4 @@ optional arguments: The structure that U,S and A counts should occupy in the output matrix. --output-format OUTPUT_FORMAT The format in which the output should be written, one of {'loom', 'h5ad', 'zarr', 'csvs'}. 
-```
\ No newline at end of file
+```
diff --git a/bin/pyroe b/bin/pyroe
index c4a4296..7333349 100755
--- a/bin/pyroe
+++ b/bin/pyroe
@@ -32,7 +32,9 @@ if __name__ == "__main__":
 
     # make-splici
     parser_makeSplici = subparsers.add_parser(
-        "make-splici", help="Make splici reference"
+        "make-spliced+intronic",
+        help="Make spliced + intronic reference",
+        aliases=['make-splici']
     )
     parser_makeSplici.add_argument(
         "genome_path",
@@ -106,7 +108,9 @@ if __name__ == "__main__":
 
     # make-spliceu
     parser_makeSpliceu = subparsers.add_parser(
-        "make-spliceu", help="Make spliceu reference"
+        "make-spliced+unspliced",
+        help="Make spliced + unspliced reference",
+        aliases=['make-spliceu']
     )
     parser_makeSpliceu.add_argument(
         "genome_path",
diff --git a/setup.cfg b/setup.cfg
index 8c8bee3..bb13782 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 name = pyroe
-version = 0.7.0
+version = 0.7.1
 author = Dongze He, Rob Patro
 author_email = dhe17@umd.edu, rob@cs.umd.edu
 description = utilities of alevin-fry
diff --git a/src/pyroe/__init__.py b/src/pyroe/__init__.py
index 3a7b831..9ce7ffc 100644
--- a/src/pyroe/__init__.py
+++ b/src/pyroe/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "0.7.0"
+__version__ = "0.7.1"
 
 from pyroe.load_fry import load_fry
 from pyroe.make_txome import make_splici_txome, make_spliceu_txome
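
The `bin/pyroe` change above renames both subcommands and keeps the old names working through the `aliases` argument of `argparse`'s `add_parser`. The sketch below illustrates that mechanism in isolation. It is not the actual `bin/pyroe` code: the program name `pyroe-demo`, the `canonical` default, and the `resolve` helper are hypothetical names introduced only for this example, and only a single positional argument is declared.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI whose subcommands mirror the renamed ones in the diff."""
    # "pyroe-demo" is a hypothetical program name used only in this sketch.
    parser = argparse.ArgumentParser(prog="pyroe-demo")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    # New long name, with the old short name kept as an alias.
    splici = subparsers.add_parser(
        "make-spliced+intronic",
        help="Make spliced + intronic reference",
        aliases=["make-splici"],
    )
    splici.add_argument("genome_path")
    # "canonical" is a hypothetical attribute: it records which subcommand ran,
    # regardless of whether the long name or the alias was typed.
    splici.set_defaults(canonical="make-spliced+intronic")

    spliceu = subparsers.add_parser(
        "make-spliced+unspliced",
        help="Make spliced + unspliced reference",
        aliases=["make-spliceu"],
    )
    spliceu.add_argument("genome_path")
    spliceu.set_defaults(canonical="make-spliced+unspliced")
    return parser


def resolve(argv):
    """Return the canonical subcommand name for a given argument list."""
    return build_parser().parse_args(argv).canonical


if __name__ == "__main__":
    # Both spellings dispatch to the same subparser.
    print(resolve(["make-splici", "genome.fa"]))
    print(resolve(["make-spliced+intronic", "genome.fa"]))
```

Because an alias resolves to the same subparser object as the long name, existing scripts that call `make-splici` or `make-spliceu` should keep working after the rename, which fits a patch-level version bump from 0.7.0 to 0.7.1.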