Update pipeline to download and use SRAs from GEO (#1)
* rework from sra-download

* linting

* through sras and trimming

* through making rsem-star reference

* adding velocity calculations

* working through making a_obs

* copying new results

* finish pipeline through making dataframes

* comment out testing

* update input link

* add kinase results to all to prompt full run
Anthony authored Jun 9, 2021
1 parent 5c77ce1 commit cb4a070
Showing 25 changed files with 1,892 additions and 204 deletions.
6 changes: 3 additions & 3 deletions .gitignore
@@ -1,5 +1,5 @@
.snakemake
ensembl
output
resources/ensembl
results
input
ESCG_data
**/.DS_Store
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "SingleCellProteogenomics"]
path = SingleCellProteogenomics
url = https://github.com/CellProfiling/SingleCellProteogenomics.git
12 changes: 7 additions & 5 deletions README.md
@@ -6,19 +6,21 @@ This repository contains the _snakemake_ pipeline for analyzing the RNA sequenci

## Single-cell sequencing files

The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773).
The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773).

This data is downloaded automatically in this pipeline.

## Updating the Ensembl version

The genome and ensembl versions are located at the top of the file `Snakefile`.
The genome and Ensembl versions are located at the top of the file `Snakefile`.
These can be updated, and the references will be downloaded automatically.

## Usage

1) Clone repository and initialize submodules: `git clone --recurse-submodules https://github.com/CellProfiling/FucciSingleCellSeqPipeline.git && cd FucciSingleCellSeqPipeline`
1) Install conda: https://docs.conda.io/en/latest/miniconda.html
2) Create the conda environment: `conda env create --file environment.yaml --name cellquant`
3) Activate the conda environment: `conda activate cellquant`
4) Run the workflow: recommended command is `snakemake --cores 24 --resources mem_mb=100000`, where you can subsitute the max number of cores and max memory allocation. The memory allocation should be at least 50000 MB if possible. It might work with 32000 MB, but no guarantees.
2) Install snakemake using conda: `conda install -c conda-forge snakemake-minimal`
4) Run the workflow: `snakemake --use-conda --cores 24 --resources mem_mb=100000`, where you can subsitute the max number of cores and max memory allocation. At least 54 GB of free memory should be available.

## Citation

1 change: 1 addition & 0 deletions SingleCellProteogenomics
65 changes: 0 additions & 65 deletions Snakefile

This file was deleted.

16 changes: 0 additions & 16 deletions environment.yaml

This file was deleted.

File renamed without changes.