Read the pre-print here: https://www.biorxiv.org/content/10.1101/2023.12.31.573788v2
This repo contains the environment and Snakemake pipeline needed to run the main workflow shown in Figure 1, which is used throughout the paper.
- Local installation of Singularity >= 3.10
- Local installation of Python >= 3.10
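Before running the setup script, it can help to confirm both prerequisites meet the minimum versions. A minimal sketch follows; `version_ge` is a hypothetical helper (not part of this repo) that compares dotted version strings with `sort -V`:

```shell
# Hypothetical helper: succeed if version $1 >= version $2 (not part of the repo).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example checks, assuming both tools are on PATH:
# version_ge "$(singularity version)" 3.10 || echo "Singularity too old"
# version_ge "$(python3 -V | cut -d' ' -f2)" 3.10 || echo "Python too old"
```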
```shell
./setup.sh
source venv_snakemake/bin/activate
snakemake --help
```
3. Build the Singularity container for OrthoFinder, HMMER, cath-resolve-hits, and all required R packages
```shell
sudo singularity build container.sif container.def
```
NOTE: you may see a message like "System has not been booted with systemd as init system (PID 1). Can't operate."
This does NOT affect our workflow; it only concerns datetime operations in the R tidyverse (which we don't use).
```shell
cd workflow
snakemake --cores all --use-singularity singularity_test
```
1. Put your proteome FASTA files of interest into data/proteomes. Four partial sample proteomes are included for testing.
2. Download Pfam 35.0 from https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam35.0/: fetch Pfam-A.hmm.gz and Pfam-A.clans.tsv.gz, extract them, then move them to data/pfam (or to the location specified in snakemake_config.yaml).
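The Pfam download step above can be sketched as follows. This assumes the default data/pfam location and is run from the repo root; the actual download is commented out because the files are large:

```shell
# Sketch of fetching and staging the Pfam 35.0 files (default data/pfam location assumed).
PFAM_URL=https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam35.0
# wget "$PFAM_URL/Pfam-A.hmm.gz" "$PFAM_URL/Pfam-A.clans.tsv.gz"  # large download

mkdir -p data/pfam
for f in Pfam-A.hmm.gz Pfam-A.clans.tsv.gz; do
  if [ -f "$f" ]; then
    gunzip -f "$f"             # extract
    mv "${f%.gz}" data/pfam/   # move to the expected location
  fi
done
```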
```shell
cd workflow  # assuming you're not already in that folder
snakemake --cores all --use-singularity
```
Note: these scripts require that you have already run the main workflow, so all orthogroups and domain architectures have already been assigned to your input proteomes.
Fig 2A - workflow/scripts/plot_ortholog_length_distributions.R
Fig 2B - workflow/scripts/plot_domain_arch_change_freqs.R
Fig 2C - workflow/scripts/compare_domain_and_linker_lengths.R
Fig 3B, 3C, 3D - workflow/scripts/classify_lost_c_terminal_residues.R
Fig 4B - workflow/scripts/annotate_domain_essentiality_with_ptcs.R
Fig 5A - workflow/scripts/domain_arch_hierarchical_clustering.R
Fig 5B - workflow/scripts/create_lost_doms_tsne.R
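A minimal sketch of reproducing one panel, assuming the container image built above provides Rscript and that the scripts are run from the repo root (the exact invocation may differ):

```shell
# Hypothetical invocation for Fig 2A; swap the path for other panels from the list above.
FIG_SCRIPT=workflow/scripts/plot_ortholog_length_distributions.R
echo "singularity exec container.sif Rscript $FIG_SCRIPT"  # inspect the command
# singularity exec container.sif Rscript "$FIG_SCRIPT"     # then run it
```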
workflow/scripts/annotate_domain_essentiality_with_ptcs.R
workflow/scripts/plot_ortholog_length_distributions.R
workflow/scripts/compare_ortholog_lengths_to_percent_identities.R