Code used for main bioinformatics workflow in "Functional consequences of reductive protein evolution in a minimal eukaryotic genome"

Jason-B-Jiang/reductive-microsporidia-evolution

Functional consequences of reductive protein evolution in a minimal eukaryotic genome

Authors: Jason Jiang, Rui Qu, Maria Grigorescu, Winnie Zhao, Aaron Reinke

Read the pre-print here: https://www.biorxiv.org/content/10.1101/2023.12.31.573788v2

This repo contains the environment and Snakemake pipeline needed to run the main workflow shown in Figure 1, which is used throughout the paper.

Prerequisites

  • Local installation of Singularity >= 3.10
  • Local installation of Python >= 3.10
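Both prerequisites can be checked up front with a short script like the sketch below. It assumes that `singularity --version` prints a line containing a dotted version number (for example `singularity-ce version 3.10.0`); the exact wording differs between builds, and an Apptainer install that provides the `singularity` command reports its own version numbering, so treat a failure as a prompt to check by hand. The helper names are illustrative and not part of this repo.

```python
import re
import shutil
import subprocess
import sys

MIN_VERSION = (3, 10)  # Singularity and Python share the same minimum here


def parse_version(text):
    """Return (major, minor) from the first dotted number in text, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def singularity_version():
    """Return Singularity's (major, minor), or None if it cannot be determined."""
    if shutil.which("singularity") is None:
        return None  # command not on PATH
    result = subprocess.run(
        ["singularity", "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    python_ok = sys.version_info[:2] >= MIN_VERSION
    found = singularity_version()
    singularity_ok = found is not None and found >= MIN_VERSION
    print(f"Python >= 3.10: {'ok' if python_ok else 'NOT FOUND'}")
    print(f"Singularity >= 3.10: {'ok' if singularity_ok else 'NOT FOUND'}")
```

Versions are compared as integer tuples rather than strings, so that 3.10 correctly sorts above 3.9.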

Set-up

1. Clone repo to local machine, then cd into repo

2. Run set-up script to initialize Python virtual environment for Snakemake

NOTE: if you run into permission issues, try running chmod u+x setup.sh

./setup.sh

source venv_snakemake/bin/activate

snakemake --help

3. Initialize Singularity container for OrthoFinder, HMMER, cath-resolve-hits + all required R packages

sudo singularity build container.sif container.def

4. Test Snakemake workflow + Singularity container

NOTE: you may see a message like "System has not been booted with systemd as init system (PID 1). Can't operate."

This does NOT affect our workflow, and only concerns datetime operations with R tidyverse (which we don't use)

cd workflow

snakemake --cores all --use-singularity singularity_test

Running the main workflow

1. Put your proteome FASTA files of interest into data/proteomes. Four partial sample proteomes are included for testing.

2. Open snakemake_config.yaml, and edit parameters as necessary

3. Download Pfam-A.hmm.gz and Pfam-A.clans.tsv.gz for Pfam 35.0 from https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam35.0/, extract both files, then move them to data/pfam (or to the location specified in snakemake_config.yaml)

4. Run main workflow

   cd workflow  # assuming you're not already in that folder
   
   snakemake --cores all --use-singularity
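Before launching step 4, the inputs from steps 1 and 3 can be sanity-checked with a script along the lines of the sketch below. It is not part of this repo. It assumes the default data/proteomes and data/pfam locations, that the extracted Pfam files are named Pfam-A.hmm and Pfam-A.clans.tsv, and that proteomes use one of a few common FASTA extensions (the README does not state which extensions the workflow accepts). If snakemake_config.yaml points elsewhere, change the paths to match.

```python
import gzip
import shutil
from pathlib import Path

PFAM_FILES = ("Pfam-A.hmm", "Pfam-A.clans.tsv")
# Assumed extensions; adjust to whatever your proteome files actually use.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def gunzip(archive):
    """Decompress archive (name.gz) next to itself and return the new path."""
    archive = Path(archive)
    target = archive.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def count_fasta_records(path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def preflight(proteome_dir, pfam_dir):
    """Return a list of problems; an empty list means the inputs look ready."""
    proteome_dir, pfam_dir = Path(proteome_dir), Path(pfam_dir)
    problems = []
    fastas = []
    if proteome_dir.is_dir():
        fastas = sorted(
            p for p in proteome_dir.iterdir() if p.suffix.lower() in FASTA_SUFFIXES
        )
    if not fastas:
        problems.append(f"no FASTA files found in {proteome_dir}")
    for fasta in fastas:
        if count_fasta_records(fasta) == 0:
            problems.append(f"{fasta.name} contains no sequence headers")
    for name in PFAM_FILES:
        if not (pfam_dir / name).is_file():
            problems.append(f"missing {pfam_dir / name}")
    return problems


if __name__ == "__main__":
    # Adjust these paths to wherever data/ lives relative to your working directory.
    for problem in preflight("data/proteomes", "data/pfam"):
        print("WARNING:", problem)
```

`gunzip` covers the "extract" part of step 3 for files that have already been downloaded, for example `gunzip("data/pfam/Pfam-A.hmm.gz")`. The script prints nothing when every check passes.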

Ancillary scripts to create figures (to be run interactively)

Note: these scripts require that you have already run the main workflow, so all orthogroups and domain architectures have already been assigned to your input proteomes.

Figure 2

Fig 2A - workflow/scripts/plot_ortholog_length_distributions.R

Fig 2B - workflow/scripts/plot_domain_arch_change_freqs.R

Fig 2C - workflow/scripts/compare_domain_and_linker_lengths.R

Figure 3

Fig 3B, 3C, 3D - workflow/scripts/classify_lost_c_terminal_residues.R

Figure 4

Fig 4B - workflow/scripts/annotate_domain_essentiality_with_ptcs.R

Figure 5

Fig 5A - workflow/scripts/domain_arch_hierarchical_clustering.R

Fig 5B - workflow/scripts/create_lost_doms_tsne.R

Figure S1

workflow/scripts/annotate_domain_essentiality_with_ptcs.R

Figure S2

workflow/scripts/plot_ortholog_length_distributions.R

Figure S3

workflow/scripts/compare_ortholog_lengths_to_percent_identities.R
