Code for analyzing deep sequencing files of COV107-23 combinatorial mutations and for ddG simulations of COV107-23 mutations
Analysis of deep sequencing data following fluorescence-activated cell sorting of COV107-23 combinatorial mutants
The fastq files are too large to upload to GitHub. Create a /fastq/ folder, download the fastq files from NCBI, and move them into that folder.
- ./Fasta/COV107_germline_ref.fa: Amino acid sequence of COV107-23 germline
- ./fastq/Sample1_ATCACGAT_L001_R1_001.fastq: Input expression forward reads
- ./fastq/Sample1_ATCACGAT_L001_R2_001.fastq: Input expression reverse reads
- ./fastq/Sample2_CGATGTAT_L001_R1_001.fastq: Sorted expression forward reads (Replicate 1)
- ./fastq/Sample2_CGATGTAT_L001_R2_001.fastq: Sorted expression reverse reads (Replicate 1)
- ./fastq/Sample3_TTAGGCAT_L001_R1_001.fastq: Sorted expression forward reads (Replicate 2)
- ./fastq/Sample3_TTAGGCAT_L001_R2_001.fastq: Sorted expression reverse reads (Replicate 2)
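Before running the analysis, it can help to confirm that every fastq file listed above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_fastq` is hypothetical, and it assumes the script is run from the directory that contains the fastq folder.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FASTQ = [
    "Sample1_ATCACGAT_L001_R1_001.fastq",
    "Sample1_ATCACGAT_L001_R2_001.fastq",
    "Sample2_CGATGTAT_L001_R1_001.fastq",
    "Sample2_CGATGTAT_L001_R2_001.fastq",
    "Sample3_TTAGGCAT_L001_R1_001.fastq",
    "Sample3_TTAGGCAT_L001_R2_001.fastq",
]


def missing_fastq(fastq_dir="fastq", expected=EXPECTED_FASTQ):
    """Return the expected fastq file names that are absent from fastq_dir."""
    folder = Path(fastq_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_fastq()
    if missing:
        print("Missing fastq files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected fastq files are present.")
```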
- Calculate read counts and fitness from fastq files. All downloaded fastq files must be in the /fastq/ folder.
python scripts/COV107SHM_fq2fit.py
- Input files
- ./Fasta/COV107_germline_ref.fa: Amino acid sequence of COV107-23 germline
- ./fastq/Sample1_ATCACGAT_L001_R1_001.fastq: Input expression forward reads
- ./fastq/Sample1_ATCACGAT_L001_R2_001.fastq: Input expression reverse reads
- ./fastq/Sample2_CGATGTAT_L001_R1_001.fastq: Sorted expression forward reads (Replicate 1)
- ./fastq/Sample2_CGATGTAT_L001_R2_001.fastq: Sorted expression reverse reads (Replicate 1)
- ./fastq/Sample3_TTAGGCAT_L001_R1_001.fastq: Sorted expression forward reads (Replicate 2)
- ./fastq/Sample3_TTAGGCAT_L001_R2_001.fastq: Sorted expression reverse reads (Replicate 2)
- Output file
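This README does not define "fitness". As an illustration only, the sketch below uses one common definition for sorting experiments: the log10 enrichment of a variant (sorted frequency divided by input frequency), normalized to the enrichment of a reference variant such as the germline. The function name, the pseudocount, and the definition itself are assumptions and may differ from what scripts/COV107SHM_fq2fit.py computes.

```python
import math


def enrichment_fitness(input_counts, sorted_counts, reference, pseudocount=1):
    """Return {variant: log10 enrichment relative to the reference variant}.

    ASSUMPTION: this is a generic enrichment-ratio definition, not
    necessarily the formula used by the repository's script.
    """
    total_in = sum(input_counts.values())
    total_sorted = sum(sorted_counts.values())

    def enrichment(variant):
        # The pseudocount avoids division by zero and log10(0) for rare variants.
        freq_in = (input_counts.get(variant, 0) + pseudocount) / total_in
        freq_sorted = (sorted_counts.get(variant, 0) + pseudocount) / total_sorted
        return freq_sorted / freq_in

    ref_enrichment = enrichment(reference)
    return {
        variant: math.log10(enrichment(variant) / ref_enrichment)
        for variant in input_counts
    }
```

Under this definition the reference variant has a fitness of 0, and a variant that is enriched ten times less than the reference has a fitness of -1.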
- Filter results
python scripts/COV107SHM_filter_result.py
- Input file
- Output file
- Plot correlation of expression fitness between two independent experimental replicates
Rscript scripts/COV107_filtered.R
- Input file
- Output file
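For a quick numerical check of replicate agreement outside R, a correlation coefficient can be computed directly from two lists of fitness values. The sketch below assumes the Pearson coefficient is the statistic of interest. The README does not say which statistic scripts/COV107_filtered.R reports, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The two arguments would be the fitness values of the same variants, in the same order, from Replicate 1 and Replicate 2.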
- Python (version 3.9)
- Rosetta Software Suite
- An academic or commercial license is required. A license can be requested via the link above.
- PyMOL
- pdb-tools
- Load the pdb file (PDB: 7LKA) in PyMOL and remove the solvent by running, in the PyMOL terminal
remove solvent
- After removing the solvent, keep only chain A (antibody heavy chain) and chain B (antibody light chain) by removing all other chains. Run, in the PyMOL terminal
remove chain C+D+E+F+H+L
- Export the new molecule as COV107.pdb, retaining atom IDs.
- Using pdb_reres.py in pdb-tools and the COV107.pdb file, run in the terminal
python pdb_reres.py COV107.pdb > COV107_renum.pdb
COV107_renum.pdb is now ready to be used as input for ddG prediction using Rosetta.
II. Predicting ddG using a modified high-resolution protocol of the ddG_monomer application in Rosetta
Link to ddG_monomer documentation: https://www.rosettacommons.org/docs/latest/application_documentation/analysis/ddg-monomer
Instead of 50 iterations, only 30 iterations were performed (-ddg::iterations 30).
- Pre-minimize the input structure COV107_renum.pdb
nohup /path/to/rosetta/main/source/bin/minimize_with_cst.static.linuxgccrelease -s /path/to/COV107_renum.pdb -in:file:fullatom -ignore_zero_occupancy false -ignore_unrecognized_res -fa_max_dis 9.0 -database /path/to/rosetta/main/database/ -ddg::harmonic_ca_tether 0.5 -score:weights /path/to/rosetta/main/database/scoring/weights/pre_talaris_2013_standard.wts -restore_pre_talaris_2013_behavior -ddg::constraint_weight 1.0 -ddg::out_pdb_prefix min_cst_0.5 -ddg::sc_min_only false -score:patch /path/to/rosetta/main/database/scoring/weights/score12.wts_patch > mincst.log 2>&1 </dev/null &
- Input file
- Output file
- Convert the .log file to a .cst file
bash /path/to/rosetta/main/source/src/apps/public/ddg/convert_to_cst_file.sh ./mincst.log > ./Constraint.cst
- Input file
- Output file
- Run the ddG prediction in the background. Perform 3 independent replicates, each with its own output directory (for example, /path/to/F27I_rep1/ for replicate 1).
nohup /path/to/rosetta/main/source/bin/ddg_monomer.static.linuxgccrelease -in:file:s /path/to/min_cst_0.5.COV107_renum_0001.pdb -ignore_zero_occupancy false -resfile F27I.resfile -ddg:weight_file soft_rep_design -ddg:minimization_scorefunction /path/to/rosetta/main/database/scoring/weights/pre_talaris_2013_standard.wts -restore_pre_talaris_2013_behavior -ddg::minimization_patch /path/to/rosetta/main/database/scoring/weights/score12.wts_patch -database /path/to/rosetta/main/database/ -fa_max_dis 9.0 -ddg::iterations 30 -ddg::dump_pdbs true -ignore_unrecognized_res -ddg::local_opt_only false -ddg::min_cst true -constraints::cst_file /path/to/Constraint.cst -ddg::suppress_checkpointing true -in::file::fullatom -ddg::mean false -ddg::min true -ddg::sc_min_only false -ddg::ramp_repulsive true -unmute core.optimization.LineMinimizer -ddg::output_silent false -out:path:all /path/to/F27I_rep1/ 2>&1 </dev/null &
- Input file
- Output file
- ddg_predictions.out for each replicate
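The command above reads a resfile (F27I.resfile) whose contents are not shown in this README. The sketch below writes a single-point-mutation resfile using standard Rosetta resfile syntax (`NATAA` as the default behavior, `start`, then one `PIKAA` line). The helper name is hypothetical, and the choice of `NATAA`, chain A, and position 27 are assumptions: the residue number must match the numbering in COV107_renum.pdb, which may differ from the mutation label.

```python
def single_mutation_resfile(position, chain, new_aa):
    """Return resfile text that sets one residue to new_aa.

    ASSUMPTION: all other residues keep their native amino acid (NATAA);
    the resfile used by the authors may be different.
    """
    if len(new_aa) != 1 or not new_aa.isalpha():
        raise ValueError("new_aa must be a one-letter amino acid code")
    return "NATAA\nstart\n{} {} PIKAA {}\n".format(position, chain, new_aa.upper())


if __name__ == "__main__":
    # Hypothetical example: residue 27 of chain A mutated to isoleucine.
    with open("F27I.resfile", "w") as handle:
        handle.write(single_mutation_resfile(27, "A", "I"))
```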
- Compile total scores for all mutations.
- Input file
- Scores from ddg_predictions.out
- Output file
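The compilation step can be sketched as follows. This is not the repository's script. It assumes each ddg_predictions.out file is whitespace-delimited, with data lines of the form `ddG: <label> <total> ...` and a header line whose second field is `description`. Averaging the total score over replicates is also an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_total_scores(path):
    """Parse one ddg_predictions.out file into {mutation_label: total_score}."""
    scores = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        # Skip blank lines, lines without the "ddG:" tag, and the header row.
        if len(fields) < 3 or fields[0] != "ddG:" or fields[1] == "description":
            continue
        scores[fields[1]] = float(fields[2])
    return scores


def mean_total_scores(paths):
    """Average the total score of each mutation over several replicate files."""
    collected = defaultdict(list)
    for path in paths:
        for mutation, total in read_total_scores(path).items():
            collected[mutation].append(total)
    return {mutation: sum(v) / len(v) for mutation, v in collected.items()}
```

A call such as `mean_total_scores(["F27I_rep1/ddg_predictions.out", ...])` would take one path per replicate. Only the F27I_rep1 directory name appears in this README, so any other replicate directory names are up to the user.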