We introduced the 10 amino acid substitutions that differ between H3N2 A/Italy/11871/2020 (Italy20) and H3N2 A/Singapore/INFIMH-16-0019/2016 (Sing16) in all possible combinations (2^10 = 1024 variants) to probe for mutations that confer compatibility with the egg-adaptive mutation L194P.
- ./fasta/Italy20HA_mutlib_ref.fasta: Reference amino acid sequence of the Italy20 HA regions of interest (contains the L194P background)
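As a rough illustration of why 10 substitutions give 1024 variants, the sketch below enumerates every present/absent combination. The labels `sub1` to `sub10` are hypothetical placeholders, since the actual substitutions are not listed here, and this is not code from the repository.

```python
from itertools import product

# Hypothetical placeholder labels; the real substitutions are not listed here.
SUBSTITUTIONS = [f"sub{i}" for i in range(1, 11)]


def enumerate_variants(substitutions):
    """Return every combination of substitutions as a tuple of labels."""
    variants = []
    # Each substitution is either absent (0) or present (1).
    for mask in product((0, 1), repeat=len(substitutions)):
        variants.append(tuple(s for s, on in zip(substitutions, mask) if on))
    return variants


variants = enumerate_variants(SUBSTITUTIONS)
print(len(variants))  # 2**10 combinations, including the one with no substitutions
```

The empty tuple corresponds to the background sequence without any of the 10 substitutions, and the full tuple to the sequence carrying all of them.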
- Raw read files in fastq format from NIH SRA database BioProject PRJNA883249
- Merge overlapping paired-end reads using PEAR
pear -f [FASTQ FILE FOR FORWARD READ] -r [FASTQ FILE FOR REVERSE READ] -o [OUTPUT FASTQ FILE]
- Output files should be placed in a folder named fastq_merged/
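When several samples need merging, the PEAR call can be assembled programmatically. The helper below is only a sketch: the `_R1`/`_R2` file naming and the `pear_command` function are assumptions for illustration, not part of this repository. It uses only the `-f`, `-r`, and `-o` options shown above; PEAR treats the `-o` value as a base name for its output files.

```python
from pathlib import Path


def pear_command(forward, reverse, out_dir="fastq_merged"):
    """Build the PEAR argument list for one pair of read files.

    Assumes (hypothetically) that forward files are named <sample>_R1.fastq.
    """
    sample = Path(forward).name.replace("_R1", "").split(".")[0]
    out_base = Path(out_dir) / sample
    return ["pear", "-f", str(forward), "-r", str(reverse), "-o", str(out_base)]


cmd = pear_command("raw/rep1_R1.fastq", "raw/rep1_R2.fastq")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once PEAR is installed and the `fastq_merged/` folder exists.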
- Trim off the 5' and 3' flanking regions, count variants based on nucleotide sequences, translate the nucleotide sequences into amino acid sequences, and identify mutations
python3 script/Italy20_HA_fastq2enrich.py
- Input files:
- Merged read files in fastq_merged/ folder
- ./fasta/Italy20HA_mutlib_ref.fasta
- Output files:
- Input files:
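The trim, count, translate, and mutation-calling steps can be pictured with the minimal sketch below. It is not the logic of script/Italy20_HA_fastq2enrich.py: the fixed flank lengths, the function names, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def trim(read, left, right):
    """Drop `left` bases from the 5' end and `right` bases from the 3' end."""
    return read[left:len(read) - right]


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt_seq[i:i + 3], "X")
                   for i in range(0, len(nt_seq) - 2, 3))


def call_mutations(ref_aa, var_aa, first_pos=1):
    """List differences as e.g. 'L1P' (reference residue, position, variant residue)."""
    return [f"{r}{i + first_pos}{v}"
            for i, (r, v) in enumerate(zip(ref_aa, var_aa)) if r != v]


# Toy data: 2-base flanks around a 6-base region of interest.
reads = ["AACCGAAATT", "AACCGAAATT", "AACTGAAATT"]
counts = Counter(trim(r, 2, 2) for r in reads)
ref_aa = translate("CTGAAA")
for nt, n in counts.items():
    print(nt, n, call_mutations(ref_aa, translate(nt)))
```

Counting on nucleotide sequences before translating, as the step describes, keeps synonymous variants distinguishable until the amino acid level.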
- Filter out the variants that contain L194P
python3 script/Italy20_HA_filter.py
- Input files:
- Output files:
- Find variants that are highly enriched across both replicates
Rscript script/Italy20_HA_CutOff.R
- Input files:
- Output files:
- ./results/mutation_count_size43.tsv
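The idea of keeping only variants enriched in both replicates can be sketched as follows. This is not a port of script/Italy20_HA_CutOff.R: the enrichment definition (post-selection frequency divided by input frequency), the cut-off value, and the toy counts are all assumptions for illustration.

```python
def enrichment(post_count, input_count, post_total, input_total):
    """Assumed definition: post-selection frequency over input frequency."""
    return (post_count / post_total) / (input_count / input_total)


def enriched_in_both(rep1, rep2, cutoff):
    """Return variants whose enrichment meets the cut-off in both replicates."""
    return sorted(v for v in rep1
                  if v in rep2 and rep1[v] >= cutoff and rep2[v] >= cutoff)


# Toy enrichment values per variant for two replicates (made-up numbers).
rep1 = {"varA": 4.0, "varB": 0.5, "varC": 3.0}
rep2 = {"varA": 5.0, "varB": 6.0, "varC": 2.5}

hits = enriched_in_both(rep1, rep2, cutoff=2.0)
print(hits)
```

Requiring the threshold in both replicates, rather than on their average, prevents a single outlier replicate (like `varB` in the toy data) from passing the filter.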
- Plot the enrichment data across replicates
Rscript script/Italy20_plot_compare_rep.R
- Input files:
- Output files: