Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate lens and find out what tools can be integrated into version 2 #22

Closed
nikhil opened this issue Mar 21, 2024 · 5 comments
Closed
Assignees

Comments

@nikhil
Copy link
Member

nikhil commented Mar 21, 2024

No description provided.

@nikhil
Copy link
Member Author

nikhil commented Mar 21, 2024

@nikhil nikhil changed the title investiage lens and find out what tools can be intergrated into version 2 investigate lens and find out what tools can be integrated into version 2 Mar 21, 2024
@nikhil
Copy link
Member Author

nikhil commented Mar 21, 2024

Investigate how it works with indels

@nikhil
Copy link
Member Author

nikhil commented Apr 4, 2024

Currently we are looking at these two scripts and seeing what we are doing differently

https://github.com/taylor-lab/neoantigen-dev/blob/master/neoantigen.py

https://github.com/spvensko/lenstools/blob/main/lenstools.py

@pintoa1-mskcc
Copy link
Collaborator

pintoa1-mskcc commented Apr 11, 2024

Comparing how peptides are generated between lenstools and our neoantigen script, our script is a lot more elegant. Allison had asked me how we generate the peptides compared to lenstools, primarily asking if they translate out the entire peptide or not. Neither script retains the entire peptide for netMHC, both add some padding so additional aas are added to upstream and downstream of the indel.

Lenstools uses this workflow: lenstools.nf (readme, called by: lens.nf-> neos.nf --> seq_variation.nf ) to generate var_tx_seqs. Var_tx_seqs are individual FA files PER MUTATION, generated from bcftools. Command: bcftools consensus -H A --mark-del X --mark-ins lc -f bcftools ref. I tried to find the input FA file, but its nearly impossible with how they buried it. Its in samtools.nf as samtools_faidx_fetch_somatic_folder_parameters, however without knowing the inputs I cannot tell what is what. From demo_inputs I would guess it is just the whole genome FASTA file, not a cds FA file.

Actual generation of peptides is here: make_indel_peptides_context
This script pulls the created FA file per variant as the FA sequence (selecting only cds regions). Then depending if the sequence is on the negative strand, it returns the reverse compliment of the sequence. Lenstools adds a 30 bp padding to conservative inframe indels (so 10 extra amino acids are added to both sides of the peptide from the indel, or if a stop codon is reached) and will translate frameshift indels from 30 bases upstream of the indel to the end of the peptide (or if a stop codon is reached). Script deletes any "X" created by bctools consensus and generated peptides off of the resulting sequence with Bio.Seq. This script never looks at the WT peptide sequence, entirely depends on bcftools consensus FA output.

Taylorlab script pulls the cds and cdna sequences from the provided reference files. Then utilizing the maf's HGVSc column, it alters the cds sequence by either inserting the insertion sequence or removing the reference allele for deletions at the position stated. Once the WT and MUT peptide sequences are identified, script manually converts entire sequence to amino acid sequences. Script then compares WT to MUT amino acid sequences, identifying location where AAs do not match. Script will then return the altered amino acid (plus 10 aas before and after altered region).

Result: Lenstools and taylorlab scripts both return peptides with a padding of 10 AAs. Primary difference appears to be flipping the sequence when the strand is negative, however I am unsure if that is due to the reference FASTA that lenstools inputs. It may not be a cds sequence as we are inputting.

@pintoa1-mskcc
Copy link
Collaborator

Discussed with mark, FA file is a genomic FASTA file, this is likely being done because prior to conversion of indel, germline variants are integrated. This may be something to look into further when we add germline variants to our pipeline.

Will continue investigating LENStools

@nikhil nikhil closed this as completed Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants