Implemented RepeatMasker and Liftoff Strategy #84

Conversation

zyosufzai

Description

To allow tostadas to annotate variola and mpox genes, this PR implements a RepeatMasker and Liftoff (CLI version) subworkflow. RepeatMasker finds the repeat regions and Liftoff annotates the remaining features. A concat GFF module was also created; it runs a Python script that concatenates the two GFFs into a GFF format suitable for GenBank submission. Full list of changes:

  • Added Liftoff CLI and concat modules
  • Implemented a new Python script to concatenate and correctly format the Liftoff and RepeatMasker GFFs
  • Added an entry subworkflow for the repeatmasker_liftoff strategy and modified the workflow and mpxv.nf files
  • Created two custom libraries for RepeatMasker, one for MPOX and one for VARV. To use one of ours, set the 'organism' field to 'mpox' or 'varv'. To add your own custom library, use 'params.repeat_lib =' in the config file
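The concatenation step described above can be illustrated with a minimal Python sketch. This is not the script added in this PR: the function name `concat_gff` is hypothetical, and the sketch assumes both inputs are tab-separated, nine-column GFF feature lines. It only merges and sorts features under a single header; it does not attempt any GenBank-specific reformatting.

```python
from typing import Iterable, List


def concat_gff(repeatmasker_lines: Iterable[str],
               liftoff_lines: Iterable[str]) -> List[str]:
    """Merge two GFF inputs into one list of lines (hypothetical helper)."""
    # Emit one shared header instead of repeating each file's directives.
    header = ["##gff-version 3"]
    features = []
    for source in (repeatmasker_lines, liftoff_lines):
        for line in source:
            line = line.rstrip("\n")
            # Skip blank lines and per-file comment/directive lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Assumption: a valid feature line has nine tab-separated columns.
            if len(fields) != 9:
                continue
            features.append(fields)
    # Order features by sequence ID, then start, then end coordinate.
    features.sort(key=lambda f: (f[0], int(f[3]), int(f[4])))
    return header + ["\t".join(f) for f in features]
```

Called with the lines of one RepeatMasker GFF and one Liftoff GFF, it returns a single list of lines that can be written out as the combined file.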

Checklist

Go Through Checklist Below and Place A ✔️ (X Inside the Box) if Completed

General Checks

  • Have you run appropriate tests (unit/integration/end-to-end) to check logic across run environments (Conda/Docker/Singularity on Scicomp/AWS/NF Tower/Local)?

    For each relevant configuration:

    • Can the program run completely through without erroring out? yes
    • Does it produce the expected outputs, given the inputs provided? yes
  • Have you conducted proper linting procedures?

    • Numpy formatted docstrings for functions
    • Comments explaining lines of code
    • Consistent and intuitive naming conventions for variables, functions, classes, methods, attributes, and scripts
    • Single empty line between class functions, two lines between non-class functions, and two lines between imports and code body
    • Camel case formatting for class names
  • [] Have you updated existing documentation (README.md, etc.) or created new ones within docs?

CDC Checks

  • Did you check for sensitive data, and remove any?
  • [] If you added or modified HTML, did you check that it was 508 compliant?

Are additional approvals needed for this change? If so, please mention them below:

Are there potential vulnerabilities or licensing issues with any new dependencies introduced? If so, please mention them below:
Although the modules have been configured for submission, submission has not been tested yet. Also, the concat module expects exactly one GFF each from RepeatMasker and Liftoff, because the CLI version of Liftoff doesn't handle multi-FASTA input (RepeatMasker does).
A potential workaround would be to create a sample sheet, or to incorporate a channel that splits the FASTA file into individual records, like so:

    Channel
        .fromPath(params.fasta_path)
        .splitFasta(record: [id: true, seqString: true])
        .set { ch_fasta }
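
For readers less familiar with Nextflow, the per-record split can be sketched in plain Python. This is an illustration only, not part of the PR: `split_fasta` is a hypothetical helper, and the `id` and `seqString` keys are chosen just to mirror the record fields requested in the channel above. Unlike the Nextflow operator, this sketch joins wrapped sequence lines into a single string, which is an assumption about what downstream steps would want.

```python
from typing import Dict, Iterable, Iterator


def split_fasta(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one record per sequence in a multi-FASTA input (hypothetical helper)."""
    seq_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield {"id": seq_id, "seqString": "".join(chunks)}
            # Assumption: the ID is the first whitespace-delimited token.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if seq_id is not None:
        yield {"id": seq_id, "seqString": "".join(chunks)}
```

Each yielded record could then be written to its own single-sequence FASTA file, so that Liftoff is run once per sequence and produces one GFF per RepeatMasker GFF.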

@zyosufzai zyosufzai linked an issue Nov 17, 2023 that may be closed by this pull request
1 task
@kyleoconnell kyleoconnell self-assigned this Dec 5, 2023
@kyleoconnell kyleoconnell changed the base branch from master to dev December 6, 2023 21:39
@kyleoconnell kyleoconnell merged commit 74b25e9 into dev Dec 6, 2023
@kyleoconnell kyleoconnell deleted the 65-external-feature-running-variola-sequences-through-the-pipeline branch December 6, 2023 21:39
Successfully merging this pull request may close these issues.

[External] [Feature] Running Variola Sequences through Pipeline