[phylo github actions] add summary message #75
Closed: jameshadfield wants to merge 7 commits into `james/snakemake-simplifications` from `james/improved-gha-summaries`
This abstracts out the configuration into two separate YAML files. As a result the snakemake complexity is reduced and (hopefully) the main interaction point can now be the YAMLs themselves.

There are a few changes to the behaviour of the h5n1-cattle-outbreak builds:

* For individual segment builds we now use the h5n1-cattle-outbreak dropped list (previously we used the H5N1 drop list)
* For individual segment builds we now use the H5NX input data (previously we used the H5N1 input data)
* We no longer remove sequences via a clock filter

There are no changes to the behaviour of the GISAID builds.

The config is generally straightforward except where parameters differ for a genome build vs the corresponding segment builds. To avoid having to list the same parameters out 8 times, I implemented (e.g.) `config.traits.genome_columns` and `config.traits.columns`. This is only observed in the h5n1-cattle-outbreak config.
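As a rough illustration of the `genome_columns` / `columns` pattern described above, here is a minimal Python sketch. The helper name `resolve_param`, the `genome_` prefix lookup, and the example values are assumptions made for illustration; the actual Snakemake code in this branch may resolve these keys differently.

```python
def resolve_param(config, rule, key, segment):
    """Pick a rule parameter, preferring a genome-specific override.

    Hypothetical helper: looks for `genome_<key>` when the build is the
    genome build, and otherwise falls back to the shared `<key>` value.
    """
    rule_config = config[rule]
    genome_key = f"genome_{key}"
    if segment == "genome" and genome_key in rule_config:
        return rule_config[genome_key]
    return rule_config[key]


# Example config with made-up values, mirroring the shape of
# `config.traits.genome_columns` and `config.traits.columns`.
config = {
    "traits": {
        "columns": "region country",
        "genome_columns": "division",
    }
}

genome_value = resolve_param(config, "traits", "columns", "genome")
segment_value = resolve_param(config, "traits", "columns", "pb2")
```

With this layout the shared value is written once and applies to every segment build, while the genome build only needs the one extra key.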
The rules in `common.smk` were separated out to reduce rule duplication between the main Snakefile and Snakefile.genome. The latter has since been integrated into the main Snakefile, and so we do the same with these "common" rules.
This results in disjoint sets of filenames for the GISAID builds (config/gisaid.yaml) and the NCBI builds (config/h5n1-cattle-outbreak.yaml), which therefore allows you to run each set of builds locally without one interfering with the other. In addition, the way local-ingest data can be used is streamlined so that you can achieve the same outcome with local data. Note that if you run (e.g.) GISAID builds using local data and then run them with S3 data, all the intermediate files will be regenerated. In other words, you cannot maintain parallel "versions" of these simultaneously.
Makes listing / looking at the results files a more pleasant experience. There should be no changes to behaviour with this commit.
The pipeline already adds this field to the metadata TSV in use, but it won't be exported without this addition to the auspice-config JSON. Note that the clade definitions haven't been regenerated for NCBI data, so there are actually no clades defined at the moment, and thus nothing is exported.
to reflect the changes made in the previous few commits. The addition of "genome" to the h5n1-cattle-outbreak config YAML is needed to make it an explicit output of the `all` rule, and this output is what's used by the `deploy_all` rule.
The `pathogen-repo-build` reusable action adds an extremely helpful summary describing the AWS run. This adds some similarly helpful info which should make it much simpler to check the results of a phylo run.
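For context, GitHub Actions renders any Markdown appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable as the job summary. The sketch below shows one way such a summary could be produced in Python. The function names, the heading text, and the example build name and URL are assumptions for illustration and are not taken from this PR, which may well write the summary directly from a workflow shell step.

```python
import os


def build_summary(build_name, dataset_urls):
    """Format a short Markdown summary for a phylo run (hypothetical layout)."""
    lines = [f"## Phylo run: {build_name}", ""]
    for url in dataset_urls:
        lines.append(f"- {url}")
    return "\n".join(lines) + "\n"


def write_summary(markdown, path=None):
    """Append Markdown to the job summary file.

    Falls back to printing when GITHUB_STEP_SUMMARY is unset, e.g. when
    the script is run outside of GitHub Actions.
    """
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print(markdown)
        return
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown)


# Placeholder build name and URL, for illustration only.
summary = build_summary("example-build", ["https://example.org/dataset"])
```

Appending rather than overwriting keeps any summary content written by earlier steps in the same job.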
joverlee521 approved these changes on Jul 16, 2024
jameshadfield force-pushed the `james/snakemake-simplifications` branch from 736b114 to 6b9bc7c on July 16, 2024 23:24
Cherry-picked into #72 - thanks for taking a look @joverlee521!
The `pathogen-repo-build` reusable action adds an extremely helpful summary describing the AWS run. This adds some similarly helpful info which should make it much simpler to check the results of a phylo run.

Trial run(s):