Add standard logging of config values #18

Open
joverlee521 opened this issue Nov 27, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@joverlee521
Contributor

Context

With many layers of Snakemake configs provided via default configs and/or CLI options (--configfile/--config), it is helpful to have a standard way of logging the config values used for a workflow run.

Possible solutions

  1. This is done in the ncov workflow with a dump_config rule. Users must specify the target with the same configs as their workflow run to see the config output.

  2. We could print out the config with each workflow run using the onstart handler. However, Snakemake docs note that these handlers are not triggered during dry-runs.

      onstart:
          import yaml, sys
          yaml.dump(config, sys.stdout, explicit_start=True, explicit_end=True)

  3. We could print out the config with each workflow run using Snakemake's logger:

      import yaml
      from snakemake.logging import logger

      # Use default configuration values. Override with Snakemake's --configfile/--config options.
      configfile: "config/defaults.yaml"

      logger.info(f"Config is:\n{yaml.dump(config, explicit_start=True, explicit_end=True)}")

  4. If the config output is too noisy, we can make it a debug level log that will only output if users provide the --verbose flag.

      import yaml
      from snakemake.logging import logger

      # Use default configuration values. Override with Snakemake's --configfile/--config options.
      configfile: "config/defaults.yaml"

      logger.debug(f"Config is:\n{yaml.dump(config, explicit_start=True, explicit_end=True)}")
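
The logger-based options format the whole config with an f-string even when the message is later discarded. Below is a minimal sketch of deferring that work until the level is actually enabled. It uses Python's standard `logging` module as a stand-in, not Snakemake's own logger, and it assumes the config holds only plain YAML types (dicts, lists, strings, numbers). The names `render_config` and `log_config` are hypothetical.

```python
import logging

import yaml


def render_config(config: dict) -> str:
    """Render the config as one explicit YAML document, keeping key order."""
    return yaml.safe_dump(
        config, sort_keys=False, explicit_start=True, explicit_end=True
    )


def log_config(config: dict, log: logging.Logger, level: int = logging.INFO) -> None:
    """Log the config at `level`, skipping the YAML dump if it would be dropped."""
    if log.isEnabledFor(level):
        log.log(level, "Config is:\n%s", render_config(config))
```

Calling `log_config(config, log, logging.DEBUG)` would roughly mirror the debug-level variant, and the default mirrors the info-level one. Whether Snakemake's logger exposes the same `isEnabledFor` check is not confirmed here.
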
@joverlee521 joverlee521 added the enhancement New feature or request label Nov 27, 2023
@tsibley
Member

tsibley commented Jan 10, 2024

Option 3 is enticing because it means the actual config in use is always in build logs, so when something unexpectedly goes wrong you can inspect the config (without having to reconstruct it in a separate subsequent run).

@jameshadfield
Member

jameshadfield commented Jan 11, 2024

Option 3 Option 5 👍

@tsibley
Member

tsibley commented Dec 11, 2024

Option 5: Always dumping the final, fully-expanded, config-as-loaded to an output file, e.g. results/config.yaml.¹

This seems like the best option to me,

  • It's written to disk, so not hard to find in all the logging output or lost to scrollback limits.

  • It can be directly re-used as input config to reproduce/replicate the config for any given run (rather than having to be snipped out of logs, which can lead to errors).

and @jameshadfield and @joverlee521 +1'd this idea where I originally suggested it.

We could also do option 3 + 5 to get best of both worlds, which could be useful in rare cases when result files are lost (e.g. an out-of-disk-space Batch job), although 3 gets quite lengthy potentially.
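
A minimal sketch of what option 5 could look like, written as a plain Python helper that a Snakefile might call after all config layers are loaded. The function name `write_run_config` and the output path are hypothetical, and the sketch assumes the config contains only plain YAML types so that `yaml.safe_dump` can serialize it.

```python
from pathlib import Path

import yaml


def write_run_config(config: dict, path) -> Path:
    """Write the fully merged config to `path` as a single YAML document."""
    path = Path(path)
    # Create results/ (or whichever parent directory) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        # sort_keys=False keeps keys in the order they were loaded.
        yaml.safe_dump(
            config, handle, sort_keys=False, explicit_start=True, explicit_end=True
        )
    return path
```

Because the file is ordinary YAML, loading it back with `yaml.safe_load` should return the same mapping, which is what would make it usable as a `--configfile` input for a later run.
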

@genehack
Contributor

+1 for option 5, with the "yes, but" addition: don't call the output results/config.yaml; call it something like results/run_config.yaml or results/config_log.yaml. Consider including a timestamp in the name.
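
One way the timestamped naming could be sketched, using only the standard library. The helper name `run_config_path`, the `run_config` stem, and the compact UTC timestamp format are all assumptions for illustration, not an agreed convention.

```python
from datetime import datetime, timezone
from pathlib import Path


def run_config_path(results_dir="results", stem="run_config", now=None) -> Path:
    """Build a path like results/run_config_20241216T093000Z.yaml."""
    # Default to the current time; normalize whatever is given to UTC so the
    # trailing "Z" is accurate (naive datetimes are treated as local time).
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(results_dir) / f"{stem}_{stamp}.yaml"
```

A name that changes on every run is probably awkward to declare as a rule output, so this likely fits better with writing the file from top-level workflow code than from a rule.
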

@jameshadfield
Member

Related thread where discussion about this is happening in parallel.

@joverlee521
Contributor Author

Chatted with @j23414 and @kimandrews on this; they both like the config output file that can easily be re-used for another workflow run.

@tsibley
Member

tsibley commented Dec 16, 2024

@genehack

don't call the output results/config.yaml, call it something like results/run_config.yaml or results/config_log.yaml.

Why? What makes the addition of run_ or _log important/useful?

(I can guess, but my guesses aren't compelling reasons (in my estimation), so I'm interested in your rationale.)

@genehack
Contributor

@genehack

don't call the output results/config.yaml, call it something like results/run_config.yaml or results/config_log.yaml.

Why? What makes the addition of run_ or _log important/useful?

If it's not named config.yaml, it can't get confused with defaults/config.yaml.
