YAML-based contrast definition #370

grst · 2024-12-03T10:54:16Z

Description of feature

In #362, we agreed that a yaml-based contrast sheet is more flexible than a csv-based sheet and is better suited to cover future developments of the pipeline. The aim of this issue is to come up with a specification of this format that shall then be defined as a json-schema for validation.

A very minimal version for feature parity with the current contrasts.csv could look something like

models:
  - formula: "~ treatment + response"
    contrasts:
      - id: treatment_A_vs_B
        comparison: ["treatment", "A", "B"]
      - id: response_responder_vs_non_responder
        comparison: ["response", "responder", "non_responder"]

How exactly to specify contrasts will be the topic of a separate issue.
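
For illustration only, here is a minimal Python sketch of how a downstream step might consume such a file. It assumes the YAML has already been parsed into plain dicts and lists (the literal below mirrors the example above) and that `comparison` is ordered as variable, reference, target, as in the current contrasts.csv. The function name `flatten_contrasts` and the record keys are hypothetical and not part of the pipeline.

```python
# The dict mirrors what a YAML parser would return for the minimal example above.
spec = {
    "models": [
        {
            "formula": "~ treatment + response",
            "contrasts": [
                {"id": "treatment_A_vs_B", "comparison": ["treatment", "A", "B"]},
                {
                    "id": "response_responder_vs_non_responder",
                    "comparison": ["response", "responder", "non_responder"],
                },
            ],
        }
    ]
}


def flatten_contrasts(spec):
    """Return one record per contrast, each carrying the formula of its model."""
    records = []
    for model in spec["models"]:
        for contrast in model["contrasts"]:
            # Assumption: comparison is [variable, reference, target].
            variable, reference, target = contrast["comparison"]
            records.append(
                {
                    "id": contrast["id"],
                    "formula": model["formula"],
                    "variable": variable,
                    "reference": reference,
                    "target": target,
                }
            )
    return records


for record in flatten_contrasts(spec):
    print(record["id"], record["formula"])
```

Flattening like this would keep one record per contrast, similar to a row of the current sheet, while the formula is only written once per model in the file.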

Beyond the minimal information, we could consider including additional parameters here:

method (in light of Facilitate iteration in differential subworkflow modules#7115)
some filtering parameters (e.g. filtering_min_abundance) as they could be method/contrast-specific
method-specific parameters
gene set paths (may be contrast-specific)

models: 
  - formula: ...
    contrasts: ...
    gene_sets:
       - "MSigDB:HALLMARK" # could be queried from omnipath
       - "MSigDB:GO"
       - /path_to_custom_geneset.gmt
    method: DESeq2
    filtering:
      min_abundance: 0.125
      ...
    deseq2:
      lfc_threshold: 1.25
      ...

The question is really what information we want to allow setting at a method/contrast level, and what is fixed globally for the entire pipeline run.
Specifying information locally increases flexibility, but potentially leads to redundant information (parameters need to be specified over and over for different models) and increases pipeline complexity (more information has to be tracked in meta rather than read from global parameters).
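
One possible middle ground, sketched below in Python under stated assumptions, is to keep global defaults and let a model override only the keys it sets. The default values, `GLOBAL_DEFAULTS`, `deep_merge` and `resolve_model_params` are all hypothetical names and numbers chosen for illustration. They are not existing pipeline parameters.

```python
# Hypothetical global defaults; names and values are for illustration only.
GLOBAL_DEFAULTS = {
    "method": "DESeq2",
    "filtering": {"min_abundance": 1.0},
    "deseq2": {"lfc_threshold": 0.0},
}


def deep_merge(defaults, overrides):
    """Return a new dict where overrides win, merging nested dicts key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_model_params(model, defaults=GLOBAL_DEFAULTS):
    """Combine global defaults with the settings given on a single model."""
    # formula and contrasts describe the model itself, not tunable parameters.
    overrides = {k: v for k, v in model.items() if k not in ("formula", "contrasts")}
    return deep_merge(defaults, overrides)


model = {
    "formula": "~ treatment",
    "contrasts": [],
    "filtering": {"min_abundance": 0.125},
}
params = resolve_model_params(model)
print(params)
```

With this kind of resolution, a model that sets nothing local behaves exactly like a global-only run, which would limit the redundancy mentioned above. The added bookkeeping in meta remains.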

CC @apeltzer @tschwarzl @atrigila @nschcolnicov @alanmmobbs93 @suzannejin


grst · 2024-12-03T12:54:39Z

Taking one step back, a first PR could replace contrasts.csv with contrasts.yaml without implementing any new features, such as formula.
In that case the yaml would look somewhat like

contrasts:
  - id: treatment_A_vs_B
    comparison: ["treatment", "A", "B"]
    blocking_factors: ["response"]

It would be a good first step as it already provides the modules (e.g. json validation) for future additions.

nschcolnicov · 2024-12-09T14:09:21Z

@grst I created this POC that only makes the transition from contrasts.csv to contrasts.yaml: #382

I tested it using the test_maxquant profile, replacing the old contrasts file with an equivalent yaml file.

Old contrasts.csv file:

id,variable,reference,target,blocking
genotype_celltype_t1_t2,Celltype,T1,T2,
genotype_celltype_t1_FoB,Celltype,T1,FoB,
genotype_celltype_t1_MZ_fakeBatch,Celltype,T1,MZ,fakeBatch
fakebatch_fakeBatch_b1_b2,fakeBatch,b1,b2,

MaxQuant_contrasts.yaml file:

contrasts:
  - id: genotype_celltype_t1_t2
    comparison: ["Celltype", "T1", "T2"]

  - id: genotype_celltype_t1_FoB
    comparison: ["Celltype", "T1", "FoB"]

  - id: genotype_celltype_t1_MZ_fakeBatch
    comparison: ["Celltype", "T1", "MZ"]
    blocking_factors: ["fakeBatch"]

  - id: fakebatch_fakeBatch_b1_b2
    comparison: ["fakeBatch", "b1", "b2"]
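
The mapping between the two files can be sketched in a few lines of Python. This is only an illustration of the correspondence, not the conversion code from the PR. It assumes that multiple blocking variables in the old sheet are separated by semicolons. The function name `csv_to_contrasts` is hypothetical.

```python
import csv
import io
import json

# The old comma-separated contrasts sheet, copied from above.
OLD_CONTRASTS = """\
id,variable,reference,target,blocking
genotype_celltype_t1_t2,Celltype,T1,T2,
genotype_celltype_t1_FoB,Celltype,T1,FoB,
genotype_celltype_t1_MZ_fakeBatch,Celltype,T1,MZ,fakeBatch
fakebatch_fakeBatch_b1_b2,fakeBatch,b1,b2,
"""


def csv_to_contrasts(text):
    """Convert the old sheet into the nested structure of the yaml file."""
    contrasts = []
    for row in csv.DictReader(io.StringIO(text)):
        entry = {
            "id": row["id"],
            "comparison": [row["variable"], row["reference"], row["target"]],
        }
        # Assumption: several blocking variables are separated by semicolons.
        blocking = [b for b in (row["blocking"] or "").split(";") if b]
        if blocking:
            entry["blocking_factors"] = blocking
        contrasts.append(entry)
    return {"contrasts": contrasts}


converted = csv_to_contrasts(OLD_CONTRASTS)
print(json.dumps(converted, indent=2))
```

Dumping `converted` with a YAML library would give a file equivalent to the one shown above. JSON is printed here only to keep the sketch free of third-party dependencies.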

Running it with
nextflow run ../../main.nf -profile docker,test_maxquant --outdir results -resume --contrasts MaxQuant_contrasts.yaml

This worked ok. I also ran the nf-test for this profile on this PR, passing it the new contrasts yaml file, and it passed without having to update the snapshots.

Some caveats that I see:
The current validate_fom_components.R script and a lot of the custom functions it uses rely on https://github.com/pinin4fjords/shinyngs, so we will have to update this tool or move some of its functions to custom bin scripts. The VALIDATOR process is also an nf-core module, so we will have to update that as well.

grst · 2024-12-09T14:57:15Z

Thanks @nschcolnicov!

I think shinyngs and differentialabundance are deeply interwoven and both under the control of @pinin4fjords. So ultimately, the best way forward seems to be updating shinyngs rather than duplicating code into the bin folder.

As a next step, could you please create a json schema for contrasts.yaml and introduce some logic to validate it? nf-schema claims it can also work with YAML-based samplesheets. If this works, it might be the most elegant solution.
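
To make the request concrete, here is a rough sketch of what such a schema could contain for the flat format discussed so far, written as a Python dict together with a small hand-coded check. Everything in it is an assumption for discussion. `CONTRASTS_SCHEMA` is not the schema from the PR, and `validate_contrasts` does not interpret the schema. It hand-codes the same rules plus a uniqueness check on `id`, so that the example runs without third-party packages.

```python
import json

# Hypothetical JSON Schema for the flat contrasts.yaml format (a discussion draft).
CONTRASTS_SCHEMA = {
    "type": "object",
    "required": ["contrasts"],
    "properties": {
        "contrasts": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id", "comparison"],
                "properties": {
                    "id": {"type": "string"},
                    "comparison": {
                        "type": "array",
                        "items": {"type": "string"},
                        "minItems": 3,
                        "maxItems": 3,
                    },
                    "blocking_factors": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
            },
        }
    },
}


def is_string_list(value):
    return isinstance(value, list) and all(isinstance(x, str) for x in value)


def validate_contrasts(doc):
    """Return a list of error messages; an empty list means the document passed."""
    contrasts = doc.get("contrasts") if isinstance(doc, dict) else None
    if not isinstance(contrasts, list):
        return ["top-level 'contrasts' must be a list"]
    errors = []
    seen_ids = set()
    for i, contrast in enumerate(contrasts):
        where = f"contrasts[{i}]"
        if not isinstance(contrast, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        cid = contrast.get("id")
        if not isinstance(cid, str) or not cid:
            errors.append(f"{where}: 'id' must be a non-empty string")
        elif cid in seen_ids:
            errors.append(f"{where}: duplicate id '{cid}'")
        else:
            seen_ids.add(cid)
        comparison = contrast.get("comparison")
        if not (is_string_list(comparison) and len(comparison) == 3):
            errors.append(f"{where}: 'comparison' must be [variable, reference, target]")
        if not is_string_list(contrast.get("blocking_factors", [])):
            errors.append(f"{where}: 'blocking_factors' must be a list of strings")
    return errors


good = {"contrasts": [{"id": "treatment_A_vs_B", "comparison": ["treatment", "A", "B"]}]}
bad = {"contrasts": [{"id": "x", "comparison": ["treatment", "A"]}]}
print(json.dumps(CONTRASTS_SCHEMA, indent=2))
print(validate_contrasts(good), validate_contrasts(bad))
```

In the pipeline itself the validation would preferably be delegated to nf-schema, as suggested above. The hand-coded function only shows which rules the schema is meant to express.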

nschcolnicov · 2024-12-10T14:29:20Z

@grst Waiting on a fix to nf-schema plugin to get the yaml validation to work: nextflow-io/nf-schema#79

pinin4fjords · 2024-12-11T17:35:10Z

Thanks all! I'd like to avoid bin scripts and keep things associated with the modules.

Happy for anyone to make contributions to shinyngs though!

nschcolnicov · 2024-12-18T15:31:23Z

@grst @pinin4fjords
I created a POC for supporting yaml contrasts file: #382
It currently supports a yaml contrasts file that has this format:

contrasts:
  - id: genotype_celltype_t1_t2
    comparison: ["Celltype", "T1", "T2"]

  - id: genotype_celltype_t1_FoB
    comparison: ["Celltype", "T1", "FoB"]

  - id: genotype_celltype_t1_MZ_fakeBatch
    comparison: ["Celltype", "T1", "MZ"]
    blocking_factors: ["fakeBatch"]

  - id: fakebatch_fakeBatch_b1_b2
    comparison: ["fakeBatch", "b1", "b2"]

I also created an issue in the shinyngs package to include these changes: pinin4fjords/shinyngs#67
And I created a POC PR for the tool as well: pinin4fjords/shinyngs#68

Before merging any of these PRs, we should align on exactly what format we would like the .yaml to have. @alanmmobbs93 proposed this format in this ticket #371 (comment):

models:
  - formula: "~ treatment"
    contrasts:
      - id: "treatment_mCherry_hND6"
        comparison: ["treatment", "mCherry", "hND6"]

      - id: "treatment_mCherry_hND6_sample_number"
        comparison: ["treatment", "mCherry", "hND6"]
        blocking_factors: ["sample_number"]

      - id: "treatment234"
        comparison: ["treatment", "mCherry", "hND6"]

I created this bin/ script to be able to test any changes to the validate_fom_components.R script from the shinyngs package: https://github.com/nf-core/differentialabundance/pull/382/files#diff-48cc6b0867b0868e90e7d5cd3e5b52ce4931590fe464e0f1314f9ba5eb972a5d

Once we have aligned on exactly how we want the yaml to look, we can update the script in the PR and test the pipeline; once that is done, we can proceed to update the shinyngs tool.

grst · 2024-12-19T07:57:00Z

I think we should first focus on the version without explicit model specification. The model is defined implicitly based on the comparison and blocking_factors.
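
To spell out what "implicitly" means here, a small Python sketch: the formula is derived from the contrast variable plus any blocking factors. It assumes a purely additive model with the blocking factors listed before the variable of interest. How each method module actually builds its design may differ, and `implicit_formula` is a hypothetical helper.

```python
def implicit_formula(contrast):
    """Derive an additive model formula from a single contrast entry."""
    # Assumption: comparison is [variable, reference, target].
    variable = contrast["comparison"][0]
    # Assumption: blocking factors enter as additive terms before the variable.
    terms = list(contrast.get("blocking_factors", [])) + [variable]
    return "~ " + " + ".join(terms)


with_blocking = {
    "id": "genotype_celltype_t1_MZ_fakeBatch",
    "comparison": ["Celltype", "T1", "MZ"],
    "blocking_factors": ["fakeBatch"],
}
without_blocking = {
    "id": "genotype_celltype_t1_t2",
    "comparison": ["Celltype", "T1", "T2"],
}
print(implicit_formula(with_blocking))
print(implicit_formula(without_blocking))
```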

The format proposed by @alanmmobbs93 will be the next iteration: switching to an explicit model definition. But this will also require quite a few changes downstream in the pipeline, so in the interest of making the review process smoother for @pinin4fjords, I suggest separating these two steps.

EDIT: how are we doing with respect to the nf-schema issue?

pinin4fjords · 2024-12-19T11:30:04Z

Yep, agreed, always good to separate things that way

nschcolnicov · 2024-12-20T13:30:04Z

Moving forward with creating a local version of the "shinyngs/validatefomcomponents" module that will use a local bin/ script instead of the one coming from the tool.
Adding test profiles that use a yaml contrasts file in the nf-tests, and adding the yaml contrasts file to the test-datasets repository: nf-core/test-datasets@2a320ce
Rebasing the PR towards dev_tmp

pinin4fjords · 2025-01-06T09:10:09Z

> Moving forward with creating a local version of the "shinyngs/validatefomcomponents" module that will use a local bin/ script instead of the one coming from the tool.

I'd prefer we just updated it in shinyngs. I can do the release legwork etc.

nschcolnicov · 2025-01-06T12:57:15Z

Hi @pinin4fjords!
The steps to do this would be the following:

Merge the shinyngs PR. I opened one a few weeks ago in case we went with the approach you mention, and I'll add you as a reviewer: https://github.com/pinin4fjords/shinyngs/pull/68/files#diff-e75bd0106bcc8840c00f1505e58ddb3d251aa200b5316df7106a9d3183798561
Release a new shinyngs version.
Update the conda recipe so we can create wave containers from it.
Create a shinyngs module PR in the modules repo.
Finally, create a PR for differentialabundance to remove the bin script and update the module.

Keep in mind that we are looking into adding more changes to the script in the near future, so we will likely need to repeat this process multiple times.
Because of the many steps involved and the number of PRs needed to do this, I don't think this would be an efficient approach, and I would prefer to keep the custom bin script until we settle on a final version.
Is there a particular reason why you prefer having shinyngs updated at this stage of development?

pinin4fjords · 2025-01-06T13:51:05Z

Yeah, a couple of reasons:

The creation of parallel versions of the same script. That brings the risk of drift and divergence.
I just don't like the separation of module and code - that's why there isn't a bin dir in the workflow already.

Appreciate it's a pain development-wise, but it's nice for production. I've resisted allowing others to create new local components for the same reasons, so it wouldn't be fair for me to not object here as well. Hopefully we can bundle the changes on the shinyngs side so there aren't too many cycles of this.

Is your PR ready for review? I was watching it, but it's marked as draft currently.

nschcolnicov · 2025-01-06T18:03:24Z

@pinin4fjords I see, ok, that makes sense then! Let me review it first: I just converted it into a PR and I already see an error in the CI tests. I'll address this issue and tag you once it's ready for review.

pinin4fjords · 2025-01-08T11:18:16Z

Thanks! I was OOO yesterday, but will take a look ASAP

grst added the enhancement New feature or request label Dec 3, 2024

grst mentioned this issue Dec 3, 2024

More flexible model and contrast definition #362

Open

nschcolnicov self-assigned this Dec 9, 2024

nschcolnicov mentioned this issue Dec 9, 2024

POC contrasts csv -> yaml #382

Merged

11 tasks

This was referenced Dec 18, 2024

Update tool to allow yaml contrasts files pinin4fjords/shinyngs#67

Open

POC for supporting yaml contrasts pinin4fjords/shinyngs#68

Open
