More flexible model and contrast definition #362

grst · 2024-12-02T08:58:38Z

Description of feature

Currently, models and contrasts to compare are defined in the contrast.csv file as follows:

id,variable,reference,target,blocking

This limits contrasts to pairwise comparisons of a single variable and the model definition is implicit based on the variable and blocking variables. More complex comparisons such as interaction terms or mixed effects are not possible or require workarounds like constructing artificial variables combining multiple variables.

I propose to make the model definition more flexible by

- explicitly specifying the model using a Wilkinson formula, e.g. ~ treatment + response (variable + covariate) or ~ treatment * response (interaction model). This would also extend to mixed-effects models, should they be added in the future, e.g. ~ treatment * timepoint + (1 | patient_id)
- defining one or multiple contrasts per model, so that multiple tests can be performed for a single model. This is also computationally more efficient since the model only needs to be fitted once.
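To make the link between a formula and its coefficients concrete, here is a minimal sketch of how an additive formula such as ~ treatment + response expands into a treatment-coded design matrix. The function `design_matrix` and its arguments are hypothetical and written only for illustration; the column names imitate R's naming convention (variable name followed by level), which is an assumption about how the pipeline's methods would label coefficients.

```python
def design_matrix(samples, variables, references):
    """Treatment-coded design matrix for an additive model `~ v1 + v2 + ...`.

    samples:    list of dicts, one per sample
    variables:  categorical variables in the model, in formula order
    references: reference (baseline) level for each variable
    """
    names = ["Intercept"]
    levels = {}
    for var in variables:
        # Every non-reference level gets one indicator column.
        levels[var] = sorted({s[var] for s in samples} - {references[var]})
        names += [f"{var}{lvl}" for lvl in levels[var]]

    rows = []
    for s in samples:
        row = [1]  # intercept
        for var in variables:
            row += [1 if s[var] == lvl else 0 for lvl in levels[var]]
        rows.append(row)
    return names, rows


samples = [
    {"treatment": "a", "response": "responder"},
    {"treatment": "a", "response": "non_responder"},
    {"treatment": "b", "response": "responder"},
    {"treatment": "b", "response": "non_responder"},
]
names, rows = design_matrix(
    samples,
    variables=["treatment", "response"],
    references={"treatment": "a", "response": "non_responder"},
)
```

Once the model is fitted on this matrix, each contrast is just a different linear combination of the same coefficients, which is why several contrasts can share one fit.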

How to define contrasts

There are different ways to define contrasts beyond simple comparisons. We need to discuss which ones to support and how.

1. (variable, baseline, target) tuples -> I'd definitely keep this one way or another, as it covers a lot of cases and is very intuitive to define. However, we still need alternatives for more complex cases.
2. Specifying coefficient names, e.g. treatmenta:responseresponder
3. Contrast formula, e.g. responseresponder - responsenon_responder, or cond(response="responder") - cond(response="non_responder")
4. Contrast vector (the most general case), e.g. (0, -1, 1, 0, 0, 0, 0)

The challenge for 2-3 is to define this in a way that's method-agnostic. We'd likely need a step that converts a coefficient name or contrast formula into a contrast vector. One such way is limma's makeContrasts function. Another way is the cond() method implemented in glmGamPoi:

cond(response="responder", treatment="A") - cond(response="non_responder", treatment="B")
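As a rough illustration of the conversion step mentioned above, the following sketch turns a contrast formula made of coefficient names into a contrast vector. It is not limma's makeContrasts or glmGamPoi's cond(); `contrast_vector` is a hypothetical helper that only understands sums and differences with weights of +1 or -1, and the coefficient list assumes a model without intercept (e.g. ~ 0 + response + treatment) so that both response levels have their own coefficient.

```python
import re


def contrast_vector(expression, coefficients):
    """Turn e.g. 'x - y' into a numeric vector over `coefficients`."""
    vector = [0.0] * len(coefficients)
    # Each token is an optional sign followed by a coefficient name.
    for sign, name in re.findall(r"([+-]?)\s*([A-Za-z_][\w.:]*)", expression):
        if name not in coefficients:
            raise ValueError(f"unknown coefficient: {name!r}")
        vector[coefficients.index(name)] += -1.0 if sign == "-" else 1.0
    return vector


# Assumed coefficient names of a fitted model `~ 0 + response + treatment`.
coefficients = ["responsenon_responder", "responseresponder", "treatmentb"]
vector = contrast_vector(
    "responseresponder - responsenon_responder", coefficients
)
```

A single coefficient name (option 2) is the special case of an expression with one term, so the same helper would cover both options under these assumptions.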

If anyone's aware of other alternatives, I'm happy to consider them.

Configuration format

In principle it would still be possible to use a CSV file, e.g.

id	formula	contrast	comment
treatment_a_vs_b	~ treatment + response	treatment;a;b	simple comparison
response_responder_vs_non_responder	~ treatment + response	response;responder;non_responder	same model, different contrast
response_responder_vs_non_responder2	~ treatment + response	responseresponder - responsenon_responder	same comparison, different way to specify contrast
treatment_response_interaction	~ treatment * response	treatmenta:responseresponder	listing a coefficient rather than a comparison

Alternatively, I could see benefits in switching to JSON/YAML with an appropriate JSON schema for validation. IMO multiple contrasts per model and different ways to specify contrasts could be better represented with a hierarchical configuration format than with CSV. Having ;-separated fields within a CSV column, as is currently implemented for blocking, is a bit of a red flag as it's hard to read and hard to validate.

models:
  - formula: "~ treatment + response"
    contrasts: 
      - id: treatment_a_vs_b
        type: simple
        comparison: ["treatment", "A", "B"]
      - id: response_responder_vs_non_responder2
        type: formula
        comparison: responseresponder - responsenon_responder
      - id: response_responder_vs_non_responder3
        type: cond
        comparison: 'cond(response="responder") - cond(response="non_responder")'
  - formula: "~ treatment * response"
    contrasts:
      - id: treatment_response_interaction
        type: coefficient
        comparison: treatmenta:responseresponder
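To give a feel for what validating such a hierarchical configuration could catch, here is a minimal sketch in plain Python. It is a stand-in for a real JSON schema, not an implementation of one: it works on the dictionary a YAML parser would produce, and the function name `validate_config`, the set of allowed types, and the specific checks are all assumptions chosen for illustration.

```python
ALLOWED_TYPES = {"simple", "formula", "cond", "coefficient"}


def validate_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    seen_ids = set()
    for model in config.get("models", []):
        formula = model.get("formula", "")
        if not formula.lstrip().startswith("~"):
            problems.append(f"formula must start with '~': {formula!r}")
        for contrast in model.get("contrasts", []):
            cid = contrast.get("id")
            # Contrast ids must be unique across all models.
            if cid in seen_ids:
                problems.append(f"duplicate contrast id: {cid}")
            seen_ids.add(cid)
            ctype = contrast.get("type")
            if ctype not in ALLOWED_TYPES:
                problems.append(f"{cid}: unknown contrast type {ctype!r}")
            comparison = contrast.get("comparison")
            if ctype == "simple" and not (
                isinstance(comparison, list) and len(comparison) == 3
            ):
                problems.append(
                    f"{cid}: 'simple' needs [variable, reference, target]"
                )
    return problems


# Hypothetical, deliberately broken configuration.
config = {
    "models": [
        {
            "formula": "~ treatment + response",
            "contrasts": [
                {"id": "treatment_a_vs_b", "type": "simple",
                 "comparison": ["treatment", "A", "B"]},
                {"id": "treatment_a_vs_b", "type": "anova",
                 "comparison": ["treatment", "A", "B", "C"]},
            ],
        }
    ]
}
problems = validate_config(config)
```

Checks like the uniqueness of ids are awkward to express for ;-separated CSV cells but fall out naturally from a nested structure.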

LMK what you think

CC @apeltzer @tschwarzl @nschcolnicov @atrigila @alanmmobbs93
FYI @suzannejin


pinin4fjords · 2024-12-02T11:08:52Z

Thanks for the issue! Feedback:

- The YAML file seems to be a no-brainer for me, think that would work well. That might be a good place to start to provide a basis for further work.
- Support for model types beyond what is currently possible can only be a good thing, and I would support changes to the interface necessary to accomplish that.

But:

- I would not be in favour of supporting multiple ways of specifying the same things for the sake of it, only where necessary for the new use cases. We should pick the simplest extension to (or if necessary, replacement of) the existing interface (in YAML form).
- I've been a little worried about the suggestion of explicit model specification in the past, just because it would be easy for a user to supply a model incompatible with a sample sheet (for example). So if we allow explicit model input we also need to take care to have validation for models to explain to users what's wrong early on.
  - A model validation step would actually be useful even now, since people e.g. try to use variables as batch effects when they are associated with main contrast variables.
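One way such a check for batch variables could look is sketched below. It only detects the extreme case where every level of the blocking variable occurs with a single level of the contrast variable, which makes the two inseparable in the model. The helper `is_nested` and the sample data are hypothetical; a real validation step would likely need to handle partial confounding as well.

```python
from collections import defaultdict


def is_nested(samples, blocking, variable):
    """True if every level of `blocking` occurs with only one level of `variable`."""
    seen = defaultdict(set)
    for s in samples:
        seen[s[blocking]].add(s[variable])
    return all(len(levels) == 1 for levels in seen.values())


# Batch 1 contains only controls, batch 2 only treated samples.
confounded = [
    {"batch": "1", "treatment": "control"},
    {"batch": "1", "treatment": "control"},
    {"batch": "2", "treatment": "treated"},
    {"batch": "2", "treatment": "treated"},
]
# Both treatments are present in both batches.
balanced = [
    {"batch": "1", "treatment": "control"},
    {"batch": "1", "treatment": "treated"},
    {"batch": "2", "treatment": "control"},
    {"batch": "2", "treatment": "treated"},
]
confounded_flag = is_nested(confounded, "batch", "treatment")
balanced_flag = is_nested(balanced, "batch", "treatment")
```

When the flag is true, a message such as "batch cannot be used as a blocking variable because it determines treatment" would be easier to act on than a failure inside the statistical method.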

I'd suggest a couple of POC PRs, and going from there.

grst · 2024-12-02T12:07:49Z

> Support for model types beyond what is currently possible can only be a good thing, and I would support changes to the interface necessary to accomplish that.

I agree, we should settle on 1, max. 2 options. My point was mainly to provide an overview of what would be possible.

> I've been a little worried about the suggestion of explicit model specification in the past, just because it would be easy for a user to supply a model incompatible with a sample sheet (for example). So if we allow explicit model input we also need to take care to have validation for models to explain to users what's wrong early on.
> A model validation step would actually be useful even now, since people e.g. try to use variables as batch effects when they are associated with main contrast variables.

What kind of validation steps do you envisage here? Wouldn't such a model fail anyway because the design matrix is not full-rank?

From your response I deduce the following follow-up tasks:

- Implement a prototype of the YAML samplesheet plus validation for model definitions and contrasts. Keep it as simple as possible initially, but keep it extensible for future additions (YAML-based contrast definition #370)
- Find consensus on how to define more complex contrasts (Contrast specification #377)
- Implement model validation (Model validation #371)

pinin4fjords · 2024-12-02T13:47:31Z

> What kind of validation steps do you envisage here? Wouldn't such a model fail anyway because the design matrix is not full-rank?

It's just that those failures are hard for users to interpret. We should start building up a set of validation checks to make debugging easier. For example, specifying a model with variables that don't exist in the sample sheet should produce an error telling the user that.
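A check for variables that are missing from the sample sheet could be as small as the following sketch. The helper `missing_variables` is hypothetical and is not part of the existing validation script; it treats every identifier in the formula as a variable name, so a formula containing function calls such as log(x) would need a proper formula parser instead.

```python
import re


def missing_variables(formula, columns):
    """Names used in `formula` that are not columns of the sample sheet."""
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", formula)
    return sorted(set(names) - set(columns))


columns = ["sample", "treatment", "response", "timepoint", "patient_id"]

# A typo in the formula is reported by name.
typo = missing_variables("~ treatment + respnse", columns)

# A mixed-effects style formula whose variables all exist.
ok = missing_variables("~ treatment * timepoint + (1 | patient_id)", columns)

if typo:
    print(f"Variables not found in sample sheet: {', '.join(typo)}")
```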

There's already a validation step that e.g. checks the contrasts are compatible with the sample sheet, we could maybe add model checks there as well, to prevent adding a new step https://github.com/pinin4fjords/shinyngs/blob/develop/exec/validate_fom_components.R

suzannejin · 2024-12-04T11:26:03Z

Hello @grst! Thank you for proposing this! It looks good to me.
Just one comment... is it possible to do something like this if required?

- id: treatment_a_vs_b_vs_c
  type: simple
  comparison: ["treatment", "A", "B", "C"]

This is because the method that we are working on (i.e. propd) can work with multiple conditions (even though this is currently not allowed in the module, so that it remains coherent with the current contrast implementation).

grst · 2024-12-04T12:28:52Z

Hi @suzannejin,

I'll post a separate issue to discuss the contrast specification in more detail and we can certainly take it up there. Potentially we also need to accommodate different ways of specifying contrasts for different methods.

About your specific example

 ["treatment", "A", "B", "C"]

what would that mean? Compare both B and C separately to the baseline A? Or all against all?
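If both readings are taken as shorthand for a set of pairwise comparisons, the difference can be sketched as follows. The function names are hypothetical, and a third reading, a single omnibus test across all levels, would not expand into pairs at all.

```python
from itertools import combinations


def versus_baseline(levels):
    """First level is the baseline; compare every other level against it."""
    baseline, *others = levels
    return [(baseline, target) for target in others]


def all_against_all(levels):
    """Every unordered pair of levels, in the order given."""
    return list(combinations(levels, 2))


levels = ["A", "B", "C"]
baseline_pairs = versus_baseline(levels)
pairwise = all_against_all(levels)
```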

suzannejin · 2024-12-04T12:59:48Z

@grst Sure! The flexibility to accommodate different ways could be nice.

The specific case I mentioned above is more of an all-against-all comparison. I only wanted to raise it as a potential option to consider; it is not so important to have it for the moment, as it asks a slightly different question: is a gene changing across the different conditions?

grst · 2024-12-04T13:37:26Z

> as it asks a slightly different question: is a gene changing across the different conditions.

so basically the ANOVA case?

suzannejin · 2024-12-04T15:16:02Z

> > as it asks a slightly different question: is a gene changing across the different conditions.
>
> so basically the ANOVA case?

yes exactly!

grst · 2024-12-10T08:48:48Z

Suggested way forward:

1. YAML-based contrast sheet MVP -> just switch to YAML + JSON schema, no other changes (POC contrasts csv -> yaml #382)
2. Model validation MVP (Model validation #371)
3. Formula MVP: switch to a formula instead of blocking factors (no additional functionality, just switch to formula)

Then work on additional contrast types (#377).
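For the formula MVP, the existing contrast sheet columns already contain everything needed to build an equivalent formula. The sketch below shows one possible mapping from the current variable and ;-separated blocking fields; `row_to_formula` is a hypothetical helper, and placing the blocking variables before the variable of interest is an assumed convention rather than something the pipeline prescribes.

```python
def row_to_formula(variable, blocking=""):
    """Build `~ block1 + block2 + variable` from a contrast-sheet row."""
    # Blocking factors are ;-separated in the current CSV format.
    terms = [b.strip() for b in blocking.split(";") if b.strip()]
    terms.append(variable)
    return "~ " + " + ".join(terms)


with_blocking = row_to_formula("treatment", "batch;sex")
without_blocking = row_to_formula("treatment")
```

Because this mapping adds no new functionality, results obtained with the generated formulas should match those of the current implicit model definition.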

tschwarzl · 2024-12-12T07:33:55Z

> > > as it asks a slightly different question: is a gene changing across the different conditions.
> >
> > so basically the ANOVA case?
>
> yes exactly!

Those use cases would be a very powerful addition. However, I would like to highlight that the handling and interpretation of log2 fold changes needs careful examination. Typically, the significance estimate (p-value) is used for analysis in these cases. To obtain interpretable log2 fold changes, it is often necessary to perform an additional pairwise test. This consideration applies to all use cases modeled with full and reduced models, including the example mentioned above.

grst · 2024-12-20T10:31:24Z

@nschcolnicov @alanmmobbs93 to move this forward over the Christmas break:

- your draft PRs POC contrasts csv -> yaml #382 and New Feature POC: VALIDATE_MODEL #404 stay as they are (i.e. with the minimally invasive YAML-based contrast sheet)
- to work on supporting formulas, create a separate branch that includes the changes from POC contrasts csv -> yaml #382 and New Feature POC: VALIDATE_MODEL #404, where you can move fast and break things. We'll then figure out in January how best to merge it into the main pipeline.

grst added the enhancement New feature or request label Dec 2, 2024

grst mentioned this issue Dec 2, 2024

Support for linear mixed effects models via DREAM #363

Open

nschcolnicov added this to differentialabundance Dec 2, 2024

nschcolnicov moved this to ToDo - high priority in differentialabundance Dec 2, 2024

This was referenced Dec 3, 2024

Multiple variables/interactions #211

Open

Continuous covariates #247

Open

YAML-based contrast definition #370

Open

Model validation #371

Closed

nschcolnicov self-assigned this Dec 4, 2024

grst mentioned this issue Dec 6, 2024

Contrast specification #377

Open

grst mentioned this issue Dec 10, 2024

Multi-tool functionality and subworkflows as hub of methods #385

Open

21 tasks

grst mentioned this issue Dec 20, 2024

Add DREAM to the differential subworkflow #407

Closed

grst mentioned this issue Jan 21, 2025

Contrasts schema validation #410

Open

11 tasks
