How to define complex contrasts? #386

grst · 2024-12-10T13:29:49Z

Description of feature

Follow up of #377. The goal is to find a consensus on how we want users to specify more complex contrasts based on linear model coefficients.

Background

Typically linear (and other) models rely on a design matrix for fitting the model and on contrast vectors for performing comparisons.
A convenient way of constructing such a design matrix is to use a Wilkinson formula. For instance, the formula ~ treatment + response with the following data frame...

patient_id	treatment	response
1	drugA	progression
2	drugA	response
3	drugA	stable_disease
4	drugB	response
5	drugB	stable_disease

... would result in the following design matrix

Intercept	treatmentdrugB	responseresponse	responsestable_disease
1	0	0	0
1	0	1	0
1	0	0	1
1	1	1	0
1	1	0	1

A comparison can be specified using a contrast vector that has the same columns as the design matrix. For instance, to compare response against progression, we'd need the following contrast vector

Intercept	treatmentdrugB	responseresponse	responsstable_disease
0	0	1	0

For comparing response against stable_disease

Intercept	treatmentdrugB	responseresponse	responsstable_disease
0	0	1	-1

A handy R library for exploring design matrices is the R package/Shiny app ExploreModelMatrix.

Below I showcase two options to define complex contrasts, namely limma's makeContrasts and glmGamPoi's cond.
While both originate from a specific method, they are method agnostic as they generate a numeric contrasts vector.

Limma's makeContrasts function

design = model.matrix(~ treatment + response, data=df) # where df is the table from above
makeContrasts(
    response_response_vs_progression = responseresponse,
    response_response_vs_stable = responseresponse - responsestable_disease,
    levels=design
) 
#                         Contrasts
# Levels                   response_response_vs_progression response_response_vs_stable
#  Intercept                                             0                           0
#  treatmentdrugB                                        0                           0
#  responseresponse                                      1                           1
#  responsestable_disease                                0                          -1

Pro

widely used

Con

requires knowledge of the coefficient names
requires knowledge about which levels have been dropped (e.g. in this case responsestable doesn't have a column in the design matrix because it's the baseline level)

This only becomes more complicated with more complex designs, e.g. interaction terms.

glmGamPoi `cond()`

glmGamPoi introduces a neat helper function to define contrasts. Using cond() one can simply specify a certain set of column/value combinations to retrieve a vector that represents this data in the design matrix. These vectors can be combined into arbitrary contrasts using standard arithmetic operations

response_response_vs_progression = cond(response="response") - cond(response="progression")
response_response_vs_stable = cond(response="response") - cond(response="stable")

Pro

only knoweldge of the columns and their values required
still convenient for complex designs with many columns in the design matrix.

Con

Relatively unknown
No off-the-shelf implementation available that can be used outside glmGamPoi (but rather straightforward to implement).

(Disclaimer: I'm a big fan of this approach and implemented this in a standalone Python library. The original idea is from glmGamPoi though and it's the only R version I know)

Others

There's also contrast and contrasts() from {rms}, that are a bit similar to cond(), but they apply the contrast directly to the model and are not implemented for limma or DESeq2.

Suggested implementation

Based on one of the solutions above, create a design matrix and contrast vectors early on. Then pass the design and contrast to the respective methods.

CC @tschwarzl @apeltzer @atrigila @nschcolnicov @alanmmobbs93

The text was updated successfully, but these errors were encountered:

alanmmobbs93 · 2025-01-06T12:22:27Z

I think that the YML structure should be defined first, and I'd propose it should be explicit so we could later gather all the components in the way each tool needs it:

limma: makeContrast( ... , list( <name> = <target> - <reference> ) )
DESeq2: results( ... , contrast = list( c( <target > ), c( <reference> ) )
DREAM: makeContrastsDream( ... , contrast = c( <name> = <target> - <reference> ))

For this approach, all scripts should have a first step to parse each formula (from a yml file), iterating over the contrasts IDs to construct what we need. The variables names can be extracted directly from the formula. The cons of this approach is that the user should know how to create the coefficients and at least some knowledge about formulas.

YML formats

I think that there should be two main variables here:

mixed: true/false (default: false), at the formula level, to trigger DREAM module or not.
contrast type: for each contrast that we want to run:
- simple: it's one condition (or combined condition) against other, the user should know how to write the contrasts. If "variable" is described, the code should combine variable + level.
- pariwaise: make all-against-all, providing only the variable (variables?) of interest.

Linear models

models:
  - formula: "~ genotype + treatment + genotype:treatment"
    mixed: false 
    contrasts:
      - id: genotype_a_vs_b
        type: simple
        comparison:
            reference: 'genotypeA.treatmentTreated'
            target: 'genotypeB.treatmentTreated'

We should be able to compile the contrasts functions:

  ## FOR LIMMA ---------------------------------------
  # Dynamically construct the contrast expression
  contrast_expression <- paste(target, " - ", reference)
  
  # Create the contrast matrix with the specified contrast_id
  contrasts <- makeContrasts(
    contrasts = setNames(list(contrast_expression), contrast_id),
    levels = design
  )

  ## FOR DESeq2 --------------------------------------
  # Construct the contrast as a list
  target <- "genotypeB.treatmentTreated"
  reference <- "genotypeA.treatmentTreated"
  contrast <- list( c(reference), c(target) )
  res <- results(dds, contrast = contrast)

Other examples for the same formula, open to get more complex scenarios to think about them

models:
  - formula: "~ 0 + treatment"
    mixed: false
    contrasts:
      ## paste internally tretmentA and treatmentB, how the user is working now
      - id: treatment_b_vs_a
        type: simple
        comparison:
          variable: treatment
          reference: A
          target: B
        
       ## Writing the coefficients, it will be just like the previous one
      - id: treatment_b_vs_a_explitic     # Should be the same as 'treatment_b_vs_a'
        type: simple
          reference: treatmentA
          target: treatmentB

      ## Writing the coefficients, the reference is the mean of two conditions
      - id: treatment_c_vs_ab
        type: simple
        comparison: 
            reference: 'treatmentB - treatmentA'
            target: 'treatmentC'

      ## All against all
      - id: treatment_all_vs_all   ## Asuming that "treatment" has multiple levels
        type: pairwise                 ## Runs all-against-all, taking A as first reference (alphabetically). The code should construct the contrasts programatically
        comparison:
            variable: 'treatment'

Linear mixed models

  - formula: "~ treatment + (1 | response )"
    mixed: true  # -> this should trigger `DREAM_DIFFERENTIAL` only. It can also be detected automatically from the formula in R
    contrasts:
      - id: treatment_a_vs_b
        type: simple
        comparison:
            variable: 'treatment'
            reference: 'A'
            target: 'B'
      - id: treatment_c_vs_ab
        type: simple
        comparison: 
            variable: 'treatment'
            reference: 'B, A'
            target: 'C'

CC @nschcolnicov @atrigila

grst added the enhancement New feature or request label Dec 10, 2024

grst mentioned this issue Dec 6, 2024

Contrast specification #377

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to define complex contrasts? #386

How to define complex contrasts? #386

grst commented Dec 10, 2024

alanmmobbs93 commented Jan 6, 2025 •

edited

Loading

How to define complex contrasts? #386

How to define complex contrasts? #386

Comments

grst commented Dec 10, 2024

Description of feature

Background

Limma's makeContrasts function

Pro

Con

glmGamPoi cond()

Pro

Con

Others

Suggested implementation

alanmmobbs93 commented Jan 6, 2025 • edited Loading

YML formats

Linear models

Linear mixed models

glmGamPoi `cond()`

alanmmobbs93 commented Jan 6, 2025 •

edited

Loading