Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to define complex contrasts? #386

Open
grst opened this issue Dec 10, 2024 · 1 comment
Open

How to define complex contrasts? #386

grst opened this issue Dec 10, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@grst
Copy link
Member

grst commented Dec 10, 2024

Description of feature

Follow up of #377. The goal is to find a consensus on how we want users to specify more complex contrasts based on linear model coefficients.

Background

Typically linear (and other) models rely on a design matrix for fitting the model and on contrast vectors for performing comparisons.
A convenient way of constructing such a design matrix is to use a Wilkinson formula. For instance, the formula ~ treatment + response with the following data frame...

patient_id treatment response
1 drugA progression
2 drugA response
3 drugA stable_disease
4 drugB response
5 drugB stable_disease

... would result in the following design matrix

Intercept treatmentdrugB responseresponse responsestable_disease
1 0 0 0
1 0 1 0
1 0 0 1
1 1 1 0
1 1 0 1

A comparison can be specified using a contrast vector that has the same columns as the design matrix. For instance, to compare response against progression, we'd need the following contrast vector

Intercept treatmentdrugB responseresponse responsstable_disease
0 0 1 0

For comparing response against stable_disease

Intercept treatmentdrugB responseresponse responsstable_disease
0 0 1 -1

A handy R library for exploring design matrices is the R package/Shiny app ExploreModelMatrix.

Below I showcase two options to define complex contrasts, namely limma's makeContrasts and glmGamPoi's cond.
While both originate from a specific method, they are method agnostic as they generate a numeric contrasts vector.

Limma's makeContrasts function

design = model.matrix(~ treatment + response, data=df) # where df is the table from above
makeContrasts(
    response_response_vs_progression = responseresponse,
    response_response_vs_stable = responseresponse - responsestable_disease,
    levels=design
) 
#                         Contrasts
# Levels                   response_response_vs_progression response_response_vs_stable
#  Intercept                                             0                           0
#  treatmentdrugB                                        0                           0
#  responseresponse                                      1                           1
#  responsestable_disease                                0                          -1

Pro

  • widely used

Con

  • requires knowledge of the coefficient names
  • requires knowledge about which levels have been dropped (e.g. in this case responsestable doesn't have a column in the design matrix because it's the baseline level)

This only becomes more complicated with more complex designs, e.g. interaction terms.

glmGamPoi cond()

glmGamPoi introduces a neat helper function to define contrasts. Using cond() one can simply specify a certain set of column/value combinations to retrieve a vector that represents this data in the design matrix. These vectors can be combined into arbitrary contrasts using standard arithmetic operations

response_response_vs_progression = cond(response="response") - cond(response="progression")
response_response_vs_stable = cond(response="response") - cond(response="stable")

Pro

  • only knoweldge of the columns and their values required
  • still convenient for complex designs with many columns in the design matrix.

Con

(Disclaimer: I'm a big fan of this approach and implemented this in a standalone Python library. The original idea is from glmGamPoi though and it's the only R version I know)

Others

There's also contrast and contrasts() from {rms}, that are a bit similar to cond(), but they apply the contrast directly to the model and are not implemented for limma or DESeq2.

Suggested implementation

Based on one of the solutions above, create a design matrix and contrast vectors early on. Then pass the design and contrast to the respective methods.

CC @tschwarzl @apeltzer @atrigila @nschcolnicov @alanmmobbs93

@grst grst added the enhancement New feature or request label Dec 10, 2024
@alanmmobbs93
Copy link

alanmmobbs93 commented Jan 6, 2025

I think that the YML structure should be defined first, and I'd propose it should be explicit so we could later gather all the components in the way each tool needs it:

  • limma: makeContrast( ... , list( <name> = <target> - <reference> ) )
  • DESeq2: results( ... , contrast = list( c( <target > ), c( <reference> ) )
  • DREAM: makeContrastsDream( ... , contrast = c( <name> = <target> - <reference> ))

For this approach, all scripts should have a first step to parse each formula (from a yml file), iterating over the contrasts IDs to construct what we need. The variables names can be extracted directly from the formula. The cons of this approach is that the user should know how to create the coefficients and at least some knowledge about formulas.

YML formats

I think that there should be two main variables here:

  • mixed: true/false (default: false), at the formula level, to trigger DREAM module or not.
  • contrast type: for each contrast that we want to run:
    • simple: it's one condition (or combined condition) against other, the user should know how to write the contrasts. If "variable" is described, the code should combine variable + level.
    • pariwaise: make all-against-all, providing only the variable (variables?) of interest.

Linear models

models:
  - formula: "~ genotype + treatment + genotype:treatment"
    mixed: false 
    contrasts:
      - id: genotype_a_vs_b
        type: simple
        comparison:
            reference: 'genotypeA.treatmentTreated'
            target: 'genotypeB.treatmentTreated'

We should be able to compile the contrasts functions:

  ## FOR LIMMA ---------------------------------------
  # Dynamically construct the contrast expression
  contrast_expression <- paste(target, " - ", reference)
  
  # Create the contrast matrix with the specified contrast_id
  contrasts <- makeContrasts(
    contrasts = setNames(list(contrast_expression), contrast_id),
    levels = design
  )

  ## FOR DESeq2 --------------------------------------
  # Construct the contrast as a list
  target <- "genotypeB.treatmentTreated"
  reference <- "genotypeA.treatmentTreated"
  contrast <- list( c(reference), c(target) )
  res <- results(dds, contrast = contrast)

Other examples for the same formula, open to get more complex scenarios to think about them

models:
  - formula: "~ 0 + treatment"
    mixed: false
    contrasts:
      ## paste internally tretmentA and treatmentB, how the user is working now
      - id: treatment_b_vs_a
        type: simple
        comparison:
          variable: treatment
          reference: A
          target: B
        
       ## Writing the coefficients, it will be just like the previous one
      - id: treatment_b_vs_a_explitic     # Should be the same as 'treatment_b_vs_a'
        type: simple
          reference: treatmentA
          target: treatmentB

      ## Writing the coefficients, the reference is the mean of two conditions
      - id: treatment_c_vs_ab
        type: simple
        comparison: 
            reference: 'treatmentB - treatmentA'
            target: 'treatmentC'

      ## All against all
      - id: treatment_all_vs_all   ## Asuming that "treatment" has multiple levels
        type: pairwise                 ## Runs all-against-all, taking A as first reference (alphabetically). The code should construct the contrasts programatically
        comparison:
            variable: 'treatment'

Linear mixed models

  - formula: "~ treatment + (1 | response )"
    mixed: true  # -> this should trigger `DREAM_DIFFERENTIAL` only. It can also be detected automatically from the formula in R
    contrasts:
      - id: treatment_a_vs_b
        type: simple
        comparison:
            variable: 'treatment'
            reference: 'A'
            target: 'B'
      - id: treatment_c_vs_ab
        type: simple
        comparison: 
            variable: 'treatment'
            reference: 'B, A'
            target: 'C'

CC @nschcolnicov @atrigila

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants