Initialise mediation functions #503

RiboRings · 2024-03-06T16:38:44Z

This PR is not ready yet; it was opened to start a discussion about mediation analysis in mia. It introduces two new main functions, mediateColData and mediateAssay, both located in R/mediate.R. They depend on the mediation package; if that dependency is not ideal, these functions could be placed in an extension of mia.

mediateColData provides an easy way to run mediation analysis between 3 variables in the colData, while mediateAssay runs a separate mediation model for every taxon in an assay, or for every component of a reducedDim, and subsequently adjusts significance for multiple comparisons. Both use the mediation::mediate function under the hood.

mediateColData takes a tse object, three colData variables for outcome, treatment and mediator, respectively, plus other covariates from the colData and other arguments supported by mediation::mediate. It returns the output of the mediation model (similar to the output of lm or glm). Below is a self-contained example of how it works:

library(mia)
library(microbiomeDataSets)
library(mediation)
library(dplyr)
source("R/mediate.R")

tse <- OKeefeDSData()

tse <- transformAssay(tse,
                      method = "relabundance")

tse <- estimateDiversity(tse,
                         index = "shannon",
                         assay.type = "relabundance")

colData(tse)$bmi_group <- as.numeric(tse$bmi_group)

med_out <- mediateColData(tse,
                          outcome = "bmi_group",
                          treatment = "nationality",
                          mediator = "shannon",
                          covariates = "timepoint.within.group",
                          boot = TRUE, sims = 1000)

summary(med_out)

# Causal Mediation Analysis 
#
# Nonparametric Bootstrap Confidence Intervals with the Percentile Method
#
# (Inference Conditional on the Covariate Values Specified in `covariates')
#
#                Estimate 95% CI Lower 95% CI Upper p-value    
# ACME             0.1086       0.0344         0.19   0.004 ** 
# ADE             -0.4580      -0.6726        -0.24  <2e-16 ***
# Total Effect    -0.3493      -0.5484        -0.13  <2e-16 ***
# Prop. Mediated  -0.3110      -0.9781        -0.08   0.004 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Sample Size Used: 222 
#
#
# Simulations: 1000 

plot(med_out)

mediateAssay iterates over the same code used for mediateColData. It takes the extra arguments assay.type and dim.type. If one of them is correctly specified, it runs mediation::mediate for every row of an assay or every column of a reducedDim slot, respectively. As output, it returns a data frame with effect sizes and p-values for every taxon/component. The example below reuses objects created in the previous example:

tse <- transformAssay(tse,
                      method = "clr",
                      pseudocount = 1)

tse <- tse[ , tse$timepoint.within.group == 2]

med_res <- mediateAssay(tse,
                        outcome = "bmi_group",
                        treatment = "nationality",
                        assay.type = "clr",
                        boot = TRUE, sims = 300)

med_res %>%
  filter(ACME_adjpval > 0.05) %>%
  head() %>%
  knitr::kable()

Treatment	Mediator	Outcome	ACME_estimate	ADE_estimate	ACME_pval	ADE_pval	ACME_adjpval	ADE_adjpval	ACME_CI_lower	ADE_CI_lower
nationality	Uncultured Mollicutes	bmi_group	-0.1700134	-0.1986718	0.0066667	0.2133333	0.2888889	0.2133333	-0.0469092	0.1062497
nationality	Eubacterium ventriosum et rel.	bmi_group	-0.1087084	-0.2599768	0.0200000	0.0866667	0.4814815	0.0901333	-0.0100311	0.0607181
nationality	Eubacterium cylindroides et rel.	bmi_group	0.1010478	-0.4697330	0.0200000	0.0133333	0.4814815	0.0279570	0.2347390	-0.1674882
nationality	Lactobacillus salivarius et rel.	bmi_group	0.0816559	-0.4503411	0.0266667	0.0066667	0.4814815	0.0222222	0.1957793	-0.1609730
nationality	Bacteroides uniformis et rel.	bmi_group	0.2371214	-0.6058066	0.0266667	0.0133333	0.4814815	0.0279570	0.4490582	-0.2087500
nationality	Uncultured Selenomonadaceae	bmi_group	0.1084270	-0.4771122	0.0333333	0.0133333	0.4814815	0.0279570	0.2127692	-0.1225544
nationality	Parabacteroides distasonis et rel.	bmi_group	0.2731733	-0.6418585	0.0333333	0.0000000	0.4814815	0.0000000	0.5877062	-0.2688401
nationality	Catenibacterium mitsuokai et rel.	bmi_group	-0.0810081	-0.2876771	0.0466667	0.0800000	0.5515152	0.0845528	-0.0009472	0.0257356
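
The adjusted p-value columns in the table above come from a multiple-testing correction across mediators. As a minimal base R sketch of that step, assuming the Benjamini-Hochberg method (this thread does not state which method mediateAssay uses) and made-up p-values and taxon names:

```r
# Hypothetical raw ACME p-values for five mediators (illustrative only)
acme_pval <- c(taxonA = 0.001, taxonB = 0.008, taxonC = 0.04,
               taxonD = 0.2, taxonE = 0.5)

# Benjamini-Hochberg adjustment; the choice of method is an assumption
acme_adjpval <- p.adjust(acme_pval, method = "BH")

res <- data.frame(Mediator = names(acme_pval),
                  ACME_pval = acme_pval,
                  ACME_adjpval = acme_adjpval,
                  row.names = NULL,
                  stringsAsFactors = FALSE)

# Keep mediators that remain significant after adjustment
sig <- res[res$ACME_adjpval < 0.05, ]
print(sig)
```

In this toy input, taxonC is below 0.05 before adjustment but not after, which is the situation the adjusted columns are meant to expose.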

More examples can be found in this repository

antagomir · 2024-03-07T14:50:14Z

Great.

Input types: splitting functionality by colData and assay is one option. Another option is to split by univariate vs. multivariate. In that case you could do something like mediate(tse, outcome = ..., treatment = "nationality", assay.type = "clr", ...), where "outcome" could refer to a colData variable or to one assay feature (taxon); this would be recognized automatically. It is important to be able to run mediation for colData as well, but then the question is what the added value is compared to just running the basic mediation::mediate function from the original package on a single colData variable. And why not allow multiple comparisons for colData too (e.g. if there are different derived indices in addition to Shannon diversity)? This is a more general design aspect of mia and might deserve some thought. On the other hand, perhaps we could just go with this colData / assay split first and improve later.
Output: @TuomasBorman might comment, but I have been wondering whether these functions should directly return augmented TreeSE objects as output. I am not sure if that is feasible here. At least the examples should show how to add relevant parts of the results to the TreeSE object.
Support for custom mediation functions: if possible, it would be ideal to allow users to supply custom mediation functions as well. This would greatly facilitate later innovation and comparisons. In practice, we should define what the inputs and outputs of such a function should look like; as long as the function follows those standards, it could be anything. For instance, we allow the "FUN" argument to insert entirely new dissimilarity measures in functions such as runNMDS: https://microbiome.github.io/mia/reference/runNMDS.html
We need an argument to choose between different methods, even if it would initially provide only one method. Some other methods might take the entire assay rather than run a for loop over features.
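
To make the idea of a pluggable mediation function concrete, here is a rough base R sketch. Everything in it is hypothetical: run_mediation, product_of_coefs and the input/output contract (a data frame plus three column names in, a named numeric vector out) are illustrations, not part of mia. The default uses a simple product-of-coefficients estimate from two linear models with a numeric treatment, which is a stand-in and not what mediation::mediate computes in general.

```r
# Hypothetical default method: product of coefficients from two linear models.
# Assumes a numeric (e.g. 0/1) treatment column.
product_of_coefs <- function(df, outcome, treatment, mediator) {
    fit_m <- lm(reformulate(treatment, response = mediator), data = df)
    fit_y <- lm(reformulate(c(treatment, mediator), response = outcome),
                data = df)
    a <- coef(fit_m)[[treatment]]   # treatment -> mediator
    b <- coef(fit_y)[[mediator]]    # mediator -> outcome, given treatment
    c(indirect = a * b, direct = coef(fit_y)[[treatment]])
}

# Hypothetical wrapper: any FUN with the same arguments that returns a named
# numeric vector containing "indirect" and "direct" can be plugged in.
run_mediation <- function(df, outcome, treatment, mediator,
                          FUN = product_of_coefs, ...) {
    out <- FUN(df, outcome = outcome, treatment = treatment,
               mediator = mediator, ...)
    stopifnot(is.numeric(out), all(c("indirect", "direct") %in% names(out)))
    out
}

# Simulated data (illustrative only)
set.seed(1)
n <- 200
df <- data.frame(treat = rbinom(n, 1, 0.5))
df$med <- 0.5 * df$treat + rnorm(n)
df$out <- 0.3 * df$med + 0.2 * df$treat + rnorm(n)

res_med <- run_mediation(df, outcome = "out", treatment = "treat",
                         mediator = "med")
print(res_med)
```

The check inside run_mediation is where the agreed output standard would be enforced, so a user-supplied FUN fails early if it returns something else.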

TuomasBorman · 2024-03-08T07:39:27Z

Looks nice!

I think it is better to have one function for one task if possible. For example, we now have 2 functions for cross-correlation, which might just confuse users.

In this case, one option would be getMediate(tse, outcome = "disease", mediator = "counts", treatment = "diet", outcome.type = NULL, mediator.type = NULL, treatment.type = NULL), where the outcome, mediator and treatment parameters are automatically fetched from colData/assay/reducedDim. --> give an error if a name overlaps --> the user can manually specify with *type where it is searched
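
The lookup rule proposed here can be sketched in base R as follows. Plain character vectors stand in for the names available in colData, assays and reducedDims, and find_variable is a hypothetical helper, not an existing mia function.

```r
# Stand-in containers; in a real (Tree)SE these names would come from the
# colData columns, the assay names and the reducedDim names
slots <- list(
    colData    = c("disease", "diet", "shannon"),
    assay      = c("counts", "clr"),
    reducedDim = c("PCA", "diet")   # "diet" deliberately clashes with colData
)

# Hypothetical helper: return the slot holding `name`; fail if the name is
# missing or ambiguous, unless the user pins the slot with `type`
find_variable <- function(name, slots, type = NULL) {
    if (!is.null(type)) {
        if (!name %in% slots[[type]]) {
            stop("'", name, "' not found in ", type, call. = FALSE)
        }
        return(type)
    }
    hits <- names(slots)[vapply(slots, function(s) name %in% s, logical(1))]
    if (length(hits) == 0) {
        stop("'", name, "' not found in any slot", call. = FALSE)
    }
    if (length(hits) > 1) {
        stop("'", name, "' found in ", paste(hits, collapse = " and "),
             "; specify where to search", call. = FALSE)
    }
    hits
}

find_variable("counts", slots)                   # unique name, found in assay
find_variable("diet", slots, type = "colData")   # ambiguity resolved by type
```

Calling find_variable("diet", slots) without type raises an error, which corresponds to the "give an error if a name overlaps" step.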

Yes, it could be added to metadata, but I don't really see additional value. However, the function name could be getMediate to highlight that it does not return a TreeSE (add* would return one).

RiboRings · 2024-03-09T20:02:46Z

Thank you for the comments!

@TuomasBorman, if I understand correctly, you are suggesting that mediateColData and mediateAssay should be combined into a single function, right? It sounds like it would simplify the code, and make it possible to use single or multiple variables for any of the outcome, mediator and treatment arguments.

But then how should we standardise the output? Because when multiple analyses are run (mediateAssay), the most practical output is a data.frame of model statistics for every combination, whereas for a single analysis (mediateColData), the most informative output is the trained model itself (so you can do summary or plot on it). From this aspect, keeping the two functions separate might be more convenient.

RiboRings · 2024-03-09T20:19:32Z

@antagomir, so you would consider this implementation with mediation::mediate as one option out of many, right? And then other mediation functions such as the hdmed ones could be specified with FUN = ...? This would definitely make this function more generic and complete.

In this case, how would you control the suitability of the custom method? For example, the hdmed (high-dimensional mediation) funs are only appropriate for high-dimensional data like the assay (mediateAssay), but not for colData variables (mediateColData). Also for this aspect having two separate functions might be more convenient(?)

Or actually what do you mean by custom mediation functions?

antagomir · 2024-03-10T22:00:03Z

Well in fact also the arguments may be very different for different methods. That's why there is for instance runMDS, runNMDS, etc. and not a single runOrdination.. so perhaps it is not good to go very generic for now. Let's just finalize this for the mediation package.

Regarding the comment from TB above: it is generally recommended that the function output should always be the same, regardless of the inputs. We could have that, and in addition the user could choose to include the resulting model(s) in the metadata slot (default FALSE, but examples could show how to do this when necessary). This also applies to the assay case, where it would be a list of output models.

Would that work?

Default output could indeed be a data frame with effect sizes and (adjusted) p-values.

RiboRings · 2024-03-12T15:21:29Z

Hi!

The getMediation function is ready for review. I still need to add more argument checks, comment code and add unit testing, but the main parts are there. Now there is a single function that can take mediators from the colData (mediator arg), the assays (assay.type) or reducedDims (dim.type). It can also return the original model in attr(med_df, "metadata")[[model_number]] when add.metadata = TRUE.
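
The idea of returning a summary data frame while keeping the fitted models reachable through an attribute can be sketched in base R. This is only an illustration of the access pattern attr(med_df, "metadata")[[model_number]] mentioned above: plain lm fits stand in for mediation models, and the data and column names are made up.

```r
set.seed(42)
df <- data.frame(x = rnorm(50))
df$y1 <- 2 * df$x + rnorm(50)
df$y2 <- rnorm(50)

# One simple model per response; lm is a stand-in for a mediation model
responses <- c("y1", "y2")
models <- lapply(responses, function(resp) {
    lm(reformulate("x", response = resp), data = df)
})

# Summary table with one row per model
res_df <- data.frame(
    response = responses,
    estimate = vapply(models, function(m) coef(m)[["x"]], numeric(1)),
    pval = vapply(models,
                  function(m) summary(m)$coefficients["x", "Pr(>|t|)"],
                  numeric(1))
)

# Attach the fitted models as an attribute of the data frame
attr(res_df, "metadata") <- models

# Retrieve the first model and inspect it as usual
first_model <- attr(res_df, "metadata")[[1]]
print(summary(first_model))
```

This keeps the return type constant (always a data frame) while still allowing summary or plot on an individual model when needed.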

Caveats:

It is probably not ideal to allow the user to input multiple outcomes and treatments, because the outcomes might have different distributions and the treatments different control/treatment conditions, which preferably should be specified as the family, control.value and treat.value arguments.
Adding the option to select specific taxa/reduced dimensions as mediators would make the function more complicated. I think the user could simply subset the (Tree)SE object or specify the number of dimensions when running ordination, and then feed that object to getMediation.

TuomasBorman

I still have to run the code, but this is what I found just by reading it

R/mediate.R

antagomir

See the review comments. Overall, looks nice!

Is there any useful way to visualize mediation results? For the examples.

R/mediate.R

TuomasBorman

I checked the code now in detail, seems good and working

R/mediate.R

antagomir · 2024-03-22T20:18:12Z

@RiboRings if you can close the conversations one by one when they have been resolved, that would be great, to keep track of what was solved and what was not

antagomir · 2024-04-02T16:06:13Z

Up - @RiboRings

RiboRings · 2024-04-14T15:20:25Z

Hi! I finally got back to this

TuomasBorman

Top quality!

I forgot to mention that the NEWS file should be updated. It should include all the changes in an easy, human-readable format. --> Added getMediation and addMediation functions.

antagomir · 2024-04-18T13:53:35Z

All options ok to me.

hitchip can move permanently to mia if this helps, there is no particular need to have it in miaTime. It was put there because we needed to limit the number of data sets in mia package and because hitchip had some time series in it

RiboRings · 2024-04-18T14:14:32Z

Maybe we could comment out the examples and remove miaTime from DESCRIPTION, and then uncomment the examples when miaTime is submitted to Bioconductor?

TuomasBorman · 2024-04-19T06:31:50Z

Maybe we could comment out the examples and remove miaTime from DESCRIPTION, and then uncomment the examples when miaTime is submitted to BioConductor?

That is actually the same conclusion I reached after thinking this over overnight. --> So let's do that.

Instead of commenting out, do:

if( require("miaTime") ){
    # examples go here
}

miaTime is also used in some other places, but they are not affected. They are already run only if miaTime is installed. (The vignette is not run.)
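
A variant of the guard above that checks for the package without attaching it uses requireNamespace. The sketch below wraps it in a hypothetical helper, run_if_installed, so the behaviour can be tried with any package name; the second package name is a deliberately made-up one.

```r
# Hypothetical helper: evaluate `code` only when `pkg` is installed.
# `code` is a lazily evaluated argument, so it is never run when the
# package is missing.
run_if_installed <- function(pkg, code) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        force(code)
        TRUE
    } else {
        message("Skipping: package '", pkg, "' is not installed")
        FALSE
    }
}

# 'stats' ships with R, so the code runs
ran <- run_if_installed("stats", print(median(c(1, 3, 5))))

# A made-up package name: the code is skipped and stop() is never reached
skipped <- run_if_installed("aPackageThatDoesNotExist", stop("never evaluated"))
```

In package examples the same test would normally be written inline, as in the if block above; objects from the guarded package then need to be referenced with pkg::name or loaded explicitly, because requireNamespace does not attach the package.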

antagomir · 2024-04-19T09:06:11Z

a shining idea

RiboRings · 2024-04-19T13:15:45Z

R CMD check fails due to warning:

❯ checking for unstated dependencies in examples ... WARNING
  'library' or 'require' call not declared from: ‘miaTime’

Also, for some reason miaTime is being installed even when it is not in DESCRIPTION:

miaTime                    0.1.21     2024-04-17 [1] Github (microbiome/miaTime@9fe9771)

antagomir · 2024-04-20T23:29:59Z

For me, running the latest version of the "mediation" branch through the checks throws a warning:

Warning in data(hitchip1006) : data set ‘hitchip1006’ not found
Error: object 'hitchip1006' not found

Also

checking tests ...
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:

[9] 2.5067 - 2.5100 == -0.00331
...
Backtrace:
▆

└─mia (local) testTransformations(tse) at test-5transformCounts.R:331:5
└─testthat::expect_equal(actual, compare) at test-5transformCounts.R:293:9

TuomasBorman · 2024-04-22T08:27:11Z

I added \donttest which should fix the issue of @RiboRings.

@antagomir I cannot reproduce your error. It runs fine on my local machine, and the same tests are also run in GHA. Still, it would be good to investigate this further to find out why you get different results

TuomasBorman · 2024-04-22T09:10:35Z

Did not run, I disabled the examples

antagomir · 2024-04-22T09:32:28Z

If this goes through the automated tests, then I think it should be ok

RiboRings · 2024-04-26T11:49:52Z

Ubuntu:

* checking whether package ‘mia’ can be installed ... [30s/30s] WARNING
Found the following significant warnings:
  Warning: package ‘matrixStats’ was built under R version 4.5.0
  Warning: package ‘GenomicRanges’ was built under R version 4.5.0
  Warning: package ‘Biobase’ was built under R version 4.5.0
  Warning: package ‘SingleCellExperiment’ was built under R version 4.5.0

Mac:

Error: Error: Failed to install 'mia' from local:
  unable to load shared object '/Users/runner/work/_temp/Library/cli/libs/cli.so':
  dlopen(/Users/runner/work/_temp/Library/cli/libs/cli.so, 0x0006): tried: '/Users/runner/work/_temp/Library/cli/libs/cli.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/_temp/Library/cli/libs/cli.so' (no such file), '/Users/runner/work/_temp/Library/cli/libs/cli.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))

TuomasBorman · 2024-04-29T06:15:33Z

That will hopefully be fixed soon, when the new Bioc is released. However, R 4.5.0 is a little odd, since the next Bioc devel is developed against 4.4.0. I think it has something to do with rworkflows

RiboRings · 2024-05-05T07:01:28Z

Hi @TuomasBorman! Should I do anything to fix this? How should we proceed to merge this PR?

TuomasBorman · 2024-05-05T07:29:05Z

I haven't been able to fix the problem. However, everything should be ok if at least one of these runs works and there are no errors related to this PR. --> so we can merge without passing all these rules

(I asked the rworkflows maintainers if they know what the problem is, and they have not answered yet)

TuomasBorman · 2024-05-05T08:47:21Z

Seems ok, I will merge (I just updated NEWS). Thanks!

Initialise mediation functions

0a7c947

RiboRings self-assigned this Mar 6, 2024

Finalise mediation wrapper and add vignettes

692c40e

RiboRings added 5 commits March 12, 2024 17:29

Fix se not found error

882b777

Replace n_distinct in mediate with base funs

3e60b7d

Fix error in mediate example and simplify argument checks

65f5c47

Add comments to mediate functions

f395065

Add tests for getMediation

27d56a0

RiboRings added the ready label Mar 13, 2024

TuomasBorman requested changes Mar 14, 2024

RiboRings added 4 commits March 16, 2024 14:07

Replace se with x and print with message in mediate funs

f82cd12

Merge branch 'master' into mediation

7aea015

Update docs on getMediation

ebfc0be

Fix missing arg verbose in mediate fun

c7887d8

antagomir reviewed Mar 19, 2024

TuomasBorman requested changes Mar 20, 2024

Use standard bioc indentation and utility names

a7ffc34

RiboRings removed the ready label Apr 14, 2024

RiboRings added 2 commits April 14, 2024 17:27

Merge branch 'devel' into mediation

7b113eb

Implement feedback 1

3c6ec7a

Create addMediation function

63f9504

TuomasBorman approved these changes Apr 18, 2024

RiboRings added 4 commits April 19, 2024 14:37

Run mediation example only if miaTime installed

6336cac

Merge branch 'devel' into mediation

76f58ff

Try to fix miaTime require issue

9a0f7a8

Try requireNamespace instead of require

b3172a0

Merge branch 'devel' into mediation

a7097a9

up

a9d9946

up

7934b1f

RiboRings added 3 commits April 26, 2024 11:37

SOlve conflicts with devel

408b847

Minor fixes

ce0771d

Bump version

d48f8ae

Solve conflicts and bump version

e7aea93

TuomasBorman added 2 commits May 5, 2024 10:56

up

f1cec71

up

af73d39

TuomasBorman merged commit 21a74d1 into devel May 5, 2024
1 of 3 checks passed

TuomasBorman deleted the mediation branch May 5, 2024 08:47
