Replies: 2 comments
-
Thanks for your comments! (And sorry for the delayed response.) I'm going to record my thoughts here about the points you raised for future reference, and will open a separate issue for each point as I get around to working on it. First, a very high-level comment: This package currently serves to perform the analysis for our manuscript, and the vignette explains how this analysis is done and how it could be done on a new dataset. Our manuscript is a basic-research paper rather than a software or methods paper, which is why we have not developed user-friendly high-level interfaces that, e.g., work with phyloseq, and why we haven't submitted to a repository. We still have some work to do to develop more robust statistical inference methods and practical guidance on how to estimate bias and perform calibration in the wider range of experiments that microbiologists face in practice. As we work on those, I will be experimenting with user-interfaces such as an
I'm also imagining that Bioconductor would be ideal once we meet the above goals and add the required integration and documentation.
I agree, and to begin with I plan to add a phyloseq interface to a bias estimation function, since I'm most familiar with phyloseq and I expect most of our target users are as well. A downside of phyloseq is that a single phyloseq object cannot naturally hold both the observed abundances and the known abundances for the control samples, so two phyloseq and/or otu_table objects are needed, one for the observed and one for the actual abundances. In contrast, I think a single SummarizedExperiment object could include both tables.
Agreed. This wasn't necessary for the paper (since the manuscript ultimately is the documentation) but I will be expanding the documentation over the next 1-2 months.
Once added, the bias-estimation function will abstract out the need for the user to understand the error matrix. But to answer your question, the error matrix arises from dividing the observed abundances by the actual abundances. If a taxon is not in the sample and it is not observed, that results in a 0/0, or
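As a minimal base-R illustration of the division described above (toy data with hypothetical sample and taxon names, not the package's own code): R returns `NaN` for 0/0 and `Inf` for a positive count divided by 0, so the two cases can be told apart with `is.nan()` and `is.infinite()`.

```r
# Toy data (hypothetical): rows are control samples, columns are taxa.
observed <- matrix(c(120, 0, 500,
                      30, 0,  10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("S1", "S2"), c("T1", "T2", "T3")))
actual <- matrix(c(1, 0, 0,
                   1, 0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(observed))

# Elementwise ratio of observed to actual abundance.
error <- observed / actual

# 0/0 gives NaN: the taxon is absent from the sample and was not observed.
absent_unobserved <- is.nan(error)

# x/0 with x > 0 gives Inf: the taxon was observed although nominally absent.
observed_but_absent <- is.infinite(error)

print(error)
```

Note that `is.na()` is `TRUE` for `NaN` as well as for a genuine `NA`, so a check based on `is.na()` alone would not separate the 0/0 entries from missing values, whereas `is.nan()` does.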
Agreed. The section "Bias measurement as a means of evaluating and improving protocols" of our manuscript illustrates some ways in which bias estimation can be fruitfully applied to "quality control" experiments, and it would be useful to have a more step-by-step guide to the analysis of that section. More generally I hope to add vignettes and/or blog posts illustrating various other applications (many of which we outline in the Discussion) as we develop and apply them in our work and learn more from microbiologists about their needs.
The reason for this is a bit historical (for lack of a better word). I patterned this first version of the tutorial on what I actually do in the manuscript, so that someone looking at the analysis could follow along (including future me). E.g., the
I think the speed penalty for creating the simulated datasets for the tests is insignificant, so I'm not sure about the need to store them externally. But I would like to add tests for more functions.
-
Quick update on some of the above issues: I've finally gotten around to adding an easier-to-use set of functions for estimation and calibration, which works with matrices or phyloseq objects rather than tidy data frames. These functions still need better documentation but I've updated the tutorial to show how to use them: https://mikemc.github.io/metacal/articles/tutorial.html
-
I came here because of this post.
I don't know whether (or where) you plan to submit this, to CRAN or to Bioconductor. I would recommend Bioconductor given the topic of the package. Either way, you'll get a more thorough review if you submit to one of these repositories.
In any case, it seems that the package doesn't work well with other packages like phyloseq or metagenomeSeq, or with other useful classes like SummarizedExperiment (used in Bioconductor to store data about a sequencing experiment). Supporting them would make it easier to use the package in existing pipelines/scripts.
Some functions need more documentation of their parameters and should include some examples (at least, that is a requirement for Bioconductor packages).
To get the error matrix, it would be perfect if we could distinguish what type of `NA` is a `0/0` (which imho, for the purpose of the error matrix, should then be 0) or a `500/0`.
The vignette clearly explains how the package works. It would be interesting to know how to use this information in other downstream analyses. It also focuses a lot on tidy data frames, which might reduce the memory footprint of the data if it is very sparse, but there are other solutions like data.table or Matrix, so I'm not sure so much space should be given to them in the tutorial.
The vignette focuses on the error matrix and estimating bias, but I couldn't find any function that does it.
I've seen the tests and they should be more minimal and include just the data and the tests (you can create and store data just for tests). At the same time, they should test more than just the `center` function.
Many thanks for taking the effort to create this nice package. I'm sure it will be very well received by the community.