Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UniFrac #336

Closed
microsud opened this issue Jan 13, 2023 · 17 comments
Closed

UniFrac #336

microsud opened this issue Jan 13, 2023 · 17 comments
Assignees

Comments

@microsud
Copy link
Member

Currently the UniFrac distance calculation is ported from phyloseq to work with TreeSummarizedExperiment objects. I recently came across this issue in phyloseq github
joey711/phyloseq#956

It seems that there are some differences in calculation between qiime2 and phyloseq. Have anyone tested this? Is it possible to check if it is also the case with tse?

@antagomir
Copy link
Member

Thanks, this is important.

The conclusion seem to be that rbiom matches best with QIIME2 (and gives similar results to 2 other independent R implementations as well but not with phyloseq). It is also the fastest R implementation.

-> mia should be updated to use rbiom implementation!

@antagomir
Copy link
Member

any volunteers for a quick PR..?

@antagomir
Copy link
Member

Also see FuncDiv that implemented this same suggestion in issue 2.

@antagomir
Copy link
Member

Link with issue #340

@Insaynoah
Copy link
Contributor

Insaynoah commented Apr 18, 2024

The calculateUnifrac file seems to implement it's own version of unifrac inspired from the phyloseq function without actually using other packages : calculate_Unifrac.R.

I've been trying to find the difference between the phyloseq implementation and the rbiom implementation to no avail.
The Rbiom version uses a function coded in c++ which i have a hard time understanding given my limited knowledge on both c++ and unifrac.

@antagomir
Copy link
Member

antagomir commented Apr 18, 2024

The idea would be to use this as a dependency, and just call the rbiom function (internally from mia), instead of the current solution which is to call phyloseq internally -> you wouldn't need to know how rbiom works internally if you can replace the phyloseq unifrac call with the rbiom unifrac call.

@TuomasBorman
Copy link
Contributor

@thpralas has checked this unifrac thing little bit

@antagomir
Copy link
Member

Ok - then let @thpralas take care

@antagomir antagomir assigned thpral and unassigned Insaynoah Apr 18, 2024
@antagomir
Copy link
Member

It was not assigned to anyone, so I had passed it on. Let's aim to use the assignments on the top right to indicate if someone works on something, this way we can avoid overlaps.

@TuomasBorman
Copy link
Contributor

Yes, that's ok. As we have now more contributors, it is better to switch using the "assignees" field.

@thpral
Copy link
Collaborator

thpral commented Apr 18, 2024

I have not checked the rbiom implementation of unifrac yet but I looked at the picante implementation (picante::unifrac).
My initial goal was to replace mia runUnifrac function by calling picante::unifrac internally in calculateUnifrac.

The picante implementation takes as parameters a community matrix and a tree. I think that the aggregation of the matrix based on nodeLab should be done in calculateUnifrac function (instead of doing it in the runUnifrac function) because it is necessary to aggregate the matrix even though we use picante::unifrac.
However, with @TuomasBorman help, I realized while debugging calculateUnifrac and picante::unifrac that picante::unifrac may be too slow for big TreeSummarizedExperiment (e.g. GlobalPatterns), and I did not manage to get a result.
The picante package does not have a fast implementation of unifrac so I do not think we could use it internally to replace runUnifrac.
I am going to look at the rbiom implementation of unifrac.

@TuomasBorman
Copy link
Contributor

TuomasBorman commented Apr 18, 2024

Little bit related to this same thing: also faith index implementation could be changed so that we use external package instead of our own implementation.

Good thing about picante was that it included also faith alpha diversity index, rbiom seems not to include it.

@antagomir
Copy link
Member

Note the original discussion above. It included both picante and rbiom, and the latter seemed better overall: "The conclusion seem to be that rbiom matches best with QIIME2 (and gives similar results to 2 other independent R implementations as well but not with phyloseq). It is also the fastest R implementation."

@antagomir
Copy link
Member

I opened a distinct issue on Faith in #522

@antagomir
Copy link
Member

My primary preference is rbiom for the above mentioned reason

@antagomir
Copy link
Member

@thpralas is this issue ready & could be closed?

@thpral
Copy link
Collaborator

thpral commented Jul 22, 2024

@thpralas is this issue ready & could be closed?

Yes I think this issue is now ready and can be closed since mia unifrac implementation now internally calls rbiom instead of phyloseq (#523)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

5 participants