Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default: update.tree = TRUE #661

Open
TuomasBorman opened this issue Nov 27, 2024 · 9 comments
Open

Default: update.tree = TRUE #661

TuomasBorman opened this issue Nov 27, 2024 · 9 comments

Comments

@TuomasBorman
Copy link
Contributor

Agglomeration and subsetting functions have update.tree parameter that controls whether to simplify the the tree to correspond the data after agglomeration/subsetting. The default choice should be that the tree is updated, i.e., update.tree = TRUE.

@TuomasBorman
Copy link
Contributor Author

Also update examples and OMA to reflect this new default choice

@antagomir
Copy link
Member

How about the node labels?

@TuomasBorman
Copy link
Contributor Author

Not sure if I answer to your question, but the names of tree nodes are not updated. The pruning is done internally by utilizing TreeSummarizedExperiment::subsetByLeaf(). It selects those nodes that are represent in nodes. Moreover, it simplifies the tree by removing those branches that have only one descendant.

x <- do.call(subsetByLeaf, args)

@antagomir
Copy link
Member

What I meant was whether it would make sense to also update node labels so that the user could, by default, avoid cryptic errors like this:

library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns

# Agglomerate by Species and subset by prevalence
tse <- agglomerateByRank(tse, rank="Family", update.tree = TRUE)

# Run MDS with UniFrac
library(scater)
tse <- runMDS(tse,
              FUN = getDissimilarity,
              name = "Unifrac",
              method = "unifrac",
              tree = rowTree(tse),
              ntop = nrow(tse),
              assay.type = "counts",
              weighted = TRUE)

Error: Incompatible tree and abundance table! Please try to provide 'node.label'.

@TuomasBorman
Copy link
Contributor Author

Ahh, yes. It should be relatively easy to do that. And I think this is good idea.

Related to runMDS, we could have addMDS and getMDS that have default value for the dissimilarity function. That would make the usage much simpler.

@antagomir
Copy link
Member

Sounds like a good idea!

@RiboRings
Copy link
Member

Just as a note, the cryptic error doesn't go away when using update.tree = FALSE or update.tree = TRUE alike:

library(mia)
library(scater)

data(GlobalPatterns)
tse <- GlobalPatterns

# update.tree = TRUE

tse2 <- agglomerateByRank(tse, rank = "Family", update.tree = TRUE)
 
tse2 <- runMDS(tse2,
               FUN = getDissimilarity,
               name = "Unifrac",
               method = "unifrac",
               tree = rowTree(tse2),
               ntop = nrow(tse2),
               assay.type = "counts",
               weighted = TRUE)

Error: Incompatible tree and abundance table! Please try to provide 'node.label'.

# update.tree = FALSE

tse2 <- agglomerateByRank(tse, rank = "Family", update.tree = FALSE)

tse2 <- runMDS(tse2,
               FUN = getDissimilarity,
               name = "Unifrac",
               method = "unifrac",
               tree = rowTree(tse2),
               ntop = nrow(tse2),
               assay.type = "counts",
               weighted = TRUE)

Error: Incompatible tree and abundance table! Please try to provide 'node.label'.

@TuomasBorman
Copy link
Contributor Author

Yep, that is because the names of nodes are not updated even though the tree is pruned

@antagomir
Copy link
Member

And based on the above discussion the consensus seems to be to change this so that also node labels would be updated by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants