Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function naming, logic, aliases #474

Open
TuomasBorman opened this issue Nov 24, 2023 · 7 comments
Open

Function naming, logic, aliases #474

TuomasBorman opened this issue Nov 24, 2023 · 7 comments
Assignees

Comments

@TuomasBorman
Copy link
Contributor

We have to think about basic principles of functions.

  1. Less functions is better --> remove aliases, combine similar functions
  2. Utilize capabilities of TreeSE more --> return TreeSE if possible
  3. Function naming --> Consider key words (import, run...). Make names shorter (runCorrelation()...).
  4. Make feature-related functions support columns

1. *cols() & *rows() and related functions

We have some functions that are doing some operation to row- or column-wise. One of these is mergeCols() / mergeRows() function. I think we should combine these functions and add MARGIN parameter like we have done in cluster() function. --> merge(MAGIN = "rows")

Moreover, we have some functions that are doing similar things but slightly differently (mergeRows() vs mergeRowsByRanks()). I think we could combine these function by adding by.rank parameter. --> agglomerate(MARGIN = "rows", by.rank = TRUE)
- by.rank parameter? TRUE --> agglomerateByRank; FALSE --> mergeRows
- warning if by.rank TRUE and column is not a rank.

These affect also splitOn()/unsplitOn() and splitByRanks() functions --> split()

We have multiple aliases for transformAssay() function. Why don't we have just transformAssay() (or maybe transform()) and unexport all aliases. It is more unclear when we have lots of aliases.

2. Return TreeSE

Some functions (such as getExperimentCrossAssociation()) does not return TreeSE but only the table or list of matrices. We could consider if it is better to always return TreeSE or if it makes things more complicated to do so. Results that cannot be stored to reducedDIm, colData or rowData (or other commonly used slot) can be stored to metadata.

Good things:

  • For example, calculating correlation can be take a while, so it might be good to store it to main object so that it is somewhere safe and stored
  • It would simplify things --> always return TreeSE
  • I think it is best practice to store calculations to the object. All the results can be found from same place.

Negative thing:

  • For example, getRareFeatures() return a vector of rare taxa. Storing it to metadata might complicate things because the function can be used to find rows to subset.

3. Function naming

I think the function naming can be improved. We could use key words for tasks

  • run --> calculate something and add results to TreeSE: runAlpha, runPCA...
  • calculate --> calculate something and return only result (not stored in TreeSE); calculatePCA, calculateRDA
  • get --> get something from the object: getTaxonomyTree
  • add --> add something to the object: addTaxonomyTree
  • import --> import data files to TreeSE: importDADA2
  • makeSomething or convertTo --> convert TreeSE to some other object: makePhyloseq

Also, we can stick with rows and cols in function naming. getRareRows() for instance (or perhaps getRare(MARGIN = "rows")).

4. Make feature-related functions support columns

For example, addTaxonomyTree() adds tree only to rows. However, since columns can have also tree, we could add support for columns. We might have to consider ranks for rows and columns separately: TAXONOMY_RANKS --> ROW_RANKS & COL_RANKS


These are things that we might want to discuss before we finalize miaverse, After formal publication, I think it is not good that we change names and these conventions. This is quite free thoughts but hopefully initialize some discussion from where we can start.

@antagomir
Copy link
Member

I agree with all suggestions here.

@TuomasBorman
Copy link
Contributor Author

addTaxonomyTree() and taxonomyTree() are calculating tree based on taxonomy info provided in rowData. This is misleading users because the calculated tree is not really a taxonomy tree. Taxonomy tree refers usually phylogeny / phylogenetic tree that is calculated based on genetic distances.

The calculated tree takes into account only "hierarchy distances" without genetic distances "species A is x edges away from species B". --> So the calculated tree is "hierarchy tree"

I propose that we rename taxonomyTree() to getTree() (highlights the returned value which is the tree itself) and addTaxonomyTree() to addTree().

@antagomir @thpralas

@antagomir
Copy link
Member

Even that could be confused with phylogenetic tree.. how about getHierarchy() or mapRanksToTree() or makeTreeFromRanks()?

If the tree itself is formally similar to phylogenetic trees, do we need a separate "add tree" function for this? Can we just use the same procedure than with phylogenetic trees. Then there would be only one additional function, which is "get tree". Users would then need to add this separately. This is an extra step but perhaps it would help some users better understand the procedure?

@TuomasBorman
Copy link
Contributor Author

I would go then with getHierarchyTree --> explains quite well what is happening

I think add* is quite convenient and we should not remove it. Rather we should focus on documentation which is not too clear. See for example reference page https://microbiome.github.io/mia/reference/index.html It could be much more clear especially when we don't even have that many functions.

I think we could move addHierarchyTree under separate page of manual (not under taxonomy). In the (excel) table that I sent you, there was some ideas how to group these functions more clearly.

@TuomasBorman
Copy link
Contributor Author

In the documetation, we could refer this as hierarchy tree and not taxonomy tree. The possibility for misunderstanding is not that big then anymore

@antagomir
Copy link
Member

Ok

@thpral
Copy link
Collaborator

thpral commented Feb 21, 2024

I made a commit but I forgot referencing this issue in the commit name : 1e0b96f

@thpral thpral self-assigned this Feb 21, 2024
thpral added a commit that referenced this issue Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants