Maintain Sample Order Consistency in mStat_aggregate_by_taxonomy() Output #60

Open
qqwxp opened this issue Jul 3, 2024 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@qqwxp
Contributor

qqwxp commented Jul 3, 2024

Current Feature
Currently, mStat_aggregate_by_taxonomy() returns a table whose sample order differs from the sample order in the input table. This behavior is not documented, which can lead to errors in subsequent analyses.

Proposed Enhancement
Modify the function to ensure that the sample order in the output table matches the sample order in the input table.

Benefits of the Enhancement
Ensuring the sample order consistency between the input and output tables will:

  • Prevent potential errors in subsequent analyses, such as incorrect calculations when combining results with the original table.
  • Improve the usability and reliability of the function for users.
  • Reduce the need for users to manually check and reorder samples, saving time and reducing the risk of human error.

For example, suppose a user calculates a metric such as the F/B ratio based on the output of mStat_aggregate_by_taxonomy(data.obj, feature.level = "phylum") and directly cbind()s the result to the original table without noticing the order change. The resulting table will then pair each sample with another sample's value.

Ensuring the output order matches the input order will help users avoid such pitfalls.
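
To make the pitfall concrete, here is a minimal sketch using toy data. The sample names, the metadata table, and the phylum counts are all made up for illustration, and the phylum matrix only stands in for an aggregated table whose columns come back in a different order; it does not call mStat_aggregate_by_taxonomy() itself.

```r
# Toy data: every name and value below is hypothetical.
meta <- data.frame(sample = c("S3", "S1", "S2"),
                   group  = c("a", "b", "a"),
                   stringsAsFactors = FALSE)

# Stand-in for a phylum-level table whose sample (column) order
# differs from the order of the rows in `meta`.
phylum <- matrix(c(10, 5,    # S1
                   20, 20,   # S2
                   30, 10),  # S3
                 nrow = 2,
                 dimnames = list(c("Firmicutes", "Bacteroidetes"),
                                 c("S1", "S2", "S3")))

fb_ratio <- phylum["Firmicutes", ] / phylum["Bacteroidetes", ]

# Unsafe: cbind() pairs values by position, so the S3 row gets S1's ratio.
unsafe <- cbind(meta, fb = unname(fb_ratio))

# Safer: index by sample name so each row gets its own sample's ratio.
safe <- cbind(meta, fb = unname(fb_ratio[meta$sample]))
```

Indexing by name, as in the last line, is a workaround users can apply regardless of the order in which the function returns its samples.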

Thank you for considering this enhancement.

@qqwxp qqwxp added the enhancement New feature or request label Jul 3, 2024
@cafferychen777
Owner

Dear qqwxp,

Thank you for bringing this important issue to our attention and for your detailed explanation of the problem and its potential impacts. We greatly appreciate your contribution to improving the MicrobiomeStat package.

I'm pleased to inform you that we have addressed this issue in our latest update to the mStat_aggregate_by_taxonomy() function. The function has been modified to ensure that the sample order in the output table now matches the sample order in the input table.

Key changes implemented:

  1. We now store the original sample order at the beginning of the function.
  2. The aggregation process has been updated to use more recent tidyr functions for improved performance.
  3. After aggregation, we explicitly reorder the output to match the original sample order.
  4. We've also made sure that the output maintains its matrix structure even when only one row remains after removing zero-sum rows.
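
For readers who want to see the general pattern, the steps above can be sketched roughly as follows. This is not the package's actual code: aggregate_keep_order() is a hypothetical helper, the inputs are toy data, and the aggregation uses base R's rowsum() in place of the tidyr functions mentioned in point 2.

```r
# Hypothetical helper illustrating the pattern; NOT MicrobiomeStat's implementation.
aggregate_keep_order <- function(otu, taxon) {
  # 1. Remember the sample (column) order of the input.
  sample_order <- colnames(otu)

  # 2. Sum the feature rows that share a taxon label
  #    (base R rowsum() is used here as a stand-in for the tidyr approach).
  agg <- rowsum(otu, group = taxon)

  # 3. Explicitly restore the original sample order. rowsum() already keeps
  #    column order, but other aggregation routes may not.
  agg <- agg[, sample_order, drop = FALSE]

  # 4. Remove zero-sum rows; drop = FALSE keeps a matrix even if one row remains.
  agg[rowSums(agg) > 0, , drop = FALSE]
}

# Toy feature table: rows are features, columns are samples (made-up values).
otu <- matrix(c(1, 2, 0,   # S3
                3, 4, 0,   # S1
                5, 6, 0),  # S2
              nrow = 3,
              dimnames = list(c("f1", "f2", "f3"), c("S3", "S1", "S2")))
taxon <- c("Firmicutes", "Firmicutes", "Bacteroidetes")

res <- aggregate_keep_order(otu, taxon)
```

In this toy case the Bacteroidetes row sums to zero and is removed, leaving a one-row result that is still a matrix with its columns in the input order.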

These changes should resolve the issues you highlighted, including:

  • Preventing potential errors in subsequent analyses
  • Improving the usability and reliability of the function
  • Eliminating the need for manual sample reordering

Your example of calculating the F/B ratio was particularly helpful in understanding the potential consequences of this issue. With this update, users should no longer encounter mismatched data when combining results with the original table.

Thank you again for your valuable feedback. It's contributions like yours that help us continually improve MicrobiomeStat. We encourage you to update to the latest version and let us know if you have any further suggestions or if you encounter any issues.

Best regards,
Chen Yang
MicrobiomeStat Development Team

@cafferychen777 cafferychen777 added the good first issue Good for newcomers label Sep 23, 2024