Maintain Sample Order Consistency in mStat_aggregate_by_taxonomy() Output #60

qqwxp · 2024-07-03T08:02:57Z

Current Feature
mStat_aggregate_by_taxonomy() currently returns a table whose sample order differs from the sample order of the input table. This behavior is not documented, which can lead to errors in subsequent analyses.

Proposed Enhancement
Modify the function to ensure that the sample order in the output table matches the sample order in the input table.

Benefits of the Enhancement
Keeping the sample order consistent between the input and output tables will:

Prevent potential errors in subsequent analyses, such as incorrect calculations when combining results with the original table.
Improve the usability and reliability of the function for users.
Reduce the need for users to manually check and reorder samples, saving time and reducing the risk of human error.

For example, if a user calculates a metric like the F/B ratio based on the output of mStat_aggregate_by_taxonomy(data.obj, feature.level = "phylum") and directly cbind()s the result to the original table without noticing the order change, the resulting table will contain mismatched data.
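A minimal sketch of this pitfall, using invented toy data rather than a real call to mStat_aggregate_by_taxonomy(). The object names (`phylum_counts`, `meta`, `fb_ratio`) are hypothetical and only illustrate how positional binding can go wrong and how indexing by sample name avoids it:

```r
# Toy phylum-level counts (rows = phyla, columns = samples); values are invented.
# The columns are deliberately in a different order than the metadata rows.
phylum_counts <- matrix(
  c(10, 40, 30,
    20, 10, 60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Firmicutes", "Bacteroidetes"), c("S3", "S1", "S2"))
)

# Sample metadata in the original input order.
meta <- data.frame(group = c("a", "b", "a"), row.names = c("S1", "S2", "S3"))

# F/B ratio per sample, named by sample ID.
fb_ratio <- phylum_counts["Firmicutes", ] / phylum_counts["Bacteroidetes", ]

# Positional binding silently pairs the ratio of S3 with the metadata of S1.
wrong <- cbind(meta, fb_ratio = unname(fb_ratio))

# Indexing by sample name aligns the values regardless of column order.
right <- cbind(meta, fb_ratio = unname(fb_ratio[rownames(meta)]))
```

Aligning by name like this works whether or not the aggregation step preserves the input order, so it is a reasonable defensive habit even after the function is fixed.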

Ensuring the output order matches the input order will help users avoid such pitfalls.

Thank you for considering this enhancement.


cafferychen777 · 2024-09-23T06:42:11Z

Dear qqwxp,

Thank you for bringing this important issue to our attention and for your detailed explanation of the problem and its potential impacts. We greatly appreciate your contribution to improving the MicrobiomeStat package.

I'm pleased to inform you that we have addressed this issue in our latest update to the mStat_aggregate_by_taxonomy() function. The function has been modified to ensure that the sample order in the output table now matches the sample order in the input table.

Key changes implemented:

We now store the original sample order at the beginning of the function.
The aggregation process has been updated to use more recent tidyr functions for improved performance.
After aggregation, we explicitly reorder the output to match the original sample order.
We've also made sure that the output maintains its matrix structure even when only one row remains after removing zero-sum rows.
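The steps listed above can be sketched roughly as follows. This is not the package's actual implementation: it uses base R (`tapply`) instead of tidyr, and the function name `aggregate_keep_order` and the toy data are hypothetical. It only illustrates storing the sample order, aggregating, restoring the order, and keeping the matrix structure with `drop = FALSE`:

```r
# Hypothetical helper, not part of MicrobiomeStat.
# counts: numeric matrix, rows = features, columns = samples.
# groups: named character vector mapping feature IDs to a taxon.
aggregate_keep_order <- function(counts, groups) {
  # 1. Store the original sample order before any reshaping.
  original_order <- colnames(counts)

  # 2. Aggregate through a long-format representation. tapply() sorts the
  #    grouping values, so the samples come back in alphabetical order.
  taxon  <- rep(unname(groups[rownames(counts)]), times = ncol(counts))
  sample <- rep(colnames(counts), each = nrow(counts))
  agg <- tapply(as.vector(counts), list(taxon, sample), sum)

  # 3. Explicitly restore the input sample order.
  agg <- agg[, original_order, drop = FALSE]

  # 4. Remove zero-sum rows; drop = FALSE keeps a matrix even for one row.
  agg[rowSums(agg) > 0, , drop = FALSE]
}

# Toy data (invented values); samples are intentionally not in sorted order.
otu <- matrix(
  c(5, 1, 2,
    3, 0, 1,
    0, 0, 0,
    4, 2, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), c("S3", "S1", "S2"))
)
phylum <- c(OTU1 = "Firmicutes", OTU2 = "Firmicutes",
            OTU3 = "Proteobacteria", OTU4 = "Bacteroidetes")

result <- aggregate_keep_order(otu, phylum)
```

In this sketch, `colnames(result)` matches `colnames(otu)`, and calling the helper on `otu[1:3, ]` (where only one phylum has nonzero counts) still returns a one-row matrix rather than a plain vector.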

These changes should resolve the issues you highlighted, including:

Preventing potential errors in subsequent analyses
Improving the usability and reliability of the function
Eliminating the need for manual sample reordering

Your example of calculating the F/B ratio was particularly helpful in understanding the potential consequences of this issue. With this update, users should no longer encounter mismatched data when combining results with the original table.

Thank you again for your valuable feedback. It's contributions like yours that help us continually improve MicrobiomeStat. We encourage you to update to the latest version and let us know if you have any further suggestions or if you encounter any issues.

Best regards,
Chen Yang
MicrobiomeStat Development Team

qqwxp added the "enhancement" (New feature or request) label · Jul 3, 2024

cafferychen777 added the "good first issue" (Good for newcomers) label · Sep 23, 2024
