
Added support for anndata #94

Open · wants to merge 13 commits into base: master

Conversation

@enriquea enriquea commented Feb 6, 2025

PR Type

Enhancement, Tests


Description

  • Added support for exporting corrected iBAQ values to AnnData objects.

  • Introduced a new utility function create_anndata for AnnData creation.

  • Updated batch correction CLI to include AnnData export option.

  • Added and updated tests for AnnData export and related utilities.
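The export described above rests on a long-to-wide pivot: each (sample, protein, iBAQ) row of the corrected table becomes one cell of a samples × proteins matrix, which is what `create_anndata` turns into `AnnData.X`. A minimal pandas sketch of that reshaping, with hypothetical column names, and `pivot_table` standing in for ibaqpy's internal `pivot_wider` helper:

```python
import pandas as pd

# Hypothetical long-format iBAQ table: one row per (sample, protein) pair.
df = pd.DataFrame(
    {
        "SampleID": ["S1", "S1", "S2", "S2"],
        "ProteinName": ["P1", "P2", "P1", "P2"],
        "Ibaq": [1.0, 2.0, 3.0, 4.0],
    }
)

# Pivot into a samples x proteins matrix (observations x variables);
# this wide matrix is what becomes AnnData.X.
matrix = df.pivot_table(index="SampleID", columns="ProteinName", values="Ibaq")
print(matrix.shape)  # (2, 2)
```

Extra value columns (e.g. a corrected-iBAQ column) are pivoted the same way and attached as layers.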


Changes walkthrough 📝

Enhancement

correct_batches.py (ibaqpy/commands/correct_batches.py): Enable AnnData export in batch correction (+56/-13)

  • Added an export_anndata option to the batch correction function.
  • Integrated AnnData export into the batch correction workflow.
  • Updated CLI options to support AnnData export.

file_utils.py (ibaqpy/ibaq/file_utils.py): Introduce AnnData creation and file combination utilities (+149/-0)

  • Added a create_anndata function for AnnData object creation.
  • Implemented a utility to combine multiple TSV files.

ibaqpy_postprocessing.py (ibaqpy/ibaq/ibaqpy_postprocessing.py): Refactor postprocessing utilities and remove redundancy (+35/-61)

  • Removed the redundant combine_ibaq_tsv_files function.
  • Refactored existing functions for better readability.

Tests

test_batch_correction.py (tests/test_batch_correction.py): Add tests for AnnData export in batch correction (+17/-3)

  • Added tests for AnnData export functionality in batch correction.
  • Validated AnnData object creation and structure.

test_file_utils.py (tests/test_file_utils.py): Add tests for AnnData creation and file utilities (+48/-0)

  • Added tests for the create_anndata function.
  • Added tests for combining multiple TSV files.

test_ibaqpy_postprocessing.py (tests/test_ibaqpy_postprocessing.py): Update and clean up postprocessing tests (+0/-9)

  • Removed the redundant test for combine_ibaq_tsv_files.
  • Updated tests for postprocessing functions.

Dependencies

requirements.txt: Add AnnData dependency (+2/-1)

  • Added the anndata dependency for AnnData support.

Summary by CodeRabbit

    • New Features
      • Enhanced the data correction workflow with additional customizable options for handling raw versus corrected values.
      • Introduced an option to export processing results as extended data files (.h5ad) for advanced downstream analysis.
      • Updated command-line options for improved clarity and flexibility.
      • Streamlined file processing utilities for robust data structure creation.
    • Bug Fixes
      • Improved error handling for file operations during data export.
    • Tests
      • Expanded test coverage to verify proper data export and processing behavior.
    • Chores
      • Updated package dependencies to support the new export capabilities.


    coderabbitai bot commented Feb 6, 2025

    Warning

    Rate limit exceeded

    @enriquea has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 52 seconds before requesting another review.


    📥 Commits

    Reviewing files that changed from the base of the PR and between 7308831 and 3bfd5a4.

    📒 Files selected for processing (1)
    • ibaqpy/ibaq/file_utils.py (1 hunks)

    Walkthrough

    The changes update the batch correction command by enhancing the run_batch_correction function with new parameters and renaming an existing one, while updating its default values to constants from ibaqpy_commons. The functionality now includes exporting iBAQ data as an AnnData object if requested. Additional utility functions for creating AnnData objects and combining TSV files have been added, with related tests updated or removed accordingly. Formatting adjustments were applied in the postprocessing module and a new dependency (anndata) was added.

Changes

• ibaqpy/commands/correct_batches.py: Enhanced run_batch_correction: renamed ibaq_column to ibaq_raw_column; added parameters (ibaq_corrected_column, export_anndata); updated default values with constants; integrated AnnData export logic; updated CLI options.
• ibaqpy/ibaq/file_utils.py: Introduced new functions: create_anndata (for constructing an AnnData object) and combine_ibaq_tsv_files (for merging TSV files).
• ibaqpy/ibaq/ibaqpy_postprocessing.py: Removed combine_ibaq_tsv_files; reformatted function signatures for improved readability in various functions.
• requirements.txt: Added dependency: anndata with version constraint ~=0.10.9.
• tests/test_batch_correction.py, tests/test_file_utils.py, tests/test_ibaqpy_postprocessing.py: Updated batch correction tests to reflect the new parameters and AnnData export; added tests for file utilities; removed the obsolete test for combine_ibaq_tsv_files.

    Sequence Diagram(s)

    sequenceDiagram
        participant U as User/CLI
        participant RC as run_batch_correction
        participant CA as create_anndata
        participant FS as FileSystem
    
        U->>RC: Invoke batch correction with export_anndata=True
        RC->>FS: Validate output file exists
        RC->>CA: Create AnnData object from DataFrame
        CA-->>RC: Return AnnData object
        RC->>FS: Save AnnData file (.h5ad)
        FS-->>RC: Confirm file saved
    
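The validation step in the diagram (confirm the corrected output exists before writing the .h5ad file) can be sketched as a small helper. Deriving the AnnData filename with `with_suffix` is an assumption for illustration, not necessarily how correct_batches.py names the file:

```python
import tempfile
from pathlib import Path


def derive_anndata_path(output: str) -> Path:
    """Validate that the corrected-iBAQ output file exists, then derive
    a sibling .h5ad path (naming scheme assumed for illustration)."""
    output_path = Path(output)
    if not output_path.exists():
        raise FileNotFoundError(f"Output file {output} does not exist!")
    return output_path.with_suffix(".h5ad")


# Demo against a real temporary file standing in for the corrected TSV.
with tempfile.NamedTemporaryFile(suffix=".tsv") as tmp:
    h5ad_path = derive_anndata_path(tmp.name)
    print(h5ad_path.suffix)  # .h5ad
```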

    Possibly related PRs

    • update: ibaqpy #73: The changes in the main PR are related to the modifications in the create_anndata function, which is utilized in the main PR's run_batch_correction function to export data to an AnnData object.
    • Dev #93: The changes in the main PR are related to those in the retrieved PR as both involve the introduction and use of the IBAQ_BEC constant in the run_batch_correction function, enhancing the data transformation process.
    • little change #80: The changes in the main PR are related to the new create_anndata function introduced in the retrieved PR, as it is directly utilized in the updated run_batch_correction function to export data to an AnnData object.

    Suggested reviewers

    • ypriverol

    Poem

    Hoppity, hoppity through lines of code I roam,
    New parameters and functions now make a brighter home.
    AnnData sings its tale as files gracefully save,
    Refactored paths and options—oh, how they behave!
    With a twitch of my nose and a leap of delight,
    I cheer on these changes, coding day and night!




    qodo-merge-pro bot commented Feb 6, 2025

    Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Error Handling

The AnnData export code checks whether the output file exists after writing it, which is redundant; write errors should be handled directly in the try-except block.

    if not output_path.exists():
        raise FileNotFoundError(f"Output file {output} does not exist!")
    Performance

    The create_anndata function performs multiple pivot operations which could be optimized by reusing the pivoted data frame for layers.

    df_matrix = pivot_wider(
        df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
    )
    
    # Create the AnnData object with the main data matrix.
    adata = an.AnnData(
        X=df_matrix.to_numpy(),
        obs=df_matrix.index.to_frame(),
        var=df_matrix.columns.to_frame(),
    )
    
    def add_metadata(
        metadata_df: pd.DataFrame, key: str, cols: List[str]
    ) -> pd.DataFrame:
        """
        Add metadata columns to a DataFrame by mapping values from the original long dataframe.
    
        Parameters:
            metadata_df (pd.DataFrame): DataFrame (either adata.obs or adata.var) to update.
            key (str): The column name used as key (obs_col for observations, var_col for variables).
            cols (List[str]): List of metadata columns to add.
    
        Returns:
            pd.DataFrame: The updated metadata DataFrame.
        """
        for col in cols:
            if col not in df.columns:
                warnings.warn(
                    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
                )
                continue
            # Create a mapping from key to metadata values.
            mapping = df[[key, col]].drop_duplicates().set_index(key)[col]
            metadata_df[col] = metadata_df.index.map(mapping)
        return metadata_df
    
    # Add observation metadata, if provided.
    if obs_metadata_cols:
        adata.obs = add_metadata(adata.obs, obs_col, obs_metadata_cols)
    
    # Add variable metadata, if provided.
    if var_metadata_cols:
        adata.var = add_metadata(adata.var, var_col, var_metadata_cols)
    
    # Add additional layers (if any) using a similar pivot operation.
    if layer_cols:
        for layer_col in layer_cols:
            if layer_col not in df.columns:
                warnings.warn(
                    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'."
                )
                continue
            df_layer = pivot_wider(
                df, row_name=obs_col, col_name=var_col, values=layer_col, fillna=True
            )
            adata.layers[layer_col] = df_layer.to_numpy()
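As a sketch of the suggested optimization, a single `pivot_table` call over all value columns yields the main matrix and every layer from one reshape instead of one pivot per layer. Column names here are hypothetical, and pandas' `pivot_table` stands in for ibaqpy's `pivot_wider`:

```python
import pandas as pd

# Hypothetical long table with two value columns (main matrix + one layer).
df = pd.DataFrame(
    {
        "sample": ["S1", "S1", "S2", "S2"],
        "protein": ["P1", "P2", "P1", "P2"],
        "ibaq": [1.0, 2.0, 3.0, 4.0],
        "ibaq_corrected": [1.1, 2.1, 3.1, 4.1],
    }
)

# One pivot over all value columns; the result carries a
# (value_column, protein) column MultiIndex that can be sliced per layer.
wide = df.pivot_table(
    index="sample", columns="protein", values=["ibaq", "ibaq_corrected"]
)
X = wide["ibaq"].to_numpy()                 # main data matrix
layer = wide["ibaq_corrected"].to_numpy()   # additional layer
```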
    Data Validation

    The combine_ibaq_tsv_files function should validate that all input files have consistent column structure before concatenating.

    for file_path in file_paths:
        # Read the TSV file, skipping lines that start with the comment character
        df = pd.read_csv(file_path, sep=sep, comment=comment)
        dataframes.append(df)
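A minimal sketch of such a pre-concatenation check, using a hypothetical helper name:

```python
import pandas as pd


def check_consistent_columns(dfs: list) -> None:
    """Raise if the frames do not share an identical column layout
    (hypothetical helper, not part of ibaqpy)."""
    expected = list(dfs[0].columns)
    for i, df in enumerate(dfs[1:], start=1):
        if list(df.columns) != expected:
            raise ValueError(
                f"File {i} has columns {list(df.columns)}, expected {expected}"
            )


frames = [pd.DataFrame({"x": [1], "y": [2]}), pd.DataFrame({"x": [3], "y": [4]})]
check_consistent_columns(frames)  # identical layouts pass silently
```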


    qodo-merge-pro bot commented Feb 6, 2025


    PR Code Suggestions ✨

    Explore these optional code suggestions:

Category: General
    Ensure output directory exists

Add error handling for the case where the output file path's parent directory doesn't exist. Currently, attempting to save to a non-existent directory fails with a low-level error instead of creating the directory.

    ibaqpy/commands/correct_batches.py [148-152]

     if output:
         try:
    -        df_ibaq.to_csv(output, sep=sep, index=False)
    +        output_path = Path(output)
    +        output_path.parent.mkdir(parents=True, exist_ok=True)
    +        df_ibaq.to_csv(output_path, sep=sep, index=False)
         except Exception as e:
             raise ValueError(f"Failed to save output file: {str(e)}")

    [Suggestion has been applied]

    Suggestion importance[1-10]: 8


    Why: Important enhancement that prevents silent failures when saving output files to non-existent directories, improving error handling and user experience.

Impact: Medium
    Possible issue
    Validate input data not empty

    Add validation for empty DataFrames in create_anndata to prevent creation of
    invalid AnnData objects with zero dimensions.

    ibaqpy/ibaq/file_utils.py [11-19]

     def create_anndata(
         df: pd.DataFrame,
         obs_col: str,
         var_col: str,
         value_col: str,
         layer_cols: Optional[List[str]] = None,
         obs_metadata_cols: Optional[List[str]] = None,
         var_metadata_cols: Optional[List[str]] = None,
     ) -> an.AnnData:
    +    if df.empty:
    +        raise ValueError("Cannot create AnnData object from empty DataFrame")
    Suggestion importance[1-10]: 7


    Why: Adds crucial validation to prevent creation of invalid AnnData objects with zero dimensions, which could cause downstream issues in data analysis.

Impact: Medium


    @coderabbitai coderabbitai bot left a comment


    Actionable comments posted: 0

    🧹 Nitpick comments (7)
    tests/test_file_utils.py (2)

    18-25: Enhance test coverage for combine_ibaq_tsv_files.

    The test only verifies the shape of the DataFrame. Consider adding assertions for:

    • Content validation (e.g., specific column names, data types)
    • Edge cases (e.g., empty files, malformed files)
    • Error conditions (e.g., invalid separators, non-existent paths)

    28-48: Enhance test coverage for create_anndata.

    While the test verifies basic functionality, consider adding assertions for:

    • Content validation of the layers (e.g., specific values, data types)
    • Edge cases (e.g., missing metadata columns, empty DataFrames)
    • Error conditions (e.g., invalid column names, duplicate indices)
    tests/test_batch_correction.py (1)

    37-45: Enhance test coverage for AnnData object validation.

    While the test verifies the creation and shape of the AnnData object, consider adding assertions for:

    • Content validation of the layers (e.g., specific values, data types)
    • Metadata validation (e.g., column names, data types)
    • Error conditions (e.g., invalid export paths)
    ibaqpy/ibaq/file_utils.py (2)

    71-73: Add stacklevel to warnings.

    Add stacklevel=2 to the warnings to help users identify the source of the warning in their code.

    -                warnings.warn(
    -                    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
    -                )
    +                warnings.warn(
    +                    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.",
    +                    stacklevel=2
    +                )
    -                warnings.warn(
    -                    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'."
    -                )
    +                warnings.warn(
    +                    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'.",
    +                    stacklevel=2
    +                )

    Also applies to: 92-94

    🧰 Tools
    🪛 Ruff (0.8.2)

    71-71: No explicit stacklevel keyword argument found

    (B028)


    104-149: Enhance error handling in combine_ibaq_tsv_files.

    Consider adding error handling for:

    • Invalid file formats or corrupted files
    • Permission issues when reading files
    • Memory constraints when handling large files
     def combine_ibaq_tsv_files(
         dir_path: str, pattern: str = "*", comment: str = "#", sep: str = "\t"
     ) -> pd.DataFrame:
         file_paths = glob.glob(f"{dir_path}/{pattern}")
    
         if not file_paths:
             raise FileNotFoundError(
                 f"No files found in the directory '{dir_path}' matching the pattern '{pattern}'."
             )
    
         dataframes = []
    
         for file_path in file_paths:
    -        # Read the TSV file, skipping lines that start with the comment character
    -        df = pd.read_csv(file_path, sep=sep, comment=comment)
    -        dataframes.append(df)
    +        try:
    +            # Read the TSV file, skipping lines that start with the comment character
    +            df = pd.read_csv(file_path, sep=sep, comment=comment)
    +            dataframes.append(df)
    +        except pd.errors.EmptyDataError:
    +            warnings.warn(f"Skipping empty file: {file_path}", stacklevel=2)
    +            continue
    +        except (pd.errors.ParserError, pd.errors.ParserWarning) as e:
    +            raise ValueError(f"Error parsing file {file_path}: {str(e)}")
    +        except PermissionError:
    +            raise PermissionError(f"Permission denied when reading file: {file_path}")
    +        except MemoryError:
    +            raise MemoryError(f"Insufficient memory to read file: {file_path}")
    
         # Concatenate all DataFrames
         combined_df = pd.concat(dataframes, ignore_index=True)
    
         return combined_df
    ibaqpy/commands/correct_batches.py (2)

    154-171: Improve error handling in AnnData export.

    The error handling in the AnnData export block could be improved by properly chaining exceptions.

    Apply this diff to improve error handling:

             try:
                 adata.write(adata_filename)
             except Exception as e:
    -            raise ValueError(f"Failed to write AnnData object: {e}")
    +            raise ValueError(f"Failed to write AnnData object: {e}") from e
    🧰 Tools
    🪛 Ruff (0.8.2)

    170-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)
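A self-contained illustration of why `from e` matters; `write_anndata` here is a hypothetical stand-in for the failing `adata.write` call:

```python
def write_anndata(path: str) -> None:
    # Stand-in for adata.write() failing at the OS level (hypothetical).
    raise OSError("disk full")


def export(path: str) -> None:
    try:
        write_anndata(path)
    except Exception as e:
        # "from e" preserves the original OSError as __cause__, so the
        # traceback shows both the low-level and the high-level failure
        # instead of reporting "another exception occurred during handling".
        raise ValueError(f"Failed to write AnnData object: {e}") from e
```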


    217-222: Fix formatting inconsistency in CLI option.

    The formatting of the ibaq_raw_column option is inconsistent with other options. The indentation and line breaks should match the style used in other options.

    Apply this diff to fix the formatting:

    -@click.option("-ibaq",
    -              "--ibaq_raw_column",
    -              help="Name of the raw iBAQ column",
    -              required=False,
    -              default=IBAQ
    -        )
    +@click.option(
    +    "-ibaq",
    +    "--ibaq_raw_column",
    +    help="Name of the raw iBAQ column",
    +    required=False,
    +    default=IBAQ,
    +)
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between de2f119 and a9e0731.

    📒 Files selected for processing (7)
    • ibaqpy/commands/correct_batches.py (9 hunks)
    • ibaqpy/ibaq/file_utils.py (1 hunks)
    • ibaqpy/ibaq/ibaqpy_postprocessing.py (7 hunks)
    • requirements.txt (1 hunks)
    • tests/test_batch_correction.py (3 hunks)
    • tests/test_file_utils.py (1 hunks)
    • tests/test_ibaqpy_postprocessing.py (0 hunks)
    💤 Files with no reviewable changes (1)
    • tests/test_ibaqpy_postprocessing.py
    ✅ Files skipped from review due to trivial changes (1)
    • requirements.txt
    🧰 Additional context used
    🪛 Ruff (0.8.2)
    ibaqpy/ibaq/file_utils.py

    71-71: No explicit stacklevel keyword argument found

    (B028)


    92-92: No explicit stacklevel keyword argument found

    (B028)

    ibaqpy/commands/correct_batches.py

    170-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    ⏰ Context from checks skipped due to timeout of 90000ms (2)
    • GitHub Check: build
    • GitHub Check: Codacy Static Code Analysis
    🔇 Additional comments (3)
    ibaqpy/ibaq/ibaqpy_postprocessing.py (1)

    18-20: LGTM!

    The formatting changes improve readability while maintaining functionality.

    Also applies to: 57-60, 146-153, 156-158, 221-223

    ibaqpy/commands/correct_batches.py (2)

    2-2: LGTM!

    The new imports appropriately support the AnnData export functionality and the use of constants for default values.

    Also applies to: 8-9


    80-106: LGTM!

    The function signature updates are well-documented and use appropriate default values from constants.

    Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

    @coderabbitai coderabbitai bot left a comment


    Actionable comments posted: 4

    🧹 Nitpick comments (2)
    ibaqpy/ibaq/file_utils.py (2)

    78-79: Add validation for metadata values.

    The metadata mapping could result in NaN values if the mapping is incomplete. Add validation to ensure all metadata values are mapped correctly.

    Apply this diff to add validation:

                 mapping = df[[key, col]].drop_duplicates().set_index(key)[col]
    +            # Check for duplicate keys with different values
    +            if len(mapping) != len(df[[key, col]].drop_duplicates()):
    +                warnings.warn(
    +                    f"Found duplicate keys with different values in metadata column '{col}'.",
    +                    stacklevel=2
    +                )
                 metadata_df[col] = metadata_df.index.map(mapping)
    +            # Check for unmapped values
    +            if metadata_df[col].isna().any():
    +                warnings.warn(
    +                    f"Some metadata values in column '{col}' could not be mapped.",
    +                    stacklevel=2
    +                )

    149-151: Add validation for concatenation result.

    The concatenation operation should be validated to ensure the result is not empty and has the expected shape.

    Apply this diff to add validation:

         # Concatenate all DataFrames
         combined_df = pd.concat(dataframes, ignore_index=True)
     
    +    # Validate concatenation result
    +    if combined_df.empty:
    +        raise ValueError("Concatenation resulted in an empty DataFrame")
    +    
    +    # Validate that the number of rows matches the sum of input DataFrames
    +    expected_rows = sum(len(df) for df in dataframes)
    +    if len(combined_df) != expected_rows:
    +        warnings.warn(
    +            f"Expected {expected_rows} rows after concatenation, but got {len(combined_df)}",
    +            stacklevel=2
    +        )
    +
         return combined_df
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between a9e0731 and 7308831.

    📒 Files selected for processing (1)
    • ibaqpy/ibaq/file_utils.py (1 hunks)
    🧰 Additional context used
    🪛 Ruff (0.8.2)
    ibaqpy/ibaq/file_utils.py

    73-73: No explicit stacklevel keyword argument found

    (B028)


    94-94: No explicit stacklevel keyword argument found

    (B028)

    ⏰ Context from checks skipped due to timeout of 90000ms (2)
    • GitHub Check: Codacy Static Code Analysis
    • GitHub Check: build

    Comment on lines +73 to +75
    warnings.warn(
        f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
    )

    🛠️ Refactor suggestion

    Add stacklevel to warnings.

    The warnings should include a stacklevel parameter to ensure they point to the correct line in the user's code.

    Apply this diff to add stacklevel:

                     warnings.warn(
    -                    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
    +                    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.",
    +                    stacklevel=2
                     )
    
                     warnings.warn(
    -                    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'."
    +                    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'.",
    +                    stacklevel=2
                     )

    Also applies to: 94-96
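A small demonstration of the effect: with `stacklevel=2`, Python attributes the warning to the caller of the library helper rather than to the `warnings.warn` line inside it:

```python
import warnings


def add_metadata_column(col: str) -> None:
    # stacklevel=2 reports the warning at the caller's line, which is
    # where the user can actually fix the missing column.
    warnings.warn(
        f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.",
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    add_metadata_column("tissue")  # warning is attributed to this call site

print(str(caught[0].message))
```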

    🧰 Tools
    🪛 Ruff (0.8.2)

    73-73: No explicit stacklevel keyword argument found

    (B028)

    ibaqpy/ibaq/file_utils.py (outdated comment, resolved)
    Comment on lines +46 to +48
    df_matrix = pivot_wider(
        df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
    )

    🛠️ Refactor suggestion

    Add validation for pivot result.

    The pivot_wider operation could fail or produce unexpected results. Add validation to ensure the pivoted DataFrame is not empty and has the expected shape.

    Apply this diff to add validation:

         df_matrix = pivot_wider(
             df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
         )
    +    if df_matrix.empty:
    +        raise ValueError("Pivot operation resulted in an empty DataFrame")
    +    if df_matrix.shape[0] == 0 or df_matrix.shape[1] == 0:
    +        raise ValueError("Pivot operation resulted in a DataFrame with zero dimensions")
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    Suggested change

    df_matrix = pivot_wider(
        df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
    )
    if df_matrix.empty:
        raise ValueError("Pivot operation resulted in an empty DataFrame")
    if df_matrix.shape[0] == 0 or df_matrix.shape[1] == 0:
        raise ValueError("Pivot operation resulted in a DataFrame with zero dimensions")

    ibaqpy/ibaq/file_utils.py (outdated comment, resolved)
    enriquea and others added 5 commits February 6, 2025 23:27
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    @enriquea enriquea self-assigned this Feb 6, 2025