Added support for anndata #94
base: master
Conversation
Walkthrough

The changes update the batch correction command by enhancing the `run_batch_correction` workflow with optional AnnData export.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User/CLI
    participant RC as run_batch_correction
    participant CA as create_anndata
    participant FS as FileSystem
    U->>RC: Invoke batch correction with export_anndata=True
    RC->>FS: Validate output file exists
    RC->>CA: Create AnnData object from DataFrame
    CA-->>RC: Return AnnData object
    RC->>FS: Save AnnData file (.h5ad)
    FS-->>RC: Confirm file saved
```
Actionable comments posted: 0
🧹 Nitpick comments (7)
tests/test_file_utils.py (2)

18-25: Enhance test coverage for `combine_ibaq_tsv_files`.

The test only verifies the shape of the DataFrame. Consider adding assertions for:

- Content validation (e.g., specific column names, data types)
- Edge cases (e.g., empty files, malformed files)
- Error conditions (e.g., invalid separators, non-existent paths)
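To illustrate the kind of content assertions suggested above, here is a minimal sketch. The `combine_tsvs` helper is a hypothetical stand-in for `combine_ibaq_tsv_files` (its exact signature is not reproduced here), built from the behavior described in this review:

```python
import glob
import tempfile
from pathlib import Path

import pandas as pd

def combine_tsvs(dir_path, pattern="*.tsv", sep="\t", comment="#"):
    # Hypothetical stand-in: read all matching TSVs, skipping comment lines,
    # and concatenate them into a single DataFrame.
    paths = glob.glob(f"{dir_path}/{pattern}")
    if not paths:
        raise FileNotFoundError(f"No files matching '{pattern}' in '{dir_path}'")
    return pd.concat(
        (pd.read_csv(p, sep=sep, comment=comment) for p in paths),
        ignore_index=True,
    )

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.tsv").write_text("# a comment line\nprotein\tibaq\nP1\t1.0\n")
    Path(tmp, "b.tsv").write_text("protein\tibaq\nP2\t2.0\n")
    combined = combine_tsvs(tmp)
    # Content assertions beyond shape, as the review suggests:
    assert list(combined.columns) == ["protein", "ibaq"]
    assert combined["ibaq"].dtype.kind == "f"  # numeric values survived parsing
    assert len(combined) == 2
```

Assertions like these catch silently dropped columns or string-typed numeric data, which a shape-only check would miss.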
28-48: Enhance test coverage for `create_anndata`.

While the test verifies basic functionality, consider adding assertions for:

- Content validation of the layers (e.g., specific values, data types)
- Edge cases (e.g., missing metadata columns, empty DataFrames)
- Error conditions (e.g., invalid column names, duplicate indices)
tests/test_batch_correction.py (1)

37-45: Enhance test coverage for AnnData object validation.

While the test verifies the creation and shape of the AnnData object, consider adding assertions for:

- Content validation of the layers (e.g., specific values, data types)
- Metadata validation (e.g., column names, data types)
- Error conditions (e.g., invalid export paths)
ibaqpy/ibaq/file_utils.py (2)

71-73: Add stacklevel to warnings.

Add `stacklevel=2` to the warnings to help users identify the source of the warning in their code.

```diff
-warnings.warn(
-    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
-)
+warnings.warn(
+    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.",
+    stacklevel=2
+)

-warnings.warn(
-    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'."
-)
+warnings.warn(
+    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'.",
+    stacklevel=2
+)
```

Also applies to: 92-94
🧰 Tools
🪛 Ruff (0.8.2)

71-71: No explicit `stacklevel` keyword argument found (B028)
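A small self-contained illustration of what `stacklevel=2` changes (the `get_column` helper is invented for this example):

```python
import warnings

def get_column(columns, col):
    """Hypothetical helper that warns when a column is missing."""
    if col not in columns:
        # With stacklevel=2, the emitted warning is attributed to the line
        # that called get_column, not to this warnings.warn(...) line.
        warnings.warn(f"Column '{col}' not found; skipping.", stacklevel=2)
        return None
    return col

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    get_column(["protein", "ibaq"], "batch")  # warning points at this line

print(len(caught))             # 1
print(str(caught[0].message))  # Column 'batch' not found; skipping.
```

That caller-side attribution is exactly what makes such warnings actionable for library users, which is why Ruff's B028 flags the missing argument.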
104-149: Enhance error handling in `combine_ibaq_tsv_files`.

Consider adding error handling for:

- Invalid file formats or corrupted files
- Permission issues when reading files
- Memory constraints when handling large files

```diff
 def combine_ibaq_tsv_files(
     dir_path: str, pattern: str = "*", comment: str = "#", sep: str = "\t"
 ) -> pd.DataFrame:
     file_paths = glob.glob(f"{dir_path}/{pattern}")
     if not file_paths:
         raise FileNotFoundError(
             f"No files found in the directory '{dir_path}' matching the pattern '{pattern}'."
         )
     dataframes = []
     for file_path in file_paths:
-        # Read the TSV file, skipping lines that start with the comment character
-        df = pd.read_csv(file_path, sep=sep, comment=comment)
-        dataframes.append(df)
+        try:
+            # Read the TSV file, skipping lines that start with the comment character
+            df = pd.read_csv(file_path, sep=sep, comment=comment)
+            dataframes.append(df)
+        except pd.errors.EmptyDataError:
+            warnings.warn(f"Skipping empty file: {file_path}", stacklevel=2)
+            continue
+        except (pd.errors.ParserError, pd.errors.ParserWarning) as e:
+            raise ValueError(f"Error parsing file {file_path}: {str(e)}")
+        except PermissionError:
+            raise PermissionError(f"Permission denied when reading file: {file_path}")
+        except MemoryError:
+            raise MemoryError(f"Insufficient memory to read file: {file_path}")

     # Concatenate all DataFrames
     combined_df = pd.concat(dataframes, ignore_index=True)
     return combined_df
```

ibaqpy/commands/correct_batches.py (2)
154-171: Improve error handling in AnnData export.

The error handling in the AnnData export block could be improved by properly chaining exceptions.

Apply this diff to improve error handling:

```diff
 try:
     adata.write(adata_filename)
 except Exception as e:
-    raise ValueError(f"Failed to write AnnData object: {e}")
+    raise ValueError(f"Failed to write AnnData object: {e}") from e
```

🧰 Tools
🪛 Ruff (0.8.2)

170-170: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
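The `raise ... from e` form preserves the original error as `__cause__`. A minimal sketch, with a stubbed write failure standing in for the real `adata.write` call:

```python
def save_result(path: str) -> None:
    """Stub that mimics a write step failing and re-raising with context."""
    try:
        raise OSError(f"cannot write '{path}': disk full")  # simulated failure
    except Exception as e:
        # Chaining with `from e` attaches the OSError as __cause__, so the
        # traceback shows both errors explicitly instead of the implicit
        # "during handling of the above exception" message.
        raise ValueError(f"Failed to write AnnData object: {e}") from e

try:
    save_result("corrected.h5ad")
except ValueError as err:
    assert isinstance(err.__cause__, OSError)
    print(type(err.__cause__).__name__)  # OSError
```

Callers can then inspect `err.__cause__` to distinguish I/O failures from genuine validation errors.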
217-222: Fix formatting inconsistency in CLI option.

The formatting of the `ibaq_raw_column` option is inconsistent with other options. The indentation and line breaks should match the style used in other options.

Apply this diff to fix the formatting:

```diff
-@click.option("-ibaq",
-    "--ibaq_raw_column",
-    help="Name of the raw iBAQ column",
-    required=False,
-    default=IBAQ
-    )
+@click.option(
+    "-ibaq",
+    "--ibaq_raw_column",
+    help="Name of the raw iBAQ column",
+    required=False,
+    default=IBAQ,
+)
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (7)

- ibaqpy/commands/correct_batches.py (9 hunks)
- ibaqpy/ibaq/file_utils.py (1 hunks)
- ibaqpy/ibaq/ibaqpy_postprocessing.py (7 hunks)
- requirements.txt (1 hunks)
- tests/test_batch_correction.py (3 hunks)
- tests/test_file_utils.py (1 hunks)
- tests/test_ibaqpy_postprocessing.py (0 hunks)
💤 Files with no reviewable changes (1)

- tests/test_ibaqpy_postprocessing.py

✅ Files skipped from review due to trivial changes (1)

- requirements.txt

🧰 Additional context used
🪛 Ruff (0.8.2)

ibaqpy/ibaq/file_utils.py

71-71: No explicit `stacklevel` keyword argument found (B028)
92-92: No explicit `stacklevel` keyword argument found (B028)
ibaqpy/commands/correct_batches.py

170-170: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (3)

ibaqpy/ibaq/ibaqpy_postprocessing.py (1)

18-20: LGTM! The formatting changes improve readability while maintaining functionality.

Also applies to: 57-60, 146-153, 156-158, 221-223

ibaqpy/commands/correct_batches.py (2)

2-2: LGTM! The new imports appropriately support the AnnData export functionality and the use of constants for default values.

Also applies to: 8-9

80-106: LGTM! The function signature updates are well-documented and use appropriate default values from constants.
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
Actionable comments posted: 4
🧹 Nitpick comments (2)
ibaqpy/ibaq/file_utils.py (2)

78-79: Add validation for metadata values.

The metadata mapping could result in `NaN` values if the mapping is incomplete. Add validation to ensure all metadata values are mapped correctly.

Apply this diff to add validation:

```diff
 mapping = df[[key, col]].drop_duplicates().set_index(key)[col]
+# Check for duplicate keys with different values
+if len(mapping) != len(df[[key, col]].drop_duplicates()):
+    warnings.warn(
+        f"Found duplicate keys with different values in metadata column '{col}'.",
+        stacklevel=2
+    )
 metadata_df[col] = metadata_df.index.map(mapping)
+# Check for unmapped values
+if metadata_df[col].isna().any():
+    warnings.warn(
+        f"Some metadata values in column '{col}' could not be mapped.",
+        stacklevel=2
+    )
```
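The incomplete-mapping case this suggestion guards against is easy to reproduce with plain pandas (the column names below are illustrative, not ibaqpy's):

```python
import pandas as pd

# Long-format data: only samples s1 and s2 carry a condition label.
df = pd.DataFrame({"sample": ["s1", "s2"], "condition": ["ctrl", "treated"]})

# Metadata frame indexed by all observed samples, including s3.
metadata_df = pd.DataFrame(index=["s1", "s2", "s3"])

# Same mapping pattern as the code under review:
mapping = df[["sample", "condition"]].drop_duplicates().set_index("sample")["condition"]
metadata_df["condition"] = metadata_df.index.map(mapping)

# s3 has no entry in the mapping, so it silently becomes NaN.
print(metadata_df["condition"].isna().sum())  # 1
```

`Index.map` never raises on missing keys, so without an explicit `isna()` check the gap propagates silently into the AnnData metadata.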
149-151: Add validation for concatenation result.

The concatenation operation should be validated to ensure the result is not empty and has the expected shape.

Apply this diff to add validation:

```diff
 # Concatenate all DataFrames
 combined_df = pd.concat(dataframes, ignore_index=True)
+# Validate concatenation result
+if combined_df.empty:
+    raise ValueError("Concatenation resulted in an empty DataFrame")
+
+# Validate that the number of rows matches the sum of input DataFrames
+expected_rows = sum(len(df) for df in dataframes)
+if len(combined_df) != expected_rows:
+    warnings.warn(
+        f"Expected {expected_rows} rows after concatenation, but got {len(combined_df)}",
+        stacklevel=2
+    )
+
 return combined_df
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- ibaqpy/ibaq/file_utils.py (1 hunks)

🧰 Additional context used
🪛 Ruff (0.8.2)

ibaqpy/ibaq/file_utils.py

73-73: No explicit `stacklevel` keyword argument found (B028)
94-94: No explicit `stacklevel` keyword argument found (B028)

⏰ Context from checks skipped due to timeout of 90000ms (2)

- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
```python
warnings.warn(
    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
)
```
🛠️ Refactor suggestion

Add stacklevel to warnings.

The warnings should include a `stacklevel` parameter to ensure they point to the correct line in the user's code.

Apply this diff to add `stacklevel`:

```diff
 warnings.warn(
-    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'."
+    f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.",
+    stacklevel=2
 )
 warnings.warn(
-    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'."
+    f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'.",
+    stacklevel=2
 )
```

Also applies to: 94-96
🧰 Tools
🪛 Ruff (0.8.2)

73-73: No explicit `stacklevel` keyword argument found (B028)
```python
df_matrix = pivot_wider(
    df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
)
```
🛠️ Refactor suggestion

Add validation for pivot result.

The `pivot_wider` operation could fail or produce unexpected results. Add validation to ensure the pivoted DataFrame is not empty and has the expected shape.

Apply this diff to add validation:

```diff
 df_matrix = pivot_wider(
     df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
 )
+if df_matrix.empty:
+    raise ValueError("Pivot operation resulted in an empty DataFrame")
+if df_matrix.shape[0] == 0 or df_matrix.shape[1] == 0:
+    raise ValueError("Pivot operation resulted in a DataFrame with zero dimensions")
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
df_matrix = pivot_wider(
    df, row_name=obs_col, col_name=var_col, values=value_col, fillna=True
)
if df_matrix.empty:
    raise ValueError("Pivot operation resulted in an empty DataFrame")
if df_matrix.shape[0] == 0 or df_matrix.shape[1] == 0:
    raise ValueError("Pivot operation resulted in a DataFrame with zero dimensions")
```
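Since `pivot_wider` is an ibaqpy helper whose implementation is not shown here, its described behavior can be approximated with `pandas.pivot_table` to show what the suggested guards protect. The `pivot_wider_like` stand-in and the column names are assumptions for illustration only:

```python
import pandas as pd

def pivot_wider_like(df, row_name, col_name, values, fillna=True):
    # Rough stand-in for ibaqpy's pivot_wider, built on pandas.pivot_table
    # (rows = samples, columns = features, cells = values).
    out = df.pivot_table(index=row_name, columns=col_name, values=values)
    return out.fillna(0) if fillna else out

df = pd.DataFrame({
    "SampleID": ["s1", "s1", "s2"],
    "ProteinName": ["P1", "P2", "P1"],
    "Ibaq": [1.0, 2.0, 3.0],
})
df_matrix = pivot_wider_like(df, "SampleID", "ProteinName", "Ibaq")

# The suggested guards then become cheap sanity checks on the matrix:
if df_matrix.empty:
    raise ValueError("Pivot operation resulted in an empty DataFrame")

print(df_matrix.shape)  # (2, 2)
```

Note that with `fillna=True` the missing s2/P2 combination becomes 0 rather than NaN, which is exactly the kind of silent behavior the content assertions elsewhere in this review are meant to pin down.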
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This reverts commit 01843c7.
PR Type

Enhancement, Tests

Description

- Added support for exporting corrected iBAQ values to AnnData objects.
- Introduced a new utility function `create_anndata` for AnnData creation.
- Updated batch correction CLI to include AnnData export option.
- Added and updated tests for AnnData export and related utilities.
Changes walkthrough 📝

| File | Summary |
| --- | --- |
| correct_batches.py (ibaqpy/commands/correct_batches.py) | Enable AnnData export in batch correction: added `export_anndata` option to the batch correction function and workflow. |
| file_utils.py (ibaqpy/ibaq/file_utils.py) | Introduce AnnData creation and file combination utilities: added `create_anndata` function for AnnData object creation. |
| ibaqpy_postprocessing.py (ibaqpy/ibaq/ibaqpy_postprocessing.py) | Refactor postprocessing utilities and remove redundancy: moved `combine_ibaq_tsv_files` function. |
| test_batch_correction.py (tests/test_batch_correction.py) | Add tests for AnnData export in batch correction. |
| test_file_utils.py (tests/test_file_utils.py) | Add tests for AnnData creation and file utilities, including the `create_anndata` function. |
| test_ibaqpy_postprocessing.py (tests/test_ibaqpy_postprocessing.py) | Update and clean up postprocessing tests; removed tests for `combine_ibaq_tsv_files`. |
| requirements.txt | Add `anndata` dependency for AnnData support. |