add code to save model predictions to BigQuery #3807

Mnoble-19 · 2024-11-05T09:37:13Z

Description

[Adds code to save model predictions to BigQuery]

Related Issues

Changes Made

Add code to enable satellite model predictions to be saved to BigQuery
Brief description of change 2
Brief description of change 3

Testing

Tested locally
Tested against staging environment
Relevant tests passed: [List test names]

Affected Services

Which services were modified:
- Service 1
- Service 2
- Other...

Endpoints Ready for Testing

New endpoints ready for testing:
- Endpoint 1
- Endpoint 2
- Other...

API Documentation Updated?

Yes, API documentation was updated
No, API documentation does not need updating

Additional Notes

[Add any additional notes or comments here]

Summary by CodeRabbit

New Features
- Introduced a new environment variable for enhanced configuration options related to Google BigQuery satellite model predictions.
- Enhanced the make_predictions method to save prediction results to Google BigQuery, improving data persistence.
Bug Fixes
- Improved error handling in various methods, ensuring better readability and consistent JSON responses for internal errors.

coderabbitai · 2024-11-05T09:37:21Z

📝 Walkthrough

Walkthrough

The changes in this pull request introduce new features and enhancements across several files. A new environment variable, BIGQUERY_SATELLITE_MODEL_PREDICTIONS, is added to the Config class to improve configuration options. Additionally, the error handling in various methods within the PM25View class is reformatted for better readability. The make_predictions method in the SatellitePredictionView class is significantly modified to include data persistence capabilities by saving predictions to Google BigQuery, along with improved error handling.

Changes

| File Path | Change Summary |
| --- | --- |
| `src/spatial/configure.py` | Added environment variable `BIGQUERY_SATELLITE_MODEL_PREDICTIONS` in the `Config` class. |
| `src/spatial/views/derived_pm2_5.py` | Reformatted error handling return statements in `get_pm25`, `get_aod_for_dates`, and `get_pollutants_data` methods. |
| `src/spatial/views/satellite_predictions.py` | Modified `make_predictions` method to save results to BigQuery, added imports, and enhanced error handling. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Introduce a GitHub template for PRs (#123) | ❌ | No relevant changes related to GitHub templates were made. |
| Calculate exceedances (#456) | ❓ | The changes do not explicitly address exceedance calculations. |

Possibly related PRs

setup backend code to make satellite predictions #3613: This PR modifies the Config class in src/spatial/configure.py by adding a new variable, PROJECT_BUCKET, which is similar to the addition of BIGQUERY_SATELLITE_MODEL_PREDICTIONS in the main PR.

Suggested labels

ready for review

Suggested reviewers

Baalmart

🎉 In code we trust, with variables anew,
BigQuery awaits, for predictions to brew.
Errors now clearer, in methods they shine,
Configuration expanded, all working in line.
So let’s merge this magic, and watch data flow,
With each little change, our project will grow! 🌟

codecov · 2024-11-05T09:38:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 11.71%. Comparing base (4766463) to head (271a2a9).
Report is 42 commits behind head on staging.

Additional details and impacted files

@@           Coverage Diff            @@
##           staging    #3807   +/-   ##
========================================
  Coverage    11.71%   11.71%           
========================================
  Files          113      113           
  Lines        15154    15154           
  Branches       274      274           
========================================
  Hits          1776     1776           
  Misses       13378    13378

github-actions · 2024-11-05T09:42:06Z

Spatial changes in this PR available for preview here

github-actions · 2024-11-05T09:53:43Z

Spatial changes in this PR available for preview here

Mnoble-19 · 2024-11-05T10:23:17Z

@coderabbitai review

coderabbitai · 2024-11-05T10:24:55Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)

src/spatial/configure.py (1)
24-26: LGTM! Consider adding validation and documentation.

The implementation follows the existing pattern consistently. However, consider these enhancements:

- Add docstring documentation about the expected format and purpose of this configuration
- Add validation to ensure the value is properly set before BigQuery operations

Here's a suggested enhancement:
     BIGQUERY_SATELLITE_MODEL_PREDICTIONS = os.getenv(
         "BIGQUERY_SATELLITE_MODEL_PREDICTIONS"
     )
+    if BIGQUERY_SATELLITE_MODEL_PREDICTIONS is None:
+        raise ValueError(
+            "BIGQUERY_SATELLITE_MODEL_PREDICTIONS environment variable must be set"
+        )
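
One trade-off of the check suggested above is that raising inside the class body fails at import time, even for code paths that never touch BigQuery. A minimal sketch of a deferred check follows, assuming a hypothetical helper named `require_env` (it is not part of the PR) that is called just before the write:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper for illustration only; raises ValueError when the
    variable is unset or empty so the caller fails at the point of use.
    """
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} environment variable must be set")
    return value
```

With this shape, a view could call `require_env("BIGQUERY_SATELLITE_MODEL_PREDICTIONS")` immediately before saving, so a missing value only affects requests that actually write to BigQuery.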
src/spatial/views/derived_pm2_5.py (1)
Line range hint 1-62: Consider architectural improvements for better maintainability.

The code exhibits several patterns that could benefit from architectural improvements:

- Parameter validation and error handling are duplicated across classes
- Logging configuration could be enhanced
- Example usage comments could be moved to documentation

Consider these architectural improvements:

Create a base view class with common validation and error handling:
from flask import jsonify, request

class BaseAPIView:
    @staticmethod
    def validate_request(required_params: list) -> tuple[dict, tuple]:
        """Centralized request validation.

        Returns (data, None) on success or (None, error_response) on failure.
        """
        if not request.is_json:
            return None, (
                jsonify({"error": "Request content type must be application/json"}),
                400
            )

        data = request.get_json()
        # Report the first required parameter that is missing from the payload.
        for param in required_params:
            if param not in data:
                return None, (
                    jsonify({"error": f"Missing parameter: {param}"}),
                    400
                )
        return data, None
Enhance logging configuration:
import logging

# Send log records to both the console and a local file.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('api.log')
    ]
)
Move example usage to docstrings or separate documentation files.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4766463 and 271a2a9.

📒 Files selected for processing (3)

src/spatial/configure.py (1 hunks)
src/spatial/views/derived_pm2_5.py (2 hunks)
src/spatial/views/satellite_predictions.py (2 hunks)

🔇 Additional comments (6)

src/spatial/configure.py (1)

24-26: Verify the configuration usage across the codebase.

Let's ensure this configuration is properly utilized where needed.

✅ Verification successful

Let me gather more specific information about the usage of this configuration variable.

This will help us understand:

- How the configuration is actually used in the code
- Whether there are any BigQuery queries that might need this configuration
- The complete context of the Config class and its usage pattern

Configuration is properly utilized and follows the established pattern

The configuration is correctly used in src/spatial/views/satellite_predictions.py as a destination table for BigQuery operations, following the same pattern as other BigQuery-related configurations in the codebase. The implementation:

- Uses the configuration through the Config class, consistent with other environment variables
- Is properly integrated into the configuration hierarchy (Production/Development/Testing)
- Has a clear purpose as a BigQuery table destination for satellite model predictions
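
To illustrate the hierarchy described above, here is a rough sketch of how a base `Config` attribute is inherited by environment-specific subclasses. The subclass names, the `CONFIGS` mapping, and `select_config` are assumptions made for illustration; the actual `src/spatial/configure.py` may be organized differently.

```python
import os


class Config:
    """Base settings shared by every environment."""

    DEBUG = False
    TESTING = False
    # Destination table for satellite model predictions, read once at import.
    BIGQUERY_SATELLITE_MODEL_PREDICTIONS = os.getenv(
        "BIGQUERY_SATELLITE_MODEL_PREDICTIONS"
    )


class ProductionConfig(Config):
    """Inherits every base setting unchanged."""


class DevelopmentConfig(Config):
    DEBUG = True


class TestingConfig(Config):
    TESTING = True


# Hypothetical lookup table; the key names are assumptions.
CONFIGS = {
    "production": ProductionConfig,
    "development": DevelopmentConfig,
    "testing": TestingConfig,
}


def select_config(env_name: str) -> type:
    """Return the config class for `env_name`, defaulting to production."""
    return CONFIGS.get(env_name, ProductionConfig)
```

Because the variable is defined on the base class, every environment sees the same table name unless a subclass overrides it.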

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of the new configuration variable
echo "Searching for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS..."
rg "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" --type python

echo -e "\nSearching for Config class usage in files that might need this configuration..."
rg "from.*configure import.*Config" --type python -l | xargs rg "Config\." -l

Length of output: 2134

Script:

#!/bin/bash
# Search for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS in satellite_predictions.py
echo "Checking satellite_predictions.py for configuration usage..."
rg -A 3 "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" src/spatial/views/satellite_predictions.py

# Search for any SQL queries or BigQuery table references in satellite_predictions.py
echo -e "\nChecking for BigQuery table references..."
rg "SELECT|FROM.*bigquery" src/spatial/views/satellite_predictions.py

# Check the Config class definition to understand how this env var is used
echo -e "\nChecking Config class definition..."
cat src/spatial/configure.py

Length of output: 5092

src/spatial/views/derived_pm2_5.py (1)

13-13: LGTM!

The added spacing improves readability between imports and class definitions.

src/spatial/views/satellite_predictions.py (4)

4-4: Importing pandas is appropriate

The addition of import pandas as pd is necessary for creating DataFrames used in saving predictions to BigQuery.

6-6: Service account import added correctly

Importing service_account from google.oauth2 is appropriate for handling authentication when connecting to BigQuery.

44-49: Well-structured result dictionary

The result dictionary is correctly constructed to include all relevant prediction data and metadata, which will be returned to the user and saved to BigQuery.

50-62: Efficiently saving predictions to BigQuery

The code effectively saves the prediction results to BigQuery using df.to_gbq(). Credentials are correctly loaded from the service account file specified in the configuration.
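
The save step described above can be sketched roughly as follows. This is not the PR's code: it calls `pandas_gbq.to_gbq` rather than `df.to_gbq()` (which recent pandas releases deprecate in favor of pandas-gbq), and the names `save_prediction` and `write_with_pandas_gbq` as well as the injectable `writer` parameter are assumptions added so the logic can be exercised without a BigQuery connection.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

Row = Dict[str, Any]


def write_with_pandas_gbq(
    rows: List[Row],
    destination_table: str,
    project_id: Optional[str],
    credentials: Any,
) -> None:
    """Append rows to a BigQuery table through pandas-gbq."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    import pandas_gbq

    pandas_gbq.to_gbq(
        pd.DataFrame(rows),
        destination_table,
        project_id=project_id,
        if_exists="append",  # keep earlier predictions, add the new row
        credentials=credentials,
    )


def save_prediction(
    result: Row,
    destination_table: str,
    project_id: Optional[str] = None,
    credentials: Any = None,
    writer: Callable[[List[Row], str, Optional[str], Any], None] = write_with_pandas_gbq,
) -> bool:
    """Persist one prediction; return False instead of raising on failure."""
    try:
        writer([result], destination_table, project_id, credentials)
    except Exception as exc:
        # Log only the exception type so the API response is not blocked
        # and no credential details end up in the log line.
        logger.error(
            "Failed to save prediction to %s (%s)",
            destination_table,
            type(exc).__name__,
        )
        return False
    return True
```

In a view, `credentials` would typically come from `service_account.Credentials.from_service_account_file(...)` and `destination_table` from `Config.BIGQUERY_SATELLITE_MODEL_PREDICTIONS`. Returning a boolean keeps the prediction response independent of whether the write succeeded.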

coderabbitai · 2024-11-05T10:33:59Z

src/spatial/views/derived_pm2_5.py

+            return (
+                jsonify({"error": "An internal error has occurred!"}),
+                500,
+                {"Content-Type": "application/json"},
+            )


🛠️ Refactor suggestion

Consider enhancing error handling and cleanup.

While the multi-line formatting improves readability, there are several potential improvements:

- Remove the commented-out code that's no longer needed
- Consider making error messages more informative while maintaining security
- Consider extracting the common error handling pattern into a shared utility function
- Enhance logging with more context (e.g., request parameters)

Here's a suggested implementation:

def handle_internal_error(e: Exception, context: dict = None) -> tuple:
    """Centralized error handling for API endpoints."""
    error_id = uuid.uuid4()
    logging.error(
        "Error ID: %s, Context: %s, Error: %s",
        error_id, context, str(e)
    )
    return (
        jsonify({
            "error": "An internal error has occurred",
            "error_id": str(error_id),
            "contact_support": "Please contact support with this error ID"
        }),
        500,
        {"Content-Type": "application/json"}
    )

# Usage in exception handler:
except Exception as e:
    return handle_internal_error(
        e,
        context={
            "longitude": longitude,
            "latitude": latitude,
            "start_date": start_date,
            "end_date": end_date
        }
    )

Consider implementing a global exception handler for Flask to standardize error handling across all endpoints:

@app.errorhandler(Exception)
def handle_exception(e):
    return handle_internal_error(e)

coderabbitai · 2024-11-05T10:33:59Z

src/spatial/views/satellite_predictions.py

+            except Exception as e:
+                print(f"Error saving predictions to BigQuery: {e}")
+


🛠️ Refactor suggestion

Use logging instead of print statements for error handling

Using print statements for error messages is not recommended in production environments. It's better to use the logging module to log errors, which provides more flexibility and control over logging levels and outputs.

Apply this diff to implement logging:

+import logging
+logger = logging.getLogger(__name__)
 ...
             except Exception as e:
-                print(f"Error saving predictions to BigQuery: {e}")
+                logger.error(f"Error saving predictions to BigQuery: {e}")

Committable suggestion skipped: line range outside the PR's diff.

⚠️ Potential issue

Avoid exposing sensitive information in exception messages

Printing exception details can inadvertently expose sensitive information. Ensure that exception messages do not leak sensitive data. Consider logging exceptions without including sensitive details.

Modify the error handling to avoid exposing sensitive information:

             except Exception as e:
-                print(f"Error saving predictions to BigQuery: {e}")
+                logger.error("Error saving predictions to BigQuery", exc_info=False)

Committable suggestion skipped: line range outside the PR's diff.
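
A middle ground between the two suggestions above is to log the exception type but not its message, so the log still hints at what failed without echoing values that may contain credentials or paths. The sketch below uses only the standard library; the logger name and the helper `log_save_failure` are hypothetical and not taken from the PR.

```python
import logging

# Hypothetical logger name, chosen for illustration.
logger = logging.getLogger("satellite_predictions")


def log_save_failure(exc: Exception) -> None:
    """Record a failed BigQuery write, keeping only the exception type."""
    logger.error(
        "Error saving predictions to BigQuery (%s)", type(exc).__name__
    )


def run_and_log(action) -> bool:
    """Run `action`; on any exception, log it safely and return False."""
    try:
        action()
    except Exception as exc:
        log_save_failure(exc)
        return False
    return True
```

Whether the message itself is safe to log depends on what the underlying client puts in it, so this is a conservative default rather than a requirement.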

Baalmart

Thanks @Mnoble-19, please make the PR description more descriptive. Start by editing the PR template information.

Baalmart

thanks @Mnoble-19

add code to save model predictions

2c8665d

Mnoble-19 self-assigned this Nov 5, 2024

black formatter

271a2a9

Mnoble-19 added the ready for review label Nov 5, 2024

Mnoble-19 requested review from Baalmart and wabinyai November 5, 2024 10:28

coderabbitai bot reviewed Nov 5, 2024

Baalmart reviewed Nov 5, 2024

Baalmart approved these changes Nov 7, 2024

Baalmart merged commit 03a4385 into staging Nov 7, 2024
50 checks passed

Baalmart deleted the save-satellite-predictions branch November 7, 2024 04:13

Baalmart mentioned this pull request Nov 7, 2024

move to production #3814

Merged

3 tasks

This was referenced Nov 14, 2024

using AI for auto reporting #3796

Closed

auto reporting #3846

Merged
