add code to save model predictions to BigQuery #3807

Merged
Baalmart merged 2 commits into staging from save-satellite-predictions
Nov 7, 2024

Conversation

Mnoble-19
Contributor

@Mnoble-19 Mnoble-19 commented Nov 5, 2024

Description

[Adds code to save model predictions to BigQuery]

Related Issues

Changes Made

  • Add code to enable satellite model predictions to be saved to BigQuery
  • Brief description of change 2
  • Brief description of change 3

Testing

  • Tested locally
  • Tested against staging environment
  • Relevant tests passed: [List test names]

Affected Services

  • Which services were modified:
    • Service 1
    • Service 2
    • Other...

Endpoints Ready for Testing

  • New endpoints ready for testing:
    • Endpoint 1
    • Endpoint 2
    • Other...

API Documentation Updated?

  • Yes, API documentation was updated
  • No, API documentation does not need updating

Additional Notes

[Add any additional notes or comments here]

Summary by CodeRabbit

  • New Features

    • Introduced a new environment variable for enhanced configuration options related to Google BigQuery satellite model predictions.
    • Enhanced the make_predictions method to save prediction results to Google BigQuery, improving data persistence.
  • Bug Fixes

    • Improved error handling in various methods, ensuring better readability and consistent JSON responses for internal errors.

@Mnoble-19 Mnoble-19 self-assigned this Nov 5, 2024
Contributor

coderabbitai bot commented Nov 5, 2024

📝 Walkthrough

Walkthrough

The changes in this pull request introduce new features and enhancements across several files. A new environment variable, BIGQUERY_SATELLITE_MODEL_PREDICTIONS, is added to the Config class to improve configuration options. Additionally, the error handling in various methods within the PM25View class is reformatted for better readability. The make_predictions method in the SatellitePredictionView class is significantly modified to include data persistence capabilities by saving predictions to Google BigQuery, along with improved error handling.

Changes

File Path Change Summary
src/spatial/configure.py Added environment variable BIGQUERY_SATELLITE_MODEL_PREDICTIONS in the Config class.
src/spatial/views/derived_pm2_5.py Reformatted error handling return statements in get_pm25, get_aod_for_dates, and get_pollutants_data methods.
src/spatial/views/satellite_predictions.py Modified make_predictions method to save results to BigQuery, added imports, and enhanced error handling.

Assessment against linked issues

Objective Addressed Explanation
Introduce a GitHub template for PRs (#123) No relevant changes related to GitHub templates were made.
Calculate exceedances (#456) The changes do not explicitly address exceedance calculations.

Possibly related PRs

Suggested labels

ready for review

Suggested reviewers

  • Baalmart

🎉 In code we trust, with variables anew,
BigQuery awaits, for predictions to brew.
Errors now clearer, in methods they shine,
Configuration expanded, all working in line.
So let’s merge this magic, and watch data flow,
With each little change, our project will grow! 🌟


codecov bot commented Nov 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 11.71%. Comparing base (4766463) to head (271a2a9).
Report is 42 commits behind head on staging.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           staging    #3807   +/-   ##
========================================
  Coverage    11.71%   11.71%           
========================================
  Files          113      113           
  Lines        15154    15154           
  Branches       274      274           
========================================
  Hits          1776     1776           
  Misses       13378    13378           

Contributor

github-actions bot commented Nov 5, 2024

Spatial changes in this PR available for preview here

@Mnoble-19
Contributor Author

@coderabbitai review

Contributor

coderabbitai bot commented Nov 5, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
src/spatial/configure.py (1)

24-26: LGTM! Consider adding validation and documentation.

The implementation follows the existing pattern consistently. However, consider these enhancements:

  1. Add docstring documentation about the expected format and purpose of this configuration
  2. Add validation to ensure the value is properly set before BigQuery operations

Here's a suggested enhancement:

     BIGQUERY_SATELLITE_MODEL_PREDICTIONS = os.getenv(
         "BIGQUERY_SATELLITE_MODEL_PREDICTIONS"
     )
+    if BIGQUERY_SATELLITE_MODEL_PREDICTIONS is None:
+        raise ValueError(
+            "BIGQUERY_SATELLITE_MODEL_PREDICTIONS environment variable must be set"
+        )
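
One caveat with the diff above: a check placed in the class body runs when the module is imported, so an unset variable would stop the whole service from starting. A minimal sketch of a deferred check follows, assuming a hypothetical helper named `require_env` (not part of the PR) that is called just before the BigQuery write:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration; the PR itself only calls os.getenv.
    """
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} environment variable must be set")
    return value
```

Calling `require_env("BIGQUERY_SATELLITE_MODEL_PREDICTIONS")` at the point of use keeps the failure local to the prediction endpoint instead of failing at import time.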
src/spatial/views/derived_pm2_5.py (1)

Line range hint 1-62: Consider architectural improvements for better maintainability.

The code exhibits several patterns that could benefit from architectural improvements:

  1. Parameter validation and error handling are duplicated across classes
  2. Logging configuration could be enhanced
  3. Example usage comments could be moved to documentation

Consider these architectural improvements:

  1. Create a base view class with common validation and error handling:
from typing import Optional

from flask import jsonify, request


class BaseAPIView:
    @staticmethod
    def validate_request(
        required_params: list,
    ) -> tuple[Optional[dict], Optional[tuple]]:
        """Centralized request validation.

        Returns (data, None) on success, or (None, (response, status)) on failure.
        """
        if not request.is_json:
            return None, (
                jsonify({"error": "Request content type must be application/json"}),
                400,
            )

        data = request.get_json()
        for param in required_params:
            if param not in data:
                return None, (
                    jsonify({"error": f"Missing parameter: {param}"}),
                    400,
                )
        return data, None
  2. Enhance logging configuration:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('api.log')
    ]
)
  3. Move example usage to docstrings or separate documentation files.
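
The shared validation idea in the first suggestion can also be split so that the parameter check is testable without a Flask request context. The following is a rough sketch under that assumption; `find_missing_params` is a hypothetical name, not something defined in the repository:

```python
from typing import Any, Mapping, Sequence


def find_missing_params(data: Mapping[str, Any], required: Sequence[str]) -> list[str]:
    """Return the required parameter names that are absent from `data`, in order."""
    return [name for name in required if name not in data]
```

A view could call this on the parsed JSON body and build a single 400 response listing every missing parameter, rather than stopping at the first one.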
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4766463 and 271a2a9.

📒 Files selected for processing (3)
  • src/spatial/configure.py (1 hunks)
  • src/spatial/views/derived_pm2_5.py (2 hunks)
  • src/spatial/views/satellite_predictions.py (2 hunks)
🔇 Additional comments (6)
src/spatial/configure.py (1)

24-26: Verify the configuration usage across the codebase.

Let's ensure this configuration is properly utilized where needed.

✅ Verification successful

Let me gather more specific information about the usage of this configuration variable.

This will help us understand:

  1. How the configuration is actually used in the code
  2. Whether there are any BigQuery queries that might need this configuration
  3. The complete context of the Config class and its usage pattern

Configuration is properly utilized and follows the established pattern

The configuration is correctly used in src/spatial/views/satellite_predictions.py as a destination table for BigQuery operations, following the same pattern as other BigQuery-related configurations in the codebase. The implementation:

  • Uses the configuration through the Config class, consistent with other environment variables
  • Is properly integrated into the configuration hierarchy (Production/Development/Testing)
  • Has a clear purpose as a BigQuery table destination for satellite model predictions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of the new configuration variable
echo "Searching for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS..."
rg "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" --type python

echo -e "\nSearching for Config class usage in files that might need this configuration..."
rg "from.*configure import.*Config" --type python -l | xargs rg "Config\." -l

Length of output: 2134


Script:

#!/bin/bash
# Search for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS in satellite_predictions.py
echo "Checking satellite_predictions.py for configuration usage..."
rg -A 3 "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" src/spatial/views/satellite_predictions.py

# Search for any SQL queries or BigQuery table references in satellite_predictions.py
echo -e "\nChecking for BigQuery table references..."
rg "SELECT|FROM.*bigquery" src/spatial/views/satellite_predictions.py

# Check the Config class definition to understand how this env var is used
echo -e "\nChecking Config class definition..."
cat src/spatial/configure.py

Length of output: 5092

src/spatial/views/derived_pm2_5.py (1)

13-13: LGTM!

The added spacing improves readability between imports and class definitions.

src/spatial/views/satellite_predictions.py (4)

4-4: Importing pandas is appropriate

The addition of import pandas as pd is necessary for creating DataFrames used in saving predictions to BigQuery.


6-6: Service account import added correctly

Importing service_account from google.oauth2 is appropriate for handling authentication when connecting to BigQuery.


44-49: Well-structured result dictionary

The result dictionary is correctly constructed to include all relevant prediction data and metadata, which will be returned to the user and saved to BigQuery.


50-62: Efficiently saving predictions to BigQuery

The code effectively saves the prediction results to BigQuery using df.to_gbq(). Credentials are correctly loaded from the service account file specified in the configuration.
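
For readers without the diff at hand, the save step described here might look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the function names, the `writer` parameter, and the shape of `result` are hypothetical, and `pandas_gbq.to_gbq` is used in place of the `df.to_gbq()` call the review mentions.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def _gbq_writer(rows: list, destination_table: str, project_id: str, credentials: Any) -> None:
    """Append rows to a BigQuery table; imports are deferred so the sketch loads without pandas."""
    import pandas as pd
    import pandas_gbq

    pandas_gbq.to_gbq(
        pd.DataFrame(rows),
        destination_table=destination_table,
        project_id=project_id,
        if_exists="append",  # add to the table instead of failing when it already exists
        credentials=credentials,
    )


def save_prediction(
    result: dict,
    destination_table: str,
    project_id: str,
    credentials: Optional[Any] = None,
    writer: Callable[[list, str, str, Any], None] = _gbq_writer,
) -> bool:
    """Try to persist one prediction; return False instead of raising on failure."""
    try:
        writer([result], destination_table, project_id, credentials)
    except Exception as exc:
        # Record only the exception type so table names or key paths are not echoed.
        logger.error("Error saving predictions to BigQuery (%s)", type(exc).__name__)
        return False
    return True
```

In the service, `destination_table` would come from `Config.BIGQUERY_SATELLITE_MODEL_PREDICTIONS`, and `credentials` could be built with `service_account.Credentials.from_service_account_file(...)` from `google.oauth2`. Returning a boolean lets the endpoint still respond with the prediction when the write fails.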

Comment on lines +58 to +62
return (
    jsonify({"error": "An internal error has occurred!"}),
    500,
    {"Content-Type": "application/json"},
)
Contributor

🛠️ Refactor suggestion

Consider enhancing error handling and cleanup.

While the multi-line formatting improves readability, there are several potential improvements:

  1. Remove the commented-out code that's no longer needed
  2. Consider making error messages more informative while maintaining security
  3. Consider extracting the common error handling pattern into a shared utility function
  4. Enhance logging with more context (e.g., request parameters)

Here's a suggested implementation:

import logging
import uuid
from typing import Optional

from flask import jsonify


def handle_internal_error(e: Exception, context: Optional[dict] = None) -> tuple:
    """Centralized error handling for API endpoints."""
    # The error ID ties the generic client response to the detailed server log entry.
    error_id = uuid.uuid4()
    logging.error(
        "Error ID: %s, Context: %s, Error: %s",
        error_id,
        context,
        str(e)
    )
    return (
        jsonify({
            "error": "An internal error has occurred",
            "error_id": str(error_id),
            "contact_support": "Please contact support with this error ID"
        }),
        500,
        {"Content-Type": "application/json"}
    )

# Usage in exception handler:
except Exception as e:
    return handle_internal_error(
        e,
        context={
            "longitude": longitude,
            "latitude": latitude,
            "start_date": start_date,
            "end_date": end_date
        }
    )

Consider implementing a global exception handler for Flask to standardize error handling across all endpoints:

@app.errorhandler(Exception)
def handle_exception(e):
    return handle_internal_error(e)

Comment on lines +63 to +65
except Exception as e:
    print(f"Error saving predictions to BigQuery: {e}")

Contributor

🛠️ Refactor suggestion

Use logging instead of print statements for error handling

Using print statements for error messages is not recommended in production environments. It's better to use the logging module to log errors, which provides more flexibility and control over logging levels and outputs.

Apply this diff to implement logging:

+import logging
+logger = logging.getLogger(__name__)

 ...

 except Exception as e:
-    print(f"Error saving predictions to BigQuery: {e}")
+    logger.error(f"Error saving predictions to BigQuery: {e}")

Committable suggestion skipped: line range outside the PR's diff.


⚠️ Potential issue

Avoid exposing sensitive information in exception messages

Printing exception details can inadvertently expose sensitive information. Ensure that exception messages do not leak sensitive data. Consider logging exceptions without including sensitive details.

Modify the error handling to avoid exposing sensitive information:

 except Exception as e:
-    print(f"Error saving predictions to BigQuery: {e}")
+    logger.error("Error saving predictions to BigQuery", exc_info=False)

Committable suggestion skipped: line range outside the PR's diff.
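
The two suggestions above pull in slightly different directions: the first logs the exception text, the second drops it. One possible middle ground, sketched here with a hypothetical helper name and logger name, is to log only the exception class:

```python
import logging

logger = logging.getLogger("satellite_predictions")


def log_save_failure(exc: Exception) -> None:
    """Log a BigQuery save failure without echoing the exception message."""
    # Only the exception class name is recorded; the message, which may contain
    # table names or credential paths, is deliberately left out.
    logger.error("Error saving predictions to BigQuery (%s)", type(exc).__name__)
```

This keeps enough detail to triage failures while avoiding the leakage risk raised in the second comment.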

Contributor

@Baalmart Baalmart left a comment

Thanks @Mnoble-19, please make the PR description more descriptive; start by editing the PR template information.

Contributor

@Baalmart Baalmart left a comment

thanks @Mnoble-19

@Baalmart Baalmart merged commit 03a4385 into staging Nov 7, 2024
50 checks passed
@Baalmart Baalmart deleted the save-satellite-predictions branch November 7, 2024 04:13
@Baalmart Baalmart mentioned this pull request Nov 7, 2024
3 tasks
This was referenced Nov 14, 2024