-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add code to save model predictions to BigQuery #3807
Conversation
📝 WalkthroughWalkthroughThe changes in this pull request introduce new features and enhancements across several files. A new environment variable, Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## staging #3807 +/- ##
========================================
Coverage 11.71% 11.71%
========================================
Files 113 113
Lines 15154 15154
Branches 274 274
========================================
Hits 1776 1776
Misses 13378 13378 |
Spatial changes in this PR available for preview here |
Spatial changes in this PR available for preview here |
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
src/spatial/configure.py (1)
24-26
: LGTM! Consider adding validation and documentation.The implementation follows the existing pattern consistently. However, consider these enhancements:
- Add docstring documentation about the expected format and purpose of this configuration
- Add validation to ensure the value is properly set before BigQuery operations
Here's a suggested enhancement:
BIGQUERY_SATELLITE_MODEL_PREDICTIONS = os.getenv( "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" ) + if BIGQUERY_SATELLITE_MODEL_PREDICTIONS is None: + raise ValueError( + "BIGQUERY_SATELLITE_MODEL_PREDICTIONS environment variable must be set" + )src/spatial/views/derived_pm2_5.py (1)
Line range hint
1-62
: Consider architectural improvements for better maintainability.The code exhibits several patterns that could benefit from architectural improvements:
- Parameter validation and error handling are duplicated across classes
- Logging configuration could be enhanced
- Example usage comments could be moved to documentation
Consider these architectural improvements:
- Create a base view class with common validation and error handling:
class BaseAPIView: @staticmethod def validate_request(required_params: list) -> tuple[dict, tuple]: """Centralized request validation.""" if not request.is_json: return None, ( jsonify({"error": "Request content type must be application/json"}), 400 ) data = request.get_json() for param in required_params: if param not in data: return None, ( jsonify({"error": f"Missing parameter: {param}"}), 400 ) return data, None
- Enhance logging configuration:
logging.basicConfig( level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', handlers=[ logging.StreamHandler(), logging.FileHandler('api.log') ] )
- Move example usage to docstrings or separate documentation files.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/spatial/configure.py
(1 hunks)src/spatial/views/derived_pm2_5.py
(2 hunks)src/spatial/views/satellite_predictions.py
(2 hunks)
🔇 Additional comments (6)
src/spatial/configure.py (1)
24-26
: Verify the configuration usage across the codebase.
Let's ensure this configuration is properly utilized where needed.
✅ Verification successful
Let me gather more specific information about the usage of this configuration variable.
This will help us understand:
- How the configuration is actually used in the code
- Whether there are any BigQuery queries that might need this configuration
- The complete context of the Config class and its usage pattern
Configuration is properly utilized and follows the established pattern
The configuration is correctly used in src/spatial/views/satellite_predictions.py
as a destination table for BigQuery operations, following the same pattern as other BigQuery-related configurations in the codebase. The implementation:
- Uses the configuration through the
Config
class, consistent with other environment variables - Is properly integrated into the configuration hierarchy (Production/Development/Testing)
- Has a clear purpose as a BigQuery table destination for satellite model predictions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of the new configuration variable
echo "Searching for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS..."
rg "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" --type python
echo -e "\nSearching for Config class usage in files that might need this configuration..."
rg "from.*configure import.*Config" --type python -l | xargs rg "Config\." -l
Length of output: 2134
Script:
#!/bin/bash
# Search for direct usage of BIGQUERY_SATELLITE_MODEL_PREDICTIONS in satellite_predictions.py
echo "Checking satellite_predictions.py for configuration usage..."
rg -A 3 "BIGQUERY_SATELLITE_MODEL_PREDICTIONS" src/spatial/views/satellite_predictions.py
# Search for any SQL queries or BigQuery table references in satellite_predictions.py
echo -e "\nChecking for BigQuery table references..."
rg "SELECT|FROM.*bigquery" src/spatial/views/satellite_predictions.py
# Check the Config class definition to understand how this env var is used
echo -e "\nChecking Config class definition..."
cat src/spatial/configure.py
Length of output: 5092
src/spatial/views/derived_pm2_5.py (1)
13-13
: LGTM!
The added spacing improves readability between imports and class definitions.
src/spatial/views/satellite_predictions.py (4)
4-4
: Importing pandas is appropriate
The addition of import pandas as pd
is necessary for creating DataFrames used in saving predictions to BigQuery.
6-6
: Service account import added correctly
Importing service_account
from google.oauth2
is appropriate for handling authentication when connecting to BigQuery.
44-49
: Well-structured result dictionary
The result
dictionary is correctly constructed to include all relevant prediction data and metadata, which will be returned to the user and saved to BigQuery.
50-62
: Efficiently saving predictions to BigQuery
The code effectively saves the prediction results to BigQuery using df.to_gbq()
. Credentials are correctly loaded from the service account file specified in the configuration.
return ( | ||
jsonify({"error": "An internal error has occurred!"}), | ||
500, | ||
{"Content-Type": "application/json"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider enhancing error handling and cleanup.
While the multi-line formatting improves readability, there are several potential improvements:
- Remove the commented-out code that's no longer needed
- Consider making error messages more informative while maintaining security
- Consider extracting the common error handling pattern into a shared utility function
- Enhance logging with more context (e.g., request parameters)
Here's a suggested implementation:
def handle_internal_error(e: Exception, context: dict = None) -> tuple:
"""Centralized error handling for API endpoints."""
error_id = uuid.uuid4()
logging.error(
"Error ID: %s, Context: %s, Error: %s",
error_id,
context,
str(e)
)
return (
jsonify({
"error": "An internal error has occurred",
"error_id": str(error_id),
"contact_support": "Please contact support with this error ID"
}),
500,
{"Content-Type": "application/json"}
)
# Usage in exception handler:
except Exception as e:
return handle_internal_error(
e,
context={
"longitude": longitude,
"latitude": latitude,
"start_date": start_date,
"end_date": end_date
}
)
Consider implementing a global exception handler for Flask to standardize error handling across all endpoints:
@app.errorhandler(Exception)
def handle_exception(e):
return handle_internal_error(e)
except Exception as e: | ||
print(f"Error saving predictions to BigQuery: {e}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use logging instead of print statements for error handling
Using print
statements for error messages is not recommended in production environments. It's better to use the logging
module to log errors, which provides more flexibility and control over logging levels and outputs.
Apply this diff to implement logging:
+import logging
+logger = logging.getLogger(__name__)
...
except Exception as e:
- print(f"Error saving predictions to BigQuery: {e}")
+ logger.error(f"Error saving predictions to BigQuery: {e}")
Committable suggestion skipped: line range outside the PR's diff.
Avoid exposing sensitive information in exception messages
Printing exception details can inadvertently expose sensitive information. Ensure that exception messages do not leak sensitive data. Consider logging exceptions without including sensitive details.
Modify the error handling to avoid exposing sensitive information:
except Exception as e:
- print(f"Error saving predictions to BigQuery: {e}")
+ logger.error("Error saving predictions to BigQuery", exc_info=False)
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Mnoble-19 , please make the PR description more descriptive, start by editing the PR template information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Mnoble-19
Description
[Adds code to save model predictions to BigQuery]
Related Issues
Changes Made
Testing
Affected Services
Endpoints Ready for Testing
API Documentation Updated?
Additional Notes
[Add any additional notes or comments here]
Summary by CodeRabbit
New Features
make_predictions
method to save prediction results to Google BigQuery, improving data persistence.Bug Fixes