
auto reporting #3846
Merged: 17 commits, Nov 27, 2024
6 changes: 4 additions & 2 deletions src/spatial/configure.py
@@ -24,8 +24,10 @@ class Config:
BIGQUERY_SATELLITE_MODEL_PREDICTIONS = os.getenv(
"BIGQUERY_SATELLITE_MODEL_PREDICTIONS"
)


ANALTICS_URL = os.getenv("ANALTICS_URL")
HUGGING_FACE_TOKEN = os.getenv("HUGGING_FACE_TOKEN")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
class ProductionConfig(Config):
DEBUG = False
TESTING = False
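The Config class above reads every credential straight from the environment; a minimal sketch of exercising that pattern locally (variable names from the diff, values hypothetical, for local testing only):

```python
import os

# Stand-in values for local testing only; real deployments export these.
os.environ["GOOGLE_API_KEY"] = "test-google-key"
os.environ["HUGGING_FACE_TOKEN"] = "test-hf-token"
os.environ.pop("OPENAI_API_KEY", None)  # simulate a variable nobody set

class Config:
    # Mirrors the pattern above: an unset variable simply comes back as None.
    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
    HUGGING_FACE_TOKEN = os.getenv("HUGGING_FACE_TOKEN")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print(Config.GOOGLE_API_KEY, Config.OPENAI_API_KEY)
```

Because the values are captured at class-definition time, anything exported after import is not picked up, which is worth keeping in mind when testing.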
9 changes: 9 additions & 0 deletions src/spatial/controllers/controllers.py
@@ -12,6 +12,7 @@
from views.satellite_predictions import SatellitePredictionView
from views.site_category_view import SiteCategorizationView
from views.site_selection_views import SiteSelectionView
from views.report_view import ReportView


controller_bp = Blueprint("controller", __name__)
@@ -66,3 +67,11 @@ def site_selection():
@controller_bp.route("/satellite_prediction", methods=["POST"])
def get_satellite_prediction():
return SatellitePredictionView.make_predictions()

@controller_bp.route("/air_quality_report", methods=["POST"])
def fetch_air_quality():
return ReportView.generate_air_quality_report_with_gemini()

Comment on lines +71 to +74

🛠️ Refactor suggestion

Consider enhancing the route implementations with proper error handling and documentation.

While the implementations are concise, there are several opportunities for improvement:

  1. Add error handling to gracefully handle potential exceptions from the report generation process
  2. Include request validation to ensure required parameters are present
  3. Add API documentation to describe the expected request format and response structure

Here's a suggested implementation pattern:

 @controller_bp.route("/air_quality_report", methods=["POST"])
 def fetch_air_quality():
-    return ReportView.generate_air_quality_report_with_gemini()
+    """Generate air quality report using Gemini model.
+    
+    Expected payload:
+        {
+            "grid_id": str,
+            "start_time": str,
+            "end_time": str
+        }
+    """
+    try:
+        return ReportView.generate_air_quality_report_with_gemini()
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500

Apply similar patterns to the other two routes as well.

Also applies to: 75-78, 79-81
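One way to apply the same pattern to all three report routes without repeating the try/except is a small decorator. This is a sketch, not part of the PR; `handle_report_errors` is a hypothetical name, and it relies on Flask accepting a `(dict, status)` return value as a JSON response:

```python
from functools import wraps

def handle_report_errors(view):
    """Wrap a report view so unexpected exceptions become a JSON 500 response."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        try:
            return view(*args, **kwargs)
        except Exception as e:  # surfaced to the client instead of a bare 500 page
            return {"error": str(e)}, 500
    return wrapper

@handle_report_errors
def fetch_air_quality():
    # Stand-in for ReportView.generate_air_quality_report_with_gemini()
    raise RuntimeError("report generation failed")

print(fetch_air_quality())  # → ({'error': 'report generation failed'}, 500)
```

Stacking the decorator under each `@controller_bp.route(...)` line would then cover all three endpoints with one implementation.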

@controller_bp.route("/air_quality_report_without_llm", methods=["POST"])
def fetch_air_quality_without_llm():
return ReportView.generate_air_quality_report_without_llm()
235 changes: 235 additions & 0 deletions src/spatial/models/report_datafetcher.py
@@ -0,0 +1,235 @@
import requests
import openai
from transformers import AutoModelForCausalLM, AutoTokenizer

🛠️ Refactor suggestion

Remove unused imports from the transformers library.

The AutoModelForCausalLM and AutoTokenizer imports are not used in the current implementation.

-from transformers import AutoModelForCausalLM, AutoTokenizer
🧰 Tools
🪛 Ruff
3-3: transformers.AutoModelForCausalLM imported but unused; remove unused import (F401)
3-3: transformers.AutoTokenizer imported but unused; remove unused import (F401)

from huggingface_hub import login
from configure import Config
import google.generativeai as genai
import logging


🛠️ Refactor suggestion

Clean up imports and improve error handling.

  1. Remove unused imports from the transformers library.
  2. Use logging consistently instead of print statements for error handling.

Apply this diff:

 import requests
 import openai
-from transformers import AutoModelForCausalLM, AutoTokenizer
 from huggingface_hub import login
 from configure import Config
 import google.generativeai as genai 
 import logging
+
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)


# Configure API keys
GOOGLE_API_KEY = Config.GOOGLE_API_KEY
genai.configure(api_key=GOOGLE_API_KEY)
hf_token = Config.HUGGING_FACE_TOKEN


if hf_token:
login(hf_token)
else:
print("Hugging Face token is missing. Set the 'HUGGING_FACE_TOKEN' environment variable.")

Comment on lines +11 to +21

🛠️ Refactor suggestion

Improve API configuration error handling.

The current implementation uses print statements and doesn't properly handle configuration errors.

Apply this diff:

 # Configure API keys
-GOOGLE_API_KEY = Config.GOOGLE_API_KEY
-genai.configure(api_key=GOOGLE_API_KEY)
-hf_token = Config.HUGGING_FACE_TOKEN
+try:
+    GOOGLE_API_KEY = Config.GOOGLE_API_KEY
+    if not GOOGLE_API_KEY:
+        raise ValueError("GOOGLE_API_KEY is not set")
+    genai.configure(api_key=GOOGLE_API_KEY)
 
+    hf_token = Config.HUGGING_FACE_TOKEN
+    if hf_token:
+        login(hf_token)
+    else:
+        logger.warning("Hugging Face token is missing. Set the 'HUGGING_FACE_TOKEN' environment variable.")
+except Exception as e:
+    logger.error(f"Failed to configure API keys: {e}")
+    raise
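The fail-fast idea in this suggestion can also be factored into a tiny helper so each required variable is checked the same way. A sketch; `require_env` is a hypothetical name, not in the PR:

```python
import os

def require_env(name: str) -> str:
    """Return the named environment variable, raising early if it is missing."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} is not set")
    return value

os.environ["GOOGLE_API_KEY"] = "demo-key"  # stand-in for a real key
print(require_env("GOOGLE_API_KEY"))  # → demo-key
```

Configuration code would then read `genai.configure(api_key=require_env("GOOGLE_API_KEY"))` and fail with a clear message at startup rather than on the first request.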

class DataFetcher:
@staticmethod
def fetch_air_quality_data_a(grid_id, start_time, end_time):
token = Config.AIRQO_API_TOKEN
analytics_url = Config.ANALTICS_URL
if token is None:
print("Error: AIRQO_API_TOKEN environment variable is not set.")
return None

url= f"{analytics_url}?token={token}"
payload = {"grid_id": grid_id, "start_time": start_time, "end_time": end_time}

try:
response = requests.post(url, json=payload)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
logging.error(f"HTTP error occurred: {http_err}")
except requests.exceptions.RequestException as req_err:
print(f"Request error occurred: {req_err}")
logging.error(f"Request error occurred: {req_err}")
except ValueError as json_err:
print(f"JSON decoding error: {json_err}")
logging.error(f"JSON decoding error: {json_err}")

return None


⚠️ Potential issue

Enhance security and robustness of data fetching.

Several improvements needed for security and reliability:

  1. Move token from URL to headers
  2. Add request timeout
  3. Use consistent logging

Apply this diff:

     @staticmethod
     def fetch_air_quality_data_a(grid_id, start_time, end_time):
         token = Config.AIRQO_API_TOKEN  
         analytics_url = Config.ANALTICS_URL
         if token is None:
-            print("Error: AIRQO_API_TOKEN environment variable is not set.")
+            logger.error("AIRQO_API_TOKEN environment variable is not set.")
             return None

-        url= f"{analytics_url}?token={token}"
+        url = analytics_url
         payload = {"grid_id": grid_id, "start_time": start_time, "end_time": end_time}
+        headers = {"Authorization": f"Bearer {token}"}

         try:
-            response = requests.post(url, json=payload)
+            response = requests.post(url, json=payload, headers=headers, timeout=30)
             response.raise_for_status()
             return response.json()
         except requests.exceptions.HTTPError as http_err:
-            print(f"HTTP error occurred: {http_err}")
             logging.error(f"HTTP error occurred: {http_err}")
         except requests.exceptions.RequestException as req_err:
-            print(f"Request error occurred: {req_err}")
             logging.error(f"Request error occurred: {req_err}")
         except ValueError as json_err:
-            print(f"JSON decoding error: {json_err}")
             logging.error(f"JSON decoding error: {json_err}")

         return None

class AirQualityReport:
def __init__(self, data):
self.data = data
self.grid_name = data.get('airquality', {}).get('sites', {}).get('grid name', [None])
self.annual_data = data.get('airquality', {}).get('annual_pm', [None])[0]
self.daily_mean_data = data.get('airquality', {}).get('daily_mean_pm', [])
self.diurnal = data.get('airquality', {}).get('diurnal', [])
self.monthly_data = data.get('airquality', {}).get('site_monthly_mean_pm', [])
self.monthly_name_data = data.get('airquality', {}).get('pm_by_month_name', [])
main_site_info = self.monthly_data[0] if self.monthly_data else {}
self.main_site = main_site_info.get('site_name')
self.site_names = [item.get('site_name', None) for item in self.data.get('airquality', {}).get('site_annual_mean_pm', [])]
self.site_latitude = main_site_info.get('site_latitude')
self.site_longitude = main_site_info.get('site_longitude')
self.num_sites = data.get('airquality', {}).get('sites', {}).get('number_of_sites')
self.starttime = data.get('airquality', {}).get('period', {}).get('startTime', '')[:10]
self.endtime = data.get('airquality', {}).get('period', {}).get('endTime', '')[:10]

self.annual_pm2_5_calibrated_value = self.annual_data.get("pm2_5_calibrated_value")
self.annual_pm10_calibrated_value = self.annual_data.get("pm10_calibrated_value")

# Finding the minimum and maximum values
if self.daily_mean_data:
self.daily_min_pm2_5 = min(self.daily_mean_data, key=lambda x: x['pm2_5_calibrated_value'])
self.daily_max_pm2_5 = max(self.daily_mean_data, key=lambda x: x['pm2_5_calibrated_value'])
else:
self.daily_min_pm2_5 = None
self.daily_max_pm2_5 = None


# Initialize models once in the constructor
self.gemini_model = genai.GenerativeModel('gemini-pro')
openai.api_key = Config.OPENAI_API_KEY
Comment on lines +91 to +93

🛠️ Refactor suggestion

Add error handling for model initialization.

The model initialization could fail silently if API keys are invalid or if there are connection issues.

         # Initialize models once in the constructor
-        self.gemini_model = genai.GenerativeModel('gemini-pro')
-        openai.api_key = Config.OPENAI_API_KEY
+        try:
+            self.gemini_model = genai.GenerativeModel('gemini-pro')
+            openai.api_key = Config.OPENAI_API_KEY
+        except Exception as e:
+            logging.error(f"Failed to initialize AI models: {e}")
+            raise


⚠️ Potential issue

Add proper data validation and error handling.

The class initialization needs several improvements:

  1. Validate data structure before access
  2. Handle model initialization errors
  3. Add type hints for better maintainability

Apply this diff:

 class AirQualityReport:  
+    def __validate_data(self, data):
+        if not isinstance(data, dict):
+            raise ValueError("Input data must be a dictionary")
+        if 'airquality' not in data:
+            raise ValueError("Missing 'airquality' key in data")
+
     def __init__(self, data): 
+        self.__validate_data(data)
         self.data = data
-        self.grid_name = data.get('airquality', {}).get('sites', {}).get('grid name', [None])
+        self.grid_name = data.get('airquality', {}).get('sites', {}).get('grid name', ['Unknown'])[0]
         self.annual_data = data.get('airquality', {}).get('annual_pm', [{}])[0]
         self.daily_mean_data = data.get('airquality', {}).get('daily_mean_pm', [])
         self.diurnal = data.get('airquality', {}).get('diurnal', [])
         self.monthly_data = data.get('airquality', {}).get('site_monthly_mean_pm', [])
         self.monthly_name_data = data.get('airquality', {}).get('pm_by_month_name', [])
         main_site_info = self.monthly_data[0] if self.monthly_data else {}
         
         # Initialize models with error handling
         try:
             self.gemini_model = genai.GenerativeModel('gemini-pro')
             openai.api_key = Config.OPENAI_API_KEY
             if not openai.api_key:
                 raise ValueError("OpenAI API key is not set")
         except Exception as e:
             logger.error(f"Failed to initialize AI models: {e}")
             raise

Committable suggestion skipped: line range outside the PR's diff.


def _prepare_base_info(self):
return (

f"The air quality report is for {self.grid_name} for the period of {self.starttime} to {self.endtime}. "
f"These air quality monitoring sites are {self.site_names} and measure PM2.5 and PM10, "
f"at coordinates {self.site_latitude}°N, {self.site_longitude}°E. "
f"The annual PM2.5 concentration averages {self.annual_data} µg/m³."
)

def _generate_prompt(self, audience):
base_info = self._prepare_base_info()
if audience == "researcher":
return (
- f"Generate a comprehensive air quality assessment report for {self.grid_name} for the period of {self.starttime} to {self.endtime}. Begin with a detailed introduction (100-130 words) covering the city's geographical location, climate characteristics, population density, and major pollution sources. "
- f"{base_info} include the period under review."
- f"Daily mean measurements show: {self.daily_mean_data}. "
- f"Diurnal patterns indicate: {self.diurnal}. Monthly trends reveal: {self.monthly_data}. "
+ f"Generate a comprehensive air quality assessment report for {self.grid_name} for the period of {self.starttime} to {self.endtime}. Begin with a detailed introduction (100-130 words) covering the city's geographical location, climate characteristics, population density, and major pollution sources.\n"
+ f"{base_info}\n"
+ f"Daily mean measurements show values ranging from {self.daily_min_pm2_5['pm2_5_calibrated_value']} to {self.daily_max_pm2_5['pm2_5_calibrated_value']} µg/m³.\n"
+ f"Diurnal patterns indicate peak pollution levels at {self._format_diurnal_peak()}.\n"

⚠️ Potential issue

Define missing method _format_diurnal_peak().

The method _format_diurnal_peak() is called but not defined in the class.

Add the following implementation:

def _format_diurnal_peak(self):
    """Format the diurnal peak information."""
    if not self.diurnal:
        return "N/A"
    try:
        peak_data = max(self.diurnal, key=lambda x: x['pm2_5_calibrated_value'])
        return f"{peak_data['hour']}:00 hr with {peak_data['pm2_5_calibrated_value']} µg/m³"
    except (KeyError, ValueError) as e:
        logger.error(f"Error formatting diurnal peak: {e}")
        return "N/A"
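On sample diurnal records (hypothetical values; shown here as a standalone function rather than a method), the suggested helper behaves like:

```python
import logging

logger = logging.getLogger(__name__)

def format_diurnal_peak(diurnal):
    """Standalone version of the suggested _format_diurnal_peak helper."""
    if not diurnal:
        return "N/A"
    try:
        peak = max(diurnal, key=lambda x: x['pm2_5_calibrated_value'])
        return f"{peak['hour']}:00 hr with {peak['pm2_5_calibrated_value']} µg/m³"
    except (KeyError, ValueError) as e:
        logger.error(f"Error formatting diurnal peak: {e}")
        return "N/A"

sample = [
    {"hour": 3, "pm2_5_calibrated_value": 12.1},
    {"hour": 7, "pm2_5_calibrated_value": 45.2},
]
print(format_diurnal_peak(sample))  # → 7:00 hr with 45.2 µg/m³
print(format_diurnal_peak([]))      # → N/A
```

Note that a record missing the `pm2_5_calibrated_value` key raises `KeyError` inside the `max()` lambda, which the except clause converts to "N/A".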

+ f"Monthly trends reveal fluctuations correlated with seasonal changes.\n"
f"Provide a thorough analysis of spatial and temporal air quality variations, identify pollution hotspots and clean zones, examine seasonal patterns, and assess compliance with WHO guidelines. "
f"Conclude with actionable recommendations for air quality improvement and public health protection. Data source: AirQo monitoring network."
)

elif audience == "policymaker":
return (
f"Create an executive summary of air quality conditions in {self.grid_name} for the period of {self.starttime} to {self.endtime}. for policy decision-making. Begin with key findings and their policy implications (50-75 words). "
f"{base_info} include the period under review."
f"Highlight critical trends: {self.monthly_data}. Diurnal patterns indicate: {self.diurnal}. "
f"Focus on: 1) Areas exceeding air quality standards, 2) Population exposure risk assessment, "
f"3) Economic implications of poor air quality. Present clear, actionable policy recommendations with expected outcomes and implementation timeframes. "
f"Include cost-benefit considerations and potential regulatory measures. Data source: AirQo monitoring network."
)
elif audience == "general public":
return (
f"{base_info} include the period under review."
f"Create a clear, easy-to-understand report about air quality in {self.grid_name} for the period of {self.starttime} to {self.endtime}. Start with a simple explanation of why air quality matters for public health. "
f"We have {self.num_sites} air quality monitors in your area. The average PM2.5 level this year is {self.annual_pm2_5_calibrated_value} µg/m³. "
f"Diurnal patterns indicate: {self.diurnal}. Monthly trends reveal: {self.monthly_data}. "
f"Explain what these numbers mean for daily activities. Include: 1) When air quality is best and worst during the day, "
f"2) Which areas have better or worse air quality, 3) Simple steps people can take to protect their health, "
f"4) How to access daily air quality updates. Use plain language and avoid technical terms. "
f"Add practical tips for reducing exposure to air pollution. Data source: AirQo monitoring network."
)
Comment on lines +104 to +140

🛠️ Refactor suggestion

Format data before including in prompts to improve readability

Including raw data structures like lists and dictionaries directly in the prompts may result in less coherent or unreadable text in the generated reports. It's advisable to format the data into human-readable strings before incorporating them into the prompts.

For example, transform self.daily_mean_data and self.diurnal into summarized strings:

# Format daily mean data
daily_means = ', '.join([f"{item['date']}: {item['pm2_5_calibrated_value']} µg/m³" for item in self.daily_mean_data if 'date' in item and 'pm2_5_calibrated_value' in item])

# Format diurnal patterns
diurnal_patterns = ', '.join([f"{item['hour']}h: {item['pm2_5_calibrated_value']} µg/m³" for item in self.diurnal if 'hour' in item and 'pm2_5_calibrated_value' in item])

Then update the prompts:

             if audience == "researcher":
                 return (   
                     f"Generate a comprehensive air quality assessment report for {self.grid_name} for the period of {self.starttime} to {self.endtime}. Begin with a detailed introduction (100-130 words) covering the city's geographical location, climate characteristics, population density, and major pollution sources.\n"  
                     f"{base_info}\n"  
-                    f"Daily mean measurements show values ranging from {self.daily_min_pm2_5['pm2_5_calibrated_value']} to {self.daily_max_pm2_5['pm2_5_calibrated_value']} µg/m³.\n"  
-                    f"Diurnal patterns indicate peak pollution levels at {self._format_diurnal_peak()}.\n"  
+                    f"Daily mean measurements: {daily_means}.\n"  
+                    f"Diurnal patterns: {diurnal_patterns}.\n"  
                     f"Monthly trends reveal fluctuations correlated with seasonal changes.\n"  
                     "Provide a thorough analysis of spatial and temporal air quality variations, identify pollution hotspots and clean zones, examine seasonal patterns, and assess compliance with WHO guidelines. "  
                     "Conclude with actionable recommendations for air quality improvement and public health protection. Data source: AirQo monitoring network."  
                 ) 
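Run on sample records (hypothetical values), the suggested formatting helpers produce compact, prompt-ready strings:

```python
daily_mean_data = [
    {"date": "2024-01-01", "pm2_5_calibrated_value": 21.4},
    {"date": "2024-01-02", "pm2_5_calibrated_value": 33.0},
]
diurnal = [
    {"hour": 8, "pm2_5_calibrated_value": 40.5},
    {"hour": 14, "pm2_5_calibrated_value": 18.2},
]

# Same comprehensions as the suggestion, with defensive key checks
daily_means = ', '.join(
    f"{item['date']}: {item['pm2_5_calibrated_value']} µg/m³"
    for item in daily_mean_data
    if 'date' in item and 'pm2_5_calibrated_value' in item
)
diurnal_patterns = ', '.join(
    f"{item['hour']}h: {item['pm2_5_calibrated_value']} µg/m³"
    for item in diurnal
    if 'hour' in item and 'pm2_5_calibrated_value' in item
)

print(daily_means)       # → 2024-01-01: 21.4 µg/m³, 2024-01-02: 33.0 µg/m³
print(diurnal_patterns)  # → 8h: 40.5 µg/m³, 14h: 18.2 µg/m³
```

The key checks make the helpers tolerant of partial records instead of raising `KeyError` mid-prompt.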
🧰 Tools
🪛 Ruff (0.8.0)
106-106: f-string without any placeholders; remove extraneous f prefix (F541)
107-107: f-string without any placeholders; remove extraneous f prefix (F541)
108-108: f-string without any placeholders; remove extraneous f prefix (F541)

else:
raise ValueError("Invalid audience type. Please specify 'researcher', 'policymaker', or 'general public'.")

def generate_report_with_gemini(self, audience):
prompt = self._generate_prompt(audience)
response = self.gemini_model.generate_content(prompt)
gemini_output = response.text
return self._prepare_report_json(gemini_output)
wabinyai marked this conversation as resolved.

def generate_report_with_openai(self, audience):
prompt = self._generate_prompt(audience)
try:
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": prompt}]
)
openai_output = response.choices[0].message['content']
return self._prepare_report_json(openai_output)
except Exception as e:
print(f"Error: {e}")
return None


# Use non-LLM template text as report content
def generate_report_template_without_LLM(self, audience):
prompt = self._generate_prompt(audience)
report_content = prompt
return self._prepare_report_json(report_content)

def generate_report_without_llm(self):
# Determine peak time and least PM2.5 values
if self.diurnal:
peak_data = max(self.diurnal, key=lambda x: x['pm2_5_calibrated_value'])
peak_time = peak_data['hour']
peak_pm2_5 = peak_data['pm2_5_calibrated_value']
least_data = min(self.diurnal, key=lambda x: x['pm2_5_calibrated_value'])
least_pm2_5 = least_data['pm2_5_calibrated_value']
least_pm2_5_time = least_data['hour']
else:
peak_time = None
peak_pm2_5 = None
least_pm2_5 = None
least_pm2_5_time = None


introduction = (
f"The air quality report for {self.grid_name} covers the period from {self.starttime} to {self.endtime}. "
f"The {self.num_sites} monitored sites include: {', '.join(self.site_names)}. "
f"Measurements are taken for PM2.5 and PM10 concentrations. "
f"The annual average PM2.5 concentration is {self.annual_pm2_5_calibrated_value} µg/m³."
)

diurnal_description = (
f"Diurnal patterns observed include the following: {self.diurnal}. "
f"These patterns provide insight into air quality fluctuations throughout the day. "
f"The peak in PM2.5 {peak_pm2_5} levels occurs around {peak_time}:00 hr, indicating a period of higher pollution, often associated with increased activity or traffic. "
f"Conversely, the period with the least PM2.5 {least_pm2_5} µg/m³ levels is around {least_pm2_5_time} :00 hr , "
f"which usually represents a period of lower activity or better atmospheric dispersion."
f"Understanding the patterns of pollution and their impacts on public health is crucial for effective environmental management and policy-making. "
f"Throughout this report, we will explore key trends in PM2.5 and PM10 concentrations, the diurnal variations, and the impact of these levels on air quality across the region."

)

daily_mean_description = (
f"Daily mean PM2.5 measurements during the period were recorded as follows: {self.daily_mean_data}. "
f"This data reveals variations in air quality on a day-to-day basis."
)

site_pm25_description = (
f"The concentration of PM2.5 across different sites shows variability: "
f"{', '.join([f'{site} with PM2.5 levels' for site in self.site_names])}. "
f"These variations indicate site-specific air quality differences for the known grids."
)
conclusion = (
f"Overall, the air quality report highlights the importance of monitoring and understanding the patterns of PM2.5 and PM10 concentrations in the {self.grid_name} "
f"The analysis of the data reveals that air quality varies significantly over time, with periods of both moderate and unhealthy conditions. "
f"It’s observed that these fluctuations may be influenced by various factors, including seasonal changes. For instance, the washout effect during the rainy"
f" season could potentially contribute to these variations. Specifically, for the period from {self.starttime} to {self.endtime},"
f" the PM2.5 raw values ranged from {self.daily_min_pm2_5['pm2_5_raw_value']} µg/m³ on {self.daily_min_pm2_5['date']} to {self.daily_max_pm2_5['pm2_5_raw_value']} µg/m³ on {self.daily_max_pm2_5['date']}. respectively."
f"This pattern underscores the importance of continuous monitoring and the implementation of"
f"effective interventions to maintain air quality within safe limits. Ensuring good air quality is crucial for "
f"the well-being of both residents and visitors. Therefore, it’s imperative to adopt long-term"
f"strategies and measures that can effectively mitigate the impact of factors leading to poor airquality."
f"In conclusion, continuous monitoring, timely intervention, and effective policies are key to maintaining good air quality and safeguarding public health. "
)

report_content = (
f"{introduction}\n\n"
f"{diurnal_description}\n\n"
f"{daily_mean_description}\n\n"
f"{site_pm25_description}\n\n"
f"{conclusion}"
)


return self._prepare_report_json(report_content)

def _prepare_report_json(self, report_content):
return {
"grid_name": self.grid_name,
"main_site": self.main_site,
"annual_data": self.annual_data,
"daily_mean_data": self.daily_mean_data,
"diurnal": self.diurnal,
"monthly_data": self.monthly_data,
"report": report_content
}
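For reference, the nested payload shape the AirQualityReport constructor walks with its .get() chains can be sketched as follows (field names taken from the code above; values hypothetical):

```python
sample = {
    "airquality": {
        "sites": {"grid name": ["Kampala"], "number_of_sites": 1},
        "annual_pm": [{"pm2_5_calibrated_value": 30.1, "pm10_calibrated_value": 52.0}],
        "daily_mean_pm": [],
        "diurnal": [],
        "site_monthly_mean_pm": [
            {"site_name": "Makerere", "site_latitude": 0.33, "site_longitude": 32.57}
        ],
        "pm_by_month_name": [],
        "site_annual_mean_pm": [{"site_name": "Makerere"}],
        "period": {"startTime": "2024-01-01T00:00:00Z", "endTime": "2024-01-31T00:00:00Z"},
    }
}

# The same extraction steps the constructor performs:
aq = sample.get("airquality", {})
grid_name = aq.get("sites", {}).get("grid name", [None])  # note: stays a list
annual = aq.get("annual_pm", [None])[0]
starttime = aq.get("period", {}).get("startTime", "")[:10]

print(grid_name, annual["pm2_5_calibrated_value"], starttime)
```

As the review noted, `grid_name` remains a one-element list under this extraction, which is why the suggestions above index `[0]`.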
10 changes: 9 additions & 1 deletion src/spatial/requirements.txt
@@ -22,4 +22,12 @@ scikit-learn~=1.5.2
gcsfs~=2024.9.0.post1
joblib~=1.4.2
lightgbm~=4.1.0
numpy~=1.25.2
numpy~=1.25.2
numpy
torch
transformers
datasets
sentencepiece
huggingface_hub
google-generativeai
openai
86 changes: 86 additions & 0 deletions src/spatial/views/report_view.py
@@ -0,0 +1,86 @@
from flask import request, jsonify
from models.report_datafetcher import DataFetcher, AirQualityReport

class ReportView:
@staticmethod
def _fetch_and_validate_request_data():
"""Extract and validate request data."""
data = request.json
grid_id = data.get("grid_id")
start_time = data.get("start_time")
end_time = data.get("end_time")
audience = data.get("audience", "general public")

# Validate input parameters
if not all([grid_id, start_time, end_time, audience]):
return None, jsonify({
"error": "Missing required parameters: grid_id, start_time, end_time, audience"
}), 400

# Fetch air quality data
air_quality_data = DataFetcher.fetch_air_quality_data_a(grid_id, start_time, end_time)
if air_quality_data is None:
return None, jsonify({
"error": "No data found for the given parameters"
}), 404

return data, air_quality_data, None

🛠️ Refactor suggestion

Enhance method return type clarity

The method returns a tuple with unclear types. Consider:

  1. Adding a proper return type annotation
  2. Using a TypedDict for structured data
  3. Documenting the return value format

     @staticmethod
-    def _fetch_and_validate_request_data():
+    def _fetch_and_validate_request_data() -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]], Optional[Response]]:
         """Extract and validate request data.
+
+        Returns:
+            Tuple containing:
+            - Request data (Dict | None): The validated request parameters
+            - Air quality data (Dict | None): The fetched air quality data
+            - Error response (Response | None): Flask response in case of error
+        """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     @staticmethod
-    def _fetch_and_validate_request_data():
-        """Extract and validate request data."""
+    def _fetch_and_validate_request_data() -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]], Optional[Response]]:
+        """Extract and validate request data.
+
+        Returns:
+            Tuple containing:
+            - Request data (Dict | None): The validated request parameters
+            - Air quality data (Dict | None): The fetched air quality data
+            - Error response (Response | None): Flask response in case of error
+        """
         data = request.json
         grid_id = data.get("grid_id")
         start_time = data.get("start_time")
         end_time = data.get("end_time")
         audience = data.get("audience", "general public")

         # Validate input parameters
         if not all([grid_id, start_time, end_time, audience]):
             return None, jsonify({
                 "error": "Missing required parameters: grid_id, start_time, end_time, audience"
             }), 400

         # Fetch air quality data
         air_quality_data = DataFetcher.fetch_air_quality_data_a(grid_id, start_time, end_time)
         if air_quality_data is None:
             return None, jsonify({
                 "error": "No data found for the given parameters"
             }), 404

         return data, air_quality_data, None

@staticmethod
def _handle_error(exception):
return jsonify({
# "error": str(exception)
"error": "An internal error has occurred!"
}), 500
Comment on lines +30 to +34

🛠️ Refactor suggestion

Enhance error handling with secure logging

While hiding exception details from responses is good for security, we should log them for debugging:

    @staticmethod
-   def _handle_error(exception):
+   def _handle_error(exception: Exception) -> Tuple[Response, int]:
+       """Handle exceptions securely while preserving debug information."""
+       # Log the full error for debugging
+       current_app.logger.error(
+           f"Error in report generation: {str(exception)}",
+           exc_info=True,
+           extra={
+               "request_id": request.headers.get("X-Request-ID", "unknown")
+           }
+       )
        return jsonify({
            "error": "An internal error has occurred!",
+           "request_id": request.headers.get("X-Request-ID", "unknown")
        }), 500
📝 Committable suggestion


Suggested change
     @staticmethod
-    def _handle_error(exception):
+    def _handle_error(exception: Exception) -> Tuple[Response, int]:
+        """Handle exceptions securely while preserving debug information."""
+        # Log the full error for debugging
+        current_app.logger.error(
+            f"Error in report generation: {str(exception)}",
+            exc_info=True,
+            extra={
+                "request_id": request.headers.get("X-Request-ID", "unknown")
+            }
+        )
         return jsonify({
-            # "error": str(exception)
-            "error": "An internal error has occurred!"
+            "error": "An internal error has occurred!",
+            "request_id": request.headers.get("X-Request-ID", "unknown")
         }), 500
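The pattern the reviewer suggests can be tried standalone with the standard library (Flask's `current_app.logger` is an ordinary `logging.Logger`); a minimal sketch, with a plain dict standing in for the Flask response:

```python
# Log full exception details server-side; return only a generic message
# (plus a correlation id) to the client.
import logging

logger = logging.getLogger("report_view")
logging.basicConfig(level=logging.ERROR)

def handle_error(exception: Exception, request_id: str = "unknown"):
    logger.error("Error in report generation: %s", exception,
                 exc_info=True,                      # full traceback in the log
                 extra={"request_id": request_id})   # correlate log and response
    return {"error": "An internal error has occurred!",
            "request_id": request_id}, 500

body, status = handle_error(ValueError("boom"), request_id="req-42")
```

The `request_id` name and the `X-Request-ID` header in the suggestion above are conventions, not something this codebase is shown to set; any upstream proxy or middleware would need to supply them.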


@staticmethod
def generate_air_quality_report_with_gemini():
"""Generate a report using Gemini model."""
data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
if error_response:
return error_response

try:
# Create an air quality report
report = AirQualityReport(air_quality_data)
# Generate the report with Gemini
json_report = report.generate_report_with_gemini(data.get("audience", "general public"))

if json_report is None:
return jsonify({
"error": "Failed to generate report with Gemini"
}), 500

return jsonify({
"report": json_report,
"model": "gemini"
}), 200

except Exception as e:
return ReportView._handle_error(e)

Comment on lines +36 to +61

🛠️ Refactor suggestion

Extract common report generation logic

There's significant code duplication between the Gemini and non-LLM report generation methods. Consider extracting the common logic into a private method:

+    @staticmethod
+    def _generate_report(generator_func: callable, model_name: str) -> Tuple[Response, int]:
+        """Generate a report using the specified generator function.
+        
+        Args:
+            generator_func: Function to generate the report
+            model_name: Name of the model being used
+        """
+        data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
+        if error_response:
+            return error_response
+
+        try:
+            report = AirQualityReport(air_quality_data)
+            json_report = generator_func()
+
+            if json_report is None:
+                return jsonify({
+                    "error": f"Failed to generate report with {model_name}"
+                }), 500
+
+            return jsonify({
+                "report": json_report,
+                "model": model_name
+            }), 200
+
+        except Exception as e:
+            return ReportView._handle_error(e)

     @staticmethod
     def generate_air_quality_report_with_gemini():
-        """Generate a report using Gemini model."""
-        data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
-        if error_response:
-            return error_response
-
-        try:
-            report = AirQualityReport(air_quality_data)
-            json_report = report.generate_report_with_gemini(data.get("audience", "general public"))
-
-            if json_report is None:
-                return jsonify({
-                    "error": "Failed to generate report with Gemini"
-                }), 500
-
-            return jsonify({
-                "report": json_report,
-                "model": "gemini"
-            }), 200
-
-        except Exception as e:
-            return ReportView._handle_error(e)
+        """Generate a report using Gemini model."""
+        return ReportView._generate_report(
+            lambda: report.generate_report_with_gemini(data.get("audience", "general public")),
+            "gemini"
+        )

Committable suggestion skipped: line range outside the PR's diff.
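As a self-contained illustration of this refactor (note the suggested lambda closes over `report` and `data`, which would have to be constructed in the caller's scope first), here is a minimal sketch of the generic runner with plain dicts standing in for Flask responses:

```python
# One generic runner handles the None-result and exception paths, so each
# model-specific endpoint shrinks to binding a generator and a model name.
from typing import Callable, Optional, Tuple

def generate_report(generator_func: Callable[[], Optional[dict]],
                    model_name: str) -> Tuple[dict, int]:
    try:
        json_report = generator_func()
        if json_report is None:
            return {"error": f"Failed to generate report with {model_name}"}, 500
        return {"report": json_report, "model": model_name}, 200
    except Exception:
        return {"error": "An internal error has occurred!"}, 500

# Each concrete endpoint becomes a one-liner:
body, status = generate_report(lambda: {"summary": "ok"}, "gemini")
print(status)  # 200
```

In the real view the runner would also perform the `_fetch_and_validate_request_data()` step before invoking the generator, as the suggestion shows.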

@staticmethod
def generate_air_quality_report_without_llm():
"""Generate a report without using LLM."""
data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
if error_response:
return error_response

try:
# Create an air quality report
report = AirQualityReport(air_quality_data)
# Generate the report without LLM
json_report = report.generate_report_without_llm()

if json_report is None:
return jsonify({
"error": "Failed to generate report"
}), 500

return jsonify({
"report": json_report,
"model": "rule_based"
}), 200

except Exception as e:
return ReportView._handle_error(e)
Comment on lines +62 to +86

🛠️ Refactor suggestion

Reduce code duplication in report generation methods

The non-LLM report generation method shares significant code with the Gemini version. Consider extracting common functionality.

+    @staticmethod
+    def _generate_report(generator_func: callable, model_name: str) -> Tuple[Any, int]:
+        """Generic report generation method.
+        
+        Args:
+            generator_func: Function to generate the report
+            model_name: Name of the model being used
+        Returns:
+            Tuple of (response, status_code)
+        """
+        data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
+        if error_response:
+            return error_response
+
+        try:
+            with ThreadPoolExecutor() as executor:
+                future = executor.submit(generator_func)
+                try:
+                    json_report = future.result(timeout=30)
+                except TimeoutError:
+                    return jsonify({
+                        "error": f"Report generation with {model_name} timed out",
+                        "details": "The request took too long to process"
+                    }), 504
+
+            if json_report is None:
+                return jsonify({
+                    "error": f"Failed to generate report with {model_name}",
+                    "details": "The model failed to generate a valid response"
+                }), 500
+
+            return jsonify({
+                "report": json_report,
+                "model": model_name
+            }), 200
+
+        except Exception as e:
+            return ReportView._handle_error(e)

     @staticmethod
     def generate_air_quality_report_without_llm():
-        """Generate a report without using LLM."""
-        data, air_quality_data, error_response = ReportView._fetch_and_validate_request_data()
-        if error_response:
-            return error_response
-
-        try:
-            # Create an air quality report
-            report = AirQualityReport(air_quality_data)
-            # Generate the report without LLM
-            json_report = report.generate_report_without_llm()
-
-            if json_report is None:
-                return jsonify({
-                    "error": "Failed to generate report"
-                }), 500
-
-            return jsonify({
-                "report": json_report,
-                "model": "rule_based"
-            }), 200
-
-        except Exception as e:
-            return ReportView._handle_error(e)
+        """Generate a report without using LLM.
+        
+        Returns:
+            Tuple of (response, status_code)
+        """
+        report = AirQualityReport(air_quality_data)
+        return ReportView._generate_report(
+            generator_func=report.generate_report_without_llm,
+            model_name="rule_based"
+        )

Committable suggestion skipped: line range outside the PR's diff.

