# Update/kafka implementations #3760
```diff
@@ -35,7 +35,7 @@ def __init__(self):
         # Note: This should be updated in case the number of partions used changes.
         self.partition_loads = {int(p): 0 for p in self.__partitions}
         self.config = {
-            "bootstrap.servers": self.__bootstrap_servers,
+            "bootstrap.servers": "35.187.179.27:30200,34.79.78.204:30200,34.78.90.92:30200",
```
**Critical: Remove hardcoded Kafka bootstrap servers.**

The bootstrap servers should not be hardcoded in the source code. This practice exposes infrastructure details in version control and forces a code change and redeploy whenever the cluster addresses change.

Replace the hardcoded servers with the configuration reference:

```diff
-            "bootstrap.servers": "35.187.179.27:30200,34.79.78.204:30200,34.78.90.92:30200",
+            "bootstrap.servers": self.__bootstrap_servers,
```
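As a sketch of the suggested fix, `self.__bootstrap_servers` could be populated from the environment at construction time. The variable name `KAFKA_BOOTSTRAP_SERVERS` and the localhost fallback are assumptions for illustration; the project presumably reads the value from its own configuration module:

```python
import os


class MessageBrokerUtils:
    def __init__(self):
        # Read the broker list from the environment rather than hardcoding it.
        # KAFKA_BOOTSTRAP_SERVERS is an assumed variable name for illustration;
        # the fallback is only a local-development convenience.
        self.__bootstrap_servers = os.getenv(
            "KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"
        )
        self.config = {
            "bootstrap.servers": self.__bootstrap_servers,
            "metadata.max.age.ms": 60000,
        }
```

This keeps the addresses out of version control and lets each environment (dev, staging, production) point at its own cluster without touching code.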
```diff
             "metadata.max.age.ms": 60000,
         }
```
```diff
@@ -205,7 +205,7 @@ def consume_from_topic(
         max_messages: Optional[int] = None,
         auto_commit: bool = True,
         offset: Optional[int] = None,
-        wait_time_sec: int = 30,
+        wait_time_sec: int = 40,
         streaming: bool = False,
     ) -> Any:
         """
```
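The `wait_time_sec` parameter caps how long the consumer waits for messages. A simplified, broker-free sketch of the deadline loop such a parameter typically drives (the `poll` callable here is a stand-in for the real consumer's poll call, not the project's actual implementation):

```python
import time
from typing import Any, Callable, List, Optional


def consume_with_deadline(
    poll: Callable[[], Optional[Any]],
    wait_time_sec: int = 40,
    max_messages: Optional[int] = None,
) -> List[Any]:
    """Poll until the deadline passes or max_messages are collected."""
    deadline = time.monotonic() + wait_time_sec
    messages: List[Any] = []
    while time.monotonic() < deadline:
        msg = poll()  # stand-in for e.g. a Kafka consumer's poll(timeout)
        if msg is None:
            continue  # nothing available yet; keep waiting until the deadline
        messages.append(msg)
        if max_messages is not None and len(messages) >= max_messages:
            break
    return messages
```

Raising the default from 30 to 40 seconds simply widens this window, trading slower worst-case task completion for fewer empty reads on a slow topic.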
```diff
@@ -96,12 +96,16 @@ def send_hourly_measurements_to_message_broker(
     from airqo_etl_utils.message_broker_utils import MessageBrokerUtils
     from airqo_etl_utils.data_validator import DataValidationUtils
     from airqo_etl_utils.constants import Tenant
+    from datetime import datetime
+
+    now = datetime.now()
+    unique_str = str(now.date()) + "-" + str(now.hour)
```
Comment on lines +99 to +102:

🛠️ Refactor suggestion

**Consider using a more robust unique identifier format.**

While using date and hour for uniqueness is a good start, multiple DAG runs within the same hour would collide. Consider including minutes and a hash of the timestamp:

```diff
-unique_str = str(now.date()) + "-" + str(now.hour)
+unique_str = f"{now.strftime('%Y-%m-%d-%H-%M')}-{hash(now.timestamp())}"
```

This would provide better guarantees of uniqueness while maintaining temporal ordering. Also applies to lines 372-375.
```diff
 
     data = DataValidationUtils.process_data_for_message_broker(
         data=data,
         tenant=Tenant.AIRQO,
         topic=configuration.HOURLY_MEASUREMENTS_TOPIC,
-        caller=kwargs["dag"].dag_id,
+        caller=kwargs["dag"].dag_id + unique_str,
```
Comment on lines +99 to +108:

🛠️ Refactor suggestion

**Consider extracting unique string generation to a utility function.**

The unique string generation logic is duplicated in both DAGs. Consider extracting it to a utility function in `airqo_etl_utils/date.py`. Example implementation:

```python
# In airqo_etl_utils/date.py
def generate_unique_caller_id(dag_id: str) -> str:
    now = datetime.now()
    unique_str = f"{now.strftime('%Y-%m-%d-%H-%M')}-{hash(now.timestamp())}"
    return f"{dag_id}{unique_str}"
```

Then in the DAGs:

```diff
-now = datetime.now()
-unique_str = str(now.date()) + "-" + str(now.hour)
-caller=kwargs["dag"].dag_id + unique_str,
+caller=generate_unique_caller_id(kwargs["dag"].dag_id),
```

Also applies to lines 372-381.
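A self-contained variant of the reviewer's suggestion: substituting a short `uuid4` suffix for `hash(now.timestamp())` keeps the temporal prefix but guarantees distinct ids even for two runs started in the same instant. The function name follows the reviewer's sketch; the uuid variant is an assumption, not the PR's actual code:

```python
import uuid
from datetime import datetime


def generate_unique_caller_id(dag_id: str) -> str:
    """Build a caller id that sorts by time but is unique per invocation."""
    now = datetime.now()
    # The timestamp prefix preserves temporal ordering; the uuid4 suffix
    # avoids collisions when two runs start within the same minute.
    return f"{dag_id}-{now.strftime('%Y-%m-%d-%H-%M')}-{uuid.uuid4().hex[:8]}"
```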
```diff
     )
     broker = MessageBrokerUtils()
     broker.publish_to_topic(
```
```diff
@@ -365,12 +369,16 @@ def send_hourly_measurements_to_message_broker(data: pd.DataFrame, **kwargs):
     from airqo_etl_utils.message_broker_utils import MessageBrokerUtils
     from airqo_etl_utils.data_validator import DataValidationUtils
     from airqo_etl_utils.constants import Tenant
+    from datetime import datetime
+
+    now = datetime.now()
+    unique_str = str(now.date()) + "-" + str(now.hour)
 
     data = DataValidationUtils.process_data_for_message_broker(
         data=data,
         tenant=Tenant.AIRQO,
         topic=configuration.HOURLY_MEASUREMENTS_TOPIC,
-        caller=kwargs["dag"].dag_id,
+        caller=kwargs["dag"].dag_id + unique_str,
     )
     broker = MessageBrokerUtils()
     broker.publish_to_topic(
```
🛠️ Refactor suggestion

**Consider adding error handling for XCom operations.**

The XCom operations could fail silently if there are issues with the Airflow context or XCom backend. Consider adding try-except blocks to handle potential errors gracefully.
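A hedged sketch of what that defensive access could look like, wrapping `xcom_pull` so a backend failure degrades to a logged warning and a default value. The helper name and the task ids are illustrative, not from the PR:

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def safe_xcom_pull(ti: Any, task_ids: str, default: Optional[Any] = None) -> Any:
    """Pull an XCom value, returning a default instead of raising."""
    try:
        value = ti.xcom_pull(task_ids=task_ids)
    except Exception as exc:  # XCom backend or Airflow context issues
        logger.warning("XCom pull from %s failed: %s", task_ids, exc)
        return default
    # xcom_pull returns None when the upstream task pushed nothing,
    # so treat that the same as a missing value.
    return value if value is not None else default
```

In the DAG, `safe_xcom_pull(kwargs["ti"], "extract_task", default=pd.DataFrame())` would let the downstream task fail fast on empty input rather than crashing mid-publish.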