
Update kafka producer config #3726

Merged

Conversation

@NicholasTurner23 (Contributor) commented Oct 22, 2024

Description

Improve Kafka producer persistence by adding retries and other configuration options; a generic example of such settings is sketched below.
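For illustration, the general shape of such a configuration with confluent-kafka; the broker address and values below are placeholders, not the exact settings merged here:

    from confluent_kafka import Producer

    config = {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "retries": 5,                  # resend failed messages instead of dropping them
        "retry.backoff.ms": 1000,      # wait between retry attempts
        "batch.num.messages": 1000,    # messages accumulated per batch
        "message.timeout.ms": 300000,  # give up on a message after 5 minutes
    }
    producer = Producer(config)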

Summary by CodeRabbit

  • New Features

    • Enhanced configuration management for Kafka producer with additional parameters.
    • Improved message sending logic with message count tracking and logging.
    • Updated DataFrame handling for better integration of device metadata.
  • Bug Fixes

    • Improved error handling during message delivery.
    • Refined partition selection logic for message publishing.
  • Documentation

    • Updated logging practices for consistency throughout the class.


coderabbitai bot commented Oct 22, 2024

📝 Walkthrough

The changes in this pull request focus on enhancing the MessageBrokerUtils class within the src/workflows/airqo_etl_utils/message_broker_utils.py file. Key updates include renaming the configuration dictionary, improving logging practices, refining message sending logic, and updating error handling mechanisms. Additionally, the method for updating hourly data topics has been modified to utilize pd.concat instead of pd.merge. These modifications aim to improve functionality and robustness in configuration management, logging, and message processing.

Changes

File Path: src/workflows/airqo_etl_utils/message_broker_utils.py

Change Summary:
  • Renamed self.conf to self.config.
  • Added new configuration parameters for Kafka.
  • Replaced print statements with logging.
  • Updated publish_to_topic to track message count.
  • Enhanced _send_message for better error handling.
  • Refined partition selection logic.
  • Updated consumer configuration to use config.
  • Changed update_hourly_data_topic to update_hourly_data_topic_ and adjusted the DataFrame merging method (pd.merge → pd.concat; see the sketch below).
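As context for the pd.merge → pd.concat change noted above, a minimal sketch of how the two differ when attaching device metadata; the column names here are hypothetical, not those used in update_hourly_data_topic_:

    import pandas as pd

    hourly = pd.DataFrame({"device_id": ["aq_01"], "pm2_5": [12.4]})
    devices = pd.DataFrame({"device_id": ["aq_01"], "site_id": ["site_a"]})

    # pd.merge joins on a key column, aligning rows by value.
    merged = pd.merge(hourly, devices, on="device_id", how="left")

    # pd.concat with axis=1 stacks columns side by side; rows must already
    # be aligned by index, since no key lookup takes place.
    combined = pd.concat([hourly, devices.drop(columns="device_id")], axis=1)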

Possibly related PRs

  • Update/kafka implementations [WIP] #3683: This PR includes significant enhancements to the MessageBrokerUtils class, particularly in the Kafka interaction methods, which directly relates to the changes made in the main PR regarding message handling and the publish_to_topic method.
  • Clean up #3693: This PR also modifies the MessageBrokerUtils class, specifically the publish_to_topic method and introduces the consume_from_topic method, aligning closely with the changes in the main PR that involve similar functionalities and method updates.

Suggested reviewers

  • Mnoble-19
  • BenjaminSsempala
  • Psalmz777

🎉 In the world of code, changes take flight,
MessageBrokerUtils shines ever so bright.
With logging in place and configs anew,
Messages flow smoothly, like morning dew.
Kafka's embrace, with partitions refined,
In the realm of data, new paths we find! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits 8871d78 and d46b969.

📒 Files selected for processing (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py



@coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
src/workflows/airqo_etl_utils/message_broker_utils.py (2)

324-333: Review Kafka Producer Configuration Parameters

The updated configuration includes parameters like retries, batch.num.messages, retry.backoff.ms, debug, and message.timeout.ms. This enhances reliability, but please consider the following:

  • retry.backoff.ms: A value of 80,000 ms (80 seconds) might introduce long delays between retries. Ensure this aligns with your application's requirements.

  • debug: Setting debug to "msg" enables verbose logging, which can impact performance. Consider enabling it only during development or troubleshooting, as sketched below.
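One way to act on that suggestion, assuming an environment flag is available (the variable name KAFKA_DEBUG is illustrative, not part of the PR):

    import os

    config = {
        # ... other producer settings ...
        "retry.backoff.ms": 1000,  # a more typical backoff than 80 s
    }
    # Enable librdkafka's verbose message-level debug output only on request.
    if os.getenv("KAFKA_DEBUG", "").lower() == "true":
        config["debug"] = "msg"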


Line range hint 340-371: Ensure message_counts is Updated in All Scenarios

Currently, message_counts is incremented only when a column_key is provided. If column_key is None, and data is sent in chunks, message_counts is not updated, leading to inaccurate message count logging.

Apply this diff to increment message_counts in both scenarios:

 message_counts = 0

 if column_key:
     # Existing code
     message_counts += 1
 else:
     # Existing code
+    message_counts += len(chunk_data)
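Fleshed out, the counting logic might look like the sketch below; the function signature, chunking scheme, and names are assumptions for illustration, since only the diff context is visible here:

    from typing import Optional

    import numpy as np
    import pandas as pd
    from confluent_kafka import Producer

    def publish_with_counts(
        producer: Producer,
        topic: str,
        data: pd.DataFrame,
        column_key: Optional[str] = None,
        chunk_size: int = 100,
    ) -> int:
        """Produce one message per row (or per chunk) and return the rows sent."""
        message_counts = 0
        if column_key:
            for _, row in data.iterrows():
                producer.produce(topic, key=str(row[column_key]), value=row.to_json())
                message_counts += 1
        else:
            n_chunks = max(1, len(data) // chunk_size)
            for chunk_data in np.array_split(data, n_chunks):
                producer.produce(topic, value=chunk_data.to_json(orient="records"))
                message_counts += len(chunk_data)  # count rows, not chunks
        producer.flush()
        return message_counts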
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits dffa902 and 8871d78.

📒 Files selected for processing (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py (7 hunks)
🧰 Additional context used
🔇 Additional comments (2)
src/workflows/airqo_etl_utils/message_broker_utils.py (2)

34-39: Configurations Initialized Correctly

The partition_loads and config dictionaries are appropriately initialized, enhancing the configuration management for the Kafka producer.


300-300: Appropriate Addition of producer.poll(1.0)

Adding producer.poll(1.0) ensures that the producer processes delivery reports, enhancing error handling and message delivery confirmation.
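For readers unfamiliar with librdkafka's delivery model: delivery callbacks fire only from inside poll() or flush(), which is what makes this call matter. A minimal generic sketch follows, not the class's actual _send_message implementation:

    import logging

    from confluent_kafka import Producer

    logger = logging.getLogger(__name__)

    def on_delivery(err, msg):
        # Invoked from within poll()/flush(), never from a background thread.
        if err is not None:
            logger.error(f"Delivery failed for {msg.topic()}: {err}")
        else:
            logger.info(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

    producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
    producer.produce("example-topic", value=b"payload", callback=on_delivery)
    producer.poll(1.0)   # serve delivery reports for completed sends
    producer.flush()     # block until all outstanding messages are delivered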

Comment on lines 82 to 85:

    logger.info("Dataframe info : ")
    logger.info(data.info())
    logger.info("Dataframe description : ")
    logger.info(data.describe())

⚠️ Potential issue

Fix Logging of DataFrame Information

The method data.info() prints its output to the console and returns None, so logger.info(data.info()) logs the string "None". data.describe() does return a DataFrame, but logging it inline produces poorly formatted output. Capture data.info()'s output in a buffer and format the describe() result explicitly.

Apply this diff to fix the logging:

+import io

 logger.info("Dataframe info : ")
-logger.info(data.info())
+buffer = io.StringIO()
+data.info(buf=buffer)
+logger.info(buffer.getvalue())

 logger.info("Dataframe description : ")
-logger.info(data.describe())
+logger.info(f"\n{data.describe()}")

Comment on lines 404 to 412:

     self.config.update(
         {
             "group.id": group_id,
             "auto.offset.reset": auto_offset_reset,
             "enable.auto.commit": "true" if auto_commit else "false",
         }
     )

    -consumer = Consumer(self.conf)
    +consumer = Consumer(self.config)

⚠️ Potential issue

Avoid Modifying Shared self.config for Consumer Settings

Updating self.config with consumer-specific parameters can unintentionally affect producer configurations elsewhere in the class.

Consider using a separate configuration for the consumer:

 def consume_from_topic(
     self,
     # existing parameters
 ):
+    consumer_config = self.config.copy()
+    consumer_config.update(
+        {
+            "group.id": group_id,
+            "auto.offset.reset": auto_offset_reset,
+            "enable.auto.commit": "true" if auto_commit else "false",
+        }
+    )

-    self.config.update(
-        {
-            "group.id": group_id,
-            "auto.offset.reset": auto_offset_reset,
-            "enable.auto.commit": "true" if auto_commit else "false",
-        }
-    )
-    consumer = Consumer(self.config)
+    consumer = Consumer(consumer_config)

This ensures that producer configurations remain unaffected.


@Baalmart (Contributor) left a comment
