
Update kafka producer config #3726

Merged

Conversation

@NicholasTurner23 (Contributor) commented Oct 22, 2024

Description

Improve Kafka producer persistence by adding retries and other configuration options; a generic example of such settings is sketched below.
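For illustration, the general shape of such a configuration with confluent-kafka; the broker address and values below are placeholders, not the exact settings merged here:

    from confluent_kafka import Producer

    config = {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "retries": 5,                  # resend failed messages instead of dropping them
        "retry.backoff.ms": 1000,      # wait between retry attempts
        "batch.num.messages": 1000,    # messages accumulated per batch
        "message.timeout.ms": 300000,  # give up on a message after 5 minutes
    }
    producer = Producer(config)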

Summary by CodeRabbit

  • New Features

    • Enhanced configuration management for Kafka producer with additional parameters.
    • Improved message sending logic with message count tracking and logging.
    • Updated DataFrame handling for better integration of device metadata.
  • Bug Fixes

    • Improved error handling during message delivery.
    • Refined partition selection logic for message publishing.
  • Documentation

    • Updated logging practices for consistency throughout the class.


coderabbitai bot commented Oct 22, 2024

📝 Walkthrough

The changes in this pull request focus on enhancing the MessageBrokerUtils class within the src/workflows/airqo_etl_utils/message_broker_utils.py file. Key updates include renaming the configuration dictionary, improving logging practices, refining message sending logic, and updating error handling mechanisms. Additionally, the method for updating hourly data topics has been modified to utilize pd.concat instead of pd.merge. These modifications aim to improve functionality and robustness in configuration management, logging, and message processing.

Changes

File Path: src/workflows/airqo_etl_utils/message_broker_utils.py

Change Summary:
  • Renamed self.conf to self.config.
  • Added new configuration parameters for Kafka.
  • Replaced print statements with logging.
  • Updated publish_to_topic to track message count.
  • Enhanced _send_message for better error handling.
  • Refined partition selection logic.
  • Updated consumer configuration to use config.
  • Changed update_hourly_data_topic to update_hourly_data_topic_ and adjusted the DataFrame merging method (pd.merge → pd.concat; see the sketch below).
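As context for the pd.merge → pd.concat change noted above, a minimal sketch of how the two differ when attaching device metadata; the column names here are hypothetical, not those used in update_hourly_data_topic_:

    import pandas as pd

    hourly = pd.DataFrame({"device_id": ["aq_01"], "pm2_5": [12.4]})
    devices = pd.DataFrame({"device_id": ["aq_01"], "site_id": ["site_a"]})

    # pd.merge joins on a key column, aligning rows by value.
    merged = pd.merge(hourly, devices, on="device_id", how="left")

    # pd.concat with axis=1 stacks columns side by side; rows must already
    # be aligned by index, since no key lookup takes place.
    combined = pd.concat([hourly, devices.drop(columns="device_id")], axis=1)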

Possibly related PRs

  • Update/kafka implementations [WIP] #3683: This PR includes significant enhancements to the MessageBrokerUtils class, particularly in the Kafka interaction methods, which directly relates to the changes made in the main PR regarding message handling and the publish_to_topic method.
  • Clean up #3693: This PR also modifies the MessageBrokerUtils class, specifically the publish_to_topic method and introduces the consume_from_topic method, aligning closely with the changes in the main PR that involve similar functionalities and method updates.

Suggested reviewers

  • Mnoble-19
  • BenjaminSsempala
  • Psalmz777

🎉 In the world of code, changes take flight,
MessageBrokerUtils shines ever so bright.
With logging in place and configs anew,
Messages flow smoothly, like morning dew.
Kafka's embrace, with partitions refined,
In the realm of data, new paths we find! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits 8871d78 and d46b969.

📒 Files selected for processing (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py



@coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
src/workflows/airqo_etl_utils/message_broker_utils.py (2)

324-333: Review Kafka Producer Configuration Parameters

The updated configuration includes parameters like retries, batch.num.messages, retry.backoff.ms, debug, and message.timeout.ms. This enhances reliability, but please consider the following:

  • retry.backoff.ms: A value of 80,000 ms (80 seconds) might introduce long delays between retries. Ensure this aligns with your application's requirements.

  • debug: Setting debug to "msg" enables verbose logging, which can impact performance. Consider enabling it only during development or troubleshooting, as sketched below.
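One way to act on that suggestion, assuming an environment flag is available (the variable name KAFKA_DEBUG is illustrative, not part of the PR):

    import os

    config = {
        # ... other producer settings ...
        "retry.backoff.ms": 1000,  # a more typical backoff than 80 s
    }
    # Enable librdkafka's verbose message-level debug output only on request.
    if os.getenv("KAFKA_DEBUG", "").lower() == "true":
        config["debug"] = "msg"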


Line range hint 340-371: Ensure message_counts is Updated in All Scenarios

Currently, message_counts is incremented only when a column_key is provided. If column_key is None, and data is sent in chunks, message_counts is not updated, leading to inaccurate message count logging.

Apply this diff to increment message_counts in both scenarios:

 message_counts = 0

 if column_key:
     # Existing code
     message_counts += 1
 else:
     # Existing code
+    message_counts += len(chunk_data)
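Fleshed out, the counting logic might look like the sketch below; the function signature, chunking scheme, and names are assumptions for illustration, since only the diff context is visible here:

    from typing import Optional

    import numpy as np
    import pandas as pd
    from confluent_kafka import Producer

    def publish_with_counts(
        producer: Producer,
        topic: str,
        data: pd.DataFrame,
        column_key: Optional[str] = None,
        chunk_size: int = 100,
    ) -> int:
        """Produce one message per row (or per chunk) and return the rows sent."""
        message_counts = 0
        if column_key:
            for _, row in data.iterrows():
                producer.produce(topic, key=str(row[column_key]), value=row.to_json())
                message_counts += 1
        else:
            n_chunks = max(1, len(data) // chunk_size)
            for chunk_data in np.array_split(data, n_chunks):
                producer.produce(topic, value=chunk_data.to_json(orient="records"))
                message_counts += len(chunk_data)  # count rows, not chunks
        producer.flush()
        return message_counts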
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits dffa902 and 8871d78.

📒 Files selected for processing (1)
  • src/workflows/airqo_etl_utils/message_broker_utils.py (7 hunks)
🧰 Additional context used
🔇 Additional comments (2)
src/workflows/airqo_etl_utils/message_broker_utils.py (2)

34-39: Configurations Initialized Correctly

The partition_loads and config dictionaries are appropriately initialized, enhancing the configuration management for the Kafka producer.


300-300: Appropriate Addition of producer.poll(1.0)

Adding producer.poll(1.0) ensures that the producer processes delivery reports, enhancing error handling and message delivery confirmation.
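For readers unfamiliar with librdkafka's delivery model: delivery callbacks fire only from inside poll() or flush(), which is what makes this call matter. A minimal generic sketch follows, not the class's actual _send_message implementation:

    import logging

    from confluent_kafka import Producer

    logger = logging.getLogger(__name__)

    def on_delivery(err, msg):
        # Invoked from within poll()/flush(), never from a background thread.
        if err is not None:
            logger.error(f"Delivery failed for {msg.topic()}: {err}")
        else:
            logger.info(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

    producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
    producer.produce("example-topic", value=b"payload", callback=on_delivery)
    producer.poll(1.0)   # serve delivery reports for completed sends
    producer.flush()     # block until all outstanding messages are delivered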

Comment on lines 82 to 85:

    logger.info("Dataframe info : ")
    logger.info(data.info())
    logger.info("Dataframe description : ")
    logger.info(data.describe())

⚠️ Potential issue

Fix Logging of DataFrame Information

The method data.info() prints its output to the console and returns None, so logger.info(data.info()) logs the string "None". data.describe() does return a DataFrame, but logging it inline produces poorly formatted output. Capture data.info()'s output in a buffer and format the describe() result explicitly.

Apply this diff to fix the logging:

+import io

 logger.info("Dataframe info : ")
-logger.info(data.info())
+buffer = io.StringIO()
+data.info(buf=buffer)
+logger.info(buffer.getvalue())

 logger.info("Dataframe description : ")
-logger.info(data.describe())
+logger.info(f"\n{data.describe()}")

Comment on lines 404 to 412:

     self.config.update(
         {
             "group.id": group_id,
             "auto.offset.reset": auto_offset_reset,
             "enable.auto.commit": "true" if auto_commit else "false",
         }
     )

    -consumer = Consumer(self.conf)
    +consumer = Consumer(self.config)

⚠️ Potential issue

Avoid Modifying Shared self.config for Consumer Settings

Updating self.config with consumer-specific parameters can unintentionally affect producer configurations elsewhere in the class.

Consider using a separate configuration for the consumer:

 def consume_from_topic(
     self,
     # existing parameters
 ):
+    consumer_config = self.config.copy()
+    consumer_config.update(
+        {
+            "group.id": group_id,
+            "auto.offset.reset": auto_offset_reset,
+            "enable.auto.commit": "true" if auto_commit else "false",
+        }
+    )

-    self.config.update(
-        {
-            "group.id": group_id,
-            "auto.offset.reset": auto_offset_reset,
-            "enable.auto.commit": "true" if auto_commit else "false",
-        }
-    )
-    consumer = Consumer(self.config)
+    consumer = Consumer(consumer_config)

This ensures that producer configurations remain unaffected.


@Baalmart (Contributor) left a comment
