Releases · lensesio/stream-reactor

DataLakes (S3, GCP) source fixes

Polling Backoff

The connector incurs high costs when there is no data available in the buckets because it continuously polls the data lake in a tight loop, as controlled by Kafka Connect.

From this version by default a backoff queue is used, introducing a standard method for backing off calls to the underlying cloud platform.

Avoid filtering by lastSeenFile where a post process action is configured

When ordering by LastModified and a post-process action is configured, avoid filtering to the latest result.

This change avoids bugs caused by inconsistent LastModified dates used for sorting.
If LastModified sorting is used, ensure objects do not arrive late, or use a post-processing step to handle them.

Add a flag to populate kafka headers with the watermark partition/offset

This adds a connector property for GCP Storage and S3 Sources:
connect.s3.source.write.watermark.header
connect.gcpstorage.source.write.watermark.header

If set to true then the headers in the source record produced will include details of the source and line number of the file.

If set to false (the default) then the headers won't be set.

Currently this does not apply when using the envelope mode.

🚀 New Features

Sequential Message Sending for Azure Service Bus

Introduced a new KCQL property: batch.enabled (default: true).
Users can now disable batching to send messages sequentially, addressing specific scenarios with large message sizes (e.g., >1 MB).
Why this matters: Batching improves performance but can fail for large messages. Sequential sending ensures reliability in such cases.
How to use: Configure batch.enabled=false in the KCQL mapping to enable sequential sending.

Post-Processing for Datalake Cloud Source Connectors

Added post-processing capabilities for AWS S3 and GCP Storage source connectors ( Azure Datalake Gen 2 support coming soon).
New KCQL properties:
- post.process.action: Defines the action (DELETE or MOVE) to perform on source files after successful processing.
- post.process.action.bucket: Specifies the target bucket for the MOVE action (required for MOVE).
- post.process.action.prefix: Specifies a new prefix for the file’s location when using the MOVE action (required for MOVE).
Use cases:
- Free up storage space by deleting files.
- Archive or organize processed files by moving them to a new location.
Example 1 : Delete Files:

INSERT INTO `my-bucket`
SELECT * FROM `my-topic`
PROPERTIES ('post.process.action'=`DELETE`)

Example 2: Move files to an archive bucket:

INSERT INTO `my-bucket:archive/`
SELECT * FROM `my-topic`
PROPERTIES (
    'post.process.action'=`MOVE`,
    'post.process.action.bucket'=`archive-bucket`,
    'post.process.action.prefix'=`archive/`
)

🛠 Dependency Updates

Updated Azure Service Bus Dependencies

azure-core updated to version 1.54.1.
azure-messaging-servicebus updated to version 7.17.6.

These updates ensure compatibility with the latest Azure SDKs and improve stability and performance.

Upgrade Notes

Review the new KCQL properties and configurations for Azure Service Bus and Datalake connectors.
Ensure compatibility with the updated Azure Service Bus dependencies if you use custom extensions or integrations.

Thank you to all contributors! 🎉

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataLakes (S3, GCP) source fixes

Polling Backoff

Avoid filtering by lastSeenFile where a post process action is configured

Add a flag to populate kafka headers with the watermark partition/offset

Enhance DataLake Source Connectors: Robust State Management and Move Location Path Handling

Azure Service Bus source

Sequential Message Sending for Azure Service Bus

Post-Processing for Datalake Cloud Source Connectors

🛠 Dependency Updates

Upgrade Notes

Releases: lensesio/stream-reactor

Stream Reactor 8.1.26

Stream Reactor 8.1.25

Stream Reactor 8.1.24

Stream Reactor 8.1.23

DataLakes (S3, GCP) source fixes

Polling Backoff

Avoid filtering by lastSeenFile where a post process action is configured

Add a flag to populate kafka headers with the watermark partition/offset

Stream Reactor 8.1.22

Enhance DataLake Source Connectors: Robust State Management and Move Location Path Handling

Stream Reactor 8.1.21

Azure Service Bus source

Stream Reactor 8.1.20

Stream Reactor 8.1.19

Stream Reactor 8.1.18

Sequential Message Sending for Azure Service Bus

Post-Processing for Datalake Cloud Source Connectors

🛠 Dependency Updates

Upgrade Notes

Stream Reactor 8.1.17