Releases: woodlee/sqlserver-cdc-to-kafka
CDC-to-Kafka 4.3.0
This release speeds up Avro serialization significantly, leading to a 2-3x increase in overall throughput. It also includes a few minor bugfixes, namely:
- Field truncation (when configured) now properly operates against encoded byte length rather than string character length
- Unnecessary new snapshots are now correctly skipped when a DDL statement that added a new nullable column to a tracked table used bracket-quoting for the new column's name
- The show_snapshot_history tool no longer raises an exception when queried for a topic that does not yet exist
CDC-to-Kafka 4.2.0
This release includes a new feature which adds a Kafka message header to messages where fields have been truncated for length due to use of the TRUNCATE_FIELDS config option. A header with key cdc_to_kafka_truncated_field__<column_name> and value <original_byte_length>,<truncated_byte_length> is added for each column where truncation was applied. Truncation is based on byte length, not string length, and respects whole-character boundaries for the UTF-8 encoded strings this tool publishes in accordance with the Avro specification.
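To illustrate the behavior described above, here is a minimal sketch of byte-length truncation that never splits a UTF-8 character, along with building and parsing the truncation header. This is not the tool's actual implementation: the function names are hypothetical, and encoding the header value as UTF-8 bytes is an assumption (Kafka header values are raw bytes).

```python
# Header key prefix taken from the release notes.
HEADER_PREFIX = "cdc_to_kafka_truncated_field__"


def truncate_utf8(value: str, max_bytes: int) -> str:
    """Truncate to at most max_bytes of UTF-8 without splitting a character.

    Hypothetical helper; assumes max_bytes is non-negative.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # Slicing may cut a multi-byte character in half; errors="ignore"
    # drops that trailing partial sequence when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncation_header(column: str, original: str, truncated: str) -> tuple[str, bytes]:
    """Build a (key, value) header pair in the documented format.

    Assumption: the value is the two byte lengths as UTF-8 encoded text.
    """
    original_len = len(original.encode("utf-8"))
    truncated_len = len(truncated.encode("utf-8"))
    return HEADER_PREFIX + column, f"{original_len},{truncated_len}".encode("utf-8")


def parse_truncation_headers(headers: list[tuple[str, bytes]]) -> dict[str, tuple[int, int]]:
    """Map column name -> (original_byte_length, truncated_byte_length)."""
    result = {}
    for key, value in headers:
        if key.startswith(HEADER_PREFIX):
            original_len, truncated_len = value.decode("utf-8").split(",")
            result[key[len(HEADER_PREFIX):]] = (int(original_len), int(truncated_len))
    return result
```

A consumer could pass the header list of a received message to parse_truncation_headers to find out which columns were shortened and by how much.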
CDC-to-Kafka 4.1.2
This release contains various stability and performance improvements:
- Switches from multiprocessing to multithreading, eliminating slowdowns due to Pickle (de)serialization.
- Ensures that Kafka consumers are closed immediately after use, to prevent delayed exceptions due to OAuth expiration.
- Gives librdkafka more time to destroy its objects at shutdown, to prevent exit-time hangs.
- Reduces the number of no-op Kafka transactions used.
- Loops more tightly and quickly over the specific table(s) that are falling behind whenever lag occurs due to high change data volume.
- Adds a local-file metrics reporter class, which can be used for e.g. Kubernetes liveness probes.
- Some bugfixes for the separate example replayer.py consumer script.
- The show_snapshot_history.py script/tool can now handle multiple tables per invocation.
- Fixes a KeyError that could occur in cases where a re-snapshot of an existing but altered table is being skipped.
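As a rough illustration of how a local metrics file could back a Kubernetes liveness probe, the sketch below treats the process as alive if the file was modified recently. It assumes the reporter rewrites its file periodically; the file path, the staleness threshold, and both function names are hypothetical and not taken from the tool.

```python
import os
import time
from typing import Optional


def metrics_file_is_fresh(path: str, max_age_seconds: float,
                          now: Optional[float] = None) -> bool:
    """Return True if the file exists and was modified recently enough.

    Assumption: a healthy process rewrites the metrics file at least
    once every max_age_seconds.
    """
    try:
        modified_at = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as unhealthy.
        return False
    current = time.time() if now is None else now
    return (current - modified_at) <= max_age_seconds


def probe_exit_code(path: str, max_age_seconds: float) -> int:
    """Exit code for an exec-style probe: 0 is healthy, 1 is unhealthy."""
    return 0 if metrics_file_is_fresh(path, max_age_seconds) else 1
```

An exec-style probe could run a small script that calls sys.exit(probe_exit_code(...)) with whatever path the reporter is configured to write.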
CDC-to-Kafka 4.1.1
This release fixes two bugs:
- 790b618 (Fix for bursts of SASL auth errors at startup time) - This did not directly impact the process, but was confusing for logging and error-aggregating tools.
- b45ebee (Removing unnecessary consumer.close() that can cause shutdown hangs) - This would cause the process to not exit fully upon exceptions if the SASL auth token for the Kafka consumer had expired between startup and the time of the exception.
CDC-to-Kafka 4.1.0
This release adds two new features:
- Snapshot logging: You can now configure the process to optionally produce to a new logging topic, with messages that provide a history of table snapshot start/completion/reset events. This is meant to support cleaning up prior messages in a table-topic if and when a later re-snapshot of a table is performed.
- IAM authentication for AWS MSK clusters: New configuration parameters allow for SASL/OAUTHBEARER-based authentication to AWS MSK Kafka clusters secured via IAM. The code structure also provides for an easy way to add other OAuth-based authenticators in the future.
The process is now based on Python 3.12, and Python package dependencies have been updated to the latest currently available.
CDC-to-Kafka 4.0.3
This release fixes a bug causing startup hangs when the progress-tracking topic is new or empty.
CDC-to-Kafka 4.0.2
This release contains a minor fix improving detection of cases where a new snapshot is not needed following the creation of an updated capture instance for a table.
CDC-to-Kafka 4.0.1
This is a bugfix release, addressing the following bugs that appeared in v4.0.0:
- Crashes during initial start due to Kafka transactional producer fencing during long-running transactions when, e.g., several new tables are first added for tracking.
- Failure to correctly recognize that a topic is not preexisting, and needs creating.
- Typo in the name of a newly-added message header indicating a message's source table, in unified-topic messages.
- A TypeError raised when first evaluating whether an empty SQL table needs snapshotting.
- Continued attempts to prevent occasional process hangs when the process exits due to repeated SQL query timeouts.
CDC-to-Kafka 4.0.0
The primary focus of this new version was to adopt Kafka transactions to improve the reliability of exactly-once producing, and also to ensure that messages produced to "unified" topics have the same guarantees as those produced to the corresponding single-table topics. Some breaking changes, noted below, led to the major-version bump:
- In the config options, "whitelist" / "blacklist" language has been removed, in favor of "include" / "exclude". So e.g., the option TABLE_WHITELIST_REGEX is now TABLE_INCLUDE_REGEX.
- To support the use of Kafka transactions, a new required config parameter KAFKA_TRANSACTIONAL_ID has been added. You should set this to something that will remain stable across restarts of a particularly-configured instance of this process.
- If you are currently using the KafkaReporter as one of the metrics reporters configured in METRICS_REPORTERS, be aware that the Avro schema of the messages it produces has changed in a breaking way in this release. You may wish to switch to a new metrics topic, or to temporarily relax schema compatibility checking in the schema registry for your existing metrics topic.
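When upgrading an existing deployment, the first two breaking changes can be checked mechanically. The sketch below is a hypothetical migration helper, not part of the tool. Only the TABLE_WHITELIST_REGEX to TABLE_INCLUDE_REGEX rename is confirmed by these notes; applying the same substitution to every other option name is an assumption that should be verified against the 4.0.0 option list.

```python
def migrate_option_names(options: dict[str, str]) -> dict[str, str]:
    """Rename pre-4.0.0 option names to the include/exclude wording.

    Assumption: every affected option follows the same pattern as
    TABLE_WHITELIST_REGEX -> TABLE_INCLUDE_REGEX.
    """
    migrated = {}
    for name, value in options.items():
        new_name = name.replace("WHITELIST", "INCLUDE").replace("BLACKLIST", "EXCLUDE")
        migrated[new_name] = value
    return migrated


def missing_required_options(options: dict[str, str]) -> list[str]:
    """List options that 4.0.0 newly requires but that are unset or empty."""
    required = ("KAFKA_TRANSACTIONAL_ID",)
    return [name for name in required if not options.get(name)]
```

For example, running both functions over a copy of the process's environment variables before deploying would reveal any old option names and a missing transactional ID.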
Other changes:
- Improved type checking in code; now passes mypy analysis.
- Upgraded to Python 3.11 for the Docker image
- Upgraded 3rd-party dependencies
- Faster startup time
- Improvements in validation/testing scripts
CDC-to-Kafka 3.4.0
New feature: Allow Avro type overriding for specifically chosen fields via new config parameter AVRO_TYPE_SPEC_OVERRIDES
Bugfix: Handle cases where certain kinds of SQL column type changes are reflected in-place on existing capture tables without requiring creation of a new capture instance. Previously such changes would not cause the process to restart, and therefore the corresponding topic's Avro schema would not get updated. This has been corrected.