CDC-to-Kafka 3.0.0
This release contains breaking changes.
CDC-to-Kafka 3.0.0 brings several dependency upgrades, performance improvements, and expanded SQL type support. It also improves the flexibility and schema management of "unified topics", which can contain change data messages from several different SQL tables produced in a transactionally consistent order.
Changes:
- Upgrades the MS ODBC driver used in the Docker image. Breaking: if you are using a Docker image built from this repo's `Dockerfile`, your DB connection strings will need to change to use `DRIVER=ODBC Driver 18 for SQL Server`, and may also need to add `TrustServerCertificate=yes;`.
- Adds support for SQL data types `money`, `smallmoney`, `datetimeoffset`, `smalldatetime`, `xml`, `rowversion`, `float`, and `real` (hopefully addressing #17).
- Breaking for users of unified topics: Previously, unified-topic messages were wrapped in a top-level object with fields `__source_table` and `__change_data`, the latter of which was encoded with a single Avro schema that was a union type of all the tracked tables' schemas. With this release, the top-level wrapping is dropped, and messages produced to unified topics are now Avro-encoded with multiple schemas: the same per-table schemas that are used for messages in the single-table topics. This change greatly improves performance when unified topics are used, since the same change datum no longer needs to be re-serialized. Advances in schema management tooling (e.g. support for new subject naming strategies in the Confluent Schema Registry) made this a more attractive option. Breaking aspects:
  - The schemas of messages in any unified topics will change, and may now vary from message to message. Ensure consumers are prepared for this before switching.
  - Configuration parameters for unified topics have changed. `UNIFIED_TOPICS_PARTITION_COUNT` and `UNIFIED_TOPICS_EXTRA_CONFIG` have been dropped as top-level config parameters; instead, these options can now be specified separately for each unified topic within the expanded JSON object expected by parameter `UNIFIED_TOPICS` (see the help string in `cdc_kafka/options.py` for details).
  - With the removal of the top-level wrapping and its `__source_table` field, consumers will now need to rely on knowledge of the Avro schema to determine which SQL table a given message corresponds to. The Avro schema `name` for message values produced by this tool follows the format `<source_table_schema_name>_<source_table_name>_cdc__value`; consumers may need to be prepared to parse this.
- ~30% maximum throughput increase (and more for those who also produce to unified topics!)
- PyPI package dependencies upgraded
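For anyone updating existing connection strings for the Docker image, the following is a minimal sketch of rewriting one to target the new driver. `migrate_connection_string` is a hypothetical helper, not part of CDC-to-Kafka, and it assumes a plain `KEY=value;` string whose values contain no semicolons (braced values with an embedded `;` are not handled). ODBC Driver 18 encrypts connections by default, which is why servers presenting a self-signed certificate typically need `TrustServerCertificate=yes`. Because that setting skips certificate validation, the sketch only appends it when explicitly asked.

```python
NEW_DRIVER = "ODBC Driver 18 for SQL Server"


def migrate_connection_string(conn_str: str, trust_server_certificate: bool = False) -> str:
    """Hypothetical helper: point an ODBC connection string at Driver 18."""
    pairs = []
    # Parse "KEY=value;KEY=value;" into ordered (key, value) pairs.
    # Assumption: no value contains ";" (braced values with ";" are not handled).
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue
        key, _, value = segment.partition("=")
        pairs.append((key.strip(), value))

    keys_upper = {key.upper() for key, _ in pairs}
    # Swap any existing DRIVER value for the new driver, keeping key order and case.
    pairs = [(key, NEW_DRIVER) if key.upper() == "DRIVER" else (key, value)
             for key, value in pairs]
    if "DRIVER" not in keys_upper:
        pairs.insert(0, ("DRIVER", NEW_DRIVER))
    # Only add TrustServerCertificate when asked, and never override an explicit setting.
    if trust_server_certificate and "TRUSTSERVERCERTIFICATE" not in keys_upper:
        pairs.append(("TrustServerCertificate", "yes"))
    return ";".join(f"{key}={value}" for key, value in pairs) + ";"
```

For example, `migrate_connection_string("DRIVER=ODBC Driver 17 for SQL Server;SERVER=db;UID=sa", trust_server_certificate=True)` keeps the other keys in their original order and returns a string ending in `TrustServerCertificate=yes;`. Whether that last setting is appropriate depends on how your SQL Server's certificate is issued.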
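A consumer of a unified topic might recover each message's source table along the lines below. This is a sketch under stated assumptions: it assumes message values use the Confluent Schema Registry wire format (a zero magic byte followed by a 4-byte big-endian schema ID), that the consumer already knows the list of tracked `(schema, table)` pairs, and that SQL schema and table names appear verbatim in the Avro record name. `schema_id_from_message` and `source_table_from_schema` are hypothetical helpers. Because SQL schema and table names may themselves contain underscores, the sketch matches the record name against known tables instead of splitting on `_`.

```python
import json
import struct
from typing import Iterable, Optional, Tuple

# Value-schema name suffix described in these release notes.
VALUE_SCHEMA_SUFFIX = "_cdc__value"


def schema_id_from_message(value: bytes) -> int:
    """Read the schema ID from a Confluent-framed message value (assumed framing)."""
    # Byte 0 is the magic byte (0); bytes 1-4 hold the schema ID, big-endian.
    if len(value) < 5 or value[0] != 0:
        raise ValueError("value is not in Confluent wire format")
    (schema_id,) = struct.unpack(">I", value[1:5])
    return schema_id


def source_table_from_schema(
    schema_str: str, tracked_tables: Iterable[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Map an Avro value schema to a (schema_name, table_name) pair, or None."""
    name = json.loads(schema_str)["name"]
    # Drop any namespace prefix in case the name is given as a full name.
    name = name.rsplit(".", 1)[-1]
    if not name.endswith(VALUE_SCHEMA_SUFFIX):
        return None
    stem = name[: -len(VALUE_SCHEMA_SUFFIX)]
    # Compare against known tables rather than splitting on "_", since
    # schema and table names may contain underscores themselves.
    for schema_name, table_name in tracked_tables:
        if stem == f"{schema_name}_{table_name}":
            return schema_name, table_name
    return None
```

In practice, the schema string for a given ID could be fetched with `confluent_kafka`'s `SchemaRegistryClient.get_schema(schema_id)` (via the returned schema's `schema_str`) and cached per ID, since many messages share a schema. If two tracked tables produce the same stem (for example schema `a_b` with table `c`, and schema `a` with table `b_c`), the first match wins, so such pairs would need another disambiguation rule.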