
CDC-to-Kafka 3.0.0

woodlee released this 16 Feb 20:29
Commit 5125d3a

This release contains breaking changes.

CDC-to-Kafka 3.0.0 brings several dependency upgrades, performance improvements, and expanded SQL type support. It also improves the flexibility and schema management of "unified topics", which can contain change data messages from several different SQL tables, produced in a transactionally consistent order.

Changes:

  • Upgrades the MS ODBC driver used in the Docker image. Breaking: if you use a Docker image built from this repo's Dockerfile, your DB connection strings must change to specify DRIVER=ODBC Driver 18 for SQL Server, and you may also need to add the setting TrustServerCertificate=yes; to them.
  • Adds support for SQL data types money, smallmoney, datetimeoffset, smalldatetime, xml, rowversion, float, and real (hopefully addressing #17).
  • Breaking for users of unified topics: Previously, unified-topic messages were wrapped in a top-level object with fields __source_table and __change_data, the latter encoded with a single Avro schema that was a union of all the tracked tables' schemas. With this release, the top-level wrapping is dropped, and messages produced to unified topics are Avro-encoded with multiple schemas: the same per-table schemas used for messages in the single-table topics. This greatly improves performance when unified topics are used, since the additional re-serializations of each change datum are no longer needed. Advances in schema management tooling (e.g., support for new subject naming strategies in the Confluent Schema Registry) made this a more attractive option. Breaking aspects:
    • The schemas of messages in any unified topics will change, and may now vary from message to message. Ensure consumers are prepared for this before switching.
    • Configuration params for unified topics have changed. UNIFIED_TOPICS_PARTITION_COUNT and UNIFIED_TOPICS_EXTRA_CONFIG have been dropped as top-level config parameters; instead, these options can now be specified separately for each unified topic within the expanded JSON object expected by the UNIFIED_TOPICS parameter (see the help string in cdc_kafka/options.py for details).
    • With the removal of the top-level wrapping and its __source_table field, consumers must now rely on the Avro schema to determine which SQL table a given message corresponds to. The Avro schema name for message values produced by this tool follows the format <source_table_schema_name>_<source_table_name>_cdc__value, which consumers may need to parse.
  • ~30% maximum throughput increase (and more for those who also produce to unified topics!)
  • PyPI package dependencies upgraded
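For the Docker image's switch to ODBC Driver 18, the Python sketch below shows one way to assemble a connection string in the new form. The helper name and its parameters are hypothetical and not part of CDC-to-Kafka. The braces around the driver name are the conventional ODBC quoting for it, and TrustServerCertificate=yes is added only on request: Driver 18 encrypts connections by default, so a server presenting a self-signed certificate may otherwise be rejected.

```python
def build_mssql_connection_string(
    server: str,
    database: str,
    user: str,
    password: str,
    trust_server_certificate: bool = False,
) -> str:
    """Assemble an ODBC connection string targeting ODBC Driver 18.

    Hypothetical helper for illustration only; not part of CDC-to-Kafka.
    Values containing ';' or braces would need extra ODBC quoting,
    which this sketch does not handle.
    """
    parts = {
        # Braces are the conventional ODBC quoting for the driver name.
        "DRIVER": "{ODBC Driver 18 for SQL Server}",
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    if trust_server_certificate:
        # Skips server certificate validation; Driver 18 encrypts by default.
        parts["TrustServerCertificate"] = "yes"
    # dicts preserve insertion order, so keys appear in the order above.
    return "".join(f"{key}={value};" for key, value in parts.items())
```

For example, `build_mssql_connection_string("db.example.com,1433", "mydb", "cdc_user", "secret", trust_server_certificate=True)` yields `DRIVER={ODBC Driver 18 for SQL Server};SERVER=db.example.com,1433;DATABASE=mydb;UID=cdc_user;PWD=secret;TrustServerCertificate=yes;`.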
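Because unified-topic messages no longer carry a __source_table field, a consumer can instead recover the source table from the Avro schema name format given above. This Python sketch assumes the SQL schema name contains no underscore (as with dbo), so the first underscore separates schema from table. If your schema names do contain underscores, the name alone is ambiguous, and you would need to match it against your known list of tracked tables instead. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "<schema>_<table>_cdc__value". The schema part is ASSUMED to
# contain no underscore, so the first underscore ends it; the table part
# may contain underscores.
_VALUE_SCHEMA_NAME = re.compile(r"^(?P<schema>[^_]+)_(?P<table>.+)_cdc__value$")


def source_table_from_schema_name(avro_name: str) -> Optional[Tuple[str, str]]:
    """Return (schema, table) for a value-schema name, or None if it doesn't match."""
    # Drop any Avro namespace prefix ("some.ns.name" -> "name").
    short_name = avro_name.rsplit(".", 1)[-1]
    match = _VALUE_SCHEMA_NAME.match(short_name)
    if match is None:
        return None
    return match.group("schema"), match.group("table")
```

For example, `source_table_from_schema_name("dbo_order_items_cdc__value")` returns `("dbo", "order_items")`, while a name that does not end in `_cdc__value` returns `None`.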
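Since message schemas in a unified topic may now vary from one message to the next, a consumer generally has to identify the writer schema per message. Assuming the messages use the Confluent Schema Registry wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the Avro-encoded body), this Python sketch reads that ID so a consumer can fetch each schema from the registry once and cache a decoder per ID. The function name is hypothetical.

```python
import struct


def schema_id_from_message(value: bytes) -> int:
    """Return the schema registry ID from a Confluent wire-format payload.

    Assumed layout: 1 magic byte (0), a 4-byte big-endian schema ID,
    then the Avro-encoded body.
    """
    if len(value) < 5 or value[0] != 0:
        raise ValueError("not a Confluent wire-format message")
    # Unpack bytes 1..4 as one big-endian unsigned 32-bit integer.
    (schema_id,) = struct.unpack(">I", value[1:5])
    return schema_id
```

Keying a decoder cache by this ID, rather than by topic, is what lets one consumer handle a topic whose messages use several different schemas.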