Releases: woodlee/sqlserver-cdc-to-kafka

CDC-to-Kafka 3.3.1

02 Jun 20:20
ea09c01

BUGFIX: Fixes schema and table name quoting in some SQL queries to handle a bug that arose for tables whose name was also a SQL keyword.

CDC-to-Kafka 3.3.0

02 Jun 14:17
f42f18e

Feature additions:

  1. Adds the ALWAYS_USE_AVRO_LONGS option, which maps all SQL integer types to Avro long fields, easing future column type upgrades.
  2. Adds the replayer.py script as a demonstration of using the topics produced by this tool to create a copy of a table in another SQL Server database.
  3. Snapshot prevention! This release adds logic to detect when a new full-table snapshot is not necessary, avoiding the associated flood of produced messages for certain kinds of schema changes. For example, adding (or newly CDC-tracking) a nullable column on a table no longer triggers a snapshot, as long as that column still contains only null values.
  4. LSN gaps when upgrading to a new capture instance no longer require a new snapshot, as long as no new change rows were published to the prior capture instance since the most recent messages that were produced to Kafka.
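The effect of the new ALWAYS_USE_AVRO_LONGS option can be illustrated with a small sketch. The names below (SQL_INT_TYPES, avro_type_for) are illustrative only, not the tool's actual internals; the point is simply that with the option enabled, every SQL integer type serializes as an Avro long, so a later column widening (e.g. int to bigint) does not change the Avro schema:

```python
# Illustrative sketch of the type mapping implied by ALWAYS_USE_AVRO_LONGS.
# These names are hypothetical, not taken from cdc_kafka's source.
SQL_INT_TYPES = ('tinyint', 'smallint', 'int', 'bigint')

def avro_type_for(sql_type: str, always_use_avro_longs: bool) -> str:
    """Map a SQL Server integer type to an Avro primitive type."""
    if sql_type not in SQL_INT_TYPES:
        raise ValueError(f'not an integer type: {sql_type}')
    # bigint always needs an Avro long; smaller types get one only
    # when the option is enabled.
    if always_use_avro_longs or sql_type == 'bigint':
        return 'long'
    return 'int'
```

With the option off, an int column maps to Avro int and a later widening to bigint would force a schema change; with it on, both map to long from the start.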

Other:

  1. Exceptions raised in the HttpPostReporter metrics reporter are caught and logged as warnings to reduce clutter in error-tracking tools like Sentry.
  2. Package upgrades, removal of some unused code, logging improvements, and refactoring to reduce the size of main.py.

CDC-to-Kafka 3.2.0

03 May 16:29
7b3f91e
  1. Reverts to using Python v3.8 in the Dockerfile, due to process-hang issues associated with changes that were released in Python v3.9
  2. Implements limited in-process retrying of timed out SQL queries, instead of always relying on process crash and supervisor restart mechanics
  3. Makes the row batch size for DB queries configurable (but still defaults to the original 2,000)
  4. Logging improvements and cleanups
  5. Fixes a bug that could cause progress heartbeat messages to be emitted with an LSN lower than the last previously published progress message for a table
  6. Moves HTTP-based metrics reporting to a separate thread to reduce latency impact on the main process
  7. Introduces progress_reset_tool.py, which can be used to delete progress entries for specific topic(s), e.g. to trigger re-taking a snapshot without needing to create a new capture instance in the DB
  8. Improves logic in both the progress-topic and regular-topic validator tools
  9. Allows use of pseudo-failover in cases where connections to the primary server time out entirely
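The limited in-process retrying described in item 2 can be sketched as a small wrapper; this is an assumed shape, not the tool's actual retry code, and the real implementation works against pyodbc query timeouts rather than a bare TimeoutError:

```python
import time

def query_with_retry(run_query, max_attempts=3, backoff_seconds=1.0):
    """Run a query callable, retrying a limited number of times on timeout.

    Illustrative sketch only: the tool's real mechanics (which exceptions
    are caught, how backoff is computed) may differ.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except TimeoutError:
            if attempt == max_attempts:
                # Out of in-process retries; fall back to the old behavior
                # of crashing and letting a supervisor restart the process.
                raise
            time.sleep(backoff_seconds * attempt)
```

The benefit over always crashing is that a transient DB stall costs only a short delay instead of a full process restart and progress re-read.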

CDC-to-Kafka 3.1.1

27 Mar 22:49

Fixes a bug from the prior release that caused snapshot completion recognition to fail when low values in the table PK had been deleted since the last snapshot.

CDC-to-Kafka 3.1.0

27 Mar 22:06
479e2ed

Changes in this version

  1. A new execution option, REPORT_PROGRESS_ONLY, was added. If set, the process starts, prints the table of its current progress against followed tables, and exits without making any changes.
  2. Tracking of snapshot completion was improved. Previously, tables that had a PK that was not monotonically increasing (such as a GUID) would, at process start, begin a new snapshot to pick up any rows with PK values lower than the lowest value seen by a prior invocation's snapshot. Since such rows also appear as inserts in the change data events, this represented an unnecessary duplication and caused confusion. In addition, it was possible that a table that contained no rows when it was first tracked would be snapshotted the next time the process restarted, if rows had been added in the interim. Again this was unnecessary since the added rows would be present in the change data events. Both issues have been fixed.
  3. Cleaned up confusing and overly verbose logging that would be printed when something changed in the tracked capture instances on the SQL Server side.
  4. Fixed a bug that could cause the process to incorrectly believe that there was a coverage gap in the LSNs between old and new capture instances for a given table, particularly for low-change-volume tables.
  5. Upgraded package dependencies and the Python version used for the Docker image.

CDC-to-Kafka 3.0.0

16 Feb 20:29
5125d3a

This release contains breaking changes.

CDC-to-Kafka 3.0.0 brings several dependency upgrades, performance improvements, and expanded SQL type support. It also improves the flexibility and schema management of "unified topics", which can contain change data messages from several different SQL tables produced in a transactionally-consistent order.

Changes:

  • Upgrades the MS ODBC driver used in the Docker image. (Breaking: if you are using a Docker image built from this repo's Dockerfile, your DB connection strings will need to change to use DRIVER=ODBC Driver 18 for SQL Server, and may also need to add TrustServerCertificate=yes;.)
  • Adds support for SQL data types money, smallmoney, datetimeoffset, smalldatetime, xml, rowversion, float, and real (hopefully addressing #17).
  • Breaking for users of unified topics: Previously, unified-topic messages were wrapped in a top-level object with fields __source_table and __change_data, the latter of which was encoded with a single Avro schema that was a union type of all the tracked tables' schemas. With this release, the top-level wrapping is dropped and messages produced to unified topics are now Avro-encoded with multiple schemas, corresponding to the same per-table schemas that are used for messages in the single-table topics. This change greatly improves performance when unified topics are used, since additional re-serializations of the same change datum are no longer needed. Advances in schema management tooling (e.g. support for new subject naming strategies in the Confluent schema registry) made this a more attractive option. Breaking aspects:
    • The schemas of messages in any unified topics will change, and may now vary from message to message. Ensure consumers are prepared for this before switching.
    • Configuration params for unified topics have changed. UNIFIED_TOPICS_PARTITION_COUNT and UNIFIED_TOPICS_EXTRA_CONFIG have been dropped as top-level config parameters; instead, these options can now be specified for each unified topic separately within the expanded JSON object expected by parameter UNIFIED_TOPICS (see the help string in cdc_kafka/options.py for details).
    • With the removal of the top-level wrapping and its __source_table field, consumers will now need to rely on knowledge of the Avro schema to determine what SQL table a given message corresponds to. The Avro schema name for message values produced by this tool follows format <source_table_schema_name>_<source_table_name>_cdc__value; consumers may need to be prepared to parse this.
  • ~30% maximum throughput increase (and more for those who also produce to unified topics!)
  • PyPI package dependencies upgraded
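Since unified-topic consumers must now recover the source table from the Avro schema name, a minimal parser for the documented `<source_table_schema_name>_<source_table_name>_cdc__value` format might look like the sketch below. Note an assumption: it splits on the first underscore, which is ambiguous if the SQL schema name itself contains underscores, so treat it as a starting point only:

```python
def source_table_from_avro_name(name: str) -> tuple[str, str]:
    """Recover (sql_schema, sql_table) from an Avro value-schema name
    of the form <schema>_<table>_cdc__value.

    Caveat: splitting on the first underscore is ambiguous when the SQL
    schema name itself contains underscores.
    """
    suffix = '_cdc__value'
    if not name.endswith(suffix):
        raise ValueError(f'unexpected schema name: {name}')
    base = name[:-len(suffix)]
    sql_schema, _, sql_table = base.partition('_')
    return sql_schema, sql_table
```

A consumer would call this on the writer schema's name for each message to route or dispatch per-table handling.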

CDC-to-Kafka 2.2.2

29 Sep 21:08
9b0ce2a
  • Fixes a bug whereby columns deleted from the base table but still present on the capture instance would cause snapshot SQL queries to incorrectly refer to the no-longer-extant columns
  • Upgrades some external dependencies
  • Improves messaging when running in validation mode
  • Style fixes

CDC-to-Kafka 2.2.1

11 Nov 23:11

Bugfix: Prevent errors when field truncation is configured for a nullable string field

CDC-to-Kafka 2.2.0

27 Oct 19:31

This release adds automatic creation of unified-messages topics, with strong encouragement to keep them as single-partition topics so that in-order consumption is simplified.

CDC-to-Kafka 2.1.2

08 Sep 18:15

Tries to better handle exceptions like:

UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 392-393: illegal UTF-16 surrogate

...when the process encounters data in SQL Server that is not properly UTF-16-encoded.
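One tolerant way to handle such data is to fall back to a replacement-character decode when strict UTF-16-LE decoding fails; this sketch shows the general approach, though the tool's actual handling may differ:

```python
def decode_sql_nvarchar(raw: bytes) -> str:
    """Decode UTF-16-LE bytes from SQL Server, tolerating invalid data.

    SQL Server does not validate that NVARCHAR data is well-formed UTF-16,
    so unpaired surrogates can appear; replacing them with U+FFFD keeps
    the rest of the value intact instead of failing the whole decode.
    """
    try:
        return raw.decode('utf-16-le')
    except UnicodeDecodeError:
        return raw.decode('utf-16-le', errors='replace')
```

The trade-off is silent data alteration (each invalid sequence becomes U+FFFD), which is usually preferable to crashing the change-capture process on one bad row.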