Releases: woodlee/sqlserver-cdc-to-kafka
CDC-to-Kafka 3.3.1
BUGFIX: Fixes schema and table name quoting in some SQL queries, addressing a bug that arose for tables whose name was also a SQL keyword.
CDC-to-Kafka 3.3.0
Feature additions:
- Adds the `ALWAYS_USE_AVRO_LONGS` option, which maps all SQL integer types to Avro `long` fields, easing future column type upgrades (see the sketch after this list).
- Adds the `replayer.py` script as a demonstration of using the topics produced by this tool to create a copy of a table in another SQL Server database.
- Snapshot prevention! This release adds logic to detect when a new full-table snapshot is not necessary, avoiding the associated flood of produced messages for certain kinds of schema changes. One example: the addition (and/or newly enabled CDC tracking) of a nullable column on a table, as long as that column still contains only null values.
- LSN gaps when upgrading to a new capture instance no longer require a new snapshot, as long as no new change rows were published to the prior capture instance since the most recent messages that were produced to Kafka.
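As a rough sketch of what the `ALWAYS_USE_AVRO_LONGS` option means for emitted Avro schemas (illustration only, using a hypothetical helper; this is not the project's actual type-mapping code):

```python
# Sketch only: illustrates the effect described above, not the project's real code.
# With ALWAYS_USE_AVRO_LONGS enabled, every SQL integer type is emitted as an Avro
# "long", so later widening a column (e.g. from int to bigint) does not force a
# change to the Avro field type.
def avro_int_type(sql_type: str, always_use_avro_longs: bool) -> str:
    if always_use_avro_longs:
        return "long"
    # Without the option, only bigint needs the 64-bit Avro type (assumed default mapping).
    return "long" if sql_type == "bigint" else "int"

for t in ("tinyint", "smallint", "int", "bigint"):
    print(t, avro_int_type(t, False), avro_int_type(t, True))
```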
Other:
- Exceptions raised in the `HttpPostReporter` metrics reporter are caught and logged as warnings, to reduce clutter in error-tracking tools like Sentry.
- Package upgrades, removal of some unused code, logging improvements, and refactoring to reduce the size of `main.py`.
CDC-to-Kafka 3.2.0
- Reverts to using Python v3.8 in the Dockerfile, due to process-hang issues associated with changes that were released in Python v3.9
- Implements limited in-process retrying of timed out SQL queries, instead of always relying on process crash and supervisor restart mechanics
- Makes the row batch size for DB queries configurable (but still defaults to the original 2,000)
- Logging improvements and cleanups
- Fixes a bug that could cause progress heartbeat messages to be emitted with an LSN lower than that of the last previously published progress message for a table
- Moves HTTP-based metrics reporting to a separate thread to reduce latency impact on the main process
- Introduces `progress_reset_tool.py`, which can be used to delete progress entries for specific topic(s), e.g. to trigger re-taking a snapshot without needing to create a new capture instance in the DB
- Improves logic in both the progress-topic and regular-topic validator tools
- Allows use of pseudo-failover in cases where connections to the primary server time out entirely
CDC-to-Kafka 3.1.1
Fixes a bug from the prior release that caused snapshot completion recognition to fail when low values in the table PK had been deleted since the last snapshot.
CDC-to-Kafka 3.1.0
Changes in this version
- A new execution option, `REPORT_PROGRESS_ONLY`, was added. If set, the process starts, prints a table of its current progress against followed tables, and exits without making any changes.
- Tracking of snapshot completion was improved. Previously, tables with a PK that was not monotonically increasing (such as a GUID) would, at process start, begin a new snapshot to pick up any rows with PK values lower than the lowest value seen by a prior invocation's snapshot. Since such rows also appear as inserts in the change data events, this represented unnecessary duplication and caused confusion. In addition, a table that contained no rows when it was first tracked could be snapshotted the next time the process restarted, if rows had been added in the interim; again this was unnecessary, since the added rows would be present in the change data events. Both issues have been fixed.
- Cleaned up confusing and overly verbose logging that would be printed when something changed in the tracked capture instances on the SQL Server side.
- Fixed a bug that could cause the process to incorrectly believe that there was a coverage gap in the LSNs between old and new capture instances for a given table, particularly for low-change-volume tables.
- Upgraded package dependencies and the Python version used for the Docker image.
CDC-to-Kafka 3.0.0
This release contains breaking changes.
CDC-to-Kafka 3.0.0 brings several dependency upgrades, performance improvements, and expanded SQL type support. It also improves the flexibility and schema management of "unified topics", which can contain change data messages from several different SQL tables produced in a transactionally-consistent order.
Changes:
- Upgrades the MS ODBC driver used in the Docker image. Breaking: if you are using a Docker image built from this repo's `Dockerfile`, your DB connection strings will need to change to use `DRIVER=ODBC Driver 18 for SQL Server`, and may also need to add `TrustServerCertificate=yes;` (see the connection-string sketch after this list).
- Adds support for SQL data types `money`, `smallmoney`, `datetimeoffset`, `smalldatetime`, `xml`, `rowversion`, `float`, and `real` (hopefully addressing #17).
- Breaking for users of unified topics: Previously, unified-topic messages were wrapped in a top-level object with fields `__source_table` and `__change_data`, the latter of which was encoded with a single Avro schema that was a union type of all the tracked tables' schemas. With this release, the top-level wrapping is dropped, and messages produced to unified topics are now Avro-encoded with multiple schemas, corresponding to the same per-table schemas that are used for messages in the single-table topics. This change greatly improves performance when unified topics are used, since additional re-serializations of the same change datum are no longer needed. Advances in schema management tooling (e.g. support for new subject naming strategies in the Confluent schema registry) made this a more attractive option. Breaking aspects:
  - The schemas of messages in any unified topics will change, and may now vary from message to message. Ensure consumers are prepared for this before switching.
  - Configuration params for unified topics have changed. `UNIFIED_TOPICS_PARTITION_COUNT` and `UNIFIED_TOPICS_EXTRA_CONFIG` have been dropped as top-level config parameters; instead, these options can now be specified for each unified topic separately within the expanded JSON object expected by the `UNIFIED_TOPICS` parameter (see the help string in `cdc_kafka/options.py` for details).
  - With the removal of the top-level wrapping and its `__source_table` field, consumers will now need to rely on knowledge of the Avro schema to determine what SQL table a given message corresponds to. The Avro schema `name` for message values produced by this tool follows the format `<source_table_schema_name>_<source_table_name>_cdc__value`; consumers may need to be prepared to parse this (see the parsing sketch after this list).
- ~30% maximum throughput increase (and more for those who also produce to unified topics!)
- PyPI package dependencies upgraded
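For reference, a sketch of the connection-string change described in the first item above. The server, database, and credential values are placeholders, and the prior driver version shown is an assumption; Driver 18 enables encryption by default, which is why `TrustServerCertificate=yes;` may be needed when the server's certificate is not trusted by the client.

```python
# Placeholder values throughout; only the DRIVER and TrustServerCertificate parts
# reflect the change described above.
OLD_CONN_STRING = (  # assumed prior driver version
    "DRIVER=ODBC Driver 17 for SQL Server;"
    "SERVER=my-sql-host;DATABASE=my_db;UID=cdc_user;PWD=example"
)
NEW_CONN_STRING = (
    "DRIVER=ODBC Driver 18 for SQL Server;"
    "SERVER=my-sql-host;DATABASE=my_db;UID=cdc_user;PWD=example;"
    "TrustServerCertificate=yes;"
)
```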
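And a minimal sketch, for consumers of unified topics, of recovering the source table from the Avro value-schema name format noted above. The `parse_value_schema_name` helper is hypothetical, and splitting on the first underscore assumes the SQL schema name itself contains no underscores; real consumers may prefer to match schema names against their known set of tracked tables.

```python
def parse_value_schema_name(schema_name: str) -> tuple[str, str]:
    # Format per the notes above: <source_table_schema_name>_<source_table_name>_cdc__value
    if not schema_name.endswith("_cdc__value"):
        raise ValueError(f"unexpected schema name: {schema_name}")
    base = schema_name[: -len("_cdc__value")]
    # Ambiguous if the SQL schema name contains underscores; splitting on the
    # first underscore assumes it does not.
    source_schema, _, source_table = base.partition("_")
    return source_schema, source_table

print(parse_value_schema_name("dbo_Orders_cdc__value"))  # ('dbo', 'Orders')
```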
CDC-to-Kafka 2.2.2
- Fixes a bug whereby columns deleted from the base table but still present on the capture instance would cause snapshot SQL queries to incorrectly refer to the no-longer-extant columns
- Upgrades some external dependencies
- Improves messaging when running in validation mode
- Style fixes
CDC-to-Kafka 2.2.1
Bugfix: Prevent errors when field truncation is configured for a nullable string field
CDC-to-Kafka 2.2.0
This release adds automatic creation of unified-messages topics, with strong encouragement to keep them as single-partition topics so that in-order consumption is simplified.
CDC-to-Kafka 2.1.2
Tries to better handle exceptions like:
`UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 392-393: illegal UTF-16 surrogate`
...when the process encounters data in SQL Server that is not properly UTF-16-encoded.
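For context, a small illustration of how such data produces this error. It reproduces the failure mode only; the `errors="replace"` fallback shown is just one possible mitigation, not necessarily what this tool does.

```python
# An unpaired UTF-16 surrogate (0xD800, little-endian bytes 00 D8) embedded in
# otherwise valid UTF-16-LE data cannot be decoded strictly.
bad = "ok".encode("utf-16-le") + b"\x00\xd8" + "!".encode("utf-16-le")
try:
    bad.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print(exc)  # ... illegal UTF-16 surrogate
# A lossy fallback replaces the offending bytes with U+FFFD:
print(bad.decode("utf-16-le", errors="replace"))
```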