⚡️ v2.8.2 Add batches to `verify pipes`, allow for multiple instances from WebAPI, and memory improvements.

@bmeares bmeares released this 17 Jan 04:23
· 1 commit to main since this release
0b0a9ab

v2.8.0 – v2.8.2

  • Add batches to Pipe.verify().
    Verification syncs now run in sequential batches so that they may be interrupted and resumed. See Pipe.get_chunk_bounds_batches() for more information:

    from datetime import timedelta
    import meerschaum as mrsm
    
    pipe = mrsm.Pipe('demo', 'get_chunk_bounds', instance='sql:local')
    bounds = pipe.get_chunk_bounds(
        chunk_interval=timedelta(hours=10),
        begin='2025-01-10',
        end='2025-01-15',
        bounded=True,
    )
    batches = pipe.get_chunk_bounds_batches(bounds, workers=4)
    mrsm.pprint(
        [
            tuple(
                (str(chunk_begin), str(chunk_end))
                for chunk_begin, chunk_end in batch
            )
            for batch in batches
        ]
    ) 
    # [
    #     (
    #         ('2025-01-10 00:00:00+00:00', '2025-01-10 10:00:00+00:00'),
    #         ('2025-01-10 10:00:00+00:00', '2025-01-10 20:00:00+00:00'),
    #         ('2025-01-10 20:00:00+00:00', '2025-01-11 06:00:00+00:00'),
    #         ('2025-01-11 06:00:00+00:00', '2025-01-11 16:00:00+00:00')
    #     ),
    #     (
    #         ('2025-01-11 16:00:00+00:00', '2025-01-12 02:00:00+00:00'),
    #         ('2025-01-12 02:00:00+00:00', '2025-01-12 12:00:00+00:00'),
    #         ('2025-01-12 12:00:00+00:00', '2025-01-12 22:00:00+00:00'),
    #         ('2025-01-12 22:00:00+00:00', '2025-01-13 08:00:00+00:00')
    #     ),
    #     (
    #         ('2025-01-13 08:00:00+00:00', '2025-01-13 18:00:00+00:00'),
    #         ('2025-01-13 18:00:00+00:00', '2025-01-14 04:00:00+00:00'),
    #         ('2025-01-14 04:00:00+00:00', '2025-01-14 14:00:00+00:00'),
    #         ('2025-01-14 14:00:00+00:00', '2025-01-15 00:00:00+00:00')
    #     )
    # ]
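
The batching shown above can be approximated in plain Python. This is a minimal sketch, not Meerschaum's implementation: the helpers `make_chunk_bounds()` and `batch_bounds()` are hypothetical names, and it assumes each batch simply holds `workers` consecutive chunks, which matches the sample output but may differ from what `Pipe.get_chunk_bounds_batches()` actually does.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Bounds = Tuple[datetime, datetime]


def make_chunk_bounds(
    begin: datetime,
    end: datetime,
    chunk_interval: timedelta,
) -> List[Bounds]:
    """Hypothetical helper: split [begin, end) into consecutive bounded chunks."""
    bounds = []
    cursor = begin
    while cursor < end:
        upper = min(cursor + chunk_interval, end)
        bounds.append((cursor, upper))
        cursor = upper
    return bounds


def batch_bounds(bounds: List[Bounds], batch_size: int) -> List[Tuple[Bounds, ...]]:
    """Hypothetical helper: group consecutive chunks into batches of `batch_size`."""
    return [
        tuple(bounds[i:i + batch_size])
        for i in range(0, len(bounds), batch_size)
    ]


begin = datetime(2025, 1, 10, tzinfo=timezone.utc)
end = datetime(2025, 1, 15, tzinfo=timezone.utc)
chunk_bounds = make_chunk_bounds(begin, end, timedelta(hours=10))
batches = batch_bounds(chunk_bounds, batch_size=4)

# Each batch can be processed in turn; a checkpoint between batches is
# what would make an interrupted run resumable.
for batch in batches:
    print(len(batch), str(batch[0][0]), '->', str(batch[-1][1]))
```

Under these assumptions, five days split into 10-hour chunks gives 12 chunks, grouped into three batches of four.
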
  • Add --skip-chunks-with-greater-rowcounts to verify pipes.
    The flag --skip-chunks-with-greater-rowcounts compares each chunk's rowcount with the rowcount of the same chunk in the remote table and skips the chunk if its rowcount is greater than or equal to the remote count. This is only applicable for connectors which implement remote=True support for get_sync_time().
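
The skip rule described above boils down to a single comparison. The following is a rough sketch under stated assumptions: `should_skip_chunk()` is a hypothetical name, and treating an unknown rowcount (`None`) as "do not skip" is an assumption rather than documented behavior.

```python
from typing import Optional


def should_skip_chunk(
    chunk_rowcount: Optional[int],
    remote_rowcount: Optional[int],
) -> bool:
    """Hypothetical helper: decide whether a chunk may be skipped during verification."""
    # Assumption: if either count is unavailable, fall back to verifying the chunk.
    if chunk_rowcount is None or remote_rowcount is None:
        return False
    # Skip when the local chunk already holds at least as many rows as the remote.
    return chunk_rowcount >= remote_rowcount


print(should_skip_chunk(1000, 1000))  # True
print(should_skip_chunk(950, 1000))   # False
```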

  • Add verify rowcounts.
    The action verify rowcounts (same as passing --check-rowcounts-only to verify pipes) compares the rowcounts of a pipe's chunks against the remote rowcounts. This is only applicable for connectors which implement get_pipe_rowcount() with support for remote=True.

  • Add remote to pipe.get_sync_time().
    For connectors which support it (i.e. the SQLConnector), the option remote is intended to return the sync time of a pipe's fetch definition, like the option remote in Pipe.get_rowcount().

  • Allow for the Web API to serve pipes from multiple instances.
    You can disable this behavior by setting system:api:permissions:instances:allow_multiple_instances to false. You may also restrict which instances may be accessed through the Web API by setting the list system:api:permissions:instances:allowed_instance_keys (defaults to ["*"]).
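
To make the two settings concrete, here is a sketch of how such a permission check could work. It is illustrative only: `is_instance_allowed()` is a hypothetical function, the `config` dictionary merely mirrors the key path named above, and the semantics (that `"*"` allows every instance, and that disabling multiple instances leaves only the default instance reachable) are assumptions, not confirmed behavior.

```python
# Hypothetical configuration fragment mirroring the documented key path.
config = {
    'system': {
        'api': {
            'permissions': {
                'instances': {
                    'allow_multiple_instances': True,
                    'allowed_instance_keys': ['sql:main', 'sql:local'],
                },
            },
        },
    },
}


def is_instance_allowed(instance_keys: str, default_keys: str, config: dict) -> bool:
    """Hypothetical check: may the API serve pipes from `instance_keys`?"""
    perms = config['system']['api']['permissions']['instances']
    # Assumption: with multiple instances disabled, only the default is served.
    if not perms.get('allow_multiple_instances', True):
        return instance_keys == default_keys
    allowed = perms.get('allowed_instance_keys', ['*'])
    # Assumption: the "*" entry acts as a wildcard for all instances.
    return '*' in allowed or instance_keys in allowed


print(is_instance_allowed('sql:local', 'sql:main', config))   # True
print(is_instance_allowed('sql:secret', 'sql:main', config))  # False
```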

  • Fix memory leak for retrying failed chunks.
    Failed chunks were previously kept in memory and retried later. In resource-intensive syncs with large chunks and a high failure rate, large objects were never freed and memory usage kept growing. This has been fixed.

  • Add negation to job actions.
    Prefix a job name with an underscore to select all other jobs. This is useful for filtering out noisy jobs in show logs.
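
The negation rule can be pictured as a simple filter over job names. This is a sketch with made-up job names; `select_jobs()` is a hypothetical helper, and the handling of mixed plain and negated names is an assumption.

```python
from typing import List


def select_jobs(all_job_names: List[str], requested_names: List[str]) -> List[str]:
    """Hypothetical helper: resolve job names where a leading underscore means 'all but this one'."""
    excluded = {name[1:] for name in requested_names if name.startswith('_')}
    included = [name for name in requested_names if not name.startswith('_')]
    # Assumption: with no plain names given, start from every known job.
    candidates = included if included else all_job_names
    return [name for name in candidates if name not in excluded]


jobs = ['etl', 'noisy-poller', 'api']
print(select_jobs(jobs, ['_noisy-poller']))  # ['etl', 'api']
```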

  • Add Pipe.parent.
    As a quality-of-life improvement, the attribute Pipe.parent will return the first member of Pipe.parents (if available).
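
In other words, the attribute behaves like a guarded `parents[0]`. The stand-in class below is purely illustrative: `DemoPipe` is hypothetical and is not Meerschaum's `Pipe`, and returning `None` when there are no parents is an assumption about what "if available" means.

```python
from typing import List, Optional


class DemoPipe:
    """Hypothetical stand-in used only to illustrate the `parent` shortcut."""

    def __init__(self, name: str, parents: Optional[List['DemoPipe']] = None):
        self.name = name
        self.parents = parents or []

    @property
    def parent(self) -> Optional['DemoPipe']:
        # Assumption: no parents means `parent` is None rather than an error.
        return self.parents[0] if self.parents else None


raw = DemoPipe('raw')
clean = DemoPipe('clean', parents=[raw])
print(clean.parent.name)  # raw
print(raw.parent)         # None
```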

  • Use the current instance for new tabs in the Webterm.
    Clicking "New Tab" will open a new tmux window using the currently selected instance on the Web Console.

  • Other Webterm quality-of-life improvements.
    Added a size toggle button that lets the Webterm fill the entire page.

  • Additional refactoring work.
    The API endpoints code has been cleaned up.

  • Added system configurations.
    New options have been added to the system configuration, such as max_response_row_limit, allow_multiple_instances, and allowed_instance_keys.