Releases: pathwaycom/pathway

v0.8.0

01 Feb 14:51

Added

  • pw.io.http.rest_connector now supports multiple HTTP request types.
  • pw.io.http.PathwayWebserver now allows Cross-Origin Resource Sharing (CORS) to be enabled on newly added endpoints.
  • Wrappers for LiteLLM and HuggingFace chat services and SentenceTransformers embedding service are now added to Pathway xpack for LLMs.

Changed

  • pw.run now includes an additional parameter runtime_typechecking that enables strict type checking at runtime.
  • Embedders in pathway.xpacks.llm.embedders now correctly process empty strings as queries.
  • BREAKING: pw.run and pw.run_all now only accept keyword arguments.
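
The keyword-only restriction can be illustrated in plain Python. The sketch below uses a hypothetical stand-in function, run_sketch, rather than Pathway itself. Only the parameter name runtime_typechecking comes from the notes above; debug is an illustrative placeholder.

```python
def run_sketch(*, debug=False, runtime_typechecking=False):
    """Hypothetical stand-in for a keyword-only entry point (not Pathway's pw.run)."""
    return {"debug": debug, "runtime_typechecking": runtime_typechecking}


# Keyword arguments are accepted.
config = run_sketch(runtime_typechecking=True)

# Positional arguments are rejected with a TypeError.
try:
    run_sketch(True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Code that previously passed arguments positionally would need to name each argument explicitly after this change.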

Fixed

  • pw.Duration can now be returned from User-Defined Functions (UDFs) or used as a constant value without resulting in errors.
  • pw.io.debezium.read now correctly handles tables that do not have a primary key.

v0.7.10

26 Jan 16:09

Added

  • pw.io.http.rest_connector can now generate an OpenAPI 3.0.3 schema, which is returned by the route /_schema.
  • Wrappers for OpenAI Chat and Embedding services are now added to Pathway xpack for LLMs.
  • A vector indexing pipeline that allows querying for the most similar documents is now available as the VectorStore class in the Pathway xpack for LLMs.

Fixed

  • pw.debug.table_from_markdown now uses schema parameter (when set) to properly assign simple types (int, bool, float, str, bytes) and optional simple types to columns.

v0.7.9

18 Jan 13:40

Changed

  • pw.io.http.rest_connector now also accepts port as a string for backwards compatibility.

v0.7.8

18 Jan 11:24

Added

  • Support for comparisons of tuples has been added.
  • Standalone versions of methods such as pw.groupby, pw.join, pw.join_inner, pw.join_left, pw.join_right, and pw.join_outer are now available.
  • The abs function from Python can now be used on Pathway expressions.
  • The asof_join method now has configurable temporal behavior. The behavior parameter can be used to pass the configuration.
  • The state of the deduplicate operator can now be persisted.

Changed

  • interval_join can now work with intervals of zero length.
  • The pw.io.http.rest_connector can now open multiple endpoints on the same port using a new pw.io.http.PathwayWebserver class.
  • The pw.xpacks.connectors.sharepoint.read and pw.io.gdrive.read methods now support a size limit for a single object. If set, files exceeding the limit are skipped and not read.

v0.7.7

27 Dec 14:18

Added

  • pathway.xpacks.llm.splitter.TokenCountSplitter.

v0.7.6

22 Dec 13:44

New Features

Conversion Methods in pw.Json

  • Introducing new methods for strict conversion of pw.Json to desired types within a UDF body:
    • as_int()
    • as_float()
    • as_str()
    • as_bool()
    • as_list()
    • as_dict()
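
What "strict" conversion means can be sketched in plain Python. The StrictJson class below is a hypothetical wrapper, not Pathway's pw.Json. It assumes that a strict conversion raises an error on a type mismatch instead of coercing the value, and that integers are accepted where a float is requested. Both points are assumptions, not confirmed behavior.

```python
import json


class StrictJson:
    """Hypothetical wrapper illustrating strict conversion; not Pathway's pw.Json."""

    def __init__(self, value):
        self.value = value

    def _expect(self, *types):
        # bool is a subclass of int in Python, so reject it unless asked for.
        if isinstance(self.value, bool) and bool not in types:
            raise ValueError(f"expected {types}, got bool")
        if not isinstance(self.value, types):
            raise ValueError(f"expected {types}, got {type(self.value).__name__}")
        return self.value

    def as_int(self):
        return self._expect(int)

    def as_float(self):
        # Assumption: integers are acceptable where a float is requested.
        return float(self._expect(int, float))

    def as_str(self):
        return self._expect(str)

    def as_bool(self):
        return self._expect(bool)

    def as_list(self):
        return self._expect(list)

    def as_dict(self):
        return self._expect(dict)


doc = json.loads('{"count": 3, "name": "a"}')
count = StrictJson(doc["count"]).as_int()

# A string that merely looks like a number is not silently coerced.
try:
    StrictJson("3").as_int()
    coerced = True
except ValueError:
    coerced = False
```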

DateTime Functionality

  • Added table.col.dt.utc_from_timestamp method: Creates DateTimeUtc from timestamps represented as ints or floats.
  • Enhanced the table.col.dt.timestamp method with a new unit argument to specify the unit of the returned timestamp.
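
The role of the unit argument can be shown with the standard library alone. This is a minimal sketch, not Pathway's implementation: the function names are hypothetical, and the unit strings "s", "ms", "us", and "ns" are assumed for illustration.

```python
from datetime import datetime, timezone

# Number of the given unit per second (unit names are an assumption).
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def utc_from_timestamp_sketch(value, unit="s"):
    """Build a timezone-aware UTC datetime from an int or float timestamp."""
    return datetime.fromtimestamp(value / UNITS_PER_SECOND[unit], tz=timezone.utc)


def timestamp_sketch(moment, unit="s"):
    """Return the timestamp of an aware datetime in the requested unit."""
    return moment.timestamp() * UNITS_PER_SECOND[unit]


from_seconds = utc_from_timestamp_sketch(1_700_000_000)
from_millis = utc_from_timestamp_sketch(1_700_000_000_000, unit="ms")
```

The same instant is produced from either representation, which is why stating the unit explicitly removes ambiguity about what a bare number means.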

Experimental Features

  • Introduced an experimental xpack with a Microsoft SharePoint input connector.

Enhancements

Improved JSON Handling

  • Index operator ([]) can now be directly applied to pw.Json within UDFs to access elements of JSON objects, arrays, and strings.

Expanded Timestamp Functionality

  • Enhanced the table.col.dt.from_timestamp method to create DateTimeNaive from timestamps represented as ints or floats.
  • Deprecated not specifying the unit argument of the table.col.dt.timestamp method.

KNNIndex Enhancements

  • KNNIndex now supports returning computed distances.
  • Added support for cosine similarity in KNNIndex.
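
As a rough illustration of nearest-neighbor search with a cosine metric and returned distances, here is a brute-force sketch in plain Python. It does not use KNNIndex, and the function names are hypothetical. It also assumes that the distance is defined as 1 minus the cosine similarity, which may differ from what KNNIndex reports.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, items, k):
    """Return the k closest (key, distance) pairs, closest first."""
    scored = sorted((cosine_distance(query, vec), key) for key, vec in items.items())
    return [(key, dist) for dist, key in scored[:k]]


items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
result = nearest([1.0, 0.0], items, k=2)
```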

Deprecated Features

  • The offset argument of pw.stdlib.temporal.sliding and pw.stdlib.temporal.tumbling is deprecated. Use origin instead, as it represents a point in time, not a duration.
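
The difference between a point in time and a duration can be made concrete with a small sketch of tumbling-window assignment. The function below is hypothetical and written with the standard library only. It shows one plausible way window boundaries could be aligned to an origin, and it does not cover how Pathway treats data earlier than the origin.

```python
from datetime import datetime, timedelta


def tumbling_window(moment, duration, origin):
    """Return the [start, end) window containing `moment`, aligned to `origin`."""
    # Floor division of two timedeltas yields an integer window index.
    index = (moment - origin) // duration
    start = origin + index * duration
    return start, start + duration


origin = datetime(2023, 1, 1, 0, 5)
window = tumbling_window(
    datetime(2023, 1, 1, 0, 17),
    duration=timedelta(minutes=10),
    origin=origin,
)
```

Here origin is a datetime, so the windows start at 00:05, 00:15, 00:25, and so on.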

Bug Fixes

DateTime Fixes

  • Sliding window now works correctly with UTC Datetimes.

asof_join Improvements

  • Temporal column in asof_join no longer has to be named t.
  • asof_join includes rows with equal times for all values of the direction parameter.
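
The equal-times rule can be sketched as a lookup over a sorted list of times. This is not Pathway's asof_join. The function name and the direction strings "backward", "forward", and "nearest" are assumptions made for illustration.

```python
import bisect


def asof_lookup(times, t, direction="backward"):
    """Return the entry of sorted `times` matched to `t`, or None if there is none.

    Entries equal to `t` match for every direction.
    """
    if direction == "backward":
        i = bisect.bisect_right(times, t)  # entries <= t lie left of i
        return times[i - 1] if i > 0 else None
    if direction == "forward":
        i = bisect.bisect_left(times, t)  # entries >= t start at i
        return times[i] if i < len(times) else None
    if direction == "nearest":
        before = asof_lookup(times, t, "backward")
        after = asof_lookup(times, t, "forward")
        candidates = [c for c in (before, after) if c is not None]
        return min(candidates, key=lambda c: abs(c - t)) if candidates else None
    raise ValueError(f"unknown direction: {direction}")


times = [1, 4, 7]
equal_matches = [asof_lookup(times, 4, d) for d in ("backward", "forward", "nearest")]
```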

Fixed Issues

  • Fixed an issue with pw.io.gdrive.read: shared folders are now supported.

v0.7.5

15 Dec 22:25

Added

  • Added the Table.split() method, which splits a table into two tables based on an expression.
  • Columns with datatype duration can now be multiplied and divided by floats.
  • Columns with datatype duration now support both true and floor division (/ and //) by integers.
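
Python's own timedelta supports the same kinds of operations, so it serves as a rough analogy for the duration arithmetic listed above. This sketch uses the standard library only. Pathway's rounding and resolution may differ from timedelta's microsecond resolution.

```python
from datetime import timedelta

hour = timedelta(hours=1)

scaled = hour * 1.5                  # multiplication by a float
shrunk = hour / 2.5                  # division by a float
quartered = hour / 4                 # true division by an integer
floored = timedelta(seconds=7) // 2  # floor division by an integer
```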

Changed

  • Pathway is better at typing if_else expressions when optional types are involved.
  • table.flatten() operator now supports Json array.
  • Buffers (used to delay outputs, configured via delay in common_behavior) now flush the data when the computation is finished. The effect of this change can be seen when run in bounded (batch / multi-revision) mode.
  • pw.io.subscribe() takes an additional argument, on_time_end: a callback function called on each closed time of computation.
  • pw.io.subscribe() is now a single-worker operator, guaranteeing that on_end is triggered at most once.
  • KNNIndex now supports metadata filtering. Each query can specify its own filter in the JMESPath format.

Fixed

  • Resolved an optimization bug causing pw.iterate to malfunction when handling columns effectively pointing to the same data.

v0.7.4

05 Dec 23:10

Fixed

  • Fixed issues with standalone panel+Bokeh dashboards to ensure optimal functionality and performance.

v0.7.3

30 Nov 11:49

Added

  • A method weekday has been added to the dt namespace. It can be called on column expressions containing datetime data and returns an integer that represents the day of the week.
  • EXPERIMENTAL: Methods show and plot on Tables, providing visualizations of data using HoloViz Panel.
  • Added support for the instance parameter in the groupby, join, windowby, and temporal join methods.
  • pw.PersistenceMode.UDF_CACHING persistence mode enabling automatic caching of AsyncTransformer invocations.

Changed

  • Methods round and floor on columns with datetimes now accept the duration argument as a string.
  • pw.debug.compute_and_print and pw.debug.compute_and_print_update_stream have a new argument n_rows that limits the number of rows printed.
  • pw.debug.table_to_pandas has a new argument include_id (by default True). If set to False, creates a new index for the Pandas DataFrame, rather than using the keys of the Pathway Table.
  • The shard argument of the windowby function is now deprecated; use instance instead.
  • The special column name _pw_shard is now deprecated; use _pw_instance instead.
  • pw.ReplayMode now can be accessed as pw.PersistenceMode, while the SPEEDRUN and REALTIME variants are now accessible as SPEEDRUN_REPLAY and REALTIME_REPLAY.
  • EXPERIMENTAL: pw.io.gdrive.read has a new argument with_metadata (by default False). If set to True, adds a _metadata column containing file metadata to the resulting table.
  • Methods get_nearest_items and get_nearest_items_asof_now of KNNIndex now allow specifying k (the number of returned elements) separately in each query.

v0.7.2

24 Nov 12:43

Added

  • Added the ability to create custom reducers using the pw.reducers.udf_reducer decorator. Use pw.BaseCustomAccumulator as a base class
    for creating accumulators. Decorating an accumulator returns a reducer that follows its custom logic.
  • A function pw.debug.compute_and_print_update_stream that computes and prints the update stream of the table.
  • SQLite input connector (pw.io.sqlite).
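
The accumulator idea behind custom reducers can be sketched without Pathway. The class below is plain Python and does not subclass pw.BaseCustomAccumulator. The method names from_row, update, and compute_result, as well as the reduce_rows driver, are assumptions chosen for illustration, so check the Pathway documentation for the actual interface.

```python
class MeanAccumulator:
    """Plain-Python sketch of an accumulator that computes a mean."""

    def __init__(self, total, count):
        self.total = total
        self.count = count

    @classmethod
    def from_row(cls, row):
        # Build an accumulator from a single row of reducer arguments.
        [value] = row
        return cls(value, 1)

    def update(self, other):
        # Merge another accumulator into this one.
        self.total += other.total
        self.count += other.count

    def compute_result(self):
        return self.total / self.count


def reduce_rows(accumulator_cls, rows):
    """Hypothetical driver standing in for what a reducer would do per group."""
    rows = iter(rows)
    acc = accumulator_cls.from_row(next(rows))
    for row in rows:
        acc.update(accumulator_cls.from_row(row))
    return acc.compute_result()


mean = reduce_rows(MeanAccumulator, [[1.0], [2.0], [6.0]])
```

Keeping the state as a running total and count, rather than a list of values, lets two partial accumulators be merged in constant time.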

Changed

  • pw.debug.parse_to_table is now deprecated; use pw.debug.table_from_markdown instead.
  • pw.schema_from_csv now has quote and double_quote_escapes arguments.

Fixed

  • Schema returned from pw.schema_from_csv will have quotes removed from column names, so it will now work properly with pw.io.csv.read.
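
The effect of quote handling on column names can be illustrated with Python's csv module. The column_names function is hypothetical and is not pw.schema_from_csv. Mapping the quote and double_quote_escapes arguments onto the csv module's quotechar and doublequote options is an assumption made for this sketch.

```python
import csv
import io


def column_names(csv_text, quote='"', double_quote_escapes=True):
    """Read the header row and return column names with surrounding quotes removed."""
    reader = csv.reader(
        io.StringIO(csv_text),
        quotechar=quote,
        doublequote=double_quote_escapes,
    )
    return next(reader)


names = column_names('"id","full name"\n1,"Ada Lovelace"\n')
single_quoted = column_names("'id','a,b'\n1,'x'\n", quote="'")
```

With the quotes stripped, the names match what a CSV reader reports for the same file, which is what lets a schema and a reader agree on column names.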