Releases: pathwaycom/pathway
v0.8.0
Added
- pw.io.http.rest_connector now supports multiple HTTP request types.
- pw.io.http.PathwayWebserver now allows Cross-Origin Resource Sharing (CORS) to be enabled on newly added endpoints.
- Wrappers for LiteLLM and HuggingFace chat services and SentenceTransformers embedding service are now added to Pathway xpack for LLMs.
Changed
- pw.run now includes an additional parameter runtime_typechecking that enables strict type checking at runtime.
- Embedders in pathway.xpacks.llm.embedders now correctly process empty strings as queries.
- BREAKING: pw.run and pw.run_all now only accept keyword arguments.
Fixed
- pw.Duration can now be returned from User-Defined Functions (UDFs) or used as a constant value without resulting in errors.
- pw.io.debezium.read now correctly handles tables that do not have a primary key.
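The breaking change to pw.run and pw.run_all means positional calls now fail. A minimal plain-Python sketch of what "keyword arguments only" means for callers follows. The run function here is a hypothetical stand-in, not Pathway's implementation, and the verbose parameter is invented for illustration; only runtime_typechecking is named in these notes.

```python
def run(*, runtime_typechecking=False, verbose=False):
    """Hypothetical stand-in for an entry point that accepts keyword arguments only.

    The bare `*` in the signature makes every following parameter keyword-only.
    `verbose` is an invented parameter used purely for illustration.
    """
    return {"runtime_typechecking": runtime_typechecking, "verbose": verbose}


# Passing the argument by name works.
config = run(runtime_typechecking=True)

# Passing it positionally raises TypeError, which is how such a change
# surfaces in code written against the older calling style.
try:
    run(True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Code that previously passed arguments by position needs each argument named explicitly at the call site.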
v0.7.10
Added
- pw.io.http.rest_connector can now generate Open API 3.0.3 schema that will be returned by the route /_schema.
- Wrappers for OpenAI Chat and Embedding services are now added to Pathway xpack for LLMs.
- A vector indexing pipeline that allows querying for the most similar documents. It is available as class VectorStore as part of Pathway xpack for LLMs.
Fixed
- pw.debug.table_from_markdown now uses schema parameter (when set) to properly assign simple types (int, bool, float, str, bytes) and optional simple types to columns.
v0.7.9
Changed
- pw.io.http.rest_connector now also accepts port as a string for backwards compatibility.
v0.7.8
Added
- Support for comparisons of tuples has been added.
- Standalone versions of methods such as pw.groupby, pw.join, pw.join_inner, pw.join_left, pw.join_right, and pw.join_outer are now available.
- The abs function from Python can now be used on Pathway expressions.
- The asof_join method now has configurable temporal behavior. The behavior parameter can be used to pass the configuration.
- The state of the deduplicate operator can now be persisted.
Changed
- interval_join can now work with intervals of zero length.
- The pw.io.http.rest_connector can now open multiple endpoints on the same port using a new pw.io.http.PathwayWebserver class.
- The pw.xpacks.connectors.sharepoint.read and pw.io.gdrive.read methods now support a size limit for a single object. If set, files larger than the limit are excluded and not read.
v0.7.7
Added
- pathway.xpacks.llm.splitter.TokenCountSplitter.
v0.7.6
New Features
Conversion Methods in pw.Json
- Introducing new methods for strict conversion of pw.Json to desired types within a UDF body: as_int(), as_float(), as_str(), as_bool(), as_list(), as_dict().
DateTime Functionality
- Added table.col.dt.utc_from_timestamp method: Creates DateTimeUtc from timestamps represented as ints or floats.
- Enhanced the table.col.dt.timestamp method with a new unit argument to specify the unit of the returned timestamp.
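To illustrate why an explicit unit matters when converting between numeric timestamps and datetimes, here is a plain-Python sketch using only the standard library. The helper names mirror the methods above but are hypothetical stand-ins, and the unit strings ("s", "ms", "us", "ns") are an assumption rather than something these notes confirm.

```python
from datetime import datetime, timezone

# Assumed unit names; how many of each unit make up one second.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def utc_from_timestamp(value, unit):
    """Hypothetical helper: build a UTC datetime from an int or float timestamp."""
    seconds = value / UNITS_PER_SECOND[unit]
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def timestamp(moment, unit):
    """Hypothetical helper: express a datetime as a number in the requested unit."""
    return moment.timestamp() * UNITS_PER_SECOND[unit]


# The same integer means very different instants depending on the unit,
# which is why leaving the unit implicit is error-prone.
moment = utc_from_timestamp(1_700_000_000_000, unit="ms")
as_seconds = timestamp(moment, unit="s")
as_millis = timestamp(moment, unit="ms")
```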
Experimental Features
- Introduced an experimental xpack with a Microsoft SharePoint input connector.
Enhancements
Improved JSON Handling
- Index operator ([]) can now be directly applied to pw.Json within UDFs to access elements of JSON objects, arrays, and strings.
Expanded Timestamp Functionality
- Enhanced the table.col.dt.from_timestamp method to create DateTimeNaive from timestamps represented as ints or floats.
- Deprecated not specifying the unit argument of the table.col.dt.timestamp method.
KNNIndex Enhancements
- KNNIndex now supports returning computed distances.
- Added support for cosine similarity in KNNIndex.
Deprecated Features
- The offset argument of pw.stdlib.temporal.sliding and pw.stdlib.temporal.tumbling is deprecated. Use origin instead, as it represents a point in time, not a duration.
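The distinction between a point in time and a duration can be sketched in plain Python. The function below is a hypothetical illustration of tumbling windows aligned to an origin, not Pathway's implementation; it assumes windows are half-open intervals [start, end) laid out back to back starting at origin.

```python
from datetime import datetime, timedelta


def tumbling_window(event_time, duration, origin):
    """Return the (start, end) of the window containing event_time.

    Assumption: windows are [origin + k * duration, origin + (k + 1) * duration).
    """
    # Floor-dividing one timedelta by another yields an integer window index.
    index = (event_time - origin) // duration
    start = origin + index * duration
    return start, start + duration


# origin is a datetime (a point in time), while duration is a timedelta.
origin = datetime(2024, 1, 1, 0, 5)
duration = timedelta(minutes=15)
window = tumbling_window(datetime(2024, 1, 1, 0, 42), duration, origin)
```

Here the event at 00:42 falls into the window from 00:35 to 00:50, because the windows are anchored at 00:05 rather than at midnight.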
Bug Fixes
DateTime Fixes
- Sliding window now works correctly with UTC Datetimes.
asof_join Improvements
- Temporal column in asof_join no longer has to be named t.
- asof_join includes rows with equal times for all values of the direction parameter.
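The equal-times rule can be sketched with a small standard-library matcher. This is an illustration of as-of matching in general, not Pathway's code; the direction names "backward", "forward", and "nearest" are assumptions, as is the tie-breaking choice for "nearest".

```python
from bisect import bisect_left, bisect_right


def asof_match(times, t, direction="backward"):
    """Pick the entry of the sorted list `times` that matches time t.

    A row whose time equals t counts as a match in every direction.
    Returns None when no candidate exists.
    """
    if direction == "backward":
        i = bisect_right(times, t)  # index of the first entry strictly greater than t
        return times[i - 1] if i > 0 else None
    if direction == "forward":
        i = bisect_left(times, t)  # index of the first entry greater than or equal to t
        return times[i] if i < len(times) else None
    if direction == "nearest":
        candidates = [
            c
            for c in (asof_match(times, t, "backward"), asof_match(times, t, "forward"))
            if c is not None
        ]
        # On a tie, min() keeps the first candidate, i.e. the backward one.
        # That tie-break is an arbitrary choice made for this sketch.
        return min(candidates, key=lambda c: abs(c - t)) if candidates else None
    raise ValueError(f"unknown direction: {direction}")


times = [1, 4, 7]
equal_matches = [asof_match(times, 4, d) for d in ("backward", "forward", "nearest")]
between_matches = [asof_match(times, 5, d) for d in ("backward", "forward", "nearest")]
```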
Fixed Issues
- Fixed an issue with pw.io.gdrive.read: Shared folders support is now working seamlessly.
v0.7.5
Added
- Added Table.split() method for splitting table based on an expression into two tables.
- Columns with datatype duration can now be multiplied and divided by floats.
- Columns with datatype duration now support both true and floor division (/ and //) by integers.
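Python's own datetime.timedelta supports the same set of operations, so it serves as a rough analogy for the duration arithmetic listed above. This sketch uses only the standard library; timedelta has microsecond resolution, and Pathway's duration type may round differently.

```python
from datetime import timedelta

duration = timedelta(seconds=11)

# Multiplication and division by floats.
scaled_up = duration * 1.5    # 16.5 seconds
scaled_down = duration / 2.5  # 4.4 seconds

# True division by an integer rounds to the nearest microsecond,
# while floor division rounds down to a whole microsecond.
true_div = duration / 3    # 3.666667 seconds
floor_div = duration // 3  # 3.666666 seconds
```

The two integer divisions differ by one microsecond here because 11 seconds is not evenly divisible by 3 at this resolution.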
Changed
- Pathway is better at typing if_else expressions when optional types are involved.
- table.flatten() operator now supports Json array.
- Buffers (used to delay outputs, configured via delay in common_behavior) now flush the data when the computation is finished. The effect of this change can be seen when run in bounded (batch / multi-revision) mode.
- pw.io.subscribe() takes an additional argument, on_time_end: the callback function to be called on each closed time of computation.
- pw.io.subscribe() is now a single-worker operator, guaranteeing that on_end is triggered at most once.
- KNNIndex now supports metadata filtering. Each query can specify its own filter in the JMESPath format.
Fixed
- Resolved an optimization bug causing pw.iterate to malfunction when handling columns effectively pointing to the same data.
v0.7.4
Fixed
- Fixed issues with standalone panel+Bokeh dashboards to ensure optimal functionality and performance.
v0.7.3
Added
- A method weekday has been added to the dt namespace, that can be called on column expressions containing datetime data. This method returns an integer that represents the day of the week.
- EXPERIMENTAL: Methods show and plot on Tables, providing visualizations of data using HoloViz Panel.
- Added support for instance parameter to groupby, join, windowby and temporal join methods.
- pw.PersistenceMode.UDF_CACHING persistence mode enabling automatic caching of AsyncTransformer invocations.
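These notes do not say which integer the weekday method assigns to which day. As a point of comparison, the sketch below shows the convention of Python's standard datetime.weekday(), where Monday is 0 and Sunday is 6. That the dt namespace follows the same mapping is an assumption and should be checked against the Pathway documentation.

```python
from datetime import datetime

# Python's datetime.weekday() convention: Monday is 0, Sunday is 6.
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

moment = datetime(2023, 11, 14, 22, 13, 20)
day_index = moment.weekday()
day_name = DAY_NAMES[day_index]
```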
Changed
- Methods round and floor on columns with datetimes now accept the duration argument as a string.
- pw.debug.compute_and_print and pw.debug.compute_and_print_update_stream have a new argument n_rows that limits the number of rows printed.
- pw.debug.table_to_pandas has a new argument include_id (by default True). If set to False, creates a new index for the Pandas DataFrame, rather than using the keys of the Pathway Table.
- The shard argument of the windowby function is now deprecated, and instance should be used.
- Special column name _pw_shard is now deprecated, and _pw_instance should be used.
- pw.ReplayMode can now be accessed as pw.PersistenceMode, while the SPEEDRUN and REALTIME variants are now accessible as SPEEDRUN_REPLAY and REALTIME_REPLAY.
- EXPERIMENTAL: pw.io.gdrive.read has a new argument with_metadata (by default False). If set to True, adds a _metadata column containing file metadata to the resulting table.
- Methods get_nearest_items and get_nearest_items_asof_now of KNNIndex allow k (number of returned elements) to be specified separately in each query.
v0.7.2
Added
- Added ability of creating custom reducers using pw.reducers.udf_reducer decorator. Use pw.BaseCustomAccumulator as a base class for creating accumulators. Decorating accumulator returns reducer following custom logic.
- A function pw.debug.compute_and_print_update_stream that computes and prints the update stream of the table.
- SQLite input connector (pw.io.sqlite).
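The accumulator pattern behind custom reducers can be sketched in plain Python. Everything below is an illustration that does not import Pathway: the method names from_row, update, and compute_result are assumptions about the accumulator interface that these notes do not spell out, and reduce_rows is a hypothetical driver standing in for what a reducer would do.

```python
class AvgAccumulator:
    """Plain-Python stand-in for an accumulator computing an average.

    In Pathway, such a class would presumably subclass pw.BaseCustomAccumulator.
    """

    def __init__(self, total, count):
        self.total = total
        self.count = count

    @classmethod
    def from_row(cls, row):
        # Build an accumulator representing a single input row.
        [value] = row
        return cls(value, 1)

    def update(self, other):
        # Merge another accumulator into this one.
        self.total += other.total
        self.count += other.count

    def compute_result(self):
        # Produce the final reduced value.
        return self.total / self.count


def reduce_rows(accumulator_cls, rows):
    """Hypothetical driver: fold rows into one accumulator and return its result."""
    rows = iter(rows)
    acc = accumulator_cls.from_row(next(rows))
    for row in rows:
        acc.update(accumulator_cls.from_row(row))
    return acc.compute_result()


average = reduce_rows(AvgAccumulator, [[2], [4], [9]])
```

Keeping per-row construction separate from merging lets the same accumulator combine partial results in any order.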
Changed
- pw.debug.parse_to_table is now deprecated, pw.debug.table_from_markdown should be used instead.
- pw.schema_from_csv now has quote and double_quote_escapes arguments.
Fixed
- Schema returned from pw.schema_from_csv will have quotes removed from column names, so it will now work properly with pw.io.csv.read.