This release consists of 263 commits from 64 contributors. See credits at the end of this changelog for more information.
Breaking changes:
- Convert `StringAgg` to UDAF #10945 (lewiszlw)
- Convert `bool_and` & `bool_or` to UDAF #11009 (jcsherin)
- Convert Average to UDAF #10942 #10964 (dharanad)
- fix: remove the Sized requirement on ExecutionPlan::name() #11047 (waynexia)
- Return `&Arc` reference to inner trait object #11103 (linhr)
- Support COPY TO Externally Defined File Formats, add FileType trait #11060 (devinjdangelo)
- expose table name in proto extension codec #11139 (leoyvens)
- fix(typo): unqualifed to unqualified #11159 (waynexia)
- Consolidate `Filter::remove_aliases` into `Expr::unalias_nested` #11001 (alamb)
- Convert `nth_value` to UDAF #11287 (jcsherin)
Implemented enhancements:
- feat: Add support for Int8 and Int16 data types in data page statistics #10931 (Weijun-H)
- feat: add CliSessionContext trait for cli #10890 (tshauck)
- feat(optimizer): handle partial anchored regex cases and improve doc #10977 (waynexia)
- feat: support uint data page extraction #11018 (tshauck)
- feat: propagate EmptyRelation for more join types #10963 (tshauck)
- feat: Add method to add analyzer rules to SessionContext #10849 (pingsutw)
- feat: Support duplicate column names in Joins in Substrait consumer #11049 (Blizzara)
- feat: Add support for Timestamp data types in data page statistics. #11123 (efredine)
- feat: Add support for `Binary`/`LargeBinary`/`Utf8`/`LargeUtf8` data types in data page statistics #11136 (PsiACE)
- feat: Support Map type in Substrait conversions #11129 (Blizzara)
- feat: Conditionally allow to keep partition_by columns when using PARTITIONED BY enhancement #11107 (hveiga)
- feat: enable "substring" as a UDF in addition to "substr" #11277 (Blizzara)
Fixed bugs:
- fix: use total ordering in the min & max accumulator for floats #10627 (westonpace)
- fix: Support double quotes in `date_part` #10833 (Weijun-H)
- fix: Ignore nullability of list elements when consuming Substrait #10874 (Blizzara)
- fix: Support `NOT <field> IN (<subquery>)` via anti join #10936 (akoshchiy)
- fix: CTEs defined in a subquery can escape their scope #10954 (jonahgao)
- fix: Fix the incorrect null joined rows for SMJ outer join with join filter #10892 (viirya)
- fix: gcd returns negative results #11099 (jonahgao)
- fix: LCM panicked due to overflow #11131 (jonahgao)
- fix: Support dictionary type in parquet metadata statistics. #11169 (efredine)
- fix: Ignore nullability in Substrait structs #11130 (Blizzara)
- fix: typo in comment about FinalPhysicalPlan #11181 (c8ef)
- fix: Support Substrait's compound names also for window functions #11163 (Blizzara)
- fix: Incorrect LEFT JOIN evaluation result on OR conditions #11203 (viirya)
- fix: Be more lenient in interpreting input args for builtin window functions #11199 (Blizzara)
- fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) #11278 (Blizzara)
- fix: When consuming Substrait, temporarily rename clashing duplicate columns #11329 (Blizzara)
Documentation updates:
- Minor: Clarify `SessionContext::state` docs #10847 (alamb)
- Minor: Update SIGMOD paper reference url #10860 (alamb)
- docs(variance): Correct typos in comments #10844 (pingsutw)
- Add missing code close tick in LiteralGuarantee docs #10859 (adriangb)
- Minor: Add more docs and examples for `Transformed` and `TransformedResult` #11003 (alamb)
- doc: Update links in the documentation #11044 (Weijun-H)
- Minor: Examples cleanup + more docs in pruning example #11086 (alamb)
- Minor: refine documentation pointing to examples #11110 (alamb)
- Fix running in Docker instructions #11141 (findepi)
- docs: add example for custom file format with `COPY TO` #11174 (tshauck)
- Fix docs wordings #11226 (findepi)
- Fix count() docs around including null values #11293 (findepi)
Other:
- chore: Prepare 39.0.0-rc1 #10828 (andygrove)
- Remove expr_fn::sum and replace them with function stub #10816 (jayzhan211)
- Debug print as many fields as possible for `SessionState` #10818 (lewiszlw)
- Prune Parquet RowGroup in a single call to `PruningPredicate::prune`, update StatisticsExtractor API #10802 (alamb)
- Remove Built-in sum and Rename to lowercase `sum` #10831 (jayzhan211)
- Convert `stddev` and `stddev_pop` to UDAF #10834 (goldmedal)
- Introduce expr builder for aggregate function #10560 (jayzhan211)
- chore: Improve change log generator #10841 (andygrove)
- Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` #10813 (alamb)
- Convert `VariancePopulation` to UDAF #10836 (mknaw)
- Convert `approx_median` to UDAF #10840 (goldmedal)
- MINOR: use workspace deps in proto-common (upgrade object store dependency) #10848 (waynexia)
- Minor: add `Window::try_new_with_schema` constructor #10850 (sadboy)
- Add support for reading CSV files with comments #10467 (bbannier)
- Convert approx_distinct to UDAF #10851 (Lordworms)
- minor: add proto-common crate to release instructions #10858 (andygrove)
- Implement TPCH substrait integration test, support tpch_1 #10842 (Lordworms)
- Remove unnecessary passing around of `suffix: &str` in `pruning.rs`'s `RequiredColumns` #10863 (adriangb)
- chore: Make DFSchema::datatype_is_logically_equal function public #10867 (advancedxy)
- Bump braces from 3.0.2 to 3.0.3 in /datafusion/wasmtest/datafusion-wasm-app #10865 (dependabot[bot])
- Docs: Add `unnest` to SQL Reference #10839 (gloomweaver)
- Support correct output column names and struct field names when consuming/producing Substrait #10829 (Blizzara)
- Make Logical Plans more readable by removing extra aliases #10832 (MohamedAbdeen21)
- Minor: Improve `ListingTable` documentation #10854 (alamb)
- Extending join fuzz tests to support join filtering #10728 (edmondop)
- replace and(, not()) with and_not(*) #10885 (RTEnzyme)
- Disabling test for semi join with filters #10887 (edmondop)
- Minor: Update `min_statistics` and `max_statistics` to be helpers, update docs #10866 (alamb)
- Remove `Interval` column test // parquet extraction #10888 (marvinlanhenke)
- Minor: SMJ fuzz tests fix for rowcounts #10891 (comphead)
- Move `Count` to `functions-aggregate`, update MSRV to rust 1.75 #10484 (jayzhan211)
- refactor: fetch statistics for a given ParquetMetaData #10880 (NGA-TRAN)
- Move FileSinkExec::metrics to the correct place #10901 (joroKr21)
- Refine ParquetAccessPlan comments and tests #10896 (alamb)
- ci: fix clippy failures on main #10903 (jonahgao)
- Minor: disable flaky fuzz test #10904 (comphead)
- Remove builtin count #10893 (jayzhan211)
- Move Regr_* functions to use UDAF #10898 (eejbyfeldt)
- Docs: clarify when the parquet reader will read from object store when using cached metadata #10909 (alamb)
- Minor: Fix `bench.sh tpch data` #10905 (alamb)
- Minor: use venv in benchmark compare #10894 (tmi)
- Support explicit type and name during table creation #10273 (duongcongtoai)
- Simplify Join Partition Rules #10911 (berkaysynnada)
- Move `Literal` to `physical-expr-common` #10910 (lewiszlw)
- chore: update some error messages for clarity #10916 (jeffreyssmith2nd)
- Initial Extract parquet data page statistics API #10852 (marvinlanhenke)
- Add contains function, and support in datafusion substrait consumer #10879 (Lordworms)
- Minor: Improve `arrow_statistics` tests #10927 (alamb)
- Minor: Remove `prefer_hash_join` env variable for clickbench #10933 (jayzhan211)
- Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF #10917 (goldmedal)
- refactor: remove extra default in max rows #10941 (tshauck)
- chore: Improve performance of Parquet statistics conversion #10932 (Weijun-H)
- Add catalog::resolve_table_references #10876 (leoyvens)
- Convert BitAnd, BitOr, BitXor to UDAF #10930 (dharanad)
- refactor: improve PoolType argument handling for CLI #10940 (tshauck)
- Minor: remove potential string copy from Column::from_qualified_name #10947 (alamb)
- Fix: StatisticsConverter `counts` for missing columns #10946 (marvinlanhenke)
- Add initial support for Utf8View and BinaryView types #10925 (XiangpengHao)
- Use shorter aliases in CSE #10939 (peter-toth)
- Substrait support for ParquetExec round trip for simple select #10949 (xinlifoobar)
- Support to unparse `ScalarValue::IntervalMonthDayNano` to String #10956 (goldmedal)
- Minor: Return option from row_group_row_count #10973 (marvinlanhenke)
- Minor: Add routine to debug join fuzz tests #10970 (comphead)
- Support to unparse `ScalarValue::TimestampNanosecond` to String #10984 (goldmedal)
- build(deps-dev): bump ws from 8.14.2 to 8.17.1 in /datafusion/wasmtest/datafusion-wasm-app #10988 (dependabot[bot])
- Minor: reuse Rows buffer in GroupValuesRows #10980 (alamb)
- Add example for writing SQL analysis using DataFusion structures #10938 (LorrensP-2158466)
- Push down filter for Unnest plan #10974 (jayzhan211)
- Add parquet page stats for float{16, 32, 64} #10982 (tmi)
- Fix `file_stream_provider` example compilation failure on windows #10975 (lewiszlw)
- Stop copying LogicalPlan and Exprs in `CommonSubexprEliminate` (2-3% planning speed improvement) #10835 (alamb)
- chore: Update documentation link in `PhysicalOptimizerRule` comment #11002 (Weijun-H)
- Push down filter plan for unnest on non-unnest column only #10991 (jayzhan211)
- Minor: add test for pushdown past unnest #11017 (alamb)
- Update docs for `protoc` minimum installed version #11006 (jcsherin)
- propagate error instead of panicking on out of bounds in physical-expr/src/analysis.rs #10992 (LorrensP-2158466)
- Add drop_columns to dataframe api #11010 (Omega359)
- Push down filter plan for non-unnest column #11019 (jayzhan211)
- Consider timezones with `UTC` and `+00:00` to be the same #10960 (marvinlanhenke)
- Deprecate `OptimizerRule::try_optimize` #11022 (lewiszlw)
- Relax combine partial final rule #10913 (mustafasrepo)
- Compute gcd with u64 instead of i64 because of overflows #11036 (LorrensP-2158466)
- Add distinct_on to dataframe api #11012 (Omega359)
- chore: add test to show current behavior of `AT TIME ZONE` for string vs. timestamp #11056 (appletreeisyellow)
- Boolean parquet get datapage stat #11054 (LorrensP-2158466)
- Using display_name for Expr::Aggregation #11020 (Lordworms)
- Minor: Convert `Count`'s name to lowercase #11028 (jayzhan211)
- Minor: Move `function::Hint` to `datafusion-expr` crate to avoid physical-expr dependency for `datafusion-function` crate #11061 (jayzhan211)
- Support to unparse ScalarValue::TimestampMillisecond to String #11046 (pingsutw)
- Support to unparse IntervalYearMonth and IntervalDayTime to String #11065 (goldmedal)
- SMJ: fix streaming row concurrency issue for LEFT SEMI filtered join #11041 (comphead)
- Add `advanced_parquet_index.rs` example of index in into parquet files #10701 (alamb)
- Add Expr::column_refs to find column references without copying #10948 (alamb)
- Give `OptimizerRule::try_optimize` default implementation and cleanup duplicated custom implementations #11059 (lewiszlw)
- Fix `FormatOptions::CSV` propagation #10912 (svranesevic)
- Support parsing SQL strings to Exprs #10995 (xinlifoobar)
- Support dictionary data type in array_to_string #10908 (EduardoVega)
- Implement min/max for interval types #11015 (maxburke)
- Improve LIKE performance for Dictionary arrays #11058 (Lordworms)
- handle overflow in gcd and return this as an error #11057 (LorrensP-2158466)
- Convert Correlation to UDAF #11064 (pingsutw)
- Migrate more code from `Expr::to_columns` to `Expr::column_refs` #11067 (alamb)
- decimal support for unparser #11092 (y-f-u)
- Improve `CommonSubexprEliminate` identifier management (10% faster planning) #10473 (peter-toth)
- Change wildcard qualifier type from `String` to `TableReference` #11073 (linhr)
- Allow access to UDTF in `SessionContext` #11071 (linhr)
- Strip table qualifiers from schema in `UNION ALL` for unparser #11082 (phillipleblanc)
- Update ListingTable to use StatisticsConverter #11068 (xinlifoobar)
- to_timestamp functions should preserve timezone #11038 (maxburke)
- Rewrite array operator to function in parser #11101 (jayzhan211)
- Resolve empty relation opt for join types #11066 (LorrensP-2158466)
- Add composed extension codec example #11095 (lewiszlw)
- Minor: Avoid some repetition in to_timestamp #11116 (alamb)
- Minor: fix ScalarValue::new_ten error message (cites one not ten) #11126 (gstvg)
- Deprecate Expr::column_refs #11115 (alamb)
- Overflow in negate operator #11084 (LorrensP-2158466)
- Minor: Add Architectural Goals to the docs #11109 (alamb)
- Fix overflow in pow #11124 (LorrensP-2158466)
- Support to unparse Time scalar value to String #11121 (goldmedal)
- Support to unparse `TimestampSecond` and `TimestampMicrosecond` to String #11120 (goldmedal)
- Add standalone example for `OptimizerRule` #11087 (alamb)
- Fix overflow in factorial #11134 (LorrensP-2158466)
- Temporary Fix: Query error when grouping by case expressions #11133 (jonahgao)
- Fix nullability of return value of array_agg #11093 (eejbyfeldt)
- Support filter for List #11091 (jayzhan211)
- [MINOR]: Fix some minor silent bugs #11127 (mustafasrepo)
- Minor Fix for Logical and Physical Expr Conversions #11142 (berkaysynnada)
- Support Date Parquet Data Page Statistics #11135 (dharanad)
- fix flaky array query slt test #11140 (leoyvens)
- Support Decimal and Decimal256 Parquet Data Page Statistics #11138 (Lordworms)
- Implement comparisons on nested data types such that distinct/except would work #11117 (rtyler)
- Minor: dont panic with bad arguments to round #10899 (tmi)
- Minor: reduce replication for nested comparison #11149 (alamb)
- [Minor]: Remove datafusion-functions-aggregate dependency from physical-expr crate #11158 (mustafasrepo)
- adding config to control Varchar behavior #11090 (Lordworms)
- minor: consolidate `gcd` related tests #11164 (jonahgao)
- Minor: move batch spilling methods to `lib.rs` to make it reusable #11154 (comphead)
- Move schema projection to where it's used in ListingTable #11167 (adriangb)
- Make running in docker instruction be copy-pastable #11148 (findepi)
- Rewrite `array @> array` and `array <@ array` in sql_expr_to_logical_expr #11155 (jayzhan211)
- Minor: make some physical_optimizer rules public #11171 (askalt)
- Remove pr_benchmarks.yml #11165 (alamb)
- Optionally display schema in explain plan #11177 (alamb)
- Minor: Add more support for ScalarValue::Float16 #11156 (Lordworms)
- Minor: fix SQLOptions::with_allow_ddl comments #11166 (alamb)
- Update sqllogictest requirement from 0.20.0 to 0.21.0 #11189 (dependabot[bot])
- Support Time Parquet Data Page Statistics #11187 (dharanad)
- Adds support for Dictionary data type statistics from parquet data pages. #11195 (efredine)
- [Minor]: Make sort_batch public #11191 (mustafasrepo)
- Introduce user defined SQL planner API #11180 (jayzhan211)
- Convert grouping to udaf #11147 (Rachelint)
- Make statistics_from_parquet_meta a sync function #11205 (adriangb)
- Allow user defined SQL planners to be registered #11208 (samuelcolvin)
- Recursive `unnest` #11062 (duongcongtoai)
- Document how to test examples in user guide, add some more coverage #11178 (alamb)
- Minor: Move MemoryCatalog*Provider into a module, improve comments #11183 (alamb)
- Add standalone example of using the SQL frontend #11088 (alamb)
- Add Optimizer Sanity Checker, improve sortedness equivalence properties #11196 (mustafasrepo)
- Implement user defined planner for extract #11215 (xinlifoobar)
- Move basic SQL query examples to user guide #11217 (alamb)
- Support FixedSizedBinaryArray Parquet Data Page Statistics #11200 (dharanad)
- Implement ScalarValue::Map #11224 (goldmedal)
- Remove unmaintained python pre-commit configuration #11255 (findepi)
- Enable `clone_on_ref_ptr` clippy lint on execution crate #11239 (lewiszlw)
- Minor: Improve documentation about pushdown join predicates #11209 (alamb)
- Minor: clean up data page statistics tests and fix bugs #11236 (efredine)
- Replacing pattern matching through downcast with trait method #11257 (edmondop)
- Update substrait requirement from 0.34.0 to 0.35.0 #11206 (dependabot[bot])
- Enhance short circuit handling in `CommonSubexprEliminate` #11197 (peter-toth)
- Add bench for data page statistics parquet extraction #10950 (marvinlanhenke)
- Register SQL planners in `SessionState` constructor #11253 (dharanad)
- Support DuckDB style struct syntax #11214 (jayzhan211)
- Enable `clone_on_ref_ptr` clippy lint on expr crate #11238 (lewiszlw)
- Optimize PushDownFilter to avoid recreating schema columns #11211 (alamb)
- Remove outdated `rewrite_expr.rs` example #11085 (alamb)
- Implement TPCH substrait integration test, support tpch_2 #11234 (Lordworms)
- Enable `clone_on_ref_ptr` clippy lint on physical-expr crate #11240 (lewiszlw)
- Add standalone `AnalyzerRule` example that implements row level access control #11089 (alamb)
- Replace println! with assert! if possible in DataFusion examples #11237 (Nishi46)
- minor: format `Expr::get_type()` #11267 (jonahgao)
- Fix hash join for nested types #11232 (eejbyfeldt)
- Infer count() aggregation is not null #11256 (findepi)
- Remove unnecessary qualified names #11292 (findepi)
- Fix running examples readme #11225 (findepi)
- Minor: Add `ConstExpr::from` and use in physical optimizer #11283 (alamb)
- Implement TPCH substrait integration test, support tpch_3 #11298 (Lordworms)
- Implement user defined planner for position #11243 (xinlifoobar)
- Upgrade to arrow 52.1.0 (and fix clippy issues on main) #11302 (alamb)
- AggregateExec: Take grouping sets into account for InputOrderMode #11301 (thinkharderdev)
- Add user_defined_sql_planners(..) to FunctionRegistry #11296 (Omega359)
- use safe cast in propagate_constraints #11297 (Lordworms)
- Minor: Remove clone in optimizer #11315 (jayzhan211)
- minor: Add `PhysicalSortExpr::new` #11310 (andygrove)
- Fix data page statistics when all rows are null in a data page #11295 (efredine)
- Made UserDefinedFunctionPlanner to uniform the usages #11318 (xinlifoobar)
- Implement user defined planner for `create_struct` & `create_named_struct` #11273 (dharanad)
- Improve stats convert performance for Binary/String/Boolean arrays #11319 (Rachelint)
- Fix typos in datafusion-examples/datafusion-cli/docs #11259 (lewiszlw)
- Minor: Fix Failing TPC-DS Test #11331 (berkaysynnada)
- HashJoin can preserve the right ordering when join type is Right #11276 (berkaysynnada)
- Update substrait requirement from 0.35.0 to 0.36.0 #11328 (dependabot[bot])
- Support to unparse logical plans with timestamp cast to string #11326 (sgrebnov)
- Implement user defined planner for sql_substring_to_expr #11327 (xinlifoobar)
- Improve volatile expression handling in `CommonSubexprEliminate` #11265 (peter-toth)
- Support `IS NULL` and `IS NOT NULL` on Unions #11321 (samuelcolvin)
- Implement TPCH substrait integration test, support tpch_4 and tpch_5 #11311 (Lordworms)
- Enable `clone_on_ref_ptr` clippy lint on physical-plan crate #11241 (lewiszlw)
- Remove any aliases in `Filter::try_new` rather than erroring #11307 (samuelcolvin)
- Improve `DataFrame` Users Guide #11324 (alamb)
- chore: Rename UserDefinedSQLPlanner to ExprPlanner #11338 (andygrove)
- Revert "remove `derive(Copy)` from `Operator` (#11132)" #11341 (alamb)
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
41 Andrew Lamb
17 Jay Zhan
12 Lordworms
12 张林伟
10 Arttu
9 Jax Liu
9 Lorrens Pantelis
8 Piotr Findeisen
7 Dharan Aditya
7 Jonah Gao
7 Xin Li
6 Andy Grove
6 Marvin Lanhenke
6 Trent Hauck
5 Alex Huang
5 Eric Fredine
5 Mustafa Akur
5 Oleks V
5 dependabot[bot]
4 Adrian Garcia Badaracco
4 Berkay Şahin
4 Kevin Su
4 Peter Toth
4 Ruihang Xia
4 Samuel Colvin
3 Bruce Ritchie
3 Edmondo Porcu
3 Emil Ejbyfeldt
3 Heran Lin
3 Leonardo Yvens
3 jcsherin
3 tmi
2 Duong Cong Toai
2 Liang-Chi Hsieh
2 Max Burke
2 kamille
1 Albert Skalt
1 Andrey Koshchiy
1 Benjamin Bannier
1 Bo Lin
1 Chojan Shang
1 Chunchun Ye
1 Dan Harris
1 Devin D'Angelo
1 Eduardo Vega
1 Georgi Krastev
1 Hector Veiga
1 Jeffrey Smith II
1 Kirill Khramkov
1 Matt Nawara
1 Mohamed Abdeen
1 Nga Tran
1 Nishi
1 Phillip LeBlanc
1 R. Tyler Croy
1 RT_Enzyme
1 Sava Vranešević
1 Sergei Grebnov
1 Weston Pace
1 Xiangpeng Hao
1 advancedxy
1 c8ef
1 gstvg
1 yfu
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.