This preview release of DuckDB is named "Oxyura" after the White-headed duck (Oxyura leucocephala), an endangered species native to Eurasia.
This time, @Mytherin has written a blog post explaining the long and exciting list of new features in this release.
Binary builds are listed at the bottom of this post. Please note that it can take a couple of hours until binary builds for all platforms and environments are available.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
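The migration described above can be sketched as two statements, one run in each version. This is a minimal sketch: the directory name 'duckdb_export' is a placeholder chosen for illustration, and the documentation remains the authoritative reference for the available options.

```sql
-- Step 1: run with the OLD DuckDB version, connected to the existing database.
-- Writes the schema and the contents of the tables into the target directory.
-- 'duckdb_export' is a placeholder directory name.
EXPORT DATABASE 'duckdb_export';

-- Step 2: run with the NEW DuckDB version, connected to a fresh database file.
-- Recreates the schema and reloads the data from the same directory.
IMPORT DATABASE 'duckdb_export';
```

Exporting into an empty directory and importing into a new database file keeps the old file untouched until the migrated copy has been checked.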
Featured Changes
- Optimistically write data to disk when batch loading data into the system by @Mytherin in #4996
- Parallel non-order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5033
- Parallel order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5082
- FSST compression by @samansmink in #4366
- CHIMP128 Compression by @Tishj in #4878
- Patas Compression (float/double) (variation on Chimp) by @Tishj in #5044
- Parallel CSV Reader by @pdet in #5194
- Parallelize CREATE INDEX of ART by @taniabogatsch in #4655
- Improve memory management of ART indexes by @Mytherin in #5292
- DISTINCT aggregates with GROUP BY are now executed in parallel by @Tishj in #5146
- Nested "UNION"-type by @Maxxen in #4966
- Allow for queries to start with FROM, instead of with SELECT by @Mytherin in #5076
- Support for the COLUMNS expression, which allows expanding computations on multiple columns by @Mytherin in #5120
- Python-style list-comprehension syntax by @Mytherin in #4926
- Improvements to Out-of-Core Hash Join by @lnkuiper in #4970
- jemalloc "extension" for Linux by @lnkuiper in #4971
- Improve rendering of result sets for the shell by @Mytherin in #5140
- Add auto-complete support to the shell by @Mytherin in #4921
- Nicer looking progress bar by @Mytherin in #5187
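Several of the featured changes are new SQL syntax, which a few statements may illustrate better than the titles alone. The following is a rough sketch, not taken from the pull requests themselves: the tables obs and messages and their columns are hypothetical names made up for this example.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE obs (id INTEGER, val1 INTEGER, val2 INTEGER);
INSERT INTO obs VALUES (1, 10, 100), (2, 20, 200);

-- FROM-first syntax (#5076): a query may start with FROM;
-- this is shorthand for SELECT * FROM obs.
FROM obs;

-- COLUMNS expression (#5120): apply the same computation to many columns.
-- Here MIN is computed for every column of obs.
SELECT MIN(COLUMNS(*)) FROM obs;

-- COLUMNS also accepts a regular expression that selects columns by name.
SELECT COLUMNS('val[0-9]+') FROM obs;

-- Python-style list comprehension (#4926): add 1 to every list element.
SELECT [x + 1 for x in [1, 2, 3]] AS l;

-- Nested UNION type (#4966): a column whose values can be one of several types.
CREATE TABLE messages (u UNION(num INTEGER, error VARCHAR));
INSERT INTO messages VALUES (42);
INSERT INTO messages VALUES ('oh my globs');
```

The blog post mentioned above covers these features in more depth.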
All Changes
- Fix #4747: Handle pandas num categories between 128 and 256 by @pankajp in #4757
- Julia 0.5.1 by @Mytherin in #4758
- Fix #3595: avoid using system hash for floating point values by @Mytherin in #4761
- Fix #4704. Correct the column name for pragma_storage_info with generated column by @zippond in #4750
- Allow to load extensions through compiler variable definitions by @pdet in #4767
- Fix some typo in code comments by @buaazhwb in #4769
- Enhance duckdb_constraints() by @krlmlr in #4346
- Issue #4764: Window Ignore Nulls by @hawkfish in #4773
- [Python (Relational)] Query now returns a DuckDBPyRelation by @Tishj in #4471
- R types expansion by @hannes in #4778
- Add json_contains by @lnkuiper in #4686
- Fix #4152: create base table reference in returning clause so generated columns are correctly resolved by @Mytherin in #4783
- Fix Exists and ANY correlated subquerys by @lokax in #4752
- Fix for ORDER BY on large dictionary vectors: correctly pass offset into get_index of selection vector by @Mytherin in #4787
- Missing json_contains in extension list by @Mytherin in #4788
- Extensible Casts & Cast Function Rework by @Mytherin in #4785
- Bump sqlite scanner by @hannes in #4789
- Improve sorting for strings and push projections into sort operator by @lnkuiper in #4697
- Parquet: Refactor decompression, including more complete datapage v2 support by @wisp3rwind in #4628
- Parallelize CREATE INDEX of ART by @taniabogatsch in #4655
- Unify LocalStorage and DataTable Storage by @Mytherin in #4798
- feat: support passing all db config to jdbc driver by @Mause in #4794
- Fix #4806: correctly use offset index in pragma_table_info on view by @Mytherin in #4807
- Map VARCHAR, JSON, ENUM to Julia String by @nickrobinson251 in #4810
- fix: support SHOW query types in jdbc client by @Mause in #4799
- Replacement Open Hooks by @hannes in #4721
- Build multiple out of tree extensions in one pass by @Mytherin in #4828
- fix(jdbc): release results before releasing statements by @Mause in #4831
- Fix for #4827 by @PedroTadim in #4829
- Multiblock2 by @jkub in #4555
- Disconnect after test by @krlmlr in #4835
- Check prefix length, not string_t::INLINE_LENGTH when comparing strings while sorting by @lnkuiper in #4816
- Adding a CI workflow to re-build individual out-of-tree extensions by @hannes in #4833
- fix: json getColumnType error by @Mause in #4847
- Attempt two at rebuilding old extensions by @hannes in #4848
- Updating postgres scanner by @hannes in #4832
- Extension Rebuild Attempt 3 by @hannes in #4849
- Adding overwrite flag to R duckdb_register by @hannes in #4850
- Move LocalStorage row groups directly to DataTable instead of re-appending by @Mytherin in #4851
- fix for macos CI by @samansmink in #4854
- Fully qualified s3url by @LindsayWray in #4786
- FSST compression by @samansmink in #4366
- Julia: add support for handling errors in replacement scans by @Mytherin in #4865
- Extension build: turn IGNORE_WARNINGS into generic OPTIONS field, and add --main-only field by @Mytherin in #4866
- Issue #4867: Approximate Quantile Hugeint by @hawkfish in #4868
- Install OpenSSH on ubuntu 16 by @Mytherin in #4877
- Join order regression test: add 20% threshold to cardinalities before we care about regressions by @Mytherin in #4880
- Move LocalStorage row groups directly to DataTable if there are enough rows being appended by @Mytherin in #4876
- Allow referencing of aliases in SELECT clause and TPC-DS extension clean-up by @Mytherin in #4879
- Add github to known hosts by @Mytherin in #4884
- Adding a serialized version of all TPCH queries and test we can read them by @bleskes in #4605
- Add support for custom bind functions to RegisterCastFunction, and propagate client context to the bind function by @Mytherin in #4885
- CSV reader: quoted NULL values should be kept as non-NULL by @Mytherin in #4888
- fix: add numpy to setup_requires to fix build from source by @Mause in #4893
- fix openFlags overwriting in shell fixing #4894 by @kouta-kun in #4895
- Remove filter columns from table scans if they are unused in the remainder of the plan by @lnkuiper in #4817
- feat: add duckdb_library_version method and fix extension load state by @Mause in #4881
- uuid.cpp: GenerateRandomUUID: fix indexing by @nodakai in #4892
- Update serialized plans by @Mytherin in #4900
- Add CPython 3.11 to build matrix by @edgarrmondragon in #4906
- Support UNION_BY_NAME option in read_csv_auto by @douenergy in #4837
- support for virtualizing storage layer by @jkub in #4858
- Reduce data set size of IE join test by @Mytherin in #4905
- Making sure parquet column readers return the expected amount of rows by @hannes in #4909
- Issue #3187: TIMESTAMPTZ <=> VARCHAR by @hawkfish in #4904
- Fix breaking CI on unused variable errors by @Tishj in #4916
- Issue #4912: NOW returns TIMESTAMPTZ by @hawkfish in #4914
- Add ClickBench to benchmark suite by @Mytherin in #4919
- Add auto-complete support to the shell by @Mytherin in #4921
- Add support for list parameters to read_csv and read_csv_auto by @Mytherin in #4922
- Add Python-style list-comprehension syntax support to SQL by @Mytherin in #4926
- Auto-complete: prioritize files with known extensions, and include position at which completion should be placed by @Mytherin in #4930
- Fix ART by @lokax in #4763
- [Python] Add support for Protocols by @Tishj in #4435
- Work-around for #4935: throw internal error if there is no node by @Mytherin in #4940
- Fix #4933: avoid introducing NULL value on first value after empty row by @Mytherin in #4934
- Issue #4942: Check DESC Errors by @hawkfish in #4945
- Issue #4944: Negative Unpadded Centuries by @hawkfish in #4948
- Issue #4943: Date Nanosecond Overflow by @hawkfish in #4947
- feat: add copy method for logical_operator by @stephaniewang526 in #4915
- Bug fix for segmentation fault in list apply by @taniabogatsch in #4910
- Fixing hmac for large secrets in S3FS by @hannes in #4949
- buffered by @jkub in #4924
- Caching Database Instances by @pdet in #4414
- Faster ART key allocations, faster index join by @taniabogatsch in #4800
- Add CI run with disabled string inlining by @Mytherin in #4957
- Split row-group append into Initialize/Append/Finalize and separate append code from version info append by @Mytherin in #4953
- Issue #4965: DateDiff Day Overflow by @hawkfish in #4973
- noswizzle by @jkub in #4923
- Issue #4978: DATE_SUB Subtraction Overflows by @hawkfish in #4985
- feat: request that people raise scanner issues in the right repos by @Mause in #4956
- avoid double-writing the index data by @jkub in #4946
- [Python] Optional Pandas Date as datetime by @pdet in #4633
- Optimistically write data to disk when batch loading data into the system by @Mytherin in #4996
- Bring substrait-extension build back by @pdet in #4993
- fix(jdbc): shutdown database after last connection is closed by @Mause in #4990
- Add support for TRUNCATE [TABLE] syntax by @Mytherin in #5001
- Directly merge row groups from local storage into table even if the table has indexes by @Mytherin in #5003
- String to list casting by @LindsayWray in #4994
- Optimize away DELIM_JOIN even when the child join with the DELIM_GET is an inequality join by @lnkuiper in #4991
- JDBC: Add public getter for statement return type by @Jens-H in #5014
- Remove duplicate code by @lokax in #5008
- Fix lambda bug for struct extract by @taniabogatsch in #5007
- Support CREATE OR REPLACE / TEMPORARY / IF NOT EXISTS with CREATE MACRO / FUNCTION by @lnkuiper in #5006
- Fixing #4859, correctly passing struct type to recursive calls by @hannes in #5017
- [CSV] Added line number to 'maximum_line_size' exceeded error by @Tishj in #5018
- ODBC/JDBC Database Instance Cache by @Mytherin in #5004
- String functions: count unicode codepoints instead of grapheme clusters by @Mytherin in #5028
- Support file_search_path with globbing by @whscullin in #5021
- First cut at TypeScript type declarations for DuckDb by @antonycourtney in #5025
- Undefined behavior sanitizer error fix by @Tishj in #5030
- [Compression] CHIMP128 Compression Algorithm by @Tishj in #4878
- Parallel non-order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5033
- fsst bugfix by @samansmink in #5042
- Avoid installing git for ODBC Windows CI Run by @Mytherin in #5051
- Fix for shell auto-complete by @Mytherin in #5047
- Fix a race condition in an assert by @Mytherin in #5049
- [Python] Accept 'schema' in table reference by @Tishj in #5059
- Fix levenshtein(s1, s2) for empty strings by @lmores in #5062
- Correctly handle NULL values in compound ART keys by @taniabogatsch in #5010
- Issue #5023: Fully Parallel Partitioning by @hawkfish in #5024
- Enable remote optimizer test by @Y-- in #5019
- make wal impl more reusable by @jkub in #5071
- Optionally allow for queries to start with FROM, instead of with SELECT by @Mytherin in #5076
- [Python] Fall back to DOUBLE for unsupported DECIMAL widths by @Tishj in #4749
- Issue #5046: Window Size Restriction by @hawkfish in #5079
- Shell: fixes for auto-complete of home directory and absolute paths by @Mytherin in #5081
- Varsizeblock by @jkub in #5069
- Parallel order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5082
- Fix #5077: correctly handle carriage return newlines in CSV auto-detection by @Mytherin in #5083
- caching table-in-out-functions & chunk cache refactor by @samansmink in #4992
- Fix for #4935: throw internal error if there is no node by @Tmonster in #5089
- Add nested "union"-type by @Maxxen in #4966
- Row Group Collection - smaller allocations for tiny tables by @Mytherin in #5086
- chore: pin setuptools_scm to py3.6 compatible version by @Mause in #5099
- Correctly scan unaligned row groups in DataTable::ScanTableSegment by @Mytherin in #5101
- feat: implement DatabaseMetadata#getFunctions() by @Mause in #5090
- Support batch index in arrow scans by @Mytherin in #5085
- Arrow support for JDBC ResultSet by @hannes in #5088
- fix(jdbc): gracefully handle null bytes in strings by @Mause in #5100
- Add file_row_number flag to parquet reader by @hannes in #5084
- Fix comment by @zhouliqi in #5110
- Add ErrorManager class, allow SQLLogicTests to verify error messages, and improve CSV reader errors by @Mytherin in #5103
- Add support for the COLUMNS expression, which allows expanding computations on multiple columns by @Mytherin in #5120
- Issue #5107: ICU Data Scripts by @hawkfish in #5109
- Batch Insert: Add support for eagerly merging of small adjacent batch indexes by @Mytherin in #5113
- Add temporary 'skip_reload' to problematic test by @Tishj in #5133
- [Python] Add MSVC /utf-8 flag by @metab0t in #5129
- Convert values whose data types do not have explicit support in NodeJS into strings by @jwills in #5130
- Download OpenSSL from Github instead by @Mytherin in #5141
- Add BoxRenderer class - which improves rendering of result sets for the shell by @Mytherin in #5140
- [Dev] Add extension to excluded folder in 'format.py' (format-fix/master) by @Tishj in #5142
- Fix #5124: correctly deal with DICTIONARY vectors inside LIST vectors for various functions by @Mytherin in #5151
- [Aggregate] DISTINCT aggregates with GROUP BY are now executed in parallel by @Tishj in #5146
- [Python] Exceptions encountered in 'with' body are now properly propagated by @Tishj in #5157
- Create enum type from query by @lokax in #5126
- Fix #5149: better tracking of query location in column reference, and improve error message by @Mytherin in #5158
- Allow builder to set GIT_COMMIT_HASH by @Y-- in #5164
- Fsst bug by @samansmink in #5168
- [Python] Arrow Dataset type requirement is now less strict by @Tishj in #5170
- Fix progress bar of regular table scan by @Mytherin in #5171
- Document highlight features in the shell by @Mytherin in #5176
- Support parallel (batch) insertion into tables that have indexes by @Mytherin in #5177
- Support casting of hex strings to integer types by @IanCal in #5160
- [Aggregate] Fix regressions caused by latest distinct HT operator PR by @Tishj in #5169
- R: Remove duckdb:: qualifier by @krlmlr in #5135
- [Compression] Patas Compression (float/double) (variation on Chimp) by @Tishj in #5044
- [C-API] Decimal casting to other type fixes by @Tishj in #4526
- Default NULL handing for CARDINALITY function by @lokax in #5073
- Update OpenSSL to 1.1.1s by @sjaenick in #5184
- Box renderer: Always display "0 rows" if there are no rows by @Mytherin in #5188
- chore: request OS version and architecture in bug reports by @Mause in #5191
- String to struct cast by @LindsayWray in #5147
- Optimize String Split by @Mytherin in #5186
- Nicer looking progress bar by @Mytherin in #5187
- Correctly call Reset on cast_chunk in CSV writer to prevent string heap from continuously accumulating data by @Mytherin in #5199
- Increase vector size to 2048 by @Mytherin in #5193
- Issue #5131: Time Zone 2022f … by @hawkfish in #5198
- jemalloc "extension" for Linux by @lnkuiper in #4971
- Further clarify database invalidation error, unify db/transaction invalidation, and move errors to error manager by @Mytherin in #5213
- fix: build NodeJS bindings for M1 by @Mause in #5189
- Arrow extension by @samansmink in #5195
- Fix distinct aggregate race: insert next event before scheduling tasks by @Mytherin in #5219
- Avoid exporting SQLite symbols from our sqlite_api_wrapper when building the shell by @Mytherin in #5217
- buffermanager accounting by @jkub in #5134
- Allow NULL bytes in strings by @Mytherin in #5218
- Use cmake's find_package to trace git executable by @bleskes in #5220
- Issue #5197: Deterministic TimeZone Abbreviations by @hawkfish in #5214
- Issue #5239: DATE_DIFF Microseconds Overflow by @hawkfish in #5242
- Various CI Improvements/Speed Ups by @Mytherin in #5228
- Issue #5240: DATE_TRUNC Statistics Orientation by @hawkfish in #5241
- Improvements to Out-of-Core Hash Join by @lnkuiper in #4970
- Add support for extension aliases by @Mytherin in #5226
- Physical batch insert: correctly optimistically flush batches to disk that are close to our row group size by @Mytherin in #5231
- Fix Python stub test by @Mytherin in #5245
- DISTINCT grouped aggregate lowered memory consumption optimization by @Tishj in #5227
- fix: bump node-gyp version by @Mause in #5221
- json_extract bugfixes and memory accounting bugfix by @lnkuiper in #5204
- [Python] Add support for strided float32 and float64 data by @Tishj in #5256
- Issue #4978 - 4. Cardinality estimator assertion errors and filter errors by @Tmonster in #5232
- Adding total_uncompressed_size to parquet column chunk metadata by @hannes in #5216
- Issue #5258: Inverse Percentile NULLs by @hawkfish in #5260
- Issue #5205: TIMESTAMPTZ Casting by @hawkfish in #5229
- Like empty list assertion error by @LindsayWray in #5261
- fix: fix python stub test by @Mause in #5269
- Issue #5254: Validate Collation Expressions by @hawkfish in #5270
- Cast overflow varchar to decimal by @LindsayWray in #5262
- Issue #5259: ChunkCollection Sort Values by @hawkfish in #5280
- Parallel CSV Reader by @pdet in #5194
- Add Java constant for default schema name by @michaeljohnalbers in #5271
- Fix/4978 substring overflow by @Maxxen in #5273
- token url encoding bug in S3 glob by @samansmink in #5248
- fix: don't build M1 NodeJS binaries on node versions that don't support M1 by @Mause in #5284
- Several CI fixes by @Mytherin in #5281
- Fix several fuzzer issues, move client context into ExpressionExecutor, and ColumnList index rework by @Mytherin in #5276
- feat: prebuild for NodeJS 19 by @Mause in #5295
- Calendar overflow Fixes by @hawkfish in #5287
- Add correlated columns to LogicalDistinct::distinct_targets when flattening dependent joins by @lnkuiper in #5286
- Fuzzer fixes - 4978 (16) Binder assertion error by @Tmonster in #5285
- [Fuzzer] Fix triggered assertion in LogicalOperator::Verify by @Tishj in #5283
- Disable url decoding of http header values by @samansmink in #5275
- fix: pg constraint foreign key by @Mause in #5264
- Improve memory management of ART indexes by @Mytherin in #5292
- Several parallel CSV reader fixes by @Mytherin in #5291
- [Python] support for pandas experimental NA type by @Tishj in #5246
- Add internal verification to unpinning buffer blocks by @lnkuiper in #5263
- [Python] Fix support for UInt64 and similar by @Tishj in #5299
- Add support for quoted schema/column in DESCRIBE statement by @Tishj in #5230
- Increase SQLite scanner version by @Mytherin in #5309
- node / TypesScript bindings: add missing accessMode argument to Database constructor. by @antonycourtney in #5307
- Initial version of extension to allow creating operators outside of duckdb core lib by @rjatwal in #5144
- Improve progress bar & box rendering by @Mytherin in #5304
- Parallel csv auto fixes by @pdet in #5303
- Current fix for Issue #5266 Returning error with rowid by @Tmonster in #5267
- [Fuzzer] Add support for use of generated columns in GROUP BY expression by @Tishj in #5249
- [Fuzzer] Generated columns now work properly with query-level aliases by @Tishj in #5308
- fix: use oldest supported numpy to build for a given python version by @Mause in #5319
- [UB sanitizer] Prevent doing arithmetic on NaN in 'logical_limit_percent.cpp' by @Tishj in #5322
- Fix OSX Builds on Master - Revert #5319 by @Mytherin in #5329
- Bump Postgres Scanner by @hannes in #5325
- disable node client arrow ipc replacement scans by @samansmink in #5332
- Shared ColumnDataAllocator: hold lock for just a bit longer by @lnkuiper in #5333
Full Changelog: v0.5.1...v0.6.0