This preview release of DuckDB is named "Labradorius" after the Labrador duck (Camptorhynchus labradorius) which was native to North America and went extinct in 1878 despite its reportedly bad taste.
Again, @Mytherin has written a blog post explaining the exciting list of new features in this release.
Binary builds are listed at the bottom of this post. Please note that it can take a couple of hours until binary builds for all platforms and environments are available.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
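The migration suggested above can be sketched roughly as follows. This is a minimal sketch, not an official procedure: the directory name `export_dir` is a hypothetical placeholder, and the two statements are assumed to be run in two separate sessions, the first with the old DuckDB binary opened on the existing database file, the second with the new binary opened on a fresh database file.

```sql
-- Session 1: old DuckDB version, connected to the existing database.
-- Writes the schema and data into the (hypothetical) directory 'export_dir'.
EXPORT DATABASE 'export_dir';

-- Session 2: new DuckDB version, connected to a new, empty database file.
-- Recreates the schema and loads the data from the same directory.
IMPORT DATABASE 'export_dir';
```

Keeping the old database file until the import has been checked is a sensible precaution, since the new version cannot read the old storage format directly.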
What's Changed
- Use structs to avoid confusing C pointer wrappers by @krlmlr in #4961
- Enum type added to the types metadata table by @LindsayWray in #5290
- R: code format by @krlmlr in #5185
- Add starts_with function and operator by @papparapa in #5334
- Feature: Allow binary-formatted strings to be cast to integers by @Maxxen in #5337
- For range joins use NL join when the LHS or RHS side is tiny by @Mytherin in #5399
- Add support for LATERAL joins by @Mytherin in #5393
- [Julia] Add support for consuming a UNION vector into a DataFrame by @Tishj in #5360
- Issue #5314: At Time Zone by @hawkfish in #5341
- Decimal values now round when the value given has more decimals than the scale of the target by @Tishj in #5362
- Shell: add individual SQL queries to the history, instead of individual lines by @Mytherin in #5414
- Shell: add support for history search by @Mytherin in #5415
- Parallelise scanning result of ORDER_BY by @lnkuiper in #5403
- Add translate function by @zhouliqi in #5212
- Enable cmake to recognize AppleClang by @changhiskhan in #5432
- Support enum_code() function by @lokax in #5408
- Fix binder error and produce more informative error message. by @Tmonster in #5302
- Parquet Reader: Re-use (de)compression and dictionary buffers and allocate powers of two by @Mytherin in #5445
- Support RLE, DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY Parquet encodings by @Mytherin in #5457
- print profiling output for deserialized logical query plans by @ila in #5448
- Issue #5277: Sorted Aggregate Sorting by @hawkfish in #5456
- Add internal flag to duckdb_functions, and correctly set internal flag for internal functions by @Mytherin in #5462
- Add experimental R String passthrough support by @hannes in #5479
- Issue #5258: Quantile Negative Fractions by @hawkfish in #5463
- Arrow stream ingestion for JDBC client by @hannes in #5449
- PER_THREAD_OUTPUT flag for COPY by @hannes in #5412
- Feature: skip broken tests for now by @Mytherin in #5532
- Add Union All support to R extension by @Tmonster in #5484
- [Python] Add from_parquet features by @papparapa in #5492
- Add ExtractStatements to C API by @LindsayWray in #5524
- Improve http retry by @samansmink in #5549
- Issue #5277: Sorted Aggregate Window by @hawkfish in #5571
- Issue #5422: QUANTILE_DESC Decimals by @hawkfish in #5572
- Issue #5559: 2022g Time Zones by @hawkfish in #5570
- [Dev] Clean up of the python pkg folder structure by @Tishj in #5436
- httpfs: check environment vars for AWS Credentials by @satotake in #5419
- Misc union-type improvements by @Maxxen in #5617
- Fix so Left inner join doesn't re-optimize nodes by @Tmonster in #5620
- [Substrait] C API + from_substrait_json + bump on substrait version. by @pdet in #5613
- Allow strings in ColumnDataCollection to be written to disk by @lnkuiper in #5543
- [PythonDEV] Let clean.sh be run from anywhere, not just tools/pythonpkg by @Tishj in #5625
- Reorganize Join order optimizer code by @Tmonster in #5621
- [Catalog] Grab missing write_locks in a couple places by @Tishj in #5601
- Parquet info to Substrait by @pdet in #5627
- HTTP parquet optimizations by @samansmink in #5405
- Adding delta compression to Bitpacking compression by @samansmink in #5491
- [Python] Changed use of DuckDBPyConnection to shared_ptr by @Tishj in #5635
- Merge feature branch into master by @Mytherin in #5645
- [Python] Display progress bar by default in an interactive environment by @Tishj in #5596
- Add support for RESET statement on configuration options by @Tishj in #5603
- httpfs: Encode url path on request by @satotake in #5587
- Fix broken CI because of RESET statement by @Tishj in #5671
- Don't automatically set the bug label on issues by @Mytherin in #5680
- Add support for CREATE VIEW IF NOT EXISTS by @Mytherin in #5682
- Issue #5622: Validate Timezone Characters by @hawkfish in #5658
- Issue 5630 fix. by @Tmonster in #5644
- Adding COLUMN_TYPES option for read_csv_auto by @pdet in #5552
- [Python] Get rid of DuckDBPyResult (merged functionality into DuckDBPyRelation) by @Tishj in #5597
- feat: port nodejs tests to typescript by @Mause in #5632
- Improve nodejs README by @Tishj in #5688
- [Python] Add (partial) support for numpy.datetime64 objects by @Tishj in #5659
- retry on all httplib errors by @samansmink in #5684
- Return false if file doesn't exist by @Y-- in #5701
- Adding context option to not run replacement scans and exporting namespace of json substrait function - R by @pdet in #5689
- Issue #5609: Scope CTE Windows by @hawkfish in #5690
- Attempt to fix random NodeJS CI failure by @Tishj in #5710
- [Python] duckdb.execute() == duckdb.default_connection.execute() by @Tishj in #5650
- NodeJS: switch to using package_build, and add support to BUILD_NODE to Makefile by @Mytherin in #5691
- JDBC SNAPSHOT Jars by @hannes in #5687
- Fix NodeJS 19 CI for Windows by @Tishj in #5719
- Fix issue 5664 by @lokax in #5667
- Issue #5712: CURRENT_TIMESTAMP and CURRENT_TIME by @hawkfish in #5713
- [CSVReader] Catch a user error in supplying 'columns' option by @Tishj in #5721
- Improve suggestions when LOAD of an extension fails by @Mytherin in #5722
- doc(nodejs): amend arrow stream type docs by @Mause in #5731
- Fix for TSV throwing during sniffing by @pdet in #5555
- Statically link extensions on Linux with Clang by @jkub in #5653
- [Python] Add support for named parameters by @Tishj in #5611
- fix: nodejs source releases should be standalone by @Mause in #5734
- build: don't install python from chocolatey by @Mause in #5740
- fix: use non-string-splitting variable interpolation in binding.gyp.in by @Mause in #5745
- Equalizing DBConfig constructors by @nicku33 in #5747
- We should not treat replacement open paths as disk paths by @nicku33 in #5748
- Allow table in-out functions to be used in correlated subqueries and as LATERAL queries by @Mytherin in #5485
- Issue #5750: clangd std::move by @hawkfish in #5751
- Always parallelize CSV reader when run over multiple files, and several other fixes by @Mytherin in #5757
- Add C++ ODBC tests framework by @Mytherin in #5755
- Fix #5730: document older DuckDB versions internally, and state which DuckDB version a specific file came from by @Mytherin in #5758
- Add support for non-order preserving parallel writing to the CSV and Parquet writers by @Mytherin in #5756
- Don't compute SHA if we allow unsigned extensions by @Y-- in #5760
- Maintain BlockHandle of meta blocks by BlockManager by @Hzc492 in #5699
- Imdb benchmark validation and benchmark improvements by @Tmonster in #5693
- Add support for attaching multiple DuckDB Databases by @Mytherin in #5764
- Fix #5744: Correctly read "compressed" flag in Parquet V2 header by @Mytherin in #5767
- Fix issue 5646 by @lokax in #5652
- Remove icu from ignored directories when formatted by @papparapa in #5765
- Correctly throw an error when attaching over HTTPFS by @Mytherin in #5773
- JDBC add getLong method for timestamp columns by @Jens-H in #5783
- Issue #5776: ISO Year Corrections by @hawkfish in #5796
- Issue #5669: Advance NULL Pointers by @hawkfish in #5793
- Issue #5791: TIMESTAMP/TIMESTAMPTZ Casting by @hawkfish in #5801
- Issue #4121: INTERVAL List Search by @hawkfish in #5805
- Fix incorrect file name in icu-timezone.hpp comment by @papparapa in #5784
- Fully Qualified s3url request with globs by @LindsayWray in #5774
- Map restructure by @LindsayWray in #5768
- Issue #5806: Count Star Window by @hawkfish in #5810
- S3 uploader fixes by @samansmink in #5769
- fix for FSST segfault by @samansmink in #5824
- Issue: #5717: SetValue TIMESTAMP Case by @hawkfish in #5804
- Remove unnecessary code modifying the validity mask of the child vectors of a struct by @Mytherin in #5844
- Add date part specifier synonyms by @papparapa in #5845
- Add non ICU time_bucket function by @papparapa in #5835
- Fix list_sort segmentation fault regression by @taniabogatsch in #5823
- Add support for specifying timestamp precision using standard modifiers by @Mytherin in #5848
- Fix #5836: generate unique oid for attached databases as well by @Mytherin in #5851
- Fix #5782 and #5794: in strict mode do not accept leading zeros when parsing numbers by @Mytherin in #5850
- Fix #5781: add missing flatten call to list_aggregate by @Mytherin in #5854
- Fix #5788: improve error message when referencing an alias that contains a subquery (not supported yet) by @Mytherin in #5855
- Fix #5853: fully qualify function names inside macros during the binding process by @Mytherin in #5856
- CSV Auto Detection: disallow leading + when parsing numbers in strict mode by @Mytherin in #5857
- Additional quote handling for string to list cast by @LindsayWray in #5859
- Parser: add support for unicode space characters by @Mytherin in #5858
- Adding std:: to every move by @hannes in #5873
- MinGW Warning Fixes for R by @Mytherin in #5881
- Issue #5887: ICU DateAdd Overflow by @hawkfish in #5888
- NodeJS replacement scans by @whscullin in #5825
- Further parallelize index creation by @taniabogatsch in #5812
- Added the 'GetExpectedParameterTypes' method to the PreparedStatement… by @AlexR2D2 in #5792
- Extend GenericExecutor by @Maxxen in #5863
- Add signbit function for floating point values by @carlopi in #5862
- Nested/Outer lambda parameters in rhs of inner lambda expressions by @taniabogatsch in #5860
- Issue #5826: ICUDateFunc SubtractField Fix by @hawkfish in #5869
- Reset schema setting when the default schema is dropped by @Tishj in #5874
- Issue #5870: Nested ArgMinMax Results by @hawkfish in #5879
- Fix #5779: Parquet writer - when writing lists write only the required subsection of a child entry to the Parquet file by @Mytherin in #5875
- Change AddString to AddBlob in update_segment.cpp by @Maxxen in #5837
- Support UNION_BY_NAME option in parquet_scan read_parquet by @douenergy in #5716
- Minor fixes for cran by @hannes in #5904
- [Julia] Use best practices for locking strategies by @Tishj in #5905
- More GCC 13 issues by @hannes in #5907
- read_csv_auto column_types improvements by @Mytherin in #5911
- feat: add parser support for CREATE DATABASE to allow extensions to provide the functionality by @stephaniewang526 in #5898
- Fix #5903: ICU addition overflow by @papparapa in #5908
- Export set operations to relational API by @Tmonster in #5872
- Join Order + EXPLAIN Improvements by @lnkuiper in #5891
- fixed bug in generating grammar script by @ila in #5917
- Fix implicit conversion by @carlopi in #5892
- Issue 5660 - do not allow unnest with alias in groupby by @Tmonster in #5918
- Parquet: correctly output TIMESTAMP_TZ type when isAdjustedToUTC is set by @Mytherin in #5916
- Parquet reader: Fix an issue reading boolean values that cross column pages by @Mytherin in #5926
- Allow hive columns to be present in parquet files by @samansmink in #5901
- Fix #5936 - in the Pragma parser, avoid calling ToString() on column references because it might add quotes to keywords by @Mytherin in #5939
- Add option to limit parallel compile by @ashish01 in #5935
- Avoid writing .tmp file when redirecting stdout to a file by @Mytherin in #5930
- [Macro] Remove limitation for types of expressions accepted as named arguments by @Tishj in #5876
- Benchmarks by @carlopi in #5942
- Fix non-deterministic test failure of parquet_scan by @papparapa in #5954
- [Binder] Throw exception for aggregate function modifiers applied to non-aggregate functions by @lnkuiper in #5951
- Add logical plan serialization for LOAD, DROP and ALTER by @ywelsch in #5934
- Fix trainbenchmark non-determinism adding ordering by @carlopi in #5952
- Add missing include to duckdb.hpp by @Tishj in #5953
- httpfs: remove unneeded include by @carlopi in #5945
- [jemalloc] Detect LG_PAGE by @lnkuiper in #5949
- String to map cast by @LindsayWray in #5838
- Add time bucket function by @papparapa in #5665
- [Python] Add UNION_BY_NAME to from_parquet arguments by @papparapa in #5913
- Ci partial rework by @carlopi in #5943
- Issue #3423: Positional Join Operator by @hawkfish in #5867
- docs: add nodejs connection args to docs by @tshauck in #5780
- [SQL Logic Test] Add support for environment variables by @Tishj in #5877
- Ci: scope / remove env variables by @carlopi in #5967
- Restore node-pre-gyp credentials by @carlopi in #5970
- Python: Allow replacement scans on pyrelations, move to BoxRenderer and use pending query API for relations by @Mytherin in #5962
- Issue #5023: Window Radix Partitions by @hawkfish in #5909
- Update copyright year by @sjaenick in #5974
- Fix #5968: ignore repetition type of root schema by @Mytherin in #5969
- Fix #5971: addition and subtraction on infinity of TIMESTAMPTZ by @papparapa in #5978
- Regression ci by @carlopi in #5961
- Add support for UPSERT (INSERT .. ON CONFLICT DO ..) syntax by @Tishj in #5866
- Removing Arrow ABI Testing by @pdet in #5980
- Remove LogicalTypeId::JSON and implement read_json_objects by @lnkuiper in #5544
- Make sqlsmith extension compile by @PedroTadim in #5963
- Issue #5023: Radix Partition Cardinality by @hawkfish in #5989
- Added UUID case to GetTypeToPython by @maclockard in #5885
- Export right left and full joins by @Tmonster in #5822
- Decimal separator option for CSV reader by @eeroel in #5958
- feat(python): fsspec filesystems by @Mause in #5829
- Fix amalgamation build: returning (std::)move will otherwise be flagg… by @carlopi in #5991
- Fix issue 5675 by @samansmink in #6001
- Only copy relevant list children in column data collection by @taniabogatsch in #5982
- Many Parallel CSV Reader Fixes by @pdet in #5950
- [python] Use duck typing for arrow dataset by @changhiskhan in #5998
- [Import/Export] Exported databases can now be safely moved by @Tishj in #5965
- Add count_if as a macro function by @ashish01 in #6007
- Rcpp17 by @carlopi in #6022
- [Dev] Fix #6020: Fix failure of CI with pyarrow by @papparapa in #6023
- Format script: enforce same varargs formatting by @Mytherin in #6014
- fixed fsst issue with size calculation check by @samansmink in #6016
- Fix Node.js Windows CI jobs by @carlopi in #6041
- fix 5923 - convert float16 column in pandas to float32 column by @wordhardqi in #6028
- Add support for JDBC Metadata for the nested types List, Struct, Map by @jonathanswenson in #6029
- move block checksum from FileBuffer to BlockManager by @jkub in #6033
- Optimize SELECT UNNEST in lateral joins by @taniabogatsch in #6035
- Fix Python CI: numpy added before-build by @carlopi in #6056
- Track exact ART size and many ART improvements by @taniabogatsch in #5893
- Use back-up to download unixODBC by @carlopi in #6063
- Copy into partition by by @samansmink in #5964
- Add support for a pluggable storage and catalog back-end, and add support for a SQLite back-end storage by @Mytherin in #6066
- Various fixes by @carlopi in #6036
- out of tree extension improvements by @samansmink in #6049
- Fix checks on R-devel by @krlmlr in #6025
- Update editor config by @Tmonster in #6065
- Implement read_json and improve JSON parse errors by @lnkuiper in #5992
- [Python] Add read_csv method by @Tishj in #6015
- Add bar function by @papparapa in #5993
- Remove console.log from UDF catch by @chrisbrain in #6082
- Fix performance regression in read_csv_auto auto detection by @Mytherin in #6078
- Allowing lambdas in table functions by @taniabogatsch in #6039
- Add relational tests back by @Tmonster in #6038
- Force-enabling DEBUG_MOVE for debug builds by @hannes in #6099
- [Python] Make pyarrow.dataset optional by @Tishj in #6106
- Fuzzer issue #5984 no 25 by @LindsayWray in #6107
- [ParquetWriter] Prevent creating broken parquet files by @Tishj in #6104
- [Fuzzer] Fix issue related to dropping a generated column by @Tishj in #6113
- Implement md alias for motherduck and add motherduck to list of known extensions by @Mytherin in #6111
- Map extract bug by @LindsayWray in #6109
- [Dev] Fix some unqualified move's that snuck in by @Tishj in #6117
- Ccaching2 by @carlopi in #6101
- Making CSV Parallel tests more robust by @pdet in #6122
- [Julia] Fix execute deadlock by @Tishj in #6123
- [Python] Check overflow in DATE -> datetime conversion by @Tishj in #6125
- Python box rendering: limit rendering to 10K rows by @Mytherin in #6121
- Correctly setting the validity of constant struct vector references by @taniabogatsch in #6118
- Fix #6092 - retain casing for keywords by @Mytherin in #6112
- Fuzzer issue 9 and 40 from #5984 by @samansmink in #6126
- fix(python): fix gil error in fsspec integration by @Mause in #6140
- Fuzzer issue #5984 no.43. Substring generating an invalid string by @LindsayWray in #6139
- Remove sporadically failing Windows CI CSV reader test by @Mytherin in #6147
- issue 5984 #42 disable nan as random seed by @Tmonster in #6128
- [Python] Add read_parquet, to_parquet and to_csv by @Tishj in #6129
- Make ATTACH work over HTTP(S), and fix ATTACH for databases with custom types by @Mytherin in #6141
- Replace replacement_opens with storage_init by @Mytherin in #6132
- Fix to Fuzzer 5 item #30, plus various very marginal fixes by @carlopi in #6137
- Improve read_json transform errors and fix some read_json related bugs by @lnkuiper in #6145
- [Fuzzer] ArgMax Segfault by @Tishj in #6144
- Fix #6136: fix issue with SINGLE JOIN where NULL values of a struct were not correctly set by @Mytherin in #6148
- Fix fuzzer issue 35: correctly check overflows on casts from float/double to unsigned integers by @Mytherin in #6151
- [Fuzzer] Unset 'swizzled' flag in SortedData by @lnkuiper in #6143
- Introduces Bit type by @LindsayWray in #5990
- Fix issue related to NodeJS UDF not returning constant vectors by @Tishj in #5697
- 6055 column alias in where clause results in binder error by @Tmonster in #6162
- Skip concurrent index/grouping sets tests for now by @Mytherin in #6164
- Skip attach over HTTPFS test by @Mytherin in #6167
- 5982 (8, 12, 15) binder error when group by all & having clause both refer to column from correlated subquery by @Tmonster in #6163
- More minor fixes to warnings by @carlopi in #6138
- Fix fuzzer issue 14: correctly switch between deleting from transaction local storage and main table based on ids by @Mytherin in #6166
- fix(nodejs): error as object instead of string by @Mause in #6174
- Fixing Parallel CSV Reader over multiple files by @pdet in #6131
- Issue 6157 duckdbj database meta data supports like escape clause by @rpbouman in #6178
- Fuzzer issue: Grapheme function overflow by @LindsayWray in #6171
- Fix fuzzer issue 31 (again) by @lnkuiper in #6172
- Wiring storage_info into attach and create_transaction_manager calls by @rjatwal in #6161
- Try to auto-cast list_filter input and throw exception when failing by @taniabogatsch in #6119
- Pass unrecognized configuration options to storage by @Mytherin in #6177
- Art fuzzer issues by @taniabogatsch in #6168
- Fix Python deadlock - execute all PyRelations through the shared execute loop, and throw exception if Pandas Scan is called while GIL is held by @Mytherin in #6186
- No lambdas in CHECK constraint and generated columns by @taniabogatsch in #6190
- fixes #6159 by @rpbouman in #6183
- Fix lambda warning on building by @taniabogatsch in #6195
- Python: Make duckdb.sql return results for non-select queries in the form of a ValueRelation by @Mytherin in #6196
- Fix #6204: fix buffer management in ColumnDataRowCollection construction used in BoxRenderer by @Mytherin in #6205
- Python: make imports lazy, add .sql as an alias for .query, and add integration functions with polars by @Mytherin in #6181
- bugfix(python): fsspec file modes by @Mause in #6207
- Fix #6182: add DESCRIBE to the set of table name keywords by @Mytherin in #6206
- Fix #5983: avoid serializing type as part of numeric statistics (de)serialization by @Mytherin in #6197
- Fuzzer 16: Between type mismatch by @LindsayWray in #6194
- [Fuzzer] Fixes fuzzer issue 27 by @Tishj in #6193
- Fuzzer fixes 2, 3 and 5 of #5984 by @samansmink in #6187
- DuckDBJ: sanitize values of tableTypes argument in DatabaseMetadata.getTables() by @rpbouman in #6180
- Add more tests for fetch* functions, and add support for Pandas-style .describe() by @Mytherin in #6212
- More descriptive error message if we are using a table function as a scalar function by @Mytherin in #6201
- Make NumPy dependency optional by @Mytherin in #6215
- Implement FORMAT JSON for COPY/IMPORT/EXPORT by @lnkuiper in #6170
- Fix ODBC CI by @Mytherin in #6216
- [Python] Add read_json method by @Tishj in #6165
- [C-API] Add struct list_entry, ListVector::reserve and ListVector::set_size by @eddyxu in #6155
- Disable the Node build cache in the CI for now by @Mytherin in #6220
- [Java] BigDecimal scale > precision bug fix by @Tishj in #6110
- Fix #6044: in Value::DECIMAL, switch on the width instead of assuming the width is correctly set with the corresponding integer type by @Mytherin in #6219
- Fix #6184: skip unused column removal right after a filter with entries in the projection map by @Mytherin in #6221
- Bump postgres scanner by @hannes in #6226
- Add support for "show" to py relation objects by @Mytherin in #6224
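A few of the SQL-level additions listed above can be tried out roughly as follows. This is an illustrative sketch only: the table, view, and file names (`movies`, `movie_names`, `new_db.db`) are hypothetical and do not come from the pull requests themselves.

```sql
-- LATERAL join (#5393): the subquery may refer to columns of the preceding table.
SELECT * FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);

-- UPSERT (#5866): update the existing row when the primary key conflicts.
CREATE TABLE movies (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO movies VALUES (1, 'A New Hope');
INSERT INTO movies VALUES (1, 'The Phantom Menace')
    ON CONFLICT (id) DO UPDATE SET name = excluded.name;

-- Attaching a second database (#5764); 'new_db.db' is a hypothetical file name.
ATTACH 'new_db.db' AS new_db;
CREATE TABLE new_db.tbl (i INTEGER);
INSERT INTO new_db.tbl SELECT * FROM range(10);
DETACH new_db;

-- CREATE VIEW IF NOT EXISTS (#5682): a no-op if the view already exists.
CREATE VIEW IF NOT EXISTS movie_names AS SELECT name FROM movies;

-- starts_with (#5334) and count_if (#6007).
SELECT starts_with('Labradorius', 'Lab');
SELECT count_if(i % 2 = 0) FROM range(10) t(i);
```

The statements are meant to be run in order in a single session of the new version; the linked pull requests and the documentation remain the authoritative description of each feature.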
Full Changelog: v0.6.1...v0.7.0