duckdb 0.7.0 on Python PyPI

This preview release of DuckDB is named "Labradorius" after the Labrador duck (Camptorhynchus labradorius), which was native to North America and went extinct in 1878 even though it reportedly tasted bad.

As with previous releases, @Mytherin has written a blog post covering the new features in this release.

Binary builds are listed at the bottom of this post. Please note that it can take a couple of hours until binary builds for all platforms and environments are available.

Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.

What's Changed

Use structs to avoid confusing C pointer wrappers by @krlmlr in #4961
Enum type added to the types metadata table by @LindsayWray in #5290
R: code format by @krlmlr in #5185
Add starts_with function and operator by @papparapa in #5334
Feature: Allow binary-formatted strings to be cast to integers by @Maxxen in #5337
For range joins use NL join when the LHS or RHS side is tiny by @Mytherin in #5399
Add support for LATERAL joins by @Mytherin in #5393
[Julia] Add support for consuming a UNION vector into a DataFrame by @Tishj in #5360
Issue #5314: At Time Zone by @hawkfish in #5341
Decimal values now round when the value given has more decimals than the scale of the target by @Tishj in #5362
Shell: add individual SQL queries to the history, instead of individual lines by @Mytherin in #5414
Shell: add support for history search by @Mytherin in #5415
Parallelise scanning result of ORDER_BY by @lnkuiper in #5403
Add translate function by @zhouliqi in #5212
Enable cmake to recognize AppleClang by @changhiskhan in #5432
Support enum_code() function by @lokax in #5408
Fix binder error and produce more informative error message. by @Tmonster in #5302
Parquet Reader: Re-use (de)compression and dictionary buffers and allocate powers of two by @Mytherin in #5445
Support RLE, DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY Parquet encodings by @Mytherin in #5457
print profiling output for deserialized logical query plans by @ila in #5448
Issue #5277: Sorted Aggregate Sorting by @hawkfish in #5456
Add internal flag to duckdb_functions, and correctly set internal flag for internal functions by @Mytherin in #5462
Add experimental R String passthrough support by @hannes in #5479
Issue #5258: Quantile Negative Fractions by @hawkfish in #5463
Arrow stream ingestion for JDBC client by @hannes in #5449
PER_THREAD_OUTPUT flag for COPY by @hannes in #5412
Feature: skip broken tests for now by @Mytherin in #5532
Add Union All support to R extension by @Tmonster in #5484
[Python] Add from_parquet features by @papparapa in #5492
Add ExtractStatements to C API by @LindsayWray in #5524
Improve http retry by @samansmink in #5549
Issue #5277: Sorted Aggregate Window by @hawkfish in #5571
Issue #5422: QUANTILE_DESC Decimals by @hawkfish in #5572
Issue #5559: 2022g Time Zones by @hawkfish in #5570
[Dev] Clean up of the python pkg folder structure by @Tishj in #5436
httpfs: check environment vars for AWS Credentials by @satotake in #5419
Misc union-type improvements by @Maxxen in #5617
Fix so Left inner join doesn't re-optimize nodes by @Tmonster in #5620
[Substrait] C API + from_substrait_json + bump on substrait version. by @pdet in #5613
Allow strings in ColumnDataCollection to be written to disk by @lnkuiper in #5543
[PythonDEV] Let clean.sh be run from anywhere, not just tools/pythonpkg by @Tishj in #5625
Reorganize Join order optimizer code by @Tmonster in #5621
[Catalog] Grab missing write_locks in a couple places by @Tishj in #5601
Parquet info to Substrait by @pdet in #5627
HTTP parquet optimizations by @samansmink in #5405
Adding delta compression to Bitpacking compression by @samansmink in #5491
[Python] Changed use of DuckDBPyConnection to shared_ptr by @Tishj in #5635
Merge feature branch into master by @Mytherin in #5645
[Python] Display progress bar by default in an interactive environment by @Tishj in #5596
Add support for RESET statement on configuration options by @Tishj in #5603
httpfs: Encode url path on request by @satotake in #5587
Fix broken CI because of RESET statement by @Tishj in #5671
Don't automatically set the bug label on issues by @Mytherin in #5680
Add support for CREATE VIEW IF NOT EXISTS by @Mytherin in #5682
Issue #5622: Validate Timezone Characters by @hawkfish in #5658
Issue 5630 fix. by @Tmonster in #5644
Adding COLUMN_TYPES option for read_csv_auto by @pdet in #5552
[Python] Get rid of DuckDBPyResult (merged functionality into DuckDBPyRelation) by @Tishj in #5597
feat: port nodejs tests to typescript by @Mause in #5632
Improve nodejs README by @Tishj in #5688
[Python] Add (partial) support for numpy.datetime64 objects by @Tishj in #5659
retry on all httplib errors by @samansmink in #5684
Return false if file doesn't exist by @Y-- in #5701
Adding context option to not run replacement scans and exporting namespace of json substrait function - R by @pdet in #5689
Issue #5609: Scope CTE Windows by @hawkfish in #5690
Attempt to fix random NodeJS CI failure by @Tishj in #5710
[Python] duckdb.execute() == duckdb.default_connection.execute() by @Tishj in #5650
NodeJS: switch to using package_build, and add support to BUILD_NODE to Makefile by @Mytherin in #5691
JDBC SNAPSHOT Jars by @hannes in #5687
Fix NodeJS 19 CI for Windows by @Tishj in #5719
Fix issue 5664 by @lokax in #5667
Issue #5712: CURRENT_TIMESTAMP and CURRENT_TIME by @hawkfish in #5713
[CSVReader] Catch a user error in supplying 'columns' option by @Tishj in #5721
Improve suggestions when LOAD of an extension fails by @Mytherin in #5722
doc(nodejs): amend arrow stream type docs by @Mause in #5731
Fix for TSV throwing during sniffing by @pdet in #5555
Statically link extensions on Linux with Clang by @jkub in #5653
[Python] Add support for named parameters by @Tishj in #5611
fix: nodejs source releases should be standalone by @Mause in #5734
build: don't install python from chocolatey by @Mause in #5740
fix: use non-string-splitting variable interpolation in binding.gyp.in by @Mause in #5745
Equalizing DBConfig constructors by @nicku33 in #5747
We should not treat replacement open paths as disk paths by @nicku33 in #5748
Allow table in-out functions to be used in correlated subqueries and as LATERAL queries by @Mytherin in #5485
Issue #5750: clangd std::move by @hawkfish in #5751
Always parallelize CSV reader when run over multiple files, and several other fixes by @Mytherin in #5757
Add C++ ODBC tests framework by @Mytherin in #5755
Fix #5730: document older DuckDB versions internally, and state which DuckDB version a specific file came from by @Mytherin in #5758
Add support for non-order preserving parallel writing to the CSV and Parquet writers by @Mytherin in #5756
Don't compute SHA if we allow unsigned extensions by @Y-- in #5760
Maintain BlockHandle of meta blocks by BlockManager by @Hzc492 in #5699
Imdb benchmark validation and benchmark improvements by @Tmonster in #5693
Add support for attaching multiple DuckDB Databases by @Mytherin in #5764
Fix #5744: Correctly read "compressed" flag in Parquet V2 header by @Mytherin in #5767
Fix issue 5646 by @lokax in #5652
Remove icu from ignored directories when formatted by @papparapa in #5765
Correctly throw an error when attaching over HTTPFS by @Mytherin in #5773
JDBC add getLong method for timestamp columns by @Jens-H in #5783
Issue #5776: ISO Year Corrections by @hawkfish in #5796
Issue #5669: Advance NULL Pointers by @hawkfish in #5793
Issue #5791: TIMESTAMP/TIMESTAMPTZ Casting by @hawkfish in #5801
Issue #4121: INTERVAL List Search by @hawkfish in #5805
Fix incorrect file name in icu-timezone.hpp comment by @papparapa in #5784
Fully Qualified s3url request with globs by @LindsayWray in #5774
Map restructure by @LindsayWray in #5768
Issue #5806: Count Star Window by @hawkfish in #5810
S3 uploader fixes by @samansmink in #5769
fix for FSST segfault by @samansmink in #5824
Issue: #5717: SetValue TIMESTAMP Case by @hawkfish in #5804
Remove unnecessary code modifying the validity mask of the child vectors of a struct by @Mytherin in #5844
Add date part specifier synonyms by @papparapa in #5845
Add non ICU time_bucket function by @papparapa in #5835
Fix list_sort segmentation fault regression by @taniabogatsch in #5823
Add support for specifying timestamp precision using standard modifiers by @Mytherin in #5848
Fix #5836: generate unique oid for attached databases as well by @Mytherin in #5851
Fix #5782 and #5794: in strict mode do not accept leading zeros when parsing numbers by @Mytherin in #5850
Fix #5781: add missing flatten call to list_aggregate by @Mytherin in #5854
Fix #5788: improve error message when referencing an alias that contains a subquery (not supported yet) by @Mytherin in #5855
Fix #5853: fully qualify function names inside macros during the binding process by @Mytherin in #5856
CSV Auto Detection: disallow leading + when parsing numbers in strict mode by @Mytherin in #5857
Additional quote handling for string to list cast by @LindsayWray in #5859
Parser: add support for unicode space characters by @Mytherin in #5858
Adding std:: to every move by @hannes in #5873
MinGW Warning Fixes for R by @Mytherin in #5881
Issue #5887: ICU DateAdd Overflow by @hawkfish in #5888
NodeJS replacement scans by @whscullin in #5825
Further parallelize index creation by @taniabogatsch in #5812
Added the 'GetExpectedParameterTypes' method to the PreparedStatement… by @AlexR2D2 in #5792
Extend GenericExecutor by @Maxxen in #5863
Add signbit function for floating point values by @carlopi in #5862
Nested/Outer lambda parameters in rhs of inner lambda expressions by @taniabogatsch in #5860
Issue #5826: ICUDateFunc SubtractField Fix by @hawkfish in #5869
Reset schema setting when the default schema is dropped by @Tishj in #5874
Issue #5870: Nested ArgMinMax Results by @hawkfish in #5879
Fix #5779: Parquet writer - when writing lists write only the required subsection of a child entry to the Parquet file by @Mytherin in #5875
Change AddString to AddBlob in update_segment.cpp by @Maxxen in #5837
Support UNION_BY_NAME option in parquet_scan read_parquet by @douenergy in #5716
Minor fixes for cran by @hannes in #5904
[Julia] Use best practices for locking strategies by @Tishj in #5905
More GCC 13 issues by @hannes in #5907
read_csv_auto column_types improvements by @Mytherin in #5911
feat: add parser support for CREATE DATABASE to allow extensions to provide the functionality by @stephaniewang526 in #5898
Fix #5903: ICU addition overflow by @papparapa in #5908
Export set operations to relational API by @Tmonster in #5872
Join Order + EXPLAIN Improvements by @lnkuiper in #5891
fixed bug in generating grammar script by @ila in #5917
Fix implicit conversion by @carlopi in #5892
Issue 5660 - do not allow unnest with alias in groupby by @Tmonster in #5918
Parquet: correctly output TIMESTAMP_TZ type when isAdjustedToUTC is set by @Mytherin in #5916
Parquet reader: Fix an issue reading boolean values that cross column pages by @Mytherin in #5926
Allow hive columns to be present in parquet files by @samansmink in #5901
Fix #5936 - in the Pragma parser, avoid calling ToString() on column references because it might add quotes to keywords by @Mytherin in #5939
Add option to limit parallel compile by @ashish01 in #5935
Avoid writing .tmp file when redirecting stdout to a file by @Mytherin in #5930
[Macro] Remove limitation for types of expressions accepted as named arguments by @Tishj in #5876
Benchmarks by @carlopi in #5942
Fix non-deterministic test failure of parquet_scan by @papparapa in #5954
[Binder] Throw exception for aggregate function modifiers applied to non-aggregate functions by @lnkuiper in #5951
Add logical plan serialization for LOAD, DROP and ALTER by @ywelsch in #5934
Fix trainbenchmark non-determinism adding ordering by @carlopi in #5952
Add missing include to duckdb.hpp by @Tishj in #5953
httpfs: remove unneeded include by @carlopi in #5945
[jemalloc] Detect LG_PAGE by @lnkuiper in #5949
String to map cast by @LindsayWray in #5838
Add time bucket function by @papparapa in #5665
[Python] Add UNION_BY_NAME to from_parquet arguments by @papparapa in #5913
Ci partial rework by @carlopi in #5943
Issue #3423: Positional Join Operator by @hawkfish in #5867
docs: add nodejs connection args to docs by @tshauck in #5780
[SQL Logic Test] Add support for environment variables by @Tishj in #5877
Ci: scope / remove env variables by @carlopi in #5967
Restore node-pre-gyp credentials by @carlopi in #5970
Python: Allow replacement scans on pyrelations, move to BoxRenderer and use pending query API for relations by @Mytherin in #5962
Issue #5023: Window Radix Partitions by @hawkfish in #5909
Update copyright year by @sjaenick in #5974
Fix #5968: ignore repetition type of root schema by @Mytherin in #5969
Fix #5971: addition and subtraction on infinity of TIMESTAMPTZ by @papparapa in #5978
Regression ci by @carlopi in #5961
Add support for UPSERT (INSERT .. ON CONFLICT DO ..) syntax by @Tishj in #5866
Removing Arrow ABI Testing by @pdet in #5980
Remove LogicalTypeId::JSON and implement read_json_objects by @lnkuiper in #5544
Make sqlsmith extension compile by @PedroTadim in #5963
Issue #5023: Radix Partition Cardinality by @hawkfish in #5989
Added UUID case to GetTypeToPython by @maclockard in #5885
Export right left and full joins by @Tmonster in #5822
Decimal separator option for CSV reader by @eeroel in #5958
feat(python): fsspec filesystems by @Mause in #5829
Fix amalgamation build: returning (std::)move will otherwise be flagg… by @carlopi in #5991
Fix issue 5675 by @samansmink in #6001
Only copy relevant list children in column data collection by @taniabogatsch in #5982
Many Parallel CSV Reader Fixes by @pdet in #5950
[python] Use duck typing for arrow dataset by @changhiskhan in #5998
[Import/Export] Exported databases can now be safely moved by @Tishj in #5965
Add count_if as a macro function by @ashish01 in #6007
Rcpp17 by @carlopi in #6022
[Dev] Fix #6020: Fix failure of CI with pyarrow by @papparapa in #6023
Format script: enforce same varargs formatting by @Mytherin in #6014
fixed fsst issue with size calculation check by @samansmink in #6016
Fix Node.js Windows CI jobs by @carlopi in #6041
fix 5923 - convert float16 column in pandas to float32 column by @wordhardqi in #6028
Add support for JDBC Metadata for the nested types List, Struct, Map by @jonathanswenson in #6029
move block checksum from FileBuffer to BlockManager by @jkub in #6033
Optimize SELECT UNNEST in lateral joins by @taniabogatsch in #6035
Fix Python CI: numpy added before-build by @carlopi in #6056
Track exact ART size and many ART improvements by @taniabogatsch in #5893
Use back-up to download unixODBC by @carlopi in #6063
Copy into partition by by @samansmink in #5964
Add support for a pluggable storage and catalog back-end, and add support for a SQLite back-end storage by @Mytherin in #6066
Various fixes by @carlopi in #6036
out of tree extension improvements by @samansmink in #6049
Fix checks on R-devel by @krlmlr in #6025
Update editor config by @Tmonster in #6065
Implement read_json and improve JSON parse errors by @lnkuiper in #5992
[Python] Add read_csv method by @Tishj in #6015
Add bar function by @papparapa in #5993
Remove console.log from UDF catch by @chrisbrain in #6082
Fix performance regression in read_csv_auto auto detection by @Mytherin in #6078
Allowing lambdas in table functions by @taniabogatsch in #6039
Add relational tests back by @Tmonster in #6038
Force-enabling DEBUG_MOVE for debug builds by @hannes in #6099
[Python] Make pyarrow.dataset optional by @Tishj in #6106
Fuzzer issue #5984 no 25 by @LindsayWray in #6107
[ParquetWriter] Prevent creating broken parquet files by @Tishj in #6104
[Fuzzer] Fix issue related to dropping a generated column by @Tishj in #6113
Implement md alias for motherduck and add motherduck to list of known extensions by @Mytherin in #6111
Map extract bug by @LindsayWray in #6109
[Dev] Fix some unqualified move's that snuck in by @Tishj in #6117
Ccaching2 by @carlopi in #6101
Making CSV Parallel tests more robust by @pdet in #6122
[Julia] Fix execute deadlock by @Tishj in #6123
[Python] Check overflow in DATE -> datetime conversion by @Tishj in #6125
Python box rendering: limit rendering to 10K rows by @Mytherin in #6121
Correctly setting the validity of constant struct vector references by @taniabogatsch in #6118
Fix #6092 - retain casing for keywords by @Mytherin in #6112
Fuzzer issue 9 and 40 from #5984 by @samansmink in #6126
fix(python): fix gil error in fsspec integration by @Mause in #6140
Fuzzer issue #5984 no.43. Substring generating an invalid string by @LindsayWray in #6139
Remove sporadically failing Windows CI CSV reader test by @Mytherin in #6147
issue 5984 #42 disable nan as random seed by @Tmonster in #6128
[Python] Add read_parquet, to_parquet and to_csv by @Tishj in #6129
Make ATTACH work over HTTP(S), and fix ATTACH for databases with custom types by @Mytherin in #6141
Replace replacement_opens with storage_init by @Mytherin in #6132
Fix to Fuzzer 5 item #30, plus various very marginal fixes by @carlopi in #6137
Improve read_json transform errors and fix some read_json related bugs by @lnkuiper in #6145
[Fuzzer] ArgMax Segfault by @Tishj in #6144
Fix #6136: fix issue with SINGLE JOIN where NULL values of a struct were not correctly set by @Mytherin in #6148
Fix fuzzer issue 35: correctly check overflows on casts from float/double to unsigned integers by @Mytherin in #6151
[Fuzzer] Unset 'swizzled' flag in SortedData by @lnkuiper in #6143
Introduces Bit type by @LindsayWray in #5990
Fix issue related to NodeJS UDF not returning constant vectors by @Tishj in #5697
6055 column alias in where clause results in binder error by @Tmonster in #6162
Skip concurrent index/grouping sets tests for now by @Mytherin in #6164
Skip attach over HTTPFS test by @Mytherin in #6167
5982 (8, 12, 15) binder error when group by all & having clause both refer to column from correlated subquery by @Tmonster in #6163
More minor fixes to warnings by @carlopi in #6138
Fix fuzzer issue 14: correctly switch between deleting from transaction local storage and main table based on ids by @Mytherin in #6166
fix(nodejs): error as object instead of string by @Mause in #6174
Fixing Parallel CSV Reader over multiple files by @pdet in #6131
Issue 6157 duckdbj database meta data supports like escape clause by @rpbouman in #6178
Fuzzer issue: Grapheme function overflow by @LindsayWray in #6171
Fix fuzzer issue 31 (again) by @lnkuiper in #6172
Wiring storage_info into attach and create_transaction_manager calls by @rjatwal in #6161
Try to auto-cast list_filter input and throw exception when failing by @taniabogatsch in #6119
Pass unrecognized configuration options to storage by @Mytherin in #6177
Art fuzzer issues by @taniabogatsch in #6168
Fix Python deadlock - execute all PyRelations through the shared execute loop, and throw exception if Pandas Scan is called while GIL is held by @Mytherin in #6186
No lambdas in CHECK constraint and generated columns by @taniabogatsch in #6190
fixes #6159 by @rpbouman in #6183
Fix lambda warning on building by @taniabogatsch in #6195
Python: Make duckdb.sql return results for non-select queries in the form of a ValueRelation by @Mytherin in #6196
Fix #6204: fix buffer management in ColumnDataRowCollection construction used in BoxRenderer by @Mytherin in #6205
Python: make imports lazy, add .sql as an alias for .query, and add integration functions with polars by @Mytherin in #6181
bugfix(python): fsspec file modes by @Mause in #6207
Fix #6182: add DESCRIBE to the set of table name keywords by @Mytherin in #6206
Fix #5983: avoid serializing type as part of numeric statistics (de)serialization by @Mytherin in #6197
Fuzzer 16: Between type mismatch by @LindsayWray in #6194
[Fuzzer] Fixes fuzzer issue 27 by @Tishj in #6193
Fuzzer fixes 2, 3 and 5 of #5984 by @samansmink in #6187
DuckDBJ: sanitize values of tableTypes argument in DatabaseMetadata.getTables() by @rpbouman in #6180
Add more tests for fetch* functions, and add support for Pandas-style .describe() by @Mytherin in #6212
More descriptive error message if we are using a table function as a scalar function by @Mytherin in #6201
Make NumPy dependency optional by @Mytherin in #6215
Implement FORMAT JSON for COPY/IMPORT/EXPORT by @lnkuiper in #6170
Fix ODBC CI by @Mytherin in #6216
[Python] Add read_json method by @Tishj in #6165
[C-API] Add struct list_entry, ListVector::reserve and ListVector::set_size by @eddyxu in #6155
Disable the Node build cache in the CI for now by @Mytherin in #6220
[Java] BigDecimal scale > precision bug fix by @Tishj in #6110
Fix #6044: in Value::DECIMAL, switch on the width instead of assuming the width is correctly set with the corresponding integer type by @Mytherin in #6219
Fix #6184: skip unused column removal right after a filter with entries in the projection map by @Mytherin in #6221
Bump postgres scanner by @hannes in #6226
Add support for "show" to py relation objects by @Mytherin in #6224

Full Changelog: v0.6.1...v0.7.0
