duckdb 0.10.2 on Python PyPI

This is a bug fix release for various issues discovered after we released 0.10.1. There are no new features, just bug fixes. Database files created by DuckDB v0.10.* or v0.9.* can be read by DuckDB v0.10.2.

SQL Modifications

This release has a number of bug fixes that change SQL semantics in a few edge cases:

Nested Boolean Comparisons now have consistent NULL comparison semantics - #11496
Structs with non-matching keys require explicit casts when compared or combined - #11396

What's Changed

Bump julia version & fix release script for sub-versions > 9 by @Mytherin in #11225
Flatten Rewrite by @maiadegraaf in #11223
ORDER BY ColumnNumber with Collations by @tiagokepe in #11139
Fix differences to implementation for to_parquet, write_parquet, to_csv, write_csv, Expression.alias, DuckDBPyRelation.map by @binste in #11135
Issue template: Ask for MWEs by @szarnyasg in #11192
Cleaning up FSST: Remove unused AVX512 code by @hannes in #11222
Fix #11211 - correctly fill in string_t padding for bit type by @Mytherin in #11231
Fix #3391: Stop creating background threads if the thread constructor throws an exception by @Mytherin in #11236
R_CMD_CHECK: Pin to duckdb/duckdb-r 0ed106a71c by @carlopi in #11245
Add support for HEX(BLOB) by @Mytherin in #11243
Remove no_vector_verification in Map Subscript Test by @maiadegraaf in #11242
Update logos in README by @szarnyasg in #11256
Ignore user defined parameters that change names or types of csv columns in sniffer's prompt. by @pdet in #11257
[Python] Fix error caused by looking up a TypeCatalogEntry without an active transaction. by @Tishj in #11255
[Fix] Fuzzer issue in list_select by @taniabogatsch in #11248
[Parquet] Support for LZ4 Compression by @hannes in #11220
Fix #11254: Add missing includes to terminal by @Mytherin in #11265
Issue #10867: AsOf Predicate Pushdown by @hawkfish in #11233
Fix plan cost runner regression script by @Tmonster in #11129
Check if we need to throw any remaining errors at end of CSV scanning by @pdet in #11276
Allow duplicate names in json objects when ignore_errors is true by @lnkuiper in #11271
Do not surround JSON with quotes in sqlite shell output by @lnkuiper in #11268
add TRIM support to virtual filesystem, and implementation on linux by @jkub in #11258
Perform direct write operation if input data are larger than buffer size by @quentingodeau in #11203
Fuzzer fixes by @lnkuiper in #11286
Compile spatial also for rtools by @carlopi in #11291
allow injecting custom BufferManager implementation by @jkub in #11215
Default to RECORDS in JSON reader if more than one column is specified by @lnkuiper in #11295
Add support for materialized CTEs in INSERT/UPDATE/DELETE statements by @kryonix in #10878
Only throw exception if je_mallctl fails in DEBUG mode by @lnkuiper in #11303
Fixing casting issue in generators by @hannes in #11304
Rework FileSystem::OpenFile call, and add FILE_FLAGS_NULL_IF_NOT_EXISTS by @Mytherin in #11297
Fix potential UB when list() aggregate is used in combination with other arena using aggregate functions by @Maxxen in #11306
Fix #11293 - for ARRAY([subquery]) explicitly push the ORDER BY of the underlying subquery into the array aggregate by @Mytherin in #11316
Fix #11281: explicitly select column types of information_schema tables for all columns, even if they are always NULL by @Mytherin in #11317
Fixup py upload by @carlopi in #11308
Issue #11279: TIMESTAMP => TIMESTAMPTZ by @hawkfish in #11320
Fix null pointer exception when rolling back updates if the rollback was caused by an OOM by @Mytherin in #11309
Fix #11283 - report consistent foreign key constraint name in information_schema by @Mytherin in #11318
Fix #11294 - avoid applying Filter Pushdown optimization for UNION/EXCEPT without ALL by @Mytherin in #11315
Fix #10695 - handle ? prepared statement parameters correctly for POSITION(x IN y) by @Mytherin in #11314
Windows CLI - emit UTF8 directly using SetConsoleOutputCP(CP_UTF8) if possible by @Mytherin in #11324
Fix #11319: use modulo when computing day of the week in excel extension by @Mytherin in #11328
[CI] Fix bash syntax in TwineUpload by @carlopi in #11333
Fix #11284: avoid adding the same column multiple times to a primary key/unique constraint name list by @Mytherin in #11325
In ColumnData, limit scan to the current count in the column by @Mytherin in #11329
Issue #11269: DISTINCT Sorted Aggregates by @hawkfish in #11321
[Attach] Fix bug causing sequences to break attaching databases. by @Tishj in #11327
Flatten hash vector before combining list hashes by @lnkuiper in #11340
Make sniffer more consistent when nullpadding/ignore_errors are on by @pdet in #11313
fix(arrow): union buffer count & handle schema errors by @Mause in #11326
fix duckdb-r script by @Tmonster in #11345
Fix regression_test_runner.py by @carlopi in #11346
Issue #11234: IEJoin Scan Reset by @hawkfish in #11347
TPC-H: Use BIGINT for ID fields schema where required by the specification by @szarnyasg in #11341
Another round of polishing staged releases by @carlopi in #11342
CI: Remove issue labeling workflow by @szarnyasg in #11355
RE2 upgrade to version 2023-02-01 by @hannes in #11252
File System: Add optional_ptr<FileOpener> to various calls, and add support for attaching DuckDB files over S3 by @Mytherin in #11343
README: Display different logo for light/dark mode by @szarnyasg in #11366
Fix bug in duckdb_bind_blob by @pfarndt in #11368
Fix OSX CI by @samansmink in #11379
Enable clang-tidy on headers and fix all headers to conform to our clang-tidy rules by @Mytherin in #11376
Add logical_type to parameters of format_pg_type by @Flogex in #11369
Issue #10965: RESPECT IGNORE NULLS by @hawkfish in #11372
Fix building issues in WIN32, remove unnecessary modification. by @kindred77 in #11356
Zero-initialize aggregate states with destructors immediately after allocating by @lnkuiper in #11360
Update README.md by @jingshi-ant in #11357
Issue #10885: Negative Window RANGEs by @hawkfish in #11390
Issue #11377: Invertible TIMESTAMP_XXX Casts by @hawkfish in #11392
Update init.py To export "extract_statements" function by @oomojola in #11394
Internal #1657: Stricter STRUCT Casts by @hawkfish in #11396
allow set readonly on attached db by @stephaniewang526 in #11397
Give preference to FSSPEC defined FS by @pdet in #11400
Default serialize optional_idx, add skip_default option to json_serialize_sql() by @Maxxen in #11405
CI: Also label PRs as 'stale' and close them when there's no activity by @szarnyasg in #11420
fix(jdbc): 1-index getBytes() by @Mause in #11421
Remove redundant default descriptions by @szarnyasg in #11415
clang-tidy: enable cppcoreguidelines-pro-type-const-cast by @Mytherin in #11414
clang-tidy: enable cppcoreguidelines-avoid-non-const-global-variables by @Mytherin in #11424
Issue #11419: Quantile Order By by @hawkfish in #11428
[CSV Sniffer] Give preference to quoted candidates by @pdet in #11418
clang-tidy: enable cppcoreguidelines-virtual-class-destructor by @Mytherin in #11437
clang-tidy: enable cppcoreguidelines-[interfaces-global-init|slicing|rvalue-reference-param-not-moved] by @Mytherin in #11435
Fix #11393 - improve error message when trying to use a lateral join column in a table function that does not support it by @Mytherin in #11436
add readonly to duckdb_databases() by @stephaniewang526 in #11429
Fix missing opener propagation by @quentingodeau in #11454
Fix #11246: Use SetConsoleCP function to set input to UTF8 when reading by @Mytherin in #11452
CLI: Add support for ".edit" or "\e" by @Mytherin in #11447
Fix VS2022 Preview ClangCl build by @bodand in #11456
Remove an unnecessary line from bind_insert.cpp by @huachaohuang in #11443
[CI] Skip ccache for R.yml by @carlopi in #11459
Improve binding of CTEs by @kryonix in #11399
Move BindCreateIndex from Catalog to Binder by @philippmd in #11402
[Substrait-ADBC] Fix for substrait plan execution via ADBC by @pdet in #11358
Removing abort() from RE2 again because Google refuses to use exceptions by @hannes in #11458
Defer allocation in read_json by @lnkuiper in #11378
[ODBC] Add escape character to ParseStringFilter to support Power Query ('table_name' is escaped to 'table_name') by @guenp in #11432
Bump to post-portfile change for duckdb_azure by @carlopi in #11476
Reduce memory usage of DELETE operations by @Mytherin in #11470
Use optional_idx in more places by @Mytherin in #11466
Revert "Move BindCreateIndex from Catalog to Binder" by @Mytherin in #11478
[Arrow] Throw on invalid STRUCT type by @Tishj in #11464
[Dev] Do not use CatalogEntry references inside Dependency objects. by @Tishj in #11408
Fix extension builds by @carlopi in #11486
[Fix] Throw BinderException for UNNEST expressions in WINDOW expressions by @taniabogatsch in #11247
Check for IUTF8 flag defined before setting it by @patmaddox in #11488
Fix #11445: correctly detect recursive aliases when using struct unnest by @Mytherin in #11497
Fix #11444: avoid using recursion in string -> list parsing by @Mytherin in #11498
Add serialization for LogicalCopyDatabase operator by @Flogex in #11401
add support in Julia appender for missing and nothing values by @rdavis120 in #11508
[Python] Produce datetime.time values when converting TIME columns to Pandas DataFrame by @Tishj in #11468
[Fix][ADBC] Implement required ADBCConnectionGetObjects schema by @joellubi in #11446
Add support for decimal modulo operation by @Mytherin in #11506
Move CompressedMaterialization inside of StatisticsPropagator by @lnkuiper in #11495
Bump stale bot version by @szarnyasg in #11509
Rework issue workflow by @Mytherin in #11522
[RE2] Add includes and remove potential throw from destructor by @carlopi in #11513
Issue #11292: Nested Boolean Compares by @hawkfish in #11496
[Dev] Initialize new buffers with garbage data if DESTROY_UNPINNED_BLOCKS is set by @Tishj in #11270
Fix timeout in async workflow by @samansmink in #11525
Move assertion in json_scan.cpp by @lnkuiper in #11530
Issue #11518: TryParseTime by @hawkfish in #11519
Fuzzer Bugfixes by @Maxxen in #11544
[CI] Fix CI failure on C Enum Integrity Check by @Tishj in #11547
[ICU] Use the correct lookup precedence for TimeZone settings by @Tishj in #11546
[CI] Move from default GITHUB_TOKEN to specific one by @carlopi in #11556
[CI] Fix Deploy step to execute only for duckdb organization by @carlopi in #11553
Rework RadixPartitionHashTable task assignment in source phase by @lnkuiper in #11528
Run new micro benchmarks in CI when they are added by @Tmonster in #11532
Rework vector_hash for ARRAYs by @Maxxen in #11558
[Dev] Add assertions around Uncompressed String storage by @Tishj in #11267
python: Add missing global options to write_csv by @jzavala-gonzalez in #10382
[Python] Fix issue with lists containing dictionaries of different sizes by @Tishj in #11095
[Dev][Python] Add nightly test to execute all sqllogic tests using the python package by @Tishj in #11137
Parquet Writer: Early out creating dictionary by @lnkuiper in #11461
ODBC driver should ignore "driver" and "trusted_connection" keywords in connection string by @guenp in #11382
[ODBC] Fix: Support loading UTF-8 encoded data with Power BI by @guenp in #11423
Draft permissions - bot does not have permission for drafting by @Mytherin in #11575
CI: Remove 'needs reproducible example' when 'reproduced' label is applied by @szarnyasg in #11576
Various fixes & clean-up around STRUCT UNNEST by @Mytherin in #11580
Update token by @Mytherin in #11592
Update issue template by @szarnyasg in #11577
[CI] Remove GITHUB_PAT variable from R-CMD-check by @carlopi in #11593
Respect read-only mode in dbgen and dsdgen by @Mytherin in #11585
Bump-back duckdb_azure to pre-lzma custom vcpkg-port by @carlopi in #11595
Correctly handle database names with quotes in USE statement by @Mytherin in #11587
Bump postgres version and build arrow also for windows by @carlopi in #11604
Support reading gzipped files in the test runner by @chrisiou in #11600
initializes unknown indexes on catalog lookup by @Maxxen in #11551
Fix Progress Bar for many large CSV Files + Adjustment to not store buffers from compressed files over single threaded scans by @pdet in #11273
CSV Rejects Tables 2.0 by @pdet in #11512
Fix topn placement by @Tmonster in #11601
Fix various issues found by oss-fuzz by @Mytherin in #11613
[ODBC] Fix: timestamps and times are parsed as dates by Power Query by @guenp in #11610
Fix various fuzzer issues, move fuzzer scripts into this repo, and expand reduce_sql_statement to improve test case reduction capabilities of fuzzer by @Mytherin in #11622
[Dev] Make the extension_entries.hpp generation script more modular by @Tishj in #11623
[Fix][ADBC] Don't filter system catalogs/schemas in ConnectionGetObjects by @joellubi in #11618
Add pyodide wheel building github action by @cpcloud in #11531
Move away from dynamic_cast to Cast<> infrastructure by @carlopi in #11619
Extension Metadata by @carlopi in #11515
[Dev] Regenerate query string for IndexCatalogEntry. by @Tishj in #11462
Upload pyodide by @carlopi in #11626
Add docker alpine build to check on builds by @carlopi in #11490
Add Vector Similarity Search (VSS) Extension by @Maxxen in #11614
Metadata fix by @carlopi in #11629
Fix extension config for arrow, remove patch from sqlite by @carlopi in #11628
[CSV Reader] Resets the buffer manager over recursive scans by @pdet in #11631
Make path to append_metadata.cmake relative to top-level CMakeLists.txt by @Flogex in #11635
[CSV Reader] Fixes an issue with conflicting strategies for buffer cleaning by @pdet in #11630
Fix more issues found by the fuzzer, extend SQL reduction further by @Mytherin in #11642
fix(jdbc): support non-string parameter types by @Mause in #11646
Few more fuzzer fixes by @Mytherin in #11648
Bump spatial by @Maxxen in #11650
Avoid performing Apple codesign on extensions by @carlopi in #11652
Filter out single relation predicates before join ordering by @wangxiaoying in #11645
Fix last_value in the duckdb_sequences metadata function by @Tishj in #11465
Limit batch insert threads based on available memory, similar to Parquet write by @Mytherin in #11655
[Vacuum] Fix serialization and Copy of the VacuumStatement by @Tishj in #11656
More index initialization by @Maxxen in #11659
Skip tests with the unzip keyword in python and disable unzip.test for 32bit systems by @chrisiou in #11658
Bump extension versions, remove patches by @carlopi in #11662
Accept a list of multiple nullstring values for CSV Files by @pdet in #11616
Include falloc to fix build on some Linux systems by @zmbc in #11663
Fix #11469 - make unnest parameters case-insensitive by @Mytherin in #11667
Fix #11467: correctly merge unnamed structs and structs in CombineEqualTypes by @Mytherin in #11668
Skip ADBC tests if python version is not 3.9 or higher by @pdet in #11653
Fix #11621 - correctly zero-initialize padding bits in bitpacking compression by @Mytherin in #11671
Fix #11542 - correctly check if a column data segment has updates, and clean up the updates by @Mytherin in #11670
Make UNION BY NAME also use ForceMaxLogicalType, similar to UNION by @Mytherin in #11665
Fix extension_version propagation for external extensions by @carlopi in #11672
Allow decimal type in CSV auto_type_candidates option by @pdet in #11675
Fix #11484: support constant indexes in ARRAY - e.g. ARRAY(SELECT .. ORDER BY 1) by @Mytherin in #11674
Improve hive type auto-casting so that it looks at all files instead of only the first file by @Mytherin in #11676
Fix #11669: deduplicate column names in pivot correctly by @Mytherin in #11678
Disable setting console pages by default, and add .utf8 setting by @Mytherin in #11682
bump vss, handle reverting append when index is unknown by @Maxxen in #11681

Full Changelog: v0.10.1...v0.10.2

duckdb 0.10.2 v0.10.2 Bugfix Release on Python PyPI

SQL Modifications

What's Changed

duckdb 0.10.2
v0.10.2 Bugfix Release

on Python PyPI