duckdb 0.5.0 on Python PyPI

This preview release of DuckDB is named "Pulchellus" after the green pygmy goose (Nettapus pulchellus), which is native to Australia, where VLDB 2022 is starting today. Despite being called a "goose", it is actually a duck.

Binary builds are listed at the bottom of this post. Feedback is very welcome.

Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.

Below is a list of changes in this release:

Major Changes & Features

#4189: Implement Out-of-Core Hash Join and Re-Work Query Verification
#4022: ART Index Storage
#4274: Join Order Optimizer improvements
#4420: Logical Plan Serialization
#4137, #4347, #4293, #4190, #4178, #4177, #3954 & #4159: Scalability and performance improvements for Window operator
#4004: Add support for extensions to the parser, and add an example of this to the loadable extension demo
#4089: Signed Extensions
#4097 & #4211: Filename column + Hive partitioning support for Parquet Reader
#4501 & #4511: AArch64 Linux builds of CLI, shared library, JDBC & ODBC
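
With the Hive partitioning support from #4097 & #4211, the Parquet reader can turn `key=value` directory names in a file path into column values. The stdlib-only Python sketch below illustrates that naming convention under simple assumptions: the helper `parse_hive_partitions` is hypothetical (it is not DuckDB's implementation), only directory components are inspected, and values stay strings with no type inference.

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Return {key: value} for every 'key=value' directory in a Hive-style path.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    partitions: Dict[str, str] = {}
    # Only directory components carry partition values, so skip the file name.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        # Components without '=' (e.g. the top-level 'orders' folder) are ignored.
        if sep and key:
            partitions[key] = value
    return partitions


print(parse_hive_partitions("orders/year=2022/month=09/data_0.parquet"))
# {'year': '2022', 'month': '09'}
```

Under such a layout, each file would contribute `year` and `month` columns alongside the columns stored in the Parquet file itself.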

Minor Changes & Bug Fixes

#4594: [Map] Fix map_extract from multiple rows
#4585: Fix for R test instability (#4549)
#4560: Support all basic integer types in node API
#4558: [CPP-API] Comment no longer causes crash
#4552: [Fuzzer] Issue #4152 - Remove ToString roundtrip in query verification
#4543: Fixing silent assertions
#4542: Check if database is still alive when trying to connect for nodejs
#4541: fix for issue 4533
#4539: Parallelization independent of Arrow rows
#4524: Explicitly deleting default connection on js side
#4522: Correct architecture name for Linux aarch64
#4521: Adding correct substrait release tag to out-of-tree extension deployment
#4520: Added test cases for several fixed JDBC issues
#4516: Fix #4455, don't set default schema in transform
#4513: Issue 4502
#4510: [Casting] Varchar -> Decimal cast fix
#4507: [CSV] Fixed bug related to invalidated iterators
#4505: extension trigger event
#4504: fix: short-circuit hash and version discovery
#4496: [Fuzzer] Issue #4152 - Force no cross-product issue
#4495: Build ODBC driver binary for OSX
#4494: [Fuzzer] Issue #4152 - Analyze nonexistent column
#4493: Declare all variables for nodejs.
#4491: Issue #4419: Range Join Swizzling
#4488: Making the parquet extension loadable
#4484: fix: ignore status message from output of mypy stubs check
#4483: [Development bug] unittest result_helper.cpp triggers assertion
#4480: Remove REST server
#4479: Remove assertion
#4477: Removing Substrait From DuckDB Repo
#4474: WIP #4152
#4472: [Python] Removed mutable default parameters
#4470: Fix hidden merge conflict with fetchmany
#4465: [Python] fetchmany implemented
#4458: Issue #4454: VARCHAR/DATE Reversibility
#4448: Issue #3954: Pinned Heap Blocks
#4440: Added support for HUGEINT input type to BIT_COUNT scalar function
#4434: Python: Add PyRelation.fetchnumpy()
#4429: Allow indicating a format version that should be used to write/read from (De)serializer and use it for plans
#4427: Python: Improve docstrings for DuckDBPyRelation and DuckDBPyResult
#4418: Fix typo
#4416: Fix several update issues
#4413: Correctly schedule mix of union/child pipelines (again)
#4409: Increase timeout for coverage checks
#4405: Hybrid ART Leaf Part I
#4404: Add support for TS_MS, TS_NS, and TS_S
#4400: Issue #4388: DATE_TRUNC Low Precision
#4398: fix: correct object return types for arrow functions
#4395: Fix name of environment variable
#4390: Support UNION BY NAME set operation
#4383: Missing LISTs are NULL
#4382: Include PID in test directory name
#4380: R: Avoid translate_duckdb() in tests
#4377: R: Full BLOB support
#4372: Fix #4370: correctly handle non-flat vectors in list_sort
#4371: [Python] Changed all RuntimeErrors thrown in the Python client
#4368: Fixes issue #4365 - Not null constraint is no longer duplicated
#4364: Allow extra parameters in list_aggr to be passed in, as long as they are constant and only used during the bind
#4363: Fix for array_position with NaNs: use Equals::Operation instead of regular equality
#4362: Allow table functions to set cardinality stats through the C API - and utilize this in Julia DataFrame scans
#4359: Mark slow tests
#4355: Fix typo in exception text
#4354: R: Use preinstalled symbol
#4353: Shell: Add missing newline in help output
#4352: Tweak contributing guide [ci skip]
#4345: [Substrait] Pushing-down projections and filters to read relation
#4340: Correctly schedule pipeline dependencies when scheduling mix of UNION and FULL OUTER JOINs
#4336: feat: add basic json support to jdbc client
#4334: Bring ibis/substrait tests to a sane state
#4332: Fix Julia parallelism interleaving with the garbage collector, and expose Pending Query Result in C interface
#4328: Allow specifying a custom home directory using the SET home_directory option
#4327: [Aggregate] DISTINCT aggregates without GROUP BY are now executed in parallel
#4324: Fix #4309: fix for multiple foreign key constraints on the same table-table pair
#4323: Optimizer profiling
#4322: Print NOT operator correctly
#4319: feat: add missing node versions to CI
#4317: refactor: remove dead code in python client
#4316: R: Add rlang as suggested dependency
#4315: Column Data Collection, Arrow Result conversion rework, Cross Product performance fixes & more
#4312: R: Install tidy CLI tool
#4310: R: Add test for test_all_types()
#4304: Replace the numeric hash function with a better but slightly slower one
#4301: Add unit of measurement in timer function
#4300: Support root type on expressions #4278
#4298: Feature/nodejs client docs
#4297: fix: remove nodejs test focus
#4296: Avoid infinite loop in range(NULL)
#4294: #4276 Serializing data types on table schema in substrait
#4289: [Python/Pandas] fix +/- inf wrongly converting to NaN (NULL)
#4288: Fix fuzzer issue w.r.t. NULL values in generate_series
#4286: [Python - Relation] CreateView on a filtered relation does not cause infinite loop anymore
#4285: chore: remove cython constraint now that bug is fixed
#4284: Pandas timezone
#4283: Return errors from RecordBatchReader
#4280: R: Remove nycflights13 dependency
#4279: R: Don't export duckdb_explain()
#4277: feat: update setup.py links
#4272: Allow 0 as a seed parameter
#4266: R: Only quote non-syntactic and reserved words
#4265: Specialize LIST aggregate function implementation
#4263: R: Avoid attaching package during tests
#4259: Add ANY_VALUE agg function
#4256: Schedule child pipeline correctly
#4255: Disable ibis substrait tests for now
#4250: C API: Report appender error in case conversion fails
#4240: DELIM_JOIN now propagates statistics correctly
#4237: fix: pin cython to work around bug
#4236: Integer types now correctly increase width of DECIMAL type.
#4235: Parquet writer: Write dictionary_page_offset, and distinct_count for dictionary encoded strings/enum
#4234: Implement json_merge_patch and jsonlines output mode
#4233: feat: fix pandas types in docstrings/python types
#4230: Handle nulls in structs and lists
#4225: Add Jaro Winkler
#4215: Use right template for smallint
#4213: feat: update instructions for installing master builds in bug report template
#4212: Improve error message
#4210: PARQUET: Move StringColumnWriter dictionary to use string_t to avoid allocations
#4209: Remove unused PhysicalTypes
#4207: Disable GC during Julia execution to avoid internal GC deadlock in DataFrame scan
#4206: Fix #4202: in the comparison simplification optimizer, we can only shift the cast to the constant if both casts are invertible
#4199: feat: Use pip to install and uninstall python client
#4198: [C API] Implement clear bindings for prepared statements
#4197: feat: port bug_report.md to bug_report.yml
#4196: Fix RTTI issue across extension boundaries on OSX
#4192: Correctly call SetFilePointerEx on Windows so the truncate works as expected
#4191: Fix Expanded CI test case by adding swap space to test
#4188: ALTER SEQUENCE IF EXISTS fix
#4187: [Storage] FOR compression
#4185: ISSUE #3248 Support for ALTER TABLE altering columns NOT NULL
#4183: Julia multi-threading fix: avoid using a time-out to cancel threads in case there are no tasks
#4179: node: add async-iterator-based streaming
#4175: [CI] Python Build with Sanitizer
#4172: Update stubs test
#4168: Issue #4161: Create WindowExecutor
#4167: node: report memory usage to the node GC
#4166: Fix #4165: correctly fill in false_sel when performing comparison with constant null value
#4160: node: don't crash on syntax errors
#4154: Making date_trunc statistics handling consistent with date_part
#4153: Support for int64 round trips in R driver using the bit64 package
#4151: Fix orrify merge conflict
#4143: Correctly handle query parameters in JDBC
#4140: CI Fixes
#4139: Remove redundant code
#4138: Support struct.* to retrieve all struct fields in SELECT list
#4134: Fuzzer Fixes
#4133: Remove DUCKDB_API for deletes (for Windows/Zig)
#4132: [Python] project now correctly inherits owning references to PyObjects
#4131: Missing error messages
#4125: Fix Orrify rename merge conflicts
#4124: [Substrait] [Python] [R] Upgrade Substrait and introduce function to export query plan as a substrait - JSON
#4117: (Hopefully) fix extension signing on master
#4112: PARQUET: Add data pages encodings to their metadata
#4111: Fix off-by-one in plan cost regression test script
#4110: Rename Orrify -> ToUnifiedFormat, VectorData -> UnifiedVectorFormat, Normalify -> Flatten
#4108: ODBC: fixing multicolumn parameter binding
#4107: Refactor: rename simple aggregate to ungrouped aggregate
#4104: Support Parquet's RLE_DICTIONARY encoding for string columns
#4103: Ntile fixes
#4101: Some follow up fixes for extension signing
#4096: Implement ANALYZE
#4093: Support ORDER BY and LIMIT in correlated subqueries, and add support for the ARRAY(subquery) syntax
#4090: Fix for non varchar input for sequence functions
#4088: Fix Issue #3813 - fixedsize PyArrow List -> DuckDB conversion
#4083: JDBC Change getTimestamp to throw an error for wrong data types
#4080: Several parser improvements
#4076: Unentangle Parquet ColumnWriter and StandardColumnWriterState
#4075: feat(breaking): improve python exceptions
#4070: [JDBC] CachedRowSet support
#4069: Improve error messages of extension install
#4068: Fix bug with PhysicalStreamingWindow
#4065: Better handling of plus encoding in URLs
#4061: Fix #3991: use case_insensitive_map for headers
#4060: Null handling unification
#4059: Prepared Statement Verification & many prepared statement fixes
#4058: nodejs: use less memory in each
#4057: Fixed an error in comment
#4053: [R] [CI] Run arrow test single threaded to avoid wrong fp comparison
#4050: Bump sqlite scanner version
#4049: Remove need for locks in TPC-H dbgen
#4048: Test query profiler shouldn't output profiling info to the console
#4045: Making delayload flags dependent on whether we are NOT doing a static…
#4044: Issue #3593: avoid duplicate eliminating correlated columns in subqueries when they involve LIST columns
#4039: Making memory leak sanitizer happy with DuckDB Shell
#4035: Fix several memory-allocation related issues - use Allocator in many places, and reduce many allocations all over
#4033: Plan cost regression tests
#4032: Add missing python test dependencies
#4031: Fix issue 3989
#4012: Fix amalgamated build with multiple .cpp
#4011: Fix amalgamation script when --splits is used
#4009: EXPLAIN ANALYSE should honor profiler output format
#4005: Fix for #3997
#4002: fix fts/httpfs include directories
#3999: Include guard renaming for amalgamation export
#3996: Fix for issue #3951
#3990: Substrait Interface in R API
#3988: feat: implement DuckDBConnection#getSchema for JDBC
#3985: Pandas->DuckDB Series of dtype='O' conversion
#3982: Expose dbgen text buffer size as a parameter and Python Replacement Scans Leak fix
#3978: Enhance bound parameters error message
#3977: Adding alias part 2
#3973: Using aggregate input data for aggregate functions
#3971: Issue #3079: When installed system RAM cannot be determined, default to no memory limit
#3967: Use fmt library for Value::ToString of float/double types
#3965: Fix #3942: avoid converting + to space in httplib::decode_url
#3964: Add support for DATEFORMAT and TIMESTAMPFORMAT to COPY TO
#3963: Atomic extension install: use UUID in temp file
#3961: Fix #3960: avoid returning an error when a blob contains a NULL character in duckdb_append_blob
#3958: Fix #3955: correctly compute width/scale when combining decimal type of different width/scale
#3957: [Java] Implement appender support for all? UTF-8 characters 😜
#3953: Fix missing LIST type in duckdb_types
#3952: Windows FileExists regression fix: need to use _wstati64 instead of _wstat64i32
#3950: Atomic extension installation
#3945: Fuzzer #55: Remove Normalify Call
#3939: Issue #3937: Casting infinite times
#3928: Adding alias type struct and map
#3927: Fix failing TPC-E test
#3925: New Julia package requires 0.4 of DuckDB_jll
#3921: Retire LogicalTypeId::HASH and replace it with LogicalTypeId::UBIGINT
#3919: ODBC: SingleExecuteStmt and error message
#3918: Julia compat version
#3917: Ignore invalid UTF8 in fuzzer scripts
#3916: Julia Guidelines fix
#3915: Add duckdb_extensions function
#3914: Expanding jdbc deploy script to be able to automatically release, too
#3912: Julia UUID and version bump
#3911: Making universal builds of OSX Extensions
#3910: Fix for export of current_time, current_timestamp, etc functions
#3909: More fuzzer fixes
#3903: Issue #3881: DATE_TRUNC statistics
#3900: Add newlines at EOF
#3897: feat: add extension load/install methods to python client
#3882: Uncompressed string improvements
#3868: Bump yyjson version
#3867: Enable exporting macros
#3866: Add default for function NULL handling
#3864: [Python] Relation Explain
#3853: Feature/struct_insert function
#3814: Expose dbgen text buffer size as a parameter
#3694: List lambdas
#3618: Struct Types for Node.js UDFs
#3600: Issue #1466: added map_from_entries function
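
#4390 adds the UNION BY NAME set operation, which matches columns by name instead of by position. As a rough model of the semantics, the plain-Python sketch below combines two lists of row dicts: the result takes the left input's columns first, then any columns found only on the right, and fills a column a row lacks with `None` (standing in for NULL). It models the UNION ALL BY NAME variant (no duplicate elimination), assumes all rows within one input share the same columns, and the function name `union_all_by_name` is hypothetical; the DuckDB documentation remains the reference for the exact behavior.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def union_all_by_name(left: List[Row], right: List[Row]) -> List[Row]:
    """Concatenate two row lists, aligning values by column name (hypothetical helper)."""
    # Result schema: names in order of first appearance, so left columns come
    # first, followed by columns that only occur on the right.
    columns: List[str] = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Keep every row (UNION ALL semantics); missing columns become None (NULL).
    return [{name: row.get(name) for name in columns} for row in left + right]


cities = [{"city": "Amsterdam", "country": "NL"}]
weather = [{"city": "Sydney", "degrees": 14}]
for row in union_all_by_name(cities, weather):
    print(row)
# {'city': 'Amsterdam', 'country': 'NL', 'degrees': None}
# {'city': 'Sydney', 'country': None, 'degrees': 14}
```

Positional UNION would instead pair `country` with `degrees` here, which is the mismatch that matching by name avoids.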
