aws/aws-sdk-pandas 3.0.0rc3 on GitHub

What's Changed

Breaking changes:

breaking change: Move dependencies to optional by @jaidisido in #1992
breaking change: Use ExecuteStatement instead of Scan for DynamoDB read_partiql by @jaidisido in #1964
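With #1964, `read_partiql` pages through results with DynamoDB's `ExecuteStatement` API instead of `Scan`. The sketch below shows the general `NextToken` pagination pattern against boto3's low-level DynamoDB client. It is an illustration under assumptions, not the awswrangler implementation, and the helper name `execute_partiql` is hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional


def execute_partiql(
    client: Any,
    statement: str,
    parameters: Optional[List[Dict[str, Any]]] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield every item returned by a PartiQL statement.

    `client` is assumed to be a boto3 DynamoDB client, e.g. boto3.client("dynamodb").
    """
    kwargs: Dict[str, Any] = {"Statement": statement}
    if parameters:
        # Positional values for `?` placeholders, in DynamoDB attribute-value format
        kwargs["Parameters"] = parameters
    while True:
        response = client.execute_statement(**kwargs)
        # Items come back in DynamoDB's low-level attribute-value format
        yield from response.get("Items", [])
        next_token = response.get("NextToken")
        if not next_token:
            break
        # Resume the statement where the previous page stopped
        kwargs["NextToken"] = next_token
```

For example, `list(execute_partiql(boto3.client("dynamodb"), 'SELECT * FROM "my_table" WHERE pk = ?', [{"S": "a"}]))` would collect all matching items; `my_table` is a placeholder table name.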

Features/Enhancements:

enhancement: Refactor engine switching when Ray is installed by @LeonLuttenberger in #1792
logging: Enable user to configure RayLogger by @jaidisido in #1801
enhancement: Add support for boto3 kwargs to timestream.create_table by @cnfait in #1819
enhancement: Upgrade Ray to 2.2.x and PyArrow to 7+ by @LeonLuttenberger in #1865
enhancement: Unload Ray default max file size by @kukushking in #1912
enhancement: Remove session serialization/deserialization by @kukushking in #1957
enhancement: Unify return values for write json by @LeonLuttenberger in #1960
feature: Log data sizes in load test benchmarks by @LeonLuttenberger in #1949
enhancement: Add write_table_args by @kukushking in #1978
feature: Distribute DynamoDB Parallel Scan by @jaidisido in #1981
enhancement: Use fast file metadata provider by @kukushking in #1997
enhancement: Add names parameter support to PyArrow reading by @LeonLuttenberger in #2008
enhancement: Add support for JSON PyArrow data source by @LeonLuttenberger in #2019
enhancement: Set ray.data parallelisation to -1 by default by @jaidisido in #2022
enhancement: Add distributed variant of the _read_parquet_metadata_file function based on the PyArrow file system by @LeonLuttenberger in #2050
feature: Add faster PyArrow S3fs listing in distributed mode by @jaidisido in #2030
feature: Validate distributed kwargs by @kukushking in #2051
enhancement: Distribute S3 describe_objects by @jaidisido in #2069
feature: Distributed S3 copy/merge by @kukushking in #2070
enhancement: Add bulk_read option for reading large amounts of Parquet files quickly by @LeonLuttenberger in #2033
enhancement: Upgrade Ray to 2.3 by @jaidisido in #2084
enhancement: Extract parallelism and bulk_read into ray_modin_args by @LeonLuttenberger in #2081
deprecate: boto3 resources by @kukushking in #2097
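The distributed DynamoDB Parallel Scan (#1981) builds on DynamoDB's `Segment`/`TotalSegments` scan parameters: each worker scans one disjoint segment and follows `LastEvaluatedKey` until that segment is exhausted. The sketch below assumes a boto3 DynamoDB client and uses a thread pool as a stand-in for Ray tasks; `parallel_scan` and `_scan_segment` are hypothetical names, not awswrangler functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def _scan_segment(
    client: Any, table_name: str, segment: int, total_segments: int
) -> List[Dict[str, Any]]:
    """Scan one segment of the table, following LastEvaluatedKey pagination."""
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    items: List[Dict[str, Any]] = []
    while True:
        response = client.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue this segment from the last key DynamoDB returned
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client: Any, table_name: str, total_segments: int = 4) -> List[Dict[str, Any]]:
    """Scan all segments concurrently (threads stand in for Ray tasks here)."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(_scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        # Concatenate per-segment results in segment order
        return [item for future in futures for item in future.result()]
```

Because segments are disjoint, workers never coordinate with each other; in a Ray-based design each segment scan would plausibly become a remote task and the results a distributed dataset rather than an in-memory list.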

Fixes:

fix: Check row count before creating the Ray dataset in S3 Select by @kukushking in #1808
fix: Allow passing pandas DataFrames to Ray/Modin calls by @kukushking in #1812
fix: Fix empty Arrow refs by @kukushking in #1816
fix: Sanitize column names modifying the data frame in distributed mode by @LeonLuttenberger in #1926

Documentation:

docs: Add AWS Glue on Ray docs by @jaidisido in #1810
docs: Clarify datasource.on_write_complete docs by @kukushking in #2100

Tests:

tests: Add tests for Glue Ray jobs by @LeonLuttenberger in #1832
tests: Remove awswrangler.distributed from coverage report by @LeonLuttenberger in #1884
tests: Create Load Testing Benchmark Analytics by @malachi-constant in #1905
tests: Adjust load test benchmark values by @malachi-constant in #1910
tests: Remove exports from glueray stack by @malachi-constant in #2020
tests: Add test_modin_s3_read_parquet_many_files by @LeonLuttenberger in #2096

Full Changelog: 3.0.0rc2...3.0.0rc3
