aws/aws-sdk-pandas 2.14.0 on GitHub

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

Support Athena Unload 🚀 #1038
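
For context, Athena's UNLOAD statement writes query results straight to S3 in a columnar or text format. The sketch below only shows the shape of that SQL statement. `build_unload_statement` is a hypothetical helper written for illustration, not part of awswrangler, and the library's own implementation may differ.

```python
def build_unload_statement(sql: str, s3_path: str, file_format: str = "PARQUET") -> str:
    """Wrap a SELECT query in an Athena UNLOAD statement (hypothetical helper)."""
    # Output formats accepted by Athena's UNLOAD statement.
    allowed = {"PARQUET", "ORC", "AVRO", "JSON", "TEXTFILE"}
    fmt = file_format.upper()
    if fmt not in allowed:
        raise ValueError(f"Unsupported format: {file_format}")
    if not s3_path.startswith("s3://"):
        raise ValueError("The destination must be an S3 path")
    # UNLOAD wraps the query in parentheses, so drop any trailing semicolon.
    query = sql.strip().rstrip(";")
    return f"UNLOAD ({query}) TO '{s3_path}' WITH (format = '{fmt}')"


# Hypothetical bucket and table names, for illustration only.
statement = build_unload_statement("SELECT * FROM my_table;", "s3://my-bucket/unload/")
print(statement)
```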

Enhancements

Add the ExcludeColumnSchema=True argument to the glue.get_partitions call to reduce response size #1094
Add PyArrow flavor argument to write_parquet via pyarrow_additional_kwargs #1057
Add rename_duplicate_columns and handle_duplicate_columns flags to the sanitize_dataframe_columns_names method #1124
Add timestamp_as_object argument to all database read_sql_table methods #1130
Add ignore_null argument to the read_parquet_metadata method #1125
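
To illustrate what renaming duplicate columns involves, here is a minimal pure-Python sketch. `dedupe_column_names` is a hypothetical helper, and the `_1`, `_2` suffix scheme is an assumption chosen for this example. The names awswrangler actually produces may differ.

```python
from typing import Dict, List, Set


def dedupe_column_names(names: List[str]) -> List[str]:
    """Return column names with repeats suffixed so every name is unique.

    Hypothetical helper: the numeric suffix scheme is an assumption.
    """
    counters: Dict[str, int] = {}
    used: Set[str] = set()
    result: List[str] = []
    for name in names:
        candidate = name
        # Keep bumping the suffix until the candidate is unused, which also
        # avoids clashing with a column that is literally named "a_1".
        while candidate in used:
            counters[name] = counters.get(name, 0) + 1
            candidate = f"{name}_{counters[name]}"
        used.add(candidate)
        result.append(candidate)
    return result


print(dedupe_column_names(["id", "value", "id", "id"]))
```

The first occurrence keeps its original name, so the rename does not affect code that only refers to the first column.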

Documentation

Improve documentation on installing SAR Lambda layers with the CDK #1097
Fix broken link to tutorial in to_parquet method #1058

Bug Fix

Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
Fix bucketing overflow issue in Athena #1086
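
The trailing-slash fix matters because S3 prefixes are plain string matches: without the final "/", a prefix for one partition can also match a sibling whose name merely starts with the same characters. A minimal sketch of the normalization follows. `ensure_trailing_slash` and the example paths are hypothetical and not taken from the library.

```python
def ensure_trailing_slash(location: str) -> str:
    """Return the location with exactly one "/" appended if it lacks one."""
    return location if location.endswith("/") else location + "/"


# Hypothetical partition locations, for illustration only.
wanted = "s3://my-bucket/table/year=2021"
sibling_key = "s3://my-bucket/table/year=20210/part-0.parquet"

# Without the slash, the prefix wrongly matches the sibling partition.
print(sibling_key.startswith(wanted))
# With the slash, only keys under year=2021/ match.
print(sibling_key.startswith(ensure_trailing_slash(wanted)))
```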

Thanks

We thank the following contributors/users for their work on this release:

@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
