Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ pip install pyarrow==2 awswrangler
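To decide whether the pin above applies to a given environment, one could check the installed PyArrow version at runtime. The following is a minimal sketch using only the Python standard library; the helper names `major_version` and `installed_pyarrow_major` are hypothetical and are not part of awswrangler.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading numeric component of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


def installed_pyarrow_major() -> Optional[int]:
    """Return the installed PyArrow major version, or None if PyArrow is absent."""
    try:
        return major_version(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        # PyArrow is not installed in this environment.
        return None


if __name__ == "__main__":
    major = installed_pyarrow_major()
    if major is None:
        print("PyArrow is not installed")
    else:
        print(f"PyArrow major version: {major}")
```

On a platform limited to PyArrow 2, this would be expected to report major version 2 after running the pinned install command.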
Documentation
- Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
- Clarified docs around potential in-place mutation of the DataFrame when using to_parquet #669
Enhancements
- Enable parallel s3 downloads (~20% speedup) 🚀 #644
- Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
- Enable LOCK before concurrent COPY calls in Redshift #665
- Make use of PyArrow iter_batches (>= 3.0.0 only) #660
- Enable additional options when overwriting a Redshift table (drop, truncate, cascade) #671
- Reuse s3 client across threads for s3 range requests #684
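The general idea behind parallel range downloads with a client shared across threads can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `FakeObjectClient` is a hypothetical in-memory stand-in for an S3 client, and `split_ranges` and `parallel_download` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class FakeObjectClient:
    """Hypothetical in-memory stand-in for an S3 client (not a real API)."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def get_range(self, start: int, end: int) -> bytes:
        """Return bytes start..end inclusive, like an HTTP Range request."""
        return self._payload[start : end + 1]


def split_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split an object of `size` bytes into inclusive (start, end) byte ranges."""
    return [(s, min(s + chunk_size, size) - 1) for s in range(0, size, chunk_size)]


def parallel_download(
    client: FakeObjectClient, size: int, chunk_size: int, max_workers: int = 4
) -> bytes:
    """Fetch all ranges concurrently with one shared client, then reassemble."""
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so chunks join correctly.
        chunks = pool.map(lambda r: client.get_range(*r), ranges)
        return b"".join(chunks)


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    shared_client = FakeObjectClient(data)  # one client reused by every thread
    assert parallel_download(shared_client, len(data), chunk_size=100) == data
```

Creating the client once and passing it to every worker avoids per-request client construction, which is the kind of overhead the client reuse change targets.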
Bug Fix
- Add dtypes for empty CTAS Athena queries #659
- Add SerDe properties when creating CSV table #672
- Pass SSL properties from Glue Connection to MySQL #554
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!