Extensive ETL improvements, including support for the recent changes to the CloudFront access log file format.
Hadoop ETL
- Bumped to 0.3.5
- Added Argonaut 6.0 as a dependency (#342)
- Added fromTimestamp to EventEnrichments (#340)
- Added makeTsvSafe to ConversionUtils (#338)
- Added JsonUtils (#323)
- Added support for 3 and 4 return values from MapTransformer (#324)
- Updated GetJsonPayload to use Argonaut and renamed to JsonPayload (#339)
- Added ability to mask IP addresses in ETL (#309)
- refr_ and page_ fields now stored raw (#374)
- Defensively fixed raw spaces in page and referer URLs (#346)
- Fixed regression: single-encoded %s logic didn't account for % itself (#347)
- Added unit tests for fixTabsNewlines (#332)
- Tests now report the failing CanonicalOutput field (#325)
- Now handling all fields double-encoded as per CloudFront post-14-September (#348)
- Added support for 21 Oct CloudFront access log format (#384)
- Added truncation to refr_term (#379)
- Added truncation to se_label (#394)
- Made all prior ME.identity fields TSV-safe (#395)
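As a rough illustration of three ideas in the list above (IP masking, TSV-safe fields, and field truncation), here is a minimal Python sketch. It is not the Hadoop ETL code: the function names `mask_ip`, `make_tsv_safe` and `truncate` are hypothetical, and both the choice of "x" as the mask character and the replacement of tabs and newlines with single spaces are assumptions made for the example.

```python
def mask_ip(ip: str, octets: int) -> str:
    """Replace the last `octets` octets of a dotted IPv4 address with 'x'.

    Assumption: 'x' is used as the mask character; input that is not a
    four-part dotted address is returned unchanged.
    """
    parts = ip.split(".")
    if len(parts) != 4 or octets <= 0:
        return ip
    keep = max(0, 4 - octets)
    return ".".join(parts[:keep] + ["x"] * (4 - keep))


def make_tsv_safe(value: str) -> str:
    """Replace tabs and line breaks with spaces so the value stays in one TSV cell.

    Assumption: a single space is the replacement for each character.
    """
    return value.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def truncate(value: str, max_len: int) -> str:
    """Cut a field down to at most `max_len` characters."""
    return value[:max_len]


if __name__ == "__main__":
    print(mask_ip("192.168.1.42", 2))
    print(make_tsv_safe("search\tterm\nhere"))
    print(truncate("a very long referer term", 10))
```

Masking before storage means the full address never reaches the event table, while truncation and TSV-safety both guard against rows that the downstream load would reject.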
EmrEtlRunner
- Bumped to 0.5.0
- Bumped Sluice to 0.1.5 (#96)
- Bumped Elasticity to 2.6 (#345)
- Enabled EMR Job Flow debugging for easier access to logs (#279)
- ETL job no longer fails if there's no data for last run period (#296)
- Empty processing dir check now works if dir contains 1 file (#326)
- Added ability to mask IP addresses in ETL (#309)
- Made the examples match what you get from git out of the box, thanks @shermozle (#331)
StorageLoader
- Bumped to 0.1.1
- Bumped Sluice to 0.1.5 (#96)
- Fixed "" in fields acting as an escape character for Postgres, thanks @kingo55 (#329)
- Added ability to --skip analyze (#335)
- Moved VACUUM SORT ONLY to a --include step (#321)
- Added COMPROWS to config and --include compupdate option (#344)
- Changed Postgres VACUUM FULL to VACUUM (#357)
- Added TRUNCATECOLUMNS for Redshift load (#360)
- Added FILLRECORD to our Redshift COPY command (#380)
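To show how the Redshift load options named above might fit together, the following Python sketch assembles a COPY statement as a string. The actual statement issued by StorageLoader is not reproduced in these notes, so the helper `build_copy_sql`, its parameters, and the placeholder table, path and credentials are all hypothetical; only the option keywords (FILLRECORD, TRUNCATECOLUMNS, COMPUPDATE, COMPROWS) are standard Redshift COPY parameters.

```python
from typing import Optional


def build_copy_sql(table: str, s3_path: str, credentials: str,
                   comprows: Optional[int] = None) -> str:
    """Assemble a Redshift COPY statement for a tab-separated load.

    Hypothetical helper for illustration; not StorageLoader's real code.
    """
    options = [
        "DELIMITER '\\t'",   # fields are tab-separated
        "FILLRECORD",        # pad rows that are missing trailing columns
        "TRUNCATECOLUMNS",   # cut over-long values instead of failing the row
    ]
    if comprows is not None:
        # Sample this many rows when computing column compression encodings
        options.append(f"COMPUPDATE ON COMPROWS {int(comprows)}")
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"CREDENTIALS '{credentials}' "
        + " ".join(options) + ";"
    )


if __name__ == "__main__":
    # Placeholder values only
    print(build_copy_sql("events", "s3://my-bucket/events/", "<credentials>", 200000))
```

FILLRECORD and TRUNCATECOLUMNS both make the load more tolerant of rows that do not exactly match the table definition, which suits a log format that has changed over time.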
Postgres
- Fixed error in recipes_basic.technology_mobile recipe (#397)