Introduces event de-duplication across different pipeline runs, powered by DynamoDB, along with an important refactoring of the batch pipeline configuration
Documentation
- Fix incorrect release date for R87 (#3126)
Common
- Update copyright years in README (#3148)
- Add CI/CD for EmrEtlRunner and StorageLoader (#3102)
- Add CI/CD for Event Manifest Populator (#3170)
- Add AWS staging credentials to .travis.yml (#3114)
- Update script to sync ap-northeast-2 (Seoul) Snowplow Hosted Assets bucket (#3160)
- Update READMEs markdown in accordance with CommonMark (#3157)
Event Manifest Populator
- Add Spark job to backpopulate DynamoDB duplicate storage (#3158)
Scala Common Enrich
Scala Hadoop Shred
- Bump to 0.11.0 (#3041)
- Bump sbt-assembly to 0.14.4 (#3140)
- Bump SBT to 0.13.13 (#2972)
- Remove explicit jackson-databind dependency (#3138)
- Add cross-batch natural deduplication (#2999)
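
The cross-batch natural deduplication entry above can be illustrated with a minimal sketch. It assumes that a "natural" duplicate is an event sharing both its event ID and its event fingerprint with an event seen in an earlier run, and that the DynamoDB table behaves like a conditional write keyed on that pair. The `Event`, `InMemoryManifest`, `put_if_new` and `deduplicate` names are hypothetical, and a dict stands in for DynamoDB; this is not the Scala Hadoop Shred implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    fingerprint: str


class InMemoryManifest:
    """Hypothetical stand-in for a DynamoDB table keyed on (event_id, fingerprint)."""

    def __init__(self):
        self._seen = {}  # (event_id, fingerprint) -> etl_tstamp of the first run

    def put_if_new(self, event, etl_tstamp):
        # Mimics a conditional write: record the pair if it is absent.
        # Returns True when the pair is new, or when it was recorded by this
        # same run (so re-running a failed batch does not drop its own events).
        key = (event.event_id, event.fingerprint)
        stored = self._seen.setdefault(key, etl_tstamp)
        return stored == etl_tstamp


def deduplicate(events, etl_tstamp, manifest):
    """Keep only events not already recorded by a different pipeline run.

    Duplicates inside a single batch are out of scope for this sketch.
    """
    return [e for e in events if manifest.put_if_new(e, etl_tstamp)]


manifest = InMemoryManifest()
run1 = deduplicate([Event("a", "f1"), Event("b", "f2")], "run-1", manifest)
# "a"/"f1" was stored by run-1, so run-2 drops it; "a"/"f9" differs in
# fingerprint, so it is not a natural duplicate and is kept.
run2 = deduplicate(
    [Event("a", "f1"), Event("a", "f9"), Event("c", "f3")], "run-2", manifest
)
```

Returning true for a pair recorded by the same run is a design choice in this sketch: it keeps a re-run of the same batch from discarding events it wrote itself.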
Storage
- Add example storage target configuration JSONs (#2990)
StorageLoader
- Bump to 0.10.0 (#3109)
- Remove Northern Virginia endpoint for Postgres load (#3143)
- Handle return code of 4 for EmrEtlRunner in snowplow-runner-and-loader.sh (#3139)
- Use storage target JSONs instead of targets section in config.yml (#2992)
- Replace table configuration property with schema (#2458)
EmrEtlRunner
- Bump to 0.24.0 (#3040)
- Update hadoop_shred version in config.yml.sample to 0.11.0 (#3197)
- Add script to convert config.yml targets section into JSON format (#3135)
- Remove targets section from config.yml.sample (#2989)
- No longer use sources property when loading Elasticsearch (#2993)
- Use storage target JSONs instead of targets section in config.yml (#2991)