Various improvements to EmrEtlRunner and StorageLoader, including turning them into JRuby apps and unifying their configuration file
Common
- Added Ruby script to generate unified config.yml and iglu-resolver.json from runner.yml and loader.yml (#1774)
- Added postgres.yml to up.playbooks (#1767)
- Added Vagrant push script to publish Ruby apps (#1784)
Enrich
- Moved enrichments folder out of EmrEtlRunner (#1574)
- Changed campaign_attribution.json configuration to true (#1608)
EmrEtlRunner & StorageLoader
- Unified the config file format (#878)
- Added support for compressing enriched events, thanks @danisola! (#1265)
- Now supports environment variables in YML config files, thanks @epantera! (#1215)
EmrEtlRunner
- Bumped to 0.17.0
- Added retry logic for EMR bootstrap timeouts (#354)
- Added Snowplow event tracking (#678)
- Added tags for monitoring to config.yml (#1163)
- Improved hierarchy in config.yml (#1447)
- Added Snowplow tracking to config.yml (#1448)
- Moved Iglu resolver into dedicated CLI argument (#1542)
- Renamed archive step to archive_raw (#1543)
- Bumped Sluice to 0.2.2 (#1566)
- Removed use of symbols for properties in YAML configuration (#1572)
- Allowed nil for config.yml's bootstrap field (#1575)
- Simplified trailing slash code now that nils are supported (#1588)
- Pinned Contracts to 0.7 (#1590)
- Now fails the job if there is an odd number of lzo files in processing (#1728)
- Added an early check that shredded is empty (#1749)
- Allowed config to be passed in via stdin (#1772)
- Added Rake task to build app (#1786)
- Moved Logging module into new Monitoring module (#1797)
- Ensured that _SUCCESS file is written last for enriched events in S3 (#1808)
- Replaced m1.small with m1.medium in config.yml, thanks @danrama! (#1826)
- Recovered from 500 error while checking job status (#1828)
- Recovered from IOError while checking job status (#1881)
- Changed .ruby-version to "jruby" (#1888)
- Now only accepts an array of "in" buckets (#1910)
- Validated output_compression configuration using contract (#1820)
- Handled exception when the connection times out when checking the cluster, thanks @danisola! (#1599)
- Bumped Elasticity to 6.0.3 (#1939)
StorageLoader
- Bumped to 0.4.0
- Allowed config to be passed in via stdin (#1773)
- Added ability to bundle as a JRuby fat jar (#675)
- Started loading Postgres via stdin, thanks @mrwalker! (#624)
- Added Snowplow event tracking (#679)
- Updated to use EmrEtlRunner's expanded config.yml (#1191)
- Pinned Contracts to 0.7 (#1497)
- Moved "include Contracts" (#1499)
- Renamed archive step to archive_enrich (#1544)
- Bumped Sluice to 0.2.2 (#1567)
- Removed use of symbols for properties in YAML configuration (#1573)
- Added Rake task to build app (#1787)
- Scrubbed credentials from stderr (#1918)
- Added test suite (#1919)
- Ensured that _SUCCESS file is written last for enriched events archived to S3 (#1814)
- Started automatically converting "s3n" to "s3" in copy statements (#1937)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/emr_job_started (#1875)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/emr_job_succeeded (#1876)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/emr_job_failed (#1877)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/emr_job_status (#1878)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/jobflow_step_status (#1879)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/load_succeeded (#1884)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/load_failed (#1885)
- Wrote JSON path file for com.snowplowanalytics.monitoring.batch/application_context (#1942)
Deduplication
- Added time tracking and updated schema name (#1962)
Redshift
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/emr_job_started (#1870)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/emr_job_succeeded (#1871)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/emr_job_failed (#1872)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/emr_job_status (#1873)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/jobflow_step_status (#1874)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/load_succeeded (#1882)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/load_failed (#1883)
- Added Redshift DDL for com.snowplowanalytics.monitoring.batch/application_context (#1943)