github dbt-labs/dbt 0.4.1
dbt version 0.4.1

dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.

0. tl;dr

  • slightly modified dbt command structure
  • unique_key setting for incremental models
  • connect to your db over ssh
  • no more model-defaults
  • multithreaded schema tests

If you encounter an SSL/cryptography error while upgrading to this version of dbt, check that your version of pip is up to date:

pip install -U pip
pip install -U dbt

1. new dbt command structure https://github.com/analyst-collective/dbt/issues/109

# to run models
dbt run # same as before

# to dry-run models
dbt run --dry # previously dbt test

# to run schema tests
dbt test # previously dbt test --validate

2. Incremental model improvements https://github.com/analyst-collective/dbt/issues/101

Previously, dbt calculated "new" incremental records to insert by querying for rows which matched some sql_where condition defined in the model configuration. This works really well for atomic datasets like a clickstream event log: once inserted, these records will never change. Other datasets, like a sessions table composed of many pageviews for many users, can change over time. Consider the following scenario:

User 1 Session 1 Event 1 @ 12:00
User 1 Session 1 Event 2 @ 12:01
-- dbt run --
User 1 Session 1 Event 3 @ 12:02

In this scenario, there are two possible outcomes depending on the sql_where chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!

With this release, you can now add a unique_key expression to an incremental model config. Records matching the unique_key will be deleted from the incremental table, then inserted as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.

The unique_key can be any expression which uniquely identifies a row, e.g.:

sessions:
  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"
  unique_key: user_id || session_index
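
The delete-then-insert behavior described above can be sketched in Python with an in-memory SQLite database. This is only an illustration of the idea under stated assumptions, not the SQL that dbt generates: the sessions table layout, the event_count column, and the incremental_load helper are all hypothetical, and the unique key is modeled as the (user_id, session_index) pair rather than a concatenated string.

```python
import sqlite3


def incremental_load(conn, new_rows):
    """Delete rows whose unique key matches an incoming row, then insert.

    Hypothetical helper for illustration; not dbt's generated SQL.
    """
    cur = conn.cursor()
    for user_id, session_index, event_count, session_end in new_rows:
        # The unique key is (user_id, session_index), standing in for
        # the `user_id || session_index` expression in the config.
        cur.execute(
            "DELETE FROM sessions WHERE user_id = ? AND session_index = ?",
            (user_id, session_index),
        )
        cur.execute(
            "INSERT INTO sessions VALUES (?, ?, ?, ?)",
            (user_id, session_index, event_count, session_end),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "user_id INTEGER, session_index INTEGER, "
    "event_count INTEGER, session_end_tstamp TEXT)"
)

# First run: Session 1 has events at 12:00 and 12:01.
incremental_load(conn, [(1, 1, 2, "12:01")])
# Second run: Event 3 arrived at 12:02, so the session row is recomputed.
incremental_load(conn, [(1, 1, 3, "12:02")])

rows = conn.execute("SELECT * FROM sessions").fetchall()
print(rows)  # [(1, 1, 3, '12:02')]
```

After the second load the table holds a single row for Session 1 that includes Event 3, which avoids both of the bad outcomes from the scenario above.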

3. Run schema validations concurrently https://github.com/analyst-collective/dbt/issues/100

The threads run-target config now applies to schema validations too. Try it with dbt test.
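
As a rough picture of what running validations concurrently means, the sketch below fans a list of independent checks out over a thread pool. It is not dbt's implementation: run_schema_test and the test names are hypothetical placeholders, and the pool size of 8 is just an example value for the threads setting.

```python
from concurrent.futures import ThreadPoolExecutor


def run_schema_test(name):
    # Placeholder for issuing one validation query against the warehouse.
    return name, "PASS"


# Hypothetical test names, for illustration only.
tests = ["not_null_sessions_user_id", "unique_sessions_session_id"]

# max_workers plays the role of the `threads` setting (8 is an example value).
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(run_schema_test, tests))

print(results)
```

Because each validation is an independent query, the tests do not need to wait on one another, so a larger thread count mainly helps when there are many tests to run.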

4. Connect to database over ssh https://github.com/analyst-collective/dbt/issues/93

Add an ssh-host parameter to a run-target to connect to a database over ssh. The ssh-host parameter should be the name of a Host in your ~/.ssh/config file.

warehouse:
  outputs:
    dev:
      type: redshift
      host: my-redshift.amazonaws.com
      port: 5439
      user: my-user
      pass: my-pass
      dbname: my-db
      schema: dbt_dbanin
      threads: 8
      ssh-host: ssh-host-name  # <------ Add this line 
  run-target: dev

5. Remove the model-defaults config https://github.com/analyst-collective/dbt/issues/111

The model-defaults config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:

models:
    My_Package:
        enabled: true
        materialized: table
        snowplow:
            ...