github dataform-co/dataform 2.7.0
2.7.0: Updates for Dataform GCP incremental SQL

latest releases: 3.0.2, 3.0.1, 3.0.0...
11 months ago

From version 2.7.0 onwards, Dataform projects running on Google Cloud Platform will use updated SQL generation logic for incremental insert tables (tables of type incremental without a uniqueKey specified).

Explicit column names

Column names will be explicitly listed in the insert call, which is inline with OSS Dataform behaviour and prevents schema mismatch during insert, for example:

  • if source_table has columns in a different order that the target - it can lead to data corruption (column values can be swapped during insert)
  • if source_table has different number of columns - the insert into fails since target table columns count does not match with the source

For the new script the incremental query has to list all target table columns (can list other extra columns, but at least must contain target table columns) in any order.

Example of new generated SQL:

INSERT INTO $target_table 
    ($target_columns_list) -- listing target columns
    SELECT target_columns_list -- reordering columns so that subquery column order matches the target column order
    FROM (
        $incrementalQuery
    );

Execution within a procedure

In order to facilitate explicit columns, the new code is executed within a procedure, which will be created on the fly. For example:

EXECUTE IMMEDIATE
"""
CREATE OR REPLACE PROCEDURE $procedure_name()
BEGIN
    $preOperations
    $incrementalInsertStatement
    $postOperations
END;
"""
CALL $procedure_name();
DROP PROCEDURE IF EXISTS $procedure_name;

Don't miss a new dataform release

NewReleases is sending notifications on new releases.