From version 2.7.0 onwards, Dataform projects running on Google Cloud Platform will use updated SQL generation logic for incremental insert tables (tables of type incremental
without a uniqueKey
specified).
Explicit column names
Column names will be explicitly listed in the insert call, which is inline with OSS Dataform behaviour and prevents schema mismatch during insert, for example:
- if
source_table
has columns in a different order that the target - it can lead to data corruption (column values can be swapped during insert) - if
source_table
has different number of columns - the insert into fails since target table columns count does not match with the source
For the new script the incremental query has to list all target table columns (can list other extra columns, but at least must contain target table columns) in any order.
Example of new generated SQL:
INSERT INTO $target_table
($target_columns_list) -- listing target columns
SELECT target_columns_list -- reordering columns so that subquery column order matches the target column order
FROM (
$incrementalQuery
);
Execution within a procedure
In order to facilitate explicit columns, the new code is executed within a procedure, which will be created on the fly. For example:
EXECUTE IMMEDIATE
"""
CREATE OR REPLACE PROCEDURE $procedure_name()
BEGIN
$preOperations
$incrementalInsertStatement
$postOperations
END;
"""
CALL $procedure_name();
DROP PROCEDURE IF EXISTS $procedure_name;