github dagster-io/dagster 0.4.0

API Changes

  • There is a new top-level configuration section, storage, which controls where execution stores
    intermediate values and pipeline run history: on the filesystem, on S3, or in memory. The
    dagster CLI now includes options to list and wipe pipeline run history. User-defined types can
    override the default serialization used for storage.
  • Similarly, RunConfig has a new option for specifying intermediate value storage through the
    Python API.
  • OutputDefinition now takes an explicit is_optional parameter, which defaults to False: outputs
    are required unless marked optional.
  • New functionality in dagster.check: is_list.
  • New functionality in dagster.seven: py23-compatible FileNotFoundError, json.dump,
    json.dumps.
  • Dagster default logging is now multiline for readability.
  • The Nothing type now allows dependencies to be constructed between solids that do not have
    data dependencies.
  • Many error messages have been improved.
  • throw_on_user_error has been renamed to raise_on_error in all APIs, public and private.
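The storage section is easiest to picture as a pluggable backend chosen by config, with serialization as a separate, swappable piece. The following is a minimal stdlib-only sketch under stated assumptions, not Dagster's implementation: the names FilesystemStorage, InMemoryStorage, storage_from_config, PickleSerializer, and JsonSerializer are hypothetical, the config keys in_memory and filesystem and the optional base_dir setting are assumed shapes rather than the documented schema, and S3 is left out so the example needs no AWS client.

```python
import json
import os
import pickle
import tempfile


class PickleSerializer:
    """Default strategy in this sketch: pickle arbitrary Python values."""

    def serialize(self, value, fileobj):
        pickle.dump(value, fileobj)

    def deserialize(self, fileobj):
        return pickle.load(fileobj)


class JsonSerializer:
    """Hypothetical override a JSON-friendly user-defined type might supply."""

    def serialize(self, value, fileobj):
        fileobj.write(json.dumps(value).encode("utf-8"))

    def deserialize(self, fileobj):
        return json.loads(fileobj.read().decode("utf-8"))


class InMemoryStorage:
    """Keeps intermediates in a dict; nothing outlives the process."""

    def __init__(self):
        self._values = {}

    def set_value(self, run_id, step_key, value, serializer=None):
        # The serializer is ignored because values never leave memory.
        self._values[(run_id, step_key)] = value

    def get_value(self, run_id, step_key, serializer=None):
        return self._values[(run_id, step_key)]


class FilesystemStorage:
    """Writes each intermediate to <base_dir>/<run_id>/<step_key>."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, run_id, step_key):
        return os.path.join(self.base_dir, run_id, step_key)

    def set_value(self, run_id, step_key, value, serializer=None):
        serializer = serializer or PickleSerializer()
        path = self._path(run_id, step_key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            serializer.serialize(value, f)

    def get_value(self, run_id, step_key, serializer=None):
        serializer = serializer or PickleSerializer()
        with open(self._path(run_id, step_key), "rb") as f:
            return serializer.deserialize(f)


def storage_from_config(environment):
    """Choose a backend from an assumed top-level 'storage' section (in-memory if absent)."""
    storage_config = environment.get("storage") or {"in_memory": {}}
    if "filesystem" in storage_config:
        options = storage_config["filesystem"] or {}
        return FilesystemStorage(options.get("base_dir") or tempfile.mkdtemp())
    if "in_memory" in storage_config:
        return InMemoryStorage()
    raise ValueError("unsupported storage backend(s): %s" % sorted(storage_config))
```

For example, storage_from_config({"storage": {"filesystem": {}}}) returns a FilesystemStorage rooted in a fresh temporary directory, and a value written with JsonSerializer round-trips through a JSON file instead of a pickle. Passing the serializer in as an object is one way to let a type override the default without the backends knowing about specific types.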

GraphQL

  • The GraphQL layer has been extracted out of Dagit into a separate dagster-graphql package.
  • startSubplanExecution has been replaced by executePlan.
  • startPipelineExecution now supports reexecution of pipeline subsets.
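Reexecuting a subset raises a practical question: which inputs of the selected steps were produced by steps that will not run again? Assuming, as a simplification, that those values are read back from the earlier run's stored intermediates, the planning step can be sketched in plain Python. plan_subset_reexecution and the step names are hypothetical; this is not the executePlan or startPipelineExecution schema.

```python
def plan_subset_reexecution(deps, selected):
    """Return (steps to run in dependency order, upstream steps whose outputs
    must be loaded from the previous run's storage).

    deps maps each step key to the step keys it reads inputs from; the graph
    is assumed to be acyclic.
    """
    selected = set(selected)
    unknown = selected - set(deps)
    if unknown:
        raise KeyError("unknown steps: %s" % sorted(unknown))

    # Inputs produced outside the subset are not recomputed, so in this
    # sketch they must come from the earlier run's stored intermediates.
    to_load = sorted(
        {up for step in selected for up in deps[step] if up not in selected}
    )

    # Depth-first topological order restricted to the selected steps.
    order, visited = [], set()

    def visit(step):
        if step in visited:
            return
        visited.add(step)
        for up in deps[step]:
            if up in selected:
                visit(up)
        order.append(step)

    for step in sorted(selected):
        visit(step)
    return order, to_load
```

With deps = {"load": [], "clean": ["load"], "train": ["clean"], "report": ["clean", "train"]}, selecting {"train", "report"} returns (["train", "report"], ["clean"]): both selected steps run, and the output of clean comes from the previous run.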

Dagit

  • It is now possible to reexecute subsets of a pipeline run from Dagit.
  • Dagit's Execute tab now opens runs in separate browser tabs and a new Runs tab allows you to
    browse and view historical runs.
  • Dagit no longer scaffolds configuration when creating new Execute tabs. This functionality will
    be refined and revisited in the future.
  • Dagit's Explore tab is more performant on large DAGs.
  • The dagit -q command-line flag has been deprecated in favor of the separate dagster-graphql
    command-line utility.
  • The execute button is now greyed out when Dagit is offline.
  • The Dagit UI now includes more contextual cues to make the solid in focus and its connections
    more salient.
  • Dagit no longer offers to open materializations on your machine. Clicking an on-disk
    materialization now copies the path to your clipboard.
  • Pressing Ctrl-Enter now starts execution in Dagit's Execute tab.
  • Dagit properly shows List and Nullable types in the DAG view.

Dagster-Airflow

  • Dagster-Airflow includes functions to dynamically generate containerized (DockerOperator-based)
    and uncontainerized (PythonOperator-based) Airflow DAGs from Dagster pipelines and config.
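Assuming the generated DAG gets one Airflow task per Dagster execution step, with Airflow edges mirroring the step dependencies, the translation can be sketched without importing Airflow. TaskSpec and plan_airflow_tasks are hypothetical names, not dagster-airflow's API, and the sketch only records which operator type each task would use rather than constructing real DockerOperator or PythonOperator instances.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskSpec:
    """Hypothetical description of the Airflow task generated for one step."""
    task_id: str
    upstream: List[str] = field(default_factory=list)
    operator: str = "PythonOperator"
    image: str = ""  # only set for DockerOperator-based tasks


def plan_airflow_tasks(step_deps: Dict[str, List[str]], image: str = "") -> List[TaskSpec]:
    """One task per step; passing an image switches every task to DockerOperator."""
    specs = []
    for step_key in sorted(step_deps):
        missing = [up for up in step_deps[step_key] if up not in step_deps]
        if missing:
            raise KeyError("%s depends on unknown steps %s" % (step_key, missing))
        specs.append(
            TaskSpec(
                task_id=step_key,
                upstream=sorted(step_deps[step_key]),
                operator="DockerOperator" if image else "PythonOperator",
                image=image,
            )
        )
    return specs
```

Sorting by step key only makes the output deterministic; Airflow derives execution order from the upstream edges, not from the order in which tasks are created.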

Libraries

  • Dagster integrations with AWS, Great Expectations, Pandas, PySpark, Snowflake, and Spark have
    been reorganized into a new top-level libraries directory. These modules are now importable as
    dagster_aws, dagster_ge, dagster_pandas, dagster_pyspark, dagster_snowflake, and
    dagster_spark.
  • dagster-sqlalchemy and dagma have been removed.

Examples

  • Added the event-pipeline-demo, a realistic web event data pipeline using Spark and Scala.
  • Added the PySpark PageRank example, which demonstrates how to incrementally introduce Dagster
    into existing data processing workflows.

Documentation

  • Docs have been expanded, reorganized, and reformatted.