github mage-ai/mage-ai 0.8.83
0.8.83 | Fury of the Gods Release



Support more complex streaming pipelines

Mage now supports more complex streaming pipelines: you can use more than one transformer and more than one sink in a streaming pipeline.

Here is an example streaming pipeline with multiple transformers and sinks.

[Screenshot: example streaming pipeline with multiple transformers and sinks]

Doc for streaming pipeline: https://docs.mage.ai/guides/streaming/overview
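In a streaming pipeline, each transformer block receives a batch of messages and returns a batch for the downstream blocks, so transformers and sinks can be chained and fanned out. Below is a minimal sketch of such a block; the `transformer` decorator here is a no-op stand-in so the snippet runs outside Mage (in a real pipeline the decorator is provided by the framework), and the enrichment step is hypothetical:

```python
from typing import Dict, List


# Stand-in for Mage's @transformer decorator so this sketch runs standalone;
# inside a Mage streaming pipeline the decorator comes from the framework.
def transformer(fn):
    return fn


@transformer
def transform(messages: List[Dict], *args, **kwargs):
    # A streaming transformer receives a batch of messages and returns the
    # (possibly modified) batch for the downstream transformers and sinks.
    for message in messages:
        message['processed'] = True  # hypothetical enrichment step
    return messages


print(transform([{'id': 1}, {'id': 2}]))
```

With this release, the output of one transformer can feed multiple downstream transformers or sinks according to the pipeline's DAG.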

Custom Spark configuration

You can now provide a custom Spark configuration for the Spark session used in the pipeline.

spark_config:
  # Application name
  app_name: 'my spark app'
  # Master URL to connect to
  # e.g., spark_master: 'spark://host:port', or spark_master: 'yarn'
  spark_master: 'local'
  # Executor environment variables
  # e.g., executor_env: {'PYTHONPATH': '/home/path'}
  executor_env: {}
  # Jar files to be uploaded to the cluster and added to the classpath
  # e.g., spark_jars: ['/home/path/example1.jar']
  spark_jars: []
  # Path where Spark is installed on worker nodes,
  # e.g. spark_home: '/usr/lib/spark'
  spark_home: null
  # List of key-value pairs to be set in SparkConf
  # e.g., others: {'spark.executor.memory': '4g', 'spark.executor.cores': '2'}
  others: {}

Doc for running PySpark pipeline: https://docs.mage.ai/integrations/spark-pyspark#standalone-spark-cluster
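For reference, the fields above correspond to standard Spark configuration properties (`spark.app.name`, `spark.master`, `spark.executorEnv.*`, `spark.jars`, `spark.home`). The sketch below is not Mage's actual code, and the helper name is made up; it just illustrates one plausible mapping from the `spark_config` block onto SparkConf key-value pairs:

```python
def to_spark_conf(spark_config: dict) -> dict:
    """Flatten a spark_config-style mapping into SparkConf key-value pairs.

    Illustration only: the property names are standard Spark configuration
    keys, but this helper is not part of Mage.
    """
    conf = {}
    if spark_config.get('app_name'):
        conf['spark.app.name'] = spark_config['app_name']
    if spark_config.get('spark_master'):
        conf['spark.master'] = spark_config['spark_master']
    # Executor environment variables become spark.executorEnv.<NAME> entries.
    for name, value in spark_config.get('executor_env', {}).items():
        conf[f'spark.executorEnv.{name}'] = value
    # Spark expects a comma-separated list of jar paths.
    if spark_config.get('spark_jars'):
        conf['spark.jars'] = ','.join(spark_config['spark_jars'])
    if spark_config.get('spark_home'):
        conf['spark.home'] = spark_config['spark_home']
    # Arbitrary extra SparkConf settings pass through unchanged.
    conf.update(spark_config.get('others', {}))
    return conf


print(to_spark_conf({
    'app_name': 'my spark app',
    'spark_master': 'local',
    'executor_env': {'PYTHONPATH': '/home/path'},
    'spark_jars': [],
    'spark_home': None,
    'others': {'spark.executor.memory': '4g'},
}))
```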

Data integration pipeline

DynamoDB source

A new data integration source, DynamoDB, has been added.

Doc: https://github.com/mage-ai/mage-ai/blob/master/mage_integrations/mage_integrations/sources/dynamodb/README.md

Bug fixes

  • Use timestamptz as the data type for datetime columns in the Postgres destination.
  • Fix a BigQuery batch load error.

Show file browser outside the pipeline editor

Improved Mage's file editor so that users can edit files without going into a pipeline.

[Screenshot: file browser and editor outside a pipeline]

Add all file operations

[Screenshot: file operations in the file browser]

Speed up writing block output to disk

Mage now uses Polars to speed up writing block output (a DataFrame) to disk, reducing the time to fetch and write a DataFrame with 2 million rows from 90s to 15s.

Add default .gitignore

Mage automatically adds this default .gitignore file when initializing a project:

.DS_Store
.file_versions
.gitkeep
.log
.logs/
.preferences.yaml
.variables/
__pycache__/
docker-compose.override.yml
logs/
mage-ai.db
mage_data/
secrets/

Other bug fixes & polish

  • Include the trigger URL in Slack alerts.

[Screenshot: Slack alert including the trigger URL]

  • Fix race conditions for multiple runs within one second.
  • If a DBT block's language is YAML, hide the option to add upstream dbt refs.
  • Include event_variables in individual pipeline run retries.
  • Callback block
    • Include the parent block's uuid in callback block kwargs.
    • Pass the parent block's output and variables to its callback blocks.
  • Delete the GCP Cloud Run job after it completes.
  • Limit code block output from print statements to avoid sending excessively large request payloads when saving the pipeline.
  • Lock the typing_extensions version to fix the error TypeError: Instance and class checks can only be used with @runtime protocols.
  • Fix git sync and update how git settings are saved for users in the backend.
  • Fix the MySQL SSH tunnel: close the SSH tunnel connection after testing the connection.
