Support more complex streaming pipelines
Mage now supports more complex streaming pipelines: you can use more than one transformer and more than one sink in a single streaming pipeline.
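As a rough illustration, each transformer in such a pipeline is its own Python block following Mage's streaming transformer template; the transformation logic and message shape below are assumptions for the sketch, and sinks are configured as separate blocks downstream:

```python
from typing import Dict, List

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(messages: List[Dict], *args, **kwargs):
    # Each transformer block receives the batch of messages from its
    # upstream block; a pipeline can now chain several of these blocks
    # and fan the results out to more than one sink.
    for message in messages:
        message['processed'] = True  # illustrative transformation
    return messages
```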
Doc for streaming pipeline: https://docs.mage.ai/guides/streaming/overview
Custom Spark configuration
Mage now allows providing a custom Spark configuration, which is used to create the Spark session for the pipeline:
```yaml
spark_config:
  # Application name
  app_name: 'my spark app'
  # Master URL to connect to
  # e.g., spark_master: 'spark://host:port', or spark_master: 'yarn'
  spark_master: 'local'
  # Executor environment variables
  # e.g., executor_env: {'PYTHONPATH': '/home/path'}
  executor_env: {}
  # Jar files to be uploaded to the cluster and added to the classpath
  # e.g., spark_jars: ['/home/path/example1.jar']
  spark_jars: []
  # Path where Spark is installed on worker nodes
  # e.g., spark_home: '/usr/lib/spark'
  spark_home: null
  # List of key-value pairs to be set in SparkConf
  # e.g., others: {'spark.executor.memory': '4g', 'spark.executor.cores': '2'}
  others: {}
```
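With this configuration in place, blocks in a PySpark pipeline can use the resulting session. A minimal sketch, assuming the session is exposed to blocks as kwargs['spark'] per the PySpark guide linked below (the block body itself is illustrative):

```python
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # The Spark session built from spark_config is passed into the block.
    spark = kwargs['spark']
    return spark.createDataFrame(
        [(1, 'a'), (2, 'b')],
        ['id', 'value'],
    )
```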
Doc for running PySpark pipeline: https://docs.mage.ai/integrations/spark-pyspark#standalone-spark-cluster
Data integration pipeline
DynamoDB source
A new data integration source, DynamoDB, has been added.
Bug fixes
- Use `timestamptz` as the data type for datetime columns in the Postgres destination.
- Fix BigQuery batch load error.
Show file browser outside edit pipeline
Improved Mage's file editor so that users can edit files without going into a pipeline.
Add all file operations
Speed up writing block output to disk
Mage now uses Polars to speed up writing block output (a DataFrame) to disk, reducing the time to fetch and write a DataFrame with 2 million rows from 90s to 15s.
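The underlying idea looks roughly like the sketch below; this is illustrative rather than Mage's internal code, and the function name and CSV format are assumptions:

```python
import pandas as pd
import polars as pl


def write_block_output(df: pd.DataFrame, path: str) -> None:
    # Polars' Rust-based writer serializes large frames to disk much
    # faster than pandas' own to_csv.
    pl.from_pandas(df).write_csv(path)
```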
Add default .gitignore
Mage automatically adds the following default .gitignore file when initializing a project:

```
.DS_Store
.file_versions
.gitkeep
.log
.logs/
.preferences.yaml
.variables/
__pycache__/
docker-compose.override.yml
logs/
mage-ai.db
mage_data/
secrets/
```
Other bug fixes & polish
- Include the trigger URL in Slack alerts.
- Fix race conditions for multiple runs created within one second.
- If a DBT block's language is YAML, hide the option to add upstream dbt refs.
- Include event_variables in individual pipeline run retries.
- Callback block improvements (see the sketch after this list):
  - Include the parent block's uuid in callback block kwargs.
  - Pass the parent block's output and variables to its callback blocks.
- Delete the GCP Cloud Run job after it completes.
- Limit code block output from print statements to avoid sending excessively large request payloads when saving the pipeline.
- Lock the `typing_extensions` version to fix the error `TypeError: Instance and class checks can only be used with @runtime protocols`.
- Fix Git sync, and update how Git settings are saved for users in the backend.
- Fix MySQL SSH tunnel: close the SSH tunnel connection after testing the connection.
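As a rough sketch of what the callback changes surface in a callback block (the decorator usage follows Mage's callback block template; the exact kwarg key names below are assumptions for illustration):

```python
if 'callback' not in globals():
    from mage_ai.data_preparation.decorators import callback


@callback('success')
def on_success(parent_block_data, **kwargs):
    # The parent block's uuid, output, and variables are now passed to
    # its callback blocks (kwarg names here are illustrative).
    print(kwargs.get('parent_block_uuid'))
    print(parent_block_data)
```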
View full Changelog