Greenmask v1.0.0b1
Preface
This release marks the first beta version of Greenmask v1. It features a significant framework refactoring designed to enable support for multiple DBMSs by reusing core objects and abstractions.
This version introduces the MySQL engine. PostgreSQL support is not yet included in this release; the port of PostgreSQL support (based on v0.2.x) will be available soon.
Note: About 80% of the documentation for PostgreSQL is compatible with MySQL. We have not fully revised it for this release yet. If you have any questions or experience any issues, please open a GitHub issue or write to us on Discord.
Changelog:
- Revised Subset System: Significantly revised subset system allowing easy addition of new DBMS support via multi-dialect architecture.
- MySQL Implementation: MySQL uses CSV format via
LOAD DATA LOCAL INFILEstatement. This approach is open for discussion; feel free to open a discussion on GitHub or Discord. - MySQL Playground: Added MySQL playground for easy testing and experimentation.
- MySQL Engine Support: Added full support for MySQL database engine #312, including:
- Transformers:
- Build & Infrastructure:
MySQL Implementation
Compatibility & Dependencies:
This release has been tested against MySQL 8.0 and 8.4. Other versions might work but have not been verified.
Requirements:
Greenmask focuses on data dumping and transformation, delegating schema operations to vendor tools. Ensure mysqldump (for dumping) and mysql client (for restoration) are installed and available in your environment.
Details:
Dump foramt: MySQL uses CSV format via LOAD DATA LOCAL INFILE statement. This approach is open for discussion; feel free to open a discussion on GitHub or Discord.
Status & Limitations
Some features are currently WIP for the MySQL engine:
show-dumpcommand: Currently missing; intended to show details about a dump by its ID.- Template Language: Ported to support MySQL, but we are considering replacing it with an embedded JS interpreter for better predictability and user familiarity.
- Integration Tests: Currently in progress.
- Filtering: Including and excluding tables/schemas is currently limited. Regular expressions are not supported, and tables must be specified in the
schema.tableformat. - Virtual References: Virtual references are not supported in this release.
- Transformer Inheritance: Inheritance of transformers via Foreign Keys (FK/PK) is not supported.
- Partitioned Tables: Inheritance of transformers for partitioned tables is not supported. Transformers must be defined for each partition if needed.
- Global Salt: Setting a global salt variable for
hashordeterministicengine transformers is currently not supported. This will be fixed in future versions.
Restore & Subset Limitations
- Restoration Order: To restore data correctly and preserve Foreign Key integrity, you must use the
--restore-in-orderflag.- This ensures tables are restored in topological order (for non-circular schemas).
- Performance Impact: This mode forces sequential processing (one worker), ignoring the
--jobsflag. - Circular Schemas: This mode may fail with circular schemas.
- Workaround: To speed up restoration, you can disable MySQL Foreign Key checks on the target database, allowing parallel restoration without this flag.
- Circular References: The subset system currently cannot resolve circular schema dependencies and will panic in such cases. This is a known issue and will be fixed in future versions.
Documentation
The documentation has not yet been fully revised for v1/MySQL. However, 90% of the usage is identical to the PostgreSQL version.
- Transformers: Transformers are identical. You can refer to the Transformers documentation for detailed usage.
Configuration
Greenmask v1 does not directly support standard MySQL environment variables. Instead, configuration is handled via the config file or environment variables using capitalized attribute names with dots replaced by underscores.
- Common Options (
dump.options): These parameters are common for all database engines. - MySQL Options (
dump.mysql.options): These parameters are specific only to the MySQL engine.
Below is the list of supported settings for MySQL.
Environment Variables
Options can be set via environment variables by converting the YAML path to uppercase with underscores. For example, dump.mysql.options.user becomes DUMP_MYSQL_OPTIONS_USER.
Available Options
The options below correspond to standard
mysqldumpcommand arguments. However, only the options listed here are supported.
Dump Options (dump.mysql.options)
- Connection:
user,password,host,port,connect-database - General:
all-databases: Dump all databases.databases: List of databases to dump.no-create-info: Exclude CREATE TABLE statements.no-data: Exclude data from dump.add-drop-table: Include DROP TABLE statements.compact: Reduce dump size with minimal comments.skip-comments: Do not include comments in dump.single-transaction: Use a single transaction for the dump.quick: Fetch rows one at a time.lock-tables: Lock all tables during dump.
- Metadata:
no-tablespaces: Exclude tablespace information.
Restore Options (restore.mysql.options)
- Connection:
user,password,host,port,connect-database - General:
verbose: Enable verbose output.
Configuration Example
dump:
tag:
- "v1"
- "daily"
options:
# Common Dump Options
# Dump only data, without schema
data-only: false # DUMP_OPTIONS_DATA_ONLY
# Dump only schema, without data
schema-only: false # DUMP_OPTIONS_SCHEMA_ONLY
# Number of parallel jobs
jobs: 1 # DUMP_OPTIONS_JOBS
# Exclude table data (supported)
exclude-table-data: [] # DUMP_OPTIONS_EXCLUDE_TABLE_DATA
# Include/Exclude tables/schemas
# Format: "schema.table" (Regexp is NOT supported)
include-table: # DUMP_OPTIONS_INCLUDE_TABLE
- "schema.table"
exclude-table: # DUMP_OPTIONS_EXCLUDE_TABLE
- "schema.table"
include-schema: # DUMP_OPTIONS_INCLUDE_SCHEMA
- "included_schema"
exclude-schema: # DUMP_OPTIONS_EXCLUDE_SCHEMA
- "excluded_schema"
mysql:
options:
# Connection (Common to dump and restore)
user: root # DUMP_MYSQL_OPTIONS_USER
password: "1234" # DUMP_MYSQL_OPTIONS_PASSWORD
host: "playground-mysql-db-original" # DUMP_MYSQL_OPTIONS_HOST
port: 3306 # DUMP_MYSQL_OPTIONS_PORT
connect-database: "employees" # DUMP_MYSQL_OPTIONS_CONNECT_DATABASE
# Dump Selection
# Dump all databases (--all-databases)
all-databases: false # DUMP_MYSQL_OPTIONS_ALL_DATABASES
# List of databases to dump
databases: # DUMP_MYSQL_OPTIONS_DATABASES
- "employees"
# Dump Options
# Exclude CREATE TABLE statements (--no-create-info)
no-create-info: false # DUMP_MYSQL_OPTIONS_NO_CREATE_INFO
# Exclude data from dump (--no-data)
no-data: false # DUMP_MYSQL_OPTIONS_NO_DATA
# Include DROP TABLE statements (--add-drop-table)
add-drop-table: false # DUMP_MYSQL_OPTIONS_ADD_DROP_TABLE
# Reduce dump size with minimal comments (--compact)
compact: false # DUMP_MYSQL_OPTIONS_COMPACT
# Do not include comments in dump (--skip-comments)
skip-comments: false # DUMP_MYSQL_OPTIONS_SKIP_COMMENTS
# Use a single transaction for the dump (--single-transaction)
single-transaction: false # DUMP_MYSQL_OPTIONS_SINGLE_TRANSACTION
# Fetch rows one at a time (--quick)
quick: false # DUMP_MYSQL_OPTIONS_QUICK
# Lock all tables during dump (--lock-tables)
lock-tables: false # DUMP_MYSQL_OPTIONS_LOCK_TABLES
# Metadata
# Exclude tablespace information (--no-tablespaces)
no-tablespaces: false # DUMP_MYSQL_OPTIONS_NO_TABLESPACESRestore Configuration Example
restore:
options:
# Common Restore Options
# Restore only data, without schema
data-only: false # RESTORE_OPTIONS_DATA_ONLY
# Restore only schema, without data
schema-only: false # RESTORE_OPTIONS_SCHEMA_ONLY
# Number of parallel jobs
jobs: 1 # RESTORE_OPTIONS_JOBS
# Restore data in dependency order
restore-in-order: false # RESTORE_OPTIONS_RESTORE_IN_ORDER
mysql:
options:
# Connection
user: root # RESTORE_MYSQL_OPTIONS_USER
password: "1234" # RESTORE_MYSQL_OPTIONS_PASSWORD
host: "playground-mysql-db-transformed" # RESTORE_MYSQL_OPTIONS_HOST
port: 3306 # RESTORE_MYSQL_OPTIONS_PORT
connect-database: "employees" # RESTORE_MYSQL_OPTIONS_CONNECT_DATABASE
# Restore Options
# Enable verbose output
verbose: false # RESTORE_MYSQL_OPTIONS_VERBOSETip: For a complete configuration example, check out the Playground Configuration.
Transformation Configuration Example
dump:
transformation:
# Example 1: Subset with Condition
- schema: "employees"
name: "employees"
# Subset: Dump only employees hired after 2000-01-01
subset_conds:
- "employees.employees.hire_date > '2000-01-01'"
transformers:
- name: "RandomDate"
params:
column: "birth_date"
min: "1980-01-01 00:00:00"
max: "2000-01-01 00:00:00"
# Example 2: Conditional Masking
- schema: "employees"
name: "salaries"
# Apply transformer only if salary is greater than 100000
# Note: 'record' refers to the current row data
when: "record.salary > 100000"
transformers:
- name: "RandomInt"
params:
column: "salary"
min: 50000
max: 90000Usage
Usage:
greenmask [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
delete delete dump from the storage with a specific ID
dump Dump database, transform and store into storage.
help Help about any command
list-dumps list all dumps in the storage
list-transformers list of the allowed transformers with documentation
restore restore dump with ID or the latest to the target database
show-transformer show transformer details
validate Validate database transformation by performing test with limited data dump and print transformation diff
Flags:
--config string Path to config file. Can be JSON or YAML format.
-h, --help help for greenmask
--log-format string Logging format [text|json] (default "text")
--log-level string logging level [debug|info|warn] (default "info")
-v, --version version for greenmask
Note: Many parameters can be set via CLI arguments, which override the configuration file or environment variables.
Dump Command
The dump command supports standard flags for filtering and metadata.
Supported CLI Flags:
--include-table: Include specified table (can be used multiple times).--exclude-table: Exclude specified table (can be used multiple times).--include-schema: Include specified schema (can be used multiple times).--exclude-schema: Exclude specified schema (can be used multiple times).--exclude-table-data: Dump structure only for the specified table (can be used multiple times).--data-only: Dump data only, without schema.--schema-only: Dump schema only, without data.--tag: Add a tag to the dump metadata.--description: Add a description to the dump metadata.
Restore Command
The restore command supports flags for controlling the restoration process.
Supported CLI Flags:
--data-only: Restore data only, without schema.--schema-only: Restore schema only, without data.--jobs: Number of parallel jobs for restoration.--restore-in-order: Restore data in dependency order (required for foreign keys).
Docker images
You can use one of the image from dockerhub greenmask/greenmask:mysql-8.4-lts-v1.0.0b1 or greenmask/greenmask:mysql-8.0-v1.0.0b1 depending on which version is required for you.
Those images are also available in docker github registry mirror ghcr.io/greenmaskio/greenmask:mysql-8.4-lts-v1.0.0b1 or ghcr.io/greenmaskio/greenmask:mysql-8.0-v1.0.0b1.
Note: The Docker images are currently supported on linux/amd64 architecture only and are based on Debian Bookworm.
MySQL Playground
You can easily try out the MySQL engine using the provided Docker Compose playground.
Quick Start
-
Clone the repository and checkout the v1 branch:
git clone https://github.com/greenmaskio/greenmask.git cd greenmask git checkout v1 -
Run the playground environment:
docker compose -f v1/docker-compose-mysql.yml run greenmask-from-source
This command starts the source database, the transformation target database, and MinIO storage, and drops you into a shell inside the Greenmask container.
-
Run Greenmask commands:
Inside the container shell, you can run the following commands to dump, list, and restore data using the pre-mounted configuration:# Create a dump greenmask --config config.yaml dump # List available dumps greenmask --config config.yaml list-dumps # Restore the latest dump greenmask --config config.yaml restore latest
Playground Details
To run using the release image:
docker compose -f v1/docker-compose-mysql.yml run greenmaskThe playground provides two database containers:
- Original: Contains the source data.
- Transformed: Intended target for restoration and verification.
Convenience aliases are available in the shell:
mysql_o: Connect to the original database.mysql_t: Connect to the transformed database.cleanup: Resets the transformed database (useful after restoring a dump to clean up for the next run).
The playground uses the Employees sample database for demonstration purposes.
You can view the full playground configuration file here.
Contributors
Links
Feel free to reach out to us if you have any questions or need assistance: