github GreenmaskIO/greenmask v1.0.0b1

pre-release13 hours ago

Greenmask v1.0.0b1

Preface

This release marks the first beta version of Greenmask v1. It features a significant framework refactoring designed to enable support for multiple DBMSs by reusing core objects and abstractions.

This version introduces the MySQL engine. PostgreSQL support is not yet included in this release; the port of PostgreSQL support (based on v0.2.x) will be available soon.

Note: About 80% of the documentation for PostgreSQL is compatible with MySQL. We have not fully revised it for this release yet. If you have any questions or experience any issues, please open a GitHub issue or write to us on Discord.

Changelog:

  • Revised Subset System: Significantly revised subset system allowing easy addition of new DBMS support via multi-dialect architecture.
  • MySQL Implementation: MySQL uses CSV format via LOAD DATA LOCAL INFILE statement. This approach is open for discussion; feel free to open a discussion on GitHub or Discord.
  • MySQL Playground: Added MySQL playground for easy testing and experimentation.
  • MySQL Engine Support: Added full support for MySQL database engine #312, including:
    • Dump command #314
    • Restore command #348
    • Validate command #363
    • Docker playground and fixes #367
    • MySQL driver real tests #366
    • Implemented delete command #364
    • Ported list-dumps command #361
    • TypeClass implementation for better type handling #360
    • New transformers command #365
  • Transformers:
    • Ported built-in transformers #353
    • Ported cmd transformer #359
    • Fixes for Cmd/Record* transformers in list command #388
  • Build & Infrastructure:
    • Added build steps for V1 version #374
    • Fixed schema include and revised type assertion in driver #372
    • Fixed --jobs setting for dump cmd #370
    • Float parsing fix #369
    • Disabled 0.2.x binary and docker build from v1 tags #386

MySQL Implementation

Compatibility & Dependencies:

This release has been tested against MySQL 8.0 and 8.4. Other versions might work but have not been verified.

Requirements:

Greenmask focuses on data dumping and transformation, delegating schema operations to vendor tools. Ensure mysqldump (for dumping) and mysql client (for restoration) are installed and available in your environment.

Details:

Dump foramt: MySQL uses CSV format via LOAD DATA LOCAL INFILE statement. This approach is open for discussion; feel free to open a discussion on GitHub or Discord.

Status & Limitations

Some features are currently WIP for the MySQL engine:

  • show-dump command: Currently missing; intended to show details about a dump by its ID.
  • Template Language: Ported to support MySQL, but we are considering replacing it with an embedded JS interpreter for better predictability and user familiarity.
  • Integration Tests: Currently in progress.
  • Filtering: Including and excluding tables/schemas is currently limited. Regular expressions are not supported, and tables must be specified in the schema.table format.
  • Virtual References: Virtual references are not supported in this release.
  • Transformer Inheritance: Inheritance of transformers via Foreign Keys (FK/PK) is not supported.
  • Partitioned Tables: Inheritance of transformers for partitioned tables is not supported. Transformers must be defined for each partition if needed.
  • Global Salt: Setting a global salt variable for hash or deterministic engine transformers is currently not supported. This will be fixed in future versions.

Restore & Subset Limitations

  • Restoration Order: To restore data correctly and preserve Foreign Key integrity, you must use the --restore-in-order flag.
    • This ensures tables are restored in topological order (for non-circular schemas).
    • Performance Impact: This mode forces sequential processing (one worker), ignoring the --jobs flag.
    • Circular Schemas: This mode may fail with circular schemas.
    • Workaround: To speed up restoration, you can disable MySQL Foreign Key checks on the target database, allowing parallel restoration without this flag.
  • Circular References: The subset system currently cannot resolve circular schema dependencies and will panic in such cases. This is a known issue and will be fixed in future versions.

Documentation

The documentation has not yet been fully revised for v1/MySQL. However, 90% of the usage is identical to the PostgreSQL version.

Configuration

Greenmask v1 does not directly support standard MySQL environment variables. Instead, configuration is handled via the config file or environment variables using capitalized attribute names with dots replaced by underscores.

  • Common Options (dump.options): These parameters are common for all database engines.
  • MySQL Options (dump.mysql.options): These parameters are specific only to the MySQL engine.

Below is the list of supported settings for MySQL.

Environment Variables

Options can be set via environment variables by converting the YAML path to uppercase with underscores. For example, dump.mysql.options.user becomes DUMP_MYSQL_OPTIONS_USER.

Available Options

The options below correspond to standard mysqldump command arguments. However, only the options listed here are supported.

Dump Options (dump.mysql.options)

  • Connection: user, password, host, port, connect-database
  • General:
    • all-databases: Dump all databases.
    • databases: List of databases to dump.
    • no-create-info: Exclude CREATE TABLE statements.
    • no-data: Exclude data from dump.
    • add-drop-table: Include DROP TABLE statements.
    • compact: Reduce dump size with minimal comments.
    • skip-comments: Do not include comments in dump.
    • single-transaction: Use a single transaction for the dump.
    • quick: Fetch rows one at a time.
    • lock-tables: Lock all tables during dump.
  • Metadata:
    • no-tablespaces: Exclude tablespace information.

Restore Options (restore.mysql.options)

  • Connection: user, password, host, port, connect-database
  • General:
    • verbose: Enable verbose output.

Configuration Example

dump:
  tag:
    - "v1"
    - "daily"
  options:
    # Common Dump Options
    # Dump only data, without schema
    data-only: false                  # DUMP_OPTIONS_DATA_ONLY
    # Dump only schema, without data
    schema-only: false                # DUMP_OPTIONS_SCHEMA_ONLY
    # Number of parallel jobs
    jobs: 1                           # DUMP_OPTIONS_JOBS
    # Exclude table data (supported)
    exclude-table-data: []            # DUMP_OPTIONS_EXCLUDE_TABLE_DATA
    # Include/Exclude tables/schemas
    # Format: "schema.table" (Regexp is NOT supported)
    include-table:                    # DUMP_OPTIONS_INCLUDE_TABLE
      - "schema.table"
    exclude-table:                    # DUMP_OPTIONS_EXCLUDE_TABLE
      - "schema.table"
    include-schema:                   # DUMP_OPTIONS_INCLUDE_SCHEMA
      - "included_schema"
    exclude-schema:                   # DUMP_OPTIONS_EXCLUDE_SCHEMA
      - "excluded_schema"

  mysql:
    options:
      # Connection (Common to dump and restore)
      user: root                      # DUMP_MYSQL_OPTIONS_USER
      password: "1234"                # DUMP_MYSQL_OPTIONS_PASSWORD
      host: "playground-mysql-db-original" # DUMP_MYSQL_OPTIONS_HOST
      port: 3306                      # DUMP_MYSQL_OPTIONS_PORT
      connect-database: "employees"   # DUMP_MYSQL_OPTIONS_CONNECT_DATABASE

      # Dump Selection
      # Dump all databases (--all-databases)
      all-databases: false            # DUMP_MYSQL_OPTIONS_ALL_DATABASES
      # List of databases to dump
      databases:                      # DUMP_MYSQL_OPTIONS_DATABASES
        - "employees"

      # Dump Options
      # Exclude CREATE TABLE statements (--no-create-info)
      no-create-info: false           # DUMP_MYSQL_OPTIONS_NO_CREATE_INFO
      # Exclude data from dump (--no-data)
      no-data: false                  # DUMP_MYSQL_OPTIONS_NO_DATA
      # Include DROP TABLE statements (--add-drop-table)
      add-drop-table: false           # DUMP_MYSQL_OPTIONS_ADD_DROP_TABLE
      # Reduce dump size with minimal comments (--compact)
      compact: false                  # DUMP_MYSQL_OPTIONS_COMPACT
      # Do not include comments in dump (--skip-comments)
      skip-comments: false            # DUMP_MYSQL_OPTIONS_SKIP_COMMENTS
      # Use a single transaction for the dump (--single-transaction)
      single-transaction: false       # DUMP_MYSQL_OPTIONS_SINGLE_TRANSACTION
      # Fetch rows one at a time (--quick)
      quick: false                    # DUMP_MYSQL_OPTIONS_QUICK
      # Lock all tables during dump (--lock-tables)
      lock-tables: false              # DUMP_MYSQL_OPTIONS_LOCK_TABLES
      
      # Metadata
      # Exclude tablespace information (--no-tablespaces)
      no-tablespaces: false           # DUMP_MYSQL_OPTIONS_NO_TABLESPACES

Restore Configuration Example

restore:
  options:
    # Common Restore Options
    # Restore only data, without schema
    data-only: false                  # RESTORE_OPTIONS_DATA_ONLY
    # Restore only schema, without data
    schema-only: false                # RESTORE_OPTIONS_SCHEMA_ONLY
    # Number of parallel jobs
    jobs: 1                           # RESTORE_OPTIONS_JOBS
    # Restore data in dependency order
    restore-in-order: false           # RESTORE_OPTIONS_RESTORE_IN_ORDER

  mysql:
    options:
      # Connection
      user: root                      # RESTORE_MYSQL_OPTIONS_USER
      password: "1234"                # RESTORE_MYSQL_OPTIONS_PASSWORD
      host: "playground-mysql-db-transformed" # RESTORE_MYSQL_OPTIONS_HOST
      port: 3306                      # RESTORE_MYSQL_OPTIONS_PORT
      connect-database: "employees"   # RESTORE_MYSQL_OPTIONS_CONNECT_DATABASE

      # Restore Options
      # Enable verbose output
      verbose: false                  # RESTORE_MYSQL_OPTIONS_VERBOSE

Tip: For a complete configuration example, check out the Playground Configuration.

Transformation Configuration Example

dump:
  transformation:
    # Example 1: Subset with Condition
    - schema: "employees"
      name: "employees"
      # Subset: Dump only employees hired after 2000-01-01
      subset_conds:
        - "employees.employees.hire_date > '2000-01-01'"
      transformers:
        - name: "RandomDate"
          params:
            column: "birth_date"
            min: "1980-01-01 00:00:00"
            max: "2000-01-01 00:00:00"

    # Example 2: Conditional Masking
    - schema: "employees"
      name: "salaries"
      # Apply transformer only if salary is greater than 100000
      # Note: 'record' refers to the current row data
      when: "record.salary > 100000"
      transformers:
        - name: "RandomInt"
          params:
            column: "salary"
            min: 50000
            max: 90000

Usage

Usage:
  greenmask [command]

Available Commands:
  completion        Generate the autocompletion script for the specified shell
  delete            delete dump from the storage with a specific ID
  dump              Dump database, transform and store into storage.
  help              Help about any command
  list-dumps        list all dumps in the storage
  list-transformers list of the allowed transformers with documentation
  restore           restore dump with ID or the latest to the target database
  show-transformer  show transformer details
  validate          Validate database transformation by performing test with limited data dump and print transformation diff

Flags:
      --config string       Path to config file. Can be JSON or YAML format.
  -h, --help                help for greenmask
      --log-format string   Logging format [text|json] (default "text")
      --log-level string    logging level [debug|info|warn] (default "info")
  -v, --version             version for greenmask

Note: Many parameters can be set via CLI arguments, which override the configuration file or environment variables.

Dump Command

The dump command supports standard flags for filtering and metadata.

Supported CLI Flags:

  • --include-table: Include specified table (can be used multiple times).
  • --exclude-table: Exclude specified table (can be used multiple times).
  • --include-schema: Include specified schema (can be used multiple times).
  • --exclude-schema: Exclude specified schema (can be used multiple times).
  • --exclude-table-data: Dump structure only for the specified table (can be used multiple times).
  • --data-only: Dump data only, without schema.
  • --schema-only: Dump schema only, without data.
  • --tag: Add a tag to the dump metadata.
  • --description: Add a description to the dump metadata.

Restore Command

The restore command supports flags for controlling the restoration process.

Supported CLI Flags:

  • --data-only: Restore data only, without schema.
  • --schema-only: Restore schema only, without data.
  • --jobs: Number of parallel jobs for restoration.
  • --restore-in-order: Restore data in dependency order (required for foreign keys).

Docker images

You can use one of the image from dockerhub greenmask/greenmask:mysql-8.4-lts-v1.0.0b1 or greenmask/greenmask:mysql-8.0-v1.0.0b1 depending on which version is required for you.

Those images are also available in docker github registry mirror ghcr.io/greenmaskio/greenmask:mysql-8.4-lts-v1.0.0b1 or ghcr.io/greenmaskio/greenmask:mysql-8.0-v1.0.0b1.

Note: The Docker images are currently supported on linux/amd64 architecture only and are based on Debian Bookworm.

MySQL Playground

You can easily try out the MySQL engine using the provided Docker Compose playground.

Quick Start

  1. Clone the repository and checkout the v1 branch:

    git clone https://github.com/greenmaskio/greenmask.git
    cd greenmask
    git checkout v1
  2. Run the playground environment:

    docker compose -f v1/docker-compose-mysql.yml run greenmask-from-source

    This command starts the source database, the transformation target database, and MinIO storage, and drops you into a shell inside the Greenmask container.

  3. Run Greenmask commands:
    Inside the container shell, you can run the following commands to dump, list, and restore data using the pre-mounted configuration:

    # Create a dump
    greenmask --config config.yaml dump
    
    # List available dumps
    greenmask --config config.yaml list-dumps
    
    # Restore the latest dump
    greenmask --config config.yaml restore latest

Playground Details

To run using the release image:

docker compose -f v1/docker-compose-mysql.yml run greenmask

The playground provides two database containers:

  • Original: Contains the source data.
  • Transformed: Intended target for restoration and verification.

Convenience aliases are available in the shell:

  • mysql_o: Connect to the original database.
  • mysql_t: Connect to the transformed database.
  • cleanup: Resets the transformed database (useful after restoring a dump to clean up for the next run).

The playground uses the Employees sample database for demonstration purposes.

You can view the full playground configuration file here.

Contributors

@tarbaev-vl
@wwoytenko

Links

Feel free to reach out to us if you have any questions or need assistance:

Don't miss a new greenmask release

NewReleases is sending notifications on new releases.