github tis24dev/proxsave v0.14.0

ProxSave v0.14.0

🧭 PBS API-first restores, chunked artifacts, safer staging, and bounded IO across backups/restores

This release is a major step forward in restore safety, scalability, and operator control. It introduces an API-first, UI-driven PBS staged restore flow (with explicit Merge vs Clean behavior), adds chunking + reassembly to handle large artifacts and improve selective restores, hardens restore staging directory lifecycle and permissions, and makes several key operations cancellable and bounded (streaming category analysis, PVESH timeouts, filesystem IO timeouts). It also improves compatibility diagnostics, rollback handling, and test robustness.

  • PBS staged restore: API-first + UI-driven reconciliation (Merge vs Clean 1:1):

    • Added PBS notifications backup/restore, including notifications.cfg / notifications-priv.cfg and a generated notifications_summary.json.
    • Implemented API-based PBS apply (proxmox-backup-manager) with strict 1:1 reconciliation support and controlled fallbacks.
    • Moved PBS restore reconciliation out of backup.env and into an interactive restore-time choice:
      • Merge: non-destructive behavior; skips destructive/API-unavailable actions.
      • Clean (1:1): strict reconcile allowed; file-based fallbacks are permitted when needed.
    • Introduced a dedicated final staged phase for PBS API-backed applies (node, datastores, remotes, jobs, notifications):
      • File-based config apply runs while PBS services remain stopped.
      • The final API phase may temporarily start services, then applies the API-backed categories and attempts to stop services again.
    • Improved PBS API/UI error reporting: capture and surface API availability errors, and return a descriptive error when Merge mode skips API-applied categories.
    • Hardened PBS list parsing (sanitized keys, stricter row validation, and descriptive errors that include row indexes and available keys).
    • Improved deterministic cleanup ordering for notifications in strict mode (matchers cleaned before endpoints to avoid reference-blocked cleanup).
  • Chunking + reassembly for large artifacts (and better selective restore matching):

    • Implemented smart file chunking with metadata (SHA256, .chunked markers) and robust chunk write logic.
    • Added ReassembleChunkedFiles after extract:
      • Discovers chunks (numeric sort), concatenates, validates integrity, reapplies chmod/chown/mtime, and restores metadata.
    • Improved selective restore path matching and mapping of chunk artifacts back to original paths (originalPathFromChunk) to reduce missed matches.
    • Added extensive unit test coverage for chunking, discovery, validation, and reassembly behavior.
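Here is a minimal sketch of the chunk, reassemble, and verify cycle. It assumes a hypothetical `<path>.chunk.<n>` naming scheme and in-memory chunks belonging to a single file. ProxSave's real marker format, metadata handling, and `originalPathFromChunk` may differ. The sketch shows why numeric sorting matters and how a SHA256 check rejects a corrupted chunk. Reapplying chmod/chown/mtime is omitted.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// chunkSep is a hypothetical naming convention: "<path>.chunk.<n>".
const chunkSep = ".chunk."

// splitChunks cuts data into size-byte pieces and returns them with the SHA-256
// of the whole payload, which reassembly later uses to validate integrity.
func splitChunks(path string, data []byte, size int) (map[string][]byte, string) {
	parts := make(map[string][]byte)
	for i, n := 0, 0; i < len(data); i, n = i+size, n+1 {
		end := i + size
		if end > len(data) {
			end = len(data)
		}
		parts[fmt.Sprintf("%s%s%d", path, chunkSep, n)] = data[i:end]
	}
	sum := sha256.Sum256(data)
	return parts, hex.EncodeToString(sum[:])
}

// chunkIndex parses n from "<path>.chunk.<n>".
func chunkIndex(name string) (int, bool) {
	i := strings.LastIndex(name, chunkSep)
	if i < 0 {
		return 0, false
	}
	n, err := strconv.Atoi(name[i+len(chunkSep):])
	return n, err == nil
}

// originalPathOf maps a chunk artifact back to the file it came from.
func originalPathOf(name string) string {
	if i := strings.LastIndex(name, chunkSep); i >= 0 {
		return name[:i]
	}
	return name
}

// reassemble orders chunks numerically (so .10 follows .9 rather than .1),
// concatenates them, and rejects the result if the checksum does not match.
func reassemble(parts map[string][]byte, wantSHA string) ([]byte, error) {
	names := make([]string, 0, len(parts))
	for name := range parts {
		if _, ok := chunkIndex(name); ok {
			names = append(names, name)
		}
	}
	sort.Slice(names, func(a, b int) bool {
		x, _ := chunkIndex(names[a])
		y, _ := chunkIndex(names[b])
		return x < y
	})
	var buf bytes.Buffer
	for _, name := range names {
		buf.Write(parts[name])
	}
	sum := sha256.Sum256(buf.Bytes())
	if got := hex.EncodeToString(sum[:]); got != wantSHA {
		return nil, fmt.Errorf("checksum mismatch: got %s, want %s", got, wantSHA)
	}
	return buf.Bytes(), nil
}

func main() {
	data := []byte(strings.Repeat("proxsave", 4))
	parts, sum := splitChunks("/etc/big.cfg", data, 3)
	out, err := reassemble(parts, sum)
	fmt.Println(len(parts), bytes.Equal(out, data), err)
	fmt.Println(originalPathOf("/etc/big.cfg.chunk.10"))
}
```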
  • Backup prefiltering + structured-config safety (less damage, better stats):

    • Enhanced backup prefiltering with detailed stats, symlink skipping, and safer normalization rules.
    • Avoids normalization of known structured config roots:
      • /etc/proxmox-backup, /etc/pve, /etc/ssh, /etc/pam.d, /etc/systemd/system.
    • Normalization helpers now return a changed bool, and normalization is limited to safe text normalization (no sorting).
    • Added recovery for malformed/flattened PBS datastore.cfg:
      • Detects duplicate keys and attempts to restore content from pbs_datastore_inventory.json using a lightweight inventory parser.
    • Added a new prefilter-manual CLI command to run the prefilter standalone (custom root/max-size/log-level).
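The sketch below illustrates the structured-root exclusion and duplicate-key detection under stated assumptions. The only "safe" normalization shown is CRLF to LF. The section layout follows the usual PBS `datastore: <name>` header style. `isStructured`, `normalizeText`, and `hasDuplicateKeys` are hypothetical helpers, not ProxSave's functions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// structuredRoots are the roots the release notes exclude from normalization.
var structuredRoots = []string{
	"/etc/proxmox-backup", "/etc/pve", "/etc/ssh", "/etc/pam.d", "/etc/systemd/system",
}

// isStructured reports whether path is a structured root or lies beneath one.
func isStructured(path string) bool {
	p := filepath.Clean(path)
	for _, root := range structuredRoots {
		if p == root || strings.HasPrefix(p, root+"/") {
			return true
		}
	}
	return false
}

// normalizeText applies an assumed safe normalization (CRLF -> LF only, no
// reordering) and reports whether the content changed.
func normalizeText(path, content string) (string, bool) {
	if isStructured(path) {
		return content, false
	}
	out := strings.ReplaceAll(content, "\r\n", "\n")
	return out, out != content
}

// hasDuplicateKeys flags a section-style file such as datastore.cfg where a key
// repeats inside one section, a typical symptom of a flattened file.
func hasDuplicateKeys(cfg string) bool {
	seen := map[string]bool{}
	for _, line := range strings.Split(cfg, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		if strings.HasSuffix(fields[0], ":") {
			// A header such as "datastore: name" starts a new section.
			seen = map[string]bool{}
			continue
		}
		if seen[fields[0]] {
			return true
		}
		seen[fields[0]] = true
	}
	return false
}

func main() {
	for _, p := range []string{"/etc/hosts", "/etc/pve/storage.cfg"} {
		_, changed := normalizeText(p, "line\r\n")
		fmt.Println(p, "changed:", changed)
	}
	fmt.Println(hasDuplicateKeys("datastore: a\n\tpath /a\n\tpath /b\n"))
}
```

Comparing against `root+"/"` rather than a bare prefix keeps a sibling such as `/etc/pveX` from being mistaken for part of `/etc/pve`.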
  • Restore staging dirs: secure lifecycle, cleanup, and safer defaults:

    • Added secure staging directory creation under /tmp/proxsave/restore-stage-*:
      • 0700 permissions, PID-aware naming, unique creation via MkdirTemp with timestamp/pid pattern.
    • Added cleanupOldRestoreStageDirs() and wired it into orchestrator startup/UI paths to remove aged staging dirs.
    • Added PROXSAVE_PRESERVE_RESTORE_STAGING and preserve-on-warnings behavior (including staged installs/network), plus auto-removal after successful clean restores.
    • Improved logs around staging creation/cleanup and documented the behavior.
  • Streamed + cancellable category analysis with safe fallback:

    • AnalyzeBackupCategories now accepts a context and scans archives streaming (O(1) memory), enabling cancellation and better error handling.
    • Avoids double-closing underlying files from decompression readers.
    • Detects category availability on-the-fly and exposes an injectable analysis function for testing.
    • Falls back to a safe full restore with a user-facing message when analysis fails; added tests (including truncated-tar error handling).
  • Broader context cancellation support and restore code hardening:

    • Added ctx.Err() checks in long-running loops (backup history/replication aggregation) for early cancellation.
    • Ensured restore apply paths handle nil contexts safely (createBundle, decrypt TUI, guardMountPoint, access control applies).
    • Introduced a context-aware reader (contextReader) so large io.Copy operations can be cancelled cleanly.
    • Refactored many restore internals: tighter error checks, reduced unused parameters, improved confirm/preview messaging, and clearer restore destination warnings.
  • PVE service management and restore ordering correctness:

    • Centralized PVE cluster service management into a single ordered list.
    • Stop order is now correct and tested: services stop in reverse order (pvestatd → pveproxy → pvedaemon → pve-cluster) and start in forward order.
    • Updated docs/diagrams to reflect the corrected stop sequence.
  • Security hardening and rollback artifact robustness:

    • Hardened permissions for rollback artifacts and markers to 0600 (archive, scripts, marker/location files).
    • Ensured rollback log files are created before writing markers/scripts.
    • Removed redundant shebang lines from generated rollback scripts.
    • Ensured firewall rollback disarm cleans up scripts, logging failures without failing the flow; added tests.
  • Compatibility messaging, warnings, and UX improvements:

    • ValidateCompatibility now produces clearer warnings/errors when backup/system type cannot be detected (includes type info where available).
    • Final restore summary now reflects logged warnings (logger.HasWarnings()), with unit test coverage.
    • Fixed category toggle/deselection logic so deselecting truly removes entries from the selection map; selection counts now reflect len(selected).
    • Improved logStep messages to include step index for clearer progress tracing.
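The deselection fix comes down to deleting map keys rather than setting them to false, so `len(selected)` stays accurate. The sketch also shows a final summary that reflects logged warnings. `restoreLogger` is a hypothetical stand-in for ProxSave's logger and its `HasWarnings()` method.

```go
package main

import "fmt"

// toggle flips a category; deselecting deletes the key instead of storing false,
// so len(selected) is always the number of selected categories.
func toggle(selected map[string]bool, category string) {
	if selected[category] {
		delete(selected, category)
		return
	}
	selected[category] = true
}

// restoreLogger is a hypothetical logger that remembers whether it warned.
type restoreLogger struct{ warnings int }

func (l *restoreLogger) Warn(msg string) {
	l.warnings++
	fmt.Println("WARNING:", msg)
}

func (l *restoreLogger) HasWarnings() bool { return l.warnings > 0 }

// finalSummary reflects logged warnings instead of always reporting success.
func finalSummary(l *restoreLogger) string {
	if l.HasWarnings() {
		return "Restore completed with warnings; review the log"
	}
	return "Restore completed successfully"
}

func main() {
	selected := map[string]bool{}
	toggle(selected, "network")
	toggle(selected, "pve_cluster")
	toggle(selected, "network") // deselect
	fmt.Println("selected:", len(selected))

	l := &restoreLogger{}
	l.Warn("could not detect backup type")
	fmt.Println(finalSummary(l))
}
```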
  • Bounded execution to avoid hangs: PVESH and filesystem IO timeouts:

    • Added PVESH_TIMEOUT and applied per-call timeouts to pvesh operations.
    • Extended PVE storage parsing to capture runtime fields (active, enabled, status) with tolerant parsing of bool/int/string forms:
      • Includes runtime info in logs and can skip storages that appear unavailable to reduce hangs.
    • Added FS_IO_TIMEOUT and a new internal/safefs bounded probe layer (stat, readdir, statfs) to avoid blocking on unreachable mounts.
    • Propagated IO timeouts through datastore/storage sampling, directory/file sampling, PXAR metadata collection, and report generation; timeouts are handled gracefully with warnings/skips.
    • Fixed pvesh context cancellation handling and tightened deps signatures for command/run/lookpath to accept contexts/timeouts.
  • Collector refactor: bounded filesystem sampler replaces legacy PXAR sampling:

    • Replaced the complex fanout-based PXAR sampling implementation with a simpler bounded sampler (fs_sampling_bounded).
    • Removed deprecated PXAR tuning options and related code paths/fields from CollectorConfig.
    • Updated docs/templates:
      • Cleaned up legacy env mappings for PXAR knobs.
      • Simplified public config examples and clarified sampling semantics.
      • Noted include/exclude pattern reuse for PVE datastore sampling.
  • Lock-file correctness and CI hygiene:

    • Lock checks now parse pid/host/time metadata, perform same-host PID liveness checks (injectable killFunc), and remove stale locks when appropriate.
    • Prevented tests from creating a --progress artifact; removed stale artifacts before runs.
    • Hardened rclone-copy stubs to reject empty or flag-like destinations (starting with --) to avoid accidental writes.
    • General test cleanup and refactors to accommodate new context/timeouts and API behavior.
  • Build/versioning improvements:

    • Makefile VERSION derivation now produces stable dev-style version strings (tag + dev count + sha + .dirty handling) for both build and build-release.
  • Dependencies:

    • Bumped golang.org/x/term to 0.40.0.
    • Bumped golang.org/x/crypto to 0.48.0.

Breaking/behavior changes to note:

  • PBS restore reconciliation is now selected interactively (Merge vs Clean 1:1). The previous env-driven RESTORE_PBS_APPLY_MODE / RESTORE_PBS_STRICT settings were removed.
  • Several legacy PXAR tuning knobs were removed as the collector moved to a bounded sampling implementation.

Overall: more reliable PBS restores (API-first with explicit operator intent), better handling of large artifacts via chunking, safer staging with stronger permissions and cleanup, and fewer hangs thanks to bounded IO and cancellable workflows.
