ProxSave v0.14.0
🧭 PBS API-first restores, chunked artifacts, safer staging, and bounded IO across backups/restores
This release is a major step forward in restore safety, scalability, and operator control. It introduces an API-first, UI-driven PBS staged restore flow (with explicit Merge vs Clean behavior), adds chunking + reassembly to handle large artifacts and improve selective restores, hardens restore staging directory lifecycle and permissions, and makes several key operations cancellable and bounded (streaming category analysis, PVESH timeouts, filesystem IO timeouts). It also improves compatibility diagnostics, rollback handling, and test robustness.
PBS staged restore: API-first + UI-driven reconciliation (Merge vs Clean 1:1):
- Added PBS notifications backup/restore, including notifications.cfg/notifications-priv.cfg and a generated notifications_summary.json.
- Implemented API-based PBS apply (proxmox-backup-manager) with strict 1:1 reconciliation support and controlled fallbacks.
- Moved PBS restore reconciliation out of backup.env and into an interactive restore-time choice:
  - Merge: non-destructive behavior; skips destructive/API-unavailable actions.
  - Clean (1:1): strict reconcile allowed; file-based fallbacks are permitted when needed.
- Introduced a dedicated final staged phase for PBS API-backed applies (node, datastores, remotes, jobs, notifications):
  - File-based config apply runs while PBS services remain stopped.
  - The final API phase may temporarily start services, applies API-backed categories, and then attempts to stop services again.
- Improved PBS API/UI error reporting: capture and surface API availability errors, and return a descriptive error when Merge mode skips API-applied categories.
- Hardened PBS list parsing (sanitized keys, stricter rows, descriptive errors with row indexes and available keys).
- Improved deterministic cleanup ordering for notifications in strict mode (matchers cleaned before endpoints to avoid reference-blocked cleanup).
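The difference between Merge and Clean (1:1) can be pictured as a planning step over two sets of entries. The sketch below is only an illustration under assumed names (Mode, Plan, planReconcile are hypothetical and not ProxSave's actual types): Merge only creates or updates entries found in the backup, while Clean also deletes entries that exist on the host but not in the backup.

```go
package main

import "sort"

// Mode is a hypothetical stand-in for the interactive restore-time choice.
type Mode int

const (
	Merge Mode = iota // non-destructive: create/update only
	Clean             // strict 1:1: also delete entries missing from the backup
)

// Plan lists the actions a reconcile pass would take, by entry name.
type Plan struct {
	Upsert []string
	Delete []string
}

// planReconcile compares the entries found in the backup with the entries
// currently configured on the host and returns the resulting actions.
func planReconcile(mode Mode, backup, current map[string]string) Plan {
	var p Plan
	for name, want := range backup {
		if have, ok := current[name]; !ok || have != want {
			p.Upsert = append(p.Upsert, name)
		}
	}
	if mode == Clean {
		for name := range current {
			if _, ok := backup[name]; !ok {
				p.Delete = append(p.Delete, name)
			}
		}
	}
	sort.Strings(p.Upsert) // deterministic output regardless of map order
	sort.Strings(p.Delete)
	return p
}
```

With a backup containing ds1 and ds2 and a host containing ds1 and ds3, Merge would only plan an upsert of ds2, whereas Clean would additionally plan the deletion of ds3.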
Chunking + reassembly for large artifacts (and better selective restore matching):
- Implemented smart file chunking with metadata (SHA256, .chunked markers) and robust chunk write logic.
- Added ReassembleChunkedFiles after extract:
  - Discovers chunks (numeric sort), concatenates, validates integrity, reapplies chmod/chown/mtime, and restores metadata.
- Improved selective restore path matching and mapping of chunk artifacts back to original paths (originalPathFromChunk) to reduce missed matches.
- Added extensive unit test coverage for chunking, discovery, validation, and reassembly behavior.
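To illustrate why the numeric sort matters during reassembly, here is a minimal in-memory sketch. It assumes a hypothetical chunk naming scheme with a numeric suffix after the last dot (for example "disk.img.10"); the real chunk layout and the ReassembleChunkedFiles signature may differ, and ownership/mode/mtime handling is left out.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// chunkIndex extracts the numeric suffix from a hypothetical chunk name
// such as "disk.img.10" (the real naming scheme may differ).
func chunkIndex(name string) (int, error) {
	dot := strings.LastIndex(name, ".")
	if dot < 0 {
		return 0, fmt.Errorf("chunk %q has no numeric suffix", name)
	}
	return strconv.Atoi(name[dot+1:])
}

// sha256Hex returns the hex-encoded SHA256 digest of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// reassemble concatenates chunks in numeric order and checks the result
// against the expected hex-encoded SHA256 digest.
func reassemble(chunks map[string][]byte, wantSHA256 string) ([]byte, error) {
	names := make([]string, 0, len(chunks))
	index := make(map[string]int, len(chunks))
	for name := range chunks {
		i, err := chunkIndex(name)
		if err != nil {
			return nil, err
		}
		index[name] = i
		names = append(names, name)
	}
	// Numeric sort: a plain string sort would place ".10" before ".2".
	sort.Slice(names, func(a, b int) bool { return index[names[a]] < index[names[b]] })

	var out bytes.Buffer
	for _, name := range names {
		out.Write(chunks[name])
	}
	if got := sha256Hex(out.Bytes()); got != wantSHA256 {
		return nil, fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantSHA256)
	}
	return out.Bytes(), nil
}
```

Validating the digest after concatenation means a missing or misordered chunk is reported as an integrity error rather than silently producing a corrupt file.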
Backup prefiltering + structured-config safety (less damage, better stats):
- Enhanced backup prefiltering with detailed stats, symlink skipping, and safer normalization rules.
- Avoids normalization of known structured config roots: /etc/proxmox-backup, /etc/pve, /etc/ssh, /etc/pam.d, /etc/systemd/system.
- Normalization helpers now return (changed bool) and normalization is limited to safe text normalization (no sorting).
- Added recovery for malformed/flattened PBS datastore.cfg:
  - Detect duplicate keys and attempt to restore content from pbs_datastore_inventory.json using a lightweight inventory parser.
- Added a new prefilter-manual CLI command to run the prefilter standalone (custom root/max-size/log-level).
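A rough sketch of the skip rule and the (changed bool) return shape follows. The root list is taken from the notes above; the helper names and the choice of CRLF-to-LF conversion as the "safe" normalization are assumptions made for illustration only.

```go
package main

import "strings"

// structuredRoots mirrors the roots listed in the release notes.
var structuredRoots = []string{
	"/etc/proxmox-backup", "/etc/pve", "/etc/ssh", "/etc/pam.d", "/etc/systemd/system",
}

// isStructuredConfig reports whether path is one of the roots or lies below one.
func isStructuredConfig(path string) bool {
	for _, root := range structuredRoots {
		if path == root || strings.HasPrefix(path, root+"/") {
			return true
		}
	}
	return false
}

// normalizeText applies an assumed "safe" normalization (CRLF to LF, no
// reordering of lines) and reports whether the content changed.
func normalizeText(path, content string) (string, bool) {
	if isStructuredConfig(path) {
		return content, false // structured configs are left untouched
	}
	out := strings.ReplaceAll(content, "\r\n", "\n")
	return out, out != content
}
```

Matching on root+"/" rather than the bare prefix keeps unrelated paths such as /etc/pvefoo from being treated as part of /etc/pve.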
Restore staging dirs: secure lifecycle, cleanup, and safer defaults:
- Added secure staging directory creation under /tmp/proxsave/restore-stage-*: 0700 permissions, PID-aware naming, unique creation via MkdirTemp with timestamp/pid pattern.
- Added cleanupOldRestoreStageDirs() and wired it into orchestrator startup/UI paths to remove aged staging dirs.
- Added PROXSAVE_PRESERVE_RESTORE_STAGING and preserve-on-warnings behavior (including staged installs/network), plus auto-removal after successful clean restores.
- Improved logs around staging creation/cleanup and documented the behavior.
Streamed + cancellable category analysis with safe fallback:
- AnalyzeBackupCategories now accepts a context and scans archives as a stream (O(1) memory), enabling cancellation and better error handling.
- Avoids double-closing underlying files from decompression readers.
- Detects category availability on-the-fly and exposes an injectable analysis function for testing.
- Falls back to a safe full restore with a user-facing message when analysis fails; added tests (including truncated-tar error handling).
Broader context cancellation support and restore code hardening:
- Added ctx.Err() checks in long-running loops (backup history/replication aggregation) for early cancellation.
- Ensured restore apply paths handle nil contexts safely (createBundle, decrypt TUI, guardMountPoint, access control applies).
- Introduced a context-aware reader (contextReader) so large io.Copy operations can be cancelled cleanly.
- Refactored many restore internals: tighter error checks, reduced unused parameters, improved confirm/preview messaging, and clearer restore destination warnings.
PVE service management and restore ordering correctness:
- Centralized PVE cluster service management into a single ordered list.
- Stop order is now correct and tested (reverse order: pvestatd → pveproxy → pvedaemon → pve-cluster); start uses forward order.
- Updated docs/diagrams to reflect the corrected stop sequence.
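Deriving both sequences from one list is what keeps them from drifting apart. A minimal sketch, with hypothetical variable and function names:

```go
package main

// pveServices is the single ordered list; start order is front to back.
var pveServices = []string{"pve-cluster", "pvedaemon", "pveproxy", "pvestatd"}

// startOrder returns a copy of the list in forward order.
func startOrder() []string {
	return append([]string(nil), pveServices...)
}

// stopOrder returns the same services in reverse, so dependents are
// stopped before the services they rely on.
func stopOrder() []string {
	out := make([]string, 0, len(pveServices))
	for i := len(pveServices) - 1; i >= 0; i-- {
		out = append(out, pveServices[i])
	}
	return out
}
```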
Security hardening and rollback artifact robustness:
- Hardened permissions for rollback artifacts and markers to 0600 (archive, scripts, marker/location files).
- Ensured rollback log files are created before writing markers/scripts.
- Removed redundant shebang lines from generated rollback scripts.
- Ensured firewall rollback disarm cleans up scripts, logging failures without failing the flow; added tests.
Compatibility messaging, warnings, and UX improvements:
- ValidateCompatibility now produces clearer warnings/errors when backup/system type cannot be detected (includes type info where available).
- Final restore summary now reflects logged warnings (logger.HasWarnings()), with unit test coverage.
- Fixed category toggle/deselection logic so deselecting truly removes entries from the selection map; selection counts now reflect len(selected).
- Improved logStep messages to include step index for clearer progress tracing.
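The deselection fix is easiest to see in a few lines. In this hypothetical sketch, deselecting deletes the key rather than storing false, which is what makes len(selected) a correct count:

```go
package main

// toggle flips a category in the selection set. Deselecting deletes the key
// instead of storing false, so len(selected) is the number of selected items.
func toggle(selected map[string]bool, category string) {
	if selected[category] {
		delete(selected, category)
		return
	}
	selected[category] = true
}
```

Had the entry been set to false instead, the key would still exist and len(selected) would over-count.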
Bounded execution to avoid hangs: PVESH and filesystem IO timeouts:
- Added PVESH_TIMEOUT and applied per-call timeouts to pvesh operations.
- Extended PVE storage parsing to capture runtime fields (active, enabled, status) with tolerant parsing of bool/int/string forms:
  - Includes runtime info in logs and can skip storages that appear unavailable to reduce hangs.
- Added FS_IO_TIMEOUT and a new internal/safefs bounded probe layer (stat, readdir, statfs) to avoid blocking on unreachable mounts.
- Propagated IO timeouts through datastore/storage sampling, directory/file sampling, PXAR metadata collection, and report generation; timeouts are handled gracefully with warnings/skips.
- Fixed pvesh context cancellation handling and tightened deps signatures for command/run/lookpath to accept contexts/timeouts.
Collector refactor: bounded filesystem sampler replaces legacy PXAR sampling:
- Replaced the complex fanout-based PXAR sampling implementation with a simpler bounded sampler (fs_sampling_bounded).
- Removed deprecated PXAR tuning options and related code paths/fields from CollectorConfig.
- Updated docs/templates:
  - Cleaned up legacy env mappings for PXAR knobs.
  - Simplified public config examples and clarified sampling semantics.
  - Noted include/exclude pattern reuse for PVE datastore sampling.
Lock-file correctness and CI hygiene:
- Lock checks now parse pid/host/time metadata, perform same-host PID liveness checks (injectable killFunc), and remove stale locks when appropriate.
- Prevented tests from creating a --progress artifact; removed stale artifacts before runs.
- Hardened rclone-copy stubs to reject empty or flag-like destinations (starting with --) to avoid accidental writes.
- General test cleanup and refactors to accommodate new context/timeouts and API behavior.
Build/versioning improvements:
- Makefile VERSION derivation now produces stable dev-style version strings (tag + dev count + sha + .dirty handling) for both build and build-release.
Dependencies:
- Bumped golang.org/x/term to 0.40.0.
- Bumped golang.org/x/crypto to 0.48.0.
Breaking/behavior changes to note:
- PBS restore reconciliation is now selected interactively (Merge vs Clean 1:1). The previous env-driven RESTORE_PBS_APPLY_MODE/RESTORE_PBS_STRICT settings were removed.
- Several legacy PXAR tuning knobs were removed as the collector moved to a bounded sampling implementation.
Overall: more reliable PBS restores (API-first with explicit operator intent), better handling of large artifacts via chunking, safer staging with stronger permissions and cleanup, and fewer hangs thanks to bounded IO and cancellable workflows.