github rhoopr/kei v0.12.2

one hour ago

Fixed

  • [watch] interval in TOML now actually overrides the docker image's 24-hour default. The published image's CMD hardcoded --watch-with-interval 86400, and the equivalent KEI_WATCH_WITH_INTERVAL env var was read by clap with the same precedence as a CLI flag, so a user who set [watch] interval = 3600 in their mounted config.toml kept seeing 24-hour cycles in the logs after a docker compose down/up -d. The Dockerfile now bakes the 24-hour default as ENV KEI_WATCH_WITH_INTERVAL=86400 instead of a CMD argument, and Config::build reads the env directly with precedence --watch-with-interval > [watch] interval > env > unset. Docker users who set nothing still get the 24-hour default; users who set TOML now get what TOML says. Note: this is a behavior change for anyone who deliberately set KEI_WATCH_WITH_INTERVAL to override a TOML value, TOML wins now. (#293, #319)
  • Deleted local files are now always re-downloaded on the next sync. The full-sync and incremental-sync paths had a "trust-state" optimization that skipped the per-asset filesystem check when the state DB said an asset was downloaded. A 5-path random sample-check at sync start was supposed to disable the optimization if files had gone missing, but with a large DB the sample reliably missed individual deletions, so a rm against a synced photo was permanent until the user ran kei reconcile by hand. The fix removes the fast-skip on both paths; every asset now goes through the cached dir_cache lookup that already powered the on-disk skip, and the unreliable sample-check is gone. A deleted file is now detected and re-downloaded on the next sync, restoring the data-sacred invariant. (#318)
  • KEI_WATCH_WITH_INTERVAL= (empty value) is now treated as unset. Lets users force a one-shot from the published docker image with docker run -e KEI_WATCH_WITH_INTERVAL= ... kei sync ...; previously the parser rejected the empty string and there was no other way to disable the image's baked-in 24h watch default without rebuilding. (#320)
  • HDR / wide-gamut HEICs now embed XMP metadata correctly. iOS HDR captures (iPhone 12 Pro and newer in HDR mode) carry an ICC profile in the HEIC colr atom under colour_type = "prof" (Display P3 / Display P3 Linear); a smaller share use rICC. The pinned mp4-atom rev decoded the profile bytes via Buf::slice without advancing the cursor, so the parent atom's strict end-check rejected every such file with under decode: colr and --embed-xmp / --set-exif-rating / --set-exif-datetime left the image without metadata. The kei-side retry marker would then loop forever. Fixed by bumping the mp4-atom pin past the upstream cursor-advance fix. Image bytes (mdat) were never affected. Closes #276; addresses #269. (#317)
  • Sync no longer advances the sync token when enumeration errored and zero assets downloaded. build_download_outcome was returning Success whenever the consumer count was zero, regardless of whether the producer had logged enumeration errors against malformed CloudKit pages. The next sync would resume from the advanced token and silently skip every asset on the errored pages. The zero-download path now mirrors the with-downloads branch and surfaces partial_failure whenever enumeration_errors > 0, so the sync token is preserved and the affected pages are re-scanned. (#322)
  • kei import-existing no longer drops [photos] file_match_policy = "name-id7". The matcher always derived expected paths via the default NameSizeDedupWithSuffix policy, ignoring the TOML file_match_policy value (and there was no CLI flag for it). Users migrating a tree previously synced by icloudpd --file-match-policy name-id7 saw every file land in the unmatched bucket because the on-disk filenames carried the _<id7> suffix while kei looked for the bare stem. Both the TOML key and a new --file-match-policy flag are now plumbed end-to-end through import. (#294, thanks @basdp)
  • kei import-existing matches AM/PM siblings across NBSP and regular space. macOS 13+ writes 12:34 PM with U+202F (NARROW NO-BREAK SPACE) between the time and the meridian; trees previously synced through tools that normalize to a regular space (or vice-versa) saw every AM/PM-named file come up unmatched even though the bytes were identical. After the existing primary and dedup-suffix probes miss, resolve_match_path consults a per-import DirCache for a sibling that differs only in the AM/PM whitespace character and adopts it (logged at INFO naming both paths). One read_dir per parent across all probes; no extra cost on miss. (#301)
  • Adopted asset rows are now committed atomically. kei import-existing previously wrote a row in two stages (upsert_seen, then mark_downloaded); a crash or interrupt between the two left a pending row with local_path = NULL, and the next sync treated the asset as never-seen and re-downloaded it. The new StateDb::import_adopt performs both steps inside one rusqlite transaction, so each row either lands as downloaded with local_path set or rolls back for a clean re-adopt on the next run. (#305)
  • kei import-existing exits non-zero when the iCloud fetch errors mid-scan. A fetcher Err was logged-and-continued, downgrading a partial scan to a clean exit and letting cron / Docker move on; the next sync would re-download files import should have adopted. The walk now bails with a message naming the library and the fetcher error, so operators see a non-zero exit and can re-run after the upstream condition clears. (#299)
  • kei import-existing rejects the same system directories kei sync rejects. --download-dir /etc (and the rest of the /bin, /usr, /sbin, /proc, /sys, /dev, /var/run deny list) now bail at config build for both subcommands with the same error string. The TOCTOU directory.exists() precondition was also replaced with a single tokio::fs::metadata round-trip so a missing or non-directory path surfaces as Cannot read download directory <path>: <io error> instead of the previous silent zero-match. Empty-filename fallbacks (non-UTF-8 segments, trailing slashes, paths ending in ..) now log a WARN with the asset id instead of writing an empty filename row in silence. (#308)
  • HEIF XMP probe walks top-level boxes by header rather than dispatching every box through mp4-atom. extract_xmp_bytes was using Any::decode_maybe, which routes into the full FourCC parser table; two of those parsers (Dfla, Avcc) call Vec::with_capacity with a u32 read straight from the file and OOM on malformed input. The probe now reads each top-level box header, descends into meta directly, and skips every other kind without touching the parser table. Removes a class of crash and OOM repros surfaced by the new fuzz harnesses. (#286)
  • Live recovery tests now print kei's stderr on failure. sync_recovers_deleted_file and sync_truncated_file_does_not_cause_data_loss previously panicked with just "deleted file should be re-downloaded" or "correctly-sized photo must be on disk". kei had exited 0, so assert().success() discarded stderr before the disk-state check ran, leaving nothing actionable when the tests failed intermittently. Both now run with RUST_LOG=kei=debug and include the captured stderr plus a post-sync walkdir in the panic message.

Security

  • paste 1.0.15 (RUSTSEC-2024-0436, unmaintained) is no longer in the dependency graph. Reached kei transitively through mp4-atom; the new pin uses upstream's pastey replacement. The cargo audit ignore for the advisory was removed from .cargo/audit.toml. Closes #310. (#311, #317)

Added

  • kei import-existing --force-empty overrides the new empty-library guard. When a selected library returns zero assets but the state DB already holds prior rows, import now bails before scanning with a message naming the empty zone and the prior DB total. The most likely cause is a transient permissions glitch or stale auth, not that the user emptied the library, and the previous behavior printed matched: 0 and exited 0 with no signal. First-run users (fresh DB) skip the network round-trip entirely; populated DBs pay one extra HyperionIndexCountLookup per library. --force-empty (or KEI_FORCE_EMPTY=1) is the escape hatch for users who genuinely emptied the library. (#312)
  • kei import-existing reads sync's full path-derivation chain. The matcher previously consumed only a subset of DownloadConfig, so any sync option not explicitly threaded through silently produced unmatched files. Import now derives expected paths via expected_paths_for(asset, &DownloadConfig) against the same CLI > env > TOML > default chain kei sync uses, and the ImportArgs flag set was extended to cover --size, --live-photo-mode, --live-photo-size, --live-photo-mov-filename-policy, --align-raw, --force-size, --file-match-policy, and KEI_KEEP_UNICODE_IN_FILENAMES. A new dedup-suffix fallback also matches the size-disambiguated filenames icloudpd writes when two iCloud assets collide on filename - kei wrote the bare path, but the matcher now retries with <stem>-<size><ext> before declaring unmatched. (#296)
  • icloudpd compatibility baseline for kei import-existing. 15 wiremock scenarios stage on-disk layouts using fixture data copied verbatim from icloud_photos_downloader's own test suite, run import end-to-end against synthetic CloudKit pages, and assert every file matches. Acts as a regression guard against accidental layout divergence (default policy, name-id7, raw DNG, JPEG+MOV / HEIC+_HEVC.MOV live-photo pairs, multi-asset live photos, plain MOV/MP4 video, year-only and none folder structures, Unicode strip/keep, pre-1970 dates, invalid filename chars). (#296)
  • Fuzz harnesses for 10 parser entry points. New fuzz/ directory with cargo-fuzz targets covering CloudKit JSON deserializers, auth responses (SRP init, account login, 2FA challenge), TOML config, the *Enc field decoders, path sanitization, HEIF atom walking, the HEIF XMP probe pipeline, Adobe XMP Toolkit (run via FFI so ASan can see C++ memory bugs), PhotoAsset::from_records, and the state enum from_str parsers. Checked-in seeds under fuzz/seeds/ include two OOM regression repros for an upstream mp4-atom bug that kei calls synchronously during the EXIF probe. just fuzz run TARGET replays them on every invocation. (#287)

Changed

  • kei import-existing summary lists Filtered (no path) and Hash errors. Previously the completion banner only reported total / matched / unmatched. Assets dropped before path derivation (no versions, content/date filter, --live-photo-mode skip, --force-size with a missing size) and files whose SHA256 read failed (permissions, vanished mid-scan, I/O error) silently disappeared from the tally. Operators can now reconcile against the on-disk tree, and empty-expected-paths assets are logged at WARN with the id and relevant config. (#300)
  • kei now compiles as a library plus a thin binary shim. src/main.rs is a 7-line entry point that calls kei::main_inner(); the module tree moved verbatim into src/lib.rs. Behavior is identical: cargo install kei, the docker image, and cargo run all build and run the same binary. The crate has a lib target now, which lets the in-tree fuzz harnesses depend on it directly instead of re-inlining source files via #[path]. A new __fuzz_internals Cargo feature gates pub mod __fuzz containing wrappers around pub(crate) parser entry points; default builds don't include it. (#287)
  • kei import-existing emits a stage="scan_started" tracing marker. Once authentication, library resolution, and the iCloud scan have all returned and the first asset is dequeued for path derivation, kei logs an INFO event tagged stage="scan_started". Operators tailing stderr (or piping through a log shipper) can now distinguish "still authenticating / fetching the album list" from "actually walking assets" without watching the progress bar. Used by the SIGINT recovery test to wait for a deterministic point before sending the signal. (#313)

Full changelog: CHANGELOG.md

Don't miss a new kei release

NewReleases is sending notifications on new releases.