✨ New Features
- Added progress bar to Step 5 (Find Albums) to show album association processing progress
- Step 1: Extension fixing now replaces the incorrect extension instead of appending. Previously, a file like `vacation_sunset.heic` (actually JPEG) would be renamed to `vacation_sunset.heic.jpg`. Now it becomes `vacation_sunset.jpg`. The associated JSON sidecar and any supplemental-metadata JSON files are atomically renamed to match. This produces cleaner output filenames with no change in metadata accuracy, since all downstream steps already used only the final extension. The double-extension handling in the truncated filename fixer (Step 4) has been kept for natural Pixel-style suffixes (`.PANO.jpg`, `.MP.mp4`, etc.), which are not affected by this change.
- Step 3 progress bar now fills in real time. Previously the hashing phase (`groupIdenticalFast2`) only printed a text message every 50 size groups (and only in verbose mode), so the progress bar appeared to jump to 100% instantly at the end. A `FillingBar` is now created before the bucket-processing loop and updated after each slice of size groups finishes, giving continuous visual feedback during the (potentially long) deduplication hashing phase.
- Step 7 progress bar unified. The two separate bars ("Writing EXIF data" and "Flushing pending EXIF writes") are now a single bar that tracks all output files from start to finish. The total is pre-counted before processing begins, so the bar fills steadily across the whole step without a surprise second bar appearing at the end.
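The Step 1 renaming change can be illustrated with a minimal sketch. It is written in Python for brevity rather than the project's Dart, and the function names `fix_extension_old` and `fix_extension_new` are hypothetical, not taken from the codebase. It only shows the difference between appending and replacing the last extension; sidecar renaming is not modelled.

```python
from pathlib import PurePath


def fix_extension_old(filename: str, detected_ext: str) -> str:
    """Old behaviour (as described above): append the detected extension."""
    return filename + detected_ext


def fix_extension_new(filename: str, detected_ext: str) -> str:
    """New behaviour: replace only the last extension with the detected one."""
    return str(PurePath(filename).with_suffix(detected_ext))


if __name__ == "__main__":
    print(fix_extension_old("vacation_sunset.heic", ".jpg"))
    print(fix_extension_new("vacation_sunset.heic", ".jpg"))
```

Because only the last suffix is swapped, a name that carries a Pixel-style middle suffix keeps it in this sketch.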
🚀 Performance Improvements
Overall, the pipeline is up to approximately 3× faster than the previous version on a modern PC with an SSD, based on real-world tests (for example, a 400 GB takeout now finishes in 38 min instead of 1 h 20 min). The speedup comes from the following changes:
- Step 1 (Fix Extensions): single-pass collection + parallel processing. Previously the directory was traversed twice: once to count files (for the progress bar) and once to process them. A single `toList()` now serves both purposes, and `_processFile` (128-byte header read + MIME check + optional rename) runs in parallel `Future.wait` batches at `diskOptimized` concurrency.
- Step 2 (Discover Media): parallel JSON partner-sharing checks. `jsonPartnerSharingExtractor` was called sequentially per file inside the stream loop. Files are now collected first, then processed with `Future.wait` in batches of `diskOptimized` concurrency, reducing total I/O wait from the sum of all per-file latencies to roughly the latency of `N/batchSize` batches.
- Step 6 (Move Files): parallel file operations. `moveAll` now calls the parallel variant of the move engine (`moveMediaEntitiesParallel`) with `diskOptimized` concurrency (cores × 8, max 32) instead of the sequential one-entity-at-a-time loop. The parallel implementation already existed but was dead code. For cross-drive copy operations this is the single largest win.
- Step 7: EXIF batch write throughput dramatically improved. `stableTagsetKey` previously grouped files by tag names and values, so every file landed in its own 1-entry bucket (unique date string = unique key). The threshold check never fired, and the final flush called one ExifTool process per file. The batch queue now groups by tag names only; all files needing the same tag set (e.g. `DateTimeOriginal + DateTimeDigitized + DateTime`) land in a single bucket. ExifTool's batch mode already supports different values per file by interleaving per-file args before each filename, so correctness is unchanged. This results in a large reduction in ExifTool process spawns for typical collections.
- 7-Zip extraction speed improved. The 7-Zip extractor now uses `-mmt=N` (explicit thread count equal to `Platform.numberOfProcessors`) instead of `-mmt=on`, and suppresses stdout/stderr/progress pipe output (`-bso0 -bse0 -bsp0`) to reduce I/O overhead. For large archives with many files this avoids unnecessary process pipe traffic and lets 7-Zip use all available CPU cores.
- Step 7: ExifTool stay-open IPC with zero Perl startup overhead. All ExifTool write operations (single-file and batch) are now routed through a single long-running `exiftool -stay_open True` process started once at launch. On Linux / WSL, Perl startup costs ~1-2 s per invocation; every write now takes only the actual I/O time. The argfile batch path also no longer needs a temp file when stay-open is active, as stdin has no command-line length limit. Falls back transparently to one-shot invocations if the persistent process fails to start.
- Parallel ZIP extraction. When 7-Zip is available and multiple ZIPs are present, archives are now extracted concurrently. Concurrency scales with core count (`max(2, N÷4)`, capped at 4), so 4-core and 8-core machines run 2 extractions in parallel, 12-core machines run 3, and machines with 16 or more cores run 4. Since extraction is I/O-bound (JPEGs are already compressed, so Deflate adds negligible CPU work), each process receives the full processor count (`-mmt=N`) rather than a split share; threads mostly block on I/O and don't compete for CPU. Native Dart extraction remains sequential to avoid simultaneous heap pressure from two large ZIPs.
- Step 7: Large MOV/MP4 files with oversized QuickTime atoms are no longer retried. ExifTool emits `atom is too large for rewriting` when a video file's data block exceeds its internal rewrite limit (e.g. a 676 MB MOV file). Previously this produced 4–6 noisy log lines and a pointless single-file retry. The error is now recognised as unrecoverable: the batch-level "retrying" message is suppressed, no per-file retry is attempted, and a single clear `[WARNING]` is emitted per affected file stating that the file was still sorted correctly.
- JSON sidecar read consolidated (Steps 4 + 7). Each media file's `.json` sidecar was previously parsed up to three times: once for the date, once for GPS coordinates, and again in Step 7 to retrieve coordinates for EXIF writing. GPS is now extracted alongside the date in a single read during Step 4 and cached on the entity, so Step 7 requires no additional file I/O for GPS data.
- GPS data from `geoDataExif` now correctly used. The coordinate extractor previously only read from the `geoData` field of the JSON sidecar. Google Photos also stores the original camera-recorded GPS in `geoDataExif`, which is often the only source of valid coordinates (e.g. for videos, photos edited by third-party apps that strip EXIF, or photos tagged after upload). The extractor now prefers `geoDataExif` and falls back to `geoData`, significantly increasing the number of files that receive GPS in their output EXIF.
- Step 3: XXH3 replaces hand-rolled FNV-1a for quick-signature and fingerprint hashing. The 32-bit FNV-1a closure used in `_quickSignature` and the 64-bit FNV-1a method used in `_triSampleFingerprint` are replaced by XXH3 (via `package:xxh3`). XXH3 is approximately 10× faster than SHA-256 and significantly faster than FNV-1a on the 4 KiB slices read per bucket candidate, while providing 64-bit hash quality.
- Full-file content hashing uses XXH3 instead of SHA-256. `MediaHashService.calculateFileHash` (the definitive byte-for-byte equality check used before any file is discarded) now uses `xxh3String` for small files and the `xxh3Stream` chunked API for large files. This replaces the previous `package:crypto` SHA-256 implementation. The `package:crypto` dependency has been removed.
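The batched parallelism used in Steps 1, 2, and 6 can be sketched roughly as follows. This is a Python `asyncio` analogue of batching with Dart's `Future.wait`, not the project's code; `disk_optimized_concurrency` and `process_in_batches` are hypothetical names, and the only figure taken from the notes above is the "cores × 8, max 32" rule.

```python
import asyncio
import os
from typing import Awaitable, Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def disk_optimized_concurrency(cores: Optional[int] = None) -> int:
    """Batch size per the rule quoted above: cores x 8, capped at 32."""
    if cores is None:
        cores = os.cpu_count() or 1
    return min(cores * 8, 32)


async def process_in_batches(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    batch_size: int,
) -> List[R]:
    """Run `worker` on all items, `batch_size` at a time, preserving order."""
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # All workers in one batch are awaited together, so I/O waits overlap.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results
```

With this shape, the total wait is closer to the number of batches times the slowest item in each batch than to the sum of every item's latency.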
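The effect of the Step 7 batch-key change can be reproduced with a small Python sketch. The two key functions below are hypothetical stand-ins for the old and new behaviour of `stableTagsetKey` as described above; the file names and dates are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Tags = Dict[str, str]


def key_names_and_values(tags: Tags) -> Hashable:
    """Old-style key: tag names and values, so unique dates give unique keys."""
    return tuple(sorted(tags.items()))


def key_names_only(tags: Tags) -> Hashable:
    """New-style key: tag names only, so files sharing a tag set share a bucket."""
    return tuple(sorted(tags))


def bucket(files: Dict[str, Tags], key_fn: Callable[[Tags], Hashable]) -> Dict[Hashable, List[str]]:
    """Group file paths by the key computed from their pending tags."""
    buckets: Dict[Hashable, List[str]] = defaultdict(list)
    for path, tags in files.items():
        buckets[key_fn(tags)].append(path)
    return dict(buckets)


# Three files that need the same tags written, each with a different date.
PENDING = {
    "a.jpg": {"DateTimeOriginal": "2021:01:01 10:00:00", "DateTime": "2021:01:01 10:00:00"},
    "b.jpg": {"DateTimeOriginal": "2021:02:03 11:30:00", "DateTime": "2021:02:03 11:30:00"},
    "c.jpg": {"DateTimeOriginal": "2022:07:15 08:45:10", "DateTime": "2022:07:15 08:45:10"},
}
```

Bucketing `PENDING` with `key_names_and_values` yields one bucket per file, while `key_names_only` yields a single bucket holding all three, which is what allows a batch-size threshold to be reached.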
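The `geoDataExif` preference with `geoData` fallback might look like the sketch below. It is Python rather than the project's Dart, and `extract_coordinates` is a hypothetical name. It assumes each field is an object with `latitude` and `longitude` keys, and it treats a 0.0/0.0 pair as "no coordinates"; that validity rule is an assumption of this sketch, not something stated in these notes.

```python
from typing import Any, Dict, Optional, Tuple


def extract_coordinates(sidecar: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), preferring geoDataExif over geoData."""
    for field in ("geoDataExif", "geoData"):
        geo = sidecar.get(field) or {}
        lat, lon = geo.get("latitude"), geo.get("longitude")
        if lat is None or lon is None:
            continue
        # Assumption: an all-zero pair means the field carries no real position.
        if lat == 0 and lon == 0:
            continue
        return (float(lat), float(lon))
    return None
```

Reading both fields in one pass over the already-parsed sidecar also matches the consolidated single-read approach described for Steps 4 and 7.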
🐛 Bug Fixes
- Step 1: Extension collision resolved with unique filename instead of skip. When fixing a file's extension would produce a name that already exists (e.g. `teams_jens.png` → `teams_jens.jpg` but `teams_jens.jpg` already exists due to storage-saver mode), the file is now renamed to the next available unique name (`teams_jens(1).jpg`) using the same `(N)` counter logic as the move step. Files with an existing counter suffix are handled correctly: `teams_jens(1).png` → `teams_jens(1)(1).jpg`. Previously such files were silently left with the wrong extension, which later caused ExifTool batch failures (`Not a valid PNG (looks more like a JPEG)`) with a noisy multi-round binary-split cascade in Step 7.
- Windows: emoji album folders no longer cause pipeline failures. On Windows, `Directory.list()` throws when a path contains certain emoji characters. Album directories with emoji names (e.g. `Holiday Memories 🎄`) are now temporarily renamed to a hex-encoded form at the start of the pipeline and restored immediately after all steps complete. Output album folders always use the original emoji name. If the process crashes mid-run, the hex-encoded names are detected and restored automatically on the next run via `progress.json`.
- Step 1: Extension fixing no longer skips edited files by default. Files with language-specific "edited" suffixes (e.g. `-edited`) were unconditionally skipped during extension fixing, regardless of the `--skip-extras` flag. This meant a file like `IMG_3376-bearbeitet.HEIC` that was actually a JPEG would keep its wrong extension and fail later with `Not a valid HEIC (looks more like a JPEG)`. The guard is now conditional: edited files are only skipped during extension fixing when `--skip-extras` is explicitly set.
- Added `archiveren` as a recognised Dutch special folder name and `archivieren` as a recognised German one (Google Photos exports these as mistranslations of "Archive" for Dutch and German users).
- Windows: trailing backslash in quoted paths. `--input "path\"` and `--output "path\"` now work correctly. The trailing path separator is stripped before processing; previously the C runtime interpreted `\"` as an escaped quote, causing subsequent flags to be swallowed into the path value. If the resulting path value still appears to contain embedded flags (e.g. `--input "path\" --output ...`), GPTH now exits with a clear diagnostic message instead of silently failing.
- Suppressed a misleading batch-level ExifTool warning for InteropIFD errors. Those files are already retried individually (introduced in v5.1.1), so logging the whole batch as failed gave the false impression that every file in the batch was broken.
- Step 7: UTC offset tags now written natively for JPEGs, fixing InteropIFD corruption warnings. `OffsetTime`, `OffsetTimeOriginal`, and `OffsetTimeDigitized` are now written inside the native JPEG write methods (`writeDateTimeNativeJpeg` / `writeCombinedNativeJpeg`) together with the date tags, eliminating the second ExifTool invocation that previously followed every successful native write. This is also more resilient for files with a corrupt InteropIFD: the `image` library's sub-IFD reader wraps each sub-IFD in a `try`/`catch` and silently drops any that fail to parse, then the writer removes the dangling `0xA005` pointer, so the output JPEG has a clean EXIF block with no corrupt InteropIFD, rather than triggering ExifTool's `Truncated InteropIFD directory` error. The ExifTool fallback path (used when the native write itself fails) is untouched and still includes the strip-and-retry logic from v5.1.1. This addresses an issue introduced in version 5.0.9 as part of the UTC bug fix.
- Step 4: Truncated filename fixer no longer duplicates Pixel suffixes. Files with double extensions containing Pixel-specific suffixes (`.PANO.jpg`, `.MP.mp4`, `.NIGHT.jpg`, `.vr.jpg`) had the suffix doubled when the truncated filename fixer restored the full name from JSON metadata (e.g. `PXL_20230518_095458599.PANO.PANO.jpg`). The title's extension is now stripped symmetrically with the filename's, preventing the duplication.
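The `(N)` counter behaviour described for extension collisions in Step 1 can be sketched as follows. This is an illustrative Python version with hypothetical function names (`unique_name`, `fixed_name`); it checks against an in-memory set of names rather than the file system, and it is not the project's Dart implementation.

```python
from pathlib import PurePath
from typing import Set


def unique_name(desired: str, existing: Set[str]) -> str:
    """Return `desired`, or the first `stem(N).ext` variant not in `existing`."""
    if desired not in existing:
        return desired
    path = PurePath(desired)
    counter = 1
    while True:
        candidate = f"{path.stem}({counter}){path.suffix}"
        if candidate not in existing:
            return candidate
        counter += 1


def fixed_name(filename: str, detected_ext: str, existing: Set[str]) -> str:
    """Replace the extension, then resolve any collision with a counter."""
    target = str(PurePath(filename).with_suffix(detected_ext))
    return unique_name(target, existing)
```

Because the counter is appended to the whole stem, a name that already ends in `(1)` gains a second counter instead of having the first one incremented, matching the `teams_jens(1)(1).jpg` example above.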
🚀 Improvements
- 7-Zip detection logged once. The 7-Zip executable path is now resolved once per extraction session (cached in the service instance) and reported via a single `[ INFO ]` message. Previously the path was re-detected for every ZIP file, producing no visible confirmation at all in CLI mode.
- Reduced noise in verbose logs and made the reporting of errors and warnings more accurate.
- Step 7: MTS, M2TS, WMV, AVI, MPEG, and BMP files are now skipped before ExifTool is called. ExifTool does not support writing metadata to these formats. Previously they were passed to ExifTool individually, producing `[WARNING] ExifTool command failed` noise for every such file. They are now detected upfront by extension and MIME type and skipped without invoking ExifTool (a single warning is still logged per file unless warnings are silenced).
- Refactored complex logic into separate files for maintainability and removed legacy code.
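The upfront skip for unwritable formats in Step 7 amounts to a cheap membership test before any ExifTool call. The Python sketch below checks the extension only; the MIME-type check mentioned above is omitted. The extension spellings in the set (for example `.mpg` alongside `.mpeg`) and the name `should_skip_exif_write` are assumptions of this sketch.

```python
from pathlib import PurePath

# Formats listed above as not writable by ExifTool; spellings are assumed.
UNSUPPORTED_WRITE_EXTENSIONS = frozenset(
    {".mts", ".m2ts", ".wmv", ".avi", ".mpg", ".mpeg", ".bmp"}
)


def should_skip_exif_write(filename: str) -> bool:
    """True when the file's extension is in the unsupported set (case-insensitive)."""
    return PurePath(filename).suffix.lower() in UNSUPPORTED_WRITE_EXTENSIONS
```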
🔧 Internal
- Replaced custom `_Mutex` class with `Pool(1)` from `package:pool` in `MediaHashService`: same single-access semantics with less custom code.
- Replaced hand-rolled `LinkedHashMap` LRU cache (~60 lines) with `LruCache` from `package:lru` in `MediaHashService`.
- Added type-safe `toJson()` / `fromJson()` serialization to `MediaEntity`, `FileEntity`, and `AlbumEntity`, replacing ~260 lines of duck-typed `dynamic` casting in `ProgressSaverService`.