GitHub release: Xentraxx/GooglePhotosTakeoutHelper_Neo v6.0.0


New Features

  • Added progress bar to Step 5 (Find Albums) to show album association processing progress
  • Step 1: Extension fixing now replaces the incorrect extension instead of appending — Previously, a file like vacation_sunset.heic (actually JPEG) would be renamed to vacation_sunset.heic.jpg. Now it becomes vacation_sunset.jpg. The associated JSON sidecar and any supplemental-metadata JSON files are atomically renamed to match. This produces cleaner output filenames with no change in metadata accuracy, since all downstream steps already used only the final extension. The double-extension handling in the truncated filename fixer (Step 4) has been kept for natural Pixel-style suffixes (.PANO.jpg, .MP.mp4, etc.) which are not affected by this change.
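The renaming rule above can be sketched as follows (a Python sketch for illustration only; the tool itself is written in Dart, and the sidecar naming is simplified relative to the real supplemental-metadata variants):

```python
from pathlib import Path

def fixed_name(path: str, actual_ext: str) -> str:
    """Replace the wrong extension with the detected one instead of appending."""
    return str(Path(path).with_suffix("." + actual_ext))

def sidecar_for(media_path: str) -> str:
    """The JSON sidecar is renamed to match the fixed media filename."""
    return media_path + ".json"

# A HEIC-named file whose header says it is actually a JPEG:
new_name = fixed_name("vacation_sunset.heic", "jpg")  # "vacation_sunset.jpg", not "vacation_sunset.heic.jpg"
```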
  • Step 3 progress bar now fills in real time — Previously the hashing phase (groupIdenticalFast2) only printed a text message every 50 size groups (and only in verbose mode), so the progress bar appeared to jump to 100% instantly at the end. A FillingBar is now created before the bucket-processing loop and updated after each slice of size groups finishes, giving continuous visual feedback during the (potentially long) deduplication hashing phase.
  • Step 7 progress bar unified — The two separate bars ("Writing EXIF data" and "Flushing pending EXIF writes") are now a single bar that tracks all output files from start to finish. The total is pre-counted before processing begins, so the bar fills steadily across the whole step without a surprise second bar appearing at the end.

🚀 Performance Improvements

Overall pipeline performance is roughly 2× faster on a modern PC with an SSD compared to the previous version, based on real-world tests (a 400 GB takeout now finishes in 38 m instead of 1 h 20 m), due to the following changes:

  • Step 1 (Fix Extensions): single-pass collection + parallel processing — Previously the directory was traversed twice: once to count files (for the progress bar) and once to process them. A single toList() now serves both purposes, and _processFile (128-byte header read + MIME check + optional rename) runs in parallel Future.wait batches at diskOptimized concurrency.
  • Step 2 (Discover Media): parallel JSON partner-sharing checks — jsonPartnerSharingExtractor was previously called sequentially per file inside the stream loop. Files are now collected first, then processed with Future.wait in batches of diskOptimized concurrency, reducing total I/O wait from a sum of per-file latencies to roughly one batch-time per N/batchSize iterations.
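The batched-concurrency pattern shared by Steps 1, 2, and 6 can be sketched like this (a Python asyncio sketch of Dart's Future.wait batching; `check` is a hypothetical stand-in for the per-file work):

```python
import asyncio

async def process_in_batches(items, worker, concurrency):
    """Collect items first, then await each slice of `concurrency` tasks together,
    so total wait is ~ceil(N / concurrency) batch-times instead of N latencies."""
    results = []
    for i in range(0, len(items), concurrency):
        batch = items[i:i + concurrency]
        results.extend(await asyncio.gather(*(worker(x) for x in batch)))
    return results

async def check(x):  # stand-in for e.g. a JSON partner-sharing check
    await asyncio.sleep(0)
    return x * 2

out = asyncio.run(process_in_batches(list(range(5)), check, concurrency=2))
```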
  • Step 6 (Move Files): parallel file operations — moveAll now calls the parallel variant of the move engine (moveMediaEntitiesParallel) with diskOptimized concurrency (cores × 8, max 32) instead of the sequential one-entity-at-a-time loop. The parallel implementation already existed but was dead code. For cross-drive copy operations this is the single largest win.
  • Step 7: EXIF batch write throughput dramatically improved — stableTagsetKey previously grouped files by tag names and values, so every file landed in its own 1-entry bucket (a unique date string meant a unique key). The threshold check never fired, and the final flush called one ExifTool process per file. The batch queue now groups by tag names only; all files needing the same tag set (e.g. DateTimeOriginal + DateTimeDigitized + DateTime) land in a single bucket. ExifTool's batch mode already supports different values per file by interleaving per-file args before each filename, so correctness is unchanged. This results in a large reduction in ExifTool process spawns for typical collections.
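The before/after bucketing difference is easy to see in a small sketch (Python for illustration; `tagset_key` is a hypothetical stand-in for stableTagsetKey):

```python
def tagset_key(tags: dict) -> str:
    """Bucket key built from tag NAMES only. Per-file values differ and are
    interleaved as per-file args in ExifTool batch mode, so they must not
    participate in the key."""
    return "+".join(sorted(tags))

a = tagset_key({"DateTimeOriginal": "2021:01:01", "DateTime": "2021:01:01"})
b = tagset_key({"DateTimeOriginal": "2023:05:18", "DateTime": "2023:05:18"})
# Same key despite different date values, so both files share one bucket.
```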
  • 7-Zip extraction speed improved — The 7-Zip extractor now uses -mmt=N (explicit thread count equal to Platform.numberOfProcessors) instead of -mmt=on, and suppresses stdout/stderr/progress pipe output (-bso0 -bse0 -bsp0) to reduce I/O overhead. For large archives with many files this avoids unnecessary process pipe traffic and lets 7-Zip use all available CPU cores.
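The resulting 7z invocation looks roughly like the following (a sketch; the archive name is illustrative, and -mmt/-bso0/-bse0/-bsp0/-y are standard 7-Zip switches):

```python
import os

def sevenzip_args(archive: str, out_dir: str, threads: int) -> list:
    """Quiet, fully-threaded extraction: explicit thread count instead of
    -mmt=on, and stdout/stderr/progress pipes suppressed."""
    return ["7z", "x", archive, "-o" + out_dir,
            "-mmt=%d" % threads, "-bso0", "-bse0", "-bsp0", "-y"]

args = sevenzip_args("takeout-001.zip", "extracted", os.cpu_count() or 1)
```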
  • Step 7: ExifTool stay-open IPC — zero Perl startup overhead — All ExifTool write operations (single-file and batch) are now routed through a single long-running exiftool -stay_open True process started once at launch. On Linux / WSL, Perl startup costs ~1-2 s per invocation; every write now takes only the actual I/O time. The argfile batch path also no longer needs a temp file when stay-open is active, as stdin has no command-line length limit. Falls back transparently to one-shot invocations if the persistent process fails to start.
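ExifTool's stay-open protocol reads an argfile from stdin (exiftool -stay_open True -@ -) and runs the queued arguments each time it sees -execute. A minimal sketch of the payload framing (the tag/file values are illustrative):

```python
def stay_open_payload(args: list) -> str:
    """One argument per line, terminated by -execute; the persistent process
    then emits its output followed by a '{ready}' marker."""
    return "\n".join(args + ["-execute"]) + "\n"

cmd = ["exiftool", "-stay_open", "True", "-@", "-"]  # argfile read from stdin
payload = stay_open_payload(["-DateTimeOriginal=2021:01:01 12:00:00", "photo.jpg"])
```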
  • Parallel ZIP extraction — When 7-Zip is available and multiple ZIPs are present, archives are now extracted concurrently. Concurrency scales with core count (max(2, N÷4), capped at 4) so 4-core machines run 2 in parallel, 8-core run 3, 16-core run 4, etc. Since extraction is I/O-bound (JPEGs are already compressed, so Deflate adds negligible CPU work), each process receives the full processor count (-mmt=N) rather than a split share — threads mostly block on I/O and don't compete for CPU. Native Dart extraction remains sequential to avoid simultaneous heap pressure from two large ZIPs.
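One plausible reading of the stated formula is the following (a sketch; the exact rounding for mid-range core counts may differ in the implementation):

```python
def zip_concurrency(cores: int) -> int:
    """Concurrent ZIP extractions: max(2, cores // 4), capped at 4."""
    return min(4, max(2, cores // 4))
```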
  • Step 7: Large MOV/MP4 files with oversized QuickTime atoms are no longer retried — ExifTool emits atom is too large for rewriting when a video file's data block exceeds its internal rewrite limit (e.g. a 676 MB MOV file). Previously this produced 4–6 noisy log lines and a pointless single-file retry. The error is now recognised as unrecoverable: the batch-level "retrying" message is suppressed, no per-file retry is attempted, and a single clear [WARNING] is emitted per affected file stating that the file was still sorted correctly.
  • JSON sidecar read consolidated (Steps 4 + 7) — Each media file's .json sidecar was previously parsed up to three times: once for the date, once for GPS coordinates, and again in Step 7 to retrieve coordinates for EXIF writing. GPS is now extracted alongside the date in a single read during Step 4 and cached on the entity, so Step 7 requires no additional file I/O for GPS data.
  • GPS data from geoDataExif now correctly used — The coordinate extractor previously only read from the geoData field of the JSON sidecar. Google Photos also stores the original camera-recorded GPS in geoDataExif, which is often the only source of valid coordinates (e.g. for videos, photos edited by third-party apps that strip EXIF, or photos tagged after upload). The extractor now prefers geoDataExif and falls back to geoData, significantly increasing the number of files that receive GPS in their output EXIF.
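The single-read extraction with the geoDataExif-over-geoData preference can be sketched like this (Python for illustration; photoTakenTime, geoData, and geoDataExif are real Takeout sidecar fields, but the zero-coordinate handling is simplified here):

```python
import json

def parse_sidecar(text: str) -> dict:
    """Read the sidecar once: extract timestamp and GPS together, preferring
    the camera-recorded geoDataExif block over geoData."""
    data = json.loads(text)
    geo = data.get("geoDataExif") or data.get("geoData") or {}
    return {
        "taken": int(data.get("photoTakenTime", {}).get("timestamp", 0)),
        "lat": geo.get("latitude"),
        "lon": geo.get("longitude"),
    }

sidecar = ('{"photoTakenTime": {"timestamp": "1684404898"},'
           ' "geoData": {"latitude": 0.0, "longitude": 0.0},'
           ' "geoDataExif": {"latitude": 47.37, "longitude": 8.54}}')
info = parse_sidecar(sidecar)
```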
  • Step 3: XXH3 replaces hand-rolled FNV-1a for quick-signature and fingerprint hashing — The 32-bit FNV-1a closure used in _quickSignature and the 64-bit FNV-1a method used in _triSampleFingerprint are replaced by XXH3 (via package:xxh3). XXH3 is approximately 10× faster than SHA-256 and significantly faster than FNV-1a on the 4 KiB slices read per bucket candidate, while providing 64-bit hash quality.
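The tri-sample idea is to hash a few fixed slices instead of the whole file. A Python sketch of the shape of such a fingerprint (the real code uses XXH3 via package:xxh3; blake2b stands in here because Python's stdlib has no XXH3, and the exact slice offsets and length mixing are assumptions):

```python
import hashlib

SLICE = 4096  # 4 KiB slices, as read per bucket candidate

def tri_sample_fingerprint(data: bytes) -> str:
    """Hash head, middle, and tail slices plus the length, instead of the
    full content, to cheaply separate non-identical candidates."""
    n = len(data)
    mid = max(0, n // 2 - SLICE // 2)
    h = hashlib.blake2b(digest_size=8)
    for chunk in (data[:SLICE], data[mid:mid + SLICE], data[-SLICE:]):
        h.update(chunk)
    h.update(n.to_bytes(8, "little"))  # length separates size classes
    return h.hexdigest()

a = tri_sample_fingerprint(b"x" * 100_000)
b = tri_sample_fingerprint(b"x" * 99_999 + b"y")  # differs only in the tail
```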
  • Full-file content hashing uses XXH3 instead of SHA-256 — MediaHashService.calculateFileHash (the definitive byte-for-byte equality check used before any file is discarded) now uses xxh3String for small files and the xxh3Stream chunked API for large files. This replaces the previous package:crypto SHA-256 implementation; the package:crypto dependency has been removed.

🐛 Bug Fixes

  • Step 1: Extension collision resolved with unique filename instead of skip — When fixing a file's extension would produce a name that already exists (e.g. teams_jens.png → teams_jens.jpg, but teams_jens.jpg already exists due to storage-saver mode), the file is now renamed to the next available unique name (teams_jens(1).jpg) using the same (N) counter logic as the move step. Files with an existing counter suffix are handled correctly: teams_jens(1).png → teams_jens(1)(1).jpg. Previously such files were silently left with the wrong extension, which later caused ExifTool batch failures ("Not a valid PNG — looks more like a JPEG") with a noisy multi-round binary-split cascade in Step 7.
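The (N) counter behaviour, including the nested-counter case, can be sketched as follows (a Python sketch of the idea; the real implementation is Dart and operates on the filesystem rather than a set):

```python
from pathlib import Path

def next_unique(target: str, existing: set) -> str:
    """Append or advance an (N) counter until the target name is free,
    mirroring the move step's collision logic."""
    if target not in existing:
        return target
    p = Path(target)
    stem, ext = p.stem, p.suffix
    n = 1
    while "%s(%d)%s" % (stem, n, ext) in existing:
        n += 1
    return "%s(%d)%s" % (stem, n, ext)

out1 = next_unique("teams_jens.jpg", {"teams_jens.jpg"})
out2 = next_unique("teams_jens(1).jpg", {"teams_jens(1).jpg"})
```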
  • Windows: emoji album folders no longer cause pipeline failures — On Windows, Directory.list() throws when a path contains certain emoji characters. Album directories with emoji names (e.g. Holiday Memories 🎄) are now temporarily renamed to a hex-encoded form at the start of the pipeline and restored immediately after all steps complete. Output album folders always use the original emoji name. If the process crashes mid-run, the hex-encoded names are detected and restored automatically on the next run via progress.json.
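A reversible hex escape of the non-ASCII characters is enough to make such paths traversable. A sketch of the idea (the `_0x…_` escape format here is illustrative; the actual marker used by GPTH may differ):

```python
import re

def encode_emoji_name(name: str) -> str:
    """Replace non-ASCII characters with a reversible hex escape so the
    directory can be listed on Windows."""
    return "".join(ch if ord(ch) < 128 else "_0x%X_" % ord(ch) for ch in name)

def decode_emoji_name(name: str) -> str:
    """Restore the original name after processing completes."""
    return re.sub(r"_0x([0-9A-F]+)_", lambda m: chr(int(m.group(1), 16)), name)

enc = encode_emoji_name("Holiday Memories 🎄")
```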
  • Step 1: Extension fixing no longer skips edited files by default — Files with language-specific "edited" suffixes (e.g. -edited) were unconditionally skipped during extension fixing, regardless of the --skip-extras flag. This meant a file like IMG_3376-bearbeitet.HEIC that was actually a JPEG would keep its wrong extension and fail later with Not a valid HEIC (looks more like a JPEG). The guard is now conditional: edited files are only skipped during extension fixing when --skip-extras is explicitly set.
  • Added archiveren (Dutch) and archivieren (German) as recognised special folder names (Google Photos exports these mistranslations of "Archive" for Dutch and German users).
  • Windows: trailing backslash in quoted paths — --input "path\" and --output "path\" now work correctly. The trailing path separator is stripped before processing; previously the C runtime interpreted \" as an escaped quote, causing subsequent flags to be swallowed into the path value. If the resulting path value still appears to contain embedded flags (e.g. --input "path\" --output ...), GPTH now exits with a clear diagnostic message instead of silently failing.
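The sanitisation described above amounts to the following (a Python sketch; the embedded-flag heuristic shown here is illustrative):

```python
def sanitize_path_arg(value: str) -> str:
    """Strip a trailing path separator. With `--input "path\\"` the C runtime
    treats \\" as an escaped quote, swallowing later flags into the value,
    so a value that still contains flag-like text is rejected loudly."""
    cleaned = value.rstrip("\\/")
    if " --" in cleaned:  # later flags were swallowed into the path value
        raise SystemExit("Path argument appears to contain flags: %r" % value)
    return cleaned

p = sanitize_path_arg("C:\\Takeout\\")
```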
  • Suppressed a misleading batch-level ExifTool warning for InteropIFD errors. Those files are already retried individually (introduced in v5.1.1), so logging the whole batch as failed gave the false impression that every file in the batch was broken.
  • Step 7: UTC offset tags now written natively for JPEGs, fixing InteropIFD corruption warnings — OffsetTime, OffsetTimeOriginal, and OffsetTimeDigitized are now written inside the native JPEG write methods (writeDateTimeNativeJpeg / writeCombinedNativeJpeg) together with the date tags, eliminating the second ExifTool invocation that previously followed every successful native write. This is also more resilient for files with a corrupt InteropIFD: the image library's sub-IFD reader wraps each sub-IFD in a try/catch and silently drops any that fail to parse, then the writer removes the dangling 0xA005 pointer — so the output JPEG has a clean EXIF block with no corrupt InteropIFD, rather than triggering ExifTool's Truncated InteropIFD directory error. The ExifTool fallback path (used when the native write itself fails) is untouched and still includes the strip-and-retry logic from v5.1.1. This addresses an issue introduced in v5.0.9 by the fix for the UTC bug.
  • Step 4: Truncated filename fixer no longer duplicates Pixel suffixes — Files with double extensions containing Pixel-specific suffixes (.PANO.jpg, .MP.mp4, .NIGHT.jpg, .vr.jpg) had the suffix doubled when the truncated filename fixer restored the full name from JSON metadata (e.g. PXL_20230518_095458599.PANO.PANO.jpg). The title's extension is now stripped symmetrically with the filename's, preventing the duplication.
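The symmetric stripping can be sketched like this (a Python sketch of the idea; the actual implementation is Dart):

```python
from pathlib import Path

def restore_name(truncated: str, json_title: str) -> str:
    """Strip the final extension from BOTH the truncated filename and the
    JSON title before recombining, so Pixel suffixes like .PANO are not
    doubled when the full name is restored."""
    ext = Path(truncated).suffix  # e.g. ".jpg"
    title_stem = json_title
    if title_stem.lower().endswith(ext.lower()):
        title_stem = title_stem[: -len(ext)]
    return title_stem + ext

fixed = restore_name("PXL_20230518_0954.PANO.jpg", "PXL_20230518_095458599.PANO.jpg")
```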

🚀 Improvements

  • 7-Zip detection logged once — The 7-Zip executable path is now resolved once per extraction session (cached in the service instance) and reported via a single [ INFO ] message. Previously the path was re-detected for every ZIP file, with no visible confirmation at all in CLI mode.
  • Reduced noise in verbose logs and made the reporting of errors and warnings more accurate.
  • Step 7: MTS, M2TS, WMV, AVI, MPEG, and BMP files are now skipped before ExifTool is called — ExifTool does not support writing metadata to these formats. Previously they were passed to ExifTool individually, producing [WARNING] ExifTool command failed noise for every such file. They are now detected upfront by extension and MIME type and skipped; a single warning is still logged per file unless warnings are silenced.
  • Refactoring: complex logic was offloaded into separate files for maintainability, and legacy code was removed.

🔧 Internal

  • Replaced custom _Mutex class with Pool(1) from package:pool in MediaHashService — same single-access semantics with less custom code.
  • Replaced hand-rolled LinkedHashMap LRU cache (~60 lines) with LruCache from package:lru in MediaHashService.
  • Added type-safe toJson() / fromJson() serialization to MediaEntity, FileEntity, and AlbumEntity, replacing ~260 lines of duck-typed dynamic casting in ProgressSaverService.
