github ksanyok/TextHumanize v0.33.0
TextHumanize 0.33.0 — Media watermark forensics


Extends TextHumanize beyond text with a new offline, pure-Python engine (numpy required, Pillow optional) that detects and removes AI-watermark and provenance signals in images, audio and video.

Added — texthumanize.media_watermark

  • detect_media_watermarks(source) / media_watermark_report(source) — parses PNG/JPEG/WebP/GIF/WAV/MP3/FLAC/MP4/MKV for:
    • C2PA/CAI manifests (Content Credentials), XMP digitalSourceType values such as trainedAlgorithmicMedia, and EXIF Software/Make tags;
    • embedded generation parameters (Stable Diffusion, ComfyUI…) and generator signatures (Midjourney, DALL·E, Adobe Firefly, Leonardo, NovelAI, Ideogram, FLUX, Suno, Sora…);
    • statistical probes for LSB steganography in images and for out-of-band/ultrasonic anomalies in audio.
  • clean_media_watermarks(source, output=...) — strips provenance/metadata and re-serialises PNG/JPEG/WebP/WAV; for MP4/MKV it returns a safe ffmpeg -map_metadata -1 recipe instead of an unsafe in-place rewrite.
  • media_format(bytes) — magic-byte format detection.
  • CLI: texthumanize media file.png and texthumanize media file.png --clean -o clean.png.
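The notes describe media_format(bytes) only as magic-byte detection. The sketch below shows how that kind of sniffing typically works for the containers listed above, assuming the standard file signatures. sniff_format() and its return labels are hypothetical, chosen for illustration; this is not the package's implementation.

```python
from typing import Optional


def sniff_format(data: bytes) -> Optional[str]:
    """Guess a media container from its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # RIFF is a shared wrapper: the form type at offset 8 tells WebP from WAV.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data.startswith(b"fLaC"):
        return "flac"
    # ISO BMFF (MP4, MOV, M4A...): the first box is usually "ftyp" at offset 4.
    if data[4:8] == b"ftyp":
        return "mp4"
    # EBML header, shared by Matroska (MKV) and WebM.
    if data.startswith(b"\x1a\x45\xdf\xa3"):
        return "mkv"
    # ID3v2 tag, or a bare MPEG audio frame sync (first 11 bits set).
    if data.startswith(b"ID3") or (
        len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
    ):
        return "mp3"
    return None


print(sniff_format(b"\x89PNG\r\n\x1a\n" + bytes(8)))  # png
```

Check order matters here. JPEG is tested before the loose MPEG frame-sync rule. That rule is permissive enough to also match other MPEG audio streams such as ADTS AAC. A signature only identifies the container, so a real parser still has to walk its chunks or boxes to find metadata.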

Honest scope

This audits inspectable metadata and statistical signals. It cannot detect or remove robust in-content neural watermarks such as Google SynthID, which are embedded in the pixels/samples and survive metadata stripping and re-encoding. Treat it as a provenance/forensics and metadata-privacy tool, not a guarantee of erasing every watermark. See the Responsible Use guide.
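To make "inspectable metadata" concrete, here is a minimal sketch of a byte-level provenance check. It assumes that a plain, ASCII-case-insensitive substring search is an acceptable first pass. The MARKERS table and scan_markers() are hypothetical and do not reflect TextHumanize's actual rule set. A hit only means that a label is present. It says nothing about whether a manifest is valid or signed, and short needles such as "c2pa" can also occur by chance inside compressed data.

```python
from typing import Dict

# Hypothetical marker table for illustration; not TextHumanize's rule set.
MARKERS = {
    "c2pa_manifest": b"c2pa",  # label of a C2PA manifest store (JUMBF)
    "jumbf_box": b"jumb",  # JUMBF superbox type that carries C2PA data
    "xmp_packet": b"<x:xmpmeta",  # opening element of an XMP packet
    "ai_source_type": b"trainedalgorithmicmedia",  # IPTC digital source type value
}


def scan_markers(data: bytes) -> Dict[str, int]:
    """Return {marker_name: first_byte_offset} for every marker found."""
    haystack = data.lower()  # bytes.lower() folds ASCII letters only; length is unchanged
    hits: Dict[str, int] = {}
    for name, needle in MARKERS.items():
        offset = haystack.find(needle)
        if offset != -1:
            hits[name] = offset
    return hits


sample = (
    b"\xff\xd8\xff\xe1"
    + b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    + b"http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)
print(scan_markers(sample))
```

A scan like this only sees labels that sit in the file as bytes. A watermark spread through the pixel or sample values themselves, such as SynthID, leaves no such marker, which is the limitation the scope note above describes.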
