TextHumanize 0.33.0: Media watermark forensics (ksanyok/TextHumanize v0.33.0 on GitHub)

Extends TextHumanize beyond text with a new offline, pure-Python engine (NumPy required, Pillow optional) that detects and removes AI-watermark and provenance signals in images, audio and video.

Added — `texthumanize.media_watermark`

- `detect_media_watermarks(source)` / `media_watermark_report(source)`: parse PNG, JPEG, WebP, GIF, WAV, MP3, FLAC, MP4 and MKV files for:
  - C2PA / CAI manifests (Content Credentials), XMP `digitalSourceType` / `trainedAlgorithmicMedia`, and EXIF `Software`/`Make` tags;
  - embedded generation parameters (Stable Diffusion, ComfyUI…) and generator signatures (Midjourney, DALL·E, Adobe Firefly, Leonardo, NovelAI, Ideogram, FLUX, Suno, Sora…);
  - LSB-steganography probes for images and out-of-band / ultrasonic anomaly probes for audio.
- `clean_media_watermarks(source, output=...)`: strips provenance data and metadata, then re-serialises PNG/JPEG/WebP/WAV. For MP4/MKV it returns a safe `ffmpeg -map_metadata -1` recipe instead of attempting an unsafe in-place rewrite.
- `media_format(bytes)`: magic-byte format detection.
- CLI: `texthumanize media file.png` and `texthumanize media file.png --clean -o clean.png`.

Honest scope

This audits inspectable metadata and statistical signals. It cannot detect or remove robust in-content neural watermarks such as Google SynthID, which are embedded in the pixels/samples and survive metadata stripping and re-encoding. Treat it as a provenance/forensics and metadata-privacy tool, not a guarantee of erasing every watermark. See the Responsible Use guide.
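The "statistical signals" half can be illustrated with a textbook detector. The sketch below computes the Westfeld-Pfitzmann pairs-of-values chi-square statistic over 8-bit samples. It is a swapped-in classic technique: the release notes do not say which tests the LSB probe actually runs, and `pairs_of_values_chi2` is a hypothetical name.

```python
def pairs_of_values_chi2(samples):
    """Westfeld-Pfitzmann pairs-of-values statistic for 8-bit samples.

    Embedding random bits in the LSBs tends to equalise the counts of each
    value pair (2k, 2k+1). A statistic that is small relative to its degrees
    of freedom is a hint of LSB embedding, not proof of it.
    Returns (chi2, degrees_of_freedom).
    """
    hist = [0] * 256
    for v in samples:
        hist[v] += 1
    chi2 = 0.0
    pairs = 0
    for k in range(128):
        even, odd = hist[2 * k], hist[2 * k + 1]
        expected = (even + odd) / 2
        if expected == 0:
            continue  # skip value pairs that never occur
        chi2 += (even - expected) ** 2 / expected
        pairs += 1
    return chi2, max(pairs - 1, 0)
```

To turn the statistic into a probability, compare it with a chi-square distribution, e.g. `scipy.stats.chi2.sf(chi2, dof)`; a value near 1 means the pair counts are suspiciously even. A probe like this only catches naive LSB embedding and says nothing about robust in-content watermarks such as SynthID.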
