v3.6.0: Audio Anomaly Detection Modality
Audio joins tabular, time-series, graph, text, and image as a first-class PyOD modality on the agentic and multimodal line. The additions are encoder-agnostic and additive, with no change to existing detectors.
New
- AudioFeatureEncoder: each clip becomes a 74-dim handcrafted acoustic vector (20 MFCC, 12 chroma, 5 spectral descriptors, each as mean and standard deviation over frames, via librosa). Registered as the audio-mfcc encoder.
- EmbeddingOD.for_audio(quality=...): fast=IForest, balanced=KNN, best=LUNAR over the audio encoder, so any classical detector runs on audio (embed then detect).
- AudioAE: DCASE-style log-mel reconstruction autoencoder that reuses the PyOD AutoEncoder, scored by per-clip mean reconstruction error. Requires torch.
- ADEngine: audio file-path profiling and routing (for_audio as the default, AudioAE as the deep alternative); knowledge-base entries for AudioAE and audio support on EmbeddingOD and MultiModalOD.
- pip install pyod[audio]: new optional extra (librosa, soundfile).
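The 74-dim figure and the embed-then-detect idea can be illustrated with a minimal NumPy sketch. It is not PyOD's implementation: the per-frame features are synthetic random stand-ins rather than librosa output, and the helper names pool_clip and knn_scores are hypothetical. The detector is a plain k-th nearest neighbor distance score, used here only as a stand-in for a classical detector.

```python
import numpy as np

# Per-frame feature counts taken from the release note.
N_MFCC, N_CHROMA, N_SPECTRAL = 20, 12, 5


def pool_clip(frames: np.ndarray) -> np.ndarray:
    """Collapse a (n_features, n_frames) matrix into [means, stds] over frames."""
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])


def knn_scores(train: np.ndarray, test: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each test vector by its distance to the k-th closest training vector."""
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, k - 1]


rng = np.random.default_rng(0)
n_features = N_MFCC + N_CHROMA + N_SPECTRAL  # 37 per-frame descriptors

# Synthetic stand-ins for the per-frame features of 50 normal clips (100 frames each).
normal = np.stack([pool_clip(rng.normal(size=(n_features, 100))) for _ in range(50)])
# One synthetic "anomalous" clip whose frame statistics are shifted.
odd = pool_clip(rng.normal(loc=3.0, size=(n_features, 100)))[None, :]

embedding_dim = normal.shape[1]  # 2 * 37 = 74
# k=6 on the training set itself skips the self-match at distance 0.
scores_normal = knn_scores(normal, normal, k=6)
score_odd = knn_scores(normal, odd, k=5)[0]

print(embedding_dim, score_odd > scores_normal.max())
```

Mean and standard deviation pooling gives every clip the same vector length regardless of its duration, which is what lets a tabular detector run on the result.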
Counts
Buildable detector count rises from 60 to 61: 61 total (43 tabular, 7 time-series, 8 graph, 2 text, 2 image, 1 multimodal, 3 audio).
Install
pip install --upgrade pyod # core
pip install "pyod[audio]" # audio encoder (librosa, soundfile)
pip install "pyod[torch,audio]" # AudioAE (deep)
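To check which of the optional dependencies named above are importable in the current environment, a small standard-library sketch may help. The helper missing_modules is hypothetical and not part of PyOD; the sketch assumes the import names librosa, soundfile, and torch match the package names.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed from the extras listed in the install commands.
AUDIO_DEPS = ("librosa", "soundfile")
DEEP_DEPS = AUDIO_DEPS + ("torch",)

missing_audio = missing_modules(AUDIO_DEPS)
missing_deep = missing_modules(DEEP_DEPS)

print("audio encoder ready:", not missing_audio)
print("AudioAE ready:", not missing_deep)
```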
The implementation follows publicly documented methods: the DCASE 2020 Task 2 log-mel autoencoder baseline, and MFCC, chroma, and spectral features via librosa. No breaking API changes.
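Scoring a clip by its mean reconstruction error can be sketched as follows. This is an illustration under stated assumptions, not the AudioAE code: the frames are synthetic rather than real log-mel features, and the reconstructor is a linear projection onto principal directions that stands in for a trained autoencoder. The names per_clip_error and reconstruct are hypothetical.

```python
import numpy as np


def per_clip_error(frames, recon, clip_ids):
    """Mean squared reconstruction error per frame, averaged within each clip."""
    frame_err = np.mean((frames - recon) ** 2, axis=1)
    clips = np.unique(clip_ids)
    return clips, np.array([frame_err[clip_ids == c].mean() for c in clips])


rng = np.random.default_rng(0)
n_mels, rank = 16, 3

# Synthetic "normal" frames that lie close to a low-dimensional subspace.
basis = rng.normal(size=(rank, n_mels))
train = rng.normal(size=(500, rank)) @ basis + 0.05 * rng.normal(size=(500, n_mels))

# Stand-in reconstructor: project onto the top principal directions (not a neural network).
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:rank]


def reconstruct(x):
    return (x - mean) @ components.T @ components + mean


# Two test clips of 40 frames each: clip 0 looks normal, clip 1 is off-subspace noise.
clip0 = rng.normal(size=(40, rank)) @ basis + 0.05 * rng.normal(size=(40, n_mels))
clip1 = 2.0 * rng.normal(size=(40, n_mels))
frames = np.vstack([clip0, clip1])
clip_ids = np.repeat([0, 1], 40)

clips, scores = per_clip_error(frames, reconstruct(frames), clip_ids)
print(dict(zip(clips.tolist(), scores.round(4).tolist())))
```

A model fitted only on normal data reconstructs normal frames well and unfamiliar frames poorly, so the clip with the higher average error is the more anomalous one.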
Full changelog: v3.5.4...v3.6.0