github yzhao062/pyod v3.6.0
v3.6.0: Audio Anomaly Detection Modality

6 hours ago

Audio joins tabular, time-series, graph, text, and image as a first-class PyOD modality on the agentic and multimodal line. The additions are encoder-agnostic and additive, with no change to existing detectors.

New

  • AudioFeatureEncoder: each clip becomes a 74-dim handcrafted acoustic vector (20 MFCC, 12 chroma, 5 spectral descriptors, each as mean and standard deviation over frames, via librosa). Registered as the audio-mfcc encoder.
  • EmbeddingOD.for_audio(quality=...): each quality preset pairs the audio encoder with a classical detector (fast=IForest, balanced=KNN, best=LUNAR), so any classical detector can run on audio (embed, then detect).
  • AudioAE: DCASE-style log-mel reconstruction autoencoder that reuses the PyOD AutoEncoder, scored by per-clip mean reconstruction error. Requires torch.
  • ADEngine: audio file-path profiling and routing (for_audio as the default, AudioAE as the deep alternative); knowledge-base entries for AudioAE and audio support on EmbeddingOD and MultiModalOD.
  • pip install pyod[audio]: new optional extra (librosa, soundfile).
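
The 74 dimensions of the AudioFeatureEncoder vector follow from the arithmetic in the first bullet: 20 + 12 + 5 = 37 per-frame features, each summarized by a mean and a standard deviation over frames. The sketch below illustrates only that pooling step in dependency-free Python. It is not PyOD's implementation: `pool_clip`, the means-then-stds ordering, and the use of the population standard deviation are assumptions made for illustration, and the per-frame values are synthetic rather than extracted with librosa.

```python
from statistics import fmean, pstdev

# Per-frame feature counts taken from the release notes.
N_MFCC, N_CHROMA, N_SPECTRAL = 20, 12, 5
N_FRAME_FEATURES = N_MFCC + N_CHROMA + N_SPECTRAL  # 37


def pool_clip(frames):
    """Collapse a clip's per-frame features into one fixed-length vector.

    `frames` is a list of frames, each a list of 37 floats. The result is the
    37 per-feature means followed by the 37 per-feature standard deviations.
    (Hypothetical helper; the ordering is an assumption.)
    """
    if not frames:
        raise ValueError("clip has no frames")
    if any(len(frame) != N_FRAME_FEATURES for frame in frames):
        raise ValueError(f"every frame needs {N_FRAME_FEATURES} features")
    columns = list(zip(*frames))  # transpose: one tuple of values per feature
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std, an assumption
    return means + stds


# Synthetic clips of different lengths map to vectors of the same size,
# which is what lets a tabular detector consume variable-length audio.
short_clip = [[float(t + k) for k in range(N_FRAME_FEATURES)] for t in range(10)]
long_clip = [[float(t * k) for k in range(N_FRAME_FEATURES)] for t in range(250)]
vec_short, vec_long = pool_clip(short_clip), pool_clip(long_clip)
print(len(vec_short), len(vec_long))  # prints: 74 74
```

Because every clip ends up as a fixed-length row regardless of duration, the resulting matrix can be handed to a detector such as IForest or KNN, which is the "embed then detect" pattern described above.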

Counts

The buildable detector count rises from 60 to 61 total (43 tabular, 7 time-series, 8 graph, 2 text, 2 image, 1 multimodal, 3 audio).

Install

pip install --upgrade pyod        # core
pip install "pyod[audio]"         # audio encoder (librosa, soundfile)
pip install "pyod[torch,audio]"   # AudioAE (deep)
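
Since the audio features sit behind optional extras, it can help to check which of them are importable before choosing a path. The snippet below is a minimal sketch using only the standard library. The helper names are hypothetical, and the import names `librosa`, `soundfile`, and `torch` are assumed to match the packages the extras install.

```python
from importlib.util import find_spec

# Import names per extra. The audio list comes from the release notes;
# mapping the "torch" extra to the `torch` module is an assumption.
EXTRAS = {
    "audio": ("librosa", "soundfile"),
    "torch": ("torch",),
}


def missing_modules(extra):
    """Return the import names from `extra` that cannot be found."""
    return [name for name in EXTRAS[extra] if find_spec(name) is None]


def audio_capabilities():
    """Report which audio features look usable in this environment."""
    audio_ok = not missing_modules("audio")
    torch_ok = not missing_modules("torch")
    return {
        "audio-mfcc encoder": audio_ok,
        # Per the install lines, AudioAE needs both extras.
        "AudioAE": audio_ok and torch_ok,
    }


for feature, ready in audio_capabilities().items():
    print(f"{feature}: {'available' if ready else 'missing dependencies'}")
```

This only tests whether the modules can be located; it does not import them or verify versions.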

The audio components build on public methods: the DCASE 2020 Task 2 log-mel autoencoder baseline, and MFCC, chroma, and spectral features computed via librosa. There are no breaking API changes.

Full changelog: v3.5.4...v3.6.0
