What's New
NVIDIA Sortformer v2: a complete rewrite of speaker diarization with streaming support. Long recordings (25+ minutes) are handled natively without running out of memory.
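For a sense of scale: at 16 kHz, a 25-minute recording is 24,000,000 samples, and the ~10 s chunks mentioned under Features split it into 150 pieces. The sketch below shows only that chunk arithmetic. It assumes 16 kHz mono input and non-overlapping 10 s chunks. `chunk_ranges` is a hypothetical helper, not part of parakeet-rs. Sortformer's actual chunk size, overlap, and speaker-cache handling are internal to the library.

```rust
/// Splits `total_samples` into consecutive [start, end) ranges of at most
/// `chunk_samples` each. Hypothetical helper for illustration only.
fn chunk_ranges(total_samples: usize, chunk_samples: usize) -> Vec<(usize, usize)> {
    assert!(chunk_samples > 0, "chunk size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_samples {
        // The last chunk may be shorter than `chunk_samples`.
        let end = (start + chunk_samples).min(total_samples);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() {
    let sample_rate = 16_000usize; // assumed: 16 kHz mono input
    let chunk_seconds = 10usize;   // ~10 s chunks, per these release notes
    let audio_seconds = 25 * 60;   // a 25-minute recording
    let ranges = chunk_ranges(sample_rate * audio_seconds, sample_rate * chunk_seconds);
    // In a chunked design the model sees one range at a time, so its working
    // set scales with chunk length rather than recording length.
    println!("{} chunks of {} samples", ranges.len(), sample_rate * chunk_seconds);
}
```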
Features
- Streaming Architecture: Processes audio in ~10s chunks with speaker cache for consistent speaker tracking across entire recordings
- Pre-tuned Configs: Choose the callhome() or dihard3() preset, or use custom() to set your own values
- Custom Thresholds: Fine-tune onset/offset thresholds and minimum segment durations for your use case
- 4 Speaker Support: Identifies and tracks up to 4 distinct speakers
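Onset/offset thresholds with a minimum duration are a common hysteresis scheme for turning per-frame speaker probabilities into segments. A segment opens when the probability reaches the onset threshold and stays open until it drops below the lower offset threshold. Segments shorter than the minimum duration are discarded. The following is a minimal sketch for a single speaker. `Segment`, `binarize`, and the frame length are hypothetical and not the parakeet-rs API, and the library's own post-processing may differ in detail.

```rust
/// One detected speech segment for a single speaker (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    start: f32, // seconds
    end: f32,   // seconds
}

/// Hysteresis binarization: open a segment when prob >= onset, close it when
/// prob < offset, then drop segments shorter than `min_duration` seconds.
fn binarize(probs: &[f32], frame_secs: f32, onset: f32, offset: f32, min_duration: f32) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut open: Option<usize> = None; // frame index where the current segment began
    for (i, &p) in probs.iter().enumerate() {
        match open {
            None => {
                if p >= onset {
                    open = Some(i);
                }
            }
            Some(start) => {
                // Values between offset and onset keep the segment open.
                if p < offset {
                    push_if_long(&mut segments, start, i, frame_secs, min_duration);
                    open = None;
                }
            }
        }
    }
    // Close a segment that is still open at the end of the input.
    if let Some(start) = open {
        push_if_long(&mut segments, start, probs.len(), frame_secs, min_duration);
    }
    segments
}

fn push_if_long(out: &mut Vec<Segment>, start: usize, end: usize, frame_secs: f32, min_duration: f32) {
    let seg = Segment { start: start as f32 * frame_secs, end: end as f32 * frame_secs };
    if seg.end - seg.start >= min_duration {
        out.push(seg);
    }
}

fn main() {
    // Hypothetical per-frame probabilities with 0.5 s frames.
    let probs = [0.1, 0.8, 0.6, 0.4, 0.2, 0.9, 0.1];
    println!("{:?}", binarize(&probs, 0.5, 0.7, 0.5, 0.75));
}
```

Using a lower offset than onset keeps brief dips in confidence from splitting one utterance into several segments. Raising the minimum duration suppresses short spurious detections.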
Usage
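```rust
use parakeet_rs::sortformer::{Sortformer, DiarizationConfig};

// Load the streaming Sortformer model with the CallHome-tuned preset.
let mut sortformer = Sortformer::with_config(
    "diar_streaming_sortformer_4spk-v2.onnx",
    None,
    DiarizationConfig::callhome(),
)?;

// Arguments: audio samples, sample rate (Hz), channel count.
let segments = sortformer.diarize(audio, 16000, 1)?;
```
Download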
Get the model: https://huggingface.co/altunenes/parakeet-rs/blob/main/diar_streaming_sortformer_4spk-v2.onnx
See examples/diarization.rs for combining diarization with TDT transcription.
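One common way to merge diarization with transcription is to assign each transcribed word to the speaker segment it overlaps most. This is not necessarily what examples/diarization.rs does. The minimal sketch below assumes both outputs carry start/end times in seconds. `SpeakerSegment`, `Word`, `overlap`, and `assign_speakers` are hypothetical names, not the parakeet-rs types.

```rust
/// Hypothetical diarization output: one speaker turn.
#[derive(Debug)]
struct SpeakerSegment {
    start: f32,
    end: f32,
    speaker: usize,
}

/// Hypothetical transcription output: one timestamped word.
#[derive(Debug)]
struct Word {
    text: String,
    start: f32,
    end: f32,
}

/// Overlap between two time intervals in seconds (0.0 if they are disjoint).
fn overlap(a_start: f32, a_end: f32, b_start: f32, b_end: f32) -> f32 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// For each word, returns the speaker whose segment overlaps it most,
/// or None if no segment overlaps the word at all.
fn assign_speakers(words: &[Word], segments: &[SpeakerSegment]) -> Vec<Option<usize>> {
    words
        .iter()
        .map(|w| {
            segments
                .iter()
                .map(|s| (s.speaker, overlap(w.start, w.end, s.start, s.end)))
                .filter(|&(_, o)| o > 0.0)
                .max_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(speaker, _)| speaker)
        })
        .collect()
}

fn main() {
    let segments = vec![
        SpeakerSegment { start: 0.0, end: 2.0, speaker: 0 },
        SpeakerSegment { start: 2.0, end: 4.0, speaker: 1 },
    ];
    let words = vec![
        Word { text: "hello".into(), start: 0.2, end: 0.6 },
        Word { text: "there".into(), start: 1.8, end: 2.6 },
        Word { text: "bye".into(), start: 5.0, end: 5.3 },
    ];
    for (w, speaker) in words.iter().zip(assign_speakers(&words, &segments)) {
        println!("{:?}: {}", speaker, w.text);
    }
}
```

Picking the maximum overlap lets a word that straddles a speaker change go to the speaker who covers most of it. Words outside every segment are left unassigned rather than guessed.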
Full Changelog: v0.2.3...v0.2.4
add diar_streaming_sortformer_4spk-v2 onnx by @altunenes in #23