jundot/omlx v0.2.20.dev1

Pre-release · one month ago

This is a pre-release build for testing purposes. Detailed feature breakdowns will be included in the official release notes. Please test extensively and report any issues.

New Features

  • oQ Quantization — oMLX Universal Dynamic Quantization with oQ2-oQ8 levels, calibration datasets, and CLIP support
  • Accuracy Benchmark — evaluate model intelligence with MMLU, HellaSwag, TruthfulQA, GSM8K, and LiveCodeBench. All datasets are bundled locally for offline use. Card-style grid UI with a per-benchmark sample-size selector (30/50/100/200/300/Full) and batch processing (1x/2x/4x/8x)
  • Benchmark Queue — queue multiple models for sequential benchmarking. Results persist on the server until explicitly cleared. A comparison table in the text export enables easy cross-model analysis
  • SpecPrefill — attention-based sparse prefill for MoE models. Reduces prefill compute by skipping low-attention tokens while preserving output quality
  • Prefill Memory Guard — prevents kernel panics on large contexts by detecting the O(n²) SDPA fallback for head_dim > 128 and enforcing safe prefill chunk sizes
  • MLX Only filter in model downloader — toggle to show only MLX-converted models
  • Admin override for OCR model sampling params — prevents repetition loops on OCR models
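The oQ2-oQ8 levels above refer to quantization bit widths. As a rough illustration of what a bit-width level trades off (a generic group-wise dynamic quantization sketch, not oMLX's actual oQ implementation), lower bit counts shrink weights more but reconstruct them less accurately:

```python
import numpy as np

def quantize_group(x, bits):
    """Quantize a 1-D float group to `bits`-bit unsigned integers
    using a per-group scale and zero point (asymmetric quantization)."""
    qmax = (1 << bits) - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from quantized values."""
    return q.astype(np.float32) * scale + zero

x = np.random.default_rng(0).normal(size=64).astype(np.float32)
for bits in (2, 4, 8):  # analogous to oQ2 / oQ4 / oQ8 levels
    q, s, z = quantize_group(x, bits)
    err = np.abs(dequantize_group(q, s, z) - x).max()
    print(f"{bits}-bit: max abs reconstruction error {err:.4f}")
```

Calibration datasets refine the choice of scale per group; the sketch simply uses the observed min/max.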
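The SpecPrefill idea of skipping low-attention tokens can be sketched as a token-selection step: score tokens with a cheap pass, keep only the highest-attention ones (plus the most recent tokens) for the expensive full prefill. The helper below is hypothetical and only illustrates the selection logic, not oMLX's code:

```python
import numpy as np

def select_prefill_tokens(attn_scores, keep_ratio=0.5, always_keep_last=16):
    """Given per-token attention importance scores from a cheap draft pass,
    return sorted indices of tokens to keep for the full prefill.
    Recent tokens are always kept since they dominate next-token prediction."""
    n = len(attn_scores)
    n_keep = max(int(n * keep_ratio), always_keep_last)
    keep = set(range(max(0, n - always_keep_last), n))  # recent window
    # Rank remaining tokens by importance and fill up to the budget.
    for idx in np.argsort(attn_scores)[::-1]:
        if len(keep) >= n_keep:
            break
        keep.add(int(idx))
    return sorted(keep)

scores = np.random.default_rng(1).random(1024)
kept = select_prefill_tokens(scores, keep_ratio=0.25)
print(f"prefilling {len(kept)} of {len(scores)} tokens")
```

With `keep_ratio=0.25`, prefill attention is computed over a quarter of the tokens, which is where the compute savings come from.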
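The Prefill Memory Guard addresses the case where the fused SDPA kernel does not apply (head_dim > 128) and attention must materialize a full (chunk × context) score matrix per head. A minimal sketch of that kind of chunk-size cap, with made-up default numbers rather than oMLX's actual heuristic:

```python
def safe_prefill_chunk(context_len, head_dim, n_heads,
                       default_chunk=2048, budget_bytes=2 * 1024**3,
                       bytes_per_el=4):
    """Cap the prefill chunk size when the O(n^2) SDPA fallback would
    materialize a (chunk x context) fp32 score matrix per head.
    Hypothetical sketch of the idea; defaults are illustrative."""
    if head_dim <= 128:
        return default_chunk  # fused kernel path, no quadratic blow-up
    # Score-matrix memory: n_heads * chunk * context_len * bytes_per_el
    max_chunk = budget_bytes // (n_heads * context_len * bytes_per_el)
    return max(1, min(default_chunk, int(max_chunk)))
```

For example, at a 100k-token context with 32 heads of head_dim 192, the 2 GiB budget here forces chunks of a few hundred tokens instead of the 2048 default, keeping the fallback path within memory rather than panicking the kernel.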

Bug Fixes

  • Fix Anthropic API temperature default (should be None, not 1.0)
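In practice the fix means the request body should omit `temperature` entirely unless the client sets it, instead of pinning it to 1.0. A minimal sketch of that pattern (illustrative, not the actual oMLX code):

```python
def build_payload(model, messages, temperature=None, max_tokens=1024):
    """Build an Anthropic-style messages request body. Omitting `temperature`
    lets the API apply its own default rather than forcing 1.0."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if temperature is not None:  # before the fix, this was always set to 1.0
        payload["temperature"] = temperature
    return payload

print(build_payload("claude-3", [{"role": "user", "content": "hi"}]))
```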
