jundot/omlx v0.2.20.dev1

Pre-release · one month ago

This is a pre-release build for testing purposes. Detailed feature breakdowns will be included in the official release notes. Please test extensively and report any issues.

New Features

  • oQ Quantization — oMLX Universal Dynamic Quantization with oQ2-oQ8 levels, calibration datasets, and CLIP support
  • Accuracy Benchmark — evaluate model intelligence with MMLU, HellaSwag, TruthfulQA, GSM8K, and LiveCodeBench. All datasets are bundled locally for offline use. Card-style grid UI with a per-benchmark sample-size selector (30/50/100/200/300/Full) and batch processing (1x/2x/4x/8x)
  • Benchmark Queue — queue multiple models for sequential benchmarking. Results persist on the server until explicitly cleared. A comparison table in the text export enables easy cross-model analysis
  • SpecPrefill — attention-based sparse prefill for MoE models. Reduces prefill compute by skipping low-attention tokens while preserving output quality
  • Prefill Memory Guard — prevents kernel panics on large contexts by detecting the O(n²) SDPA fallback for head_dim > 128 and enforcing safe prefill chunk sizes
  • MLX Only filter in model downloader — toggle to show only MLX-converted models
  • Admin override for OCR model sampling params — prevents repetition loops on OCR models
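The oQ2-oQ8 levels above refer to quantization bit widths. As a rough illustration of what a bit-width level trades off (a generic group-wise dynamic quantization sketch, not oMLX's actual oQ implementation), lower bit counts shrink weights more but reconstruct them less accurately:

```python
import numpy as np

def quantize_group(x, bits):
    """Quantize a 1-D float group to `bits`-bit unsigned integers
    using a per-group scale and zero point (asymmetric quantization)."""
    qmax = (1 << bits) - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from quantized values."""
    return q.astype(np.float32) * scale + zero

x = np.random.default_rng(0).normal(size=64).astype(np.float32)
for bits in (2, 4, 8):  # analogous to oQ2 / oQ4 / oQ8 levels
    q, s, z = quantize_group(x, bits)
    err = np.abs(dequantize_group(q, s, z) - x).max()
    print(f"{bits}-bit: max abs reconstruction error {err:.4f}")
```

Calibration datasets refine the choice of scale per group; the sketch simply uses the observed min/max.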
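The SpecPrefill idea of skipping low-attention tokens can be sketched as a token-selection step: score tokens with a cheap pass, keep only the highest-attention ones (plus the most recent tokens) for the expensive full prefill. The helper below is hypothetical and only illustrates the selection logic, not oMLX's code:

```python
import numpy as np

def select_prefill_tokens(attn_scores, keep_ratio=0.5, always_keep_last=16):
    """Given per-token attention importance scores from a cheap draft pass,
    return sorted indices of tokens to keep for the full prefill.
    Recent tokens are always kept since they dominate next-token prediction."""
    n = len(attn_scores)
    n_keep = max(int(n * keep_ratio), always_keep_last)
    keep = set(range(max(0, n - always_keep_last), n))  # recent window
    # Rank remaining tokens by importance and fill up to the budget.
    for idx in np.argsort(attn_scores)[::-1]:
        if len(keep) >= n_keep:
            break
        keep.add(int(idx))
    return sorted(keep)

scores = np.random.default_rng(1).random(1024)
kept = select_prefill_tokens(scores, keep_ratio=0.25)
print(f"prefilling {len(kept)} of {len(scores)} tokens")
```

With `keep_ratio=0.25`, prefill attention is computed over a quarter of the tokens, which is where the compute savings come from.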
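The Prefill Memory Guard addresses the case where the fused SDPA kernel does not apply (head_dim > 128) and attention must materialize a full (chunk × context) score matrix per head. A minimal sketch of that kind of chunk-size cap, with made-up default numbers rather than oMLX's actual heuristic:

```python
def safe_prefill_chunk(context_len, head_dim, n_heads,
                       default_chunk=2048, budget_bytes=2 * 1024**3,
                       bytes_per_el=4):
    """Cap the prefill chunk size when the O(n^2) SDPA fallback would
    materialize a (chunk x context) fp32 score matrix per head.
    Hypothetical sketch of the idea; defaults are illustrative."""
    if head_dim <= 128:
        return default_chunk  # fused kernel path, no quadratic blow-up
    # Score-matrix memory: n_heads * chunk * context_len * bytes_per_el
    max_chunk = budget_bytes // (n_heads * context_len * bytes_per_el)
    return max(1, min(default_chunk, int(max_chunk)))
```

For example, at a 100k-token context with 32 heads of head_dim 192, the 2 GiB budget here forces chunks of a few hundred tokens instead of the 2048 default, keeping the fallback path within memory rather than panicking the kernel.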

Bug Fixes

  • Fix Anthropic API temperature default (should be None, not 1.0)
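In practice the fix means the request body should omit `temperature` entirely unless the client sets it, instead of pinning it to 1.0. A minimal sketch of that pattern (illustrative, not the actual oMLX code):

```python
def build_payload(model, messages, temperature=None, max_tokens=1024):
    """Build an Anthropic-style messages request body. Omitting `temperature`
    lets the API apply its own default rather than forcing 1.0."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if temperature is not None:  # before the fix, this was always set to 1.0
        payload["temperature"] = temperature
    return payload

print(build_payload("claude-3", [{"role": "user", "content": "hi"}]))
```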
