This is a pre-release build for testing purposes.
New Features
- Hybrid quantization modes — per-layer mxfp4/mxfp8/affine format selection for better quality-size tradeoffs
- Clip optimization speedup — GPU batch size setting for faster AWQ-style clipping
- Block inference during quantization — prevents request conflicts while quantization is running
- Download raw results — export benchmark results as JSON
- Use model sampling settings — benchmarks now respect per-model sampling parameters
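
To illustrate the idea behind hybrid quantization modes, here is a minimal sketch of per-layer format selection. All names and the selection heuristic are hypothetical, not this project's actual API — the real feature may expose this through configuration instead.

```python
# Hypothetical per-layer format selection: keep quality-sensitive layers at
# higher precision, compress the rest more aggressively.
def choose_format(layer_name: str) -> str:
    """Pick a quantization format for one layer (illustrative heuristic)."""
    if "embed" in layer_name or "lm_head" in layer_name:
        return "affine"   # sensitive layers: highest-quality format
    if "attn" in layer_name:
        return "mxfp8"    # attention: mid precision
    return "mxfp4"        # everything else: smallest format

layers = [
    "model.embed_tokens",
    "model.layers.0.attn.q_proj",
    "model.layers.0.mlp.up_proj",
    "lm_head",
]
# Map each layer to its chosen format, e.g. for a quantization plan.
plan = {name: choose_format(name) for name in layers}
```

The per-layer mapping is what enables the quality-size tradeoff: only layers that hurt accuracy most pay the size cost of a higher-precision format.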
Bug Fixes
- Fix multiple-choice benchmarks (MMLU, HellaSwag, TruthfulQA) always scoring 0%