This is a pre-release build for testing purposes.
New Features
- Hybrid quantization modes — per-layer mxfp4/mxfp8/affine format selection for better quality-size tradeoffs
- Clip optimization speedup — GPU batch size setting for faster AWQ-style clipping
- Block inference during quantization — prevents request conflicts while quantization is running
- Download raw results — export benchmark results as JSON
- Use model sampling settings — benchmarks now respect per-model sampling parameters
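
To illustrate the idea behind hybrid quantization modes, here is a minimal sketch of per-layer format selection. All names and the selection heuristic are hypothetical, not this project's actual API — the real feature may expose this through configuration instead.

```python
# Hypothetical per-layer format selection: keep quality-sensitive layers at
# higher precision, compress the rest more aggressively.
def choose_format(layer_name: str) -> str:
    """Pick a quantization format for one layer (illustrative heuristic)."""
    if "embed" in layer_name or "lm_head" in layer_name:
        return "affine"   # sensitive layers: highest-quality format
    if "attn" in layer_name:
        return "mxfp8"    # attention: mid precision
    return "mxfp4"        # everything else: smallest format

layers = [
    "model.embed_tokens",
    "model.layers.0.attn.q_proj",
    "model.layers.0.mlp.up_proj",
    "lm_head",
]
# Map each layer to its chosen format, e.g. for a quantization plan.
plan = {name: choose_format(name) for name in layers}
```

The per-layer mapping is what enables the quality-size tradeoff: only layers that hurt accuracy most pay the size cost of a higher-precision format.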
Bug Fixes
- Fix multiple-choice benchmarks (MMLU, HellaSwag, TruthfulQA) always scoring 0%