Release v0.1.4

TLDR:

Refactor of Environment.a_generate + Rubric internals to support interleaved generation and scoring (enabled by default)
Addresses many other small issues and adds QoL improvements; see below for details
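
As a rough illustration of what "interleaved generation and scoring" means, here is a minimal asyncio sketch. It is not the verifiers implementation: `generate`, `score`, `rollout_and_score`, and `run_interleaved` are hypothetical stand-ins, and the `max_concurrent` argument only borrows its name from the parameter renamed in #295. The idea shown is that each rollout is scored as soon as its own generation finishes, rather than waiting for every generation in the batch to complete first.

```python
import asyncio


async def generate(prompt: str) -> str:
    # Hypothetical stand-in for a model rollout; longer prompts take longer.
    await asyncio.sleep(0.01 * len(prompt))
    return prompt.upper()


async def score(completion: str) -> float:
    # Hypothetical stand-in for a reward function.
    await asyncio.sleep(0.01)
    return float(len(completion))


async def rollout_and_score(prompt: str, sem: asyncio.Semaphore) -> tuple[str, float]:
    # One slot covers both steps, so scoring of a finished rollout
    # overlaps with generation of the others.
    async with sem:
        completion = await generate(prompt)
        reward = await score(completion)
    return completion, reward


async def run_interleaved(prompts: list[str], max_concurrent: int = 2) -> list[tuple[str, float]]:
    # The semaphore caps how many rollouts are in flight at once.
    sem = asyncio.Semaphore(max_concurrent)
    tasks = [rollout_and_score(p, sem) for p in prompts]
    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_interleaved(["a", "bbb", "cc"]))
```

In a non-interleaved design the scoring step would start only after all `generate` calls had returned; here the two stages share one concurrency limit and overlap.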

What's Changed

vf-tui parsing crashes when tool_calls contains JSON strings instead of dicts by @nancyjlau in #250
Fix eval saving failing on -n -1 by @mikasenghaas in #255
Fix missing parser parameter in Rubric instances across environments by @bdsaglam in #276
fix(tui): escape user content to prevent markup injection issues by @srthkdev in #273
Make answer + info both optional by @willccbb in #282
docs(env): clarify optional answer/info fields and evaluation behavior by @srthkdev in #268
docs(rubric): add documentation for passing class objects to reward functions by @srthkdev in #269
feat(verifiers): add MathRubric to verifiers module by @srthkdev in #263
chore(eval): add logging throughout evaluation script for better traceability by @srthkdev in #262
fix markuperror in completion by @jalexine in #284
Math python tweak by @willccbb in #286
fix: add robust function schema parsing by @dhruvrnaik in #285
disable max turns default by @willccbb in #292
Fix max_concurrent_requests in eval script and also use for rollout scoring by @mikasenghaas in #294
Rename max_concurrent_requests to max_concurrent by @mikasenghaas in #295
Fix log verbosity of third-party packages in eval script by @mikasenghaas in #296
fix: Propagate tool_call_id in prompt messages by @walln in #318
Higher timeouts and limits for eval client by @mikasenghaas in #316
Update __init__.py for StatefulToolEnv by @stangirala in #306
Option to find last instance of \boxed{} by @kyleavery in #310
fix(envs): prevent IndexError by capping dataset selection range by @srthkdev in #314
fix: correct grammar in README - remove extra word 'in' by @Traddoo in #283
Optionally init environment as multi-file package by @mikasenghaas in #300
Added audio modality support by @yurpl in #312
Will/normalize n eval by @willccbb in #319
fix(verifiers): add error handling for judge model API calls by @srthkdev in #291
actions update, publish-environments by @willccbb in #323
fix(env_utils): add detailed logging to environment loader by @srthkdev in #309
fix(envs): Async tool call bug in StatefulToolEnv by @bdsaglam in #326
fix: num_iterations by @ZhichenRen in #320
refactor(envs): rename push_to_hub to push_to_hf_hub in make_dataset by @srthkdev in #336
fix(env_utils): improve error message for load_environment function absence by @srthkdev in #335
orchestration refactor for interleaved generation and scoring by @willccbb in #324
docs: document environments hub usage patterns by @willccbb in #344
mcp verifiers env by @cdreetz in #343
AGENTS.md by @willccbb in #346
Add ty pre-commit hook by @willccbb in #347
ARC-AGI-3 environment by @willccbb in #348
fix typing issues by @willccbb in #355
Add import regression tests to prevent missing exports by @fsndzomga in #353
Fix RubricGroup score_rollout parser handling by @willccbb in #357
v0.1.4 release by @willccbb in #358
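
For readers unfamiliar with the `\boxed{}` change in #310, the sketch below shows one way to pull the last boxed answer out of a completion. It is an illustration only, not the code from that PR: the function name `extract_last_boxed` is hypothetical, and the brace-matching approach is an assumption about how such a parser might handle nested LaTeX.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    # Hypothetical helper: return the contents of the last \boxed{...} in text.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present
    content_start = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found; nested braces are preserved.
                return text[content_start:i]
    return None  # unbalanced braces, treat as no answer


answer = extract_last_boxed(r"First try \boxed{1}, final answer \boxed{\frac{1}{2}}")
```

Taking the last occurrence matters when a model writes intermediate boxed values before its final answer.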

New Contributors

@nancyjlau made their first contribution in #250
@bdsaglam made their first contribution in #276
@srthkdev made their first contribution in #273
@jalexine made their first contribution in #284
@dhruvrnaik made their first contribution in #285
@walln made their first contribution in #318
@stangirala made their first contribution in #306
@kyleavery made their first contribution in #310
@Traddoo made their first contribution in #283
@yurpl made their first contribution in #312
@ZhichenRen made their first contribution in #320
@cdreetz made their first contribution in #343
@fsndzomga made their first contribution in #353

Full Changelog: v0.1.3...v0.1.4

PrimeIntellect-ai/verifiers v0.1.4 on GitHub
