Release v0.1.4
TLDR:
- Refactor of Environment.a_generate + Rubric internals to support interleaved generation and scoring (enabled by default)
- Addresses lots of other small issues + QoL improvements, see below for details
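As a rough illustration of what "interleaved generation and scoring" means in general: each rollout is scored as soon as its own generation finishes, instead of waiting for the whole batch to generate first. The sketch below uses only asyncio. The names generate, score, rollout_and_score, and run_interleaved are hypothetical stand-ins for illustration and are not the library's API.

```python
import asyncio


async def generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call.
    await asyncio.sleep(0.01)
    return prompt.upper()


async def score(prompt: str, completion: str) -> float:
    # Hypothetical stand-in for a reward function.
    await asyncio.sleep(0.01)
    return float(len(completion))


async def rollout_and_score(prompt: str, sem: asyncio.Semaphore) -> tuple[str, float]:
    # Scoring starts as soon as this rollout's generation is done,
    # without waiting for the other rollouts in the batch.
    async with sem:
        completion = await generate(prompt)
        reward = await score(prompt, completion)
    return completion, reward


async def run_interleaved(prompts: list[str], max_concurrent: int = 2) -> list[tuple[str, float]]:
    # The semaphore caps how many rollouts are in flight at once.
    sem = asyncio.Semaphore(max_concurrent)
    tasks = [rollout_and_score(p, sem) for p in prompts]
    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_interleaved(["a", "bc", "def"]))
```

A single concurrency limit shared by generation and scoring is an assumption of this sketch; the actual implementation may organize this differently.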
What's Changed
- vf-tui parsing crashes when tool_calls contains JSON strings instead of dicts by @nancyjlau in #250
- Fix eval saving failing on -n -1 by @mikasenghaas in #255
- Fix missing parser parameter in Rubric instances across environments by @bdsaglam in #276
- fix(tui): escape user content to prevent markup injection issues by @srthkdev in #273
- Make answer + info both optional by @willccbb in #282
- docs(env): clarify optional answer/info fields and evaluation behavior by @srthkdev in #268
- docs(rubric): add documentation for passing class objects to reward functions by @srthkdev in #269
- feat(verifiers): add MathRubric to verifiers module by @srthkdev in #263
- chore(eval): add logging throughout evaluation script for better traceability by @srthkdev in #262
- fix markuperror in completion by @jalexine in #284
- Math python tweak by @willccbb in #286
- fix: add robust function schema parsing by @dhruvrnaik in #285
- disable max turns default by @willccbb in #292
- Fix max_concurrent_requests in eval script and also use for rollout scoring by @mikasenghaas in #294
- Rename max_concurrent_requests to max_concurrent by @mikasenghaas in #295
- Fix log verbosity of third-party packages in eval script by @mikasenghaas in #296
- fix: Propagate tool_call_id in prompt messages by @walln in #318
- Higher timeouts and limits for eval client by @mikasenghaas in #316
- Update __init__.py for StatefulToolEnv by @stangirala in #306
- Option to find last instance of \boxed{} by @kyleavery in #310
- fix(envs): prevent IndexError by capping dataset selection range by @srthkdev in #314
- fix: correct grammar in README - remove extra word 'in' by @Traddoo in #283
- Optionally init environment as multi-file package by @mikasenghaas in #300
- Added audio modality support by @yurpl in #312
- Will/normalize n eval by @willccbb in #319
- fix(verifiers): add error handling for judge model API calls by @srthkdev in #291
- actions update, publish-environments by @willccbb in #323
- fix(env_utils): add detailed logging to environment loader by @srthkdev in #309
- fix(envs): Async tool call bug in StatefulToolEnv by @bdsaglam in #326
- fix: num_iterations by @ZhichenRen in #320
- refactor(envs): rename push_to_hub to push_to_hf_hub in make_dataset by @srthkdev in #336
- fix(env_utils): improve error message for load_environment function absence by @srthkdev in #335
- orchestration refactor for interleaved generation and scoring by @willccbb in #324
- docs: document environments hub usage patterns by @willccbb in #344
- mcp verifiers env by @cdreetz in #343
- AGENTS.md by @willccbb in #346
- Add ty pre-commit hook by @willccbb in #347
- ARC-AGI-3 environment by @willccbb in #348
- fix typing issues by @willccbb in #355
- Add import regression tests to prevent missing exports by @fsndzomga in #353
- Fix RubricGroup score_rollout parser handling by @willccbb in #357
- v0.1.4 release by @willccbb in #358
New Contributors
- @nancyjlau made their first contribution in #250
- @bdsaglam made their first contribution in #276
- @srthkdev made their first contribution in #273
- @jalexine made their first contribution in #284
- @dhruvrnaik made their first contribution in #285
- @walln made their first contribution in #318
- @stangirala made their first contribution in #306
- @kyleavery made their first contribution in #310
- @Traddoo made their first contribution in #283
- @yurpl made their first contribution in #312
- @ZhichenRen made their first contribution in #320
- @cdreetz made their first contribution in #343
- @fsndzomga made their first contribution in #353
Full Changelog: v0.1.3...v0.1.4