Highlights
- New TCP check toolset
- Improve context window management using jq for tool call results.
- Fix ulimit issue on ARM-based machines
- New GCP & AKS MCP integrations
- Breaking change: improvements to the Elasticsearch/OpenSearch toolset (see Docs)
What's Changed
- tell claude to use -s by @aantn in #1253
- Backstage - with dco fixes by @aantn in #1254
- add grafana dashboard eval where it is too big by @Sheeproid in #1255
- Skip building images on PRs from external users (forks) by @aantn in #1249
- Optimize Docker Build on PR: add caching and build amd64 only by @aantn in #1257
- add regression eval tests where non-Sonnet models ask for clarifications by @Sheeproid in #1259
- Allow manually triggering evals by @aantn in #1258
- Improvements to manual evals run by @aantn in #1265
- Minor tweaks to evals by @aantn in #1266
- [ROB-2864] gcp docs update by @Avi-Robusta in #1268
- Evals: add additional info to github reports by @aantn in #1267
- delete loki test that new implementation solves by @RoiGlinik in #1274
- ROB-2062 New logic to enable/disable telemetry for saas env by @RoiGlinik in #1275
- github action bug fix: wait for calico CRD to be established by @Sheeproid in #1278
- Add connectivity check toolset (TCP only) with tests and eval fixture by @Sheeproid in #1273
- Accept OPENROUTER_API_KEY/BASE for LLM evals and classifier checks by @Sheeproid in #1282
- Add historical timing data to evals by @aantn in #1272
- Improve eval notification UX: delete progress comment and create fresh comment by @aantn in #1283
- Improvements to /eval command by @aantn in #1290
- Fix fork detection when head repo is deleted or unavailable by @aantn in #1256
- Remove post-processing support by @aantn in #1279
- Run a focused set of benchmarks with regression + benchmark tags by @Sheeproid in #1298
- Json mixin for tools by @aantn in #1280
- /evals dx improvements by @aantn in #1303
- chore: onboard isort 7.0.0 precommit check by @mainred in #1252
- Reduce number of comments on PRs from evals by @aantn in #1309
- tech debt: remove dead code functions by @aantn in #1311
- Raise ulimit memory limit by @moshemorad in #1317
- Elasticsearch Toolset by @aantn in #1302
- harden security on /eval (rework) by @aantn in #1320
- Update Prometheus toolset config by @aantn in #1319
- azure mcp integration by @arikalon1 in #1318
- GCP MCP integration by @arikalon1 in #1310
- .github: README: Add OpenSSF scorecard badge by @illume in #1196
New Contributors
Full Changelog: 0.18.1...0.18.2