2.0.0 (2026-01-28)
Features
- add AI scenario generation (#1110) (7da469d)
- add CI/CD execution support for evaluations v3 (#1118) (d28adac)
- add COSS licensing enforcement for self-hosted deployments (#1170) (37c30ec)
- add HTTP agent (#1053) (02284be)
- add link to setup evaluations from sdk (2130e30)
- add orchestrator pattern for Claude Code context management (#1163) (7b3415b)
- analytics: track onboarding progress metrics in PostHog (1533de5)
- claude: add rogerio-cto-review agent and worktree command (#1192) (326196a)
- claude: add workflow commands for worktrees and PR review (#1135) (8643e92)
- ClickHouse trace filtering (#1079) (12f4b03)
- dev: add Docker Compose dev environment with profiles (#1188) (72e8df5)
- evaluations v3 execution and new evaluations results page (#1113) (510f65d)
- evaluations-v3: add lambda warmup for faster evaluation runs (cc95cca)
- evaluations-v3: implement HTTP agent support (#1196) (7afb24e)
- evaluations-v3: improve table column resizing and overflow handling (d1d3831)
- evaluations-v3: major table performance improvements, prompts to experiment button and other bugfixes (#1181) (2cbf430)
- evaluations-v3: support evaluators/{id} path for database evaluators (2f65327)
- evaluators: add "Use via API" dialog with code snippets (58ccaf5)
- event sourcing powered evaluations (#1090) (fd9898e)
- improve trace/span event sourcing pipeline (#980) (d67854d)
- integrate HTTP agent into scenario/simulations quick run (#1071) (3e3a8d4)
- introduce first step towards dark mode (#1143) (426d776)
- licensing: add centralized license enforcement with resource limits (#1208) (a511233)
- llm-config: upgrade model registry with dynamic parameters and OpenRouter sync (#1115) (f03a283)
- new online evaluations and guardrails setup (#1151) (7c8a804)
- new simulation card design (#1106) (3a116af)
- projects: add drawer-based project creation (#1068) (5620034)
- prompts: show icon-only buttons with tooltips in compare mode (4f4ecfe)
- refactor model providers UI to drawer-based (#1050) (8c8df73)
- regenerate API key (#1083) (e09bf3f)
- scenarios: add help text and tooltips to scenario form fields (#1128) (fec3e73)
- sdk: add online evaluations API and ensureSetup for TypeScript (2209258)
- traces: add reasoning tokens and effort support for LLM models (16f1d4a)
- track events as spans for REST API (#1089) (ec8243e)
- ui: revamp LLM parameter controls with button-based selects (9a42d93)
- update onboarding for new Go SDK shape (#1225) (ae6b6a2)
- use programmatic langwatch config in scenario runner (#1074) (34a9d62)
- walking skeleton for scenarios (#1047) (f6acbb8)
Bug Fixes
- add vendor folder before installation to fix docker build (292fe83)
- add z-index to tooltip (#1078) (1804329)
- annotation highlight scroll (#1073) (7e3471d)
- base64 markdown rendering (8017548)
- check if graph exists (#1067) (eef4089)
- ci: add pnpm-lock.yaml for agentic-e2e-tests (#1216) (761a1a0)
- ClickHouse replication issue with goose migrations and tables not replicating correctly (#1116) (db6638f)
- cluster goose db (#1140) (2cc0e69)
- config: disable HSTS and upgrade-insecure-requests in development (#1149) (f88086e)
- dspy: capture full message output including reasoning_content (8257cae)
- elasticsearch migrations for batch evals for new target fields (530bb73)
- evaluations-v3: display Code Agent outputs with custom field names (#1226) (9a69c53)
- evaluations-v3: fix type errors in httpAgentUtils and dslAdapter (3196cc7)
- evaluations-v3: pass all LLM params including reasoning to targets (c786a73)
- evaluations-v3: persist all LLM parameters in local prompt config (4b87561)
- evaluations-v3: prevent autosave data loss on back navigation (335e571)
- event sourcing improvements from testing (#1109) (2a400db)
- fix emoji handling without breaking multiline prompt evaluators (2d47925)
- fix failing unit tests (9f9ad87)
- goose migrate missing priming row (#1145) (5698c57)
- correct goose migration directory in Dockerfile (#1105) (ecd620b)
- improve dedupe logic, and fix span dropping issue in span storage event handler (#1201) (3b43fae)
- improve locking contention delay config and error handling (#1171) (5d84748)
- light mode token changes + hide theme selector if no feature flag (#1152) (6729925)
- litellm: fix Anthropic model integration issues (#1197) (1ed2c7f)
- llm-config: smart max_tokens handling on model switch (7513131)
- make OTLP validation and parsing less strict to support more OTLP protocol versions (#1148) (dc1e1eb)
- fix navigation to the same drawer URL and restore the trace ID button on the conversation (f906f56)
- normalize OTLP IDs to guaranteed OTel IDs (#1164) (2e54acb)
- normalize span IDs to hex strings before BullMQ queue (2e54acb)
- onboarding: prevent model provider credential inputs from resetting (#1060) (ca8b8ee)
- prompts: default maxTokens to undefined for model-based defaults (4a36aee)
- prompts: show Bedrock models in model selector dropdown (#1206) (2e49e01)
- prompts: structured outputs with custom field names and types (#1112) (d1c0370)
- prompts: use model's actual max_tokens for new prompts (5aaa234)
- proper terminology on analytics and add linking button for the graph (15e3c2c)
- properly handle ClickHouse engine tag macros for replicated cluster configs (#1111) (6052374)
- python-sdk: resolve name collision between Evaluation TypedDict and class alias (873909a)
- React imports on deja view (#1160) (6514a1a)
- remove duplicate evaluations unit test (already in integration) (#1177) (8ed9d28)
- rework pie/donut data and colours (#1055) (8d50910)
- scenario editor UX improvements and bug fixes (#1086) (1d44f72)
- set correct ksuid environment in worker (#1173) (10ec064)
- small project drawer title fix, make + Add clickable (9746fdb)
- tests: align license router tests with RBAC middleware behavior (#1207) (1def54d)
- tests: normalize column IDs to names in orchestrator integration test (dc7c2ea)
- unit tests and typecheck (802ccc1)
- various evaluations v3 fixes (#1122) (c9904fc)
Miscellaneous
- ✨ new readme preview video 💅🏼 (#1036) (ba949c5)
- eval pagination footer (#1044) (aaea14f)
- fix all biome lint issues (#1121) (d83bb6e)
- improve stressed+blessed event sourcing tooling (#1108) (82ccab6)
- main: release python-sdk 0.10.0 (#1142) (749a977)
- main: release python-sdk 0.9.0 (#1114) (0f24551)
- migrate Cursor config to Claude Code system (#1147) (fc20384)
- remove litellm enterprise deps, add license file generation (792243a)
- standardize top-level rules on AGENTS.md, remove duplicate CLAUDE.md (#1150) (43ba172)
- sync model registry (363 models) (#1138) (43bb203)
- update where goose migration db is stored + improve handling (#1141) (e9265ed)
Documentation
- add Repository + Service pattern documentation (#1190) (fa6a81e)
- extract design principles from PR #1025 into searchable documentation (#1139) (41e57d2)
- improve Claude Code agent configuration and BDD workflow (#1189) (e2a1e2d)
- move TESTING.md to docs/TESTING_PHILOSOPHY.md (#1157) (c475c86)
- standardize worktree and branch naming conventions (#1211) (c3ef006)