DeepTutor v1.2.1 Release Notes
Release Date: 2026.04.21
Highlights
Per-Stage Token Limits & Temperature for Chat (#348)
Promoted the agentic chat pipeline to a first-class config citizen in `agents.yaml`. A new `capabilities.chat` block exposes per-stage `max_tokens` (`responding`, `answer_now`, `thinking`, `observing`, `acting`, `react_fallback`) and a shared `temperature`, deep-merged over baked-in defaults via `services/config/loader.py::get_chat_params()`. The `responding` and `answer_now` budgets jump from the previous hard-coded 1800 to 8000, eliminating the mid-sentence truncation that was clipping long answers. Internally, `_ChatLimits.from_config` coerces every legacy shape (missing keys, scalar instead of dict, partial overrides) into a stable dataclass, so existing installs keep working without touching their YAML. 10 new unit tests cover loader resolution, deep-merge precedence, and dataclass coercion.
Regenerate Last Response — CLI, WebSocket, Web UI (#349)
Added a real regenerate flow that re-runs the previous user message in place, working uniformly across every entry point:
- CLI — `/regenerate` (alias `/retry`) inside the `deeptutor chat` REPL.
- WebSocket — a `type: "regenerate"` message on `/api/v1/ws`, with optional `overrides` for `capability`, `tools`, `knowledge_bases`, `language`, and `config`.
- Web UI — a per-message Regenerate button (`RefreshCcw` icon) on the last assistant turn for chat-capability replies.
On the backend, `TurnRuntimeManager.regenerate_last_turn` rolls back the trailing assistant message via the new `SQLiteSessionStore.delete_message` / `get_last_message` helpers, then reuses `start_turn` with `_persist_user_message=False` and `_regenerate=True` so the user row isn't duplicated and `memory_service.refresh_from_turn` isn't run a second time. Pre-flight checks raise non-fatal `regenerate_busy` (another turn is running) or `nothing_to_regenerate` (no prior user message) errors instead of failing silently. The `_stage_responding` LLM call also gained empty-response diagnostics that surface a structured warning when the model returns no content. 18 new tests cover all three reject paths, the delete-then-restart flow, the memory-refresh-skip contract, and the WebSocket round-trip.
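The pre-flight and rollback contract can be condensed into a sketch. Everything here is illustrative: a plain list of role/content dicts stands in for `SQLiteSessionStore`, and the returned dict only mirrors the `_persist_user_message=False` / skip-memory-refresh semantics described above.

```python
class RegenerateError(Exception):
    """Non-fatal pre-flight rejection; codes match the error names above."""
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code

def regenerate_last_turn(session: list[dict], turn_running: bool) -> dict:
    """Sketch of the rollback-then-restart flow (names illustrative).

    `session` is a list of {"role": ..., "content": ...} dicts standing
    in for the SQLite-backed message store.
    """
    # Pre-flight: refuse to regenerate while another turn is streaming.
    if turn_running:
        raise RegenerateError("regenerate_busy")
    # Roll back the trailing assistant message, if present.
    if session and session[-1]["role"] == "assistant":
        session.pop()
    # Find the user message to re-run; reject if there is none.
    last_user = next((m for m in reversed(session) if m["role"] == "user"), None)
    if last_user is None:
        raise RegenerateError("nothing_to_regenerate")
    # Restart without re-persisting the user row or refreshing memory,
    # mirroring _persist_user_message=False / _regenerate=True.
    return {"rerun": last_user["content"],
            "persist_user": False,
            "refresh_memory": False}
```

The key invariant is that the user row is read, never rewritten, so replaying the turn cannot duplicate it or trigger a second memory refresh.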
UI Harmony Polish for Regenerate
Two follow-up tweaks so the new button matches the rest of the chat UI and behaves predictably under server rejection:
- i18n — added `Regenerate` keys to `web/locales/{en,zh}/app.json` (`Regenerate` / `重新生成`) and switched `ChatMessages.tsx` from a hardcoded `"Regenerate"` string to `t("Regenerate")`, matching the existing `t("Copy")` pattern in the same row.
- Optimistic-pop rollback — when the server rejects a regenerate request pre-flight (`regenerate_busy` / `nothing_to_regenerate`), the optimistic `POP_LAST_ASSISTANT` + `STREAM_START` placeholder is now restored via a new `RESTORE_ASSISTANT` reducer action. The popped message is held in a per-key `pendingRegenerateRef` and cleared on `done` or any terminal result, so the transcript never silently loses the last assistant reply.
Bug Fixes
- Dark code blocks unreadable on light theme (#352) — the hard-coded `#1f2937`/`#292524` code-block background, combined with `.prose pre` (forcing `#D6D3D1`) and `.prose code:not(.md-code-block__code):not(.md-inline-code)` (overriding to `var(--foreground)`), was producing near-black text on near-black backgrounds in light mode. Added a higher-specificity `.md-renderer .md-code-block` rule (covering the block plus its `pre` and `code` descendants) that pins the text color to `#e5e7eb` regardless of theme, and tagged the `<code>` elements in the `RichMarkdownRenderer` and `SimpleMarkdownRenderer` fallbacks with the existing `md-code-block__code` class so the `:not()` guard kicks in. Thanks @DarkGenius.
- `None` embeddings crashed the LlamaIndex pipeline (#347, fixes #346) — when an embedding provider returns `null` for a chunk's vector, the `None` ended up in the vector index and blew up `np.dot` on a `NoneType` during similarity computation. Two-layer fix: `_extract_embeddings_from_response` now uses `or []` instead of `get(key, default)` so explicit `None` values are caught, and `CustomEmbedding._get_text_embeddings` validates the batch result and substitutes a zero vector for any `None` slot. Thanks @kagura-agent.
- Gemma models rejected `json_object` response_format (#345, fixes #344) — Gemma served through LM Studio (and similar local OpenAI-compatible servers) responds `400 "'response_format.type' must be 'json_schema' or 'text'"` when handed `response_format={"type": "json_object"}`. Added `supports_response_format: False` to the existing `gemma` `MODEL_OVERRIDES` entry so the json_object path is skipped; the existing `extract_json_object` utilities in the visualize and math-animator agents already parse JSON from plain text, so all callers continue to work without further changes. Thanks @octo-patch.
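The zero-vector substitution from the embedding fix above can be sketched like this. It is a simplified stand-in for `CustomEmbedding._get_text_embeddings`, not the actual code; the function name and `dim` parameter are illustrative.

```python
import numpy as np

def sanitize_embeddings(vectors: list, dim: int) -> list[list[float]]:
    """Replace None slots with a zero vector of the index's dimensionality.

    Sketch of the batch-validation layer described above: a zero vector
    keeps the index shape-valid (and simply scores 0 similarity) instead
    of letting None reach np.dot.
    """
    cleaned = []
    for vec in vectors:
        if vec is None:
            cleaned.append([0.0] * dim)  # neutral placeholder vector
        else:
            cleaned.append(list(vec))
    return cleaned

# A batch where the provider returned null for the middle chunk:
batch = sanitize_embeddings([[0.1, 0.2], None, [0.3, 0.4]], dim=2)
# Similarity computation now succeeds where it previously raised on NoneType.
sims = np.dot(np.array(batch), np.array([0.1, 0.2]))
```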
Test Suite Expansion
Net +575 test lines: 10 cases for the chat-params loader / `_ChatLimits` coercion (`tests/services/config/test_chat_params_config.py`), 18 cases for the regenerate flow, including all three reject paths, the in-place delete + restart, the memory-refresh skip, and the end-to-end no-duplicate-user contract (`tests/services/session/test_regenerate.py`), 14 cases for `supports_response_format` model overrides (`tests/services/llm/test_capabilities.py`), and a regression test for the `None`-embedding extraction path (`tests/services/embedding/test_extract_embeddings.py`).
What's Changed
- fix(rag): guard against None embeddings in LlamaIndex pipeline by @kagura-agent in #347
- fix: disable json_object response_format for gemma models by @octo-patch in #345
- fix(web): ensure readable text in dark code blocks on light theme by @DarkGenius in #352
- feat(chat): make per-stage token limits and temperature configurable via agents.yaml by @DarkGenius in #348
- feat(chat): regenerate last response (CLI, WebSocket, Web UI) by @DarkGenius in #349
Community Contributions
- @DarkGenius — Make per-stage chat token limits configurable via `agents.yaml` (#348)
- @DarkGenius — Regenerate last response across CLI / WebSocket / Web UI (#349)
- @DarkGenius — Ensure readable text in dark code blocks on light theme (#352)
- @kagura-agent — Guard against `None` embeddings in the LlamaIndex pipeline (#347)
- @octo-patch — Disable `json_object` response_format for Gemma models (#345)
Full Changelog: v1.2.0...v1.2.1