HKUDS/DeepTutor

DeepTutor v1.2.1 Release Notes

Release Date: 2026.04.21

Highlights

Per-Stage Token Limits & Temperature for Chat (#348)

Promoted the agentic chat pipeline to a first-class config citizen in agents.yaml. A new capabilities.chat block exposes per-stage max_tokens (responding, answer_now, thinking, observing, acting, react_fallback) and a shared temperature, deep-merged over baked-in defaults via services/config/loader.py::get_chat_params(). The responding and answer_now budgets jump from the previous hard-coded 1800 to 8000, eliminating the mid-sentence truncation that was clipping long answers. Internally, _ChatLimits.from_config coerces every legacy shape (missing keys, scalar instead of dict, partial overrides) into a stable dataclass so existing installs keep working without touching their YAML. 10 new unit tests cover loader resolution, deep-merge precedence, and dataclass coercion.
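In agents.yaml this looks roughly like the following. The stage names, the shared temperature, and the 8000-token responding / answer_now defaults come from this release; the exact nesting of the block and the non-responding stage values are illustrative assumptions, and any key can be omitted because the deep merge fills in the baked-in defaults:

```yaml
capabilities:
  chat:
    temperature: 0.7        # shared across all stages (illustrative value)
    max_tokens:
      responding: 8000      # was hard-coded to 1800 before v1.2.1
      answer_now: 8000      # was hard-coded to 1800 before v1.2.1
      thinking: 4000        # illustrative; baked-in default applies if omitted
      observing: 4000       # illustrative
      acting: 4000          # illustrative
      react_fallback: 4000  # illustrative
```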

Regenerate Last Response — CLI, WebSocket, Web UI (#349)

Added a real regenerate flow that re-runs the previous user message in place, working uniformly across every entry point:

  • CLI — /regenerate (alias /retry) inside the deeptutor chat REPL.
  • WebSocket — a type: "regenerate" message on /api/v1/ws, with optional overrides for capability, tools, knowledge_bases, language, and config.
  • Web UI — a per-message Regenerate button (RefreshCcw icon) on the last assistant turn for chat-capability replies.

On the backend, TurnRuntimeManager.regenerate_last_turn rolls back the trailing assistant message via the new SQLiteSessionStore.delete_message / get_last_message helpers, then reuses start_turn with _persist_user_message=False and _regenerate=True so the user row isn't duplicated and memory_service.refresh_from_turn isn't run a second time. Pre-flight checks raise non-fatal regenerate_busy (another turn is running) or nothing_to_regenerate (no prior user message) errors instead of silently failing. The _stage_responding LLM call also gained empty-response diagnostics that surface a structured warning when the model returns no content. 18 new tests cover all three reject paths, the delete-then-restart flow, the memory-refresh-skip contract, and the WebSocket round-trip.

UI Harmony Polish for Regenerate

Two follow-up tweaks so the new button matches the rest of the chat UI and behaves predictably under server rejection:

  • i18n — added Regenerate keys to web/locales/{en,zh}/app.json (Regenerate / 重新生成) and switched ChatMessages.tsx from a hardcoded "Regenerate" string to t("Regenerate"), matching the existing t("Copy") pattern in the same row.
  • Optimistic-pop rollback — when the server rejects a regenerate request pre-flight (regenerate_busy / nothing_to_regenerate), the optimistic POP_LAST_ASSISTANT + STREAM_START placeholder is now restored via a new RESTORE_ASSISTANT reducer action. The popped message is held in a per-key pendingRegenerateRef and cleared on done or any terminal result, so the transcript never silently loses the last assistant reply.

Bug Fixes

  • Dark code blocks unreadable on light theme (#352) — the hard-coded #1f2937 / #292524 code-block background combined with .prose pre (forcing #D6D3D1) and .prose code:not(.md-code-block__code):not(.md-inline-code) (overriding to var(--foreground)) was producing near-black text on near-black backgrounds in light mode. Added a higher-specificity .md-renderer .md-code-block rule (also covering its pre and code descendants) that pins the text color to #e5e7eb regardless of theme, and tagged the <code> elements in the RichMarkdownRenderer and SimpleMarkdownRenderer fallbacks with the existing md-code-block__code class so the :not() guard kicks in. Thanks @DarkGenius.
  • None embeddings crashed LlamaIndex pipeline (#347, fixes #346) — when an embedding provider returns null for a chunk's vector, the None ended up in the vector index and blew up np.dot(NoneType) during similarity computation. Two-layer fix: _extract_embeddings_from_response now uses or [] instead of get(key, default) so explicit None values are caught, and CustomEmbedding._get_text_embeddings validates the batch result and substitutes a zero vector for any None slot. Thanks @kagura-agent.
  • Gemma models rejected json_object response_format (#345, fixes #344) — Gemma served through LM Studio (and similar local OpenAI-compatible servers) responds 400 "'response_format.type' must be 'json_schema' or 'text'" when handed response_format={"type": "json_object"}. Added supports_response_format: False to the existing gemma MODEL_OVERRIDES entry so the json_object path is skipped; the existing extract_json_object utilities in the visualize and math-animator agents already parse JSON from plain text, so all callers continue to work without further changes. Thanks @octo-patch.
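A minimal sketch of how such a capability override gates the request, assuming a simplified shape for the MODEL_OVERRIDES table and a substring match on the model name (both hypothetical; only the gemma entry and the supports_response_format flag come from the release notes):

```python
# Hypothetical, simplified shape of the capability-override table.
MODEL_OVERRIDES = {
    "gemma": {"supports_response_format": False},
}


def supports_response_format(model_name: str) -> bool:
    name = model_name.lower()
    for key, overrides in MODEL_OVERRIDES.items():
        if key in name:
            return overrides.get("supports_response_format", True)
    return True  # default: assume the server accepts json_object


def build_request(model_name: str, messages: list) -> dict:
    request = {"model": model_name, "messages": messages}
    if supports_response_format(model_name):
        # Local OpenAI-compatible servers (e.g. LM Studio serving Gemma)
        # reject this with a 400, so it is skipped for overridden models.
        request["response_format"] = {"type": "json_object"}
    return request
```

With the flag off, callers fall back to parsing JSON out of plain-text completions, which is what the existing extract_json_object utilities already do.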

Test Suite Expansion

Net +575 test lines:

  • 10 cases for the chat-params loader / _ChatLimits coercion (tests/services/config/test_chat_params_config.py).
  • 18 cases for the regenerate flow, including all three reject paths, the in-place delete + restart, the memory-refresh skip, and the end-to-end no-duplicate-user contract (tests/services/session/test_regenerate.py).
  • 14 cases for supports_response_format model overrides (tests/services/llm/test_capabilities.py).
  • A regression test for the None-embedding extraction path (tests/services/embedding/test_extract_embeddings.py).

What's Changed

  • fix(rag): guard against None embeddings in LlamaIndex pipeline by @kagura-agent in #347
  • fix: disable json_object response_format for gemma models by @octo-patch in #345
  • fix(web): ensure readable text in dark code blocks on light theme by @DarkGenius in #352
  • feat(chat): make per-stage token limits and temperature configurable via agents.yaml by @DarkGenius in #348
  • feat(chat): regenerate last response (CLI, WebSocket, Web UI) by @DarkGenius in #349

Community Contributions

  • @DarkGenius — Make per-stage chat token limits configurable via agents.yaml (#348)
  • @DarkGenius — Regenerate last response across CLI / WebSocket / Web UI (#349)
  • @DarkGenius — Ensure readable text in dark code blocks on light theme (#352)
  • @kagura-agent — Guard against None embeddings in the LlamaIndex pipeline (#347)
  • @octo-patch — Disable json_object response_format for Gemma models (#345)

Full Changelog: v1.2.0...v1.2.1
