github HKUDS/DeepTutor v1.4.2

DeepTutor v1.4.2 Release Notes

Release Date: 2026.05.28

v1.4.2 is a stability and polish release on top of v1.4.1.
It unblocks Gemini 2.5+ across Visualize and the chat agent, fixes a
ContextVar regression that silently routed authenticated requests to the
admin workspace, hardens the chat protocol for reasoning models with
native tool calling, ships smooth-streaming UX across every chat
surface, and adds support for the Lemonade local provider.

Gemini 2.5+ Reasoning Default-Off

Gemini 2.5 / 3 ship with thinking enabled by default and can spend the
entire max_tokens budget on reasoning, leaving an empty response body,
unless reasoning_effort: "none" is sent on the request. v1.4.2
centralizes that logic in
reasoning_params.default_reasoning_effort_for, the single source of
truth used by all three execution paths (the OpenAI SDK, the aiohttp
fallback, and the reasoning-kwargs builder). Visualize, Chat, Solve,
and the agentic loop all stop returning empty bodies when configured
against gemini-2.5-pro / gemini-2.5-flash / gemini-3-*.
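
A minimal sketch of what a centralized default could look like. Only the
function name default_reasoning_effort_for and the "none" value come from
these notes; the prefix list, the signature, and the build_request_kwargs
helper are assumptions made for illustration, not the project's code.

```python
from typing import Optional

# Hypothetical prefix list; the real mapping in reasoning_params may differ.
_THINKING_BY_DEFAULT_PREFIXES = ("gemini-2.5", "gemini-3")


def default_reasoning_effort_for(model: str) -> Optional[str]:
    """Return the reasoning_effort to send by default, or None to omit it."""
    # Tolerate provider-prefixed names such as "google/gemini-2.5-pro".
    name = model.lower().split("/")[-1]
    if name.startswith(_THINKING_BY_DEFAULT_PREFIXES):
        return "none"
    return None


def build_request_kwargs(model: str, max_tokens: int) -> dict:
    """Hypothetical kwargs builder that every execution path could share."""
    kwargs = {"model": model, "max_tokens": max_tokens}
    effort = default_reasoning_effort_for(model)
    if effort is not None:
        kwargs["reasoning_effort"] = effort
    return kwargs
```

Routing every path through one function means a new model family only
needs to be added in one place.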

Visualize Pipeline Hardening

Three independent failure modes are fixed:

  • Per-capability max_tokens defaults — Visualize now has its own
    entry in agents.yaml (16k tokens) seeded from
    DEFAULT_AGENTS_SETTINGS, so existing users with a stale
    data/user/settings/agents.yaml pick up the higher cap automatically
    without hand-editing.
  • SVG / HTML root trim — when a model wraps its output with prose
    ("Here you go: <svg>…") or emits a closing fence on the same line
    as the closing tag, the generator agent now trims to the outermost
    <svg>…</svg> / <!doctype>…</html> so the renderer always receives
    a clean root.
  • Review-step JSON-mode crash → graceful fallback — large or
    complex SVGs occasionally trip JSON-mode escaping inside the review
    step. Instead of crashing the turn, Visualize now logs the failure
    and ships the unreviewed draft so the user still sees a rendered
    result.
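
The root trim described above can be sketched with two greedy regular
expressions. This is an illustrative approximation, not the generator
agent's actual code; the function name trim_to_root and both patterns
are hypothetical.

```python
import re

# Greedy ".*" with DOTALL spans from the first opening tag to the LAST
# closing tag, which yields the outermost root even with nested elements.
_HTML_RE = re.compile(r"<!doctype\b.*</html>", re.IGNORECASE | re.DOTALL)
_SVG_RE = re.compile(r"<svg\b.*</svg>", re.IGNORECASE | re.DOTALL)


def trim_to_root(raw: str) -> str:
    """Drop prose or fence residue around an SVG or HTML document."""
    # HTML is tried first because an HTML page may embed an inline <svg>.
    for pattern in (_HTML_RE, _SVG_RE):
        match = pattern.search(raw)
        if match:
            return match.group(0)
    # No recognizable root: return the text unchanged apart from whitespace.
    return raw.strip()
```

For example, trim_to_root('Here you go: <svg><g/></svg>```') keeps only
the <svg>…</svg> span and discards both the prose and the trailing fence.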

Authenticated Requests Land In The Right Workspace (#485)

In v1.4.1, require_auth was a sync FastAPI dependency. FastAPI
dispatches sync dependencies via anyio.to_thread.run_sync, which
runs them in a worker thread under a copy of the request context —
so the set_current_user(...) call inside the dependency installed
the user on the thread's context, which was discarded when the
thread returned. The endpoint then read the unset default and fell
back to the admin workspace, silently routing every authenticated
user's reads/writes through the local admin's data.

require_auth and require_admin are now async def, so they
execute in the same asyncio task as the endpoint and the
ContextVar is visible everywhere downstream. HTTP and WebSocket
entry points now share a single _install_current_user helper so
the user object resolved from a token payload is identical across
transports.
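
The failure mode can be reproduced with the standard library alone. The
sketch below simulates the thread dispatch by running a sync function in
a copied context on an executor; current_user and the handler names are
hypothetical stand-ins, not DeepTutor's own identifiers.

```python
import asyncio
import contextvars

# Stand-in for the per-request user; "admin" plays the fallback workspace.
current_user = contextvars.ContextVar("current_user", default="admin")


def sync_dependency() -> None:
    current_user.set("alice")


async def async_dependency() -> None:
    current_user.set("alice")


async def handle_with_sync_dependency() -> str:
    # Mimic a thread-pool dispatch: the function runs inside a COPY of the
    # caller's context, so its set() never reaches the caller.
    ctx = contextvars.copy_context()
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, ctx.run, sync_dependency)
    return current_user.get()


async def handle_with_async_dependency() -> str:
    # Awaited directly: same task, same context, so the set() is visible.
    await async_dependency()
    return current_user.get()
```

Under these assumptions the first handler still sees the "admin" default
after the dependency has run, while the second sees "alice".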

Reasoning Models + Native Tool Calling: Label Protocol Fixed

v1.4.1 tried to be clever with reasoning models that have native
tool-calling support — it told them to ignore the TOOL/THINK/
FINISH/PAUSE labels and rely on reasoning_content plus
tool_calls alone, and inside run_labeled_step it treated
<think> preludes and any incoming tool-call delta as implicit
label resolutions. In practice both shortcuts hurt: when a tool
call leaked into the content stream as JSON instead of a real
tool_calls delta, there was no label to repair against, and the
loop happily treated the JSON-as-answer as a FINISH. Multi-turn
reasoning + tool workflows would either burn iterations on repair
retries or silently terminate early.

In v1.4.2:

  • Reasoning + native-tools system prompt tells the model that
    reasoning is displayed in a separate trace area, but the formal
    content stream must still start with exactly one of
    FINISH/TOOL/THINK/PAUSE.
  • run_labeled_step no longer treats tool-call deltas as
    authoritative for label resolution, and implicit_think_label is
    ignored (kept for API compatibility). A missing label always falls
    to LABEL_UNKNOWN, so the chat pipeline's protocol-repair path
    catches it instead of silently mis-routing the turn.
  • Inline <think>...</think> preludes are streamed live into the
    reasoning sub-trace and stripped from the formal text returned
    to the loop — so the answer area no longer leaks raw provider
    markers.
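
A non-streaming sketch of the stricter protocol described above: strip
any inline <think> prelude, then require an explicit leading label. The
label names and LABEL_UNKNOWN appear in these notes; the constant's
value, both function names, and the parsing details are assumptions, and
the real run_labeled_step works on a stream rather than a full string.

```python
import re
from typing import Tuple

LABELS = ("FINISH", "TOOL", "THINK", "PAUSE")
LABEL_UNKNOWN = "UNKNOWN"  # hypothetical value for the sentinel
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_prelude(text: str) -> Tuple[str, str]:
    """Return (reasoning, formal_text) with <think> blocks removed."""
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    formal = _THINK_RE.sub("", text).strip()
    return reasoning, formal


def resolve_label(formal: str) -> str:
    """Accept only an explicit leading label; anything else is unknown."""
    if not formal.strip():
        return LABEL_UNKNOWN
    first = formal.split(None, 1)[0].rstrip(":")
    return first if first in LABELS else LABEL_UNKNOWN
```

With this shape, a tool call that leaks into the content stream as JSON
resolves to LABEL_UNKNOWN and can be sent to a repair path instead of
being read as a final answer.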

Smooth Streaming Across Every Chat Surface

The rAF typewriter (useSmoothStreamText) introduced last week for
the main chat is now wired through AssistantResponse, so the
book chat panel, quiz follow-up tab, and any other surface that
renders an assistant message all get the same frame-aligned cadence
during streaming and a no-op pass-through for completed messages.

Companion fixes:

  • Book chat panel and quiz follow-up tab moved their autoscroll to
    useLayoutEffect and stopped using
    scrollIntoView({behavior: "smooth"}), because the smooth animation
    races against the next-frame layout update during fast streams and
    produces visible jitter. They now do a single
    scrollTop = scrollHeight pin in layout phase, matching what
    useChatAutoScroll does on the main chat.
  • Book chat panel marks its scroller with data-chat-scroll-root so
    the global overflow-anchor: none rule applies (the browser's
    built-in scroll anchoring fights manual pinning when code blocks
    reflow above the cursor).
  • AssistantResponse is now memoized — completed bubbles stop
    re-parsing markdown when an unrelated streaming sibling updates the
    parent.

Sidebar Redesign

The expanded sidebar's chat-session list moved into its own
collapsible Recents region with an independent scroll viewport, so
long histories no longer push secondary nav off-screen. The "New chat"
button is gone (clicking Chat in the nav already starts a new
session), and a Docs link to deeptutor.info sits next to the GitHub
link in the footer.

Each session now renders with a deterministic, friendly Lucide icon —
sparkles, leaf, feather, cloud, droplet, sun, moon, flame, star, etc.
— so the sidebar feels varied at a glance without shuffling on
re-render. Running sessions add a gentle wiggle animation; idle ones
stay still.
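
One way to get an icon that is varied but stable across re-renders is to
derive it from a hash of the session id. The sketch below is written in
Python for illustration only (the sidebar itself is frontend code); the
icon list is taken from the examples above, and icon_for_session is a
hypothetical name.

```python
import hashlib

ICONS = ("sparkles", "leaf", "feather", "cloud", "droplet",
         "sun", "moon", "flame", "star")


def icon_for_session(session_id: str) -> str:
    """Map a session id to the same icon on every call and every run."""
    # A cryptographic digest is used because Python's built-in hash() of a
    # string is salted per process and would change between runs.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(ICONS)
    return ICONS[index]
```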

Lemonade Local Provider

New lemonade provider binding for the AMD Ryzen AI / NPU runtime
(default base URL http://localhost:13305/api/v1). It is auto-detected
by port 13305, requires no API key, and is listed in the README Docker
host-gateway section and in the provider configuration docs alongside
Ollama / LM Studio / llama.cpp / vLLM.

Models-Endpoint Probe Honors DISABLE_SSL_VERIFY

The context-window auto-detection now passes
aiohttp.TCPConnector(ssl=False) when DISABLE_SSL_VERIFY is set,
matching the behavior of the rest of the HTTP layer. Self-signed local
inference servers no longer fall back to the default context window
just because the probe couldn't verify their cert.
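
A hedged sketch of a probe that honors the flag. The
aiohttp.TCPConnector(ssl=False) call is the one named above; the set of
accepted truthy values, the /models path, and both function names are
assumptions, not the project's implementation.

```python
import os
from typing import Mapping, Optional


def ssl_verify_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret DISABLE_SSL_VERIFY; the accepted values are an assumption."""
    env = os.environ if env is None else env
    value = env.get("DISABLE_SSL_VERIFY", "").strip().lower()
    return value in {"1", "true", "yes", "on"}


async def fetch_models(base_url: str) -> dict:
    """Query an assumed OpenAI-compatible /models endpoint."""
    import aiohttp  # third-party; imported lazily so the helper above stays stdlib-only

    # ssl=False skips certificate verification for self-signed local servers.
    connector = aiohttp.TCPConnector(ssl=False) if ssl_verify_disabled() else None
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(f"{base_url.rstrip('/')}/models") as resp:
            resp.raise_for_status()
            return await resp.json()
```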

Tests

  • tests/api/test_auth_contextvar.py — pins the regression from #485:
    a sync require_auth would lose the ContextVar; the async version
    preserves it across the dependency boundary.
  • tests/services/llm/test_reasoning_params.py — covers the
    centralized default_reasoning_effort_for mapping.
  • tests/core/test_labeled_step_think_prelude.py — updated to reflect
    the new "labels are always required" semantics.
  • tests/agents/chat/test_agentic_parallel_tools.py — verifies the
    reasoning + native-tools path still resolves multi-tool turns.
  • tests/services/config/test_context_window_detection.py — the
    models-endpoint probe honors DISABLE_SSL_VERIFY and passes a
    TCPConnector(ssl=False) to the aiohttp session.

Upgrade Notes

  • Drop-in from v1.4.1: pip install -U deeptutor; Docker users pull
    ghcr.io/hkuds/deeptutor:latest.
  • If you previously hand-edited data/user/settings/agents.yaml to
    bump Visualize's max_tokens, that value still wins. The new 16k
    default only seeds users whose agents.yaml doesn't mention
    Visualize at all.
  • If you wired a Gemini 2.5+ model and saw empty or truncated outputs,
    no configuration change is needed — the default-off behavior now
    applies automatically.
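
The seeding rule for agents.yaml can be pictured as a merge that only
fills in capabilities the user file does not mention. This is a sketch
under stated assumptions: DEFAULT_AGENTS_SETTINGS is named in these
notes, but its contents here, the key names, the exact token value, and
seed_missing_capabilities are hypothetical.

```python
import copy

# Illustrative contents only; the real defaults and key names may differ.
DEFAULT_AGENTS_SETTINGS = {"visualize": {"max_tokens": 16000}}


def seed_missing_capabilities(user_settings: dict, defaults: dict) -> dict:
    """Add default blocks for capabilities absent from the user's file."""
    merged = copy.deepcopy(user_settings)
    for capability, default_block in defaults.items():
        # An existing entry always wins, even if its value is lower.
        if capability not in merged:
            merged[capability] = copy.deepcopy(default_block)
    return merged
```

A file that already sets Visualize's max_tokens is returned unchanged
for that entry, while a file with no Visualize entry gains the default.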

What's Changed

  • fix(auth): make require_auth async so the user ContextVar reaches the endpoint by @truffle-dev in #485
  • fix(visualize): unblock Gemini 2.5+ and harden Visualize pipeline by @skinred78 in #490

Full Changelog: v1.4.1...v1.4.2
