DeepTutor v1.4.2 Release Notes

Release Date: 2026.05.28

v1.4.2 is a stability and polish release on top of v1.4.1.
It unblocks Gemini 2.5+ across Visualize and the chat agent, fixes a
ContextVar regression that silently routed authenticated requests to the
admin workspace, hardens the chat protocol for reasoning models with
native tool calling, ships smooth-streaming UX across every chat
surface, and adds support for the Lemonade local provider.

Gemini 2.5+ Reasoning Default-Off

Gemini 2.5 / 3 ship with thinking enabled by default and burn the entire
max_tokens budget on reasoning unless reasoning_effort: "none" is
sent on the request. v1.4.2 centralizes that logic in
reasoning_params.default_reasoning_effort_for, the single source of
truth used by all three execution paths (the OpenAI SDK, the aiohttp
fallback, and the reasoning-kwargs builder). Visualize, Chat, Solve,
and the agentic loop no longer return empty bodies when configured
against gemini-2.5-pro / gemini-2.5-flash / gemini-3-*.
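
To illustrate the "single source of truth" idea, here is a minimal sketch of a model-name lookup shared by every request builder. The function names, the prefix list, and the kwargs shape are assumptions made for illustration; the real reasoning_params.default_reasoning_effort_for may have a different signature and mapping.

```python
from typing import Optional

# Hypothetical prefixes derived from the model names listed in these notes;
# the actual mapping in reasoning_params may differ.
_THINKING_BY_DEFAULT_PREFIXES = ("gemini-2.5", "gemini-3-")


def sketch_default_reasoning_effort(model: str) -> Optional[str]:
    """Return "none" for models that think by default, else None (no override)."""
    name = model.strip().lower()
    # Tolerate provider-prefixed ids such as "google/gemini-2.5-pro".
    name = name.rsplit("/", 1)[-1]
    if name.startswith(_THINKING_BY_DEFAULT_PREFIXES):
        return "none"
    return None


def build_request_kwargs(model: str, max_tokens: int) -> dict:
    """Every execution path calls the same helper, so the default cannot drift."""
    kwargs = {"model": model, "max_tokens": max_tokens}
    effort = sketch_default_reasoning_effort(model)
    if effort is not None:
        kwargs["reasoning_effort"] = effort
    return kwargs
```

Because each path asks the same function, adding a new model family in this sketch means touching one tuple rather than three call sites.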

Visualize Pipeline Hardening

Three independent failure modes are fixed:

Per-capability max_tokens defaults — Visualize now has its own
entry in agents.yaml (16k tokens) seeded from
DEFAULT_AGENTS_SETTINGS, so existing users with a stale
data/user/settings/agents.yaml pick up the higher cap automatically
without hand-editing.
SVG / HTML root trim — when a model wraps its output with prose
("Here you go: <svg>…") or emits a closing fence on the same line
as the closing tag, the generator agent now trims to the outermost
<svg>…</svg> / <!doctype>…</html> so the renderer always receives
a clean root.
Review-step JSON-mode crash → graceful fallback — large or
complex SVGs occasionally trip JSON-mode escaping inside the review
step. Instead of crashing the turn, Visualize now logs the failure
and ships the unreviewed draft so the user still sees a rendered
result.
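
The root trim and the review fallback can be sketched as follows. This is an illustrative approximation under stated assumptions, not the generator agent's actual code: trim_to_root, review_or_draft, the regexes, and the choice to catch ValueError are all hypothetical.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("visualize_sketch")

# Greedy matches span from the first opening tag to the last closing tag,
# which is what "outermost root" means here.
_HTML_ROOT = re.compile(r"<!doctype\b.*</html\s*>", re.IGNORECASE | re.DOTALL)
_SVG_ROOT = re.compile(r"<svg\b.*</svg\s*>", re.IGNORECASE | re.DOTALL)


def trim_to_root(raw: str) -> str:
    """Drop prose and stray fences around the outermost HTML or SVG root."""
    # HTML is tried first so a page with an inline <svg> is not cut down to the SVG.
    for pattern in (_HTML_ROOT, _SVG_ROOT):
        match = pattern.search(raw)
        if match:
            return match.group(0)
    return raw.strip()  # nothing recognizable: return the text, whitespace-stripped


def review_or_draft(draft: str, review: Callable[[str], str]) -> str:
    """Run the review step, but ship the unreviewed draft if it fails."""
    try:
        return review(draft)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        logger.warning("review step failed, using unreviewed draft: %s", exc)
        return draft
```

For example, trim_to_root('Here you go: <svg width="1"><g/></svg>```') keeps only the <svg>…</svg> span, so a trailing fence on the same line as the closing tag never reaches the renderer.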

Authenticated Requests Land In The Right Workspace (#485)

In v1.4.1, require_auth was a sync FastAPI dependency. FastAPI
dispatches sync dependencies via anyio.to_thread.run_sync, which
runs them in a worker thread under a copy of the request context —
so the set_current_user(...) call inside the dependency installed
the user on the thread's context, which was discarded when the
thread returned. The endpoint then read the unset default and fell
back to the admin workspace, silently routing every authenticated
user's reads/writes through the local admin's data.

require_auth and require_admin are now async def, so they
execute in the same asyncio task as the endpoint and the
ContextVar is visible everywhere downstream. HTTP and WebSocket
entry points now share a single _install_current_user helper so
the user object resolved from a token payload is identical across
transports.
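
The regression can be reproduced with the standard library alone. The sketch below uses asyncio.to_thread, which also runs its callable in a worker thread under a copy of the current context, as a stand-in for the thread-pool dispatch described above. The variable and function names are hypothetical and are not DeepTutor's.

```python
import asyncio
import contextvars

# Stand-in for the per-request user ContextVar; None plays the "unset default".
current_user = contextvars.ContextVar("current_user", default=None)


def sync_dependency(user: str) -> None:
    # Runs in a worker thread: the set() lands on the copied context only.
    current_user.set(user)


async def async_dependency(user: str) -> None:
    # Awaited directly: runs in the caller's task and context.
    current_user.set(user)


async def handle_request_sync_style():
    await asyncio.to_thread(sync_dependency, "alice")
    return current_user.get()  # the thread's copy is gone, so still the default


async def handle_request_async_style():
    await async_dependency("alice")
    return current_user.get()  # same context, so the value is visible


if __name__ == "__main__":
    print(asyncio.run(handle_request_sync_style()))   # None
    print(asyncio.run(handle_request_async_style()))  # alice
```

The sync variant returns the default, which is the condition under which the endpoint fell back to the admin workspace.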

Reasoning Models + Native Tool Calling: Label Protocol Fixed

v1.4.1 tried to be clever with reasoning models that have native
tool-calling support — it told them to ignore the TOOL/THINK/
FINISH/PAUSE labels and rely on reasoning_content plus
tool_calls alone, and inside run_labeled_step it treated
<think> preludes and any incoming tool-call delta as implicit
label resolutions. In practice both shortcuts hurt: when a tool
call leaked into the content stream as JSON instead of a real
tool_calls delta, there was no label to repair against, and the
loop happily treated the JSON-as-answer as a FINISH. Multi-turn
reasoning + tool workflows would either burn iterations on repair
retries or silently terminate early.

In v1.4.2:

Reasoning + native-tools system prompt tells the model that
reasoning is displayed in a separate trace area, but the formal
content stream must still start with exactly one of
FINISH/TOOL/THINK/PAUSE.
run_labeled_step no longer treats tool-call deltas as
authoritative for label resolution, and implicit_think_label is
ignored (kept for API compatibility). A missing label always falls
to LABEL_UNKNOWN, so the chat pipeline's protocol-repair path
catches it instead of silently mis-routing the turn.
Inline <think>...</think> preludes are streamed live into the
reasoning sub-trace and stripped from the formal text returned
to the loop — so the answer area no longer leaks raw provider
markers.
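
The "labels are always required" rule can be sketched like this. It assumes a label is a bare keyword at the start of the formal text, optionally followed by a colon; the exact wire format, the value of LABEL_UNKNOWN, and the helper names are hypothetical, and run_labeled_step itself works on a stream rather than a complete string.

```python
import re
from typing import List, Tuple

LABEL_UNKNOWN = "UNKNOWN"  # placeholder value; the real constant may differ

_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_LEADING_LABEL = re.compile(r"^\s*(FINISH|TOOL|THINK|PAUSE)\b[:\s]*")


def split_reasoning(text: str) -> Tuple[List[str], str]:
    """Separate inline <think> preludes from the formal text."""
    reasoning = [block.strip() for block in _THINK_BLOCK.findall(text)]
    formal = _THINK_BLOCK.sub("", text)
    return reasoning, formal


def resolve_label(text: str) -> Tuple[str, str, List[str]]:
    """Return (label, body, reasoning); a missing label is never guessed."""
    reasoning, formal = split_reasoning(text)
    match = _LEADING_LABEL.match(formal)
    if match is None:
        # No implicit resolution: the caller's repair path decides what to do.
        return LABEL_UNKNOWN, formal.strip(), reasoning
    return match.group(1), formal[match.end():].strip(), reasoning
```

With this shape, a tool call that leaks into the content stream as JSON resolves to LABEL_UNKNOWN instead of being mistaken for a final answer.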

Smooth Streaming Across Every Chat Surface

The rAF typewriter (useSmoothStreamText) introduced last week for
the main chat is now wired through AssistantResponse, so the
book chat panel, quiz follow-up tab, and any other surface that
renders an assistant message all get the same frame-aligned cadence
during streaming and a no-op pass-through for completed messages.

Companion fixes:

Book chat panel and quiz follow-up tab moved their autoscroll to
useLayoutEffect and stopped using
scrollIntoView({behavior: "smooth"}): the smooth animation races
against the next-frame layout update during fast streams and produces
visible jitter. They now do a single scrollTop = scrollHeight pin in
layout phase, matching what useChatAutoScroll does on the main chat.
Book chat panel marks its scroller with data-chat-scroll-root so
the global overflow-anchor: none rule applies (the browser's
built-in scroll anchoring fights manual pinning when code blocks
reflow above the cursor).
AssistantResponse is now memoized — completed bubbles stop
re-parsing markdown when an unrelated streaming sibling updates the
parent.

Sidebar Redesign

The expanded sidebar's chat-session list moved into its own
collapsible Recents region with an independent scroll viewport, so
long histories no longer push secondary nav off-screen. The "New chat"
button is gone (clicking Chat in the nav already starts a new
session), and a Docs link to deeptutor.info sits next to the GitHub
link in the footer.

Each session now renders with a deterministic, friendly Lucide icon —
sparkles, leaf, feather, cloud, droplet, sun, moon, flame, star, etc.
— so the sidebar feels varied at a glance without shuffling on
re-render. Running sessions add a gentle wiggle animation; idle ones
stay still.
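
One common way to get a stable but varied icon per session is to hash the session id into a fixed icon list. The sketch below shows the idea in Python for brevity; the sidebar itself is frontend code, and the icon list, function name, and choice of SHA-256 are assumptions rather than the shipped implementation.

```python
import hashlib

# A subset of the icon names mentioned above; the real list is longer.
ICONS = ("sparkles", "leaf", "feather", "cloud", "droplet", "sun", "moon", "flame", "star")


def icon_for_session(session_id: str) -> str:
    """Map a session id to the same icon on every render and every run."""
    # A cryptographic digest is used only for its stable, well-spread output.
    # Python's built-in hash() is salted per process for strings, so it would
    # not stay stable across restarts.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return ICONS[int.from_bytes(digest[:4], "big") % len(ICONS)]
```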

Lemonade Local Provider

New lemonade provider binding for the AMD Ryzen AI / NPU runtime
(default base URL http://localhost:13305/api/v1). Auto-detected by
port 13305, no API key required, listed in the README Docker host-
gateway section and in the provider configuration docs alongside
Ollama / LM Studio / llama.cpp / vLLM.

Models-Endpoint Probe Honors `DISABLE_SSL_VERIFY`

The context-window auto-detection now passes
aiohttp.TCPConnector(ssl=False) when DISABLE_SSL_VERIFY is set,
matching the behavior of the rest of the HTTP layer. Self-signed local
inference servers no longer fall back to the default context window
just because the probe couldn't verify their cert.
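
A minimal sketch of the flag handling, assuming DISABLE_SSL_VERIFY is read as a truthy environment string. The accepted values and helper names are hypothetical, and aiohttp is referenced only in a comment so the snippet runs without it installed.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings; the project may accept others


def ssl_verify_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DISABLE_SSL_VERIFY is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("DISABLE_SSL_VERIFY", "").strip().lower() in _TRUTHY


def connector_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Keyword arguments for aiohttp.TCPConnector; ssl=False skips cert checks."""
    return {"ssl": False} if ssl_verify_disabled(env) else {}


# Intended use inside the probe (requires aiohttp):
#   connector = aiohttp.TCPConnector(**connector_kwargs())
#   async with aiohttp.ClientSession(connector=connector) as session:
#       ...
```

Returning an empty dict when the flag is unset leaves aiohttp's default certificate verification in place.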

Tests

tests/api/test_auth_contextvar.py — pins the regression from #485:
a sync require_auth would lose the ContextVar; the async version
preserves it across the dependency boundary.
tests/services/llm/test_reasoning_params.py — covers the
centralized default_reasoning_effort_for mapping.
tests/core/test_labeled_step_think_prelude.py — updated to reflect
the new "labels are always required" semantics.
tests/agents/chat/test_agentic_parallel_tools.py — verifies the
reasoning + native-tools path still resolves multi-tool turns.
tests/services/config/test_context_window_detection.py — the
models-endpoint probe honors DISABLE_SSL_VERIFY and passes a
TCPConnector(ssl=False) to the aiohttp session.

Upgrade Notes

Drop-in from v1.4.1: pip install -U deeptutor; Docker users pull
ghcr.io/hkuds/deeptutor:latest.
If you previously hand-edited data/user/settings/agents.yaml to
bump Visualize's max_tokens, that value still wins. The new 16k
default only seeds users whose agents.yaml doesn't mention
Visualize at all.
If you wired a Gemini 2.5+ model and saw empty or truncated outputs,
no configuration change is needed — the default-off behavior now
applies automatically.
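
The agents.yaml precedence rule in these notes (a user's value wins, the default only seeds missing entries) can be sketched as a merge over already-parsed settings. The dictionary keys and the 16000 figure are assumptions; these notes say only "16k tokens" and do not show the file's schema.

```python
import copy

# Hypothetical stand-in for DEFAULT_AGENTS_SETTINGS; the exact token value
# behind "16k" is not stated in these notes.
DEFAULTS_SKETCH = {"visualize": {"max_tokens": 16000}}


def seed_missing_agents(user_settings: dict, defaults: dict) -> dict:
    """Add default entries only for agents the user's file does not mention."""
    merged = copy.deepcopy(user_settings)
    for agent, default_cfg in defaults.items():
        if agent not in merged:
            merged[agent] = copy.deepcopy(default_cfg)
    return merged
```

A file that already sets visualize keeps its own max_tokens, while a stale file without a visualize entry picks up the seeded default.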

What's Changed

fix(auth): make require_auth async so the user ContextVar reaches the endpoint by @truffle-dev in #485
fix(visualize): unblock Gemini 2.5+ and harden Visualize pipeline by @skinred78 in #490

New Contributors

@skinred78 made their first contribution in #490

Full Changelog: v1.4.1...v1.4.2
