v0.8.16-alpha — Prompt Templating, Transcription UX, and Observability
New
Prompt templating and summary control
- Prompt template variables — tag, folder, user-default, and admin-default summary prompts can contain `{{name}}` placeholders. Selecting a tag with `{{agenda}}` exposes an agenda input on the upload form; the value is stored on the recording, substituted at summarisation time, and remains editable from the reprocess summary modal. Caps: 8,000 chars per value, 32,000 total. Single-pass `re.sub` substitution, so values cannot introduce new placeholders or reach Python attributes.
- Append vs Replace mode — the reprocess summary modal and the new Customise summary prompt modal each let you Append text to the resolved prompt or Replace it entirely. Append mode runs variable substitution after the append step, so appended text can use the same `{{var}}` placeholders.
- Customise summary prompt split-button (discussion #253) — a control next to Generate Summary opens the Append/Replace modal for recordings that don't have a summary yet, so one-off context (an agenda, custom focus instructions) can be passed in without rewriting your saved prompt.
- Full LLM prompt structure preview — both the admin Default Prompts page and the user Customise-prompts tab now show the complete two-message payload (system prompt with context block, user message with transcription wrapper and language directive). Placeholder chips colour-code system tokens (blue, replaced by the framework) versus user-supplied variables (amber). The user-side preview re-renders live as you type into your custom prompt.
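The single-pass substitution and length caps described above can be sketched roughly as follows. This is an illustrative approximation, not Speakr's actual code; the function name, cap handling, and error behaviour are assumptions.

```python
import re

# Placeholders look like {{name}}; \w+ keeps names to word characters,
# so dotted attribute access (e.g. {{obj.__class__}}) never matches.
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")

MAX_VALUE_LEN = 8_000    # cap per variable value
MAX_TOTAL_LEN = 32_000   # cap across all values combined

def substitute_prompt(template: str, variables: dict) -> str:
    """Resolve {{name}} placeholders in a single pass (hypothetical helper)."""
    if sum(len(v) for v in variables.values()) > MAX_TOTAL_LEN:
        raise ValueError("combined variable length exceeds cap")

    def replace(match):
        # Unknown placeholders are left literal rather than erased.
        value = variables.get(match.group(1), match.group(0))
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value for {match.group(1)} exceeds cap")
        return value

    # re.sub scans the template exactly once and never re-scans
    # replacement text, so a value cannot introduce new placeholders.
    return PLACEHOLDER_RE.sub(replace, template)
```

Because `re.sub` never rescans replacement text, a value that itself contains `{{other}}` stays literal in the output instead of being expanded on a second pass.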
Per-recording transcription control
- Per-upload / per-tag / per-folder transcription model selection (#266) — set `TRANSCRIPTION_MODELS_AVAILABLE` and the upload form, reprocess modal, and tag/folder edit forms gain a model dropdown. Optional `TRANSCRIPTION_MODEL_LABELS` for human-friendly names. Tag and folder edit forms warn if a previously-selected default is no longer in the configured list. The dropdown is hidden when only one option would be visible.
- Admin-managed transcription model list — when the connector exposes `/v1/models` discovery, admins can curate the list from the dashboard rather than via env var. Stored in the database; overrides `TRANSCRIPTION_MODELS_AVAILABLE` when set.
- WhisperX runtime model switching — the `asr_endpoint` connector forwards `request.model` as `?model=...` on the WhisperX `/asr` call, so per-upload selection actually changes which model transcribes each file.
- Per-connector capability gating — added `HOTWORDS` and `INITIAL_PROMPT` capabilities. Hotwords, initial-prompt, and speaker-count UI elements are hidden for connectors that don't support them, instead of accepting input that is silently ignored. Hotwords now show for OpenAI / Whisper / Azure / Mistral / VibeVoice, with each connector mapping the field to its own underlying API.
- Mistral Voxtral chunking (#267) — `MISTRAL_ENABLE_CHUNKING=true` plus `MISTRAL_MAX_DURATION_SECONDS` opts the Mistral connector into app-side chunked transcription for recordings approaching Voxtral's 3-hour timeout. Mistral does not return voice embeddings, so speakers are remapped per chunk.
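The two UI rules above (warn on a stale saved default, hide the dropdown when only one option would show) can be expressed as a small predicate. This is a hypothetical helper written for illustration, not Speakr's frontend code:

```python
from typing import Optional, List, Tuple

def model_dropdown_state(configured: List[str],
                         saved_default: Optional[str]) -> Tuple[bool, bool]:
    """Return (show_dropdown, warn_stale) for a tag/folder edit form.

    warn_stale: the previously-selected default vanished from the
    configured list (e.g. TRANSCRIPTION_MODELS_AVAILABLE changed).
    show_dropdown: only render the dropdown when more than one
    option would actually be visible.
    """
    warn_stale = saved_default is not None and saved_default not in configured
    show_dropdown = len(configured) > 1
    return show_dropdown, warn_stale
```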
ASR transcript editor
- Autosave — saves edits 2 seconds after the last keystroke when the user opts in (Account → Preferences → Autosave editor).
- Save without closing + Ctrl+S — a new button keeps the editor open after saving; Ctrl+S triggers a save from anywhere in the editor.
- Scroll memory — reopening the editor restores the previous scroll position instead of jumping to the top.
- Double-click to edit — double-clicking a transcript row in the simple view jumps into the editor with that segment highlighted. The target row is briefly highlighted so it stands out.
Account preferences
- Preferences tab — account settings has a new Preferences tab (split from the Account Information tab) using a two-column layout for transcript display, editor behaviour, and language preferences.
- Compact timestamps in simple view — optional `mm:ss` (or `h:mm:ss`) timestamps in the simple transcript view, rendered as a two-part pill alongside the speaker label. The leading segment shows "Start" instead of `00:00`.
- Persist recording-list sort choice (discussion #263) — the Created date / Meeting date toggle now sticks across reloads and sessions on the same browser.
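The timestamp rule above is compact enough to sketch in a few lines. A minimal illustration, assuming the actual formatting lives in the frontend and may differ in details:

```python
def compact_timestamp(seconds: float) -> str:
    """Format a segment start as mm:ss, or h:mm:ss past one hour.
    The leading (zero-second) segment is labelled "Start" instead of 00:00."""
    total = int(seconds)
    if total == 0:
        return "Start"
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"
```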
Embeddings and inquire mode
- Configurable embedding model (#262) — `EMBEDDING_MODEL` swaps `all-MiniLM-L6-v2` for any sentence-transformers model. Speakr records the model name on first startup and warns if it changes later.
- OpenAI-compatible API mode for embeddings — `EMBEDDING_BASE_URL`, `EMBEDDING_API_KEY`, and `EMBEDDING_DIMENSIONS` route embeddings through any OpenAI-compatible provider (vLLM, OpenRouter, OpenAI, Together, etc.). Useful for the lite Docker image, low-RAM hosts, or consolidating providers. The Inquire startup banner reflects the active provider.
- Re-embed all — the admin Vector Store tab gained a Re-embed all action so you can rebuild the index after switching `EMBEDDING_MODEL` or `EMBEDDING_BASE_URL`.
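The provider selection implied by these variables can be sketched as follows. This is an assumed reading of the precedence (remote API when `EMBEDDING_BASE_URL` is set, local sentence-transformers otherwise), not Speakr's actual startup code:

```python
def embedding_backend(env: dict) -> dict:
    """Hypothetical resolution of the embedding configuration.

    If EMBEDDING_BASE_URL is present, embeddings go through an
    OpenAI-compatible /embeddings endpoint; otherwise the local
    sentence-transformers model named by EMBEDDING_MODEL is loaded.
    """
    base_url = env.get("EMBEDDING_BASE_URL")
    model = env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2")
    if base_url:
        return {
            "mode": "openai-compatible",
            "base_url": base_url,
            "api_key": env.get("EMBEDDING_API_KEY", ""),
            "model": model,
            "dimensions": env.get("EMBEDDING_DIMENSIONS"),
        }
    return {"mode": "local", "model": model}
```

Note that after flipping either variable on an existing install, the index must be rebuilt with the Re-embed all action, since vectors from different models are not comparable.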
Observability and admin
- Per-operation token stats — admin Token Usage card splits into LLM and embedding panels with their own totals, charts, and per-operation breakdown (title, summary, chat, event extraction, embeddings).
- Granular token budgets — `TITLE_MAX_TOKENS` and `EVENT_MAX_TOKENS` join the existing `SUMMARY_MAX_TOKENS` / `CHAT_MAX_TOKENS`, so reasoning models that consume budget on hidden thinking tokens can be tuned per operation. The resolved `max_tokens` is logged with each LLM call.
- LLM timeout diagnostics — the configured `LLM_REQUEST_TIMEOUT` is logged at startup, and `APITimeoutError` log entries include elapsed time so it is clear whether the timeout was the actual bound that fired.
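The per-operation budget resolution reduces to an env-var lookup with a fallback. A minimal sketch; the default values below are placeholders for illustration, not Speakr's actual defaults:

```python
# Fallbacks used when the env var is unset -- illustrative numbers only.
DEFAULTS = {"title": 512, "event": 1024, "summary": 4096, "chat": 4096}
ENV_KEYS = {
    "title": "TITLE_MAX_TOKENS",
    "event": "EVENT_MAX_TOKENS",
    "summary": "SUMMARY_MAX_TOKENS",
    "chat": "CHAT_MAX_TOKENS",
}

def resolve_max_tokens(operation: str, env: dict) -> int:
    """Pick the max_tokens budget for one LLM operation."""
    raw = env.get(ENV_KEYS[operation])
    return int(raw) if raw else DEFAULTS[operation]
```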
API v1
- Folder CRUD endpoints — new `/api/v1/folders` for list, create, update, delete.
- Connector discovery endpoint — exposes the active transcription connector and its capabilities for companion-app integrations.
- Recording field parity (#274) — `/api/v1/recordings` and `/api/v1/recordings/{id}` now include `audio_duration`, `transcription_duration_seconds`, `summarization_duration_seconds`, `folder_id`, `folder`, `events` (detail only), `deletion_exempt`, `prompt_variables`, and the per-recording transcription model.
- Forwarded per-request overrides — `/api/v1/recordings/{id}/transcribe` accepts `transcription_model`, `hotwords`, and `initial_prompt`.
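A client-side sketch of the per-request overrides, building the request without sending it. The host, bearer-token auth scheme, and helper name are assumptions for illustration; check your deployment's API documentation:

```python
import json
from urllib import request

API_BASE = "https://speakr.example.com/api/v1"  # placeholder host

def transcribe_with_overrides(recording_id: int, token: str, **overrides):
    """Build a POST to the v1 transcribe endpoint with per-request
    overrides. Only the three documented fields are forwarded."""
    allowed = {"transcription_model", "hotwords", "initial_prompt"}
    body = {k: v for k, v in overrides.items() if k in allowed}
    return request.Request(
        f"{API_BASE}/recordings/{recording_id}/transcribe",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# A caller would then pass the Request to urllib.request.urlopen(...).
```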
Localisation
- Portuguese Brazilian translation (PR #271, lhpereira) — full pt-BR locale added, with backfill of all v0.8.16-alpha keys integrated during merge. All seven locales (en, fr, de, es, ru, zh, pt-BR) now sit at parity with zero missing and zero orphaned keys.
- Locale parity cleanup — removed 149 stale keys from zh.json that no longer reference any code path, backfilled 10 keys missing from non-English locales, and added seven additional language codes (pl, uk, vi, th, tr, id, sv) to the transcription dropdown.
Fixed
- Reprocessing applies tag/folder/user default hotwords + initial_prompt (#265) — previously these only flowed through at upload time. Reprocess now walks the same precedence chain, and the reprocess modal gained the two text fields (gated on the active connector's capabilities).
- Language code normalization (#256) — old user records with `transcription_language="français"` were crashing WhisperX with HTTP 500. Added a normalize-on-save helper plus a one-shot migration that maps display names and locale codes to ISO 639-1 on upgrade.
- Title generation Unicode escapes (#260) — for non-ASCII transcripts (Cyrillic, Chinese, etc.), titles were occasionally generated with literal `\uXXXX` escape sequences. The root cause was slicing the raw transcription JSON before parsing; the slice could land mid-Unicode-escape, the JSON parse failed, and the raw escapes leaked through. Fixed by formatting first, then truncating.
- Reprocess modal hid hotwords / initial prompt / model dropdown for non-WhisperX connectors — the gating accidentally required `connectorSupportsSpeakerCount` for the entire block. Fixed via the new capability split.
- Technical details panel always populated on transcription failures — when the ASR endpoint returns an HTTP error, Speakr now captures the upstream response body before raising, so the recording's "Technical details" section shows the real failure message (for example faster-whisper's "Invalid model size") instead of a bare status code.
- Vector Store "recordings to process" message — Vue's custom `${...}` delimiter was tripping over the nested braces in the i18n call; rewritten to use the `t(key, params)` parameter form.
- CSRF token on the Preferences form — was missing, causing submissions to be rejected.
- Test isolation — synthetic users and recordings created during the test suite are now cleaned up at module teardown so the dev DB stays free of leaked admin flags between runs.
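The normalize-on-save idea behind the language-code fix can be sketched like this. The alias table below is a tiny illustrative sample, not the full migration map, and the fallback pattern is an assumption:

```python
import re

# Sample of display-name aliases; the real migration map is larger.
LANGUAGE_ALIASES = {
    "français": "fr", "french": "fr",
    "english": "en",
    "deutsch": "de", "german": "de",
}

def normalize_language(value):
    """Map display names and locale codes to ISO 639-1 before saving,
    so downstream ASR backends never see values like "français"."""
    if not value:
        return value
    key = value.strip().lower()
    if key in LANGUAGE_ALIASES:
        return LANGUAGE_ALIASES[key]
    # Accept bare ISO codes and locale forms like fr-FR / fr_FR.
    m = re.fullmatch(r"([a-z]{2})(?:[-_][a-z0-9]+)?", key)
    return m.group(1) if m else key  # leave unrecognised values untouched
```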
Docs
- New nginx reverse-proxy guidance: `proxy_request_buffering off` and `client_max_body_size` in the recommended config (resolves the 500-error class from #273)
- Google Gemini OpenAI-compatible setup example for `TEXT_MODEL_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/` (#254)
- Prompt template variables guide in user-guide/settings.md
- Per-upload / per-tag / per-folder model selection documentation in admin-guide/model-configuration.md
- `EMBEDDING_BASE_URL` API mode documentation across inquire-mode, vector-store, and troubleshooting
- ASR editor enhancements (autosave, Ctrl+S, scroll memory, double-click) and Append/Replace summary mode in user-guide/transcripts.md
- Re-embed all action and embedding token tracking in admin-guide/vector-store.md
- Per-operation token stats in admin-guide/statistics.md
Infrastructure
- Vitest frontend tests — pure-helper modules in `static/js/modules/utils/` are now covered by Vitest. Run `npm test`. Currently exercises the prompt-variable extraction and priority-chain logic.
Tests
276 backend tests passing plus 32 frontend tests, including new regression suites for the title truncation bug (#260), reprocess hotwords precedence (#265), language normalization (#256), API v1 parity (#274), the per-upload/tag/folder model override chain (#266), prompt-variable substitution, and the priority-chain helpers.