github murtaza-nasir/speakr v0.8.16-alpha
v0.8.16-alpha — Prompt Templating, Transcription UX, and Observability



New

Prompt templating and summary control

  • Prompt template variables — tag, folder, user-default, and admin-default summary prompts can contain {{name}} placeholders. Selecting a tag with {{agenda}} exposes an agenda input on the upload form; the value is stored on the recording, substituted at summarisation time, and remains editable from the reprocess summary modal. Caps: 8,000 chars per value, 32,000 total. Single-pass re.sub substitution so values cannot introduce new placeholders or reach Python attributes.
  • Append vs Replace mode — the reprocess summary modal and the new Customise summary prompt modal each let you Append text to the resolved prompt or Replace it entirely. Append mode runs variable substitution after the append step so appended text can use the same {{var}} placeholders.
  • Customise summary prompt split-button (discussion #253) — a control next to Generate Summary opens the Append/Replace modal for recordings that don't have a summary yet, so one-off context (an agenda, custom focus instructions) can be passed in without rewriting your saved prompt.
  • Full LLM prompt structure preview — both the admin Default Prompts page and the user Customise-prompts tab now show the complete two-message payload (system prompt with context block, user message with transcription wrapper and language directive). Placeholder chips colour-code system tokens (blue, replaced by the framework) versus user-supplied variables (amber). The user-side preview re-renders live as you type into your custom prompt.
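The single-pass substitution described above can be sketched as follows. This is a minimal illustration, not Speakr's actual code: the function name, the cap constants, and the error handling are all hypothetical; only the `{{name}}` syntax, the 8,000/32,000-character caps, and the single `re.sub` pass come from the notes. Because `re.sub` scans the template exactly once, a substituted value containing `{{other}}` is emitted literally rather than expanded, and unlike `str.format` there is no attribute-access syntax to abuse.

```python
import re

# Hypothetical cap constants mirroring the limits stated in the notes.
MAX_VALUE_LEN = 8_000
MAX_TOTAL_LEN = 32_000

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def substitute_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace {{name}} placeholders in a single re.sub pass.

    The template is scanned exactly once, so a value that itself
    contains "{{other}}" is written out literally, never re-expanded.
    Unknown placeholders are left as-is.
    """
    if sum(len(v) for v in variables.values()) > MAX_TOTAL_LEN:
        raise ValueError("combined variable length exceeds cap")

    def replace(match: re.Match) -> str:
        value = variables.get(match.group(1), match.group(0))
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value for {match.group(1)!r} exceeds cap")
        return value

    return PLACEHOLDER.sub(replace, template)
```

For example, `substitute_prompt("Focus on: {{agenda}}", {"agenda": "Q3 {{evil}}"})` yields `"Focus on: Q3 {{evil}}"` with the nested placeholder left inert.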

Per-recording transcription control

  • Per-upload / per-tag / per-folder transcription model selection (#266) — set TRANSCRIPTION_MODELS_AVAILABLE and the upload form, reprocess modal, and tag/folder edit forms gain a model dropdown. Optional TRANSCRIPTION_MODEL_LABELS for human-friendly names. Tag and folder edit forms warn if a previously-selected default is no longer in the configured list. The dropdown is hidden when only one option would be visible.
  • Admin-managed transcription model list — when the connector exposes /v1/models discovery, admins can curate the list from the dashboard rather than via env var. Stored in the database; overrides TRANSCRIPTION_MODELS_AVAILABLE when set.
  • WhisperX runtime model switching — the asr_endpoint connector forwards request.model as ?model=... on the WhisperX /asr call, so per-upload selection actually changes which model transcribes each file.
  • Per-connector capability gating — added HOTWORDS and INITIAL_PROMPT capabilities. Hotwords, initial-prompt, and speaker-count UI elements are hidden for connectors that don't support them, instead of accepting input that is silently ignored. Hotwords now show for OpenAI / Whisper / Azure / Mistral / VibeVoice with each connector mapping the field to its own underlying API.
  • Mistral Voxtral chunking (#267) — MISTRAL_ENABLE_CHUNKING=true plus MISTRAL_MAX_DURATION_SECONDS opts the Mistral connector into app-side chunked transcription for recordings approaching Voxtral's 3-hour timeout. Mistral does not return voice embeddings, so speakers are remapped per chunk.
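The per-upload / per-tag / per-folder selection above implies a precedence chain when resolving which model to use. A sketch of that resolution, under the assumption that the order is upload choice, then tag, then folder, then user default (the exact order and all names here are illustrative, not Speakr's internals); note how a default that is no longer in the configured list is skipped, matching the stale-default warning in the edit forms:

```python
from typing import Optional

def resolve_transcription_model(
    upload_choice: Optional[str],
    tag_default: Optional[str],
    folder_default: Optional[str],
    user_default: Optional[str],
    available: list[str],
) -> Optional[str]:
    """Walk the precedence chain and return the first configured model
    that is still in the available list; None means fall back to the
    connector's own default."""
    for candidate in (upload_choice, tag_default, folder_default, user_default):
        if candidate and candidate in available:
            return candidate
    return None
```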

ASR transcript editor

  • Autosave — saves edits 2 seconds after the last keystroke when the user opts in (Account → Preferences → Autosave editor).
  • Save without closing + Ctrl+S — new button keeps the editor open after saving; Ctrl+S triggers a save from anywhere in the editor.
  • Scroll memory — reopening the editor restores the previous scroll position instead of jumping to the top.
  • Double-click to edit — double-clicking a transcript row in the simple view jumps into the editor with that segment highlighted. The target row is briefly highlighted so it stands out.

Account preferences

  • Preferences tab — account settings has a new Preferences tab (split from the Account Information tab) using a two-column layout for transcript display, editor behaviour, and language preferences.
  • Compact timestamps in simple view — optional mm:ss (or h:mm:ss) timestamps in the simple transcript view, rendered as a two-part pill alongside the speaker label. The leading segment shows "Start" instead of 00:00.
  • Persist recording-list sort choice (discussion #263) — the Created date / Meeting date toggle now sticks across reloads and sessions on the same browser.
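The compact-timestamp rule above (mm:ss, h:mm:ss past one hour, "Start" for the leading segment) can be expressed as a small helper; this is a sketch, not Speakr's rendering code:

```python
def compact_timestamp(seconds: float) -> str:
    """Render mm:ss, or h:mm:ss once the recording passes one hour.
    The 00:00 segment is labelled "Start", per the release notes."""
    total = int(seconds)
    if total == 0:
        return "Start"
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"
```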

Embeddings and inquire mode

  • Configurable embedding model (#262) — EMBEDDING_MODEL swaps all-MiniLM-L6-v2 for any sentence-transformers model. Speakr records the model name on first startup and warns if it changes later.
  • OpenAI-compatible API mode for embeddings — EMBEDDING_BASE_URL, EMBEDDING_API_KEY, and EMBEDDING_DIMENSIONS route embeddings through any OpenAI-compatible provider (vLLM, OpenRouter, OpenAI, Together, etc.). Useful for the lite Docker image, low-RAM hosts, or consolidating providers. The Inquire startup banner reflects the active provider.
  • Re-embed all — admin Vector Store tab gained a Re-embed all action so you can rebuild the index after switching EMBEDDING_MODEL or EMBEDDING_BASE_URL.
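For orientation, an OpenAI-compatible embeddings call driven by those environment variables has roughly this shape. This is a generic sketch of the OpenAI-style `/embeddings` wire format, not Speakr's internal embedding client; the default model name and helper name are placeholders:

```python
import json
import os
import urllib.request

def build_embedding_request(texts: list[str]) -> urllib.request.Request:
    """Assemble an OpenAI-compatible POST /embeddings request from the
    EMBEDDING_* environment variables described in the notes."""
    base = os.environ["EMBEDDING_BASE_URL"].rstrip("/")
    payload = {
        "model": os.environ.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        "input": texts,
    }
    if os.environ.get("EMBEDDING_DIMENSIONS"):
        payload["dimensions"] = int(os.environ["EMBEDDING_DIMENSIONS"])
    return urllib.request.Request(
        f"{base}/embeddings",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('EMBEDDING_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The response body (`data[i].embedding`) is the vector list in the standard OpenAI schema, which is why any compatible provider slots in behind the same variables.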

Observability and admin

  • Per-operation token stats — admin Token Usage card splits into LLM and embedding panels with their own totals, charts, and per-operation breakdown (title, summary, chat, event extraction, embeddings).
  • Granular token budgets — TITLE_MAX_TOKENS and EVENT_MAX_TOKENS join the existing SUMMARY_MAX_TOKENS / CHAT_MAX_TOKENS so reasoning models that consume budget on hidden thinking tokens can be tuned per operation. The resolved max_tokens is logged with each LLM call.
  • LLM timeout diagnostics — configured LLM_REQUEST_TIMEOUT is logged at startup, and APITimeoutError log entries include elapsed time so it is clear whether the timeout was the actual bound that fired.
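Resolving a per-operation budget with a fallback might look like the sketch below. Only the four environment variable names come from the notes; the mapping, function name, and default are hypothetical:

```python
import os

# Env var names from the release notes; the mapping itself is illustrative.
_TOKEN_BUDGET_VARS = {
    "title": "TITLE_MAX_TOKENS",
    "summary": "SUMMARY_MAX_TOKENS",
    "chat": "CHAT_MAX_TOKENS",
    "event": "EVENT_MAX_TOKENS",
}

def resolve_max_tokens(operation: str, fallback: int = 4096) -> int:
    """Return the configured per-operation budget, else the fallback.
    The resolved value is what gets logged alongside each LLM call."""
    var = _TOKEN_BUDGET_VARS.get(operation)
    raw = os.environ.get(var) if var else None
    return int(raw) if raw else fallback
```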

API v1

  • Folder CRUD endpoints — new /api/v1/folders for list, create, update, delete.
  • Connector discovery endpoint — exposes the active transcription connector and its capabilities for companion-app integrations.
  • Recording field parity (#274) — /api/v1/recordings and /api/v1/recordings/{id} now include audio_duration, transcription_duration_seconds, summarization_duration_seconds, folder_id, folder, events (detail only), deletion_exempt, prompt_variables, and the per-recording transcription model.
  • Forwarded per-request overrides — /api/v1/recordings/{id}/transcribe accepts transcription_model, hotwords, and initial_prompt.
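A per-request override call could be assembled like this. Only the endpoint path and the three override field names are from the notes; the bearer-token auth scheme, host, and helper name are assumptions for illustration:

```python
import json
import urllib.request

def build_transcribe_request(base_url: str, api_key: str, recording_id: int,
                             **overrides) -> urllib.request.Request:
    """POST /api/v1/recordings/{id}/transcribe with the per-request
    overrides added in this release."""
    allowed = {"transcription_model", "hotwords", "initial_prompt"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unsupported override(s): {sorted(unknown)}")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/recordings/{recording_id}/transcribe",
        data=json.dumps(overrides).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```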

Localisation

  • Portuguese Brazilian translation (PR #271, lhpereira) — full pt-BR locale added, with backfill of all v0.8.16-alpha keys integrated during merge. All seven locales (en, fr, de, es, ru, zh, pt-BR) now sit at parity with zero missing and zero orphaned keys.
  • Locale parity cleanup — removed 149 stale keys from zh.json that no longer reference any code path, backfilled 10 keys missing from non-English locales, and added seven additional language codes (pl, uk, vi, th, tr, id, sv) to the transcription dropdown.

Fixed

  • Reprocessing applies tag/folder/user default hotwords + initial_prompt (#265) — previously these only flowed through at upload time. Reprocess now walks the same precedence chain, and the reprocess modal gained the two text fields (gated on the active connector's capabilities).
  • Language code normalization (#256) — old user records with transcription_language="français" were crashing WhisperX with HTTP 500. Added a normalize-on-save helper plus a one-shot migration that maps display names and locale codes to ISO 639-1 on upgrade.
  • Title generation Unicode escapes (#260) — for non-ASCII transcripts (Cyrillic, Chinese, etc.) titles were occasionally generated with literal \uXXXX escape sequences. Root cause was slicing the raw transcription JSON before parsing; the slice could land mid-Unicode-escape, the JSON parse failed, and the raw escapes leaked through. Fixed by formatting first, then truncating.
  • Reprocess modal hid hotwords / initial prompt / model dropdown for non-WhisperX connectors — the gating accidentally required connectorSupportsSpeakerCount for the entire block. Fixed via the new capability split.
  • Technical details panel always populated on transcription failures — when the ASR endpoint returns an HTTP error, Speakr now captures the upstream response body before raising, so the recording's "Technical details" section shows the real failure message (for example faster-whisper's "Invalid model size") instead of a bare status code.
  • Vector Store "recordings to process" message — Vue's custom ${...} delimiter was tripping over the nested braces in the i18n call; rewritten to use the t(key, params) parameter form.
  • CSRF token on the Preferences form — was missing, causing submissions to be rejected.
  • Test isolation — synthetic users and recordings created during the test suite are now cleaned up at module teardown so the dev DB stays free of leaked admin flags between runs.
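The Unicode-escape bug fixed above (#260) is easy to reproduce in miniature. `json.dumps` escapes non-ASCII text to `\uXXXX` by default, so slicing the raw JSON can cut mid-escape, the parse fails, and the escapes leak through a raw-string fallback; parsing first and truncating the decoded text avoids it. This is a standalone reproduction, not Speakr's code:

```python
import json

raw = json.dumps({"text": "Привет, это тест"})  # ensure_ascii=True -> \uXXXX escapes

# Buggy order: truncate the raw JSON first. The cut lands mid-escape,
# json.loads fails, and falling back to the raw string leaks literal
# backslash-u sequences into the generated title.
snippet = raw[:30]
try:
    leaked = json.loads(snippet)
except json.JSONDecodeError:
    leaked = snippet  # contains literal "\u041f\u0440..." debris

# Fixed order: parse and format first, then truncate the decoded text.
decoded = json.loads(raw)["text"]
title = decoded[:10]
```

The fixed path can never split an escape because truncation happens on the already-decoded string.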

Docs

  • New nginx reverse-proxy guidance: proxy_request_buffering off and client_max_body_size in the recommended config (resolves the 500-error class from #273)
  • Google Gemini OpenAI-compatible setup example for TEXT_MODEL_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/ (#254)
  • Prompt template variables guide in user-guide/settings.md
  • Per-upload / per-tag / per-folder model selection documentation in admin-guide/model-configuration.md
  • EMBEDDING_BASE_URL API mode documentation across inquire-mode, vector-store, and troubleshooting
  • ASR editor enhancements (autosave, Ctrl+S, scroll memory, double-click) and Append/Replace summary mode in user-guide/transcripts.md
  • Re-embed all action and embedding token tracking in admin-guide/vector-store.md
  • Per-operation token stats in admin-guide/statistics.md

Infrastructure

  • Vitest frontend tests — pure-helper modules in static/js/modules/utils/ are now covered by Vitest. Run npm test. Currently exercises the prompt-variable extraction and priority-chain logic.

Tests

276 backend tests passing plus 32 frontend tests, including new regression suites for the title truncation bug (#260), reprocess hotwords precedence (#265), language normalization (#256), API v1 parity (#274), the per-upload/tag/folder model override chain (#266), prompt-variable substitution, and the priority-chain helpers.
