murtaza-nasir/speakr v0.8.19-alpha on GitHub

v0.8.19-alpha — Inquire-mode performance and re-embed reliability

Patch release on top of v0.8.18-alpha. No new features, no breaking changes.

Performance

Vectorised chunk similarity search. The chunk search loop in Inquire mode previously called sklearn's cosine_similarity once per chunk. On a ~17k-chunk library this cost 13-20 seconds per enriched query, because each call wrapped Python and sklearn overhead around what is ultimately one dot product. The loop is replaced with a single np.vstack over the chunk vectors followed by one cosine_similarity(query, matrix) call, with np.argpartition for top-k selection. Per-query search now completes in under one second, and an Inquire turn with four enriched queries drops from roughly 60 seconds to 2-3 seconds end-to-end. This eliminates the chat UI timeouts that users with large libraries or slower embedding endpoints were hitting.
Dimension-mismatch warnings are folded into a single summary log line per search, instead of one warning per stale chunk, so a partially-migrated embedding configuration does not flood the log on every query.
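
The batched search and the single summary warning described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name top_k_chunks, the logger name, and the plain-list inputs are assumptions, and cosine similarity is computed directly with NumPy here rather than through sklearn's cosine_similarity so the example needs only one dependency.

```python
import logging

import numpy as np

logger = logging.getLogger("inquire.search")  # hypothetical logger name


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return [(chunk_index, similarity), ...] for the k most similar chunks."""
    query = np.asarray(query_vec, dtype=np.float64)
    # Keep only vectors whose dimension matches the query; count the rest.
    kept = [i for i, v in enumerate(chunk_vecs) if len(v) == query.shape[0]]
    skipped = len(chunk_vecs) - len(kept)
    if skipped:
        # One summary line per search instead of one warning per stale chunk.
        logger.warning("Skipped %d chunk(s) with mismatched embedding dimension", skipped)
    if not kept or k <= 0:
        return []
    # Stack all usable vectors into one (n, d) matrix.
    matrix = np.vstack([np.asarray(chunk_vecs[i], dtype=np.float64) for i in kept])
    # Cosine similarity for every row at once: dot products over norm products.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    sims = (matrix @ query) / np.where(norms == 0.0, 1.0, norms)
    k = min(k, len(kept))
    # argpartition selects the k largest without a full sort; only those k are sorted.
    top = np.argpartition(sims, -k)[-k:]
    top = top[np.argsort(sims[top])[::-1]]
    return [(kept[i], float(sims[i])) for i in top]
```

The point of the design is that the per-chunk work moves out of the Python loop and into one matrix-vector product, while np.argpartition avoids fully sorting all similarities when only the top few are needed.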

Reliability

Embedding API retries. _api_embed now retries transient errors (rate limits, timeouts, 5xx, connection blips) with exponential backoff and jitter. Defaults to 3 attempts, tunable via EMBEDDING_API_MAX_RETRIES and EMBEDDING_API_BACKOFF_SECONDS. Auth and model-not-found errors fail fast since retrying will not help.
process_recording_chunks no longer silently loses chunks on partial API failure. Previously the function deleted the recording's existing chunks before calling generate_embeddings, then iterated zip(chunks, embeddings). If the embedding call returned fewer vectors than there were chunks (transient provider failure, exhausted retries), zip stopped at the shorter sequence, yielding nothing when no vectors came back, and the function still returned True with the deletion already committed. In that case the recording was left with zero chunks. The function now verifies that the vector count matches the chunk count and rolls back the transaction on mismatch, preserving the recording's existing chunks for a later retry.
Re-embed all retry passes. The admin Re-embed all loop now does up to two retry passes over any recording that failed in the first pass, with backoff between passes. Tunable via the retry_passes field in the request body. Combined with _api_embed's internal retries, a single click can survive several layers of transient provider failure.
Re-embed all picks up stale-chunk recordings regardless of status. The original query filtered strictly on status == 'COMPLETED'. Recordings that had stale chunks but were temporarily in another state at click time were silently skipped, leaving old vectors behind. The query now also matches any recording whose id appears in the transcript_chunk table, so existing stale vectors get refreshed even on recordings that are mid-reprocess.
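
The retry behaviour described for _api_embed can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two exception classes stand in for whatever errors the real client raises, max_retries is treated as the total number of attempts, the 1.0 second base delay is a placeholder (the notes do not state the default backoff), and the 25% jitter range is arbitrary.

```python
import random
import time


class TransientEmbeddingError(Exception):
    """Hypothetical stand-in for rate limits, timeouts, 5xx, connection errors."""


class FatalEmbeddingError(Exception):
    """Hypothetical stand-in for auth or model-not-found errors."""


def embed_with_retry(call, max_retries=3, backoff_seconds=1.0, sleep=time.sleep):
    """Run call() up to max_retries times, backing off between transient failures."""
    attempts = max(1, max_retries)
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientEmbeddingError:
            if attempt == attempts:
                raise  # attempts exhausted; let the caller decide what to do
            # Exponential backoff (1x, 2x, 4x, ...) plus up to 25% random jitter.
            delay = backoff_seconds * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.25 * delay))
        # FatalEmbeddingError is not caught, so it propagates on the first attempt.
```

In a deployment the two tunables would presumably be read from EMBEDDING_API_MAX_RETRIES and EMBEDDING_API_BACKOFF_SECONDS; the sleep parameter exists only so the delays can be observed without waiting.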
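
The count check and rollback described for process_recording_chunks might look roughly like the following. This sketch uses the standard-library sqlite3 module purely for illustration; the function name replace_chunks, the embed callable, and the column names are hypothetical, and only the transcript_chunk table name comes from the notes above.

```python
import sqlite3


def replace_chunks(conn, recording_id, chunks, embed):
    """Swap a recording's chunks for freshly embedded ones, or change nothing."""
    try:
        # The delete stays inside the open transaction until commit().
        conn.execute(
            "DELETE FROM transcript_chunk WHERE recording_id = ?", (recording_id,)
        )
        vectors = embed(chunks)
        # Refuse to continue on a short result; zip() would silently truncate.
        if len(vectors) != len(chunks):
            raise ValueError(f"expected {len(chunks)} vectors, got {len(vectors)}")
        conn.executemany(
            "INSERT INTO transcript_chunk (recording_id, content, embedding) "
            "VALUES (?, ?, ?)",
            [(recording_id, text, repr(vec)) for text, vec in zip(chunks, vectors)],
        )
        conn.commit()
        return True
    except Exception:
        # Undo the delete so the old chunks survive for a later retry.
        conn.rollback()
        return False
```

The key choice is that the length check happens before anything is committed, so a short or failed embedding response leaves the previous rows untouched.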

No new features

This release is purely fixes and a performance improvement. Existing functionality is unchanged for users who were not affected by the issues above. Users hitting Inquire timeouts or stuck stale-chunk warnings should see both resolved after upgrading.
