v0.8.19-alpha — Inquire-mode performance and re-embed reliability
Patch release on top of v0.8.18-alpha. No new features, no breaking changes.
Performance
- Vectorised chunk similarity search. The chunk search loop in Inquire mode previously called sklearn's `cosine_similarity` once per chunk, costing 13-20 seconds per enriched query on a ~17k-chunk library because each call had Python and sklearn overhead around what is ultimately one dot product. Replaced with a single batched `np.vstack` plus one `cosine_similarity(query, matrix)` call, with `np.argpartition` for top-k selection. Per-query search now completes in under one second; a 4-enriched-query Inquire turn drops from roughly 60 seconds to 2-3 seconds end-to-end. Eliminates the chat UI timeout symptom that users with large libraries or slower embedding endpoints were hitting.
- Dimension-mismatch warnings are folded into a single summary log line per search, instead of one warning per stale chunk, so a partially-migrated embedding configuration does not flood the log on every query.
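
The batched search and the single summary warning described above can be sketched roughly as follows. This is an illustration, not the project's code: the function name `top_k_chunks`, its signature, and the log message are assumptions, and cosine similarity is computed with plain NumPy normalisation here in place of sklearn's `cosine_similarity` to keep the sketch dependency-light.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return (original_indices, scores) of the k chunks most similar to the query.

    Hypothetical helper: name, signature, and log text are illustrative only.
    """
    query = np.asarray(query_vec, dtype=float)

    # Keep only chunks whose dimension matches the query, remembering their
    # original positions so results can be mapped back to the input list.
    keep = [i for i, v in enumerate(chunk_vecs) if np.shape(v) == query.shape]
    skipped = len(chunk_vecs) - len(keep)
    if skipped:
        # One summary line per search instead of one warning per stale chunk.
        logger.warning(
            "Skipped %d chunk(s) with mismatched embedding dimension", skipped
        )
    if not keep or k <= 0:
        return np.array([], dtype=int), np.array([], dtype=float)

    # Stack the surviving embeddings into one (n_chunks, dim) matrix.
    matrix = np.vstack([chunk_vecs[i] for i in keep]).astype(float)

    # Cosine similarity is the dot product divided by the product of norms.
    # Zero-norm vectors are guarded so they score 0 instead of NaN.
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    denom[denom == 0.0] = 1.0
    scores = (matrix @ query) / denom

    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without sorting everything...
    top = np.argpartition(-scores, k - 1)[:k]
    # ...and only those k entries are then sorted by descending score.
    top = top[np.argsort(-scores[top])]
    return np.asarray(keep)[top], scores[top]
```

The design point is that the per-chunk Python loop disappears: one matrix-vector product scores every chunk, and `np.argpartition` avoids a full sort when only the top few results are needed.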
Reliability
- Embedding API retries. `_api_embed` now retries transient errors (rate limits, timeouts, 5xx, connection blips) with exponential backoff and jitter. Defaults to 3 attempts, tunable via `EMBEDDING_API_MAX_RETRIES` and `EMBEDDING_API_BACKOFF_SECONDS`. Auth and model-not-found errors fail fast since retrying will not help.
- `process_recording_chunks` no longer silently loses chunks on partial API failure. Previously the function deleted the recording's existing chunks before calling `generate_embeddings`, then iterated `zip(chunks, embeddings)`. If the embedding call returned fewer vectors than there were chunks (transient provider failure, exhausted retries), the zip stopped at the shorter sequence, yielding nothing at all when no vectors came back, and the function returned True with the deletion already committed. In the worst case the recording was left with zero chunks. The function now verifies vector count matches chunk count and rolls back the transaction on mismatch, preserving the recording's existing chunks for a later retry.
- Re-embed all retry passes. The admin Re-embed all loop now does up to two retry passes over any recording that failed in the first pass, with backoff between passes. Tunable via the `retry_passes` field in the request body. Combined with `_api_embed`'s internal retries, a single click can survive several layers of transient provider failure.
- Re-embed all picks up stale-chunk recordings regardless of status. The original query filtered strictly on `status == 'COMPLETED'`. Recordings that had stale chunks but were temporarily in another state at click time were silently skipped, leaving old vectors behind. The query now also matches any recording whose id appears in the `transcript_chunk` table, so existing stale vectors get refreshed even on recordings that are mid-reprocess.
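
As a rough illustration of the first two items above, the sketch below shows retrying with exponential backoff and jitter, followed by a count check before chunks are paired with vectors. Everything named here is hypothetical: the exception classes, `embed_with_retry`, `pair_chunks_with_vectors`, the 25% jitter range, and the reading of `max_retries` as a total attempt count are assumptions for the example, not the actual `_api_embed` or `process_recording_chunks` code.

```python
import random
import time


class TransientEmbeddingError(Exception):
    """Hypothetical: rate limit, timeout, 5xx, or connection blip."""


class FatalEmbeddingError(Exception):
    """Hypothetical: auth failure or model not found."""


def embed_with_retry(call, texts, max_retries=3, backoff_seconds=1.0,
                     sleep=time.sleep):
    """Call `call(texts)`, retrying transient failures with backoff and jitter."""
    attempts = max(1, max_retries)  # assumption: the setting counts attempts
    for attempt in range(1, attempts + 1):
        try:
            return call(texts)
        except FatalEmbeddingError:
            raise  # retrying cannot help, so fail fast
        except TransientEmbeddingError:
            if attempt == attempts:
                raise  # retries exhausted, surface the error to the caller
            # Exponential backoff: base * 2^(attempt - 1), plus up to 25%
            # random jitter so concurrent callers do not retry in lockstep.
            delay = backoff_seconds * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.25))


def pair_chunks_with_vectors(chunks, vectors):
    """Pair chunks with vectors, refusing to proceed when the counts differ.

    Raising here lets the caller roll back its transaction instead of
    committing a deletion with nothing to replace it.
    """
    if len(chunks) != len(vectors):
        raise ValueError(
            f"expected {len(chunks)} vectors, got {len(vectors)}"
        )
    return list(zip(chunks, vectors))
```

The explicit length check matters because a bare `zip` silently truncates to the shorter input, which is how a short or empty embedding response can go unnoticed.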
No new features
This release is purely fixes and a performance improvement. Existing functionality is unchanged for users who were not affected by the issues above. Users hitting Inquire timeouts or stuck stale-chunk warnings should see both resolved after upgrading.