🙏 Special Thanks
This release is entirely the work of @chriscoey, who contributed 5 meticulously researched and well-tested PRs in a single day. Each one identified real bugs through careful code reading: not surface-level fixes, but root-cause analysis with regression tests proving each fix. An outstanding community contribution.
What's Changed
This release consolidates 5 high-quality PRs from @chriscoey that fix critical bugs in the SQLite-vec storage backend, improve security, and add an embedding migration utility.
🆕 Added
- Embedding model migration script (`scripts/maintenance/migrate_embeddings.py`): Migrates embeddings between any two models, including across different dimensions (e.g., 384-dim → 768-dim). Works with any OpenAI-compatible API (Ollama, vLLM, OpenAI, TEI). Features: `--dry-run`, dimension auto-detection, timestamped backup, service detection, cross-platform support, batched processing with progress reporting, and post-migration integrity verification. Closes #552.
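For readers curious how such a migration can talk to "any OpenAI-compatible API", here is a rough sketch of the core re-embedding loop. It assumes the target server exposes the OpenAI-style `POST /v1/embeddings` endpoint (request `{"model", "input"}`, response `data[i].embedding`), detects the dimension by embedding one probe string, and batches with progress output. All function names are hypothetical; this is not the script's code, and it leaves out backups, service detection, and any schema change a new dimension would need.

```python
import json
import urllib.request


def http_post_json(url, payload, api_key=None, timeout=60.0):
    """POST a JSON body and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def embed(texts, base_url, model, post=http_post_json):
    """Embed a batch via an assumed OpenAI-style /v1/embeddings endpoint."""
    body = post(f"{base_url.rstrip('/')}/v1/embeddings",
                {"model": model, "input": list(texts)})
    # OpenAI-style responses tag each item with "index"; sort to keep input order.
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d["index"])]


def detect_dimension(base_url, model, post=http_post_json):
    """Auto-detect the target dimension by embedding a single probe string."""
    return len(embed(["dimension probe"], base_url, model, post)[0])


def reembed_all(rows, base_url, model, batch_size=32, post=http_post_json):
    """rows: list of (content_hash, content). Yields (content_hash, new_vector)."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed([content for _, content in batch], base_url, model, post)
        for (content_hash, _), vector in zip(batch, vectors):
            yield content_hash, vector
        print(f"re-embedded {start + len(batch)}/{len(rows)}")
```

The `post` parameter is injectable so the loop can be exercised without a live server, for example against a stub that returns fixed-size vectors; against a real server you would pass a base URL such as Ollama's default `http://localhost:11434`.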
🐛 Fixed
Soft-delete leaks (data correctness):
- `recall()`: both the semantic and time-based paths returned deleted memories
- `get_memories_by_time_range()`: returned deleted memories
- `get_largest_memories()`: returned deleted memories
- `get_memory_timestamps()`: counted deleted memories
- `get_memory_connections()`: tag group counts included deleted memories
- `get_access_patterns()`: returned content hashes of deleted memories
- `update_memory_metadata()`: could modify soft-deleted memories
- `update_memories_batch()`: same issue for the batch update path
- `delete()` error handler: added an explicit rollback to prevent dangling embedding DELETEs
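The soft-delete fixes share one pattern: read paths exclude tombstoned rows, writes refuse to touch them, and a failed delete rolls back. A minimal sketch with the standard-library `sqlite3` module, assuming a hypothetical `deleted_at` column (NULL for live rows) and a hypothetical `memory_embeddings` table; the helper names are illustrative, not the backend's actual methods or schema.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memories (
            content_hash TEXT PRIMARY KEY,
            created_at   REAL NOT NULL,
            metadata     TEXT,
            deleted_at   REAL  -- hypothetical: NULL = live, timestamp = soft-deleted
        );
        CREATE TABLE memory_embeddings (content_hash TEXT PRIMARY KEY);
    """)
    return conn


def live_hashes_in_range(conn, start, end):
    # Read path: always exclude tombstoned rows.
    rows = conn.execute(
        "SELECT content_hash FROM memories "
        "WHERE created_at BETWEEN ? AND ? AND deleted_at IS NULL "
        "ORDER BY created_at",
        (start, end),
    ).fetchall()
    return [h for (h,) in rows]


def update_live_metadata(conn, content_hash, metadata):
    # Write path: the WHERE guard makes soft-deleted rows untouchable.
    cur = conn.execute(
        "UPDATE memories SET metadata = ? "
        "WHERE content_hash = ? AND deleted_at IS NULL",
        (metadata, content_hash),
    )
    conn.commit()
    return cur.rowcount == 1


def soft_delete(conn, content_hash, now):
    try:
        conn.execute("UPDATE memories SET deleted_at = ? WHERE content_hash = ?",
                     (now, content_hash))
        conn.execute("DELETE FROM memory_embeddings WHERE content_hash = ?",
                     (content_hash,))
        conn.commit()
    except sqlite3.Error:
        # Without this, statements that already ran stay pending in the open
        # transaction and could be committed later by an unrelated commit().
        conn.rollback()
        raise


conn = make_db()
conn.executemany("INSERT INTO memories (content_hash, created_at) VALUES (?, ?)",
                 [("h1", 100.0), ("h2", 200.0), ("h3", 300.0)])
conn.executemany("INSERT INTO memory_embeddings VALUES (?)", [("h1",), ("h2",), ("h3",)])
conn.commit()
soft_delete(conn, "h2", now=400.0)
print(live_hashes_in_range(conn, 0.0, 1000.0))  # ['h1', 'h3']
print(update_live_metadata(conn, "h2", "{}"))   # False
```

Putting the `deleted_at IS NULL` guard in the UPDATE's own WHERE clause, rather than checking first and updating second, keeps the check and the write in a single statement.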
Score formula:
- `recall()` used `1.0 - distance`, but cosine distance ∈ [0, 2], so any distance above 1 produced a negative score. Fixed to `max(0.0, 1.0 - distance/2.0)`, which correctly maps to [0, 1].
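To see why halving matters, the sketch below compares both formulas on identical, orthogonal, and opposite vectors. It assumes cosine distance is defined as `1 - cos(θ)`; the function names are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(theta); ranges over [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def old_score(distance):
    return 1.0 - distance  # goes negative once distance > 1


def new_score(distance):
    # Halve the distance to map [0, 2] onto [0, 1]; the clamp guards float noise.
    return max(0.0, 1.0 - distance / 2.0)


for b in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):  # identical, orthogonal, opposite
    d = cosine_distance([1.0, 0.0], b)
    print(f"distance={d:.1f} old={old_score(d):.1f} new={new_score(d):.2f}")
# distance=0.0 old=1.0 new=1.00
# distance=1.0 old=0.0 new=0.50
# distance=2.0 old=-1.0 new=0.00
```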
Tag handling:
- `get_largest_memories()` used `json.loads()` to parse tags, but tags are stored as comma-separated strings.
- `get_all_memories()`, `count_all_memories()`, `retrieve()`, `delete_by_timeframe()`, and `delete_before_date()` used `LIKE '%tag%'` (substring match) instead of a GLOB exact match. A tag query for `"test"` incorrectly matched `"testing"` and `"my-test-tag"`.
- Added an `_escape_glob()` helper to prevent GLOB wildcard injection (`*`, `?`, `[`) from user-supplied tag values.
- `search_by_tag_chronological()` LIMIT/OFFSET values are now parameterized instead of f-string interpolated.
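A sketch of whole-tag matching with GLOB, under assumptions: tags are stored as a comma-separated string without spaces around the commas, and the helper names (`escape_glob`, `parse_tags`, `hashes_with_tag`) are hypothetical, not the backend's code. SQLite's GLOB has no ESCAPE clause, so the sketch neutralizes each metacharacter by wrapping it in a one-character set (`[*]`, `[?]`, `[[]`), and binds LIMIT/OFFSET as parameters.

```python
import sqlite3

# Each GLOB metacharacter becomes a one-character set that matches it literally.
_GLOB_META = {"*": "[*]", "?": "[?]", "[": "[[]"}


def escape_glob(value):
    return "".join(_GLOB_META.get(ch, ch) for ch in value)


def parse_tags(raw):
    """Split a comma-separated tag string (json.loads() would reject it)."""
    return [t.strip() for t in (raw or "").split(",") if t.strip()]


def hashes_with_tag(conn, tag, limit=10, offset=0):
    # Wrapping the stored string in commas lets "*,tag,*" match whole tags only.
    # LIMIT/OFFSET are bound parameters, never interpolated into the SQL text.
    rows = conn.execute(
        "SELECT content_hash FROM memories "
        "WHERE (',' || tags || ',') GLOB ? "
        "ORDER BY content_hash LIMIT ? OFFSET ?",
        (f"*,{escape_glob(tag)},*", limit, offset),
    ).fetchall()
    return [h for (h,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (content_hash TEXT PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [("h1", "test,python"), ("h2", "testing"), ("h3", "my-test-tag"),
     ("h4", "a*b"), ("h5", "axb")],
)

# Old behavior: LIKE '%test%' is a substring match.
like_hits = [h for (h,) in conn.execute(
    "SELECT content_hash FROM memories WHERE tags LIKE ? ORDER BY content_hash",
    ("%test%",))]
# Without escaping, a user-supplied "*" acts as a wildcard.
unescaped_hits = [h for (h,) in conn.execute(
    "SELECT content_hash FROM memories WHERE (',' || tags || ',') GLOB ? "
    "ORDER BY content_hash", ("*,a*b,*",))]

print(like_hits)                      # ['h1', 'h2', 'h3']
print(hashes_with_tag(conn, "test"))  # ['h1']
print(unescaped_hits)                 # ['h4', 'h5']
print(hashes_with_tag(conn, "a*b"))   # ['h4']
```

Note that GLOB is case-sensitive while LIKE is case-insensitive for ASCII; whether tags are normalized beforehand is not covered here.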
Consolidation system:
- `_sample_memory_pairs()` materialized all `combinations(memories, 2)` (~50M pairs for 10k memories) just to sample 100. It now generates random index pairs directly, in O(max_pairs).
- `_get_existing_associations()` filtered by `memory_type == "association"`, but associations are stored with `memory_type="observation"` and the tag `"association"`. The filter never matched, so duplicate associations were never prevented. It now uses `search_by_tag(["association"])`.
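The idea behind sampling index pairs instead of materializing `combinations()` can be sketched as rejection sampling over index pairs. The function below is hypothetical (the release does not say whether duplicates are rejected; this sketch assumes they are). Its expected cost is O(max_pairs) as long as `max_pairs` is much smaller than the n(n-1)/2 possible pairs.

```python
import random


def sample_pairs(items, max_pairs, rng=None):
    """Draw up to max_pairs distinct unordered pairs without building all n*(n-1)/2."""
    rng = rng or random.Random()
    n = len(items)
    target = min(max_pairs, n * (n - 1) // 2)
    seen = set()
    pairs = []
    while len(pairs) < target:
        i, j = rng.sample(range(n), 2)     # two distinct indices
        key = (i, j) if i < j else (j, i)  # normalize so (i, j) == (j, i)
        if key not in seen:                # reject repeated pairs
            seen.add(key)
            pairs.append((items[key[0]], items[key[1]]))
    return pairs


memories = [f"m{i}" for i in range(10_000)]
pairs = sample_pairs(memories, 100, random.Random(42))
print(len(pairs))  # 100
```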
⚡ Performance
- Batch access metadata: `retrieve()` now persists access metadata in one `executemany` call per query instead of N individual `UPDATE` + `COMMIT` round-trips.
- Hybrid search O(n+m) dedup: `retrieve_hybrid()` replaced O(n×m) nested-loop deduplication with O(n+m) dict-based merging. BM25-only memories are now batch-fetched in a single SQL query (capped at 999 to respect `SQLITE_MAX_VARIABLE_NUMBER`) instead of N+1 individual `get_by_hash()` calls.
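The three techniques can be sketched together with the standard-library `sqlite3` module. The schema, column names, and helper functions are hypothetical. The 999 cap mirrors the release note and matches SQLite's default `SQLITE_MAX_VARIABLE_NUMBER` before version 3.32.0. Like the release, the sketch simply caps the hash list; splitting it into several queries would be the alternative if more hits were possible.

```python
import sqlite3

SQLITE_MAX_VARS = 999  # default SQLITE_MAX_VARIABLE_NUMBER before SQLite 3.32.0


def record_access(conn, hashes, now):
    # One executemany + one commit instead of an UPDATE + COMMIT per memory.
    conn.executemany(
        "UPDATE memories SET access_count = access_count + 1, last_accessed = ? "
        "WHERE content_hash = ?",
        [(now, h) for h in hashes],
    )
    conn.commit()


def merge_results(semantic, bm25):
    """Merge (hash, score) lists in O(n + m) using a dict keyed by hash."""
    merged = {h: {"semantic": s, "bm25": 0.0} for h, s in semantic}
    for h, s in bm25:
        merged.setdefault(h, {"semantic": 0.0, "bm25": 0.0})["bm25"] = s
    return merged


def fetch_contents(conn, hashes):
    # One IN (...) query instead of N+1 lookups, capped at SQLITE_MAX_VARS params.
    hashes = list(hashes)[:SQLITE_MAX_VARS]
    if not hashes:
        return {}
    placeholders = ",".join("?" * len(hashes))  # only "?" marks are interpolated
    rows = conn.execute(
        f"SELECT content_hash, content FROM memories WHERE content_hash IN ({placeholders})",
        hashes,
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (content_hash TEXT PRIMARY KEY, content TEXT, "
             "access_count INTEGER NOT NULL DEFAULT 0, last_accessed REAL)")
conn.executemany("INSERT INTO memories (content_hash, content) VALUES (?, ?)",
                 [("h1", "alpha"), ("h2", "beta"), ("h3", "gamma")])

semantic = [("h1", 0.9), ("h2", 0.7)]
bm25 = [("h2", 3.1), ("h3", 2.0)]
merged = merge_results(semantic, bm25)
semantic_hashes = {h for h, _ in semantic}
bm25_only = [h for h in merged if h not in semantic_hashes]
contents = fetch_contents(conn, bm25_only)
record_access(conn, merged, now=1700000000.0)
print(contents)  # {'h3': 'gamma'}
```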
🧪 Tests
- 23 new regression tests covering all fixed methods
- Total: 1,420 tests