github doobidoo/mcp-memory-service v8.7.0
v8.7.0: Cosine Similarity & Maintenance Tools



🔧 Fixed

Cosine Similarity Migration - 0% → 70-79% Search Accuracy

Problem: Search results consistently showed 0% similarity scores. The cause was the L2 distance metric combined with the formula used to convert distances into scores.

  • The L2 score formula max(0, 1 - distance) returned 0 for every distance of 1.0 or more
  • Semantic search was effectively broken, with no meaningful similarity ranking
  • Exact-match accuracy was only 61% even though the correct content was stored
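Here is why the old formula collapsed to 0. Assume unit-normalized embeddings (an assumption, since the notes do not say whether vectors were normalized). Then the L2 distance between two vectors is sqrt(2 - 2·cos θ). Any pair with cosine similarity below 0.5 therefore sits more than 1.0 apart and scores 0. The plain-Python sketch below shows this; the helper names are illustrative only.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def old_score(distance):
    """The pre-8.7.0 conversion described above: max(0, 1 - distance)."""
    return max(0.0, 1.0 - distance)

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Two unit vectors whose cosine similarity is 0.4 (moderately related).
a = unit([1.0, 0.0])
b = unit([0.4, math.sqrt(1 - 0.4 ** 2)])

d = l2_distance(a, b)  # sqrt(2 - 2 * 0.4) = sqrt(1.2), about 1.095
print(round(d, 3), old_score(d))  # 1.095 0.0
```

Every pair at distance 1.0 or more receives the same score of 0, so the score gives no information for ranking those results.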

Solution: Full migration to the cosine distance metric, plus retry logic for database locking.

  1. Updated Vector Index - Added distance_metric=cosine to the vec0 virtual table
  2. Fixed Score Calculation - Changed to 1.0 - (distance/2.0), which maps the 0-2 cosine distance range onto a 0-1 score
  3. Migration Strategy - Drop and recreate the embeddings table (vec0 doesn't support ALTER)
  4. Retry Logic - Exponential backoff (1s → 2s → 4s) when the database is locked
  5. Regeneration Script - scripts/maintenance/regenerate_embeddings.py for manual recovery
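The score conversion in step 2 can be checked numerically. Cosine distance is 1 - cos θ, which runs from 0 (same direction) to 2 (opposite direction). So 1.0 - distance/2.0 equals (1 + cos θ)/2 and always lands between 0 and 1. This is a minimal sketch; the helper names are illustrative, not the service's API.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; ranges over [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cosine_score(distance):
    """The v8.7.0 conversion: map a cosine distance in [0, 2] onto [0, 1]."""
    return 1.0 - (distance / 2.0)

# A pair with cosine similarity 0.4: distance 0.6, score (1 + 0.4) / 2.
a = [1.0, 0.0]
b = [0.4, math.sqrt(1 - 0.4 ** 2)]
print(round(cosine_score(cosine_distance(a, b)), 3))  # 0.7
```

Identical vectors (distance 0) score 1.0 and opposite vectors (distance 2) score 0.0. Unlike the old clamped formula, every pair therefore gets a distinct score.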

Impact:

  • ✅ 2605 embeddings successfully regenerated
  • ✅ Search similarity scores now 70-79% (was 0%)
  • ✅ Exact match accuracy improved to 79.2% (was 61%)
  • ✅ Automatic migration on first startup - zero user intervention required

File: src/mcp_memory_service/storage/sqlite_vec.py:187
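The retry behavior in step 4 can be sketched with Python's standard sqlite3 module. This sketch is not the code at sqlite_vec.py:187, and the helper name `with_lock_retry` is hypothetical. It retries only when SQLite raises `sqlite3.OperationalError` with "database is locked". It sleeps 1 s, 2 s, then 4 s between attempts and lets the error propagate after the final attempt.

```python
import sqlite3
import time

def with_lock_retry(operation, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Run operation(); on 'database is locked', back off and retry.

    delays follows the 1s -> 2s -> 4s schedule from the release notes.
    sleep is injectable so the schedule can be tested without waiting.
    """
    for delay in delays:
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc):
                raise  # unrelated errors are not retried
            sleep(delay)
    return operation()  # final attempt; any error propagates to the caller

# Usage: simulate two lock failures followed by success.
calls = {"n": 0}
waits = []

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"

print(with_lock_retry(flaky, sleep=waits.append), waits)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the backoff schedule observable in tests. It also avoids adding real delays to a test run.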


Dashboard Search Improvements

Fixed: The search threshold parameter was always sent, even when the user had not explicitly set it.

Solution: Include similarity_threshold in the API payload only when the user explicitly configures it.

File: src/mcp_memory_service/web/static/app.js:283
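The payload rule can be illustrated in Python. The actual fix lives in app.js (JavaScript), and `build_search_payload` is a hypothetical helper. The field names `query`, `n_results`, and `similarity_threshold` come from these release notes.

```python
def build_search_payload(query, n_results=5, similarity_threshold=None):
    """Build a search request body, omitting the threshold unless set.

    Hypothetical helper illustrating the app.js fix: the server is never
    sent a threshold the user did not choose.
    """
    payload = {"query": query, "n_results": n_results}
    if similarity_threshold is not None:  # only when explicitly configured
        payload["similarity_threshold"] = similarity_threshold
    return payload

print(build_search_payload("test search"))
# {'query': 'test search', 'n_results': 5}
print(build_search_payload("test search", similarity_threshold=0.7))
# {'query': 'test search', 'n_results': 5, 'similarity_threshold': 0.7}
```

The sketch checks `is not None` rather than truthiness. This way, an explicit threshold of 0.0 is still sent.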


🚀 Added

Maintenance Scripts Suite - 1800x Performance Boost

A suite of database maintenance tools. The duplicate-handling scripts run 15x to 1800x faster than the equivalent API-based operations.

regenerate_embeddings.py

  • Purpose: Regenerate all embeddings after migrations or corruption
  • Performance: ~5 minutes for 2600 memories using all-MiniLM-L6-v2
  • Use Cases: Post-migration recovery, embedding corruption repair
  • Features: Progress tracking, configurable batch sizes, safe to rerun
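The batching and rerun safety described for regenerate_embeddings.py can be sketched as follows. This is not the script's code, and several parts are hypothetical:

  • the `memories`/`embeddings` schema
  • the `fake_embed` stand-in for all-MiniLM-L6-v2
  • the plain SQLite table standing in for the vec0 index

The idea it shows is that INSERT OR REPLACE, keyed on the memory ID, makes a rerun overwrite existing rows rather than duplicate them.

```python
import json
import sqlite3

def fake_embed(texts):
    """Hypothetical stand-in for a real embedding model."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]

def regenerate(conn, embed=fake_embed, batch_size=2):
    """Re-embed every memory in batches; INSERT OR REPLACE makes reruns safe."""
    rows = conn.execute("SELECT id, content FROM memories ORDER BY id").fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed([content for _, content in batch])
        conn.executemany(
            "INSERT OR REPLACE INTO embeddings (memory_id, vector) VALUES (?, ?)",
            [(mid, json.dumps(vec)) for (mid, _), vec in zip(batch, vectors)],
        )
        conn.commit()  # commit per batch so progress survives interruption
        done = min(start + batch_size, len(rows))
        print(f"{done}/{len(rows)} embeddings regenerated")  # progress tracking

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE embeddings (memory_id INTEGER PRIMARY KEY, vector TEXT)")
conn.executemany("INSERT INTO memories (content) VALUES (?)",
                 [("alpha",), ("beta gamma",), ("delta",)])
regenerate(conn)
regenerate(conn)  # rerun: rows are replaced, not duplicated
print(conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0])  # 3
```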

fast_cleanup_duplicates.sh

  • Purpose: Bulk duplicate removal using direct SQL
  • Performance: <5 seconds for 100+ duplicates (vs 2.5 hours via API)
  • Speedup: 1800x faster than API-based deletion
  • Strategy: Stops HTTP server, runs direct SQL DELETE, restarts service
  • Safety: Keeps newest copy, normalizes timestamps before comparison
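The direct-SQL step of fast_cleanup_duplicates.sh can be sketched as a single set-based DELETE that keeps the newest copy in each duplicate group. This is an assumption-based sketch run against an in-memory SQLite database. The `memories` table and its `content_hash` and `created_at` columns are hypothetical, and the server stop/restart steps are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content_hash TEXT, created_at REAL)"
)
conn.executemany(
    "INSERT INTO memories (content_hash, created_at) VALUES (?, ?)",
    [("h1", 100.0), ("h1", 200.0), ("h1", 150.0), ("h2", 120.0)],
)

# Remove every row for which a newer row with the same hash exists,
# so only the newest copy of each group survives.
cur = conn.execute(
    """
    DELETE FROM memories
    WHERE EXISTS (
        SELECT 1 FROM memories AS newer
        WHERE newer.content_hash = memories.content_hash
          AND newer.created_at > memories.created_at
    )
    """
)
conn.commit()
print(cur.rowcount)  # 2
print(conn.execute("SELECT id, content_hash, created_at FROM memories ORDER BY id").fetchall())
# [(2, 'h1', 200.0), (4, 'h2', 120.0)]
```

One statement replaces one API call per deleted memory, which is where a direct-SQL approach saves its time. Rows that tie on `created_at` would all be kept by this query.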

find_all_duplicates.py

  • Purpose: Fast duplicate detection with content normalization
  • Performance: <2 seconds for 2000 memories (vs 30s via API)
  • Speedup: 15x faster than API-based detection
  • Detection: MD5 hash of normalized content (removes timestamps)
  • Output: Detailed duplicate groups with memory counts
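The detection approach used by find_all_duplicates.py (MD5 of normalized content, with timestamps removed) can be sketched as follows. The normalization rules here are hypothetical, and the real script's rules may differ. The sketch strips ISO-8601-style timestamps, collapses whitespace, and lowercases, then groups memory IDs by digest.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical timestamp pattern: YYYY-MM-DD[T ]HH:MM with optional seconds/Z.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?Z?")

def normalize(content):
    """Remove timestamps, collapse whitespace, and lowercase."""
    without_ts = TIMESTAMP.sub("", content)
    return " ".join(without_ts.split()).lower()

def find_duplicate_groups(memories):
    """Group memory IDs by MD5 of normalized content; keep groups of 2+."""
    groups = defaultdict(list)
    for memory_id, content in memories:
        digest = hashlib.md5(normalize(content).encode("utf-8")).hexdigest()
        groups[digest].append(memory_id)
    return [ids for ids in groups.values() if len(ids) > 1]

memories = [
    (1, "Deployed v8.7.0 at 2025-01-05T10:00:00Z"),
    (2, "Deployed v8.7.0 at 2025-01-06 09:30"),
    (3, "Unrelated note"),
]
print(find_duplicate_groups(memories))  # [[1, 2]]
```

Hashing the normalized text lets a single pass over all memories find duplicates. No pairwise comparisons are needed.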

Comprehensive README.md

  • Quick reference table with performance benchmarks
  • Detailed usage for each script with examples
  • Best practices for before/after maintenance
  • Troubleshooting guide for common issues
  • Performance comparison table (API vs Direct SQL)

Location: scripts/maintenance/


📦 Installation & Upgrade

New Installations

pip install mcp-memory-service==8.7.0

# Cosine migration happens automatically on first startup
# No manual intervention required!

Upgrades from v8.6.0 or Earlier

pip install --upgrade mcp-memory-service==8.7.0

# Migration runs automatically on next startup
# Watch logs for "Cosine migration complete" message

# Optional: Manual embedding regeneration if needed
python scripts/maintenance/regenerate_embeddings.py

Post-Upgrade Verification

# Check migration status
curl -s "http://127.0.0.1:8000/api/health" | jq

# Run duplicate detection (should find 0 duplicates)
python scripts/maintenance/find_all_duplicates.py

# Search should now show 70-79% similarity scores
curl -s -X POST "http://127.0.0.1:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "test search", "n_results": 5}'

📊 Impact Summary

  • Search Accuracy: 0% → 70-79% similarity scores
  • Exact Matches: 61% → 79.2% accuracy
  • Embeddings Regenerated: 2605 memories
  • Maintenance Performance: up to 1800x speedup for bulk duplicate deletion (15x for duplicate detection)
  • Migration Strategy: Automatic with zero downtime

📖 Documentation


🔗 Links

Full Changelog: v8.6.0...v8.7.0
