Important Notes: Eliminate Bottlenecks in Processing Large-scale Datasets
In production deployments, entity and relation metadata can grow without bound as documents are continuously ingested. The source_id (chunk ID) and file_path fields of entities and relations can accumulate thousands of entries, leading to:
- Performance degradation in vector database operations
- Increased storage costs
- Memory pressure during query operations
- Slower merge operations when processing new documents
LightRAG implements a configurable metadata size control system with two key features:
- Source ID limiting: Controls the maximum number of chunk IDs stored per entity/relation
- File path limiting: Controls the maximum number of file paths shown in metadata (display only; does not affect query performance)
Both features support two strategies:
- FIFO (First In, First Out): Removes the oldest entries when the limit is reached. Best for evolving knowledge bases, because it keeps the most recent information.
- KEEP: Keeps the oldest entries and skips new ones when the limit is reached. Best for stable knowledge bases, and faster because it needs fewer merge operations.
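The difference between the two strategies can be illustrated with a minimal Python sketch. This is not LightRAG's actual implementation: the helper name `apply_source_id_limit` and its signature are hypothetical, and it assumes chunk IDs are plain strings kept in insertion order.

```python
from typing import List


def apply_source_id_limit(
    existing: List[str],
    incoming: List[str],
    limit: int,
    method: str = "FIFO",
) -> List[str]:
    """Hypothetical helper: cap the chunk IDs stored for one entity/relation."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # KEEP: once the list is full, new IDs are skipped, so no merge is needed.
    if method == "KEEP" and len(existing) >= limit:
        return existing[:limit]

    # Merge while preserving insertion order and dropping duplicates.
    merged = list(dict.fromkeys(existing + incoming))
    if len(merged) <= limit:
        return merged

    if method == "FIFO":
        # Drop the oldest entries, keep the most recent `limit` IDs.
        return merged[-limit:]
    if method == "KEEP":
        # Keep the oldest entries, drop the newest overflow.
        return merged[:limit]
    raise ValueError(f"unknown limit method: {method!r}")


old_ids = ["chunk-1", "chunk-2", "chunk-3"]
new_ids = ["chunk-4", "chunk-5"]
print(apply_source_id_limit(old_ids, new_ids, limit=3, method="FIFO"))
print(apply_source_id_limit(old_ids, new_ids, limit=3, method="KEEP"))
```

With a limit of 3, the FIFO call retains the three newest IDs, while the KEEP call returns the original three without merging at all, which is where its speed advantage comes from.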
New environment variables with default values:
# Source ID limits (affects query performance)
MAX_SOURCE_IDS_PER_ENTITY=300
MAX_SOURCE_IDS_PER_RELATION=300
SOURCE_IDS_LIMIT_METHOD=FIFO
# File path limits (display only)
MAX_FILE_PATHS=100
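For reference, the settings above could be read and validated in Python roughly as follows. This is only a sketch: the `MetadataLimits` class and `load_metadata_limits` function are hypothetical names, not part of LightRAG's API. Only the variable names and default values are taken from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MetadataLimits:
    """Hypothetical container for the four settings listed above."""
    max_source_ids_per_entity: int
    max_source_ids_per_relation: int
    source_ids_limit_method: str
    max_file_paths: int


def load_metadata_limits(env: Optional[Mapping[str, str]] = None) -> MetadataLimits:
    """Read the limits from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env

    method = env.get("SOURCE_IDS_LIMIT_METHOD", "FIFO").upper()
    if method not in ("FIFO", "KEEP"):
        raise ValueError(f"SOURCE_IDS_LIMIT_METHOD must be FIFO or KEEP, got {method!r}")

    return MetadataLimits(
        max_source_ids_per_entity=int(env.get("MAX_SOURCE_IDS_PER_ENTITY", "300")),
        max_source_ids_per_relation=int(env.get("MAX_SOURCE_IDS_PER_RELATION", "300")),
        source_ids_limit_method=method,
        max_file_paths=int(env.get("MAX_FILE_PATHS", "100")),
    )


# An explicit mapping is passed here so the example does not depend on the real environment.
print(load_metadata_limits({"SOURCE_IDS_LIMIT_METHOD": "keep", "MAX_FILE_PATHS": "50"}))
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test.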
What's New
- Feat: Add offline Docker build support with embedded models and cache by @danielaskdd in #2222
- Refact: Limit Vector Database Metadata Size to Support Large Scale Dataset by @danielaskdd in #2240
- Feat: Add Optional LLM Cache Deletion for Document Deletion by @danielaskdd in #2244
- Refact: Add Entity Identifier Length Truncation to Prevent Storage Failures by @danielaskdd in #2245
- Refact: Add Multimodal Processing Status Support to DocProcessingStatus for RAGAnything Compatibility by @danielaskdd in #2248
What's Changed
- Refact: Improve query result with semantic null returns by @danielaskdd in #2218
- remove deprecated dotenv package. by @wkpark in #2229
- Refact: Frontend UI Fixes and Performance Improvements by @danielaskdd in #2234
- Security: Fix SQL injection vulnerabilities in PostgreSQL storage by @lucky-verma in #2235
- Update openai requirement from <2.0.0,>=1.0.0 to >=1.0.0,<3.0.0 by @dependabot[bot] in #2238
- Update pandas requirement from <2.3.0,>=2.0.0 to >=2.0.0,<2.4.0 by @dependabot[bot] in #2239
- Optimize PostgreSQL initialization performance by @yrangana in #2237
- fix(docs): correct typo "acivate" → "activate" by @xiaojunxiang2023 in #2243
New Contributors
- @wkpark made their first contribution in #2229
- @lucky-verma made their first contribution in #2235
- @dependabot[bot] made their first contribution in #2238
- @xiaojunxiang2023 made their first contribution in #2243
Full Changelog: v1.4.9.3...v1.4.9.4