Changed
- BREAKING: Model identification consolidated: `model_name` and `model_id` merged into a single `model` term throughout the codebase
- BREAKING: Embedding metadata structure flattened:
  - Old: `metadata.model.model_id`, `metadata.embedding.created_at`
  - New: `metadata.model`, `metadata.embedded_at`
  - Existing embedding files are automatically detected as stale and regenerated
- Pattern handling standardized across all commands (`chunk`, `embed`, `load`, `audit`):
  - Commands now accept directory patterns without requiring wildcards
  - `/source.json` or `/chunks.json` is automatically appended to patterns
  - Examples: `--pattern "docs/subdir"` (exact path) or `--pattern "external/**"` (glob)
  - Old strict wildcard validation removed
- `db-status` now displays document paths instead of filenames in the "Recent document activity" section, for easier identification in subdirectory structures
- Audit command now shows full relative paths from the content directory instead of just directory names, for better error reporting
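The directory-pattern handling described above can be sketched as follows. This is a minimal illustration, not the project's code: `normalize_pattern()` and `find_files()` are hypothetical names, and which command looks for `source.json` versus `chunks.json` is left as a parameter rather than assumed.

```python
from pathlib import Path


def normalize_pattern(pattern: str, target_file: str) -> str:
    """Append the per-document file name to a directory pattern.

    Hypothetical helper: "docs/subdir" -> "docs/subdir/source.json",
    "external/**" -> "external/**/source.json". Patterns that already
    end with the target file are returned unchanged.
    """
    pattern = pattern.rstrip("/")
    if pattern == target_file or pattern.endswith("/" + target_file):
        return pattern
    return f"{pattern}/{target_file}"


def find_files(content_dir: Path, pattern: str, target_file: str) -> list[Path]:
    """Resolve a normalized pattern under content_dir, sorted for a stable order.

    Path.glob() handles both exact paths (no wildcard) and "**" globs,
    so no separate wildcard validation is needed.
    """
    return sorted(content_dir.glob(normalize_pattern(pattern, target_file)))
```

With this approach an exact directory and a recursive glob go through the same code path; `"**"` in `Path.glob()` matches zero or more directories, so `external/**` also picks up `external/source.json` itself.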
Added
- Environment variable support for debug logging: `DEBUG=1` or `LOG_LEVEL=DEBUG`
- Improved logging: "Creating LLM session" and batch processing messages moved to DEBUG level
- README improvements:
  - New "Overview" section explaining processing time, stages, idempotency, and storage
  - Updated "Requirements" section clarifying Ollama's role (optional, for contextual chunking)
  - Library usage examples for the `ingest_file()` and `ingest_from_content()` functions
  - "Serving Your Database" section with docs2db-api integration examples
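The debug-logging switch listed above might be wired up roughly like this. The function name `resolve_log_level()`, the precedence (`DEBUG=1` wins over `LOG_LEVEL`), and the INFO fallback are assumptions for illustration only.

```python
from __future__ import annotations

import logging
import os
from collections.abc import Mapping


def resolve_log_level(environ: Mapping[str, str] | None = None) -> int:
    """Map the DEBUG / LOG_LEVEL environment variables to a logging level.

    Assumed behavior: DEBUG=1 forces DEBUG; otherwise LOG_LEVEL is looked up
    by name (case-insensitive); unset or unrecognized values fall back to INFO.
    """
    env = os.environ if environ is None else environ
    if env.get("DEBUG") == "1":
        return logging.DEBUG
    # getLevelName() returns an int for known names, a string otherwise.
    level = logging.getLevelName(env.get("LOG_LEVEL", "").upper())
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    log = logging.getLogger(__name__)
    # Only emitted when DEBUG=1 or LOG_LEVEL=DEBUG is set.
    log.debug("Creating LLM session")
```

Running it as `DEBUG=1 python script.py` (or `LOG_LEVEL=DEBUG python script.py`) would show the DEBUG-level message; without either variable it is suppressed.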
Fixed
- `BatchProcessor` now accepts sorted lists instead of iterators, ensuring a consistent file processing order across restarts
- Removed misleading "LLM session created successfully" log message that appeared before actual connection attempts
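The reasoning behind the `BatchProcessor` fix can be illustrated with a small sketch. `plan_batches()` is a hypothetical stand-in, not `BatchProcessor`'s actual interface; it only shows why a pre-sorted list gives a reproducible order where a lazy `Path.glob()` iterator does not.

```python
from pathlib import Path


def plan_batches(files: list[Path], batch_size: int) -> list[list[Path]]:
    """Split an already-sorted list of files into fixed-size batches.

    Hypothetical sketch: taking a list (rather than a lazy iterator such as
    Path.glob()) lets the caller sort once, so a restarted run sees the same
    files in the same batches, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if files != sorted(files):
        raise ValueError("files must be sorted for a reproducible order")
    return [files[i : i + batch_size] for i in range(0, len(files), batch_size)]


# Path.glob() yields matches in filesystem order, which is not guaranteed
# to be stable, so sort before handing the files over, e.g.:
# files = sorted(Path("content").glob("**/chunks.json"))
```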
Internal
- `Embedding` and `EmbeddingProvider` classes simplified with explicit parameters instead of config dictionaries
- `is_embedding_stale()` signature updated to match the new metadata structure
- All worker functions updated to use the new `model` parameter naming
- Test fixtures updated to reflect the new metadata format
- CONTRIBUTING.md updated with the correct `uv sync --extra watsonx` command
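A staleness check against the flattened metadata layout could look roughly like the sketch below. It is not the real `is_embedding_stale()`; the name `needs_reembedding()`, its parameters, the ISO 8601 format of `embedded_at`, and the comparison against the chunks file's modification time are all assumptions. It does show how files in the old nested format end up classified as stale.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def needs_reembedding(metadata: dict[str, Any], model: str, chunks_modified: datetime) -> bool:
    """Decide whether an embedding file should be regenerated (hypothetical).

    New layout: {"model": "<name>", "embedded_at": "<ISO 8601>"}.
    Old layout: metadata["model"]["model_id"], metadata["embedding"]["created_at"].
    Old files have a dict under "model" and no "embedded_at", so they are stale.
    """
    stored_model = metadata.get("model")
    embedded_at = metadata.get("embedded_at")
    if not isinstance(stored_model, str) or not isinstance(embedded_at, str):
        return True  # old nested format or missing fields
    if stored_model != model:
        return True  # embedded with a different model
    # Assumption: regenerate if the chunks changed after they were embedded.
    # Both timestamps must be either naive or timezone-aware to compare.
    return datetime.fromisoformat(embedded_at) < chunks_modified
```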