Added
- Document ingestion using Docling (`ingest` command) for PDF, DOCX, PPTX, and more
- Contextual chunking with LLM support (Ollama via OpenAI-compatible API, OpenAI, WatsonX)
- BM25 full-text search with PostgreSQL tsvector and GIN indexing for hybrid search
- Database lifecycle commands: `db-start`, `db-stop`, `db-destroy`, `db-logs`
- `db-restore` command for loading SQL dumps
- `pipeline` command for end-to-end workflow (ingest → chunk → embed → load → dump)
- Multi-tier PostgreSQL configuration precedence (CLI > Env Vars > DATABASE_URL > Compose > Defaults)
- Metadata arguments for `pipeline` and `load` commands (`--username`, `--title`, `--description`, `--note`)
- Metadata tracking for ingested documents and chunking operations
- `--skip-context` flag to bypass LLM contextual chunking
- `--context-model` and `--openai-url`/`--watsonx-url` flags for LLM provider configuration
- Persistent LLM sessions with KV cache reuse for improved performance
- Memory-efficient in-memory document ingestion
- Comprehensive database configuration tests
- Pre-commit hooks for code quality enforcement (ruff, pyright, gitleaks)
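
The configuration precedence listed above (CLI > Env Vars > DATABASE_URL > Compose > Defaults) can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name `resolve_db_config`, the per-field environment variable names, and every default except the `postgres` password are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical defaults; only the "postgres" password is taken from this changelog.
DEFAULTS = {"host": "localhost", "port": "5432", "user": "postgres", "password": "postgres"}

# Hypothetical per-field environment variable names.
ENV_NAMES = {
    "host": "POSTGRES_HOST",
    "port": "POSTGRES_PORT",
    "user": "POSTGRES_USER",
    "password": "POSTGRES_PASSWORD",
}


def resolve_db_config(cli=None, env=None, compose=None):
    """Merge sources lowest priority first so that higher tiers overwrite lower ones."""
    cli = cli or {}
    env = os.environ if env is None else env
    compose = compose or {}

    # Tier 5 (lowest): built-in defaults.
    config = dict(DEFAULTS)

    # Tier 4: values read from the compose file (passed in as a plain dict here).
    config.update({k: v for k, v in compose.items() if v is not None})

    # Tier 3: DATABASE_URL, split into its components.
    url = env.get("DATABASE_URL")
    if url:
        parsed = urlparse(url)
        from_url = {
            "host": parsed.hostname,
            "port": str(parsed.port) if parsed.port is not None else None,
            "user": parsed.username,
            "password": parsed.password,
        }
        config.update({k: v for k, v in from_url.items() if v is not None})

    # Tier 2: individual environment variables.
    for key, name in ENV_NAMES.items():
        if env.get(name):
            config[key] = env[name]

    # Tier 1 (highest): explicit CLI arguments; None means "not provided".
    config.update({k: v for k, v in cli.items() if v is not None})
    return config
```

Merging from lowest to highest priority keeps each tier independent: a source only overrides the fields it actually sets, so a `DATABASE_URL` without a port still inherits the port from the compose file or the defaults.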
Changed
- Default content directory changed from `content/` to `docs2db_content/`
- Commands now use settings defaults: `load`, `audit`, and `pipeline` fall back to `settings.content_base_dir` and `settings.embedding_model`
- Simplified database lifecycle: removed profile parameter (always uses "prod")
- Improved error messages: database connection errors now suggest `docs2db db-start` instead of `make db-up`
- Reduced logging verbosity: suppressed verbose Docling library output, moved per-file conversion messages to DEBUG
- Updated `.gitignore` to exclude generated artifacts (`docs2db_content/`, `ragdb_dump.sql`)
- Improved CLI argument handling with explicit `None` checks and user-friendly error messages
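
As a rough illustration of the settings fallback and explicit `None` checks mentioned above, the sketch below shows one way such handling might look. The `Settings` class, the `resolve_load_args` helper, and the placeholder model name are hypothetical; only the attribute names `content_base_dir` and `embedding_model` and the `docs2db_content` directory come from this changelog.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Settings:
    # Directory name from this changelog; the model name is a placeholder.
    content_base_dir: str = "docs2db_content"
    embedding_model: str = "example-embedding-model"


settings = Settings()


def resolve_load_args(
    content_dir: Optional[str] = None,
    model: Optional[str] = None,
) -> Tuple[str, str]:
    """Fall back to settings defaults only when an argument was not provided."""
    # `is None` (rather than `or`) keeps an explicitly passed empty string
    # distinguishable from an omitted argument, so it can be reported clearly.
    if content_dir is None:
        content_dir = settings.content_base_dir
    if model is None:
        model = settings.embedding_model

    if not content_dir.strip():
        raise ValueError("Content directory must not be empty.")
    if not model.strip():
        raise ValueError("Embedding model name must not be empty.")
    return content_dir, model
```

Checking `is None` explicitly means an omitted argument silently picks up the default, while a value that was supplied but is invalid produces a readable error instead of a `TypeError` further down the call chain.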
Fixed
- Typer required argument handling now provides clear error messages instead of TypeErrors
- Removed duplicate error logging in database operations
- Updated compose file password to match default settings (`postgres`)
- Corrected `ingest` command docstring to show `docs2db_content/` directory