Initial release of docs2db, a tool for building RAG databases from documents using Docling, contextual chunking, and pgvector.
This release provides the core functionality for converting documents into a PostgreSQL database optimized for retrieval-augmented generation (RAG) systems. Once you've created a database with docs2db, use docs2db-api to query it with hybrid search, reranking, and other opinionated optimizations.
Added
- Initial implementation of docs2db
- Document ingestion using Docling for PDF, DOCX, PPTX, and more
- Contextual chunking with LLM support (Ollama, OpenAI, WatsonX)
- Embedding generation with support for multiple embedding models
- PostgreSQL database with pgvector for vector similarity search
- CLI interface with commands: ingest, chunk, embed, load, audit, db-status, db-dump, db-restore
- Comprehensive test suite
- Development tooling: Makefile, pre-commit hooks, Docker Compose setup