github rhel-lightspeed/docs2db v0.1.0
v0.1.0 - Initial Implementation

5 months ago

Initial release of docs2db - a tool for building RAG databases from documents using Docling, contextual chunking, and pgvector.

This release provides the core functionality for converting documents into a PostgreSQL database optimized for retrieval-augmented generation (RAG) systems. Once you've created a database with docs2db, use docs2db-api to perform RAG searches with hybrid search, reranking, and other opinionated optimizations.

Added

  • Initial implementation of docs2db
  • Document ingestion using Docling for PDF, DOCX, PPTX, and more
  • Contextual chunking with LLM support (Ollama, OpenAI, WatsonX)
  • Embedding generation with multiple model support
  • PostgreSQL database with pgvector for vector similarity search
  • CLI with commands: ingest, chunk, embed, load, audit, db-status, db-dump, db-restore
  • Comprehensive test suite
  • Development tooling: Makefile, pre-commit hooks, Docker Compose setup
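
As a rough illustration of what "vector similarity search" means in the pgvector item above, here is a minimal pure-Python sketch that ranks stored chunks by cosine distance to a query embedding. Cosine distance is the metric behind pgvector's `<=>` operator. The chunk layout, the toy three-dimensional embeddings, and the `top_k` helper are hypothetical and chosen only for illustration; they are not taken from the docs2db schema or code.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the texts of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(query, c["embedding"]))
    return [c["text"] for c in ranked[:k]]


# Hypothetical toy data: real embeddings have hundreds of dimensions.
chunks = [
    {"text": "Installing PostgreSQL", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Configuring pgvector", "embedding": [0.7, 0.7, 0.1]},
    {"text": "Release checklist", "embedding": [0.0, 0.2, 0.9]},
]
query = [1.0, 0.2, 0.0]

results = top_k(query, chunks, k=2)
print(results)
```

In a real deployment the database performs this ranking itself, typically with an index, so the sketch only conveys the idea of nearest-neighbor retrieval over embeddings.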
