1.0.5 (2026-06-24)
Highlights:
- No more headless_chrome or chromium dependency. We now use servo-fetch and pdfium-render for the previous use cases. The overall install is lighter, although the binaries are a bit bigger because the new dependencies are bundled into them. It should use fewer system resources as well.
- Another big win is that we now rebuild the indexes once a day instead of on every ingestion. Depending on the size of the db, this saves a significant amount of time.
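
As a rough illustration of the scheduled rebuild, here is a minimal sketch of how an interval setting such as `index_rebuild_interval_secs` could be interpreted, with `0` disabling the schedule and `86400` giving the once-a-day default. The helper names `rebuild_interval` and `rebuild_due` are hypothetical and are not taken from the codebase.

```rust
use std::time::Duration;

/// Hypothetical helper: turn the configured seconds into an optional period.
/// A value of 0 means the scheduled rebuild is disabled.
fn rebuild_interval(index_rebuild_interval_secs: u64) -> Option<Duration> {
    match index_rebuild_interval_secs {
        0 => None,
        secs => Some(Duration::from_secs(secs)),
    }
}

/// Hypothetical check a worker loop could run on each tick:
/// a rebuild is due once at least one full period has elapsed.
fn rebuild_due(elapsed_since_last: Duration, interval: Option<Duration>) -> bool {
    interval.map_or(false, |period| elapsed_since_last >= period)
}
```

With these assumptions, `rebuild_due` never fires when the interval is `0`, which matches the "disables" behaviour described in the changes list.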
All changes:
- Infra: CI workflow fixes. CI is now a nix flake check that covers compilation, caching, running tests, clippy, fmt, and validation of the ort version.
- Docker-compose: The example now references the ghcr image, so we can remove the Dockerfile and reduce maintenance scope.
- Refactor: web scraping now uses `servo-fetch` (pure-Rust Servo engine) and PDF rendering uses `pdfium-render` (direct PDFium bindings). This reduces Docker image size by ~300MB, significantly improves startup time for PDF rendering, and provides more stable output
- Fix: added `pkgs.libglvnd` to `LD_LIBRARY_PATH` in devenv so the Servo engine can find `libEGL.so` at runtime
- Fix: updated Dockerfile to add `libegl1 libegl-mesa0 libgles2 libfontconfig1 libfreetype6` runtime dependencies for servo-fetch
- Docs: updated architecture, features, and installation docs to reflect the new web processing stack
- Fix: added pre-commit hooks to further maintain code consistency.
- Security: updated some deps because dependabot told me, good bot.
- Refactor: deduplicated test database setup across common/src/storage/.
- Refactor: split knowledge-graph.js monolith into focused functions.
- Evaluations: simplified crate layout: linear pipeline, sharded-only converted store, in-memory ingestion, `db/` and `cli/` modules; namespace reuse state in corpus manifest (removed `cache/snapshots/`); no legacy JSON/history compatibility (re-run `--warm` after upgrade)
- Performance: ingestion skips per-task index rebuild; worker runs scheduled `REBUILD INDEX` (default every 24h via `index_rebuild_interval_secs`, `0` disables)
- Performance: ingestion persists all artifacts in a single SurrealDB transaction per task (atomic replace by task id)
- Performance: entity embeddings during ingestion use batched `embed_batch`, matching chunk embedding
- Fix: ingestion reclaims tasks after a successful persist without re-running the pipeline when `mark_succeeded` failed
- Fix: content deletion clears graph relationships via shared `TextContent::clear_ingested_children`
- Fix: regression in the suggestion of relationships
- Internal: extracted duplicate entity+embedding patterns into `HasEmbedding` and `EmbeddingRecord` traits with generic `store_with_embedding`, `delete_by_source_id`, and `vector_search` on `SurrealDbClient`.
- Infra: `ort-version` file removed; version inlined in `flake.nix` and `devenv.nix`, and `release.yml` reads it via `nix eval .#lib.ortVersion` from the plan job
- Infra: `screenshot-graph.webp` and `.dockerignore` deleted
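
To give a feel for the entity+embedding deduplication mentioned under "Internal", here is a minimal in-memory sketch of the general pattern. It is not the real code: the trait methods, the `InMemoryStore` type, the `Entity` struct, and all signatures are assumptions made for illustration. The actual `SurrealDbClient` methods are database-backed and may look quite different, and `EmbeddingRecord` is left out entirely.

```rust
use std::collections::HashMap;

/// Assumed trait shape: anything stored alongside an embedding exposes
/// its own id and the id of the source content it came from.
trait HasEmbedding {
    fn id(&self) -> &str;
    fn source_id(&self) -> &str;
}

/// In-memory stand-in for the database client (hypothetical).
struct InMemoryStore<T> {
    rows: HashMap<String, (T, Vec<f32>)>,
}

impl<T: HasEmbedding> InMemoryStore<T> {
    fn new() -> Self {
        Self { rows: HashMap::new() }
    }

    /// Store an item together with its embedding, replacing any row with the same id.
    fn store_with_embedding(&mut self, item: T, embedding: Vec<f32>) {
        self.rows.insert(item.id().to_string(), (item, embedding));
    }

    /// Remove every row that came from the given source; returns how many were removed.
    fn delete_by_source_id(&mut self, source_id: &str) -> usize {
        let before = self.rows.len();
        self.rows.retain(|_, (item, _)| item.source_id() != source_id);
        before - self.rows.len()
    }

    /// Brute-force top-k search by dot product (a real index would replace this).
    fn vector_search(&self, query: &[f32], k: usize) -> Vec<&T> {
        let mut scored: Vec<(f32, &T)> = self
            .rows
            .values()
            .map(|(item, emb)| (dot(query, emb), item))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.into_iter().take(k).map(|(_, item)| item).collect()
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Example entity type (hypothetical).
struct Entity {
    id: String,
    source_id: String,
}

impl HasEmbedding for Entity {
    fn id(&self) -> &str {
        &self.id
    }
    fn source_id(&self) -> &str {
        &self.source_id
    }
}
```

The point of the pattern is that the three operations are written once against the trait, so each new embedded type only has to implement `HasEmbedding`.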