New Features

Semantic Search

Full vector search support via sentence-embedding models. Documents are chunked
and embedded at index time; search queries are embedded at query time and ranked
by cosine similarity. Two storage backends are supported:

SQLite (default, via bundled sqlite-vec): zero extra infrastructure required
PostgreSQL with pgvector: auto-selected when the database is Postgres

Configure the embedding API endpoint, model, dimensions, and chunking parameters
in the new semantic config section. Semantic search is opt-in and off by default.
Relevance scores are shown alongside results when semantic search is active.
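
As a rough illustration of the chunk, embed, and rank flow described above, the
following Python sketch ranks text chunks by cosine similarity to a query. The
embed function is a toy bag-of-words stand-in rather than a real
sentence-embedding model, and every name here (VOCAB, chunk_text, embed, rank)
is hypothetical; these notes do not show Hister's actual implementation.

```python
import math
from collections import Counter

# Toy vocabulary standing in for a real embedding model's vector space.
VOCAB = ["vector", "search", "sqlite", "postgres", "crawler", "cookie"]


def chunk_text(text, size=8):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, chunks):
    """Return (score, chunk) pairs sorted from most to least similar."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a real deployment the vectors would come from the configured embedding API
at index time and at query time, and the score in each pair corresponds to the
relevance score displayed next to a result.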

OAuth / SSO Authentication

OAuth 2.0 and OpenID Connect (OIDC) providers can now be configured as login
methods. Add one or more entries to the new server.oauth config section. Each
entry takes client_id, client_secret, either configuration_url (for OIDC
auto-discovery) or manual auth_url / token_url, and optional scopes. Multiple
providers can be active at the same time alongside the built-in
username/password login.
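
To make the difference between auto-discovery and manual endpoints concrete,
here is a minimal Python sketch of how a provider entry could be resolved to
its two endpoints. The key names client_id, configuration_url, auth_url, and
token_url come from the notes above; the resolve_endpoints helper and the
example URLs are hypothetical. The authorization_endpoint and token_endpoint
fields are the standard names in an OIDC discovery document. The sketch takes
the already-fetched discovery document as a string so that it runs without
network access.

```python
import json


def resolve_endpoints(provider, discovery_document=None):
    """Return (auth_url, token_url) for one provider entry.

    If the entry sets configuration_url, the endpoints are read from the
    OIDC discovery document fetched from that URL (passed in as JSON text).
    Otherwise the manually configured auth_url and token_url are used.
    """
    if provider.get("configuration_url"):
        if discovery_document is None:
            raise ValueError("discovery document must be fetched first")
        metadata = json.loads(discovery_document)
        return metadata["authorization_endpoint"], metadata["token_endpoint"]
    try:
        return provider["auth_url"], provider["token_url"]
    except KeyError as exc:
        raise ValueError(
            f"missing {exc.args[0]} and no configuration_url set"
        ) from exc
```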

MCP Server

Hister now exposes a Model Context Protocol (MCP) endpoint at /api/mcp,
enabling LLM agents and MCP-compatible tools to search the index directly.
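
MCP messages are JSON-RPC 2.0 objects, so a client talking to the endpoint
builds request bodies along the following lines. This Python sketch only
constructs the JSON; it sends nothing. The jsonrpc_request helper is
hypothetical, tools/list is the protocol's standard method for enumerating
tools, and which tools Hister exposes is not stated in these notes. A real
client would also perform the protocol's initialize handshake first.

```python
import json
from itertools import count

# Monotonic request IDs, so responses can be matched to requests.
_request_ids = count(1)


def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request object for the given method."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Request body asking the server which tools it offers.
list_tools_body = jsonrpc_request("tools/list")
```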

Persistent Crawler State Management

Recursive crawl jobs (hister index -r) are now stored in the database and
survive interruptions. Each job gets a unique ID (auto-generated or set via
--job-id). Pass --job-id <id> without --recursive to resume an interrupted
crawl from exactly where it left off; the original validator rules and
visited-URL counts are restored with it.
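
The idea of persisting a job under an ID and reloading it later can be
sketched in a few lines of Python with an in-memory SQLite database. The
crawl_jobs table, the state layout, and the helper names are all assumptions
made for illustration; Hister's real schema is not described in these notes.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) a store holding one JSON state blob per job ID."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS crawl_jobs ("
        "job_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return db


def save_job(db, state, job_id=None):
    """Persist the job state; generate an ID when none is supplied."""
    job_id = job_id or uuid.uuid4().hex
    db.execute(
        "INSERT OR REPLACE INTO crawl_jobs (job_id, state) VALUES (?, ?)",
        (job_id, json.dumps(state)),
    )
    db.commit()
    return job_id


def load_job(db, job_id):
    """Load a previously saved job so the crawl can resume from it."""
    row = db.execute(
        "SELECT state FROM crawl_jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        raise KeyError(job_id)
    return json.loads(row[0])
```

Saving after every fetched page keeps the pending queue, the visited set, and
the validator rules together, which is what lets a resumed run continue
without re-specifying the original options.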

New Extractors

Wikipedia: extracts the article body and infobox, rewrites relative links, and sanitizes the output
GitHub project: extracts repository descriptions and README content from GitHub project pages
Lobste.rs: dedicated extractor for Lobste.rs story and comment pages
yt-dlp: extracts video metadata (title, description, channel) from video pages via yt-dlp
JSON-LD: surfaces structured metadata (@type, headline, description) from pages that embed JSON-LD

All extractors now expose a Description() method, and an extractor information
page is available at /extractors in the web UI.
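
For readers unfamiliar with JSON-LD, the following Python sketch shows the
general technique behind such an extractor: collect the contents of script
elements typed application/ld+json and keep the @type, headline, and
description fields. The class and function names are hypothetical, and this is
a standard-library illustration, not Hister's extractor code.

```python
import json
from html.parser import HTMLParser


class JSONLDCollector(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_metadata(html):
    """Return the @type, headline, and description of each JSON-LD object."""
    parser = JSONLDCollector()
    parser.feed(html)
    fields = ("@type", "headline", "description")
    return [
        {key: item[key] for key in fields if key in item}
        for item in parser.items
        if isinstance(item, dict)
    ]
```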

OpenSearch Suggestions

The server now serves an OpenSearch suggestions endpoint (/api/suggest),
allowing browsers to display search-as-you-type completions when Hister is
configured as a search engine.
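
The OpenSearch suggestions format that browsers expect is a JSON array whose
first element is the query and whose second is the list of completions. The
Python sketch below builds such a body from a list of candidate strings using
simple prefix matching. Both the suggest helper and the matching strategy are
assumptions for illustration; how Hister selects completions is not described
in these notes.

```python
import json


def suggest(query, candidates, limit=5):
    """Build an OpenSearch suggestions body: [query, [completion, ...]]."""
    prefix = query.lower()
    completions = [c for c in candidates if c.lower().startswith(prefix)]
    return json.dumps([query, completions[:limit]])
```

Browsers typically expect this body to be served with the
application/x-suggestions+json content type.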

Enhancements

Crawler Backend for All Index Operations

The --backend flag (and --backend-option) is now available on both
hister index (plain and --recursive) and hister import-browser, so a headless
Chrome/Chromium backend can render JavaScript-heavy pages without running a
full recursive crawl:

hister index --backend chromedp https://example.com
hister import-browser --backend chromedp --backend-option exec_path=/usr/bin/chromium

Headers and cookies can also be injected per-invocation:

hister index --header "Accept-Language=en" --cookie "session=abc; Domain=example.com" https://example.com

Cookies use the standard Set-Cookie format and must include a Domain attribute.
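
To show what a Set-Cookie style value with a Domain attribute breaks down
into, here is a small Python sketch using the standard library's cookie
parser. The parse_cookie helper and its validation rules are hypothetical and
only mirror the requirement stated above; they are not Hister's parsing code.

```python
from http.cookies import SimpleCookie


def parse_cookie(raw):
    """Parse one Set-Cookie style string into (name, value, domain)."""
    jar = SimpleCookie()
    jar.load(raw)
    if len(jar) != 1:
        raise ValueError("expected exactly one cookie")
    name, morsel = next(iter(jar.items()))
    if not morsel["domain"]:
        raise ValueError("cookie is missing the required Domain attribute")
    return name, morsel.value, morsel["domain"]
```

Applied to the value from the command above, "session=abc; Domain=example.com",
this yields the name session, the value abc, and the domain example.com.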

CLI Search Improvements

--limit N flag caps the number of results returned
--fields flag selects which document fields to include in output
--html flag includes raw HTML content in the output
Paging support added to both CLI search and list-urls
list-urls now fetches results from the server by default; --offline connects directly to the index without a running server

Quoted Field Queries

Field-qualified queries now support quoted values, enabling correct deletion and
lookup of URLs that contain spaces (common in Windows file paths):

url:"file:///C:/Users/My Documents/notes.txt"
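
The parsing problem that quoting solves can be sketched with a small Python
tokenizer: without quotes, the space in "My Documents" would end the url:
value early. The regular expression and the parse_query helper are
illustrative assumptions, not Hister's query grammar.

```python
import re

TOKEN = re.compile(
    r'''
    (?:(?P<field>[A-Za-z_]+):)?   # optional field prefix such as url:
    (?:"(?P<quoted>[^"]*)"        # quoted value, may contain spaces
      |(?P<bare>\S+))             # or a bare value up to the next space
    ''',
    re.VERBOSE,
)


def parse_query(query):
    """Split a query into (field, value) pairs; field is None if absent."""
    terms = []
    for match in TOKEN.finditer(query):
        value = match.group("quoted")
        if value is None:
            value = match.group("bare")
        terms.append((match.group("field"), value))
    return terms
```

This simplified grammar treats any leading word followed by a colon as a field
name, so an unquoted bare URL would be misread; a real parser would check the
prefix against the set of known fields.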

Preview Panel Polish

Preview title is now clickable (opens the result URL)
Preview panel maximises available content width
JSON-LD metadata surfaced inside the preview panel
Dark theme font colors fixed in preview popup

NixOS / Nix Module

systemd and launchd hardening applied to the Hister service units
New services.hister.environmentFile option for secrets injection
openFirewall now requires explicit opt-in
services.hister.config renamed to services.hister.settings

Other

Executable size reduced by ~70 MB by switching to a trimmed lingua-go fork
Sensitive content rejection errors surfaced in the browser extension
--verbose flag on hister delete lists matched URLs before deleting
Priority result deduplication now copies body text from the original result
/api/suggest endpoint protected by auth middleware and Sec-Fetch-Site header check
Version information included in the MCP endpoint response
Timezone data bundled into the binary for environments without system tzdata
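
For context on the Sec-Fetch-Site check on the suggestions endpoint mentioned
in the list above, the Python sketch below shows one way such a check can
work. Browsers set Sec-Fetch-Site to same-origin, same-site, cross-site, or
none (for user-initiated requests such as address-bar input). The policy
encoded here, including which values are allowed and how a missing header is
treated, is purely an assumption; the notes do not say which values Hister
accepts.

```python
# Assumed policy: accept same-origin and user-initiated requests only.
ALLOWED_SITES = {"same-origin", "none"}


def passes_fetch_site_check(headers):
    """Return True if the request's Sec-Fetch-Site value is acceptable."""
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    site = normalized.get("sec-fetch-site")
    if site is None:
        # Assumption: clients that send no Fetch Metadata headers are left
        # to the auth middleware rather than rejected here.
        return True
    return site in ALLOWED_SITES
```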

Bug Fixes

File URLs (file://) now handled correctly in the UI for both opening and deletion (#362)
Browser extension authentication documentation corrected (#366)
URLs no longer lowercased during query building, preventing mismatches on case-sensitive paths
History view correctly filtered per-user in multi-user mode (#314)
Token authentication middleware now respects NoAuth flag (#348)
Documents with no HTML content no longer attempt HTML extraction (#351)
Extension no longer resubmits documents after a 406 Not Acceptable response
Priority results correctly deduplicated against standard results
File indexing fixed on Windows
Wide tables no longer overflow the preview panel
Score field populated correctly in search responses
aws_access_key sensitive content pattern tightened to reduce false positives
Home-manager service units correctly gated on host platform in Nix module

asciimoo/hister v0.13.0 on GitHub