New Features
Semantic Search
Full vector search support via sentence-embedding models. Documents are chunked
and embedded at index time; search queries are embedded at query time and ranked
by cosine similarity. Two storage backends are supported:
- SQLite (default, via bundled sqlite-vec): zero extra infrastructure required
- PostgreSQL with pgvector: auto-selected when the database is Postgres
Configure the embedding API endpoint, model, dimensions, and chunking parameters
in the new semantic config section. Semantic search is opt-in and off by default.
Relevance scores are shown alongside results when semantic search is active.
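As a rough illustration of the ranking step described above, the sketch below scores stored chunk embeddings against a query embedding by cosine similarity. It is not Hister's implementation: the function names, chunk IDs, and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks):
    """Return (score, chunk_id) pairs sorted from most to least similar.

    chunks is a list of (chunk_id, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), cid) for cid, emb in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical toy embeddings; real models use hundreds of dimensions.
chunks = [
    ("doc1#0", [1.0, 0.0, 0.0]),
    ("doc2#0", [0.0, 1.0, 0.0]),
    ("doc3#0", [1.0, 1.0, 0.0]),
]
ranking = rank_chunks([1.0, 0.0, 0.0], chunks)
```

The score in each pair plays the role of the relevance score shown next to a result.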
OAuth / SSO Authentication
OAuth 2.0 and OpenID Connect (OIDC) providers can now be configured as login
methods. Add one or more entries to the new server.oauth config section with
client_id, client_secret, either configuration_url (for OIDC auto-discovery) or
manual auth_url / token_url, and optional scopes. Multiple providers can be
active at the same time alongside the built-in username/password login.
MCP Server
Hister now exposes a Model Context Protocol
endpoint at /api/mcp, enabling LLM agents and MCP-compatible tools to search
the index directly.
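MCP messages are JSON-RPC 2.0, so a client talking to /api/mcp would send bodies shaped like the one built below. This is only a sketch of the message format: the helper name is hypothetical, the tools Hister exposes are not listed in these notes, and a real client would normally complete the MCP initialize handshake before calling other methods.

```python
import json

def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# "tools/list" is the standard MCP method for discovering a server's tools.
body = jsonrpc_request("tools/list")
```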
Persistent Crawler State Management
Recursive crawl jobs (hister index -r) are now stored in the database and
survive interruptions. Each job gets a unique ID (auto-generated or set via
--job-id). Pass --job-id <id> without --recursive to resume an
interrupted crawl from exactly where it left off, including original validator
rules and visited-URL counts.
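To show the general idea of a resumable crawl, the sketch below persists pending and visited URLs under a job ID and reads them back. The table name, columns, and helper functions are hypothetical and do not reflect Hister's actual schema, which also stores validator rules; an in-memory database is used here only to keep the example self-contained.

```python
import sqlite3

def save_job(conn, job_id, pending, visited):
    """Persist the crawl frontier so the job can be resumed later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_urls "
        "(job_id TEXT, url TEXT, visited INTEGER, PRIMARY KEY (job_id, url))"
    )
    rows = [(job_id, u, 0) for u in pending] + [(job_id, u, 1) for u in visited]
    conn.executemany("INSERT OR REPLACE INTO crawl_urls VALUES (?, ?, ?)", rows)
    conn.commit()

def load_job(conn, job_id):
    """Return (pending, visited) URL lists for a stored job."""
    cur = conn.execute(
        "SELECT url, visited FROM crawl_urls WHERE job_id = ? ORDER BY url",
        (job_id,),
    )
    pending, visited = [], []
    for url, seen in cur:
        (visited if seen else pending).append(url)
    return pending, visited

# A file path instead of ":memory:" would make the state survive restarts.
conn = sqlite3.connect(":memory:")
save_job(conn, "job-1",
         pending=["https://example.com/b"],
         visited=["https://example.com/a"])
pending, visited = load_job(conn, "job-1")
```

A resumed crawl would continue from `pending` and skip anything already in `visited`.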
New Extractors
- Wikipedia: extracts the article body and infobox, rewrites relative links, and sanitizes the output
- GitHub project: extracts repository descriptions and README content from GitHub project pages
- Lobste.rs: dedicated extractor for Lobste.rs story and comment pages
- yt-dlp: extracts video metadata (title, description, channel) from video pages via yt-dlp
- JSON-LD: surfaces structured metadata (@type, headline, description) from pages that embed JSON-LD
All extractors now expose a Description() method, and an extractor information
page is available at /extractors in the web UI.
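For readers unfamiliar with JSON-LD, the following sketch shows one way the fields named above could be pulled from a page's embedded `application/ld+json` script blocks. It is an illustration using Python's standard library, not the extractor shipped with Hister, and it only handles top-level JSON objects (arrays and @graph containers are skipped).

```python
import json
from html.parser import HTMLParser

class JSONLDParser(HTMLParser):
    """Collect the text of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False

def extract_jsonld(html, fields=("@type", "headline", "description")):
    """Return one dict per JSON-LD object, limited to the requested fields."""
    parser = JSONLDParser()
    parser.feed(html)
    results = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        if isinstance(data, dict):
            results.append({k: data[k] for k in fields if k in data})
    return results
```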
OpenSearch Suggestions
The server now serves an OpenSearch suggestions endpoint (/api/suggest),
allowing browsers to display search-as-you-type completions when Hister is
configured as a search engine.
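The OpenSearch Suggestions format that browsers expect is a JSON array whose first element is the original query and whose second is the list of completions. The sketch below builds such a body; the function name and sample completions are made up, and the exact response Hister returns may include further optional elements.

```python
import json

def suggestions_response(query, completions):
    """Serialize completions in the OpenSearch Suggestions JSON layout:
    a two-element array of the original query and the list of completions."""
    return json.dumps([query, list(completions)])

body = suggestions_response("hist", ["hister", "history"])
```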
Enhancements
Crawler Backend for All Index Operations
The --backend flag (and --backend-option) is now available on both
hister index (plain and --recursive) and hister import-browser, allowing
a headless Chrome/Chromium backend for JavaScript-heavy pages without running a
full recursive crawl:
hister index --backend chromedp https://example.com
hister import-browser --backend chromedp --backend-option exec_path=/usr/bin/chromium
Headers and cookies can also be injected per-invocation:
hister index --header "Accept-Language=en" --cookie "session=abc; Domain=example.com" https://example.com
Cookies use standard Set-Cookie format with a required Domain attribute.
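The cookie format can be illustrated with a small parser that splits a Set-Cookie style string and rejects it when Domain is missing. This is a simplified sketch with hypothetical names, not the parsing code Hister uses, and it ignores quoting and other edge cases of the full cookie grammar.

```python
def parse_cookie(spec):
    """Parse 'name=value; Attr=val; ...' and require a Domain attribute."""
    parts = [p.strip() for p in spec.split(";") if p.strip()]
    if not parts or "=" not in parts[0]:
        raise ValueError("cookie must start with name=value")
    name, value = parts[0].split("=", 1)
    attrs = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()  # attribute names are case-insensitive
    if not attrs.get("domain"):
        raise ValueError("cookie is missing the required Domain attribute")
    return {"name": name.strip(), "value": value, "attrs": attrs}

cookie = parse_cookie("session=abc; Domain=example.com")
```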
CLI Search Improvements
- --limit N flag caps the number of results returned
- --fields flag selects which document fields to include in output
- --html flag includes raw HTML content in the output
- Paging support added to both CLI search and list-urls
- list-urls now fetches results from the server by default; --offline connects directly to the index without a running server
Quoted Field Queries
Field-qualified queries now support quoted values, enabling correct deletion and
lookup of URLs that contain spaces (common on Windows file paths):
url:"file:///C:/Users/My Documents/notes.txt"
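To make the quoting rule concrete, here is a minimal tokenizer that keeps a quoted value together, spaces included. It is a sketch of the idea rather than Hister's query parser: the regular expression and function name are assumptions, and any word followed by a colon is treated as a field name.

```python
import re

# Matches: field:"quoted value" | field:bare | "quoted phrase" | bare term
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into (field, value) pairs; field is None for plain terms."""
    terms = []
    for field, quoted, bare in TOKEN.findall(query):
        # Exactly one of quoted/bare matched; the other is an empty string.
        terms.append((field or None, bare or quoted))
    return terms

terms = parse_query('url:"file:///C:/Users/My Documents/notes.txt"')
```

Without the quotes, the same input would split at the space and the URL would no longer match as a single value.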
Preview Panel Polish
- Preview title is now clickable (opens the result URL)
- Preview panel maximises available content width
- JSON-LD metadata surfaced inside the preview panel
- Dark theme font colors fixed in preview popup
NixOS / Nix Module
- systemd and launchd hardening applied to the Hister service units
- New services.hister.environmentFile option for secrets injection
- openFirewall now requires explicit opt-in
- services.hister.config renamed to services.hister.settings
Other
- Executable size reduced by ~70 MB by switching to a trimmed lingua-go fork
- Sensitive content rejection errors surfaced in the browser extension
- --verbose flag on hister delete lists matched URLs before deleting
- Priority result deduplication now copies body text from the original result
- /suggest endpoint protected by auth middleware and Sec-Fetch-Site header check
- Version information included in the MCP endpoint response
- Timezone data bundled into the binary for environments without a system
tzdata
Bug Fixes
- File URLs (file://) now handled correctly in the UI for both opening and deletion (#362)
- Browser extension authentication documentation corrected (#366)
- URLs no longer lowercased during query building, preventing mismatches on case-sensitive paths
- History view correctly filtered per-user in multi-user mode (#314)
- Token authentication middleware now respects NoAuth flag (#348)
- Documents with no HTML content no longer attempt HTML extraction (#351)
- Extension no longer resubmits documents after a 406 Not Acceptable response
- Priority results correctly deduplicated against standard results
- File indexing fixed on Windows
- Wide tables no longer overflow the preview panel
- Score field populated correctly in search responses
- aws_access_key sensitive content pattern tightened to reduce false positives
- Home-manager service units correctly gated on host platform in Nix module