github VersusControl/versus-incident v1.4.3

[1.4.3] — 2026-05

Added

AI SRE Agent — multi-agent split (Phase 2)

  • Typed task dispatcher (pkg/core/ai_task.go) — new AIAgent
    interface with Name(), Kind(), Run(ctx, AITask) and task kinds
    detect / analyze. Per-kind cache + rate limiter via
    pkg/agent/ai/router/router.go.
  • Eino framework adoption — pkg/agent/ai/eino/chatmodel.go wraps
    the eino-ext/openai client as the sole LLM path. Two constructors:
    NewChatModel (JSON-mode, detect) and NewToolCallingChatModel
    (tool-calling, analyze).
  • DetectAgent relocation — detect logic moved to
    pkg/agent/ai/detect/ with its own embedded
    prompts/{SOUL,INPUTS,OUTPUT,RULES}.md. A compile-time tool-free guard
    ensures that no tools are registered.
  • Shared prompt loader (pkg/agent/ai/prompt/loader.go) —
    content-free Assemble / MustAssemble used by both agents so
    prompt assembly stays uniform.
  • AnalyzeAgent (pkg/agent/ai/analyze/) — on-demand triage agent
    triggered via POST /api/admin/incidents/:id/analyze. Tool-calling
    with read-only tools: recent_incidents, pattern_history,
    describe_service. Own prompts/ set (triage analyst identity,
    never re-notifies). Max 3 tool iterations (configurable via
    agent.ai.analyze.max_tool_iterations). Compile-time Emitter-free
    guard.
  • Analyses storage — storage.Provider extended with
    SaveAnalysis, GetAnalysis, ListAnalyses, DeleteAnalysis
    (file + memory backends). Capped at 500 entries (FIFO eviction).
  • Admin endpoints (gated by X-Gateway-Secret):
    • POST /api/admin/incidents/:id/analyze
    • GET /api/admin/incidents/:id/analyses
    • GET /api/admin/analyses/:analysis_id
    • DELETE /api/admin/analyses/:analysis_id
  • Per-agent system-prompt endpoint — GET /api/agent/ai/system-prompt
    now accepts ?kind=detect|analyze (defaults to detect). Response
    includes source file list and assembly order.
  • Per-task AI config — agent.ai.detect.* and agent.ai.analyze.*
    sub-blocks for model, temperature, max_tokens, max_calls_per_hour,
    cache_ttl overrides.

UI

  • "Run Analysis" button on the incident detail page (replaces the
    "coming soon" pill on the Analysis card when ai.enable is true).
  • AnalysisCard rendering root-cause hypotheses, evidence list
    (collapsible, source-tagged), next steps, related pattern links,
    and tool-call audit trail.
  • "Past analyses" collapsible section listing prior runs (newest
    first) with timestamp, model, and duration; click to expand.

Documentation

  • New src/agent/ai-analyze-mode.md covering on-demand analysis
    configuration, pipeline, admin endpoints, and worked example.
  • New data-source pages: src/agent/data-sources/graylog.md,
    src/agent/data-sources/splunk.md.

AI SRE Agent — analyze tools expansion (Phase 2.5)

  • get_related_logs tool — pulls a redacted raw-log slice from
    configured signal sources around the incident window. Bridged through
    SignalReader in pkg/agent/analyze_adapter.go to avoid an import cycle.
    Window default 15m (cap 1440m), limit default 50 (cap 200).
  • recent_changes tool — reads one or more remote git repositories'
    commit histories to correlate incidents with recent deploys. Repos
    configured in tools.yaml (tools.recent_changes.git.repos[]). Each
    remote is mirror-cloned into a local cache on first use and fetched on
    later lookups. Global + per-repo auth: HTTPS token via
    http.extraHeader (never persisted to the mirror), SSH key via
    GIT_SSH_COMMAND. Window default 120m (cap 1440m), newest first.
  • describe_dependencies tool — surfaces upstream/downstream
    service neighbours from the service-dependency graph in tools.yaml
    (tools.describe_dependencies.services), with a
    has_recent_incident flag per neighbour. Reverse edges derived
    automatically from depends_on.
  • tools.yaml sibling config — new optional per-tool DATA
    configuration file (same directory as config.yaml). Supports
    ${VAR} expansion. Not a tool allow-list — tools are wired in code.
  • tool_timeout knob (root of tools.yaml, default 20s) — caps
    each tool dispatch; a timeout surfaces as a tool error, never a hard
    failure.
  • parallel_tools knob (root of tools.yaml, default false) —
    run multiple tool calls in one model turn concurrently while
    preserving deterministic trace ordering.

Changed

  • Legacy pkg/agent/ai/openai.go deleted — Eino is the only LLM
    path going forward. The core.AISRE adapter is removed; all callers
    use core.AIAgent via the router.
  • Prompt fragments relocated — moved from pkg/agent/ai/prompts/
    to pkg/agent/ai/detect/prompts/. Each agent owns its own fragments.
  • BuildAI → BuildAIs in pkg/agent/factory_ai.go — returns
    AIBundle{Router, Detect, Analyze, Cache, Rate, AnalyzeRate}.
  • agent.ai.analyze config block added; analyze.enable defaults to
    true when ai.enable is true (no separate opt-in flag).
  • tool_timeout and parallel_tools moved from agent.ai.analyze
    to the root of tools.yaml — they apply to every tool dispatch.
  • analyzetools.Default signature extended to accept SignalReader,
    DependencyGraph, and ChangeFeed dependencies.

Fixed

  • Incident detail page no longer shows stale analysis status after
    triggering a new run.

[1.4.2] — 2026-05

Added

Data sources (AI agent)

  • Graylog signal source (pkg/signalsources/graylog.go) — polls
    /api/search/universal/absolute (synchronous, sorted ascending) with
    optional stream_id, configurable query, message_field, and
    extra fields. Auth supports HTTP Basic and the Graylog API-token
    convention (<token>:token). The cursor advances to the maximum
    message timestamp seen; because the from bound is inclusive, boundary
    duplicates are filtered client-side.
  • Splunk signal source (pkg/signalsources/splunk.go) — streams
    results from /services/search/v2/jobs/export (NDJSON). Auth via
    bearer token (preferred) or HTTP Basic. Uses sub-second epoch values
    for earliest_time / latest_time; the cursor is the max _time seen.
    The search string is prefixed with the search command when it does
    not already start with one.
  • Agent supports type: graylog and type: splunk in
    agent_sources.yaml alongside the existing sources.

Examples & tooling

  • New docker-compose examples under
    examples/docker-compose/{graylog,splunk}/ — fully wired stacks
    (Graylog + MongoDB + OpenSearch; Splunk Enterprise with HEC) plus
    ready-to-use agent_sources.yaml.
  • scripts/generate_noisy_logs.py gains GraylogSink (GELF UDP) and
    SplunkSink (HEC) plus --graylog-* / --splunk-* CLI flags
    (env-var aware: GRAYLOG_HOST, SPLUNK_HEC_TOKEN, …).
  • scripts/run_noisy_logs.sh adds graylog and splunk targets.
