[1.4.3] — 2026-05
Added
AI SRE Agent — multi-agent split (Phase 2)
- Typed task dispatcher (`pkg/core/ai_task.go`): new `AIAgent` interface with `Name()`, `Kind()`, `Run(ctx, AITask)` and task kinds `detect`/`analyze`. Per-kind cache and rate limiter via `pkg/agent/ai/router/router.go`.
- Eino framework adoption: `pkg/agent/ai/eino/chatmodel.go` wraps the `eino-ext/openai` client as the sole LLM path. Two constructors: `NewChatModel` (JSON mode, detect) and `NewToolCallingChatModel` (tool calling, analyze).
- DetectAgent relocation: detect logic moved to `pkg/agent/ai/detect/` with its own embedded `prompts/{SOUL,INPUTS,OUTPUT,RULES}.md`. A compile-time tool-free guard enforces that no tools are registered.
- Shared prompt loader (`pkg/agent/ai/prompt/loader.go`): content-free `Assemble`/`MustAssemble` used by both agents so prompt assembly stays uniform.
- AnalyzeAgent (`pkg/agent/ai/analyze/`): on-demand triage agent triggered via `POST /api/admin/incidents/:id/analyze`. Tool-calling with read-only tools: `recent_incidents`, `pattern_history`, `describe_service`. Own `prompts/` set (triage analyst identity, never re-notifies). Max 3 tool iterations (configurable via `agent.ai.analyze.max_tool_iterations`). Compile-time `Emitter`-free guard.
- Analyses storage: `storage.Provider` extended with `SaveAnalysis`, `GetAnalysis`, `ListAnalyses`, `DeleteAnalysis` (file + memory backends). Capped at 500 entries (FIFO eviction).
- Admin endpoints (gated by `X-Gateway-Secret`):
  - `POST /api/admin/incidents/:id/analyze`
  - `GET /api/admin/incidents/:id/analyses`
  - `GET /api/admin/analyses/:analysis_id`
  - `DELETE /api/admin/analyses/:analysis_id`
- Per-agent system-prompt endpoint: `GET /api/agent/ai/system-prompt` now accepts `?kind=detect|analyze` (defaults to `detect`). The response includes the source file list and assembly order.
- Per-task AI config: `agent.ai.detect.*` and `agent.ai.analyze.*` sub-blocks for `model`, `temperature`, `max_tokens`, `max_calls_per_hour`, and `cache_ttl` overrides.
UI
- Run Analysis button on the incident detail page (replaces the "coming soon" pill on the Analysis card when `ai.enable` is true).
- AnalysisCard rendering root-cause hypotheses, an evidence list (collapsible, source-tagged), next steps, related pattern links, and the tool-call audit trail.
- Past analyses collapsible section listing prior runs (newest first) with timestamp, model, and duration; click to expand.
Documentation
- New `src/agent/ai-analyze-mode.md` covering on-demand analysis configuration, pipeline, admin endpoints, and a worked example.
- New data-source pages: `src/agent/data-sources/graylog.md`, `src/agent/data-sources/splunk.md`.
AI SRE Agent — analyze tools expansion (Phase 2.5)
- `get_related_logs` tool: pulls a redacted raw-log slice from configured signal sources around the incident window. Bridged via `SignalReader` in `pkg/agent/analyze_adapter.go` (no import cycle). Window default 15m (cap 1440m), limit default 50 (cap 200).
- `recent_changes` tool: reads one or more remote git repositories' commit histories to correlate incidents with recent deploys. Repos are configured in `tools.yaml` (`tools.recent_changes.git.repos[]`). Each remote is mirror-cloned into a local cache on first use and fetched on later lookups. Global + per-repo auth: HTTPS token via `http.extraHeader` (never persisted to the mirror), SSH key via `GIT_SSH_COMMAND`. Window default 120m (cap 1440m), newest first.
- `describe_dependencies` tool: surfaces upstream/downstream service neighbours from the service-dependency graph in `tools.yaml` (`tools.describe_dependencies.services`), with a `has_recent_incident` flag per neighbour. Reverse edges are derived automatically from `depends_on`.
- `tools.yaml` sibling config: new optional per-tool DATA configuration file (same directory as `config.yaml`). Supports `${VAR}` expansion. It is not a tool allow-list; tools are wired in code.
- `tool_timeout` knob (root of `tools.yaml`, default `20s`): caps each tool dispatch; a timeout surfaces as a tool error, never a hard failure.
- `parallel_tools` knob (root of `tools.yaml`, default `false`): runs multiple tool calls from one model turn concurrently while preserving deterministic trace ordering.
Changed
- Legacy `pkg/agent/ai/openai.go` deleted: Eino is the only LLM path going forward. The `core.AISRE` adapter is removed; all callers use `core.AIAgent` via the router.
- Prompt fragments relocated: moved from `pkg/agent/ai/prompts/` to `pkg/agent/ai/detect/prompts/`. Each agent owns its own fragments.
- `BuildAI` → `BuildAIs` in `pkg/agent/factory_ai.go`: returns `AIBundle{Router, Detect, Analyze, Cache, Rate, AnalyzeRate}`.
- `agent.ai.analyze` config block added; `analyze.enable` defaults to `true` when `ai.enable` is true (no separate opt-in flag).
- `tool_timeout` and `parallel_tools` moved from `agent.ai.analyze` to the root of `tools.yaml`; they apply to every tool dispatch.
- `analyzetools.Default` signature extended to accept `SignalReader`, `DependencyGraph`, and `ChangeFeed` dependencies.
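One plausible way to express the `analyze.enable` default is a pointer field, so an unset value can be told apart from an explicit `false`. The struct, tag, and method names below are hypothetical. Two behaviours are assumptions, not stated in the entry above: an explicit `analyze.enable: false` still opts out, and `ai.enable: false` disables analysis entirely.

```go
package main

// Hypothetical config structs; field and tag names are assumptions.
type AnalyzeConfig struct {
	// Enable is a pointer so "not set" differs from an explicit false.
	Enable *bool `yaml:"enable"`
}

type AIConfig struct {
	Enable  bool          `yaml:"enable"`
	Analyze AnalyzeConfig `yaml:"analyze"`
}

// AnalyzeEnabled applies the default: analyze follows ai.enable unless
// analyze.enable is set explicitly (assumed to act only as an opt-out).
func (c AIConfig) AnalyzeEnabled() bool {
	if !c.Enable {
		return false // assumption: no AI at all means no analysis
	}
	if c.Analyze.Enable == nil {
		return true // unset: inherit ai.enable, no separate opt-in needed
	}
	return *c.Analyze.Enable
}
```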
Fixed
- Incident detail page no longer shows stale analysis status after
triggering a new run.
[1.4.2] — 2026-05
Added
Data sources (AI agent)
- Graylog signal source (`pkg/signalsources/graylog.go`): polls `/api/search/universal/absolute` (synchronous, sorted ascending) with optional `stream_id`, configurable `query`, `message_field`, and extra fields. Auth supports HTTP Basic and the Graylog API-token convention (`<token>:token`). The cursor advances on the max message timestamp seen; duplicates returned because the `from` bound is inclusive are filtered client-side.
- Splunk signal source (`pkg/signalsources/splunk.go`): streams results from `/services/search/v2/jobs/export` (NDJSON). Auth via bearer token (preferred) or HTTP Basic. Sub-second epoch `earliest_time`/`latest_time`; the cursor is the max `_time` seen. The search string is auto-prefixed with `search` when missing.
- Agent supports `type: graylog` and `type: splunk` in `agent_sources.yaml` alongside the existing sources.
Examples & tooling
- New docker-compose examples under `examples/docker-compose/{graylog,splunk}/`: fully wired stacks (Graylog + MongoDB + OpenSearch; Splunk Enterprise with HEC) plus a ready-to-use `agent_sources.yaml`.
- `scripts/generate_noisy_logs.py` gains `GraylogSink` (GELF UDP) and `SplunkSink` (HEC) plus `--graylog-*`/`--splunk-*` CLI flags (env-var aware: `GRAYLOG_HOST`, `SPLUNK_HEC_TOKEN`, …).
- `scripts/run_noisy_logs.sh` adds `graylog` and `splunk` targets.