Priivacy-ai/spec-kitty v3.2.0a5 on GitHub

Fixed

CLI auth now consumes the server Tranche 2 contract end to end: logout posts refresh tokens to /oauth/revoke, local credential cleanup failures are reported truthfully, refresh handles benign 409 replay without resubmitting a spent token, and auth doctor --server checks /api/v1/session-status with safe re-authentication guidance (#902).
Fix spec-kitty upgrade silently leaving projects in PROJECT_MIGRATION_NEEDED state by stamping schema_version after metadata save (#705, WP01).
spec-kitty init in a non-git directory now prints an actionable "run git init" message (#636, WP05).
Suppress misleading "shutdown / final-sync" red error lines after a successful spec-kitty agent mission create --json payload (#735, WP06).
Deduplicate "Not authenticated, skipping sync" / "token refresh failed" diagnostics to at most once per CLI invocation (#717, WP06).
Fix read_events() raising KeyError('wp_id') on DecisionPointOpened / DecisionPointResolved events that share status.events.jsonl with lane-transition events. Restores finalize-tasks / materialize / dashboard for any mission that uses the Decision Moment Protocol (#830, WP08).
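The read_events() fix above concerns a log file that mixes two event shapes. A minimal sketch of the general idea follows, assuming each line of status.events.jsonl is a JSON object and that only lane-transition events carry wp_id. The function name read_lane_events, the event_type key, and the sample field values are hypothetical and are not the project's actual reader.

```python
import json
from typing import Iterable, Iterator


def read_lane_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only events that carry a wp_id, skipping other event kinds."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the JSONL file
        event = json.loads(raw)
        # Use .get() instead of event["wp_id"] so decision-point events
        # (which have no wp_id) are skipped rather than raising KeyError.
        if event.get("wp_id") is None:
            continue
        yield event


# Hypothetical sample: two lane transitions around one decision-point event.
sample = [
    '{"event_type": "LaneTransition", "wp_id": "WP01", "to_lane": "doing"}',
    '{"event_type": "DecisionPointOpened", "decision_id": "example-id"}',
    '{"event_type": "LaneTransition", "wp_id": "WP02", "to_lane": "done"}',
]
lane_events = list(read_lane_events(sample))
print([e["wp_id"] for e in lane_events])
```

Filtering on the presence of the key, rather than on a list of known event names, keeps the reader tolerant of event types added to the shared file later.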

Changed

Loosen .python-version from a hard 3.13 pin to 3.11 (the floor declared by pyproject.toml) and restore mypy --strict cleanliness on mission_step_contracts/executor.py (#805, WP03).

Removed

Retire the deprecated /spec-kitty.checklist command surface from every supported agent's rendered output. The canonical requirements checklist at kitty-specs/<mission>/checklists/requirements.md is unaffected (#815, supersedes #635, WP04).

Internal

Add regression tests confirming --feature aliases stay hidden from --help while remaining accepted (#790, WP07).
Add regression test confirming spec-kitty agent decision command shape stays consistent across docs / help / skill snapshots (#774, WP07).

Added

Frontend Freddy agent profile — browser-side implementer specialising in HTML/CSS/JavaScript/TypeScript, component frameworks (React, Vue, Svelte), WCAG 2.1 accessibility, Core Web Vitals performance, and frontend testing (vitest, Playwright). Specialises from implementer-ivan. Self-review protocol enforces lint, type-check, unit/component tests, e2e smoke, axe accessibility gate, and bundle budget. Avoidance boundary explicitly names Node Norris's server-side domain.
Node Norris agent profile — server-side Node.js implementer specialising in HTTP APIs (Express/Fastify/NestJS), async/Promise discipline, streaming, npm security (npm audit), and integration testing (supertest). Specialises from implementer-ivan. Avoidance boundary explicitly names Frontend Freddy's browser-rendering domain. The two profiles are mutually exclusive by design.
BDD paradigm (behaviour-driven-development) — encodes BDD as a three-phase collaboration practice: Discovery (Three Amigos conversations), Formulation (Given/When/Then specifications), and Automation (executable living documentation). References DIRECTIVE_034 and DIRECTIVE_037.
BDD Scenario Lifecycle procedure (bdd-scenario-lifecycle) — covers the Formulation → Automation → Maintenance phases that follow an Example Mapping Workshop. Toolchain-agnostic (Cucumber-JVM, Cucumber-JS, Behave, SpecFlow). Encodes four anti-patterns: imperative Gherkin, rubber-stamp scenarios, shared mutable state, and orphaned step definitions.
New tactics:
- reference-architectural-patterns — structured selection of named reference patterns (Layered, Hexagonal, Event-Driven, CQRS, Microservices, Modular Monolith) scored against coupling, scalability, and operational complexity constraints.
- development-bdd — architecture-level BDD tactic for expressing observable behavioral contracts at system boundaries before implementation; distinct from the existing behavior-driven-development technique tactic.
- bug-fixing-checklist — language-agnostic test-first defect resolution: write a reproduction test before touching production code.
- test-readability-clarity-check — dual-perspective reconstruction check: read only tests, reconstruct system understanding, compare against spec to surface documentation gaps.
- code-documentation-analysis — brownfield boundary discovery by extracting and clustering domain terminology from code and documentation artifacts. Contributes foundational analysis tactics toward the brownfield investigation skill described in #666.
- terminology-extraction-mapping — systematic extraction and relationship mapping of domain terms across multiple sources to produce a maintainable glossary. Complementary artifact to the bounded-context linguistic discovery approach targeted by #666.
Tactic directory normalization — shipped tactics reorganised into four category subdirectories: testing/ (15 tactics), analysis/ (14), communication/ (7), architecture/ (14). Cross-cutting tactics remain in the shipped/ root. The existing rglob loader requires no changes.
tasks-finalize command skill — added to CANONICAL_COMMANDS in the agent skills pipeline and deployed to .agents/skills/spec-kitty.tasks-finalize/. Closes the gap where this command was missing from Codex/Vibe skill packages.
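The tactic directory normalization entry above notes that a recursive glob loader needs no changes when files move into subdirectories. A small sketch of why, using pathlib.Path.rglob; the .tactic.yaml suffix and the file names are assumptions made for illustration, not the project's confirmed layout.

```python
import tempfile
from pathlib import Path


def find_tactics(root: Path) -> list[str]:
    """Return tactic file paths relative to root, at any depth, sorted."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.tactic.yaml")
    )


with tempfile.TemporaryDirectory() as tmp:
    shipped = Path(tmp) / "shipped"
    # One cross-cutting tactic in the root, two inside category subdirectories.
    for rel in (
        "cross-cutting.tactic.yaml",
        "testing/bug-fixing-checklist.tactic.yaml",
        "architecture/development-bdd.tactic.yaml",
    ):
        path = shipped / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("id: example\n")
    found = find_tactics(shipped)

print(found)
```

Because rglob matches at every depth, including the root itself, the same call finds both the categorised tactics and those left in the shipped/ root.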

Changed

Profile enrichment — four existing profiles updated with additive tactic and paradigm references:
- implementer-ivan: bug-fixing-checklist tactic reference (propagates to all specialist profiles via resolve_profile() union merge).
- reviewer-renata: test-readability-clarity-check and bdd-scenario-lifecycle tactic references; behaviour-driven-development paradigm in context sources.
- architect-alphonso: development-bdd tactic reference; BDD paradigm, example-mapping-workshop, and bdd-scenario-lifecycle in additional context sources.
- java-jenny: behavior-driven-development and bdd-scenario-lifecycle tactic references; bdd-scenarios self-review step (Cucumber-JVM + Serenity BDD gate).
behavior-driven-development tactic enriched — extended notes with a toolchain landscape section (Cucumber family, Playwright, Selenium, Serenity BDD, custom DSLs; source: patterns.sddevelopment.be/primers/toolchain-and-automation/bdd); three new failure_modes (rubber-stamp scenarios, shared mutable state between scenarios, orphaned step definitions); cross-references to the new BDD paradigm and procedure.
tactic-references union-merged in resolve_profile() — tactic-references added to _LIST_FIELDS in src/doctrine/agent_profiles/repository.py. Specialist profiles now inherit base-profile tactic references via _union_merge at resolution time rather than overriding them.
Tactic compliance test extended — test_tactic_compliance.py ARTIFACT_DIRS now includes procedure and paradigm types, enabling cross-type reference validation for tactics that reference procedures or paradigms.
Shared package boundary cutover (mission shared-package-boundary-cutover-01KQ22DS) — spec-kitty-runtime is no longer a dependency of spec-kitty-cli. The CLI now owns its own runtime internally under src/specify_cli/next/_internal_runtime/; spec-kitty next works from a clean install of spec-kitty-cli alone. spec-kitty-events and spec-kitty-tracker are external PyPI dependencies consumed via their public import surfaces (spec_kitty_events, spec_kitty_tracker). The vendored events tree under src/specify_cli/spec_kitty_events/ has been removed (~23 kLoC). Developers who relied on editable cross-package overrides should consult docs/development/local-overrides.md; operators upgrading from a pre-cutover release should consult docs/migration/shared-package-boundary-cutover.md. Decision rationale recorded in ADR 2026-04-25-1.
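The union-merge behaviour described for tactic-references above can be illustrated with a short sketch. This is not the code in repository.py: the helper names, the profile dictionaries, and the choice to list base entries first are assumptions.

```python
def union_merge(base: list[str], override: list[str]) -> list[str]:
    """Combine two lists, keeping first-seen order and dropping duplicates."""
    merged: list[str] = []
    for item in [*base, *override]:
        if item not in merged:
            merged.append(item)
    return merged


# Fields merged by union instead of being replaced by the specialist value.
LIST_FIELDS = {"tactic-references"}


def resolve(base: dict, specialist: dict) -> dict:
    """Resolve a specialist profile against its base profile."""
    resolved = dict(base)
    for key, value in specialist.items():
        if key in LIST_FIELDS:
            resolved[key] = union_merge(base.get(key, []), value)
        else:
            resolved[key] = value  # scalar fields: the specialist wins
    return resolved


base = {"name": "implementer-ivan", "tactic-references": ["bug-fixing-checklist"]}
specialist = {
    "name": "java-jenny",
    "tactic-references": ["behavior-driven-development", "bdd-scenario-lifecycle"],
}
profile = resolve(base, specialist)
print(profile["tactic-references"])
```

With plain override semantics the specialist would lose bug-fixing-checklist; with a union merge it keeps the base reference and adds its own.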

Removed

constraints.txt — the file existed solely to paper over a transitive pin conflict with the retired spec-kitty-runtime package and is no longer needed.

Fixed

spec-kitty agent config list/status now checks global command roots for slash-command agents instead of reporting missing project-local command directories after init.
spec-kitty agent config add/sync --create-missing no longer recreates retired project-local command directories for globally managed slash-command agents.
spec-kitty agent config remove/sync now removes only the managed command surface for project-local agent directories, preserving unrelated files such as .github/workflows/.

Added — Documentation mission composition rewrite (#502, #461, Phase 6 WP6.4)

Documentation mission now runs on the StepContractExecutor composition substrate, mirroring research (#504) and software-dev (#503). The runtime resolves the new composed step contracts ahead of the legacy mission.yaml workflow via the existing _resolve_runtime_template_in_root precedence — no loader changes were required.
New runtime sidecar templates: src/specify_cli/missions/documentation/mission-runtime.yaml and src/doctrine/missions/documentation/mission-runtime.yaml.
Six shipped step contracts under src/doctrine/mission_step_contracts/shipped/documentation-{discover,audit,design,generate,validate,publish}.step-contract.yaml.
Six action doctrine bundles under src/doctrine/missions/documentation/actions/{discover,audit,design,generate,validate,publish}/ (governance guidelines + directive/tactic indices).
DRG action nodes and edges for action:documentation/{discover,audit,design,generate,validate,publish} in src/doctrine/graph.yaml.
Composition wiring in src/specify_cli/next/runtime_bridge.py: _COMPOSED_ACTIONS_BY_MISSION["documentation"] and a fail-closed guard branch in _check_composed_action_guard() raising a structured error for unknown documentation actions. src/specify_cli/mission_step_contracts/executor.py adds six _ACTION_PROFILE_DEFAULTS entries (researcher-robbie for discover/audit, architect-alphonso for design, implementer-ivan for generate, reviewer-renata for validate/publish).
Real-runtime integration walk at tests/integration/test_documentation_runtime_walk.py proving SC-001 / SC-003 / SC-004 from a freshly initialized temp repo.
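A fail-closed guard of the kind described in the composition wiring entry above might look like the following sketch. The profile assignments are taken from that entry; the dictionary shapes, the exception class, and the function name are hypothetical.

```python
# Actions that are composed for each mission (shape assumed for illustration).
COMPOSED_ACTIONS_BY_MISSION = {
    "documentation": frozenset(
        {"discover", "audit", "design", "generate", "validate", "publish"}
    ),
}

# Default profile per documentation action, as listed in the entry above.
ACTION_PROFILE_DEFAULTS = {
    ("documentation", "discover"): "researcher-robbie",
    ("documentation", "audit"): "researcher-robbie",
    ("documentation", "design"): "architect-alphonso",
    ("documentation", "generate"): "implementer-ivan",
    ("documentation", "validate"): "reviewer-renata",
    ("documentation", "publish"): "reviewer-renata",
}


class UnknownComposedActionError(ValueError):
    """Structured error carrying the offending mission and action."""

    def __init__(self, mission: str, action: str) -> None:
        super().__init__(f"unknown action {action!r} for mission {mission!r}")
        self.mission = mission
        self.action = action


def default_profile(mission: str, action: str) -> str:
    """Return the default profile, refusing anything not explicitly listed."""
    allowed = COMPOSED_ACTIONS_BY_MISSION.get(mission, frozenset())
    if action not in allowed:
        # Fail closed: unknown actions raise instead of falling through
        # to some permissive default.
        raise UnknownComposedActionError(mission, action)
    return ACTION_PROFILE_DEFAULTS[(mission, action)]


print(default_profile("documentation", "design"))
```

The point of failing closed is that a misspelled or unregistered action stops immediately with a descriptive error rather than running under an unintended profile.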

Backward compatibility

The legacy src/specify_cli/missions/documentation/mission.yaml and src/doctrine/missions/documentation/mission.yaml files remain on disk for backward reference. Existing documentation-mission projects that authored against the legacy workflow continue to work; runtime template resolution prefers the new mission-runtime.yaml ahead of the legacy file via the existing precedence in _resolve_runtime_template_in_root (no loader changes in this PR).
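The precedence rule described above (prefer mission-runtime.yaml, fall back to the legacy mission.yaml) can be sketched as a first-match lookup. This is an illustration under assumptions, not the body of _resolve_runtime_template_in_root; the function name and the single-directory lookup are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Candidate file names in precedence order: the new sidecar first, legacy last.
TEMPLATE_PRECEDENCE = ("mission-runtime.yaml", "mission.yaml")


def resolve_template(mission_dir: Path) -> Optional[Path]:
    """Return the first existing template in precedence order, else None."""
    for name in TEMPLATE_PRECEDENCE:
        candidate = mission_dir / name
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    mission_dir = Path(tmp) / "documentation"
    mission_dir.mkdir()
    # Only the legacy file exists: it is still resolved.
    (mission_dir / "mission.yaml").write_text("name: documentation\n")
    legacy_only = resolve_template(mission_dir).name
    # Once the sidecar exists alongside it, the sidecar wins.
    (mission_dir / "mission-runtime.yaml").write_text("name: documentation\n")
    both_present = resolve_template(mission_dir).name

print(legacy_only, both_present)
```

Leaving the legacy file on disk is harmless under this scheme, because it is only consulted when the sidecar is absent.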

Added

Upgrade compatibility planner: spec-kitty upgrade now separates CLI update guidance from current-project schema compatibility. New flags --cli, --project, --yes, and --no-nag support CLI-only guidance, project-only migrations, non-interactive confirmation, and explicit nag suppression. spec-kitty upgrade --dry-run --json emits the stable compatibility-plan contract for automation.
Host-surface parity matrix at docs/host-surface-parity.md — authoritative record of how each of the 15 supported host surfaces teaches the advise/ask/do governance-injection contract. Closes the remaining #496 host-surface breadth rollout.
Mode of work runtime derivation — every advise, ask, do invocation now records its mode_of_work (advisory, task_execution, mission_step, query) on the started event. Derivation is from the CLI entry command.
Correlation links — spec-kitty profile-invocation complete accepts --artifact <path> (repeatable) and --commit <sha> (singular); each appends an additive event to the invocation JSONL for single-file request→artifact/commit correlation.
SaaS read-model policy at src/specify_cli/invocation/projection_policy.py — typed module mapping (mode, event) to projection rules. Documented in docs/trail-model.md.
Tier 2 SaaS projection decision: explicitly documented as deferred in docs/trail-model.md. Tier 2 evidence stays local-only in 3.2.x.
README Governance layer subsection — entry point for operators discovering the advise/ask/do surface.
Decision Moment Ledger (V1): new spec-kitty agent decision subgroup with five subcommands: open, resolve, defer, cancel, verify. Mints ULID decision_ids at interview ask-time, writes paper trail under kitty-specs/<mission>/decisions/ (index.json + DM-<id>.md), and appends DecisionPointOpened(interview) / DecisionPointResolved(interview) events to status.events.jsonl. Local-only; no SaaS sync required.
Charter integration: spec-kitty charter interview now calls decision open before each question and the appropriate terminal command after each answer. answers.yaml behavior is unchanged.
Specify + Plan template updates: specify.md and plan.md source templates gain a Decision Moment Protocol section instructing the LLM to call decision subcommands at ask/resolution time and write anchors for deferred decisions.
decision verify gate: scans spec.md / plan.md for [NEEDS CLARIFICATION: ...] sentinels and cross-checks against the decisions index. Exits non-zero on drift (DEFERRED_WITHOUT_MARKER, MARKER_WITHOUT_DECISION, STALE_MARKER).
Widen Mode (#758): spec-kitty agent decision widen + resolve --from-widen lifecycle. Writes widen-pending.jsonl, emits DecisionPointWidened events, integrates with charter/specify/plan widen affordances. Surfaces decision write-back errors explicitly instead of silently suppressing them.
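The cross-check performed by the decision verify gate above can be pictured as a comparison between two sets of decision ids. The sketch below is heavily hedged: the meaning assigned to each drift code is inferred from its name only, the status strings are hypothetical, marker parsing is left out (the function receives ids already extracted from spec.md / plan.md), and the exit code 1 is an assumption beyond "non-zero".

```python
def find_drift(index: dict[str, str], marker_ids: set[str]) -> list[tuple[str, str]]:
    """Compare the decisions index with ids found in document markers.

    index maps decision id -> status (hypothetical values such as
    "deferred" or "resolved"); marker_ids holds ids seen in markers.
    """
    findings: list[tuple[str, str]] = []
    for decision_id, status in sorted(index.items()):
        if status == "deferred" and decision_id not in marker_ids:
            # Assumed meaning: a deferred decision has no marker in the docs.
            findings.append(("DEFERRED_WITHOUT_MARKER", decision_id))
        elif status != "deferred" and decision_id in marker_ids:
            # Assumed meaning: a marker remains for a decision no longer deferred.
            findings.append(("STALE_MARKER", decision_id))
    for decision_id in sorted(marker_ids - set(index)):
        # Assumed meaning: a marker points at an id missing from the index.
        findings.append(("MARKER_WITHOUT_DECISION", decision_id))
    return findings


index = {"A": "deferred", "B": "resolved", "C": "deferred"}
marker_ids = {"B", "C", "D"}
findings = find_drift(index, marker_ids)
exit_code = 1 if findings else 0  # non-zero on drift; the value 1 is assumed
for code, decision_id in findings:
    print(code, decision_id)
```

Checking in both directions matters: one pass over the index catches missing or stale markers, and a second pass over the markers catches ids the index has never seen.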

Changed

Project schema compatibility is now enforced by the centralized compat planner. Out-of-date CLI notices are passive and throttled; incompatible project schemas block unsafe commands with exit codes 4, 5, or 6 and exact remediation guidance.
spec-kitty profile-invocation complete --evidence is now mode-gated: rejected on advisory / query invocations with InvalidModeForEvidenceError. Rejection occurs before any write; the invocation stays open.
_propagate_one consults the new projection policy after the sync-gate and authentication lookup. Existing task_execution / mission_step projection behaviour is preserved exactly.
Dashboard user-visible wording: the mission selector, current-mission header, overview heading, analysis heading, and empty-state prompt now read "Mission Run" / "mission" instead of "Feature". Backend identifiers (CSS classes, HTML IDs, cookie keys, API route segments, JSON field names) are unchanged.
spec-kitty-events bumped to ==4.0.0: vendored copy at src/specify_cli/spec_kitty_events/ refreshed. Introduces DecisionPointOpenedInterviewPayload, DecisionPointResolvedInterviewPayload, OriginSurface.PLANNING_INTERVIEW (origin_surface: planning_interview), OriginFlow enum (values specify, plan), DecisionPointWidened, and TerminalOutcome enum.
[tool.uv.sources] redirects spec-kitty-events to ../spec-kitty-events/ in editable mode for monorepo development. Dev-only; ignored by pip / PyPI.
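The mode gate on --evidence described in this section follows a validate-before-write pattern. A minimal sketch, assuming an invocation record is a plain dictionary: the record fields, the status values, and the function signature are hypothetical, and only the exception name and the rejected modes come from the notes. Records without mode_of_work are allowed to carry evidence, matching the migration notes.

```python
from typing import Optional

# Modes for which evidence is rejected, per the entry above.
EVIDENCE_BLOCKED_MODES = {"advisory", "query"}


class InvalidModeForEvidenceError(Exception):
    """Raised when evidence is supplied for a mode that cannot carry it."""


def complete_invocation(record: dict, evidence: Optional[str]) -> dict:
    """Return a completed copy of record, validating evidence first."""
    mode = record.get("mode_of_work")  # None for records that predate the field
    # Validate before building any new state, so a rejected call leaves
    # the original record untouched and still open.
    if evidence is not None and mode in EVIDENCE_BLOCKED_MODES:
        raise InvalidModeForEvidenceError(
            f"evidence is not allowed for mode {mode!r}"
        )
    completed = dict(record, status="completed")
    if evidence is not None:
        completed["evidence"] = evidence
    return completed


advisory = {"id": "inv-1", "mode_of_work": "advisory", "status": "open"}
legacy = {"id": "inv-0", "status": "open"}  # no mode_of_work recorded

try:
    complete_invocation(advisory, evidence="notes.md")
except InvalidModeForEvidenceError as exc:
    print("rejected:", exc)

print(complete_invocation(legacy, evidence="notes.md")["status"])
```

Because the function returns a new dictionary instead of mutating its input, the "invocation stays open" property falls out naturally when the check raises.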

Deferred

spec-kitty explain (issue #534) remains deferred to Phase 5 pending DRG glossary addressability (#499, #759).

Out of scope (tracked separately)

SaaS sync projection for widened decisions — tracked in spec-kitty-saas#110, #111.
Tasks-phase interview support — future mission.

Migration notes

No operator action required for routine upgrade. The trail model is additive:

Pre-mission invocation records (no mode_of_work) continue to accept --evidence and project under legacy task_execution rules.
Existing SaaS dashboards see no change for task_execution / mission_step traffic.
New advisory events now appear in the SaaS timeline as minimal entries without body — this is a deliberate behaviour change documented in the SaaS Read-Model Policy table.

Added (Phase 4 trail follow-on)

docs/trail-model.md: Formal operator documentation for the Phase 4 trail contract, mode-of-work taxonomy, tier promotion rules, SaaS projection policy, intake positioning, and explain deferral (WP04).
"Governance context injection" section in .agents/skills/spec-kitty.advise/SKILL.md for Codex/Vibe hosts, enabling Tier 1 trail recording without host-side SaaS auth (WP03).
"Standalone invocations (outside missions)" section in src/doctrine/skills/spec-kitty-runtime-next/SKILL.md for Claude Code and gstack hosts, covering when to open an invocation record outside the mission workflow (WP04).
End-to-end invocation integration tests in tests/specify_cli/invocation/test_invocation_e2e.py covering Tier 1 JSONL write, complete-event append, local-only list read, and sync-gate suppression (WP05).

Fixed

propagator.py (_propagate_one): Invocation events are now suppressed when effective_sync_enabled = False, even when the user is authenticated. Previously, sync-disabled checkouts could still emit SaaS events if a WebSocket client was connected (WP01).
executor.complete_invocation now calls promote_to_evidence() when the --evidence flag is supplied, enabling correct Tier 2 artifact promotion (WP03).
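The propagator fix above comes down to the order in which gates are checked. A sketch of that ordering, with a hypothetical function name and parameters; it is not the body of _propagate_one, only an illustration of why the sync gate must come first.

```python
def should_propagate(
    effective_sync_enabled: bool,
    authenticated: bool,
    websocket_connected: bool,
) -> bool:
    """Decide whether an invocation event may be emitted to SaaS."""
    # Sync gate first: a sync-disabled checkout never emits, regardless of
    # authentication state or whether a WebSocket client happens to be open.
    if not effective_sync_enabled:
        return False
    # Only after the sync gate passes does authentication matter.
    # websocket_connected is accepted but deliberately not consulted:
    # a live connection is a transport detail, not permission to emit.
    return authenticated


print(should_propagate(False, True, True))
print(should_propagate(True, True, False))
```

Under this ordering the previously reported case (sync disabled, user authenticated, client connected) yields False.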

Changed

Issue #496: Priority-surface slice complete in 3.2.x (Claude Code via spec-kitty-runtime-next doctrine skill, Codex CLI via SKILL.md governance context injection). Remaining 9 surfaces tracked in #496 for a follow-on patch or Phase 5.
Issue #534: spec-kitty explain explicitly deferred to Phase 5 (requires DRG glossary addressability, issue #499). A partial implementation without glossary citations would be misleading.