KitTools Release Notes
Changelog and release history for the KitTools Claude Code plugin.
#2.4.3 — 2026-05-22
#Added
- Codebase fit reviewer — New review dimension for `/kit-tools:validate-epic`. Deeply explores the actual codebase to verify implementation hints, find missed reuse opportunities, check pattern conformance, and identify duplication risks. Every finding includes file paths and function names grounded in real code exploration.
- Signal feedback hook — New `harvest_signals` Stop hook silently captures skill telemetry from KitTools artifacts for retrospective analysis of skill performance across projects.
#Changed
- Validate-epic runs reviewers in parallel — All five reviewers now spawn concurrently instead of sequentially, with consolidated finding presentation and selective re-run of individual reviewers after spec updates.
- Epic pause behavior is mode-dependent — Autonomous and guarded modes now run continuously between specs. Supervised mode pauses between specs for user review. Previously, the default caused unintended pauses in autonomous/guarded execution.
#Fixed
- Autonomous/guarded epic pausing between specs — Example config and skill documentation hardcoded pause-between-specs as enabled, causing agents to set it regardless of execution mode.
#2.4.2 — 2026-04-24
#Added
- KitTools commit signing — Commits created by KitTools agents during orchestration now include a `Co-Authored-By: KitTools + Claude` trailer, making it easy to identify KitTools-originated commits in your git history.
#2.4.1 — 2026-04-24
#Fixed
- Dirty-tree self-block on resume — The orchestrator writes a run header to `EXECUTION_LOG.md` after the clean-worktree check passes, but before story execution begins. If the orchestrator then crashes, the uncommitted header leaves the worktree dirty, so every subsequent relaunch fails the clean-worktree check. Fixed by committing the log header immediately after writing it, closing the dirty-tree window.
#Added
- Hybrid model escalation — New `escalation` model role (defaults to Opus). On retry for specs marked `size: L` or `size: XL`, the implementation session upgrades from Sonnet to the escalation model. The first attempt is always Sonnet: cheap exploration that produces learnings. The retry gets Opus for stories whose context is too large for Sonnet to process within the timeout.
- `size:` frontmatter field — Feature specs can now declare `size: S | M | L | XL` to control session timeouts and model escalation on retry.
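The escalation rule can be pictured as a small selection function. This is a minimal sketch, assuming a 1-based attempt counter and plain model-name strings; `pick_implementation_model`, `DEFAULT_MODELS`, and the role keys are hypothetical names, not the orchestrator's actual API.

```python
from typing import Dict, Optional

# Hypothetical role-to-model table; KitTools' real config keys may differ.
DEFAULT_MODELS: Dict[str, str] = {
    "implementation": "sonnet",
    "escalation": "opus",
}

# Per the 2.4.1 notes, only L and XL specs escalate on retry.
ESCALATING_SIZES = {"L", "XL"}


def pick_implementation_model(
    size: Optional[str],
    attempt: int,
    models: Dict[str, str] = DEFAULT_MODELS,
) -> str:
    """Return the model for an implementation session (attempt is 1-based)."""
    is_retry = attempt > 1
    if is_retry and size in ESCALATING_SIZES:
        return models["escalation"]
    # First attempts, and retries of S/M or unsized specs, stay on the cheaper model.
    return models["implementation"]
```

Keeping the first attempt on the cheaper model means even an L/XL story pays the higher price only after it has already failed once.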
#Changed
- Story sizing raised to 5–7 criteria — Sweet spot raised from 3–5 to 5–7 acceptance criteria per story. The old 3–5 range caused planners to drop criteria to fit, producing under-specified stories. The new guidance: more stories with well-defined criteria is always better than fewer stories with compressed scope.
- Story-quality-reviewer hard ceilings — Two new critical (execution-blocking) triggers: more than 10 acceptance criteria, or spanning 3+ architectural layers. Previously all oversized stories were warnings, meaning they could proceed to execution and time out.
- Plan-epic sizing step — The final scope check now includes guidance for setting `size:` frontmatter based on spec complexity.
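The new sizing rules translate into a simple classification. This is a minimal sketch: the `Story` shape and `story_severity` are hypothetical, and treating any count outside the 5–7 sweet spot as a warning is an assumption. The notes only say that more than 10 criteria or 3+ layers is critical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Story:
    # Hypothetical shape; KitTools parses stories from feature-*.md specs.
    story_id: str
    criteria: List[str]
    layers: Set[str] = field(default_factory=set)  # e.g. {"ui", "api", "db"}


def story_severity(story: Story) -> str:
    """Classify a story as 'critical', 'warning', or 'ok' by size."""
    count = len(story.criteria)
    if count > 10 or len(story.layers) >= 3:
        return "critical"  # hard ceiling: blocks execution
    if not 5 <= count <= 7:
        return "warning"  # assumption: outside the sweet spot is advisory only
    return "ok"
```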
#2.4.0 — 2026-04-17
Foundation refactor. 2.4.0 is a deep audit of the plugin: hardening, architectural cleanup, cross-agent consistency, and a full decomposition of the orchestrator. The user-visible workflow is unchanged — every skill you invoke still behaves the same — but the internals are substantially more robust and easier to extend.
#Added
- Model configurability — Per-run model selection for orchestrator-spawned sessions. Defaults: Sonnet for implementation (cost-optimized for bulk generation), Opus for verification (quality gate), and Opus for post-execution validation. Override per run via a `model_config` block in the execution config, or pick a preset at launch via `/kit-tools:execute-epic`.
- Unified finding schema for review agents — Every review agent (`code-quality-validator`, `security-reviewer`, `feature-compliance-reviewer`, `drift-detector`, `template-validator`, and all spec/vision reviewers) now emits findings in one canonical JSON shape. Skills parse a single format instead of three text-block dialects, which makes adding new review dimensions much easier.
- Feature spec frontmatter schema doc — New `templates/specs/SCHEMA.md` documents every valid field on `feature-*.md` and `epic-*.md` files, with validation rules and examples for standalone, epic-child, and epic-final cases.
- Vision review split — The single `vision-reviewer` agent (which had three different modes baked into one prompt) is now three focused agents: `vision-completionist-reviewer`, `vision-feasibility-reviewer`, and `vision-readiness-reviewer`. Each has one clear job and one output shape.
- Structured event logging — The orchestrator now writes a machine-readable JSONL event log (`kit_tools/.execution-events.jsonl`) alongside the human-readable stdout stream, instrumented at critical failure sites. Post-mortem debugging with `jq` now works directly on the log.
- EXECUTION_LOG.md rotation — The execution log now rotates once it passes 5 MB, keeping one `.1` backup. No more unbounded log growth across resumed runs.
- Clean-worktree precondition — The orchestrator now checks for a clean git worktree before creating any branches. If the tree is dirty or the directory isn’t a git repo, you get a clear error up front instead of a confusing failure deep in branch creation.
- Git recovery detection — When `git merge --abort` or `git revert` leaves the repo stuck in a `MERGING`, `REVERTING`, `CHERRY-PICKING`, or `REBASING` state, the orchestrator now detects it and escalates with specific remediation guidance rather than blindly retrying. It uses `git rev-parse --git-dir`, so it works inside linked worktrees too.
- State schema versioning — `.execution-state.json` carries a `schema_version` field. Newer-than-supported state aborts with a clear message; older-than-current state is tolerated (auto-upgraded on next save). Corrupt and malformed state is caught at load time instead of crashing downstream.
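The load-time checks described under state schema versioning can be sketched as follows. This assumes the state is a single JSON object and that files written before versioning existed count as version 1; `SUPPORTED_SCHEMA_VERSION`, `StateError`, and `load_state` are hypothetical names.

```python
import json

SUPPORTED_SCHEMA_VERSION = 2  # hypothetical "current" version


class StateError(Exception):
    """Raised for corrupt, malformed, or too-new execution state."""


def load_state(text: str) -> dict:
    try:
        state = json.loads(text)
    except json.JSONDecodeError as exc:
        raise StateError(f"corrupt state file: {exc}") from exc
    if not isinstance(state, dict):
        raise StateError("malformed state: expected a JSON object")

    # Assumption: files written before versioning existed are treated as v1.
    version = state.get("schema_version", 1)
    if isinstance(version, bool) or not isinstance(version, int):
        raise StateError(f"malformed schema_version: {version!r}")
    if version > SUPPORTED_SCHEMA_VERSION:
        raise StateError(
            f"state schema v{version} is newer than supported "
            f"v{SUPPORTED_SCHEMA_VERSION}; update the plugin before resuming"
        )
    # Older state is tolerated; stamping the current version means the
    # next save writes the upgraded schema.
    state["schema_version"] = SUPPORTED_SCHEMA_VERSION
    return state
```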
#Changed
- Orchestrator decomposition — The single 4,087-line `execute_orchestrator.py` is now a 13-module package (`utils`, `events`, `config`, `state`, `specs`, `prompts`, `sessions`, `tests_metrics`, `git_ops`, `supervisor`, `execution_log`, `executor`, `entry`). The CLI entry point remains `execute_orchestrator.py` as a thin shim: no skill changes, no workflow changes.
- Explicit tool grants on all agents — Every agent declares exactly which tools it’s allowed to use. Review agents can’t Edit source. Story-verifier can’t modify code. The “independent verifier” boundary is now enforced at the tool layer, not just at the prompt layer.
- Atomic state writes — State, health snapshots, test metrics, and control files are now written via temp file + fsync + atomic rename. Mid-write crashes no longer corrupt state, and the supervisor polling `.execution-health.json` never sees a partial file.
- `spec-second-opinion` model unpinned — The cross-model second-opinion reviewer no longer hardcodes Sonnet. The invoking skill picks a secondary model at invocation time (different from the primary), so the pattern keeps working as new models ship without re-authoring the agent.
- Supervisor cron cleanup extended — 2.3.1’s self-cleanup handled the `Completed` and “no execution state” cases. 2.4.0 extends it to `Crashed`, `Stale`, and `Failed` states, so a supervisor cron never lingers past a run that stopped making progress. The `/kit-tools:execute-epic` docs now explicitly state the cron’s lifetime (tied to the original launching session) and the laptop-sleep caveat.
- `/kit-tools:sync-project` description sharpened — The description now leads with the outcome instead of jargon. New “When to use” and “Outcome” sections make the quick, full, and resume modes easier to choose between.
- Prompt-injection hygiene — Nine code-reading agents (story-implementer, story-verifier, the validation reviewers, drift-detector, test-optimizer, generic-explorer, feature-fixer) now carry an explicit warning that the code, comments, and tool output they consume may contain adversarial prompt-injection attempts, and should be treated as text to analyze, never as instructions to execute.
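In Python, the temp file + fsync + atomic rename pattern looks roughly like the sketch below. It illustrates the general technique rather than KitTools' own helper, and `atomic_write_json` is a hypothetical name. The temp file is created in the destination directory because `os.replace` is only atomic within a single filesystem.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: object) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # bytes are on disk before the rename
        os.replace(tmp_path, path)  # readers see either the old file or the new one
    except BaseException:
        # Never leave a half-written temp file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A reader polling the target path, such as a supervisor reading a health file, never observes a truncated document. For full durability across power loss, you would also fsync the containing directory after the rename.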
#Fixed
- Orchestrator hang after SIGKILL — `proc.wait()` after killing a subprocess had no timeout. If SIGKILL didn’t take (zombie, permissions, uninterruptible sleep), the orchestrator could hang indefinitely. The wait is now bounded at 10 seconds: the orchestrator prefers leaking a PID over stalling a 24-hour autonomous run.
- Stuck merge/revert recovery — Previously, a failed `git merge --abort` or `git revert --abort` was logged as a warning and immediately retried, and the retry would also fail. Now each abort is checked for stuck state and raises with manual-remediation guidance.
- Worktree indirection for git recovery — Previously, the merge/revert state check looked at `project_dir/.git/MERGE_HEAD` directly, which fails in linked worktrees where `.git` is a file pointing elsewhere. It now uses `git rev-parse --git-dir` to follow the indirection correctly.
- Prompt substitution drift guard — If a prompt builder typo’d a token name (e.g., `{{STORRY_ID}}`), the malformed `{{...}}` marker would silently survive into the agent’s prompt. Now every built prompt is checked for leftover tokens, and a specific error points at the typo.
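A guard against unsubstituted tokens can be sketched with two regular expressions. This variant deliberately departs from checking the built prompt: it validates the template's tokens against the supplied values before substituting, so substituted code that happens to contain `{{...}}` is not flagged. `render_prompt`, `PromptTemplateError`, and the UPPER_SNAKE_CASE token convention are assumptions.

```python
import re
from typing import Dict

# Assumption: well-formed tokens look like {{STORY_ID}}.
TOKEN_RE = re.compile(r"\{\{([A-Z0-9_]+)\}\}")
ANY_BRACES_RE = re.compile(r"\{\{.*?\}\}")


class PromptTemplateError(ValueError):
    """Raised when a template contains an unknown or malformed token."""


def render_prompt(template: str, values: Dict[str, str]) -> str:
    malformed = [t for t in ANY_BRACES_RE.findall(template) if not TOKEN_RE.fullmatch(t)]
    unknown = sorted(set(TOKEN_RE.findall(template)) - set(values))
    if malformed or unknown:
        problems = malformed + ["{{" + name + "}}" for name in unknown]
        raise PromptTemplateError("unsubstituted prompt tokens: " + ", ".join(problems))
    # Single pass: values that themselves contain "{{...}}" are never re-expanded.
    return TOKEN_RE.sub(lambda m: values[m.group(1)], template)
```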
#Removed
- `/kit-tools:sync-symlinks` — Claude 3.5-era workaround for stale autocomplete symlinks. The plugin’s skill discovery works correctly without it now.
- `/kit-tools:update-kit-tools` — The standard `/plugin update kit-tools@washingbearlabs` does the same thing natively. To add templates or hooks that weren’t initially selected in your project, re-run `/kit-tools:init-project` and choose the merge option.
#2.3.1 — 2026-04-10
#Fixed
- Supervisor cron cleanup — The supervisor monitoring cron job now self-cleans when execution completes. Previously, the cron created by `/kit-tools:execute-epic` kept polling after the orchestrator finished and cleaned up its state files. Now `/kit-tools:execution-status` detects there’s nothing to monitor and deletes its own cron job.
#2.3.0 — 2026-04-06
#Added
- Supervisor monitoring mode — New `--monitor` option for autonomous and guarded execution. When enabled, the launching Claude session stays active as a supervisor, checking orchestrator health every 30 minutes. The supervisor can detect crashes, split oversized stories, pause on repeated failures, and restart the orchestrator, all without requiring system-level permissions (communication happens through JSON files, not shell commands).
- Story splitting — The supervisor can split stories that repeatedly fail due to scope. It writes full replacement story definitions (with proper US-NNN IDs) to a control file, and the orchestrator applies the split to the feature spec automatically.
- Graduated intervention — The supervisor follows an escalation path: observe retries → intervene after exhaustion → escalate to user if intervention fails. Prevents both premature intervention and runaway failure loops.
- 24-hour safety net — Orchestrator self-terminates after 24 hours with a critical notification.
- Health snapshots — Orchestrator writes health data (heartbeat, memory, PIDs, failure counts) after every story attempt. The supervisor reads these to assess health without running system commands.
- Test metrics tracking — New `kit_tools/testing/test-metrics.json` tracks per-file test pass/fail counts, durations, timeouts, and last run dates across orchestration runs. Portable JSON with no external dependencies.
- Verifier: `tests_run` result field — The verifier now reports which test files it executed, their pass/fail status, and duration. Feeds into test metrics for identifying slow or flaky tests.
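Per-file bookkeeping for a metrics file like `test-metrics.json` might look like the sketch below. The field names (`passes`, `failures`, `timeouts`, `total_duration_s`, `runs`, `last_run`), the flakiness rule, and the 30-second slow threshold are all assumptions; the release notes list only the kinds of data tracked.

```python
from datetime import date
from typing import Dict, List, Optional


def record_test_result(
    metrics: Dict[str, dict],
    test_file: str,
    passed: bool,
    duration_s: float,
    timed_out: bool = False,
    run_date: Optional[date] = None,
) -> None:
    """Update per-file counters in place; the dict stays JSON-serializable."""
    entry = metrics.setdefault(test_file, {
        "passes": 0, "failures": 0, "timeouts": 0,
        "total_duration_s": 0.0, "runs": 0, "last_run": None,
    })
    entry["runs"] += 1
    entry["total_duration_s"] += duration_s
    if timed_out:
        entry["timeouts"] += 1
        entry["failures"] += 1  # assumption: a timeout also counts as a failure
    elif passed:
        entry["passes"] += 1
    else:
        entry["failures"] += 1
    entry["last_run"] = (run_date or date.today()).isoformat()


def flaky_or_slow(metrics: Dict[str, dict], slow_threshold_s: float = 30.0) -> List[str]:
    """Files that have both passed and failed, or whose mean duration is high."""
    flagged = []
    for path, entry in sorted(metrics.items()):
        mean = entry["total_duration_s"] / entry["runs"] if entry["runs"] else 0.0
        if (entry["passes"] and entry["failures"]) or mean > slow_threshold_s:
            flagged.append(path)
    return flagged
```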
#Fixed
- Orchestrator: orphaned process cleanup — Claude sessions now kill their entire process group on normal exit, not just on timeout. Previously, child processes (pytest, vitest, node workers) spawned during sessions survived after the session completed, accumulating across stories in an epic and eventually exhausting system memory.
- Orchestrator: regression check process handling — Regression tests now run with proper process group isolation. Timeouts kill pytest and all its children instead of only the wrapper shell.
- Orchestrator: graceful process termination — Process groups are now terminated with SIGTERM first (with a grace period) before SIGKILL, allowing child processes to clean up.
- Orchestrator: tmux cleanup timeout — `kill_tmux_session` now has a timeout to prevent hanging if tmux is unresponsive.
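You can sketch process-group termination with the standard library: send SIGTERM to the whole group, allow a grace period, escalate to SIGKILL, and bound every wait. This sketch is POSIX-only and assumes the child was started with `start_new_session=True`, so its PID doubles as its process-group ID. `terminate_process_group` and its default timeouts are hypothetical; the 10-second kill wait mirrors the bound added in 2.4.0.

```python
import os
import signal
import subprocess


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the group is already empty


def terminate_process_group(
    proc: subprocess.Popen, grace_s: float = 5.0, kill_wait_s: float = 10.0
) -> bool:
    """Stop proc and every process in its group.

    Returns True if the leader exited, False if it is still running after
    SIGKILL (a leaked PID is preferred over hanging forever).
    """
    pgid = proc.pid  # valid because the child was started with start_new_session=True
    _signal_group(pgid, signal.SIGTERM)  # give pytest/node workers a chance to clean up
    try:
        proc.wait(timeout=grace_s)
        return True
    except subprocess.TimeoutExpired:
        pass
    _signal_group(pgid, signal.SIGKILL)
    try:
        proc.wait(timeout=kill_wait_s)  # bounded: never wait indefinitely
        return True
    except subprocess.TimeoutExpired:
        return False
```

Signalling the group rather than the single child reaches grandchildren such as test runners spawned by a session, which is what kept orphaned processes from piling up.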
#Changed
- Verifier: no more full-suite fallback — When targeted test detection finds no matches, the verifier now identifies and runs only relevant tests from the diff instead of falling back to the full test suite. Prevents multi-minute test runs in large codebases. Broader coverage is still enforced by the regression check and end-of-epic validation.
#2.2.2 — 2026-04-04
#Added
- New Skill: `/kit-tools:optimize-tests` — Full test suite audit covering mapping completeness, stale test detection, coverage overlap, performance profiling, KitTools convention alignment, and suite verification. Run it periodically to keep your test suite healthy as the codebase grows.
- Orchestrator: intelligent retry system — Failed stories now receive structured retry context based on failure type (timeout, test failure, criteria mismatch). The orchestrator classifies failures automatically and tailors guidance for each retry attempt.
- Orchestrator: adaptive timeouts — Implementation and verification sessions use separate timeout budgets (900s for implementation, 600s for verification). An optional `size: S/M/L/XL` field in spec frontmatter scales timeouts for larger stories.
- Orchestrator: pre-flight checks — Before each story, the orchestrator checks for oversized scope and test mapping gaps. Warnings are logged but don’t block execution.
- Orchestrator: cross-story regression detection — After merging a story, the orchestrator runs prior stories’ tests to catch regressions. If a regression is detected, the merge is reverted and execution halts with a notification.
- Orchestrator: learnings persistence — Execution learnings now persist across epics in a JSONL file. Future runs benefit from lessons learned in prior epics.
- Verifier: pass-with-warnings verdict — The verifier can now return a third verdict for non-blocking concerns (style, naming). Stories merge immediately; warnings accumulate for review during validation.
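Separate base budgets scaled by spec size might be computed like this. The 900s and 600s bases come from the notes above. The per-size multipliers are invented for illustration, since the notes don't state the actual scaling, and `session_timeout` is a hypothetical name.

```python
from typing import Optional

BASE_TIMEOUTS_S = {"implementation": 900, "verification": 600}

# Hypothetical multipliers; KitTools' real scaling is not documented here.
SIZE_MULTIPLIERS = {"S": 1.0, "M": 1.0, "L": 1.5, "XL": 2.0}


def session_timeout(kind: str, size: Optional[str] = None) -> int:
    """Timeout in seconds for an implementation or verification session."""
    base = BASE_TIMEOUTS_S[kind]
    # Unsized or unrecognized specs fall back to the base budget.
    multiplier = SIZE_MULTIPLIERS.get(size or "", 1.0)
    return int(base * multiplier)
```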
#Changed
- Orchestrator: smarter test targeting — Complete rewrite of test detection with tiered matching (T0: explicit mapping, T1: heuristic). Directory-scoped matching preferred over global search. Match caps prevent timeout-causing over-matching.
- Completionist reviewer — New “Integration & Wiring Completeness” dimension checks for UI gaps, unwired artifacts, missing cross-layer connections, and scope narrowness.
- Story quality reviewer — New anti-pattern detection (vague verbs, compound criteria) and story ordering checks.
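Tiered matching with a directory preference and a match cap could be sketched as below. Assumptions: T0 is a dictionary from source path to test paths (KitTools reads mappings from `TESTING_GUIDE.md`). T1 tries `test_foo.py`, `foo_test.py`, and `foo.test.ts`-style names. "Directory-scoped" is approximated by ranking matches on how many leading directories they share with the source file. `find_tests` and the default cap of 5 are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence


def _shared_dirs(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading path components two directory tuples have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def find_tests(
    source: str,
    test_files: List[str],
    explicit_map: Dict[str, List[str]],
    cap: int = 5,
) -> List[str]:
    # T0: an explicit mapping wins outright.
    if source in explicit_map:
        return sorted(explicit_map[source])[:cap]

    # T1: naming-convention heuristic.
    src = PurePosixPath(source)
    names = {
        f"test_{src.stem}{src.suffix}",
        f"{src.stem}_test{src.suffix}",
        f"{src.stem}.test{src.suffix}",
    }
    matches = [t for t in test_files if PurePosixPath(t).name in names]

    # Prefer tests near the source file, then cap to avoid over-matching.
    src_dirs = src.parent.parts
    matches.sort(key=lambda t: (-_shared_dirs(src_dirs, PurePosixPath(t).parent.parts), t))
    return matches[:cap]
```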
#2.2.1 — 2026-04-03
#Changed
- `/kit-tools:validate-feature` → `/kit-tools:validate-implementation` — Renamed to better reflect that this skill validates the implementation (code on a branch), not the feature spec itself. No behavioral changes.
- `/kit-tools:complete-feature` → `/kit-tools:complete-implementation` — Renamed for consistency with the epic-forward workflow. No behavioral changes.
- All cross-references updated across skills, agents, hooks, orchestrator, templates, and documentation.
#2.2.0 — 2026-04-01
#Added
- Pre-execution validation (`/kit-tools:validate-epic`) — Quality gate between planning and execution. Runs four sequential agent reviews on every feature spec in an epic before coding starts:
  - Completionist reviewer — Missing stories, uncovered goals, flow gaps
  - Story quality reviewer — Story sizing, ID format, vague criteria, integration scope
  - Salty engineer reviewer — Adversarial GAN-style review for implementation traps, hand-waving, and deployment risks
  - Second opinion (Sonnet) — Cross-model review using a different AI model to evaluate architecture decisions, feasibility, over-engineering, and alternative approaches. All alternatives require explicit trade-off statements.
  - Interactive: revise specs and re-run reviews between agents. Produces a go/no-go readiness verdict.
- Epic-first planning (`/kit-tools:plan-epic`) — All work is now structured as an epic, even single-spec features. Replaces the old binary “epic detection” gate with a scope assessment that determines how many feature specs are needed. Always generates an `epic-*.md` wrapper alongside feature specs.
- Desktop notifications for autonomous execution — The orchestrator now sends OS-level notifications (macOS and Linux) on story failures, execution completion, crashes, and pauses. No more discovering failures hours after they happen.
- `READ_ME.html` — Single-file HTML5 documentation page with an interactive 8-phase workflow flowchart, skills grid, hooks table, and install guide.
#Changed
- `/kit-tools:execute-epic` (formerly `execute-feature`) — Epic-first entry point: selects the epic from `epic-*.md` files and derives execution order from the Decomposition table.
- `/kit-tools:complete-implementation` — Enhanced learnings capture: gotchas go to GOTCHAS.md, conventions to CONVENTIONS.md, and spec-writing notes to Implementation Notes. Context-aware next steps point you to the next epic or feature.
- Workflow handoffs improved — `seed-project`, `start-session`, and `complete-implementation` now include clear next-step guidance, closing the workflow loop from init through completion and back to planning.
- Repositioned from “documentation framework” to “framework for AI-assisted development” — Updated across all repos and documentation.
#Removed
- `/kit-tools:plan-feature` — Replaced by `/kit-tools:plan-epic`
- `/kit-tools:execute-feature` — Replaced by `/kit-tools:execute-epic`
- `/kit-tools:migrate` — v1.x → v2.0 migration no longer supported as a dedicated skill
#2.1.4 — 2026-03-18
#Fixed
- Orphaned subprocess cleanup on timeout — Timed-out execution sessions now kill the entire process group (claude + all child processes like pytest, node, etc.) instead of just the direct child. Previously, orphaned test runners would accumulate and consume CPU indefinitely after session timeouts.
#2.1.3 — 2026-03-13
#Changed
- Smart test scoping — Story verification now runs only related tests instead of the full suite. Tests are matched by naming convention (e.g., `foo.py` → `test_foo.py`) or by explicit test mappings in your project’s `TESTING_GUIDE.md`. The full suite runs only at the validate-implementation gate.
- Test output control — Quiet flags suppress per-test PASSED noise while preserving full failure tracebacks and assertion diffs. A safety-net output cap prevents runaway output without hiding failure details.
#2.1.2 — 2026-03-11
#Added
- Inline diff for verifier — The verifier agent now receives the full diff content inline (up to 20KB) instead of reading files one-by-one via tool calls. Large diffs are truncated with a stat summary and the verifier falls back to reading full files.
- Fail-fast test flags — Verification test commands now include fail-fast flags for known runners (pytest `-x`, jest `--bail`, vitest `--bail 1`), stopping at the first failure instead of running the full suite. The full suite is preserved for validate-implementation.
- Completion strategy — Choose how execution finishes with a new `completion_strategy` option:
  - Create PR (default) — Pushes the branch and creates a GitHub PR via `gh`
  - Merge to main — Auto-merges to main (blocked if validation finds critical issues, falls back to PR)
  - None — Leaves the branch as-is for manual handling
  - The orchestrator now handles completion directly instead of spawning a separate Claude session
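Adding the runner-specific fail-fast flag to a detected test command might look like this. The flag table mirrors the notes (pytest `-x`, jest `--bail`, vitest `--bail 1`). Detecting the runner by scanning command tokens and appending the flag at the end are assumptions, and `with_fail_fast` is a hypothetical helper.

```python
import os
import shlex

FAIL_FAST_FLAGS = {
    "pytest": ["-x"],
    "jest": ["--bail"],
    "vitest": ["--bail", "1"],
}


def with_fail_fast(command: str) -> str:
    """Append the fail-fast flag for the first known runner in the command."""
    argv = shlex.split(command)
    for token in argv:
        flags = FAIL_FAST_FLAGS.get(os.path.basename(token))
        if flags is None:
            continue
        if flags[0] not in argv:  # don't add the flag twice
            argv.extend(flags)
        break
    # Unknown runners come back unchanged apart from normalized quoting.
    return shlex.join(argv)
```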
#Changed
- Verifier diff accuracy — Diffs use explicit commit-based two-dot syntax instead of merge-base, eliminating ambiguity in multi-commit scenarios
- Verifier workflow — Review step updated to start from the inline diff, using the Read tool only when more context is needed
- Epic completion — Epics now complete via the same completion strategy instead of spawning a separate completion session
- `/kit-tools:execute-feature` — New Step 2b prompts for the completion strategy; pre-flight checks verify `gh` auth when the PR strategy is selected
#2.1.1 — 2026-03-07
#Fixed
- Epic automation state mismatch — Fixed a crash when running epic execution in autonomous mode. The skill was pre-creating state with the wrong schema; now the orchestrator handles state creation for both single-spec and epic modes.
- Orchestrator crash resilience — Crash handler now registers before config load; leaked attempt branches are cleaned up on startup; archive operations are atomic (write-then-delete instead of modify-then-move)
- Agent output parsing — Orchestrator now handles common LLM output quirks: markdown code fences around JSON, preamble text, and trailing commas
- Verification session errors — Session errors are now checked before reading result files, preventing stale result reads
- Scratchpad hook feedback — Scratchpad creation failures are now reported instead of silently swallowed
- Placeholder validation accuracy — Tightened patterns to stop flagging legitimate markdown like `[note]` or `[example]` as unfilled placeholders
- Manifest completeness — Added missing templates to both SEED_MANIFEST and SYNC_MANIFEST
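Tolerant parsing of the three quirks named above (code fences, preamble text, trailing commas) can be sketched with the standard `json` and `re` modules. `parse_agent_json` is a hypothetical name. The trailing-comma regex is deliberately naive: it would also rewrite a `,}` or `,]` that appears inside a JSON string value.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this snippet can sit inside Markdown
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
TRAILING_COMMA_RE = re.compile(r",(\s*[}\]])")


def parse_agent_json(output: str) -> dict:
    text = output.strip()
    # 1. Prefer the contents of a fenced block if the model added one.
    fenced = FENCED_RE.search(text)
    if fenced:
        text = fenced.group(1)
    # 2. Drop preamble and trailing prose by keeping the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in agent output")
    text = text[start:end + 1]
    # 3. Remove trailing commas before a closing brace or bracket.
    text = TRAILING_COMMA_RE.sub(r"\1", text)
    return json.loads(text)
```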
#Changed
- Execute feature skill — State initialization now defers to the orchestrator for autonomous/guarded modes, preventing schema mismatches
- Execution status skill — Token estimate display handles missing data gracefully
- Story quality pre-flight — Execute-feature now checks story quality before launching (flags vague criteria, under-specified stories)
- Learnings cap — Per-story learnings capped at 20 at write time to prevent state file bloat
- Template versions — All 30 templates normalized to version 2.0.0
- Network retry clarity — Rewrote session retry logic for clearer error categorization
- Dead code cleanup — Removed unused functions and imports from orchestrator
#2.1.0 — 2026-03-04
#Added
- New Skill: `/kit-tools:create-vision` — Interactive product vision definition with AI-assisted review
  - Guided conversation captures your vision, target users, value proposition, success criteria, and feature areas
  - Two-pass review: completeness scoring across 6 dimensions, then feasibility assessment
  - Surfaces gaps and suggestions between rounds for iterative refinement
  - Produces `kit_tools/PRODUCT_VISION.md`, one strategic document per project
- New Template: `PRODUCT_VISION.md` — Singular root-level strategic document replacing Product Briefs
  - Sections: Vision Statement, Target Users & Personas, Value Proposition, Success Criteria, High-Level Feature Areas, Constraints & Assumptions, Open Questions
#Changed
- `/kit-tools:plan-feature` — Now checks for Product Vision instead of Product Briefs
  - Reads the vision doc for strategic context when planning features
  - Step 12 updates both `BACKLOG.md` and `MILESTONES.md` with priority confirmation
  - Feature specs use `vision_ref:` instead of `brief:` frontmatter
- `/kit-tools:init-project` — Recommended workflow updated: init → seed → create-vision → plan-feature
- `/kit-tools:migrate` — New vision/brief migration steps: creates a blank vision doc if missing, flags legacy briefs for review, checks v2.0 completeness
- Feature Spec and Epic templates — `brief:` field replaced with `vision_ref:` (references a section in PRODUCT_VISION.md)
#Removed
- Product Brief template (`PRODUCT_BRIEF.md`) — Replaced by Product Vision
#2.0.0 — 2026-03-01
#Breaking Changes
- `kit_tools/prd/` → `kit_tools/specs/` — The feature specs directory has been renamed. All internal paths, config keys, state keys, and agent tokens updated to match.
  - Run `/kit-tools:migrate` to update existing projects automatically
  - Config keys renamed: `prd_path` → `spec_path`, `epic_prds` → `epic_specs`
  - State keys renamed: `prd` → `spec`, `prds` → `specs`, `current_prd` → `current_spec`
#Changed
- `/kit-tools:migrate` rewritten — Now handles v1.x → v2.0 migration: directory rename, file renames (`prd-*.md` → `feature-*.md`), config/state key migration, hook path updates, and a documentation path sweep. All steps are idempotent, so it’s safe to run multiple times.
- Backwards compatibility preserved — The `detect_phase_completion` hook checks both `kit_tools/specs/` and `kit_tools/prd/` paths. Archive dependency lookups check both `feature-*.md` and `prd-*.md` patterns.
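An idempotent key rename, the core of the config/state migration above, can be sketched in a few lines. It handles top-level keys only and lets an already-migrated key win over its legacy twin. `rename_keys` and the two rename tables are illustrative, built from the key names listed under Breaking Changes.

```python
from typing import Dict

CONFIG_KEY_RENAMES = {"prd_path": "spec_path", "epic_prds": "epic_specs"}
STATE_KEY_RENAMES = {"prd": "spec", "prds": "specs", "current_prd": "current_spec"}


def rename_keys(data: Dict[str, object], renames: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of data with legacy top-level keys renamed.

    Idempotent: running it on already-migrated data changes nothing.
    """
    migrated: Dict[str, object] = {}
    for key, value in data.items():
        new_key = renames.get(key, key)
        if key != new_key and new_key in migrated:
            continue  # the new-style key is already present; keep it
        migrated[new_key] = value  # a later new-style key overwrites a legacy one
    return migrated
```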
#1.6.6 — 2026-03-01
#Added
- PRD Compliance Agent — PRD compliance review is now a dedicated subagent (`prd-compliance-reviewer`) that runs in parallel with code quality and security reviews during feature validation. Previously this ran inline in the validation session, consuming context window.
- Diff summarization — Large branch diffs are automatically truncated per-file (60KB budget) before being passed to validator agents. Agents are instructed to read full files when they need more context.
- Prompt size guard — Implementation and verification prompts are automatically trimmed if they approach context limits (480K chars). Removes prior learnings and previous attempt diffs first, with a hard-truncate fallback.
- Result schema validation — Agent result files are now validated on read. Missing required fields (like `story_id`, `status`, `verdict`) return clear errors instead of causing cryptic failures downstream.
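The trimming order described for the prompt size guard can be sketched as follows. This sketch assumes that the prompt is assembled from named sections joined by blank lines. The section names, `fit_prompt`, and the join format are hypothetical; the 480K-character limit and the drop order come from the notes.

```python
from typing import List, Sequence, Tuple

MAX_PROMPT_CHARS = 480_000
# Optional sections, dropped in this order when the prompt is too large.
DROP_ORDER = ("prior_learnings", "previous_attempt_diff")


def _join(sections: Sequence[Tuple[str, str]]) -> str:
    return "\n\n".join(text for _, text in sections)


def fit_prompt(
    sections: Sequence[Tuple[str, str]],
    max_chars: int = MAX_PROMPT_CHARS,
    drop_order: Sequence[str] = DROP_ORDER,
) -> str:
    kept: List[Tuple[str, str]] = list(sections)
    for name in drop_order:
        if len(_join(kept)) <= max_chars:
            break
        kept = [(n, text) for n, text in kept if n != name]
    prompt = _join(kept)
    # Hard-truncate fallback if the required sections alone are still too big.
    return prompt if len(prompt) <= max_chars else prompt[:max_chars]
```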
#Fixed
- Permanent error handling — Context window and token limit errors are now classified as permanent and cause immediate failure with notification, instead of retrying indefinitely
- PRD checkbox scoping — Checkbox replacement now uses a regex with line-start anchoring, preventing false positives when `- [ ]` appears inside descriptions or hint text
- Git operation visibility — All git operations now log warnings on failure instead of silently ignoring errors
- Pause timeout — Paused execution now auto-resumes after 24 hours with periodic log reminders, preventing indefinite hangs
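Line-start anchoring is what keeps a `- [ ]` mentioned mid-sentence from being ticked. A minimal sketch follows. `check_off_criterion` is hypothetical, and a real implementation would also scope the search to the story's own section of the spec.

```python
import re
from typing import Tuple


def check_off_criterion(spec_text: str, criterion: str) -> Tuple[str, bool]:
    """Tick the first unchecked checkbox line whose text is exactly `criterion`."""
    pattern = re.compile(
        # [ \t] rather than \s so the match can never span multiple lines.
        r"^([ \t]*)- \[ \] " + re.escape(criterion) + r"[ \t]*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        # A function replacement avoids backslash processing of the criterion text.
        lambda m: m.group(1) + "- [x] " + criterion,
        spec_text,
        count=1,
    )
    return new_text, count == 1
```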
#Changed
- Parallel validation — Feature validation Steps 3 (quality), 4 (security), and 5 (compliance) can now all run in parallel as independent subagents
#1.6.5 — 2026-02-26
#Fixed
- Nested session errors — The orchestrator now strips the `CLAUDECODE` environment variable before spawning `claude -p` subprocesses, eliminating the “cannot be launched inside another Claude Code session” error in autonomous/guarded mode
- Cleanup on error exits — All orchestrator exit paths (Ctrl+C, max retries, dependency failures, crashes) now properly clean up tmux sessions, commit tracking files, and remove temporary result files
- Merge conflict handling — If merging an attempt branch into the feature branch fails, the orchestrator now aborts the merge and retries instead of silently marking the story as completed
- Result file cleanup — Temporary result files are now cleaned on all retry paths, preventing stale data from being misread on restart
- Hook robustness — All hooks now wrap file I/O in error handling to prevent tracebacks on encoding errors or permission issues
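Stripping the variable amounts to passing a filtered copy of the environment to the child process. This is a minimal sketch. `child_env` and `run_claude_prompt` are hypothetical helpers; only the `claude -p` invocation and the `CLAUDECODE` name are taken from the notes.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def child_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment without CLAUDECODE, the variable that triggers the nested-session error."""
    env = dict(os.environ if base is None else base)
    env.pop("CLAUDECODE", None)
    return env


def run_claude_prompt(prompt: str, timeout_s: int = 900) -> subprocess.CompletedProcess:
    # Hypothetical wrapper: the filtered env lets the nested session start.
    return subprocess.run(
        ["claude", "-p", prompt],
        env=child_env(),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```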
#Changed
- Notifications simplified — Removed macOS native notifications (`osascript`). All execution progress is now reported through in-session notifications surfaced on your next prompt. No more context-switching to Notification Center.
- tmux self-cleanup — The orchestrator now kills its own tmux session on completion. No more orphaned sessions lingering after execution finishes.
#Added
- Git health check — `/kit-tools:start-session` now checks branch state, uncommitted changes, stash, remote sync status, and recent commits before orienting. Issues are flagged with suggestions, but no actions are taken without your approval.
- Plugin discoverability — Projects using KitTools now include an install hint in `SYNOPSIS.md` so new contributors can find and install the plugin.
#1.6.4 — 2026-02-23
#Added
- Execution Notification System — Two-pronged notifications keep you informed during autonomous/guarded execution
  - macOS native alerts fire immediately on completions, failures, crashes, and pauses, so there’s no need to check manually
  - In-session notifications via a `UserPromptSubmit` hook surface a batched summary the next time you send a message to Claude
  - Nine notification points cover the full execution lifecycle: story pass, story failure, single-PRD complete, validation pause, epic PRD complete, between-PRD pause, all epic PRDs complete, dependency blocked, and crash
- Crash detection — An `atexit` handler detects unexpected orchestrator exits, sets state to `crashed`, and sends both an OS alert and a file notification
- Crashed status in execution-status — `/kit-tools:execution-status` now recognizes the `crashed` state with resume/reset actions (same options as stale state)
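The `atexit` approach can be sketched as a guard object that marks the run as crashed unless the orchestrator explicitly records a clean exit. `CrashGuard`, the `status` field, and the state-file layout are hypothetical, and the OS alert and notification file are left out. Note that `atexit` handlers run on normal interpreter exit, including after an unhandled exception. They do not run when the process is killed by a signal Python doesn't handle (such as SIGKILL) or exits via `os._exit()`.

```python
import atexit
import json


class CrashGuard:
    """Marks execution state as crashed unless a clean exit was recorded."""

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path
        self.clean_exit = False
        atexit.register(self._on_exit)

    def mark_clean_exit(self) -> None:
        self.clean_exit = True

    def _on_exit(self) -> None:
        if self.clean_exit:
            return
        try:
            with open(self.state_path, encoding="utf-8") as f:
                state = json.load(f)
        except (OSError, json.JSONDecodeError):
            state = {}  # unreadable state: still record the crash
        state["status"] = "crashed"
        with open(self.state_path, "w", encoding="utf-8") as f:
            json.dump(state, f)
        # A full handler would also send the OS alert and file notification here.
```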
#Changed
- Distribution cleanup — Test files and dev dependencies removed from the shipped plugin. Only runtime files are included in installs.
#1.6.3 — 2026-02-23
#Fixed
- Unique tmux session names — Autonomous execution now uses descriptive, per-feature session names (`kit-exec-{feature}`) instead of a single hardcoded name
  - Running multiple projects concurrently no longer risks killing each other’s tmux sessions
  - Session names are stored in the execution config so `/kit-tools:execution-status` can find the right session
  - Backwards compatible with older runs
#1.6.2 — 2026-02-23
#Added
- New Skill: `/kit-tools:execution-status` — Check progress of autonomous execution from within Claude Code
  - Shows completion percentage, per-story status table, and session stats (tokens, time elapsed)
  - Detects stale state when the orchestrator has crashed or exited
  - Offers contextual actions based on current state: pause, resume, attach to tmux, retry
  - Epic mode: shows a per-PRD progress table
#1.6.1 — 2026-02-23
#Fixed
- Autonomous execution launch — The orchestrator now launches in a detached tmux session instead of running in the background from within a Claude session
  - Fixes nested `claude -p` calls being blocked by Claude Code’s recursion prevention
  - If tmux is not installed, a copy-pasteable command is printed for running in a separate terminal
  - Pre-flight checks now verify tmux availability for autonomous/guarded modes
  - Monitoring commands (attach, tail log, check state, pause) are reported after launch
#1.6.0 — 2026-02-22
#Added
- Unit Test Suite — 75 tests for the execute orchestrator covering PRD parsing, story extraction, prompt building, and test command detection
- File-Based Agent Results — Agents write structured JSON result files (
.story-impl-result.json,.story-verify-result.json) instead of stdout parsing, eliminating ~33% false failure rate from LLM output formatting - Branch-per-Attempt Strategy — Each implementation attempt runs on a temporary branch; successful attempts merge, failed attempts are deleted cleanly (no more destructive
git reset) - Patch-Based Retry Context — Failed attempt diffs are included in retry prompts so the agent takes a different approach
- Token Estimation — Per-session input/output token tracking logged in execution state
- Auto-Detect Test Command — Automatically finds the project’s test runner by checking package.json, pyproject.toml, pytest.ini, Makefile, and TESTING_GUIDE.md
- Test Execution in Validation — `/kit-tools:validate-implementation` now runs the project’s test suite; failed tests are logged as critical findings
- Auto-Injected Test Criteria — `/kit-tools:plan-feature` automatically adds “Tests written/updated” and “Full test suite passes” criteria to every code story (doc/config-only stories are exempt)
- Implementation Hints — Per-story hints flow from planning to implementation, reducing agent exploration time
  - `plan-feature` generates hints during refinement (key files, patterns, gotchas)
  - Implementer agent receives hints as part of its prompt
- Pause on Critical Findings — Autonomous execution pauses when validation finds critical issues, creating a `.pause_execution` file that references the findings. Execution resumes once the file is removed after review.
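Test command auto-detection can be sketched as a series of file probes. The files checked are the ones listed above. The probe order, the commands returned, and the “Test command: `...`” convention for `TESTING_GUIDE.md` are all assumptions, and `detect_test_command` is a hypothetical name.

```python
from __future__ import annotations

import json
import re
from pathlib import Path


def detect_test_command(root: Path) -> str | None:
    """Guess a test command for the project at `root` (hypothetical sketch)."""
    package_json = root / "package.json"
    if package_json.is_file():
        scripts = json.loads(package_json.read_text()).get("scripts", {})
        if "test" in scripts:
            return "npm test"

    pyproject = root / "pyproject.toml"
    if (root / "pytest.ini").is_file() or (
        pyproject.is_file() and "[tool.pytest" in pyproject.read_text()
    ):
        return "pytest"

    makefile = root / "Makefile"
    if makefile.is_file() and re.search(r"^test\s*:", makefile.read_text(), re.MULTILINE):
        return "make test"

    guide = root / "TESTING_GUIDE.md"
    if guide.is_file():
        # Assumed convention: a line such as "Test command: `pytest -q`".
        match = re.search(r"test command:\s*`([^`]+)`", guide.read_text(), re.IGNORECASE)
        if match:
            return match.group(1)

    return None
```

The probes run from most to least specific, so a project with both a `package.json` test script and a Makefile resolves to `npm test` in this sketch.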
#Changed
- YAML Parsing — Replaced hand-rolled frontmatter parser with PyYAML for proper handling of lists, booleans, and edge cases
- Verifier Independence — Verifier agent receives git-sourced file lists (`git diff --name-only`) instead of trusting implementer claims
- Reference-Based Context — Agent prompts pass file paths instead of inlining full contents, reducing prompt size by ~80% for large projects
- Skill Structure — Four pipeline skills (execute-feature, plan-feature, validate-implementation, complete-implementation) split into SKILL.md (core workflow) + REFERENCE.md (detailed formats and examples), reducing context consumption significantly
- PRD Template — Updated to v1.3.0 with Implementation Hints section and auto-injected test criteria
#Deprecated
- Stdout-based result parsing — Kept for backward compatibility but superseded by file-based JSON results
- `reset_to_commit()` — Replaced by the branch-per-attempt strategy
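The branch-per-attempt flow can be sketched as the git commands issued around each attempt. The `attempt/<story>-<n>` branch naming and the helper names are hypothetical. The sketch returns argument lists suitable for `subprocess.run(cmd, check=True)` instead of running git itself.

```python
from __future__ import annotations


def attempt_branch(story_id: str, attempt: int) -> str:
    # Hypothetical naming scheme for a temporary per-attempt branch.
    return f"attempt/{story_id.lower()}-{attempt}"


def start_attempt(base: str, story_id: str, attempt: int) -> list[list[str]]:
    # Each attempt starts on a fresh branch cut from the base branch.
    return [["git", "switch", "-c", attempt_branch(story_id, attempt), base]]


def finish_attempt(base: str, story_id: str, attempt: int, succeeded: bool) -> list[list[str]]:
    branch = attempt_branch(story_id, attempt)
    if succeeded:
        # Merge the successful attempt into the base, then drop the temp branch.
        return [
            ["git", "switch", base],
            ["git", "merge", "--no-ff", branch],
            ["git", "branch", "-d", branch],
        ]
    # On failure, capture the attempt's tracked-file changes relative to base
    # (usable as retry context), then switch back and force-delete the branch.
    # The base branch's history is never rewritten.
    return [
        ["git", "diff", base],
        ["git", "switch", "--discard-changes", base],
        ["git", "branch", "-D", branch],
    ]
```

Failure never touches the base branch, so no `git reset` is needed.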
#1.5.4 — 2026-02-19
#Fixed
- Hook path resolution — Project-level hook commands now use `$CLAUDE_PROJECT_DIR` instead of relative paths
  - Previously, hooks used `python3 kit_tools/hooks/...`, which breaks if the shell CWD drifts during a session
  - Now uses `python3 "$CLAUDE_PROJECT_DIR/kit_tools/hooks/..."`, which resolves correctly regardless of CWD
  - Fixes an infinite loop scenario where a Stop hook file-not-found error re-triggers the Stop event
  - Existing projects: run `/kit-tools:update-kit-tools` to get the updated hook paths
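The difference between the old and new hook paths can be illustrated with a small resolver. `resolve_hook_script` and `example_hook.py` are hypothetical and are not part of KitTools or Claude Code. The helper models where a script path ends up pointing with and without `CLAUDE_PROJECT_DIR` set.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def resolve_hook_script(script: str, env: Mapping[str, str], cwd: Path) -> Path:
    """Model where a hook command's script path points (illustration only).

    The old relative form resolves against whatever the shell's cwd happens
    to be; the new form anchors it to CLAUDE_PROJECT_DIR when it is set.
    """
    project_dir = env.get("CLAUDE_PROJECT_DIR")
    base = Path(project_dir) if project_dir else cwd
    return base / "kit_tools" / "hooks" / script
```

Called as `resolve_hook_script("example_hook.py", os.environ, Path.cwd())`, the path stays fixed even after the session `cd`s into a subdirectory. The relative form would instead point at a non-existent `src/kit_tools/hooks/...`.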
#1.5.3 — 2026-02-09
#Added
- Epic Chaining — Multi-PRD epics now execute automatically on a shared `epic/[name]` branch
  - PRD template gains `epic`, `epic_seq`, and `epic_final` frontmatter fields
  - `/kit-tools:execute-feature` detects epic PRDs and offers sequential execution
  - Orchestrator chains PRDs: stories -> validate -> tag checkpoint -> archive -> next PRD
  - Hard dependency gate blocks execution if `depends_on` PRDs aren’t archived
  - Git tags mark each PRD checkpoint (e.g., `oauth/oauth-schema-complete`)
  - Resume support: skips already-completed PRDs on restart
  - Cross-PRD learnings are carried forward to subsequent story prompts
- Pause Between PRDs — Option to review after each PRD before continuing the epic
  - Recommended default for epic execution
- Epic-Aware Completion — `/kit-tools:complete-implementation` handles mid-epic and final-epic PRDs
  - Mid-epic: tag + archive only (no PR or artifact cleanup)
  - Final epic PRD: PR references all PRDs and checkpoint tags
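Epic ordering, resume, and the dependency gate can be sketched together. The `Prd` dataclass is a hypothetical stand-in for a parsed PRD, with fields named after the frontmatter keys above. The tag pattern is inferred from the single `oauth/oauth-schema-complete` example. Treating "archived" as "completed" for resume purposes is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Prd:
    # Hypothetical stand-in for a parsed PRD's frontmatter.
    name: str
    epic: str
    epic_seq: int
    epic_final: bool = False
    depends_on: list[str] = field(default_factory=list)


def checkpoint_tag(prd: Prd) -> str:
    # Pattern inferred from the example tag `oauth/oauth-schema-complete`.
    return f"{prd.epic}/{prd.name}-complete"


def plan_epic_run(prds: list[Prd], archived: set[str]) -> list[Prd]:
    """Return the PRDs left to run, in epic_seq order.

    Archived PRDs are skipped (resume). A PRD whose dependencies are neither
    archived nor scheduled earlier in this run trips the hard gate.
    """
    pending = sorted((p for p in prds if p.name not in archived), key=lambda p: p.epic_seq)
    available = set(archived)
    for prd in pending:
        missing = [dep for dep in prd.depends_on if dep not in available]
        if missing:
            raise RuntimeError(f"{prd.name} blocked: unarchived dependencies {missing}")
        # Each PRD is archived before the next starts, so later PRDs may depend on it.
        available.add(prd.name)
    return pending
```

Running the gate up front means a bad dependency fails before any story executes, not partway through the epic.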
#Fixed
- Verifier output parsing — Strips markdown code fences before parsing, fixing a ~33% false failure rate when the verifier wrapped its output in triple backticks
  - Fallback verdict detection scans for pass/fail signals when the structured block is missing
  - Raw output is logged on parse failure for diagnosis
- Verification-only retry — When implementation succeeded but verifier parsing failed, retries now skip re-implementation and only re-run verification
- Failure detail sanitization — Log entries no longer contain raw template content from session errors
- Verifier template — Now explicitly instructs the LLM to output the structured block as plain text, not inside code fences
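Fence stripping with a fallback verdict scan can be sketched as below. The structured block is assumed to be a line such as `VERDICT: PASS`. The real verifier format is not shown in these notes, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# A fence line: three backticks, optionally followed by a language tag.
_FENCE_LINE = re.compile(r"^`{3}[\w-]*[ \t]*$", re.MULTILINE)
# Assumed structured block: a line like "VERDICT: PASS".
_VERDICT = re.compile(r"^VERDICT:[ \t]*(PASS|FAIL)\b", re.MULTILINE | re.IGNORECASE)


def strip_code_fences(text: str) -> str:
    return _FENCE_LINE.sub("", text).strip()


def parse_verdict(raw: str) -> str | None:
    """Return "PASS", "FAIL", or None when the output is ambiguous."""
    text = strip_code_fences(raw)
    match = _VERDICT.search(text)
    if match:
        return match.group(1).upper()
    # Fallback: loose pass/fail words when the structured block is missing.
    has_fail = re.search(r"\b(fail|failed|failing)\b", text, re.IGNORECASE)
    has_pass = re.search(r"\b(pass|passed|passes)\b", text, re.IGNORECASE)
    if has_fail and not has_pass:
        return "FAIL"
    if has_pass and not has_fail:
        return "PASS"
    return None  # ambiguous: the caller would log the raw output for diagnosis
```

Returning `None` for mixed signals, instead of guessing, is what makes a verification-only retry possible.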
#Changed
- Orchestrator — Refactored into `run_single_prd()` and `run_epic()` with a shared story execution loop
- `/kit-tools:plan-feature` — Epic decomposition now sets chaining fields (`epic`, `epic_seq`, `epic_final`)
- `/kit-tools:execute-feature` — Epic detection, dependency hard gate, `epic/[name]` branching, `epic_prds` config format
#1.5.2 — 2026-02-07
#Added
- New Skill: `/kit-tools:validate-implementation` — Full branch-level validation against the PRD
  - Reviews the entire branch diff (`git diff main...HEAD`), covering all changes across the feature
  - Three independent review passes: code quality, security, and PRD compliance
  - Automatic fix loop (max 3 iterations) for critical findings
  - Autonomous mode: spawns a fixer agent; supervised mode: fixes inline
- Dedicated Security Review Agent — Security gets focused attention in its own review pass
  - Covers injection vulnerabilities, auth gaps, secrets, input validation, insecure defaults, and dependency risks
- Dedicated Fix Agent — Targeted fixes for validation findings in autonomous mode
- Automatic validation after execution — The orchestrator now spawns a validation session after all stories complete
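The bounded fix loop can be sketched with the review and fix steps passed in as callables. `fix_loop` is a hypothetical name. In the real skill, `review` would be a validation pass and `fix` would be the fixer agent or an inline fix, so the representation of findings as strings is also an assumption.

```python
from __future__ import annotations

from typing import Callable

MAX_FIX_ITERATIONS = 3  # the "max 3 iterations" limit described above


def fix_loop(
    review: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_iterations: int = MAX_FIX_ITERATIONS,
) -> list[str]:
    """Alternate fix and re-review; return critical findings still open."""
    findings = review()
    for _ in range(max_iterations):
        if not findings:
            break
        fix(findings)
        findings = review()
    return findings
```

An empty return value means the branch is clean. Anything else is left for human review rather than retried indefinitely.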
#Changed
- Code quality validator — Narrowed to quality-only (security and intent alignment moved to dedicated agents)
- `/kit-tools:execute-feature` — Completion messaging now directs users to validate-implementation
- `/kit-tools:complete-implementation` — Now cleans up execution artifacts, handles the feature branch (PR/merge), and references validate-implementation
- `/kit-tools:close-session` and `/kit-tools:checkpoint` — Use inline quality checks for session-level diffs instead of the full feature validation
- `detect_phase_completion` hook — Only suggests validate-implementation when all PRD criteria are complete, not on every checkbox
#Removed
- `/kit-tools:validate-phase` — Replaced by validate-implementation (branch-level validation)
#1.5.1 — 2026-02-06
#Added
- New Skill: `/kit-tools:sync-symlinks` — Force-refresh skill symlinks after a plugin update
  - Reads `installed_plugins.json` to find the correct install path
  - Useful when skills appear stale after `/plugin update`
#Fixed
- `sync_skill_symlinks` hook — Now reads `~/.claude/plugins/installed_plugins.json` as the source of truth for the plugin install path
  - Fixes an issue where skill symlinks remained pointed at the previous version after a plugin update
  - `$CLAUDE_PLUGIN_ROOT` can be stale after updates; the hook now bypasses it in favor of the authoritative JSON
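Looking up the install path from `installed_plugins.json` might look like the sketch below. The file's schema is an assumption here, not documented by these notes. The sketch expects a top-level `"plugins"` object keyed by `"<plugin>@<marketplace>"`, whose values are either a single entry or a list of entries carrying an `"installPath"` field. Both function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def find_install_path(manifest: dict[str, Any], plugin_name: str) -> str | None:
    """Find a plugin's install path in parsed installed_plugins.json data.

    Assumed schema: {"plugins": {"<plugin>@<marketplace>": entry | [entry, ...]}}
    where each entry may carry an "installPath" string.
    """
    for key, value in manifest.get("plugins", {}).items():
        if key.split("@", 1)[0] != plugin_name:
            continue
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            path = entry.get("installPath")
            if path:
                return path
    return None


def load_install_path(plugin_name: str, manifest_path: Path | None = None) -> str | None:
    # Default location taken from the changelog entry above.
    manifest_path = manifest_path or Path.home() / ".claude" / "plugins" / "installed_plugins.json"
    return find_install_path(json.loads(manifest_path.read_text()), plugin_name)
```

Reading the manifest on each run, rather than trusting an environment variable captured at session start, is what keeps the symlinks in step with the latest update.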
#1.5.0 — 2026-02-06
#Added
- Native Autonomous Execution — `/kit-tools:execute-feature` replaces the previous Ralph integration
  - Three execution modes: Supervised, Autonomous, and Guarded
  - Supervised: in-session with user review between stories
  - Autonomous: spawns independent `claude -p` sessions per story (unlimited retries by default)
  - Guarded: autonomous with human oversight on failures (3 retries by default)
- Story Implementer Agent — `agents/story-implementer.md` implements a single user story
  - Explores the codebase, implements changes, self-verifies, and commits
  - Structured output format for orchestrator parsing
- Story Verifier Agent — `agents/story-verifier.md` independently verifies acceptance criteria
  - Skeptical assessment: reads actual code and doesn’t trust implementer claims
  - Runs typecheck/lint/tests as specified in criteria
- Execution Orchestrator — `scripts/execute_orchestrator.py` manages multi-session execution
  - Spawns fresh Claude sessions per story (implementation + verification)
  - Pause/resume via `touch kit_tools/.pause_execution`
  - Dual-track state: PRD checkboxes + JSON sidecar
  - Execution log at `kit_tools/EXECUTION_LOG.md`
- Git Branch Isolation — All execution happens on `feature/[prd-name]` branches
  - Failed retries reset the working tree and never touch main
  - Branch is ready for user review when all stories complete
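The pause file check can be sketched as a poll that the orchestrator runs between stories. Polling is an assumption, as is the 5-second interval. `wait_if_paused` is a hypothetical name, and the injectable `sleep` exists only so the function can be tested without waiting.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable

PAUSE_FILE = Path("kit_tools/.pause_execution")


def wait_if_paused(
    pause_file: Path = PAUSE_FILE,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], object] = time.sleep,
) -> bool:
    """Block while the pause file exists; return True if execution was paused."""
    paused = False
    while pause_file.exists():
        paused = True
        sleep(poll_seconds)
    return paused
```

Running `touch kit_tools/.pause_execution` makes the next check block. Deleting the file lets the loop resume with the next story.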
#Changed
- PRD Template — `ralph_ready` field renamed to `session_ready`
- `/kit-tools:plan-feature` — Removed Ralph references; uses `session_ready` and `execute-feature`
- `/kit-tools:complete-implementation` — Removed Ralph cleanup step, updated Related Skills
#Removed
- `/kit-tools:export-ralph` — Replaced by native `execute-feature`
- `/kit-tools:import-learnings` — Learnings are captured natively during execution
#1.4.0 — 2025-02-02
#Added
- Epic Detection & Decomposition — `/kit-tools:plan-feature` now detects large features and decomposes them
  - Automatic detection of epic-sized scope (>7 stories, multiple subsystems, scope keywords)
  - Proposes breakdown into multiple focused PRDs
  - Tracks dependencies between related PRDs with the `depends_on` field
- Ralph-Ready Validation — `/kit-tools:export-ralph` validates PRD scope before export
  - Checks story count (target <=7) and acceptance criteria count (target <=35)
  - Soft warning with strong recommendation if PRD exceeds limits
  - Suggests decomposition via `plan-feature` if PRD is too large
- Senior Dev Persona — Skills now act as senior dev reviewers
  - Push back on scope creep and poorly-scoped PRDs
  - Ensure PRDs are set up for implementation success
#Changed
- PRD Template — Updated to v1.1.0 with new frontmatter fields
  - `ralph_ready: true/false` — Indicates if PRD is properly scoped
  - `depends_on: []` — Array of feature names this PRD depends on
  - Added session-fit guidelines in template comments
- `/kit-tools:plan-feature` — Enhanced with scope validation
  - Final scope check before generating PRD
  - Story count limits (5-7 ideal, 8+ triggers warning)
  - Acceptance criteria limits (3-5 per story, <=35 total)
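The scope limits above can be expressed as a small checker. `check_scope` is a hypothetical name and takes one criteria count per story. The thresholds come from the notes: 8 or more stories, 3-5 criteria per story, and at most 35 criteria in total. Everything returned is a soft warning, matching the advisory behavior described.

```python
from __future__ import annotations


def check_scope(criteria_per_story: list[int]) -> list[str]:
    """Return soft scope warnings for a PRD (empty list = within limits)."""
    warnings: list[str] = []
    stories = len(criteria_per_story)
    if stories >= 8:
        warnings.append(f"{stories} stories (5-7 ideal); consider decomposing into an epic")
    for i, count in enumerate(criteria_per_story, start=1):
        if not 3 <= count <= 5:
            warnings.append(f"story {i} has {count} acceptance criteria (3-5 recommended)")
    total = sum(criteria_per_story)
    if total > 35:
        warnings.append(f"{total} acceptance criteria in total (target <=35)")
    return warnings
```

For example, eight stories with five criteria each produce two warnings: one for the story count and one for the 40-criterion total.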
#1.3.0 — 2025-02-01
#Added
- PRD (Product Requirements Document) System — New workflow for feature planning
  - `kit_tools/prd/` directory for PRD files with YAML frontmatter
  - `kit_tools/prd/archive/` for completed PRDs
  - PRD template with user stories (US-XXX), acceptance criteria, and functional requirements (FR-X)
- New Skill: `/kit-tools:complete-implementation` — Mark PRD as completed and archive it
- New Skill: `/kit-tools:export-ralph` — Convert KitTools PRD to ralph’s `prd.json` format
- New Skill: `/kit-tools:import-learnings` — Import ralph `progress.txt` learnings back to PRD
#Changed
- `/kit-tools:plan-feature` — Now generates PRDs (`prd-[name].md`) instead of `FEATURE_TODO_*.md`
  - User story format with acceptance criteria
  - Functional requirements in FR-X format
  - Implementation Notes section for capturing learnings
- `/kit-tools:start-session` — Now checks `kit_tools/prd/` for active features
- `/kit-tools:close-session` — Prompts for Implementation Notes when working on a PRD
- `/kit-tools:checkpoint` — Captures learnings to the active PRD’s Implementation Notes
#1.1.0 — 2025-01-28
#Added
- New Skill: `/kit-tools:validate-phase` — Code quality, security, and intent alignment validation
  - Three-pass review: quality & conventions, security, intent alignment
  - Findings written to a persistent `AUDIT_FINDINGS.md` with unique IDs and severity tracking
- New Agent: `code-quality-validator.md` — Prompt template for the validation subagent
- New Template: `AUDIT_FINDINGS.md` — Persistent audit findings log
  - Status tracking (open / resolved / dismissed)
  - Severity levels (critical / warning / info)
- New Hook: `detect_phase_completion` — Advisory hook for TODO task completions
#Changed
- `/kit-tools:checkpoint` — Added validation step for code changes
- `/kit-tools:close-session` — Added validation step
- `/kit-tools:start-session` — Reviews open audit findings
#1.0.0 — 2025-01-27
#Added
- Initial public release
- Core Skills: init-project, seed-project, migrate, start-session, close-session, checkpoint, plan-feature, sync-project, update-kit-tools
- Automation Hooks: create_scratchpad, update_doc_timestamps, remind_scratchpad_before_compact, remind_close_session
- Project Type Presets: API/Backend, Web App, Full Stack, CLI Tool, Library, Mobile, Custom
- 25+ Documentation Templates across Core, API, Ops, UI, and Patterns categories