Capitão Command Center — Plan

Part III — Data Model

8. Graph substrate

The data layer is fully specified in docs/architecture/data-layer.md. Summary of what v1.5 locks:

Core schema:

CREATE TABLE nodes (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  type            text NOT NULL,
  status          text,                                       -- projection column
  priority        text,                                       -- projection column
  occurred_at     timestamptz,                                -- projection column
  archived_at     timestamptz,                                -- soft delete
  props           jsonb NOT NULL DEFAULT '{}',
  tags            text[] DEFAULT '{}',
  full_text       tsvector,
  embedding       vector(512),                                -- voyage-3.5-lite output (ADR #82)
  producer_id     uuid NOT NULL REFERENCES nodes(id),         -- provenance
  owner_id        uuid REFERENCES nodes(id),
  external_source text,
  external_id     text,
  valid_during    tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  created_at      timestamptz DEFAULT now(),
  updated_at      timestamptz DEFAULT now(),
  redaction_policy text DEFAULT 'none',
  confidence      real
);

CREATE TABLE edges (
  id           uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  from_id      uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  to_id        uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  type         text NOT NULL,
  props        jsonb NOT NULL DEFAULT '{}',
  valid_during tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  producer_id  uuid NOT NULL REFERENCES nodes(id),
  confidence   real,
  created_at   timestamptz DEFAULT now()
);

Bitemporal shadow tables (history_nodes, history_edges) record every mutation with op (insert/update/delete/archive), actor (from hq.actor GUC), reason (from hq.reason GUC), full prior row as JSONB.
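A minimal sketch of the write path feeding those triggers, assuming psycopg 3 (the DSN and helper name are illustrative). set_config(..., true) scopes the GUCs to the transaction, so the AFTER trigger sees the right actor and reason and nothing leaks between writes:

```python
import psycopg

# Assumes the schema above plus AFTER triggers that copy the hq.* GUCs
# into history_nodes. The DSN and function name are illustrative.
def update_node_status(dsn: str, node_id: str, new_status: str,
                       actor: str, reason: str) -> None:
    with psycopg.connect(dsn) as conn:
        with conn.cursor() as cur:
            # set_config(..., true) limits the GUC to this transaction only
            cur.execute("SELECT set_config('hq.actor', %s, true)", (actor,))
            cur.execute("SELECT set_config('hq.reason', %s, true)", (reason,))
            cur.execute(
                "UPDATE nodes SET status = %s, updated_at = now() WHERE id = %s",
                (new_status, node_id),
            )
        # commit on context exit; the AFTER trigger writes the history row
```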

Indexes: btree on id/external_id; gin on props (jsonb_path_ops), full_text, tags; hnsw on embedding; gist on valid_during; partial btree on (type, status), (type, priority), (type, occurred_at).

Extensions used: pgcrypto, pg_trgm, btree_gin, pg_stat_statements, pg_cron, AGE (Cypher fallback), pgvector, pgvectorscale, timescaledb-apache, pg_partman, pg_duckdb, pg_search, auto_explain.

9. Node types (complete catalog — 28 types)

CRM core:

| Type | Purpose |
| --- | --- |
| entity | Person or organization (customers, partners, prospects, own ventures) |
| contact | Individual contact; multi-entity-capable |
| engagement | Commercial unit — one per proposed deal; stages discovery→proposal→contract→delivery (or partner/declined) |

Execution hierarchy:

| Type | Purpose |
| --- | --- |
| project | Execution wrapper; priority axis + status + tech stack |
| feature | Discrete deliverable within a project; complexity + acceptance criteria |
| task | Unit of work; dev_* + ops_* fields; Todoist mirror |
| deliverable | Shipped artifact (URL, repo commit, document) — distinct from document |

Communication:

| Type | Purpose |
| --- | --- |
| interaction | Every conversation — email, WhatsApp, phone, meeting |
| conversation | Session wrapper around agent runs, meetings, or email threads |

Commercial flow:

| Type | Purpose |
| --- | --- |
| quote | Proposal draft before engagement promotion |
| invoice | Mirrored from TOConline |
| payment | Settles invoice; partial/full/credit note |
| expense | Charged to engagement |

Signal & value layer (the v1.4b unlock):

| Type | Purpose |
| --- | --- |
| intent | What a customer is asking for (ask_reply, ask_budget, ask_document, ask_development, ask_fix, ask_meeting, ask_decision, inform, confirm, approve, complain, churn_signal, expand_signal, thank) |
| expectation | Accountability unit — what's owed, by whom, when; we_owe / they_owe / mutual |
| turn_state | Per-conversation state: theirs / ours / third_party / blocked / closed |

Knowledge & outputs:

| Type | Purpose |
| --- | --- |
| document | File, PDF, transcript, invoice, contract |
| memo | Synthesized output (WBR, briefing, proposal text) |
| kb_article | Wiki mirror from ~/knowledge-base/wiki/ |
| decision | ADR; supersedable |
| risk / open_question | Unresolved concerns affecting projects |

Agent & governance:

| Type | Purpose |
| --- | --- |
| producer | Intake registry — every data source is a first-class node |
| event | Timeline marker (milestone / development / action / state_change) |
| agent_run | One tool-calling session of an agent |
| agent_session | Supervisor-spawned session; parent of agent_runs |
| agent_handover_memo | Memory bridge between sessions |
| memory_entry | Distilled fact, pattern, preference, anti_pattern, watch_item — with salience + decay |
| reasoning_trace | LLM call audit (inputs, output, confidence, canary_id) |
| agent_incident | Watchdog-detected issue |
| review | Confidence-gated or destructive decision awaiting human/agent resolution |
| playbook | Procedural memory (auto-proposed, canary-rolled, auto-demoted) |
| proposal | Agent's proposed write; awaits dispatcher resolution |
| triage_decision | Companion node capturing triage-proposed values vs. actual writes |
| alert | Prometheus-fired alert |
| shopify_order | Imported from Shopify |

10. Edge types (complete catalog)

Provenance & participation:

Containment & composition:

Relationships:

Commercial:

Dependencies & dependency semantics (v1.5 richer than v1.3.1):

Intent & expectation flow:

Agent & learning:

Multi-role assignment (DACI):

11. Producer registry (#26)

Every fact carries produced_by → producer. Five kinds:

| Kind | Example slugs | Attribution |
| --- | --- | --- |
| conversation_channel | email:zoho-wilson, whatsapp:wilson-personal, meeting:zoom, phone:meo | Per-channel ingest worker |
| agent_session | claude-code:vps, claude-code:laptop, hermes:vps, langchain:vps | Per-session; agent_run rolled up |
| project | project:garq-pdm-consulta, project:capitao-command-center | Semantic emission only (events, not raw) |
| external_system | toconline:capitao, shopify:petvitaclub, todoist:wilson, github:wcapitao | Per-system poller / webhook |
| internal_worker | cc:view-renderer, cc:triage-worker, cc:agent:ventures:chief, cc:agent:ventures:operator, cc:agent:project:garq-pdm | Self-referential |

Each agent is a producer.kind='internal_worker' with a role prop (ventures:chief, ventures:operator, project:garq-pdm, etc.). This enables per-agent queries: "show me everything chief produced this week" is one WHERE producer.slug = 'cc:agent:ventures:chief'. Subagent attribution flows through the action ledger's parent_id (#75 §1).
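A sketch of that per-agent query, assuming the producer slug lives in the producer node's props->>'slug' (this section does not pin down where the slug is stored) and psycopg 3:

```python
import psycopg

# "Everything chief produced this week" reduces to one join on producer_id.
WEEKLY_BY_PRODUCER = """
SELECT n.id, n.type, n.created_at
  FROM nodes n
  JOIN nodes p ON p.id = n.producer_id AND p.type = 'producer'
 WHERE p.props->>'slug' = %(slug)s
   AND n.created_at >= now() - interval '7 days'
 ORDER BY n.created_at DESC
"""

def produced_this_week(conn: psycopg.Connection, slug: str) -> list[tuple]:
    with conn.cursor() as cur:
        cur.execute(WEEKLY_BY_PRODUCER, {"slug": slug})
        return cur.fetchall()

# produced_this_week(conn, "cc:agent:ventures:chief")
```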

12. Intent + expectation + turn_state — the value unlock

These three node types elevate the schema from "record of what happened" to "state of obligations and turns."

12.1 Intent

Every interaction produces 0..N intents via the enrichment worker.

intent
  kind                ask_reply | ask_budget | ask_document | ask_development
                      | ask_fix | ask_meeting | ask_decision | ask_intro
                      | inform | confirm | approve | complain
                      | churn_signal | expand_signal | thank
  urgency             blocker | impactful | nice
  explicit_due_at     timestamptz (when the sender stated a deadline)
  confidence          0.0-1.0 (enrichment confidence)
  evidence_span       text quote
  status              open | fulfilled | abandoned | superseded

  edges:
    extracted_from → interaction
    about          → entity | engagement | project | feature
    addressed_to   → contact
    fulfilled_by   → interaction | document | event | task

12.2 Expectation

Every promise — ours or theirs — is an expectation.

expectation
  kind                commitment_made | ask_received | sla | recurring_obligation
                      | deliverable_promised
  direction           we_owe | they_owe | mutual
  asked_at            timestamptz
  due_at              timestamptz (nullable)
  resolved_at         timestamptz (nullable)
  status              open | overdue | resolved | abandoned | superseded
  severity            blocker | impactful | nice
  description_md      short text
  sla_source          engagement | policy | explicit | derived

  edges:
    about           → entity | engagement | project | feature
    spawned_from    → intent | event | interaction
    owed_by         → contact | entity
    owed_to         → contact | entity
    fulfilled_by    → interaction | document | event | task
    supersedes      → expectation (when replaced)

Recurring obligations (monthly invoice, weekly standup, quarterly review) use kind='recurring_obligation' + RRULE. pg_cron materializes the next instance as the predecessor closes.
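A sketch of that materialization step, assuming python-dateutil for RRULE parsing; the RRULE text is read off the closing instance (the prop key holding it is hypothetical):

```python
from datetime import datetime, timezone
from dateutil.rrule import rrulestr  # pip install python-dateutil

def next_instance(rrule_text: str, closed_at: datetime) -> datetime | None:
    """First occurrence strictly after the predecessor closed."""
    rule = rrulestr(rrule_text, dtstart=closed_at)
    return rule.after(closed_at)

# e.g. a monthly invoice obligation:
nxt = next_instance("FREQ=MONTHLY;BYMONTHDAY=1",
                    datetime(2026, 4, 22, tzinfo=timezone.utc))
# -> 2026-05-01 00:00:00+00:00; the pg_cron job would then INSERT a fresh
#    expectation with due_at = nxt and spawned_from -> the closed instance.
```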

12.3 turn_state

Maintained per conversation by the enrichment worker.

turn_state (1:1 with conversation)
  state                 theirs | ours | third_party | blocked | closed
  last_turn_at          timestamptz
  last_turn_by          contact_id | agent_id
  turnaround_sla_hours  integer (inherited from engagement)
  overdue_at            timestamptz (computed: last_turn_at + sla when state='ours')

One query answers "who am I ignoring right now?":

SELECT conversation.id, entity.name, now() - last_turn_at AS waiting
  FROM turn_state JOIN conversations USING (conversation_id)
                  JOIN entities ON ...
 WHERE state = 'ours' AND overdue_at < now()
 ORDER BY overdue_at ASC;

13. Memory persistence contract (#63, amended by #71)

Every persistent agent's memory is three-layered. Full rationale in DECISIONS.md #71 and #63 (amended).

13.0 Three-layer model

  1. Static layer — per-agent .md files in the workspace (agents/<scope>/<slug>/CLAUDE.md, playbook.md, personality.md; project agents add customer-profile.md, domain-knowledge.md). Committed to git, rarely changes, token-budgeted (see §17.x). Loaded verbatim as static context at session start.
  2. Dynamic working layer — MEMORY.md per agent. Anthropic-standard 25 KB cap. Written by the agent during a session via the `memory: project` SDK frontmatter. Mirrored from the graph nightly by memory-tender. Not authoritative — a cache only.
  3. Long-term graph-backed layer — memory_entry, agent_handover_memo, reasoning_trace, agent_session nodes (unchanged, see §13.1–§13.3 below). Authoritative source of truth.

Invariant C11': MEMORY.md is a cache. Agent code MUST NOT treat MEMORY.md as authoritative. Missing or stale files MUST be rebuilt by memory-tender from the graph. agent-watchdog (#64) enforces this: if MEMORY.md is absent or stale beyond decay_after_days on session open, a tender pass runs before any event is processed.
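A minimal sketch of that session-open guard, assuming staleness is judged by file mtime against decay_after_days; rebuild_from_graph is a hypothetical stand-in for the tender pass:

```python
import time
from pathlib import Path

def ensure_memory_cache(memory_md: Path, decay_after_days: int = 30) -> None:
    """C11' guard: MEMORY.md is a cache; rebuild it before any event runs."""
    stale = (
        not memory_md.exists()
        or time.time() - memory_md.stat().st_mtime > decay_after_days * 86400
    )
    if stale:
        rebuild_from_graph(memory_md)  # hypothetical memory-tender entrypoint

def rebuild_from_graph(memory_md: Path) -> None:
    raise NotImplementedError("memory-tender sync pass goes here")
```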

Context-budget alarm (from #59) lowered from 75% to 70% to compensate for the static layer.

Every persistent agent's long-term memory lives in the graph. Three node types form the contract:

13.1 agent_session

One per agent run lifecycle. Created on spawn; finalized on task-complete or rotation.

agent_session
  agent_id
  started_at / ended_at
  turns_taken
  cost_usd
  model_used
  trigger_kind            beat | outbox | dm
  task_summary_md         short narrative of what this session did
  snapshot_md             the handover memory snapshot
  reason_closed           from <task-complete reason="..."/>

  edges:
    produced_by → agent's producer node
    part_of     → conversation (if multi-session task)

13.2 memory_entry

Distilled facts the agent carries forward.

memory_entry
  kind                 fact | pattern | preference | anti_pattern | watch_item
  body_md              ≤500 chars, concise
  salience             0.0-1.0
  created_in_session   uuid
  last_validated_at    timestamptz
  decay_after_days     int (default: 30 facts, 90 patterns, 180 prefs, infinite anti_patterns)
  tags                 text[]

  edges:
    about         → entity | project | feature | engagement (optional)
    produced_by   → agent_producer
    validated_by  → reasoning_trace (when agent reconfirms in later session)
    superseded_by → memory_entry (when replaced)

13.3 agent_handover_memo

Written at session close.

agent_handover_memo
  body_md                200-400 words, human-readable
  open_tasks[]           uuid array
  open_subscriptions[]
  pending_proposals[]

  edges:
    produced_by → agent_producer
    part_of     → agent_session

13.4 Reconstruction header injected on fresh session

# Session context reconstruction

You are agent:<role> starting a fresh session after the previous one
closed with reason: "{{ last_session.reason_closed }}".

## Recent memory entries (salience ≥ 0.3, last 30 days)
{{ hq memory query --role <role> --limit 40 --min-salience 0.3 }}

## Last handover memo
{{ last_handover_memo.body_md }}

## Currently open tasks where you are the driver
{{ hq task list --driven-by agent:<role> --status in_progress }}

## Open proposals you emitted still pending
{{ hq proposal list --by-agent <role> --status pending }}

## Open subscriptions
{{ last_handover_memo.open_subscriptions }}

---
Current trigger: {{ current_trigger.summary }}
Proceed. For older context: `hq memory search --role <role> --query ...`

~2-8k tokens total. Bounded. Deterministic. Cheaper than replaying a full transcript.

13.5 Memory decay and reconciliation

Weekly memory-tender job (Haiku, ~$0.05/week). As of #71, it also gains a sync pass (workspace ↔ graph reconciliation, per §19.4).

The job keeps the memory pool under ~200 active entries per agent — small enough to fit comfortably in the reconstruction header.
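A sketch of the decay-and-prune rule, assuming salience decays linearly to zero over decay_after_days since last validation (the plan fixes the ~200-entry cap, not the decay curve):

```python
from datetime import datetime, timezone

def effective_salience(salience: float, last_validated_at: datetime,
                       decay_after_days: int | None) -> float:
    if decay_after_days is None:            # anti_patterns never decay
        return salience
    age_days = (datetime.now(timezone.utc) - last_validated_at).days
    return salience * max(0.0, 1 - age_days / decay_after_days)

def prune(entries: list[dict], cap: int = 200) -> list[dict]:
    """Keep the top `cap` entries by effective salience; rest are archived."""
    ranked = sorted(
        entries,
        key=lambda e: effective_salience(
            e["salience"], e["last_validated_at"], e["decay_after_days"]),
        reverse=True,
    )
    return ranked[:cap]
```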

13.6 Invariants (enforced in code)

14. Bitemporal audit (#16)

Every mutation writes a history_* row via AFTER triggers.

CREATE TABLE history_nodes (
  history_id  bigserial PRIMARY KEY,
  id          uuid NOT NULL,
  op          text NOT NULL CHECK (op IN ('insert','update','delete','archive')),
  actor       text NOT NULL,               -- hq.actor GUC
  reason      text,                        -- hq.reason GUC
  recorded_at timestamptz DEFAULT now(),
  row         jsonb NOT NULL
) PARTITION BY RANGE (recorded_at);        -- monthly partitions

Actor convention:

Time-travel query: hq as-of <timestamp> describe <slug> calls public.nodes_as_of(timestamptz) which unions current nodes with history rows matching the window.
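What that CLI call reduces to: one SELECT against the function, assuming it returns a nodes-shaped rowset as described above.

```python
import psycopg

def describe_as_of(conn: psycopg.Connection, node_id: str, ts: str):
    """Time-travel read behind `hq as-of <timestamp> describe <slug>`."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM public.nodes_as_of(%s::timestamptz) WHERE id = %s",
            (ts, node_id),
        )
        return cur.fetchone()

# describe_as_of(conn, some_uuid, "2026-03-01T00:00:00Z")
```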

15. Materialized projections + gap views

Materialized views (refreshed nightly unless noted):

Gap views (new in v1.5 — the "what's missing" engine):

~20 deterministic gap views. Each agent queries its domain's gap view first, acts second.


Part IV — Agent Organization

16. Capitão Ventures team — two operational agents (#74)

The team is two always-on operational generalists, not a roster of specialists. They differ only in which side of the system they face.

Capitão Ventures team
├── chief     — Outward.  Customers, prospects, partners, Wilson.
└── operator  — Inward.   Data, code, files, finance, KB, the graph.

On-demand subagents (spawned in-process by either top agent)
├── chief    spawns: email-drafter, proposal-writer, customer-brief, pipeline-analyst, Explore
└── operator spawns: project-agent:<slug>, code-reviewer, test-engineer, debugger,
                     database-specialist, kb-ingest, Explore

Infrastructure layer (horizontal, unchanged from v1.7)
├── agent-supervisor — event routing, concurrency cap, RAM-aware spawning
├── agent-watchdog   — heartbeat, loop detection, budget, incident reporting
├── meta-watchdog    — watches the watchdog
└── memory-tender    — weekly memory + workspace reconciliation

16.1 Why two agents (not 10, not 3)

v1.7 specified 10 persistent agents differentiated by domain (account-manager, project-manager, sales-bd, …). v1.8 rejects that model on two grounds:

  1. Operational coherence beats specialization. A solo founder running 12 projects across 3 ventures needs an agent that knows everything about a thread, not 10 agents that each know one slice. The cookbook's "single agent with rich context" pattern beats the multi-agent split-brain pattern at this scale.
  2. Always-on presence is the load-bearing feature. Differentiated cron schedules (08:00 account-manager, 08:30 project-manager, …) are anti-presence. Two warm agents that respond in seconds beat ten cold agents that respond in minutes.

The chief / operator split exists for safety isolation: customer-facing speech (chief) runs separately from system-mutating action (operator), so a model regression in one workspace cannot accidentally compromise the other. Both can read the full graph; only operator can mutate it. Both can converse; only chief speaks outward.

16.2 Shared workspace skeleton (#71 amended)

Both agents inherit the same workspace shape:

agents/
├── _shared/
│   ├── CLAUDE.md             # mission + 5 invariants + voice rules         (≤  900 tok)
│   ├── ventures-index.md     # 3 ventures + 12 projects, 1 line             (≤  600 tok)
│   ├── customers-index.md    # 1 line per active engagement                 (≤  600 tok)
│   ├── peer-card.md          # how to reach the other agent (mailbox API)   (≤  300 tok)
│   ├── glossary.md           # node kinds, slugs, conventions               (≤  400 tok)
│   └── opus-triggers.md      # the mandatory-Opus list (#74 §2)             (≤  300 tok)
├── chief/
│   ├── agent.md              # frontmatter only (name, model, tools, memory, opus_triggers)
│   ├── CLAUDE.md             # role, voice, ownership, walk-throughs        (≤ 1800 tok)
│   ├── playbook.md           # standard operating procedures                (≤ 1500 tok)
│   ├── personality.md        # tone, style, customer-by-customer notes      (≤  900 tok)
│   └── MEMORY.md             # dynamic cache, decay-managed                 (≤ 1200 tok)
└── operator/
    ├── agent.md
    ├── CLAUDE.md, playbook.md, personality.md, MEMORY.md

The Anthropic memory tool (memory_20250818) mounts /memories/ on both agents read-write. The directory tree under /memories/ mirrors agents/_shared/ and agents/<agent>/ exactly, so workspace-as-source and memory-as-runtime stay byte-identical.

Project workspaces live at /memories/projects/<slug>/CLAUDE.md and are loaded on demand by operator when it spawns a project subagent (see §18.9). They are not loaded into either top agent's static context.

Token budget at spawn. chief ≈ 22 K tokens (≈11% of 200 K); operator ≈ 22 K. The pre-commit hook tools/check-agent-budget.py (cl100k tokenizer via tiktoken) enforces the per-file caps above and the per-agent total of 22 K. Hook failure blocks the commit; an --override-budget path requires explicit Wilson approval in the commit trailer.
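A sketch of the hook's core check, using tiktoken's cl100k_base encoding as stated; the cap table here is an illustrative subset of the budgets annotated in the tree above:

```python
#!/usr/bin/env python3
# Illustrative core of tools/check-agent-budget.py; non-zero blocks the commit.
import sys
from pathlib import Path
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
PER_AGENT_TOTAL = 22_000
CAPS = {  # subset of the per-file caps from the workspace tree above
    "agents/chief/CLAUDE.md": 1800,
    "agents/chief/playbook.md": 1500,
    "agents/chief/personality.md": 900,
    "agents/chief/MEMORY.md": 1200,
}

def main() -> int:
    total, failed = 0, False
    for rel, cap in CAPS.items():
        n = len(ENC.encode(Path(rel).read_text()))
        total += n
        if n > cap:
            print(f"BUDGET FAIL {rel}: {n} > {cap} tokens")
            failed = True
    if total > PER_AGENT_TOTAL:
        print(f"BUDGET FAIL total: {total} > {PER_AGENT_TOTAL}")
        failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```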

16.3 Model policy (#74 §2)

Both agents share the same model policy. Default is Sonnet; Opus is mandatory when any of these triggers fire:

| Trigger | Model |
| --- | --- |
| Multi-step plan with ≥3 sequenced actions | Opus 4.7 |
| Customer-facing artifact (proposal, contract, brief, post-mortem) | Opus 4.7 |
| ADR drafting / decision-ledger entry | Opus 4.7 |
| Morning brief synthesis (chief, 07:00) | Opus 4.7 |
| Daily learning loop (operator, 22:00) | Opus 4.7 |
| Destructive-action 4-class review | Opus 4.7 |
| Routine email reply / task update / file edit | Sonnet 4.6 |
| Single-step lookup, classification, triage | Haiku 4.5 |

Implementation. Each agent's agent.md declares an opus_triggers list. The runtime evaluates triggers in priority order before each turn and overrides the default model per turn via the SDK's query(options={"model": ...}) parameter. Trigger evaluation runs in <50 ms (regex + JSON predicates over the current event batch); cost is negligible.
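A minimal sketch of the per-turn evaluation, assuming triggers compile to regexes over the rendered event batch; the trigger names and patterns are illustrative, not the locked opus_triggers list:

```python
import re

# Illustrative stand-ins for the agent.md opus_triggers declarations.
OPUS_TRIGGERS = [
    ("customer_artifact",
     re.compile(r"\b(proposal|contract|brief|post-mortem)\b", re.I)),
    ("adr", re.compile(r"\bADR\b")),
]

def pick_model(event_batch_text: str, default: str = "sonnet") -> str:
    for name, pattern in OPUS_TRIGGERS:      # evaluated in priority order
        if pattern.search(event_batch_text):
            return "opus"                    # mandatory trigger fired
    return default

# The runtime then overrides the model per turn, e.g.
#   query(prompt=..., options={"model": pick_model(batch)})
```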

16.4 Always-on lifecycle

Both agents run as systemd USER units under athena (#73 A3) with Restart=always. The supervisor (#68) holds an in-memory presence table; an agent is warm when its query loop is mid-task or within the 90-second post-task-complete cooldown.

| State | Definition | Latency to first token |
| --- | --- | --- |
| Warm | Query loop alive, prompt cache hot | <200 ms |
| Cooldown | Within 90 s of `<task-complete/>`, prompt cache hot | <200 ms |
| Cold | systemd active, query loop dormant | ~1.5 s |
| Stopped | systemd inactive (manual or watchdog kill) | ~3 s + restart cost |

The supervisor's RAM-aware scheduler (#68) holds spawns when /proc/meminfo shows <600 MB available; under the 8 GB envelope (#66) this is a rare event because typical resident usage with both agents warm is 1.8–2.2 GB.

16.5 Inter-agent CLI mailbox (two-party, #71 amended)

Inter-agent communication collapses from N-party to two-party:

hq agent ask <peer> "<msg>"     # synchronous RPC, blocks for response (default 30 s, configurable)
hq agent send <peer> "<msg>"    # async FYI, no wait
hq agent reply <id> "<msg>"     # response to an outstanding ask
hq agent inbox                  # list unread messages
hq agent presence               # is the peer warm? returns {warm|cooldown|cold|stopped}

<peer> is exactly one of chief or operator. The supervisor enforces the peer set; unknown peers raise unknown_peer. The mailbox is implemented over ops.agent_inbox + Postgres LISTEN/NOTIFY (#71) — no Redis pub/sub, no external broker.
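A sketch of the ask path over that machinery, assuming psycopg ≥ 3.2 (for notifies(timeout=...)); the inbox columns and channel names are assumptions. pg_notify() is used because NOTIFY does not accept bound parameters:

```python
import json
import psycopg

def ask(dsn: str, me: str, peer: str, msg: str,
        timeout_s: float = 30.0) -> str | None:
    if peer not in ("chief", "operator"):
        raise ValueError("unknown_peer")     # supervisor enforces the peer set
    with psycopg.connect(dsn, autocommit=True) as conn:
        conn.execute(f"LISTEN agent_reply_{me}")
        row = conn.execute(
            "INSERT INTO ops.agent_inbox (sender, recipient, body) "
            "VALUES (%s, %s, %s) RETURNING id", (me, peer, msg)).fetchone()
        # wake the peer's runtime
        conn.execute("SELECT pg_notify(%s, %s)",
                     (f"agent_inbox_{peer}", json.dumps({"id": row[0]})))
        for n in conn.notifies(timeout=timeout_s):   # block for the reply
            reply = json.loads(n.payload)
            if reply.get("in_reply_to") == row[0]:
                return reply["body"]
        return None   # timed out; caller retries or escalates
```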

Typical handoff. chief receives an email asking for a project status update → hq agent ask operator "current state of guisoft ticketing dashboard?" → operator queries graph, returns a 3-paragraph summary → chief drafts the reply (Sonnet, or Opus if customer artifact threshold) → chief sends. Both halves of the handoff are persisted in ops.agent_actions (#75); the handoff itself is captured as two action rows linked by parent_id.

16.6 Cross-cutting protocols

17. chief — outward-facing operator

Workspace: agents/chief/. Mounted memory directory: /memories/agents/chief/.

17.1 Role and ownership

chief is the one and only outward-facing voice of Capitão Ventures. It owns every artifact a customer, prospect, partner, or Wilson would read.

Owns (full customer-outcome surface, per #74 amended by #77):

Does not own (system-level state — routes to operator):

The clean rule: chief owns customer-facing outcomes; operator owns system-level state. When in doubt, ask operator — the round-trip is cheap.

17.2 Triggers

| Kind | Value |
| --- | --- |
| Beat | `0 7 * * *` (07:00 morning brief, Opus, plan-mode) |
| Outbox topics | email.received, email.thread.updated, entity.temperature_changed, engagement.stage_changed, proposal.draft_requested, customer.churn_signal, wilson.dm, prospect.created, interaction.overdue |
| Wilson inbox | enabled (always) |
| Calendar | webhook on event creation/cancellation if attendees include external contacts |

17.3 Tools (#77 expanded surface)

SDK built-ins: Read, Grep, Glob, Bash (curated allow-list), Agent, WebSearch, WebFetch, Monitor, Edit, Write (NEW per #77 — scoped to customer-artifact paths via runtime path allow-list; writes outside the allow-list raise typed errors and route to operator).

Path allow-list for Edit/Write (enforced by runtime middleware):

Capitão registry tools (from src/tools/registry.ts, see §33):

MCP servers (in-process via create_sdk_mcp_server, #67 amended):

17.4 Subagents allowed (max 2 in-process + 1 project subagent, per #77)

Existing (v1.8):

New per #77 (load-bearing for deep work):

Subagents are loaded via the Agent tool from agents/chief/subagents/<name>.md.

17.5 Budget

| Control | Value |
| --- | --- |
| daily_usd_soft | $2.50 |
| daily_usd_hard | $7.00 |
| tokens_per_run_cap | 200 000 |
| max_turns_per_query | 30 |
| opus_turns_per_day_cap | 12 (alarms at 10) |

17.6 Permissions (#77 expanded)

17.7 Success metrics

Text-axis (artifact-level, from #75):

Trajectory-axis (procedure-level, from #76) — equally load-bearing:

Operational:

Capability-utilization (added per #77 §9):

17.8 Decision tree — act / spawn / ask (#77 §4)

When an inbound customer ask arrives, chief walks this tree before doing anything else. Plain-language version; the threshold "up to 2 sources" is a heuristic about context-window hygiene, explained immediately below.

Inbound customer ask arrives
│
├── Can chief answer by reading up to 2 small sources directly?
│   ├── Yes → chief reads inline; drafts.
│   └── No  → chief spawns `customer-deep-dive` with a focused question;
│             receives a structured brief; drafts on top of it.
│
├── Does the ask require changing an artifact?
│   ├── In-draft customer artifact (proposal/contract/brief still being built)
│   │     → chief edits the file directly via Edit/Write.
│   ├── Post-dispatch contract (already sent to the customer / signed)
│   │     → chief drafts an AMENDMENT (new file) + `request_approval` (cross-scope per #69 — legal binding).
│   └── Customer-driven change inside a project repo (e.g., copy fix on the marketing site)
│         → chief spawns `project-agent:<slug>` via `hq project run`.
│         (If the change is system-driven, not customer-driven, ask operator instead.)
│
└── Is any factual claim about graph state involved?
    (project status, scheduling, blockers, invoice state, who-said-what-when)
    → ALWAYS `hq agent ask operator` BEFORE drafting. No exceptions.
       Operator owns the graph; chief is not allowed to guess facts.

What counts as a "source". A source is one discrete chunk of context chief has to read to answer. Each of these counts as one: the inbound email thread (always source #1 by default), one entity record in the graph (hq describe entity:<…>), one project state file, one KB article, one contract/proposal file, one drive file, one past meeting transcript, one prior email thread (different from inbound), one external URL the customer linked.

Why the threshold. It's about where the reading happens:

| Sources needed | Inline cost | Subagent cost | Winner |
| --- | --- | --- | --- |
| 1 | ~3 s, ~2 K tokens added | ~10 s, ~1 K tokens added | Inline — subagent overhead doesn't pay off |
| 2 | ~6 s, ~5 K tokens added | ~15 s, ~1 K tokens added | Inline — barely; depends on source size |
| 3+ | ~15+ s, ~15-30 K tokens added (pollutes context) | ~25 s, ~1.5 K tokens added | Subagent — keeps chief's context clean for drafting |

The threshold is heuristic, not a hard rule. Chief should err toward the subagent if individual sources are large (long PDFs, multi-page contracts) even at 2 sources, and toward inline if all sources are tiny (a single timeline + 1 KB article = ~500 words total).

Concrete examples — up to 2 sources, read inline: "What time is our meeting tomorrow?" (calendar = 1), "Did João reply about the SLA last week?" (thread + 1 timeline query = 2), "What's our standard response-time SLA?" (1 KB article). Three or more sources, spawn deep-dive: "Can you summarize where we are with Frama overall?" (engagement + last 5 interactions + project state + open tasks + KB = 5+), the contract clause example from §17.1 (contract + 2 amendment precedents + KB compliance article + prior threads + operator check = 5+), "What did we promise the customer in the kickoff vs. what's in the contract?" (transcript + contract + proposal + RFP = 4).

17.9 Workspace files

agents/chief/CLAUDE.md — role, voice, ownership, the decision tree (§17.8 mirrored), the operational walk-throughs (deep-dive synthesis, contract amendment, customer-driven project work, morning brief, escalation, proposal drafting), the `hq examples find` usage rules, the destructive-action gate language.
agents/chief/playbook.md — standard procedures: how to triage an inbound email, how to draft a proposal, how to draft a morning brief, how to handle a customer escalation, how to handle a missed deadline.
agents/chief/personality.md — voice rules, language defaults, per-customer style notes (Frama formal+brief, PetVitaClub warm+chatty, Garq technical+precise, …).
agents/chief/MEMORY.md — dynamic cache, decay-managed by memory-tender.
agents/chief/subagents/customer-deep-dive.md, agents/chief/subagents/kb-search.md — subagent definitions per #77 §3.

18. operator — inward-facing operator

Workspace: agents/operator/. Mounted memory directory: /memories/agents/operator/.

18.1 Role and ownership

operator is the one and only system-mutating actor in Capitão Ventures. It owns the graph, the codebase, the file system, the KB, and the action ledger itself.

Owns:

Does not own:

18.2 Triggers

| Kind | Value |
| --- | --- |
| Beat | `0 22 * * *` (22:00 daily learning loop, Opus, plan-mode); `0 3 * * *` (03:00 nightly graph + ledger reconciliation) |
| Outbox topics | producer.unmapped, event.unclassified, task.assigned_to.operator, task.assigned_to.project:*, proposal.kind=schema_change, proposal.kind=migration, feature.status_changed.*, kb.ingest.completed, finance.anomaly, agent_incident.created, wilson.dm |
| Wilson inbox | enabled (always) |
| File watchers | ~/capitao-knowledge-base/raw/, ~/capitao-command-center/proposals/ (newly emitted proposals from chief) |

18.3 Tools

SDK built-ins: Read, Write, Edit, Bash (curated allow-list with broader scope than chief), Glob, Grep, Agent, Monitor, NotebookEdit.

Capitão registry tools (from src/tools/registry.ts, see §33):

MCP servers (in-process):

18.4 Subagents allowed (max 2 concurrent + 1 project subagent)

project-agent:<slug> (one at a time per project; loaded on demand from /memories/projects/<slug>/CLAUDE.md), code-reviewer, test-engineer, debugger, database-specialist, kb-ingest, Explore.

18.5 Budget

| Control | Value |
| --- | --- |
| daily_usd_soft | $4.00 |
| daily_usd_hard | $10.00 |
| tokens_per_run_cap | 250 000 |
| max_turns_per_query | 50 |
| opus_turns_per_day_cap | 18 (alarms at 14) |

18.6 Permissions

18.7 Success metrics

18.8 Workspace files

agents/operator/CLAUDE.md — role, ownership, the six operational walk-throughs (data-organization focus), the destructive-action gate, the action-ledger discipline.
agents/operator/playbook.md — standard procedures: how to ingest a new producer, how to reconcile entities, how to spawn a project subagent, how to run the 22:00 learning loop, how to draft an ADR.
agents/operator/personality.md — voice rules for internal artifacts (terse, citation-heavy, structured); how to write commit messages; ADR rhetoric.
agents/operator/MEMORY.md — dynamic cache, decay-managed.

18.9 Project subagents — on-demand (#74 amends #53)

Project agents are no longer persistent. They are loaded on demand by operator via the SDK Agent tool, with a system prompt assembled from three markdown files at runtime.

18.9.1 Lifecycle

For every active project node with priority IN ('focus', 'now'):

When operator needs to act on a project, it calls:

hq project run <slug> "<task description>"

Sugar for Agent(subagent_type="project:<slug>", prompt="<task description>"). The Agent tool reads the workspace, composes the system prompt (/memories/_shared/CLAUDE.md + /memories/projects/<slug>/CLAUDE.md + /memories/projects/<slug>/MEMORY.md), runs to <task-complete/>, and exits.

Cold-start cost: ~2 seconds (no warm window). RAM peak: ~600 MB while running, freed on exit.
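A sketch of the prompt assembly behind that sugar, following the three-file order in §18.9.1; the joining separator is an assumption:

```python
from pathlib import Path

MEM = Path("/memories")

def compose_project_prompt(slug: str) -> str:
    """System prompt for Agent(subagent_type=f"project:{slug}", ...)."""
    parts = [
        MEM / "_shared" / "CLAUDE.md",
        MEM / "projects" / slug / "CLAUDE.md",
        MEM / "projects" / slug / "MEMORY.md",
    ]
    return "\n\n---\n\n".join(p.read_text() for p in parts if p.exists())

# The subagent then runs to <task-complete/> and exits, freeing its RAM.
```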

18.9.2 Workspace template

/memories/projects/_template/CLAUDE.md:

---
name: project:{{slug}}
description: Operator for project {{title}}. Reads assigned tasks, executes, reports, flags blockers.
parent: agent:ventures:operator
---

# Project agent — {{title}}

## Scope
This project only. {{description_md}}.

**MAY:** read/write own project repo (via worktrees), create/update tasks and features within this project, propose milestones/developments/actions as events, spawn Explore/code-reviewer/test-engineer/debugger subagents, flag blockers.

**MAY NOT:** touch other projects, write to entity nodes, sign off deliverables, commit to main without code-reviewer pass.

## Model policy
Inherits #74 §2 — Sonnet default; Opus on the mandatory triggers.

## Tools
Inherits operator's inward set, plus per-project additions keyed off the `project_type` declared in `customer-profile.md`.

18.9.3 Initial roster (Wave 3)

Based on 2026-04-22 priorities (unchanged from v1.7):

Focus (hard cap 3):

Now (~6):

Other projects (arisilvahelenos, ferroembrasa, guisoft, safaa, personal, ti-milha) keep workspaces at /memories/projects/<slug>/ but are not loaded by operator until a triggering event arrives.

19. Infrastructure agents

19.1 agent-supervisor (#68 amended by #74)

19.2 agent-watchdog (#64)

19.3 meta-watchdog

19.4 memory-tender


Part V — Runtime & Infrastructure

20. Agent runtime

20.1 AgentRuntime class — shape and responsibilities

Python module at src/runtime/agent_runtime.py (~500 LOC total). One file; all agents share it. Per-agent behavior comes from markdown config + prompt, not from code.

import os

from claude_agent_sdk import query               # SDK entrypoint (§16.3)

# Internal helpers (§20.2, §20.3, §21); module paths are illustrative
from .budget import BudgetExceeded, CostBudget
from .config import MarkdownConfigParser
from .session_store import PostgresSessionStore


class AgentRuntime:
    """
    Runs one agent for one trigger batch. Exits after <task-complete/>.
    Reloaded per spawn by the supervisor.
    """

    def __init__(self, role: str, config_path: str, events_stdin: list[dict]):
        self.role = role
        self.config = MarkdownConfigParser(config_path).parse()   # §20.3 format
        self.events = events_stdin
        self.session_id = None
        self.store = PostgresSessionStore(os.environ["HQ_DB_URL"])
        self.budget = CostBudget.from_config(self.config.budget)

    async def run_once(self) -> int:
        """Entrypoint. Returns exit code (0=ok, 1=budget, 2=error, 3=watchdog-killed)."""
        await self.store.connect()
        self.session_id = await self.store.get_or_create_session(self.role)

        if self.store.is_fresh_session(self.session_id):
            reconstruction = await self._build_reconstruction_header()
        else:
            reconstruction = ""        # resumed session still has context

        prompt = reconstruction + self._render_event_batch(self.events)

        try:
            async for message in query(
                prompt=prompt,
                options=self._build_sdk_options()
            ):
                await self._on_message(message)
                if self._detect_task_complete(message):
                    await self._finalize_session(message)
                    return 0
        except BudgetExceeded:
            await self._freeze_self()
            return 1
        except KeyboardInterrupt:           # watchdog stop (SIGTERM mapped to KeyboardInterrupt by a signal handler)
            await self._partial_finalize()
            return 3

        return 2                            # fell off without task-complete

Full implementation spec is a Wave 1 artifact (§49).

20.2 PostgresSessionStore — see ADR #72

Canonical schema and adapter live in DECISIONS.md #72 (Anthropic PostgresSessionStore reference port; storage table ops.agent_sessions(id BIGSERIAL, key TEXT, entries JSONB, created_at TIMESTAMPTZ DEFAULT now()) indexed on (key, id); CI conformance gate via claude_agent_sdk.testing.run_session_store_conformance(...); local-disk primary at /var/cache/capitao/sessions, Postgres mirror async + best-effort; cold-start restore order pg_restore → disk_restore → fresh). The earlier hand-rolled schema in this section was superseded by #72 in v1.7 and removed in v1.8.

20.3 Markdown config parser

~80 lines of Python. Reads the agent's .md file and extracts the frontmatter fields (name, model, tools, memory, opus_triggers — §16.2), the budget block (§17.5 / §18.5), and the markdown body used as the agent's prompt.
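A minimal sketch under those constraints, assuming YAML frontmatter delimited by `---` as in the agent.md examples above; the real class returns a richer config object than the plain dict used here:

```python
import re
import yaml  # pip install pyyaml

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.S)

class MarkdownConfigParser:
    """Reads an agent .md file: YAML frontmatter + markdown body."""

    def __init__(self, path: str):
        self.path = path

    def parse(self) -> dict:
        text = open(self.path, encoding="utf-8").read()
        m = FRONTMATTER.match(text)
        if not m:
            raise ValueError(f"{self.path}: missing --- frontmatter block")
        config = yaml.safe_load(m.group(1)) or {}   # name, model, tools,
        config["body_md"] = m.group(2)              # memory, opus_triggers, budget
        return config
```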

21. Task-complete lifecycle (#62)

21.1 The sentinel

Every agent prompt contains:

When you have truly finished your current unit of work AND are not waiting on any tool result, subagent, review decision, Wilson input, or other agent's proposal — emit on its own line:

<task-complete reason="..."/>

Only emit when truly done. If waiting for anything, stay in the turn.

21.2 Stop hook

async def _detect_task_complete(self, message) -> bool:
    """Scan final message for the sentinel."""
    for block in getattr(message, "content", []):
        if getattr(block, "type", None) == "text":
            if "<task-complete" in block.text:
                self.task_complete_reason = self._extract_reason(block.text)
                return True
    return False

async def _finalize_session(self, final_message):
    # 1. Ask SDK for a compact memory snapshot turn
    snapshot = await self._request_memory_snapshot()

    # 2. Write agent_session node
    await self.db.execute("INSERT INTO nodes (type, props, ...) VALUES ('agent_session', ...)")

    # 3. Parse snapshot into memory_entry nodes
    await self._persist_memory_entries(snapshot)

    # 4. Write agent_handover_memo
    await self._write_handover_memo(snapshot, final_message)

    # 5. Archive transcript
    await self.store.archive(self.session_id)

    # 6. Emit outbox event
    await self.db.execute("INSERT INTO ops.outbox (topic, payload) VALUES ('agent.session_closed', $1)", ...)

    # 7. Process exits (caller returns from run_once with 0)

21.3 Warm window (chief + operator only)

Both top-level agents have a 90-second warm window post-task-complete. Project subagents and on-demand worker subagents skip the warm window (full process exit per task).

if self.config.warm_window_seconds > 0:
    new_event_arrived = await self._wait_for_event_or_timeout(
        self.config.warm_window_seconds)
    if new_event_arrived:
        # Fresh session starts in same process
        self.session_id = None
        await self.run_once()           # recurse with new events
    else:
        return 0                        # exit process

Saves cold-start overhead during bursts. Default warm_window_seconds = 90 for chief and operator; 0 for everyone else.

21.4 Fallback lifecycles

| Fallback | Trigger | Purpose |
| --- | --- | --- |
| Auto-compaction | Context > 75% of window | In-place compaction; keep current task intact |
| Nightly rotation | 03:00 local, still-live sessions | Forced clean rollover with handover memo |
| Budget-cap rotation | Hard cap hit | Freeze + fresh session after un-freeze |
| Crash recovery | systemd restart | Resume from PostgresSessionStore |

22. The agent-supervisor process

22.1 Implementation sketch

// cmd/agent-supervisor/main.go  (~200 LOC)
package main

import (
    "github.com/lib/pq"
    ...
)

type Supervisor struct {
    db          *sql.DB
    routing     map[string]AgentRoute    // topic -> agent role
    running     map[string]*exec.Cmd     // role -> process
    concurrency int                       // cap = 3
    mu          sync.Mutex
}

func (s *Supervisor) Listen() {
    listener := pq.NewListener(dsn, ...)
    for _, topic := range s.subscribedTopics() {
        listener.Listen(topic)
    }
    for notif := range listener.Notify {
        events := s.coalesce(notif)      // batch same-role events within 2s
        s.trySpawn(events)
    }
}

func (s *Supervisor) trySpawn(events []Event) {
    s.mu.Lock()
    defer s.mu.Unlock()

    role := events[0].Role
    if _, alreadyRunning := s.running[role]; alreadyRunning {
        s.enqueue(events)                // buffer; dispatched when current finishes
        return
    }
    if len(s.running) >= s.concurrency {
        s.enqueue(events)
        return
    }
    if !s.ramAvailable(600_000_000) {    // 600 MB free required
        s.enqueue(events)
        return
    }

    cmd := exec.Command("hq", "agent", "run", role)
    cmd.Stdin = strings.NewReader(eventsJSON(events))
    cmd.Start()
    s.running[role] = cmd

    go func() {
        cmd.Wait()
        s.mu.Lock()
        delete(s.running, role)
        s.drainBuffer()                  // dispatch queued events if slots free
        s.mu.Unlock()
    }()
}

Full implementation is Wave 1 artifact (§49.5).

23. Service matrix

23.1 Always-on services

| Service | Language | RAM steady | CPU | Purpose |
| --- | --- | --- | --- | --- |
| postgresql | C | 700-1000 MB | burst | Graph store, queue, analytics |
| valkey | C | 60-100 MB | low | Cache, pub/sub, rate-limiter |
| pgbouncer | C | 5-10 MB | low | Connection pooling :6432 |
| agent-supervisor | Go | 20-30 MB | low | Event routing, concurrency |
| agent-watchdog | Python | 50-70 MB | low | Health checks |
| prometheus | Go | 100-150 MB | low | Metrics |
| next.js | Node | 180-220 MB | burst | Admin UI |
| caddy | Go | 20-30 MB | low | Reverse proxy, TLS, service wake-up |
| node_exporter | Go | 15-20 MB | low | OS metrics |
| postgres_exporter | Go | 20-30 MB | low | Postgres metrics |
| hq-exporter | Node | 35-45 MB | low | Custom metrics |
| ubuntu + systemd | — | ~300 MB | — | OS base |

Always-on total: ~1.5-1.9 GB.

23.2 On-demand services

| Service | Trigger | RAM when active | Auto-shutdown |
| --- | --- | --- | --- |
| whisper-stt | meeting-transcribe triggers | ~1.5-2 GB | on completion |
| grafana | First /grafana/* request via Caddy | ~130 MB | 10 min idle |
| next.js admin (if idle-tuned further) | First HTTP request | ~180 MB | configurable |

Voyage 3.5-lite (ADR #82) is SaaS, not on-demand local — no RAM cost, no socket activation. Reached over HTTPS by embed-worker.

On-demand services consume 0 MB when idle.

23.3 Always-on agents (chief + operator with 90 s warm window)

| Agent | RAM during warm window | RAM during active query |
| --- | --- | --- |
| chief | ~280-400 MB (Python + SDK + workspace context) | ~500-700 MB (with 1 subagent active) |
| operator | ~280-400 MB | ~600-850 MB (with project subagent or code-reviewer subagent active) |
| agent-watchdog | ~50-70 MB | ~70-100 MB (during SQL-heavy checks) |
| agent-supervisor | ~25-35 MB (Go binary) | same |
| ledger-flusher | ~30-50 MB | ~60-90 MB (during batch flush) |

23.4 Ephemeral subagents (spawn-run-exit)

| Subagent | RAM while running | Duration typical |
| --- | --- | --- |
| Single in-process subagent (Explore, code-reviewer, email-drafter, …) | ~150-280 MB on top of parent | 10 s - 3 min |
| Two concurrent subagents (cap) | ~300-560 MB on top of parent | rare; heavy analysis |
| Project subagent with code tasks | ~400-600 MB on top of operator | minutes |
| Worker subagent (kb-ingest, finance-import) | ~80-180 MB on top of operator | seconds to minutes |

24. Service tuning — locked day-0 flags

24.1 Postgres 17 (/etc/postgresql/17/main/postgresql.conf)

shared_buffers = 256MB
effective_cache_size = 2GB
work_mem = 8MB
maintenance_work_mem = 64MB
max_connections = 30
wal_buffers = 16MB
random_page_cost = 1.1
track_io_timing = on
jit = off
max_parallel_workers_per_gather = 2

Expected RSS: 700-1000 MB steady.

24.2 Valkey (/etc/valkey/valkey.conf)

maxmemory 96mb
maxmemory-policy allkeys-lru
save ""
appendonly no
tcp-keepalive 60

Expected RSS: 60-100 MB.

24.3 Prometheus flags

--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=800MB
--query.max-samples=5000000

The 30 s scrape interval is set as scrape_interval: 30s in prometheus.yml's global block; it is not a CLI flag.

24.4 Next.js

NODE_OPTIONS="--max-old-space-size=200 --no-warnings"
NEXT_TELEMETRY_DISABLED=1

24.5 Caddy site blocks (/etc/caddy/Caddyfile)

Grafana wake-up handler (socket-activated; cold start on first request; the exec directive below comes from the third-party caddy-exec plugin, not stock Caddy):

grafana.internal.capitao {
    @first_visit not header Cookie *grafana_session*
    handle @first_visit {
        exec systemctl start grafana.service
        respond "Starting Grafana, refresh in 2s..." 202
    }
    reverse_proxy localhost:3000
}

Command Center UI (subdomain, #78). Day-0 binds to Tailscale; F27 lock at Wave 2 may swap bind tailscale0 for a public posture (OAuth or IP allow-list):

command-center.capitao.consulting {
    bind tailscale0          # Wave 1: Tailscale-only. Removed on F27 lock.
    encode gzip zstd
    log {
        output file /var/log/caddy/command-center.log
        format json
    }
    @md query format=md
    handle @md {
        header Content-Type "text/markdown; charset=utf-8"
        reverse_proxy 127.0.0.1:3001
    }
    reverse_proxy 127.0.0.1:3001
}

The @md matcher implements #25's ?format=md symmetry: HTML and markdown share one upstream (the Next.js process) and the route handler decides which view to render. Drift detection: curl …/roadmap?format=md byte-equals state/roadmap.md (modulo whitespace).
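A sketch of that drift check, reading "modulo whitespace" as whitespace-squashing; the URL and mirror path follow the example above:

```python
import re
import urllib.request

def drifted(url: str, state_file: str) -> bool:
    """True when the rendered ?format=md view diverges from the state/ mirror."""
    rendered = urllib.request.urlopen(url).read().decode("utf-8")
    mirror = open(state_file, encoding="utf-8").read()
    squash = lambda s: re.sub(r"\s+", " ", s).strip()
    return squash(rendered) != squash(mirror)

# drifted("https://command-center.capitao.consulting/roadmap?format=md",
#         "state/roadmap.md")
```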

24.6 systemd socket activation for Whisper

(TEI socket activation removed per ADR #82 — embedding inference moved off-host to Voyage 3.5-lite. Whisper stays local.)

# /etc/systemd/system/whisper-stt.socket
[Socket]
ListenStream=127.0.0.1:8210

[Install]
WantedBy=sockets.target

# /etc/systemd/system/whisper-stt.service
[Service]
ExecStart=/usr/local/bin/whisper-start-wrapper
EnvironmentFile=/etc/capitao/secrets.env

The wrapper script starts Whisper, keeps alive 5 min of idle, then stops. VOYAGE_API_KEY lives in the same secrets.env (mode 0600, athena:athena) and is loaded by embed-worker via EnvironmentFile= in its own systemd unit.

24.7 cgroup limits per agent

/etc/systemd/system/capitao-agent@.service.d/limits.conf:

[Service]
MemoryMax=900M
MemorySwapMax=400M
CPUQuota=200%

Protects the box from a runaway agent.

25. RAM budget — 8 GB envelope (#66)

25.1 Realistic usage over time

| Scenario | RAM | % of 8 GB |
| --- | --- | --- |
| Overnight (supervisor + watchdogs only) | ~1.6 GB | 20% |
| Normal business hours | ~2.0-2.8 GB | 25-35% |
| Busy afternoon (2 agents concurrent) | ~3.0-3.5 GB | 38-44% |
| 3 agents + 1 subagent each (realistic peak) | ~3.5-4.0 GB | 44-50% |
| Ceiling: 3 agents × 2 subagents + Grafana | ~4.4-4.7 GB | 55-59% |
| + Whisper transcribing (briefly allowed over) | ~7.0-7.5 GB | 88-94% |

Headroom at typical load: 4-5 GB free for Postgres page cache, burst absorption, Grafana sessions. Page cache keeps search queries <40ms p95.

25.2 Supervisor RAM-aware rules

if available_ram < 600 MB:        hold new agent spawns; queue events
if available_ram < 400 MB:        kill concurrency cap to 1 until memory frees
if swap_used > 500 MB sustained:  alert Telegram + pause non-essential agents

26. Authentication — Max OAuth (#57)

26.1 Setup (one-time per 12 months)

# On the VPS, as capitao user:
claude setup-token

# Result: prints 1-year OAuth token.
# Store in /etc/capitao/agents.env (chmod 600, owner capitao:capitao):
#   CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...

26.2 systemd unit drop-in

[Service]
EnvironmentFile=/etc/capitao/agents.env
User=capitao

Applied to every agent service.

26.3 Token rotation watcher

A weekly cron job decodes the JWT, checks expires_at. If < 30 days remain, opens a review node asking Wilson to re-run claude setup-token. No silent expiry.
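A sketch of the watcher's expiry check. Reading the JWT payload needs no signature verification; the expires_at claim name follows the prose (a standard JWT would call it exp):

```python
import base64
import json
import time

def days_until_expiry(token: str) -> float:
    """Decode the JWT payload segment and compute days until expiry."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    expires_at = claims.get("expires_at") or claims.get("exp")
    return (expires_at - time.time()) / 86400

# if days_until_expiry(tok) < 30: open a review node asking Wilson to
# re-run `claude setup-token`.
```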

26.4 License compliance

Max OAuth is authorized by Anthropic for local development and personal automation. Capitão Command Center operates Capitão Ventures internally; it is not resold. Authorized use.

If Command Center ever becomes a SaaS product, switch to API-key authentication (ANTHROPIC_API_KEY). No code changes needed — SDK auto-detects.

27. Rate-limit and cost control

27.1 Plan-level limits

Max 20× plan: 5-hour rolling windows. With 2 always-on top agents + on-demand subagents (typically 1-2 active at a time during business hours), typical spend stays under 30% of plan cap. Bursts during customer incidents or heavy code work can hit 70%+.

27.2 Mitigations (built into runtime)

  1. Event-driven, not cron-driven. Both top agents wake on outbox events; the only fixed cron beats are 07:00 (chief brief) and 22:00 (operator loop). Burn rate scales with workload, not with the clock.
  2. Supervisor concurrency cap (§19.1) — 2 top agents + max 2 in-process subagents per top agent = 4 query loops total.
  3. Exponential backoff on 429 — Valkey-shared rate-limiter coordinates across both agents (sketched after this list).
  4. Cost-aware demotion — if 7-day moving avg trends toward plan cap, the per-turn model picker demotes routine Sonnet turns to Haiku; Opus triggers (#74 §2) remain mandatory and are never demoted.
  5. Circuit breaker at 90% of plan — pause operator's 22:00 learning loop and any non-emergency project subagents; keep chief live for customer-facing work.
  6. Per-agent daily hard caps — agent freezes itself at its own cap, opens review via request_approval.
  7. hq autonomy freeze --reason "..." — manual emergency stop.
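A sketch of mitigation 3, assuming redis-py pointed at Valkey (which speaks the Redis protocol); the key names, 5-minute strike decay, and 60 s cap are illustrative:

```python
import random
import time
import redis  # pip install redis; Valkey is protocol-compatible

r = redis.Redis(host="localhost", port=6379)

def on_429() -> None:
    """Record a 429 and extend the shared backoff window for both agents."""
    strikes = r.incr("anthropic:429:strikes")
    r.expire("anthropic:429:strikes", 300)         # strikes decay after 5 min
    delay = min(60.0, (2 ** strikes) + random.random())
    r.set("anthropic:backoff_until", time.time() + delay)

def wait_if_backing_off() -> None:
    """Called before each API request by either agent."""
    until = float(r.get("anthropic:backoff_until") or 0)
    if until > time.time():
        time.sleep(until - time.time())
```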

27.3 Cost telemetry

Every LLM call writes to ops.llm_call_log:

CREATE TABLE ops.llm_call_log (
  id           bigserial PRIMARY KEY,
  agent_id     text NOT NULL,
  session_id   uuid,
  trace_id     uuid REFERENCES nodes(id),
  model        text NOT NULL,
  tokens_in    int,
  tokens_out   int,
  cost_usd     numeric(8,4),
  latency_ms   int,
  canary_id    text,
  purpose      text,
  started_at   timestamptz DEFAULT now()
);
CREATE INDEX ON ops.llm_call_log (agent_id, started_at DESC);

Grafana panel per-agent-cost-24h + Prom gauge agent_run_cost_usd_24h{agent="..."}.


Part VI — Tools, Skills, and Surfaces

28. The hq CLI — canonical action surface

Form: hq <noun> <verb> [--filters] [--json | --text]

Exit codes: 0 ok / 1 user-error / 2 system-error / 3 not-found.

Universal flags: --json, --text (default), --actor='<string>', --reason='<string>'.

28.1 Reads (safe)

hq search <query>
hq describe <slug|uuid>
hq entity find --email|--phone
hq entity profile <slug>
hq engagement list --stage <stage>
hq project list --priority <focus|now|next|backlog>
hq task list --owner --priority
hq interaction list --entity --limit
hq timeline [--since] [--entity]
hq review list | show <id>
hq expectation list [--status] [--direction]
hq intent list [--kind] [--status]
hq gap list                         # all gap_* views
hq gap show <gap_name>
hq as-of <timestamp> describe <slug>
hq memory search --role <role> --query <terms>
hq trace {show,inputs,decided,replay,explain} <trace-id>
hq producer health [<slug>]
hq playbook {list,show}
hq proposal {list,show} --kind <kind>
hq autonomy status
hq watchdog status
hq agent {list,status,attach} [<role>]

28.2 Writes (produce outbox events)

hq entity create --name --kind person|org [--email] [--phone]
hq entity merge <loser> --into <winner>     # always reviews
hq interaction log --channel --from|to|cc --subject --body-file --thread-id
hq conversation create --kind --started-at [--participant]
hq engagement create --entity --stage --name --price [--maintenance-months]
hq engagement stage <slug> --to <stage>
hq project create --entity --name --slug [--lead] [--tech-stack]
hq feature create --project --title --slug [--complexity S|M|L|XL]
hq task create --title [--project|--feature] --priority --owner --due
hq task complete <slug>
hq task assign <slug> --to <contact>
hq event create --kind --about [--impact]
hq intent create --kind --about --extracted-from [--urgency] [--due-at]
hq expectation create --kind --direction --about --owed-by --owed-to [--due-at]
hq review resolve <id> --choice <opt-N> [--note]
hq review defer <id> [--until <ts>]
hq review dismiss <id> --reason
hq proposal propose --kind --evidence <json> [--actor <agent>]
hq proposal rollback <id>
hq autonomy freeze [--reason] [--until <ts>]
hq autonomy thaw
hq autonomy kill --loop <playbook|calibration>
hq playbook archive <slug> --reason

28.3 Agent control (new in v1.5)

hq agent run <role>                     # supervisor entrypoint; reads events from stdin
hq agent send <role> <message>          # DM an agent; tails response
hq agent attach <role>                  # live-tail transcript
hq agent pause <role> [--for <duration>]
hq agent resume <role>
hq agent restart <role> [--fresh-session]
hq agent handover <role>                # force session rotation now
hq agent status <role>

28.4 MCP fallback

hq mcp-serve                            # only enabled on hot paths; socket-activated

Not used in default config. Reserved for measured need.

29. Skills catalog

29.1 Mandatory skills (every agent)

| Skill | Grant / state | Purpose |
| --- | --- | --- |
| caveman | Full | Token compression (internal reasoning + inter-agent writes) |
| caveman-compress | installed | Compresses long memory files |
| hq-actor-attribution | always on | Ensures hq.actor GUC set on every write |
| cost-budget-guard | always on | Enforces daily caps; aborts on overrun |
| session-distill | always on | Stop-hook: reads transcript, proposes events |

29.2 Role-specific skills (catalog reference) — v1.8 collapsed

Full catalog in .skills/INDEX.md. The v1.7 per-role skill split (10 sets × 4 skills) collapses into two larger sets owned by the two top agents. The skills themselves largely survive; only their ownership consolidates.

| Agent | Skills |
| --- | --- |
| chief | entity-brief, interaction-log, draft-outreach, relationship-temperature, pipeline-report, proposal-draft, stage-advance, visitor-analytics-digest, renewal-watch, upsell-probe, nps-signal, action-now-render, morning-brief, weekly-digest, examples-find |
| operator | project-health, roadmap-show, blocker-probe, scope-diff, search, adr-draft, dependency-audit, complexity-review, invoice-chase, recurring-materialize, revenue-variance, playbook-draft, kb-gap-scan, kb-ingest, kb-query, daily-learning-loop, examples-promote, examples-find, incident-cluster, prompt-propose, config-propose, seed-case-author |
| Project subagent (template) | search, scope-diff, adr-draft, complexity-review, blocker-probe (loaded from /memories/projects/<slug>/playbook.md) |

29.3 Community skills via gh skill

gh skill install JuliusBrussee/caveman caveman
gh skill install JuliusBrussee/caveman caveman-compress
gh skill update --all                                 # weekly cron

Our own skills are authored locally and are not published via gh skill (internal use only).

30. Caveman policy (#55)

30.1 Default state

All agents operate under caveman full for internal reasoning, tool calls, inter-agent writes (proposal bodies, reasoning_trace notes, enrichment output).

30.2 Mandatory carve-outs — switch to normal mode

Every agent's prompt embeds:

Before producing ANY artifact intended for Wilson or a customer — memo,
task.title, task.description, review.question_text, email body, proposal
text, invoice line items — emit `normal mode` on its own line, produce
the artifact in clear human English (or Portuguese), then emit
`/caveman full` on its own line before continuing.

NEVER apply caveman to:
  - memo.content_md
  - task.title / task.description (human-visible)
  - review.question_text / review.options[]
  - interaction.body (outbound)
  - playbook.body_md (read by LLMs AND Wilson)
  - any document body (contracts, proposals, invoices)

30.3 Language fallback

31. MCP policy (#67)

Default: no persistent MCP server. Agents invoke hq <verb> via Bash.

Conditions for enabling hq mcp-serve:

  1. A tool is called >50× per agent per hour, measured over 7 days
  2. Subprocess-spawn latency (>100ms p95) measurably harms agent latency
  3. Observed in production, not theoretical

When enabled: socket-activated; starts on first request; exits after 5 min idle. Same on-demand pattern as Whisper STT.

32. The five agent surfaces (#21)

32.1 AGENTS.md hierarchy

32.2 .skills/ catalog

agentskills.io-compliant. Symlinked to ~/.claude/skills/capitao/. See §29.

32.3 state/ filesystem mirror

Maintained by view-renderer worker. ≤5s lag. Read-only for agents.

Layout:

state/
├── INDEX.md
├── focus.md · now.md · next.md · backlog.md
├── action-now.md                       ← the killer view (§40 of workflows)
├── projects/<slug>/README.md
├── ventures/<slug>.md
├── tasks/{focus,now,blocked,due-this-week}.md
├── agents/
│   ├── INDEX.md
│   ├── ventures/<role>.md
│   └── projects/<slug>.md
├── producers/INDEX.md
├── timeline/YYYY-MM-DD.md
└── system/
    ├── learning.md                     ← nightly autonomy digest
    ├── agent-incidents.md              ← watchdog output
    └── agent-costs.md                  ← per-agent daily/weekly

32.4 schemas/ JSON Schema catalog

Every node type + edge type + CLI command + webhook has a schema. Flat (no $ref to externals). Consumed by LangChain's strict tool mode.

32.5 hq CLI (§28)


33. Agent tool registry & per-agent curation (#69, #70)

Decision #70 takes the shell-invocable surface (#67, §28) and adds a structured layer: every worker action and every hq verb is registered once as an Anthropic-format tool definition (JSON Schema, strict: true, additionalProperties: false, optional input_examples), exposed natively to agent runtimes (Claude Agent SDK, Hermes, LangChain). Decision #69 sets the write-contract on top of that surface: agents write directly within their domain scope; gating is post-hoc and reserved for blast-radius actions only.

33.1 Single source of truth

src/tools/
├── registry.ts                    ← canonical TypeScript tool spec — one entry per tool
├── handlers/
│   ├── hq_task.ts                 ← per-tool handler + types
│   ├── worker_run.ts
│   └── … (one file per tool)
├── exporters/
│   ├── anthropic.ts               ← → tools[] for Claude Agent SDK
│   ├── hermes.ts                  ← → ./.hermes/plugins/<name>.md
│   └── langchain.ts               ← → BaseTool[]
└── schemas/tools.json             ← auto-generated, committed for diffing
schemas/tools.md                   ← human-readable docs auto-rendered by `hq tools docs`

One source. Three exporters. Schema doc auto-published. Adding a 19th tool = one PR touching one directory. No drift across consumers.

33.2 The 18-tool catalog

Consolidated by domain — action enums collapse what would otherwise be 50+ verbs.

Tool Type Purpose
hq_search read Polymorphic search across all node kinds (slug, text, vector, hybrid). Always-on.
hq_describe read Get one node's full state and direct edges. Always-on.
hq_timeline read Chronological event/interaction stream for an entity, project, or engagement.
hq_trace read Replay an LLM reasoning trace — inputs, decision, evidence, replay, explain.
hq_memory read Query agent memory by role, salience, time window, free-text.
hq_producer read Producer health — last_seen_at, throughput, error rate per data source.
hq_playbook read List/show playbooks with status (canary / active / decayed / archived).
hq_tools meta Discovery. Returns the full catalog with per-tool access status (granted / request). Cheap (~200 tokens). Loaded for every agent.
hq_task write `action='create'
hq_engagement write `action='create'
hq_event_log write Record event.kind ∈ {action, development, state_change, milestone}. Always-on for non-read agents.
hq_entity write `action='find'
hq_interaction write Log a manual interaction (note / call / in-person). Ingest workers log automatically.
hq_review write `action='list'
hq_proposal gated The path for the four gated cases (destructive / cross-scope / system-behavior / heuristic-flag — see §33.6 via #69). `action='propose'
hq_autonomy control `action='status'
hq_agent control `action='list'
hq_examples_find read RAG retrieval over the success-examples DB (#75 §5). --action-type X --tags Y --query Z --top N. Both agents call this before high-stakes drafts.
hq_action_log / hq_edit_log read Admin queries on the action ledger and Wilson-edit ledger. Restricted to operator.
hq_examples_pin write Manual pin of a success example so it never decays (operator invokes on Wilson's request).
worker_run execute Trigger any registered worker on a specific input. Workers also run autonomously — see §33.5 for the full catalog.

Total tokens fully loaded ≈ 12.6 K. With per-agent curation (§33.4) the median agent loads ~5.5 K (~55% reduction).

33.3 Tool definition shape

Every tool follows the Anthropic tool-definition contract:

{
  "name": "hq_<noun>",
  "description": "<3-5 detailed sentences — the single biggest performance lever per Anthropic>",
  "strict": true,
  "input_schema": {
    "type": "object",
    "properties": { "action": { "enum": ["..."] }, "...": { "..." } },
    "required": ["action"],
    "additionalProperties": false
  },
  "input_examples": [ { "..." }, { "..." } ]
}
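For concreteness, here is how the hq_tools entry could be instantiated under this contract — the list / request actions and the tool / reason parameters come from §33.4 and §33.8; the description wording is illustrative:

{
  "name": "hq_tools",
  "description": "Discovery over the tool catalog. action='list' returns every registered tool with per-tool access status (granted / request). action='request' escalates for access to a tool outside this agent's curated set; the request lands as a proposal for Wilson (#69 cross-scope class).",
  "strict": true,
  "input_schema": {
    "type": "object",
    "properties": {
      "action": { "enum": ["list", "request"] },
      "tool":   { "type": "string", "description": "required when action='request'" },
      "reason": { "type": "string", "description": "required when action='request'" }
    },
    "required": ["action"],
    "additionalProperties": false
  },
  "input_examples": [
    { "action": "list" },
    { "action": "request", "tool": "hq_proposal", "reason": "need to file a rollback proposal" }
  ]
}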

Output contract — every handler returns this shape:

{
  "ok": true,
  "action": "create",
  "data": { "slug": "...", "id": "..." },
  "audit_event_id": "evt_01HXY...",
  "trace_id": "trc_01HXY...",
  "error": null
}

High-signal returns only — slugs, UUIDs, counts. Never raw rows. audit_event_id enables one-command rollback via hq_proposal rollback; trace_id feeds hq_trace explain.

33.4 Per-agent curation (v1.8 collapsed roster)

Each agent's markdown config declares ## Tools with base: true (always-on) and a role-specific add: [...] list. The agent-supervisor (§22, #68) reads this at boot and assembles the registry subset passed as tools=[...] on every API request.

Base set (every agent, ~2 K tokens): hq_search, hq_describe, hq_event_log, hq_tools.

Agent Adds (on top of base) Total Tokens (≈)
chief hq_timeline, hq_entity (read), hq_engagement (read+propose), hq_interaction, hq_proposal, hq_examples_find, hq_action_log (read-only on own trajectories), hq_trace_show, hq_project_run, gmail_thread_read, gmail_search (full archive), gmail_send, calendar_read, calendar_create_event, telegram_send_to_wilson, kb_search (read-only), Edit + Write (path allow-list), drive-mcp (read), calendar-mcp (read history), kb-mcp (read-only) 26 ~14.0 K
operator hq_timeline, hq_entity (read+write), hq_engagement, hq_task, hq_proposal, worker_run, hq_trace, hq_memory, hq_producer, hq_autonomy, hq_agent, hq_playbook, hq_examples_find, hq_examples_promote, hq_examples_pin, hq_action_log, hq_edit_log, migration_plan, migration_apply, kb_ingest_run, graph_reconcile, finance_import 26 ~16.5 K
project-subagent (template, loaded on demand) hq_timeline, hq_task, worker_run, hq_proposal, hq_examples_find 9 ~6.0 K

Loaded totals: chief ~14.0 K / operator ~16.5 K (both grew with #77 / #76 deltas). Discovery: any tool not in an agent's set is one hq_tools(action='list') hop away; access expansion via hq_tools(action='request', tool='X', reason='Y') lands as a proposal for Wilson approval (#69 cross-scope class).

Note (v1.8 tool count): both agents now cross the ~20-tool embedding-search threshold (#74 amended #67; #77 expands chief). Loading strategy below (§33.7) opts both into Phase 2 (Tool Search Tool with defer_loading=true for cold-tier tools).

Edit/Write for chief — path allow-list (#77 §6): the runtime middleware enforces a path allow-list on chief's Edit and Write calls. Allowed paths: outputs/proposals/, outputs/contracts/ (in-draft only — post-dispatch routes through request_approval), outputs/briefs/, outputs/post-mortems/, outputs/status-reports/, outputs/customer-facing/<customer>/, agents/chief/personality.md, agents/chief/MEMORY.md. Writes outside the allow-list raise a typed cross_scope_violation error and a hint to ask operator. Violations are tracked as cross_scope_violation_count_14d (target: 0).
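A sketch of that middleware check, assuming a simple prefix match — the function name and error shape beyond the cross_scope_violation code are illustrative:

// Enforce chief's Edit/Write path allow-list (#77 §6). Only the paths and the
// cross_scope_violation error name come from the plan; the rest is a sketch.
const CHIEF_WRITE_ALLOWLIST = [
  "outputs/proposals/",
  "outputs/contracts/",        // in-draft only; post-dispatch routes through request_approval
  "outputs/briefs/",
  "outputs/post-mortems/",
  "outputs/status-reports/",
  "outputs/customer-facing/",  // per-customer subdirectories
  "agents/chief/personality.md",
  "agents/chief/MEMORY.md",
];

export function assertChiefWriteAllowed(path: string): void {
  const p = path.replace(/^\.\//, "");
  const allowed =
    !p.includes("..") &&       // no traversal out of the allow-listed roots
    CHIEF_WRITE_ALLOWLIST.some((a) => (a.endsWith("/") ? p.startsWith(a) : p === a));
  if (!allowed) {
    // Typed error; each trip also increments cross_scope_violation_count_14d (target: 0)
    throw Object.assign(new Error(`cross_scope_violation: ${p}`), {
      code: "cross_scope_violation",
      hint: "ask operator to make this write, or route it via request_approval",
    });
  }
}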

33.5 Dual-run workers — autonomous + tool

Decision #70 keeps autonomous worker execution intact and adds the tool surface as a second invocation path. Same handler, two callers.

Worker Autonomous trigger Tool surface
email_ingest IMAP IDLE (always-on) worker_run(worker='email_ingest', params={force_refresh:true})
whatsapp_ingest daemon push worker_run(worker='whatsapp_ingest', params={since_ts})
meeting_transcribe inotify on hot folder worker_run(worker='meeting_transcribe', params={file_path})
enrich BullMQ on interaction.created worker_run(worker='enrich', params={interaction_id}) for re-enrich
summarize BullMQ on payloads >50 K tokens worker_run(worker='summarize', params={interaction_id})
reconcile_llm enrichment writer sub-call worker_run(worker='reconcile_llm', params={candidate_set})
triage BullMQ on interaction.enriched worker_run(worker='triage', params={interaction_id})
view_renderer BullMQ on every ops.outbox event worker_run(worker='view_renderer', params={node_type, slug}) debug
embed_worker BullMQ on nodes.text_changed worker_run(worker='embed_worker', params={node_ids})
kb_indexer inotify on wiki/ worker_run(worker='kb_indexer', params={path})
profile_worker BullMQ on entity-touching outbox event worker_run(worker='profile_worker', params={entity_id})
review_applier BullMQ on review.resolved worker_run(worker='review_applier', params={review_id})
semantic_dedup nightly cron worker_run(worker='semantic_dedup', params={window_days})
playbook_proposer nightly cron worker_run(worker='playbook_proposer', params={trigger_kind})
calibration_analyzer weekly cron worker_run(worker='calibration_analyzer', params={since:'7d'})
calibration_applier event on calibration_proposal.created worker_run(worker='calibration_applier', params={proposal_id})
agent_research 22:00 daily cron worker_run(worker='agent_research', params={window:'24h'})
proposal_analytics_mirror 5-min poll worker_run(worker='proposal_analytics_mirror', params={customer_slug})
toconline_sync 30-min poll worker_run(worker='toconline_sync', params={since_ts})
shopify_sync webhook + 1-h poll worker_run(worker='shopify_sync', params={shop, since_ts})
todoist_mirror webhook + 5-min poll worker_run(worker='todoist_mirror', params={direction})

The producer registry (#26) attributes both: triggered_by: 'cron:agent_research' vs triggered_by: 'agent:project:gopecauto'.
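A sketch of the dual-run pattern for one worker — the handler body and registration details are illustrative; what the plan fixes is one handler, two callers, and triggered_by attribution:

import { Worker } from "bullmq";

// Shared handler — identical logic regardless of caller (#70).
async function enrich(params: { interaction_id: string }, ctx: { triggered_by: string }) {
  // ... enrichment body; every resulting write carries ctx.triggered_by
}

// Caller 1 — autonomous: BullMQ consumer on interaction.created
new Worker(
  "enrich",
  (job) => enrich(job.data, { triggered_by: "bullmq:interaction.created" }),
  { connection: { host: "localhost", port: 6379 } },
);

// Caller 2 — tool surface: worker_run(worker='enrich', params={interaction_id})
export function workerRunTool(input: { worker: string; params: unknown; caller: string }) {
  if (input.worker === "enrich") {
    return enrich(input.params as { interaction_id: string }, {
      triggered_by: `agent:${input.caller}`,   // e.g. 'agent:project:gopecauto'
    });
  }
  // ... dispatch to the remaining workers
}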

33.6 Direct-write default — friction-floor-zero (#69)

Decision #69 sets the write contract on top of the tool registry. Default = direct write with audit trail. Every tool call lands in ops.outbox + ops.llm_call_log + agent_runs; audit_event_id enables one-command rollback (hq_proposal rollback <id>).

Synchronous gating reserved for four classes only:

  1. Destructive actions — financial / legal / compliance writes, schema removals, edge-orphaning merges, deletion of another agent's writes.
  2. Cross-scope writes — project-A agent touching project-B's subgraph; any agent touching company-level state or another agent's config.
  3. System-behavior changes — autonomy thresholds, agent prompts, model routing, canary fractions (continues through #49).
  4. Heuristic flags — magnitude cap exceeded, novelty, conflicting evidence, security-scan hit.

These four route via the request_approval custom tool to Wilson (#69 + #74). The handoff writes a proposal node (audit trail) and a Telegram nudge; Wilson's accept/reject lands as an ops.wilson_edits row tied to the originating action. There is no terminal-writer chokepoint for routine writes; the action ledger (#75) provides the audit plane.
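A sketch of the gate in code — the four class names and the fallthrough-to-direct-write are from #69; the predicates and helper names are placeholders:

type GateClass = "destructive" | "cross_scope" | "system_behavior" | "heuristic_flag";

interface ProposedAction { tool: string; action: string; target: string; actor: string }

// Placeholder predicates — the real checks live in the tool handlers / harness.
declare function isDestructive(a: ProposedAction): boolean;         // financial/legal/compliance, schema removals, ...
declare function crossesScope(a: ProposedAction): boolean;          // another agent's subgraph, company-level state
declare function changesSystemBehavior(a: ProposedAction): boolean; // thresholds, prompts, routing, canary fractions
declare function trippedHeuristic(a: ProposedAction): boolean;      // magnitude cap, novelty, conflict, security hit
declare function directWrite(a: ProposedAction): Promise<unknown>;
declare function requestApproval(a: ProposedAction, cls: GateClass): Promise<unknown>;

function gateClass(a: ProposedAction): GateClass | null {
  if (isDestructive(a)) return "destructive";
  if (crossesScope(a)) return "cross_scope";
  if (changesSystemBehavior(a)) return "system_behavior";
  if (trippedHeuristic(a)) return "heuristic_flag";
  return null;                                    // default: direct write + audit trail
}

async function executeOrPropose(a: ProposedAction) {
  const cls = gateClass(a);
  if (cls === null) return directWrite(a);        // lands in ops.outbox + llm_call_log + agent_runs
  return requestApproval(a, cls);                 // proposal node + Telegram nudge to Wilson
}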

33.7 Loading strategy

Phase When Strategy Token cost
Phase 1 — Now ≤ 25 tools per agent's curated set Load all curated tools directly. No Tool Search Tool. Median 5.5 K / agent
Phase 2 When agent-specific catalogs exceed 30 tools Tool Search Tool — keep hq_search / hq_task / hq_describe / worker_run / hq_event_log always loaded; defer the rest with defer_loading: true. ~85% reduction
Phase 3 Bulk orchestration (e.g., operator's 22:00 loop mining 200 incidents + 50 wilson_edits) Programmatic Tool Calling — model writes Python in code-execution sandbox; intermediate results never enter context. ~37% reduction on bulk tasks
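A sketch of the Phase 2 assembly, reusing the ToolSpec/toAnthropic sketch from §33.1 — the hot tier is the list above; defer_loading follows Anthropic's Tool Search Tool beta, and the exact beta field/header names should be confirmed against the current API docs:

const HOT_TIER = new Set(["hq_search", "hq_task", "hq_describe", "worker_run", "hq_event_log"]);

const tools = registry.map((t) => ({
  ...toAnthropic(t),
  // Cold-tier tools stay discoverable via the Tool Search Tool but are not
  // loaded into context until the model requests them.
  ...(HOT_TIER.has(t.name) ? {} : { defer_loading: true }),
}));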

33.8 Restrictions matrix (v1.8)

Tool / verb Granted to
hq_autonomy (any verb) operator (with Wilson confirmation via request_approval for freeze and kill)
hq_agent write verbs (pause/resume/restart/handover) operator
hq_proposal rollback operator (chief proposes via mailbox handoff)
hq_entity merge nobody direct — always proposal via request_approval
gmail_send chief only (operator has no outbound email permission). Default mode: approval-required — every call wrapped by request_approval per #74 §9. Direct send only after Wilson runs hq agent ungate chief --action=email_send [--scope=…].
migration_apply operator only, plus request_approval if migration is irreversible

Anyone needing a restricted tool escalates via hq_tools(action='request', tool='X', reason='Y') — request becomes a proposal Wilson approves.

33.9 What this changes in the existing plan

33.10 Inter-agent CLI mailbox (#71)

Full rationale in DECISIONS.md #71. CLI verbs:

hq agent send <to> <message>                   # fire-and-forget
hq agent ask <to> <message> --timeout 30s      # sync RPC, blocks on reply (hard ceiling 5 min)
hq agent broadcast <group> <message>           # @ventures | @projects | @all
hq agent reply <message-id> <body>
hq agent inbox [--unread] [--from <agent>]
hq agent roster [--scope <team|project>]
hq agent presence <role>                       # last-seen, current state

Wire model. hq agent ask writes a row into ops.agent_inbox (extended per #60 amendment: from_agent TEXT, correlation_id UUID, expects_reply BOOLEAN) and emits an agent.message outbox event. Supervisor (#68) routes the event to the recipient — fork-execing it cold if not warm. Recipient calls hq agent reply <message-id> <body>, which emits agent.reply.<correlation_id>. Caller's runtime LISTENs on that channel and unblocks. Every agent stays a top-level SDK process (no SDK nesting; preserves #50). Sub-30 ms transport when both ends are warm; ~1.5 s when target is cold.
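A sketch of the blocking ask path under these assumptions — ops.agent_inbox, the agent.reply.<correlation_id> channel, and the 5-minute ceiling come from the plan; the pg client usage and function name are illustrative, and the agent.message outbox emission (trigger/relay) is elided:

import { Client } from "pg";
import { randomUUID } from "node:crypto";

// hq agent ask <to> <message> --timeout 30s  (hard ceiling 5 min)
export async function agentAsk(db: Client, from: string, to: string, body: string, timeoutMs = 30_000) {
  const correlationId = randomUUID();
  await db.query(
    `INSERT INTO ops.agent_inbox (from_agent, to_agent, body, correlation_id, expects_reply)
     VALUES ($1, $2, $3, $4, true)`,    // from_agent required for agents; NULL means Wilson
    [from, to, body, correlationId],
  );
  // Supervisor (#68) routes the agent.message outbox event, fork-execing the
  // recipient cold if it is not warm. We block on the reply channel:
  await db.query(`LISTEN "agent.reply.${correlationId}"`);
  return new Promise<string | undefined>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("ask timeout")), Math.min(timeoutMs, 300_000));
    db.on("notification", (msg) => {
      if (msg.channel === `agent.reply.${correlationId}`) {
        clearTimeout(timer);
        resolve(msg.payload);
      }
    });
  });
}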

SDK surface. Inside each agent's session, a custom SendMessage tool (Anthropic-pattern JSON-schema, strict: true, per #70) wraps hq agent send / hq agent ask as a subprocess. This is NOT in the base toolset; agents declare it explicitly via add: [send_message]. from_agent is required on all agent-to-agent rows; NULL means Wilson. from_agent='wilson' magic strings are forbidden (CHECK constraint).

Coordination class split (amends #50). Synchronous Q&A and short-form delegation use the CLI mailbox (the fast path). Multi-step proposals, cross-tier reviews, and anything that must survive a process exit or be replayed by hq state rebuild continue through the graph (the primary path). Rule of thumb: if you'd want it replayable, it goes through the graph.


Part VII — Safety, Governance, and Learning

34. Autonomy framework (#49) — four layers

Every self-improvement (playbook promotion, calibration change, prompt tweak, config change) flows through:

Layer 1: Pre-apply gates
  ✓ dangerousness check (destructive → review queue)
  ✓ sample size (n ≥ threshold)
  ✓ magnitude cap (per-week delta limits)
  ✓ security scan (invisible Unicode, prompt-injection patterns, fenced context)
  ✓ freeze state (hq autonomy freeze → hold)
         │ passes
         ▼
Layer 2: Canary rollout
  • test_fraction = 20% traffic (default)
  • adaptive window: 48h min, 8 uses target, 14d max
  • canary_id on every reasoning_trace
         │ window closes
         ▼
Layer 3: Auto-decide
  • regression → rollback + alert
  • no change → shelved (30d cooldown)
  • improvement → promote + digest note
         │ promoted
         ▼
Layer 4: Drift monitor (continuous, 7-day rolling)
  • metric degradation > tolerance for 48h → auto-rollback
  • reason logged + Telegram to Wilson
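The Layer 3 decision rule is small enough to state as code — a sketch, with the canary metric delta and tolerance left abstract (both field names are assumptions):

type CanaryVerdict = "rollback" | "shelve" | "promote";

// delta = canary metric minus baseline; tolerance = allowed noise band
function autoDecide(delta: number, tolerance: number): CanaryVerdict {
  if (delta < -tolerance) return "rollback"; // regression → rollback + alert
  if (delta <= tolerance) return "shelve";   // no change → shelved, 30d cooldown
  return "promote";                          // improvement → promote + digest note
}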

Destructive carve-out (always reviews, never canary):

Emergency controls:

35. Confidence gating

Confidence Action
≥ 0.95 Auto-apply deterministically
0.85 – 0.95 Auto-apply + soft review (compounds to profile)
0.70 – 0.85 Auto-apply with high scrutiny (canary eligible)
< 0.70 Queue as review node; human decides
Destructive, any confidence Queue as review node
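As code, the gate is a straightforward ladder with the destructive override applied first — a sketch; the enum names are assumptions:

type GateAction = "auto_apply" | "auto_apply_soft_review" | "auto_apply_high_scrutiny" | "review_queue";

function confidenceGate(confidence: number, destructive: boolean): GateAction {
  if (destructive) return "review_queue";                    // destructive, any confidence
  if (confidence >= 0.95) return "auto_apply";               // deterministic
  if (confidence >= 0.85) return "auto_apply_soft_review";   // compounds to profile
  if (confidence >= 0.70) return "auto_apply_high_scrutiny"; // canary eligible
  return "review_queue";                                     // human decides
}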

36. Watchdog tiers (#64)

Tier Trigger Action
Soft First timeout, suspected loop DM agent: "check in" — 2 min grace
Medium Confirmed loop, repeated timeout, error burst Force-stop current query, fresh session on next trigger
Hard Crash, budget runaway, RSS explosion systemd restart + review
Critical >5 hard trips in 1 hour systemd-stop + quarantine status + review

Every trip writes an agent_incident node. A meta-watchdog guards the watchdog itself.

37. Direct-write discipline (#69, #74 — replaces single-writer)

The v1.7 single-writer model (terminal writes only by triage-dispatcher) is dropped. With only two top agents, write-race risk is dominated by accidental overlap rather than anything that needs deliberate deduplication machinery, and it is solved with simpler tooling:

  1. Per-agent scope. chief writes outbound (email, customer artifacts). operator writes inward (graph, files, code, finance). The tool registry (#70) blocks cross-scope writes at the harness layer.
  2. Friction-floor-zero with destructive gate (#69). Each agent writes directly within its own scope. Destructive, cross-scope, system-behavior, and heuristic-flagged actions route to the request_approval custom tool, which becomes a proposal node visible to Wilson.
  3. Action ledger as single source of truth (#75). Every write produces one ops.agent_actions row, regardless of agent. Conflicts surface as duplicate-target rows with overlapping timestamps; a nightly reconcile job (operator @ 03:00) flags them for the morning brief.
  4. Inter-agent coordination via mailbox (#71). When chief needs a graph mutation, it hq agent ask operator. Operator owns the write. The handoff is logged as parent/child action rows.

Prevents write races (scope isolation), duplicate work (mailbox handoff), and actor-attribution confusion (action ledger).

38. Memory invariants

39. Action ledger + trajectory + success examples (#75 + #76) + Daily learning loop (#65 amended by #74)

The v1.7 "daily agent-research" agent collapses into operator's 22:00 learning loop. The loop runs four passes — trajectory annotation review, success-pattern promotion (text + procedure), incident mining, and roll-up — as a single Opus session, gated by friction-floor-zero (#69), and writes into the three-layer ledger: actions (#75 §1) + trajectories (#76) + success examples (#75 §3 + #76 trajectory_summary).

39.1 Action ledger schema (#75 §1, amended by #76 §1)

ops.agent_actions(id, actor, session_id, task_id, action_type, target_kind, target_id, input, output, status, model, cost_usd, duration_ms, parent_id, embedding, created_at). Every tool call by either top agent or any subagent appends one row. The task_id UUID groups all actions within a single agent task (one inbound event → one task → one task_id; subagents inherit task_id from their parent). The agent_runtime PostToolUse hook writes asynchronously to a Valkey stream; ledger-flusher drains the stream into Postgres in 1-second batches. Storage cost ~50 KB/row × 50 rows/day × 365 days ≈ 900 MB/year. Kept forever.
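A sketch of the hook → stream → flusher path — the stream key and batch size are assumptions; Valkey speaks the Redis protocol, so ioredis works as a client:

import Redis from "ioredis";
import { Client } from "pg";

const valkey = new Redis();           // Valkey is Redis-protocol-compatible
const STREAM = "hq:agent_actions";    // hypothetical stream key

// PostToolUse hook — fire-and-forget append; never blocks the agent turn
export function onPostToolUse(row: Record<string, string>) {
  void valkey.xadd(STREAM, "*", ...Object.entries(row).flat());
}

// ledger-flusher — drain the stream into ops.agent_actions in 1-second batches
export async function flushLoop(db: Client) {
  let lastId = "$";
  for (;;) {
    const res = await valkey.xread("BLOCK", 1000, "COUNT", 500, "STREAMS", STREAM, lastId);
    if (!res) continue;                         // BLOCK timed out; poll again
    const [, entries] = res[0];
    // ... map stream fields → columns and bulk-insert into ops.agent_actions (elided)
    lastId = entries[entries.length - 1][0];    // resume after the last drained entry
  }
}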

39.2 Wilson edit log (#75 §2)

ops.wilson_edits(id, action_id, edit_type, final_output, diff_summary, diff_score, notes, edited_at). Captured via three paths:

Path Detection edit_type values
Email send Outbox watcher diffs chief's draft against the actually-sent message in Gmail accepted, tweaked, rewrote, rejected
Task / proposal / file edit hq CLI wrappers on every Wilson-driven mutation persist pre/post snapshots accepted, tweaked, rewrote
Acceptance with no change Outbox watcher emits edit_type='accepted', diff_score=0.0 after 24 h with no Wilson modification accepted
Rejection hq action reject <id> --reason=… rejected, abandoned

Diff summary: 1-line Haiku 4.5 generation (~$0.0002 per diff). Diff score: deterministic cosine distance on Voyage 3.5-lite embeddings (ADR #82) — the same embedding service that powers nodes.embedding. Each diff scoring call is ~2 short embed requests (draft + sent) at $0.02/M tokens; ~50 diffs/day × ~500 tokens each = ~$0.0005/day, rounded into the embedding line below.
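The diff score itself is deterministic given the two embeddings — a sketch; embed() stands in for the Voyage embedding client and is an assumption:

// diff_score = cosine distance between draft and sent embeddings (ADR #82).
// 0.0 = same direction (accepted unchanged); larger = heavier rewrite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Inside an async handler:
//   const diffScore = cosineDistance(await embed(draft), await embed(sentMessage));
// where embed() calls the Voyage 3.5-lite service and returns a 512-dim vector.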

39.2.5 Trajectory capture and per-action annotation (#76)

ops.email_reply_sessions — one row per email reply attempt. Captures: task_id, thread_id, customer_slug, inbound_thread (full snapshot), classification, draft_output, draft_model, trajectory_action_ids[] (ordered list of action_ids — the procedure), retrieved_example_ids[] (which success-examples chief used as RAG anchors), approval_status, final_output, final_diff_score. Schema is generalizable to other artifact types (proposal_sessions, adr_sessions, kb_ingest_sessions) in later waves; the email case is Wave 1 priority.

ops.action_annotations — Wilson's per-action grades. Captures: action_id, task_id, grade ∈ {good, bad, missing, unnecessary}, note, annotator, created_at. Annotations are written from two paths:

Path Trigger
Inline approval UI Wilson clicks thumbs/comment on any action in the trajectory pane while reviewing a draft (request_approval page)
Retrospective CLI hq action annotate <action_id> --grade=… --note="…"

For grade='missing', the annotation is attached to the closest preceding action_id with a note describing what should have happened; the runtime renders this as an interleaved gap when displaying the trajectory.

Approval UI three-pane layout (#76 §4): left = inbound thread; center = chief's draft + retrieved success examples; right = ordered trajectory list with per-action thumbs-up/thumbs-down/missing-step buttons + comment boxes. Wilson can: (1) approve/edit/reject the draft (writes wilson_edits), (2) grade any action (writes action_annotations), (3) insert a missing step (writes action_annotations with grade='missing').

CLI surfaces:

hq trace show <task_id>                 # render full trajectory + annotations
hq trace gaps --customer=<slug> --since=14d  # all `missing` annotations grouped by pattern
hq action annotate <action_id> --grade=… --note="…"
hq examples find --include-trajectory   # default true for chief and operator (#76 §6)

39.3 Success examples DB and auto-promotion (#75 §3, amended by #76 §7)

Operator at 22:00 (Opus, plan-mode) runs the two-axis auto-promotion pipeline against ops.wilson_edits × ops.action_annotations from the past 24h. The text axis (Wilson edited the artifact) and the trajectory axis (Wilson graded the procedure) compose:

Edit type × trajectory annotations Promotion
accepted AND no bad or missing annotations auto_promoted after 7 days (clean text + clean process)
accepted AND ≥1 missing annotation auto_promoted_with_caveat — the procedure embeds the missing note as a corrective; future retrievals see "next time also do X" inline
tweaked AND diff_score < 0.20 AND no bad annotations auto_promoted (Wilson liked both procedure and bones)
tweaked with ≥1 bad annotation NOT promoted — extracted as anti-pattern lesson with the specific bad action highlighted
rewrote OR rejected NOT promoted — anti-pattern lesson; trajectory annotations included in the lesson body
wilson_pinned (manual via hq examples pin) Bypasses all rules; never decays

The auto_promoted_with_caveat class is novel and important: it captures the case where Wilson said "the email was fine, but next time also check X." Too valuable to lose, not a clean exemplar — so the markdown mirror includes the missing step as a prescriptive instruction in the procedure section.
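The rule table reduces to a small decision function — a sketch; the type names are assumptions, the thresholds are from the table above, and the accepted-with-bad combination (not covered by the table) is left unpromoted:

type EditType = "accepted" | "tweaked" | "rewrote" | "rejected" | "wilson_pinned";
type Grade = "good" | "bad" | "missing" | "unnecessary";
type Promotion = "auto_promoted" | "auto_promoted_with_caveat" | "anti_pattern" | "pinned";

function promote(edit: { type: EditType; diffScore: number }, grades: Grade[]): Promotion | null {
  const bad = grades.includes("bad");
  const missing = grades.includes("missing");
  if (edit.type === "wilson_pinned") return "pinned";             // bypasses all rules; never decays
  if (edit.type === "accepted") {
    if (!bad && !missing) return "auto_promoted";                 // after 7 days: clean text + clean process
    if (missing && !bad) return "auto_promoted_with_caveat";      // embeds the missing note as a corrective
    return null;                                                  // accepted-with-bad: not promoted
  }
  if (edit.type === "tweaked") {
    if (bad) return "anti_pattern";                               // with the bad action highlighted
    if (edit.diffScore < 0.20) return "auto_promoted";
    return null;
  }
  return "anti_pattern";                                          // rewrote / rejected
}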

Promoted examples mirror to /memories/success-examples/<action_type>/<id>.md with PII redaction (Haiku strips names and addresses; entity names → <customer>, email addresses → <email>, person names → <contact>). The mirror format includes a "What I queried (the trajectory)" section listing each action with its grade (✓ / ✗ / +missing) and Wilson's note — see #76 §5 for the full template.

39.4 Retrieval (used at draft time, trajectory-aware per #76 §6)

hq examples find --action-type <type> --tags <…> --query "<…>" --top <N> [--include-trajectory=true] returns top-N markdown cards. The --include-trajectory flag (default true for chief and operator; opt-out for cheap lookups) returns the full trajectory section — the procedure that produced the validated artifact — alongside the artifact itself.

Both agents are required (per their workspace CLAUDE.md) to call it before any high-stakes draft (§33.2).

The retrieved cards are appended to the agent's prompt as a <past-wilson-validated-trajectories> block. The agent is instructed to follow the procedure in the retrieved cards before drafting (run the same queries, ask operator/chief in the same order, retrieve the same kinds of context), not just to mimic the text. Tone is the surface; procedure is the substance. This is trajectory-RAG over the agent's own validated outputs — no synthetic training data, no fine-tuning.

39.5 Incident mining (the other half of the 22:00 loop)

The same Opus pass also mines agent_incident nodes, ops.agent_error_log, the past 24h of action-ledger rows where status='failed', and operator's own self-reflection. Outputs four kinds of proposals (unchanged from v1.7):

Kind Routes to Example
Prompt tweak ops.improvement_proposals, #49 canary "chief hit 12 activity-timeouts; add 'if no new info in 5 turns, task-complete' rule"
New skill operator drafts; Wilson approves "3 actions independently derived Portuguese deadline extraction — create extract-deadline skill"
Config change #49 canary "project:garq-pdm exceeded budget 5/7 days; raise hard cap $5→$7 OR add summarize gate"
Seed case / test agents/operator/incident_corpus/ "This loop pattern becomes an eval fixture"

39.6 Weekly and monthly aggregations

Operator on Sunday 04:00 (Haiku — cheap rollup) concatenates the week's success-examples + lessons into a single markdown index at /memories/success-examples/_weekly/<YYYY-Www>.md. Month-end concatenates four weeks. The morning brief on the first weekday of each week pulls the prior week's index as a "What we learned" section.

39.7 Cost and budget

Component Daily cost (estimate)
Diff summary (50 actions/day × Haiku ~$0.0002) $0.01
Trajectory summary generation per email_reply_session (~30/day × Haiku ~$0.0005) $0.015
Embedding (50 actions × Voyage 3.5-lite, ~500 tokens each, $0.02/M) ~$0.0005 (≈ $0.02/month — included in main Voyage spend line)
Auto-promotion pipeline (Opus, ~35 K tokens — slightly larger than v1.8.1 because trajectory annotations are now an input) $0.50
Markdown mirror generation with trajectory section (Haiku, ~12 K tokens) $0.006
Mailbox + watchdog overhead $0.00
Total ~$0.53/day, capped $0.60

Hard cap raised from $0.50 to $0.60 to cover trajectory-summary generation. Enforced by daily_usd_hard on operator's budget plus per-call cost telemetry.

40. Playbooks (#47)

40.1 Creation

Nightly playbook-proposer (driven by operator's 22:00 learning loop) scans for:

Draft body: ≤4096 chars. Security scan: invisible Unicode, prompt-injection patterns, financial/legal/compliance keyword flags. Store at wiki/playbooks/<category>/<slug>.md; mirrored as playbook node.

40.2 Lifecycle

Canary at 20% → auto-decide: ≥75% → promote; <50% → archive. Decay: 90d unused → demote; 180d → archive. Drift monitor: 7d rolling.

40.3 Consumption

Hybrid search in buildContext() returns top-2 status IN (canary, active) playbooks. Fenced as <playbook-context>[System note]...</playbook-context> to guard against prompt injection.

41. Calibration loop (#48)

41.1 Observation tables (append-only)

41.2 Pipeline

Weekly calibration-analyzer (Sun 09:00) detects override patterns; emits calibration_proposal to ops.pending_proposals. Hourly calibration-applier runs proposals through #49 framework. Config hot-reload on SIGHUP (no restart).

41.3 Tunables

All markdown — hot-reloadable by runtime on SIGHUP.
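The hot-reload itself could be as small as this — a sketch; loadTunables is a placeholder for parsing the markdown tunables:

// Hot-reload markdown tunables on SIGHUP — no process restart (#48).
declare function loadTunables(): Record<string, unknown>;  // placeholder parser

let tunables = loadTunables();
process.on("SIGHUP", () => {
  tunables = loadTunables();
  console.info("calibration tunables reloaded");
});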

42. Schema evolution (#38)

Enrichment output includes pending_schema_proposals[] (new enum values seen). Monthly (1st Sunday) schema-analyzer aggregates, emits to review queue. Approved → enum migration + re-enrichment of historical interactions (12-month window).