Capitão Command Center — Plan

Part III — Data Model

8. Graph substrate

The data layer is fully specified in docs/architecture/data-layer.md. Summary of what v1.5 locks:

Core schema:

CREATE TABLE nodes (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  type            text NOT NULL,
  status          text,                                       -- projection column
  priority        text,                                       -- projection column
  occurred_at     timestamptz,                                -- projection column
  archived_at     timestamptz,                                -- soft delete
  props           jsonb NOT NULL DEFAULT '{}',
  tags            text[] DEFAULT '{}',
  full_text       tsvector,
  embedding       vector(512),                                -- voyage-3.5-lite output (ADR #82)
  producer_id     uuid NOT NULL REFERENCES nodes(id),         -- provenance
  owner_id        uuid REFERENCES nodes(id),
  external_source text,
  external_id     text,
  valid_during    tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  created_at      timestamptz DEFAULT now(),
  updated_at      timestamptz DEFAULT now(),
  redaction_policy text DEFAULT 'none',
  confidence      real
);

CREATE TABLE edges (
  id           uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  from_id      uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  to_id        uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  type         text NOT NULL,
  props        jsonb NOT NULL DEFAULT '{}',
  valid_during tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  producer_id  uuid NOT NULL REFERENCES nodes(id),
  confidence   real,
  created_at   timestamptz DEFAULT now()
);

Bitemporal shadow tables (history_nodes, history_edges) record every mutation with op (insert/update/delete/archive), actor (from hq.actor GUC), reason (from hq.reason GUC), full prior row as JSONB.
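A minimal sketch of the write path feeding those triggers, assuming psycopg 3 (the DSN and helper name are illustrative). set_config(..., true) scopes the GUCs to the transaction, so the AFTER trigger sees the right actor and reason and nothing leaks between writes:

```python
import psycopg

# Assumes the schema above plus AFTER triggers that copy the hq.* GUCs
# into history_nodes. The DSN and function name are illustrative.
def update_node_status(dsn: str, node_id: str, new_status: str,
                       actor: str, reason: str) -> None:
    with psycopg.connect(dsn) as conn:
        with conn.cursor() as cur:
            # set_config(..., true) limits the GUC to this transaction only
            cur.execute("SELECT set_config('hq.actor', %s, true)", (actor,))
            cur.execute("SELECT set_config('hq.reason', %s, true)", (reason,))
            cur.execute(
                "UPDATE nodes SET status = %s, updated_at = now() WHERE id = %s",
                (new_status, node_id),
            )
        # commit on context exit; the AFTER trigger writes the history row
```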

Indexes: btree on id/external_id; gin on props (jsonb_path_ops), full_text, tags; hnsw on embedding; gist on valid_during; partial btree on (type, status), (type, priority), (type, occurred_at).

Extensions used: pgcrypto, pg_trgm, btree_gin, pg_stat_statements, pg_cron, AGE (Cypher fallback), pgvector, pgvectorscale, timescaledb-apache, pg_partman, pg_duckdb, pg_search, auto_explain.

9. Node types (complete catalog — 28 types)

CRM core:

| Type | Purpose |
| --- | --- |
| entity | Person or organization (customers, partners, prospects, own ventures) |
| contact | Individual contact; multi-entity-capable |
| engagement | Commercial unit — one per proposed deal; stages discovery→proposal→contract→delivery (or partner/declined) |

Execution hierarchy:

| Type | Purpose |
| --- | --- |
| project | Execution wrapper; priority axis + status + tech stack |
| feature | Discrete deliverable within a project; complexity + acceptance criteria |
| task | Unit of work; dev_* + ops_* fields; Todoist mirror |
| deliverable | Shipped artifact (URL, repo commit, document) — distinct from document |

Communication:

| Type | Purpose |
| --- | --- |
| interaction | Every conversation — email, WhatsApp, phone, meeting |
| conversation | Session wrapper around agent runs, meetings, or email threads |

Commercial flow:

| Type | Purpose |
| --- | --- |
| quote | Proposal draft before engagement promotion |
| invoice | Mirrored from TOConline |
| payment | Settles invoice; partial/full/credit note |
| expense | Charged to engagement |

Signal & value layer (the v1.4b unlock):

| Type | Purpose |
| --- | --- |
| intent | What a customer is asking for (ask_reply, ask_budget, ask_document, ask_development, ask_fix, ask_meeting, ask_decision, inform, confirm, approve, complain, churn_signal, expand_signal, thank) |
| expectation | Accountability unit — what's owed, by whom, when; we_owe / they_owe / mutual |
| turn_state | Per-conversation state: theirs / ours / third_party / blocked / closed |

Knowledge & outputs:

| Type | Purpose |
| --- | --- |
| document | File, PDF, transcript, invoice, contract |
| memo | Synthesized output (WBR, briefing, proposal text) |
| kb_article | Wiki mirror from ~/knowledge-base/wiki/ |
| decision | ADR; supersedable |
| risk / open_question | Unresolved concerns affecting projects |

Agent & governance:

| Type | Purpose |
| --- | --- |
| producer | Intake registry — every data source is a first-class node |
| event | Timeline marker (milestone / development / action / state_change) |
| agent_run | One tool-calling session of an agent |
| agent_session | Supervisor-spawned session; parent of agent_runs |
| agent_handover_memo | Memory bridge between sessions |
| memory_entry | Distilled fact, pattern, preference, anti_pattern, watch_item — with salience + decay |
| reasoning_trace | LLM call audit (inputs, output, confidence, canary_id) |
| agent_incident | Watchdog-detected issue |
| review | Confidence-gated or destructive decision awaiting human/agent resolution |
| playbook | Procedural memory (auto-proposed, canary-rolled, auto-demoted) |
| proposal | Agent's proposed write; awaits dispatcher resolution |
| triage_decision | Companion node capturing triage-proposed values vs. actual writes |
| alert | Prometheus-fired alert |
| shopify_order | Imported from Shopify |

10. Edge types (complete catalog)

Provenance & participation:

Containment & composition:

Relationships:

Commercial:

Dependencies & dependency semantics (v1.5 richer than v1.3.1):

Intent & expectation flow:

Agent & learning:

Multi-role assignment (DACI):

11. Producer registry (#26)

Every fact carries produced_by → producer. Five kinds:

| Kind | Example slugs | Attribution |
| --- | --- | --- |
| conversation_channel | email:zoho-wilson, whatsapp:wilson-personal, meeting:zoom, phone:meo | Per-channel ingest worker |
| agent_session | claude-code:vps, claude-code:laptop, hermes:vps, langchain:vps | Per-session; agent_run rolled up |
| project | project:garq-pdm-consulta, project:capitao-command-center | Semantic emission only (events, not raw) |
| external_system | toconline:capitao, shopify:petvitaclub, todoist:wilson, github:wcapitao | Per-system poller / webhook |
| internal_worker | cc:view-renderer, cc:triage-worker, cc:agent:ventures:chief, cc:agent:ventures:operator, cc:agent:project:garq-pdm | Self-referential |

Each agent is a producer.kind='internal_worker' with a role prop (ventures:chief, ventures:operator, project:garq-pdm, etc.). This enables per-agent queries: "show me everything chief produced this week" is one WHERE producer.slug = 'cc:agent:ventures:chief'. Subagent attribution flows through the action ledger's parent_id (#75 §1).
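A sketch of that per-agent query, assuming the producer slug lives in the producer node's props->>'slug' (this section does not pin down where the slug is stored) and psycopg 3:

```python
import psycopg

# "Everything chief produced this week" reduces to one join on producer_id.
WEEKLY_BY_PRODUCER = """
SELECT n.id, n.type, n.created_at
  FROM nodes n
  JOIN nodes p ON p.id = n.producer_id AND p.type = 'producer'
 WHERE p.props->>'slug' = %(slug)s
   AND n.created_at >= now() - interval '7 days'
 ORDER BY n.created_at DESC
"""

def produced_this_week(conn: psycopg.Connection, slug: str) -> list[tuple]:
    with conn.cursor() as cur:
        cur.execute(WEEKLY_BY_PRODUCER, {"slug": slug})
        return cur.fetchall()

# produced_this_week(conn, "cc:agent:ventures:chief")
```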

12. Intent + expectation + turn_state — the value unlock

These three node types elevate the schema from "record of what happened" to "state of obligations and turns."

12.1 Intent

Every interaction produces 0..N intents via the enrichment worker.

intent
  kind                ask_reply | ask_budget | ask_document | ask_development
                      | ask_fix | ask_meeting | ask_decision | ask_intro
                      | inform | confirm | approve | complain
                      | churn_signal | expand_signal | thank
  urgency             blocker | impactful | nice
  explicit_due_at     timestamptz (when the sender stated a deadline)
  confidence          0.0-1.0 (enrichment confidence)
  evidence_span       text quote
  status              open | fulfilled | abandoned | superseded

  edges:
    extracted_from → interaction
    about          → entity | engagement | project | feature
    addressed_to   → contact
    fulfilled_by   → interaction | document | event | task

12.2 Expectation

Every promise — ours or theirs — is an expectation.

expectation
  kind                commitment_made | ask_received | sla | recurring_obligation
                      | deliverable_promised
  direction           we_owe | they_owe | mutual
  asked_at            timestamptz
  due_at              timestamptz (nullable)
  resolved_at         timestamptz (nullable)
  status              open | overdue | resolved | abandoned | superseded
  severity            blocker | impactful | nice
  description_md      short text
  sla_source          engagement | policy | explicit | derived

  edges:
    about           → entity | engagement | project | feature
    spawned_from    → intent | event | interaction
    owed_by         → contact | entity
    owed_to         → contact | entity
    fulfilled_by    → interaction | document | event | task
    supersedes      → expectation (when replaced)

Recurring obligations (monthly invoice, weekly standup, quarterly review) use kind='recurring_obligation' + RRULE. pg_cron materializes the next instance as the predecessor closes.
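A sketch of that materialization step, assuming python-dateutil for RRULE parsing; the RRULE text is read off the closing instance (the prop key holding it is hypothetical):

```python
from datetime import datetime, timezone
from dateutil.rrule import rrulestr  # pip install python-dateutil

def next_instance(rrule_text: str, closed_at: datetime) -> datetime | None:
    """First occurrence strictly after the predecessor closed."""
    rule = rrulestr(rrule_text, dtstart=closed_at)
    return rule.after(closed_at)

# e.g. a monthly invoice obligation:
nxt = next_instance("FREQ=MONTHLY;BYMONTHDAY=1",
                    datetime(2026, 4, 22, tzinfo=timezone.utc))
# -> 2026-05-01 00:00:00+00:00; the pg_cron job would then INSERT a fresh
#    expectation with due_at = nxt and spawned_from -> the closed instance.
```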

12.3 turn_state

Maintained per conversation by the enrichment worker.

turn_state (1:1 with conversation)
  state                 theirs | ours | third_party | blocked | closed
  last_turn_at          timestamptz
  last_turn_by          contact_id | agent_id
  turnaround_sla_hours  integer (inherited from engagement)
  overdue_at            timestamptz (computed: last_turn_at + sla when state='ours')

One query answers "who am I ignoring right now?":

SELECT conversation.id, entity.name, now() - last_turn_at AS waiting
  FROM turn_state JOIN conversations USING (conversation_id)
                  JOIN entities ON ...
 WHERE state = 'ours' AND overdue_at < now()
 ORDER BY overdue_at ASC;

13. Memory persistence contract (#63, amended by #71)

Every persistent agent's memory is three-layered. Full rationale in DECISIONS.md #71 and #63 (amended).

13.0 Three-layer model

  1. Static layer — per-agent .md files in the workspace (agents/<scope>/<slug>/CLAUDE.md, playbook.md, personality.md; project agents add customer-profile.md, domain-knowledge.md). Committed to git, rarely changes, token-budgeted (see §17.x). Loaded verbatim as static context at session start.
  2. Dynamic working layer — MEMORY.md per agent. Anthropic-standard 25 KB cap. Written by the agent during a session via the `memory: project` SDK frontmatter. Mirrored from the graph nightly by memory-tender. Not authoritative — a cache only.
  3. Long-term graph-backed layer — memory_entry, agent_handover_memo, reasoning_trace, agent_session nodes (unchanged, see §13.1–§13.3 below). Authoritative source of truth.

Invariant C11': MEMORY.md is a cache. Agent code MUST NOT treat MEMORY.md as authoritative. Missing or stale files MUST be rebuilt by memory-tender from the graph. agent-watchdog (#64) enforces this: if MEMORY.md is absent or stale beyond decay_after_days on session open, a tender pass runs before any event is processed.
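A minimal sketch of that session-open guard, assuming staleness is judged by file mtime against decay_after_days; rebuild_from_graph is a hypothetical stand-in for the tender pass:

```python
import time
from pathlib import Path

def ensure_memory_cache(memory_md: Path, decay_after_days: int = 30) -> None:
    """C11' guard: MEMORY.md is a cache; rebuild it before any event runs."""
    stale = (
        not memory_md.exists()
        or time.time() - memory_md.stat().st_mtime > decay_after_days * 86400
    )
    if stale:
        rebuild_from_graph(memory_md)  # hypothetical memory-tender entrypoint

def rebuild_from_graph(memory_md: Path) -> None:
    raise NotImplementedError("memory-tender sync pass goes here")
```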

Context-budget alarm (from #59) lowered from 75% to 70% to compensate for the static layer.

Every persistent agent's long-term memory lives in the graph. Three node types form the contract:

13.1 agent_session

One per agent run lifecycle. Created on spawn; finalized on task-complete or rotation.

agent_session
  agent_id
  started_at / ended_at
  turns_taken
  cost_usd
  model_used
  trigger_kind            beat | outbox | dm
  task_summary_md         short narrative of what this session did
  snapshot_md             the handover memory snapshot
  reason_closed           from <task-complete reason="..."/>

  edges:
    produced_by → agent's producer node
    part_of     → conversation (if multi-session task)

13.2 memory_entry

Distilled facts the agent carries forward.

memory_entry
  kind                 fact | pattern | preference | anti_pattern | watch_item
  body_md              ≤500 chars, concise
  salience             0.0-1.0
  created_in_session   uuid
  last_validated_at    timestamptz
  decay_after_days     int (default: 30 facts, 90 patterns, 180 prefs, infinite anti_patterns)
  tags                 text[]

  edges:
    about         → entity | project | feature | engagement (optional)
    produced_by   → agent_producer
    validated_by  → reasoning_trace (when agent reconfirms in later session)
    superseded_by → memory_entry (when replaced)

13.3 agent_handover_memo

Written at session close.

agent_handover_memo
  body_md                200-400 words, human-readable
  open_tasks[]           uuid array
  open_subscriptions[]
  pending_proposals[]

  edges:
    produced_by → agent_producer
    part_of     → agent_session

13.4 Reconstruction header injected on fresh session

# Session context reconstruction

You are agent:<role> starting a fresh session after the previous one
closed with reason: "{{ last_session.reason_closed }}".

## Recent memory entries (salience ≥ 0.3, last 30 days)
{{ hq memory query --role <role> --limit 40 --min-salience 0.3 }}

## Last handover memo
{{ last_handover_memo.body_md }}

## Currently open tasks where you are the driver
{{ hq task list --driven-by agent:<role> --status in_progress }}

## Open proposals you emitted still pending
{{ hq proposal list --by-agent <role> --status pending }}

## Open subscriptions
{{ last_handover_memo.open_subscriptions }}

---
Current trigger: {{ current_trigger.summary }}
Proceed. For older context: `hq memory search --role <role> --query ...`

~2-8k tokens total. Bounded. Deterministic. Cheaper than replaying a full transcript.

13.5 Memory decay and reconciliation

Weekly memory-tender job (Haiku, ~$0.05/week). As of #71, it also gains a sync pass (workspace ↔ graph reconciliation, per §19.4).

The job keeps the memory pool under ~200 active entries per agent — small enough to fit comfortably in the reconstruction header.
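A sketch of the decay-and-prune rule, assuming salience decays linearly to zero over decay_after_days since last validation (the plan fixes the ~200-entry cap, not the decay curve):

```python
from datetime import datetime, timezone

def effective_salience(salience: float, last_validated_at: datetime,
                       decay_after_days: int | None) -> float:
    if decay_after_days is None:            # anti_patterns never decay
        return salience
    age_days = (datetime.now(timezone.utc) - last_validated_at).days
    return salience * max(0.0, 1 - age_days / decay_after_days)

def prune(entries: list[dict], cap: int = 200) -> list[dict]:
    """Keep the top `cap` entries by effective salience; rest are archived."""
    ranked = sorted(
        entries,
        key=lambda e: effective_salience(
            e["salience"], e["last_validated_at"], e["decay_after_days"]),
        reverse=True,
    )
    return ranked[:cap]
```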

13.6 Invariants (enforced in code)

14. Bitemporal audit (#16)

Every mutation writes a history_* row via AFTER triggers.

CREATE TABLE history_nodes (
  history_id  bigserial PRIMARY KEY,
  id          uuid NOT NULL,
  op          text NOT NULL CHECK (op IN ('insert','update','delete','archive')),
  actor       text NOT NULL,               -- hq.actor GUC
  reason      text,                        -- hq.reason GUC
  recorded_at timestamptz DEFAULT now(),
  row         jsonb NOT NULL
) PARTITION BY RANGE (recorded_at);        -- monthly partitions

Actor convention:

Time-travel query: hq as-of <timestamp> describe <slug> calls public.nodes_as_of(timestamptz) which unions current nodes with history rows matching the window.
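What that CLI call reduces to: one SELECT against the function, assuming it returns a nodes-shaped rowset as described above.

```python
import psycopg

def describe_as_of(conn: psycopg.Connection, node_id: str, ts: str):
    """Time-travel read behind `hq as-of <timestamp> describe <slug>`."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM public.nodes_as_of(%s::timestamptz) WHERE id = %s",
            (ts, node_id),
        )
        return cur.fetchone()

# describe_as_of(conn, some_uuid, "2026-03-01T00:00:00Z")
```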

15. Materialized projections + gap views

Materialized views (refreshed nightly unless noted):

Gap views (new in v1.5 — the "what's missing" engine):

~20 deterministic gap views. Each agent queries its domain's gap view first, acts second.


Part IV — Agent Organization

16. Capitão Ventures team — two operational agents (#74)

The team is two always-on operational generalists, not a roster of specialists. They differ only in which side of the system they face.

Capitão Ventures team
├── chief     — Outward.  Customers, prospects, partners, Wilson.
└── operator  — Inward.   Data, code, files, finance, KB, the graph.

On-demand subagents (spawned in-process by either top agent)
├── chief    spawns: email-drafter, proposal-writer, customer-brief, pipeline-analyst, Explore
└── operator spawns: project-agent:<slug>, code-reviewer, test-engineer, debugger,
                     database-specialist, kb-ingest, Explore

Infrastructure layer (horizontal, unchanged from v1.7)
├── agent-supervisor — event routing, concurrency cap, RAM-aware spawning
├── agent-watchdog   — heartbeat, loop detection, budget, incident reporting
├── meta-watchdog    — watches the watchdog
└── memory-tender    — weekly memory + workspace reconciliation

16.1 Why two agents (not 10, not 3)

v1.7 specified 10 persistent agents differentiated by domain (account-manager, project-manager, sales-bd, …). v1.8 rejects that model on two grounds:

  1. Operational coherence beats specialization. A solo founder running 12 projects across 3 ventures needs an agent that knows everything about a thread, not 10 agents that each know one slice. The cookbook's "single agent with rich context" pattern beats the multi-agent split-brain pattern at this scale.
  2. Always-on presence is the load-bearing feature. Differentiated cron schedules (08:00 account-manager, 08:30 project-manager, …) are anti-presence. Two warm agents that respond in seconds beat ten cold agents that respond in minutes.

The chief / operator split exists for safety isolation: customer-facing speech (chief) runs separately from system-mutating action (operator), so a model regression in one workspace cannot accidentally compromise the other. Both can read the full graph; only operator can mutate it. Both can converse; only chief speaks outward.

16.2 Shared workspace skeleton (#71 amended)

Both agents inherit the same workspace shape:

agents/
├── _shared/
│   ├── CLAUDE.md             # mission + 5 invariants + voice rules         (≤  900 tok)
│   ├── ventures-index.md     # 3 ventures + 12 projects, 1 line             (≤  600 tok)
│   ├── customers-index.md    # 1 line per active engagement                 (≤  600 tok)
│   ├── peer-card.md          # how to reach the other agent (mailbox API)   (≤  300 tok)
│   ├── glossary.md           # node kinds, slugs, conventions               (≤  400 tok)
│   └── opus-triggers.md      # the mandatory-Opus list (#74 §2)             (≤  300 tok)
├── chief/
│   ├── agent.md              # frontmatter only (name, model, tools, memory, opus_triggers)
│   ├── CLAUDE.md             # role, voice, ownership, walk-throughs        (≤ 1800 tok)
│   ├── playbook.md           # standard operating procedures                (≤ 1500 tok)
│   ├── personality.md        # tone, style, customer-by-customer notes      (≤  900 tok)
│   └── MEMORY.md             # dynamic cache, decay-managed                 (≤ 1200 tok)
└── operator/
    ├── agent.md
    ├── CLAUDE.md, playbook.md, personality.md, MEMORY.md

The Anthropic memory tool (memory_20250818) mounts /memories/ on both agents read-write. The directory tree under /memories/ mirrors agents/_shared/ and agents/<agent>/ exactly, so workspace-as-source and memory-as-runtime stay byte-identical.

Project workspaces live at /memories/projects/<slug>/CLAUDE.md and are loaded on demand by operator when it spawns a project subagent (see §18.9). They are not loaded into either top agent's static context.

Token budget at spawn. chief ≈ 22 K tokens (≈11% of 200 K); operator ≈ 22 K. The pre-commit hook tools/check-agent-budget.py (cl100k tokenizer via tiktoken) enforces the per-file caps above and the per-agent total of 22 K. Hook failure blocks the commit; an --override-budget path requires explicit Wilson approval in the commit trailer.
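A sketch of the hook's core check, using tiktoken's cl100k_base encoding as stated; the cap table here is an illustrative subset of the budgets annotated in the tree above:

```python
#!/usr/bin/env python3
# Illustrative core of tools/check-agent-budget.py; non-zero blocks the commit.
import sys
from pathlib import Path
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
PER_AGENT_TOTAL = 22_000
CAPS = {  # subset of the per-file caps from the workspace tree above
    "agents/chief/CLAUDE.md": 1800,
    "agents/chief/playbook.md": 1500,
    "agents/chief/personality.md": 900,
    "agents/chief/MEMORY.md": 1200,
}

def main() -> int:
    total, failed = 0, False
    for rel, cap in CAPS.items():
        n = len(ENC.encode(Path(rel).read_text()))
        total += n
        if n > cap:
            print(f"BUDGET FAIL {rel}: {n} > {cap} tokens")
            failed = True
    if total > PER_AGENT_TOTAL:
        print(f"BUDGET FAIL total: {total} > {PER_AGENT_TOTAL}")
        failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```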

16.3 Model policy (#74 §2)

Both agents share the same model policy. Default is Sonnet; Opus is mandatory when any of these triggers fire:

| Trigger | Model |
| --- | --- |
| Multi-step plan with ≥3 sequenced actions | Opus 4.7 |
| Customer-facing artifact (proposal, contract, brief, post-mortem) | Opus 4.7 |
| ADR drafting / decision-ledger entry | Opus 4.7 |
| Morning brief synthesis (chief, 07:00) | Opus 4.7 |
| Daily learning loop (operator, 22:00) | Opus 4.7 |
| Destructive-action 4-class review | Opus 4.7 |
| Routine email reply / task update / file edit | Sonnet 4.6 |
| Single-step lookup, classification, triage | Haiku 4.5 |

Implementation. Each agent's agent.md declares an opus_triggers list. The runtime evaluates triggers in priority order before each turn and overrides the default model per turn via the SDK's query(options={"model": ...}) parameter. Trigger evaluation runs in <50 ms (regex + JSON predicates over the current event batch); cost is negligible.
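A minimal sketch of the per-turn evaluation, assuming triggers compile to regexes over the rendered event batch; the trigger names and patterns are illustrative, not the locked opus_triggers list:

```python
import re

# Illustrative stand-ins for the agent.md opus_triggers declarations.
OPUS_TRIGGERS = [
    ("customer_artifact",
     re.compile(r"\b(proposal|contract|brief|post-mortem)\b", re.I)),
    ("adr", re.compile(r"\bADR\b")),
]

def pick_model(event_batch_text: str, default: str = "sonnet") -> str:
    for name, pattern in OPUS_TRIGGERS:      # evaluated in priority order
        if pattern.search(event_batch_text):
            return "opus"                    # mandatory trigger fired
    return default

# The runtime then overrides the model per turn, e.g.
#   query(prompt=..., options={"model": pick_model(batch)})
```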

16.4 Always-on lifecycle

Both agents run as systemd USER units under athena (#73 A3) with Restart=always. The supervisor (#68) holds an in-memory presence table; an agent is warm when its query loop is mid-task or within the 90-second post-task-complete cooldown.

| State | Definition | Latency to first token |
| --- | --- | --- |
| Warm | Query loop alive, prompt cache hot | <200 ms |
| Cooldown | Within 90 s of `<task-complete/>`, prompt cache hot | <200 ms |
| Cold | systemd active, query loop dormant | ~1.5 s |
| Stopped | systemd inactive (manual or watchdog kill) | ~3 s + restart cost |

The supervisor's RAM-aware scheduler (#68) holds spawns when /proc/meminfo shows <600 MB available; under the 8 GB envelope (#66) this is a rare event because typical resident usage with both agents warm is 1.8–2.2 GB.

16.5 Inter-agent CLI mailbox (two-party, #71 amended)

Inter-agent communication collapses from N-party to two-party:

hq agent ask <peer> "<msg>"     # synchronous RPC, blocks for response (default 30 s, configurable)
hq agent send <peer> "<msg>"    # async FYI, no wait
hq agent reply <id> "<msg>"     # response to an outstanding ask
hq agent inbox                  # list unread messages
hq agent presence               # is the peer warm? returns {warm|cooldown|cold|stopped}

<peer> is exactly one of chief or operator. The supervisor enforces the peer set; unknown peers raise unknown_peer. The mailbox is implemented over ops.agent_inbox + Postgres LISTEN/NOTIFY (#71) — no Redis pub/sub, no external broker.
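A sketch of the ask path over that machinery, assuming psycopg ≥ 3.2 (for notifies(timeout=...)); the inbox columns and channel names are assumptions. pg_notify() is used because NOTIFY does not accept bound parameters:

```python
import json
import psycopg

def ask(dsn: str, me: str, peer: str, msg: str,
        timeout_s: float = 30.0) -> str | None:
    if peer not in ("chief", "operator"):
        raise ValueError("unknown_peer")     # supervisor enforces the peer set
    with psycopg.connect(dsn, autocommit=True) as conn:
        conn.execute(f"LISTEN agent_reply_{me}")
        row = conn.execute(
            "INSERT INTO ops.agent_inbox (sender, recipient, body) "
            "VALUES (%s, %s, %s) RETURNING id", (me, peer, msg)).fetchone()
        # wake the peer's runtime
        conn.execute("SELECT pg_notify(%s, %s)",
                     (f"agent_inbox_{peer}", json.dumps({"id": row[0]})))
        for n in conn.notifies(timeout=timeout_s):   # block for the reply
            reply = json.loads(n.payload)
            if reply.get("in_reply_to") == row[0]:
                return reply["body"]
        return None   # timed out; caller retries or escalates
```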

Typical handoff. chief receives an email asking for a project status update → hq agent ask operator "current state of guisoft ticketing dashboard?" → operator queries graph, returns a 3-paragraph summary → chief drafts the reply (Sonnet, or Opus if customer artifact threshold) → chief sends. Both halves of the handoff are persisted in ops.agent_actions (#75); the handoff itself is captured as two action rows linked by parent_id.

16.6 Cross-cutting protocols

17. chief — outward-facing operator

Workspace: agents/chief/. Mounted memory directory: /memories/agents/chief/.

17.1 Role and ownership

chief is the one and only outward-facing voice of Capitão Ventures. It owns every artifact a customer, prospect, partner, or Wilson would read.

Owns (full customer-outcome surface, per #74 amended by #77):

Does not own (system-level state — routes to operator):

The clean rule: chief owns customer-facing outcomes; operator owns system-level state. When in doubt, ask operator — the round-trip is cheap.

17.2 Triggers

| Kind | Value |
| --- | --- |
| Beat | `0 7 * * *` (07:00 morning brief, Opus, plan-mode) |
| Outbox topics | email.received, email.thread.updated, entity.temperature_changed, engagement.stage_changed, proposal.draft_requested, customer.churn_signal, wilson.dm, prospect.created, interaction.overdue |
| Wilson inbox | enabled (always) |
| Calendar | webhook on event creation/cancellation if attendees include external contacts |

17.3 Tools (#77 expanded surface)

SDK built-ins: Read, Grep, Glob, Bash (curated allow-list), Agent, WebSearch, WebFetch, Monitor, Edit, Write (NEW per #77 — scoped to customer-artifact paths via runtime path allow-list; writes outside the allow-list raise typed errors and route to operator).

Path allow-list for Edit/Write (enforced by runtime middleware):

Capitão registry tools (from src/tools/registry.ts, see §33):

MCP servers (in-process via create_sdk_mcp_server, #67 amended):

17.4 Subagents allowed (max 2 in-process + 1 project subagent, per #77)

Existing (v1.8):

New per #77 (load-bearing for deep work):

Subagents are loaded via the Agent tool from agents/chief/subagents/<name>.md.

17.5 Budget

| Control | Value |
| --- | --- |
| daily_usd_soft | $2.50 |
| daily_usd_hard | $7.00 |
| tokens_per_run_cap | 200 000 |
| max_turns_per_query | 30 |
| opus_turns_per_day_cap | 12 (alarms at 10) |

17.6 Permissions (#77 expanded)

17.7 Success metrics

Text-axis (artifact-level, from #75):

Trajectory-axis (procedure-level, from #76) — equally load-bearing:

Operational:

Capability-utilization (added per #77 §9):

17.8 Decision tree — act / spawn / ask (#77 §4)

When an inbound customer ask arrives, chief walks this tree before doing anything else. Plain-language version; the threshold "up to 2 sources" is a heuristic about context-window hygiene, explained immediately below.

Inbound customer ask arrives
│
├── Can chief answer by reading up to 2 small sources directly?
│   ├── Yes → chief reads inline; drafts.
│   └── No  → chief spawns `customer-deep-dive` with a focused question;
│             receives a structured brief; drafts on top of it.
│
├── Does the ask require changing an artifact?
│   ├── In-draft customer artifact (proposal/contract/brief still being built)
│   │     → chief edits the file directly via Edit/Write.
│   ├── Post-dispatch contract (already sent to the customer / signed)
│   │     → chief drafts an AMENDMENT (new file) + `request_approval` (cross-scope per #69 — legal binding).
│   └── Customer-driven change inside a project repo (e.g., copy fix on the marketing site)
│         → chief spawns `project-agent:<slug>` via `hq project run`.
│         (If the change is system-driven, not customer-driven, ask operator instead.)
│
└── Is any factual claim about graph state involved?
    (project status, scheduling, blockers, invoice state, who-said-what-when)
    → ALWAYS `hq agent ask operator` BEFORE drafting. No exceptions.
       Operator owns the graph; chief is not allowed to guess facts.

What counts as a "source". A source is one discrete chunk of context chief has to read to answer. Each of these counts as one: the inbound email thread (always source #1 by default), one entity record in the graph (hq describe entity:<…>), one project state file, one KB article, one contract/proposal file, one drive file, one past meeting transcript, one prior email thread (different from inbound), one external URL the customer linked.

Why the threshold. It's about where the reading happens:

| Sources needed | Inline cost | Subagent cost | Winner |
| --- | --- | --- | --- |
| 1 | ~3 s, ~2 K tokens added | ~10 s, ~1 K tokens added | Inline — subagent overhead doesn't pay off |
| 2 | ~6 s, ~5 K tokens added | ~15 s, ~1 K tokens added | Inline — barely; depends on source size |
| 3+ | ~15+ s, ~15-30 K tokens added (pollutes context) | ~25 s, ~1.5 K tokens added | Subagent — keeps chief's context clean for drafting |

The threshold is heuristic, not a hard rule. Chief should err toward the subagent if individual sources are large (long PDFs, multi-page contracts) even at 2 sources, and toward inline if all sources are tiny (a single timeline + 1 KB article = ~500 words total).

Concrete examples — up to 2 sources, read inline: "What time is our meeting tomorrow?" (calendar = 1), "Did João reply about the SLA last week?" (thread + 1 timeline query = 2), "What's our standard response-time SLA?" (1 KB article). Three or more sources, spawn deep-dive: "Can you summarize where we are with Frama overall?" (engagement + last 5 interactions + project state + open tasks + KB = 5+), the contract clause example from §17.1 (contract + 2 amendment precedents + KB compliance article + prior threads + operator check = 5+), "What did we promise the customer in the kickoff vs. what's in the contract?" (transcript + contract + proposal + RFP = 4).

17.9 Workspace files

agents/chief/CLAUDE.md — role, voice, ownership, the decision tree (§17.8 mirrored), the operational walk-throughs (deep-dive synthesis, contract amendment, customer-driven project work, morning brief, escalation, proposal drafting), the `hq examples find` usage rules, the destructive-action gate language.
agents/chief/playbook.md — standard procedures: how to triage an inbound email, how to draft a proposal, how to draft a morning brief, how to handle a customer escalation, how to handle a missed deadline.
agents/chief/personality.md — voice rules, language defaults, per-customer style notes (Frama formal+brief, PetVitaClub warm+chatty, Garq technical+precise, …).
agents/chief/MEMORY.md — dynamic cache, decay-managed by memory-tender.
agents/chief/subagents/customer-deep-dive.md, agents/chief/subagents/kb-search.md — subagent definitions per #77 §3.

18. operator — inward-facing operator

Workspace: agents/operator/. Mounted memory directory: /memories/agents/operator/.

18.1 Role and ownership

operator is the one and only system-mutating actor in Capitão Ventures. It owns the graph, the codebase, the file system, the KB, and the action ledger itself.

Owns:

Does not own:

18.2 Triggers

| Kind | Value |
| --- | --- |
| Beat | `0 22 * * *` (22:00 daily learning loop, Opus, plan-mode); `0 3 * * *` (03:00 nightly graph + ledger reconciliation) |
| Outbox topics | producer.unmapped, event.unclassified, task.assigned_to.operator, task.assigned_to.project:*, proposal.kind=schema_change, proposal.kind=migration, feature.status_changed.*, kb.ingest.completed, finance.anomaly, agent_incident.created, wilson.dm |
| Wilson inbox | enabled (always) |
| File watchers | ~/capitao-knowledge-base/raw/, ~/capitao-command-center/proposals/ (newly emitted proposals from chief) |

18.3 Tools

SDK built-ins: Read, Write, Edit, Bash (curated allow-list with broader scope than chief), Glob, Grep, Agent, Monitor, NotebookEdit.

Capitão registry tools (from src/tools/registry.ts, see §33):

MCP servers (in-process):

18.4 Subagents allowed (max 2 concurrent + 1 project subagent)

project-agent:<slug> (one at a time per project; loaded on demand from /memories/projects/<slug>/CLAUDE.md), code-reviewer, test-engineer, debugger, database-specialist, kb-ingest, Explore.

18.5 Budget

| Control | Value |
| --- | --- |
| daily_usd_soft | $4.00 |
| daily_usd_hard | $10.00 |
| tokens_per_run_cap | 250 000 |
| max_turns_per_query | 50 |
| opus_turns_per_day_cap | 18 (alarms at 14) |

18.6 Permissions

18.7 Success metrics

18.8 Workspace files

agents/operator/CLAUDE.md — role, ownership, the six operational walk-throughs (data-organization focus), the destructive-action gate, the action-ledger discipline.
agents/operator/playbook.md — standard procedures: how to ingest a new producer, how to reconcile entities, how to spawn a project subagent, how to run the 22:00 learning loop, how to draft an ADR.
agents/operator/personality.md — voice rules for internal artifacts (terse, citation-heavy, structured); how to write commit messages; ADR rhetoric.
agents/operator/MEMORY.md — dynamic cache, decay-managed.

18.9 Project subagents — on-demand (#74 amends #53)

Project agents are no longer persistent. They are loaded on demand by operator via the SDK Agent tool, with a system prompt assembled from three markdown files at runtime.

18.9.1 Lifecycle

For every active project node with priority IN ('focus', 'now'):

When operator needs to act on a project, it calls:

hq project run <slug> "<task description>"

Sugar for Agent(subagent_type="project:<slug>", prompt="<task description>"). The Agent tool reads the workspace, composes the system prompt (/memories/_shared/CLAUDE.md + /memories/projects/<slug>/CLAUDE.md + /memories/projects/<slug>/MEMORY.md), runs to <task-complete/>, and exits.

Cold-start cost: ~2 seconds (no warm window). RAM peak: ~600 MB while running, freed on exit.
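A sketch of the prompt assembly behind that sugar, following the three-file order in §18.9.1; the joining separator is an assumption:

```python
from pathlib import Path

MEM = Path("/memories")

def compose_project_prompt(slug: str) -> str:
    """System prompt for Agent(subagent_type=f"project:{slug}", ...)."""
    parts = [
        MEM / "_shared" / "CLAUDE.md",
        MEM / "projects" / slug / "CLAUDE.md",
        MEM / "projects" / slug / "MEMORY.md",
    ]
    return "\n\n---\n\n".join(p.read_text() for p in parts if p.exists())

# The subagent then runs to <task-complete/> and exits, freeing its RAM.
```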

18.9.2 Workspace template

/memories/projects/_template/CLAUDE.md:

---
name: project:{{slug}}
description: Operator for project {{title}}. Reads assigned tasks, executes, reports, flags blockers.
parent: agent:ventures:operator
---

# Project agent — {{title}}

## Scope
This project only. {{description_md}}.

**MAY:** read/write own project repo (via worktrees), create/update tasks and features within this project, propose milestones/developments/actions as events, spawn Explore/code-reviewer/test-engineer/debugger subagents, flag blockers.

**MAY NOT:** touch other projects, write to entity nodes, sign off deliverables, commit to main without code-reviewer pass.

## Model policy
Inherits #74 §2 — Sonnet default; Opus on the mandatory triggers.

## Tools
Inherits operator's inward set, plus per-project additions keyed off the `project_type` declared in `customer-profile.md`.

18.9.3 Initial roster (Wave 3)

Based on 2026-04-22 priorities (unchanged from v1.7):

Focus (hard cap 3):

Now (~6):

Other projects (arisilvahelenos, ferroembrasa, guisoft, safaa, personal, ti-milha) keep workspaces at /memories/projects/<slug>/ but are not loaded by operator until a triggering event arrives.

19. Infrastructure agents

19.1 agent-supervisor (#68 amended by #74)

19.2 agent-watchdog (#64)

19.3 meta-watchdog

19.4 memory-tender


Part V — Runtime & Infrastructure

20. Agent runtime

20.1 AgentRuntime class — shape and responsibilities

Python module at src/runtime/agent_runtime.py (~500 LOC total). One file; all agents share it. Per-agent behavior comes from markdown config + prompt, not from code.

import os

from claude_agent_sdk import query               # SDK entrypoint (§16.3)

# Internal helpers (§20.2, §20.3, §21); module paths are illustrative
from .budget import BudgetExceeded, CostBudget
from .config import MarkdownConfigParser
from .session_store import PostgresSessionStore


class AgentRuntime:
    """
    Runs one agent for one trigger batch. Exits after <task-complete/>.
    Reloaded per spawn by the supervisor.
    """

    def __init__(self, role: str, config_path: str, events_stdin: list[dict]):
        self.role = role
        self.config = MarkdownConfigParser(config_path).parse()   # §20.3 format
        self.events = events_stdin
        self.session_id = None
        self.store = PostgresSessionStore(os.environ["HQ_DB_URL"])
        self.budget = CostBudget.from_config(self.config.budget)

    async def run_once(self) -> int:
        """Entrypoint. Returns exit code (0=ok, 1=budget, 2=error, 3=watchdog-killed)."""
        await self.store.connect()
        self.session_id = await self.store.get_or_create_session(self.role)

        if self.store.is_fresh_session(self.session_id):
            reconstruction = await self._build_reconstruction_header()
        else:
            reconstruction = ""        # resumed session still has context

        prompt = reconstruction + self._render_event_batch(self.events)

        try:
            async for message in query(
                prompt=prompt,
                options=self._build_sdk_options()
            ):
                await self._on_message(message)
                if self._detect_task_complete(message):
                    await self._finalize_session(message)
                    return 0
        except BudgetExceeded:
            await self._freeze_self()
            return 1
        except KeyboardInterrupt:           # watchdog stop (SIGTERM mapped to KeyboardInterrupt by a signal handler)
            await self._partial_finalize()
            return 3

        return 2                            # fell off without task-complete

Full implementation spec is a Wave 1 artifact (§49).

20.2 PostgresSessionStore — see ADR #72

Canonical schema and adapter live in DECISIONS.md #72 (Anthropic PostgresSessionStore reference port; storage table ops.agent_sessions(id BIGSERIAL, key TEXT, entries JSONB, created_at TIMESTAMPTZ DEFAULT now()) indexed on (key, id); CI conformance gate via claude_agent_sdk.testing.run_session_store_conformance(...); local-disk primary at /var/cache/capitao/sessions, Postgres mirror async + best-effort; cold-start restore order pg_restore → disk_restore → fresh). The earlier hand-rolled schema in this section was superseded by #72 in v1.7 and removed in v1.8.

20.3 Markdown config parser

~80 lines of Python. Reads the agent's .md file and extracts the frontmatter fields (name, model, tools, memory, opus_triggers — §16.2), the budget block (§17.5 / §18.5), and the markdown body used as the agent's prompt.
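A minimal sketch under those constraints, assuming YAML frontmatter delimited by `---` as in the agent.md examples above; the real class returns a richer config object than the plain dict used here:

```python
import re
import yaml  # pip install pyyaml

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.S)

class MarkdownConfigParser:
    """Reads an agent .md file: YAML frontmatter + markdown body."""

    def __init__(self, path: str):
        self.path = path

    def parse(self) -> dict:
        text = open(self.path, encoding="utf-8").read()
        m = FRONTMATTER.match(text)
        if not m:
            raise ValueError(f"{self.path}: missing --- frontmatter block")
        config = yaml.safe_load(m.group(1)) or {}   # name, model, tools,
        config["body_md"] = m.group(2)              # memory, opus_triggers, budget
        return config
```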

21. Task-complete lifecycle (#62)

21.1 The sentinel

Every agent prompt contains:

When you have truly finished your current unit of work AND are not waiting on any tool result, subagent, review decision, Wilson input, or other agent's proposal — emit on its own line:

<task-complete reason="..."/>

Only emit when truly done. If waiting for anything, stay in the turn.

21.2 Stop hook

async def _detect_task_complete(self, message) -> bool:
    """Scan final message for the sentinel."""
    for block in getattr(message, "content", []):
        if getattr(block, "type", None) == "text":
            if "<task-complete" in block.text:
                self.task_complete_reason = self._extract_reason(block.text)
                return True
    return False

async def _finalize_session(self, final_message):
    # 1. Ask SDK for a compact memory snapshot turn
    snapshot = await self._request_memory_snapshot()

    # 2. Write agent_session node
    await self.db.execute("INSERT INTO nodes (type, props, ...) VALUES ('agent_session', ...)")

    # 3. Parse snapshot into memory_entry nodes
    await self._persist_memory_entries(snapshot)

    # 4. Write agent_handover_memo
    await self._write_handover_memo(snapshot, final_message)

    # 5. Archive transcript
    await self.store.archive(self.session_id)

    # 6. Emit outbox event
    await self.db.execute("INSERT INTO ops.outbox (topic, payload) VALUES ('agent.session_closed', $1)", ...)

    # 7. Process exits (caller returns from run_once with 0)

21.3 Warm window (chief + operator only)

Both top-level agents have a 90-second warm window post-task-complete. Project subagents and on-demand worker subagents skip the warm window (full process exit per task).

if self.config.warm_window_seconds > 0:
    new_event_arrived = await self._wait_for_event_or_timeout(
        self.config.warm_window_seconds)
    if new_event_arrived:
        # Fresh session starts in same process
        self.session_id = None
        await self.run_once()           # recurse with new events
    else:
        return 0                        # exit process

Saves cold-start overhead during bursts. Default warm_window_seconds = 90 for chief and operator; 0 for everyone else.

21.4 Fallback lifecycles

| Fallback | Trigger | Purpose |
| --- | --- | --- |
| Auto-compaction | Context > 75% of window | In-place compaction; keep current task intact |
| Nightly rotation | 03:00 local, still-live sessions | Forced clean rollover with handover memo |
| Budget-cap rotation | Hard cap hit | Freeze + fresh session after un-freeze |
| Crash recovery | systemd restart | Resume from PostgresSessionStore |

22. The agent-supervisor process

22.1 Implementation sketch

// cmd/agent-supervisor/main.go  (~200 LOC)
package main

import (
    "github.com/lib/pq"
    ...
)

type Supervisor struct {
    db          *sql.DB
    routing     map[string]AgentRoute    // topic -> agent role
    running     map[string]*exec.Cmd     // role -> process
    concurrency int                       // cap = 3
    mu          sync.Mutex
}

func (s *Supervisor) Listen() {
    listener := pq.NewListener(dsn, ...)
    for _, topic := range s.subscribedTopics() {
        listener.Listen(topic)
    }
    for notif := range listener.Notify {
        events := s.coalesce(notif)      // batch same-role events within 2s
        s.trySpawn(events)
    }
}

func (s *Supervisor) trySpawn(events []Event) {
    s.mu.Lock()
    defer s.mu.Unlock()

    role := events[0].Role
    if _, alreadyRunning := s.running[role]; alreadyRunning {
        s.enqueue(events)                // buffer; dispatched when current finishes
        return
    }
    if len(s.running) >= s.concurrency {
        s.enqueue(events)
        return
    }
    if !s.ramAvailable(600_000_000) {    // 600 MB free required
        s.enqueue(events)
        return
    }

    cmd := exec.Command("hq", "agent", "run", role)
    cmd.Stdin = strings.NewReader(eventsJSON(events))
    cmd.Start()
    s.running[role] = cmd

    go func() {
        cmd.Wait()
        s.mu.Lock()
        delete(s.running, role)
        s.drainBuffer()                  // dispatch queued events if slots free
        s.mu.Unlock()
    }()
}

Full implementation is Wave 1 artifact (§49.5).

23. Service matrix

23.1 Always-on services

| Service | Language | RAM steady | CPU | Purpose |
| --- | --- | --- | --- | --- |
| postgresql | C | 700-1000 MB | burst | Graph store, queue, analytics |
| valkey | C | 60-100 MB | low | Cache, pub/sub, rate-limiter |
| pgbouncer | C | 5-10 MB | low | Connection pooling :6432 |
| agent-supervisor | Go | 20-30 MB | low | Event routing, concurrency |
| agent-watchdog | Python | 50-70 MB | low | Health checks |
| prometheus | Go | 100-150 MB | low | Metrics |
| next.js | Node | 180-220 MB | burst | Admin UI |
| caddy | Go | 20-30 MB | low | Reverse proxy, TLS, service wake-up |
| node_exporter | Go | 15-20 MB | low | OS metrics |
| postgres_exporter | Go | 20-30 MB | low | Postgres metrics |
| hq-exporter | Node | 35-45 MB | low | Custom metrics |
| ubuntu + systemd | — | ~300 MB | — | OS base |

Always-on total: ~1.5-1.9 GB.

23.2 On-demand services

| Service | Trigger | RAM when active | Auto-shutdown |
| --- | --- | --- | --- |
| whisper-stt | meeting-transcribe triggers | ~1.5-2 GB | on completion |
| grafana | First /grafana/* request via Caddy | ~130 MB | 10 min idle |
| next.js admin (if idle-tuned further) | First HTTP request | ~180 MB | configurable |

Voyage 3.5-lite (ADR #82) is SaaS, not on-demand local — no RAM cost, no socket activation. Reached over HTTPS by embed-worker.

On-demand services consume 0 MB when idle.

23.3 Always-on agents (chief + operator with 90 s warm window)

| Agent | RAM during warm window | RAM during active query |
| --- | --- | --- |
| chief | ~280-400 MB (Python + SDK + workspace context) | ~500-700 MB (with 1 subagent active) |
| operator | ~280-400 MB | ~600-850 MB (with project subagent or code-reviewer subagent active) |
| agent-watchdog | ~50-70 MB | ~70-100 MB (during SQL-heavy checks) |
| agent-supervisor | ~25-35 MB (Go binary) | same |
| ledger-flusher | ~30-50 MB | ~60-90 MB (during batch flush) |

23.4 Ephemeral subagents (spawn-run-exit)

| Subagent | RAM while running | Duration typical |
| --- | --- | --- |
| Single in-process subagent (Explore, code-reviewer, email-drafter, …) | ~150-280 MB on top of parent | 10 s - 3 min |
| Two concurrent subagents (cap) | ~300-560 MB on top of parent | rare; heavy analysis |
| Project subagent with code tasks | ~400-600 MB on top of operator | minutes |
| Worker subagent (kb-ingest, finance-import) | ~80-180 MB on top of operator | seconds to minutes |

24. Service tuning — locked day-0 flags

24.1 Postgres 17 (/etc/postgresql/17/main/postgresql.conf)

shared_buffers = 256MB
effective_cache_size = 2GB
work_mem = 8MB
maintenance_work_mem = 64MB
max_connections = 30
wal_buffers = 16MB
random_page_cost = 1.1
track_io_timing = on
jit = off
max_parallel_workers_per_gather = 2

Expected RSS: 700-1000 MB steady.

24.2 Valkey (/etc/valkey/valkey.conf)

maxmemory 96mb
maxmemory-policy allkeys-lru
save ""
appendonly no
tcp-keepalive 60

Expected RSS: 60-100 MB.

24.3 Prometheus flags

--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=800MB
--query.max-samples=5000000

The 30 s scrape interval is set as scrape_interval: 30s in prometheus.yml's global block; it is not a CLI flag.

24.4 Next.js

NODE_OPTIONS="--max-old-space-size=200 --no-warnings"
NEXT_TELEMETRY_DISABLED=1

24.5 Caddy site blocks (/etc/caddy/Caddyfile)

Grafana wake-up handler (socket-activated; cold start on first request; the exec directive below comes from the third-party caddy-exec plugin, not stock Caddy):

grafana.internal.capitao {
    @first_visit not header Cookie *grafana_session*
    handle @first_visit {
        exec systemctl start grafana.service
        respond "Starting Grafana, refresh in 2s..." 202
    }
    reverse_proxy localhost:3000
}

Command Center UI (subdomain, #78). Day-0 binds to Tailscale; F27 lock at Wave 2 may swap bind tailscale0 for a public posture (OAuth or IP allow-list):

command-center.capitao.consulting {
    bind tailscale0          # Wave 1: Tailscale-only. Removed on F27 lock.
    encode gzip zstd
    log {
        output file /var/log/caddy/command-center.log
        format json
    }
    @md query format=md
    handle @md {
        header Content-Type "text/markdown; charset=utf-8"
        reverse_proxy 127.0.0.1:3001
    }
    reverse_proxy 127.0.0.1:3001
}

The @md matcher implements #25's ?format=md symmetry: HTML and markdown share one upstream (the Next.js process) and the route handler decides which view to render. Drift detection: curl …/roadmap?format=md byte-equals state/roadmap.md (modulo whitespace).
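A sketch of that drift check, reading "modulo whitespace" as whitespace-squashing; the URL and mirror path follow the example above:

```python
import re
import urllib.request

def drifted(url: str, state_file: str) -> bool:
    """True when the rendered ?format=md view diverges from the state/ mirror."""
    rendered = urllib.request.urlopen(url).read().decode("utf-8")
    mirror = open(state_file, encoding="utf-8").read()
    squash = lambda s: re.sub(r"\s+", " ", s).strip()
    return squash(rendered) != squash(mirror)

# drifted("https://command-center.capitao.consulting/roadmap?format=md",
#         "state/roadmap.md")
```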

24.6 systemd socket activation for Whisper

(TEI socket activation removed per ADR #82 — embedding inference moved off-host to Voyage 3.5-lite. Whisper stays local.)

# /etc/systemd/system/whisper-stt.socket
[Socket]
ListenStream=127.0.0.1:8210

[Install]
WantedBy=sockets.target

# /etc/systemd/system/whisper-stt.service
[Service]
ExecStart=/usr/local/bin/whisper-start-wrapper
EnvironmentFile=/etc/capitao/secrets.env

The wrapper script starts Whisper, keeps alive 5 min of idle, then stops. VOYAGE_API_KEY lives in the same secrets.env (mode 0600, athena:athena) and is loaded by embed-worker via EnvironmentFile= in its own systemd unit.

24.7 cgroup limits per agent

/etc/systemd/system/capitao-agent@.service.d/limits.conf:

[Service]
MemoryMax=900M
MemorySwapMax=400M
CPUQuota=200%

Protects the box from a runaway agent.

25. RAM budget — 8 GB envelope (#66)

25.1 Realistic usage over time

| Scenario | RAM | % of 8 GB |
| --- | --- | --- |
| Overnight (supervisor + watchdogs only) | ~1.6 GB | 20% |
| Normal business hours | ~2.0-2.8 GB | 25-35% |
| Busy afternoon (2 agents concurrent) | ~3.0-3.5 GB | 38-44% |
| 3 agents + 1 subagent each (realistic peak) | ~3.5-4.0 GB | 44-50% |
| Ceiling: 3 agents × 2 subagents + Grafana | ~4.4-4.7 GB | 55-59% |
| + Whisper transcribing (briefly allowed over) | ~7.0-7.5 GB | 88-94% |

Headroom at typical load: 4-5 GB free for Postgres page cache, burst absorption, Grafana sessions. Page cache keeps search queries <40ms p95.

25.2 Supervisor RAM-aware rules

if available_ram < 600 MB:        hold new agent spawns; queue events
if available_ram < 400 MB:        kill concurrency cap to 1 until memory frees
if swap_used > 500 MB sustained:  alert Telegram + pause non-essential agents

26. Authentication — Max OAuth (#57)

26.1 Setup (one-time per 12 months)

# On the VPS, as capitao user:
claude setup-token

# Result: prints 1-year OAuth token.
# Store in /etc/capitao/agents.env (chmod 600, owner capitao:capitao):
#   CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...

26.2 systemd unit drop-in

[Service]
EnvironmentFile=/etc/capitao/agents.env
User=capitao

Applied to every agent service.

26.3 Token rotation watcher

A weekly cron job decodes the JWT, checks expires_at. If < 30 days remain, opens a review node asking Wilson to re-run claude setup-token. No silent expiry.
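A sketch of the watcher's expiry check. Reading the JWT payload needs no signature verification; the expires_at claim name follows the prose (a standard JWT would call it exp):

```python
import base64
import json
import time

def days_until_expiry(token: str) -> float:
    """Decode the JWT payload segment and compute days until expiry."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    expires_at = claims.get("expires_at") or claims.get("exp")
    return (expires_at - time.time()) / 86400

# if days_until_expiry(tok) < 30: open a review node asking Wilson to
# re-run `claude setup-token`.
```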

26.4 License compliance

Max OAuth is authorized by Anthropic for local development and personal automation. Capitão Command Center operates Capitão Ventures internally; it is not resold. Authorized use.

If Command Center ever becomes a SaaS product, switch to API-key authentication (ANTHROPIC_API_KEY). No code changes needed — SDK auto-detects.

27. Rate-limit and cost control

27.1 Plan-level limits

Max 20× plan: 5-hour rolling windows. With 2 always-on top agents + on-demand subagents (typically 1-2 active at a time during business hours), typical spend stays under 30% of plan cap. Bursts during customer incidents or heavy code work can hit 70%+.

27.2 Mitigations (built into runtime)

  1. Event-driven, not cron-driven. Both top agents wake on outbox events; the only fixed cron beats are 07:00 (chief brief) and 22:00 (operator loop). Burn rate scales with workload, not with the clock.
  2. Supervisor concurrency cap (§19.1) — 2 top agents + max 2 in-process subagents per top agent = 4 query loops total.
  3. Exponential backoff on 429 — Valkey-shared rate-limiter coordinates across both agents (sketched after this list).
  4. Cost-aware demotion — if 7-day moving avg trends toward plan cap, the per-turn model picker demotes routine Sonnet turns to Haiku; Opus triggers (#74 §2) remain mandatory and are never demoted.
  5. Circuit breaker at 90% of plan — pause operator's 22:00 learning loop and any non-emergency project subagents; keep chief live for customer-facing work.
  6. Per-agent daily hard caps — agent freezes itself at its own cap, opens review via request_approval.
  7. hq autonomy freeze --reason "..." — manual emergency stop.
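A sketch of mitigation 3, assuming redis-py pointed at Valkey (which speaks the Redis protocol); the key names, 5-minute strike decay, and 60 s cap are illustrative:

```python
import random
import time
import redis  # pip install redis; Valkey is protocol-compatible

r = redis.Redis(host="localhost", port=6379)

def on_429() -> None:
    """Record a 429 and extend the shared backoff window for both agents."""
    strikes = r.incr("anthropic:429:strikes")
    r.expire("anthropic:429:strikes", 300)         # strikes decay after 5 min
    delay = min(60.0, (2 ** strikes) + random.random())
    r.set("anthropic:backoff_until", time.time() + delay)

def wait_if_backing_off() -> None:
    """Called before each API request by either agent."""
    until = float(r.get("anthropic:backoff_until") or 0)
    if until > time.time():
        time.sleep(until - time.time())
```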

27.3 Cost telemetry

Every LLM call writes to ops.llm_call_log:

CREATE TABLE ops.llm_call_log (
  id           bigserial PRIMARY KEY,
  agent_id     text NOT NULL,
  session_id   uuid,
  trace_id     uuid REFERENCES nodes(id),
  model        text NOT NULL,
  tokens_in    int,
  tokens_out   int,
  cost_usd     numeric(8,4),
  latency_ms   int,
  canary_id    text,
  purpose      text,
  started_at   timestamptz DEFAULT now()
);
CREATE INDEX ON ops.llm_call_log (agent_id, started_at DESC);

Grafana panel per-agent-cost-24h + Prom gauge agent_run_cost_usd_24h{agent="..."}.


Part VI — Tools, Skills, and Surfaces

28. The hq CLI — canonical action surface

Form: hq <noun> <verb> [--filters] [--json | --text]

Exit codes: 0 ok / 1 user-error / 2 system-error / 3 not-found.

Universal flags: --json, --text (default), --actor='<string>', --reason='<string>'.

28.1 Reads (safe)

hq search <query>
hq describe <slug|uuid>
hq entity find --email|--phone
hq entity profile <slug>
hq engagement list --stage <stage>
hq project list --priority <focus|now|next|backlog>
hq task list --owner --priority
hq interaction list --entity --limit
hq timeline [--since] [--entity]
hq review list | show <id>
hq expectation list [--status] [--direction]
hq intent list [--kind] [--status]
hq gap list                         # all gap_* views
hq gap show <gap_name>
hq as-of <timestamp> describe <slug>
hq memory search --role <role> --query <terms>
hq trace {show,inputs,decided,replay,explain} <trace-id>
hq producer health [<slug>]
hq playbook {list,show}
hq proposal {list,show} --kind <kind>
hq autonomy status
hq watchdog status
hq agent {list,status,attach} [<role>]

28.2 Writes (produce outbox events)

hq entity create --name --kind person|org [--email] [--phone]
hq entity merge <loser> --into <winner>     # always reviews
hq interaction log --channel --from|to|cc --subject --body-file --thread-id
hq conversation create --kind --started-at [--participant]
hq engagement create --entity --stage --name --price [--maintenance-months]
hq engagement stage <slug> --to <stage>
hq project create --entity --name --slug [--lead] [--tech-stack]
hq feature create --project --title --slug [--complexity S|M|L|XL]
hq task create --title [--project|--feature] --priority --owner --due
hq task complete <slug>
hq task assign <slug> --to <contact>
hq event create --kind --about [--impact]
hq intent create --kind --about --extracted-from [--urgency] [--due-at]
hq expectation create --kind --direction --about --owed-by --owed-to [--due-at]
hq review resolve <id> --choice <opt-N> [--note]
hq review defer <id> [--until <ts>]
hq review dismiss <id> --reason
hq proposal propose --kind --evidence <json> [--actor <agent>]
hq proposal rollback <id>
hq autonomy freeze [--reason] [--until <ts>]
hq autonomy thaw
hq autonomy kill --loop <playbook|calibration>
hq playbook archive <slug> --reason

28.3 Agent control (new in v1.5)

hq agent run <role>                     # supervisor entrypoint; reads events from stdin
hq agent send <role> <message>          # DM an agent; tails response
hq agent attach <role>                  # live-tail transcript
hq agent pause <role> [--for <duration>]
hq agent resume <role>
hq agent restart <role> [--fresh-session]
hq agent handover <role>                # force session rotation now
hq agent status <role>

28.4 MCP fallback

hq mcp-serve                            # only enabled on hot paths; socket-activated

Not used in default config. Reserved for measured need.

29. Skills catalog

29.1 Mandatory skills (every agent)

| Skill | Grant / state | Purpose |
| --- | --- | --- |
| caveman | Full | Token compression (internal reasoning + inter-agent writes) |
| caveman-compress | installed | Compresses long memory files |
| hq-actor-attribution | always on | Ensures hq.actor GUC set on every write |
| cost-budget-guard | always on | Enforces daily caps; aborts on overrun |
| session-distill | always on | Stop-hook: reads transcript, proposes events |

29.2 Role-specific skills (catalog reference) — v1.8 collapsed

Full catalog in .skills/INDEX.md. The v1.7 per-role skill split (10 sets × 4 skills) collapses into two larger sets owned by the two top agents. The skills themselves largely survive; only their ownership consolidates.

| Agent | Skills |
| --- | --- |
| chief | entity-brief, interaction-log, draft-outreach, relationship-temperature, pipeline-report, proposal-draft, stage-advance, visitor-analytics-digest, renewal-watch, upsell-probe, nps-signal, action-now-render, morning-brief, weekly-digest, examples-find |
| operator | project-health, roadmap-show, blocker-probe, scope-diff, search, adr-draft, dependency-audit, complexity-review, invoice-chase, recurring-materialize, revenue-variance, playbook-draft, kb-gap-scan, kb-ingest, kb-query, daily-learning-loop, examples-promote, examples-find, incident-cluster, prompt-propose, config-propose, seed-case-author |
| Project subagent (template) | search, scope-diff, adr-draft, complexity-review, blocker-probe (loaded from /memories/projects/<slug>/playbook.md) |

29.3 Community skills via gh skill

gh skill install JuliusBrussee/caveman caveman
gh skill install JuliusBrussee/caveman caveman-compress
gh skill update --all                                 # weekly cron

Our own skills are authored locally and are not published via gh skill (internal use only).

30. Caveman policy (#55)

30.1 Default state

All agents operate under caveman full for internal reasoning, tool calls, inter-agent writes (proposal bodies, reasoning_trace notes, enrichment output).

30.2 Mandatory carve-outs — switch to normal mode

Every agent's prompt embeds:

Before producing ANY artifact intended for Wilson or a customer — memo,
task.title, task.description, review.question_text, email body, proposal
text, invoice line items — emit `normal mode` on its own line, produce
the artifact in clear human English (or Portuguese), then emit
`/caveman full` on its own line before continuing.

NEVER apply caveman to:
  - memo.content_md
  - task.title / task.description (human-visible)
  - review.question_text / review.options[]
  - interaction.body (outbound)
  - playbook.body_md (read by LLMs AND Wilson)
  - any document body (contracts, proposals, invoices)

30.3 Language fallback

31. MCP policy (#67)

Default: no persistent MCP server. Agents invoke hq <verb> via Bash.

Conditions for enabling hq mcp-serve:

  1. A tool is called >50× per agent per hour, measured over 7 days
  2. Subprocess-spawn latency (>100ms p95) measurably harms agent latency
  3. Observed in production, not theoretical

When enabled: socket-activated; starts on first request; exits after 5 min idle. Same on-demand pattern as Whisper STT.

32. The five agent surfaces (#21)

32.1 AGENTS.md hierarchy

32.2 .skills/ catalog

agentskills.io-compliant. Symlinked to ~/.claude/skills/capitao/. See §29.

32.3 state/ filesystem mirror

Maintained by view-renderer worker. ≤5s lag. Read-only for agents.

Layout:

state/
├── INDEX.md
├── focus.md · now.md · next.md · backlog.md
├── action-now.md                       ← the killer view (§40 of workflows)
├── projects/<slug>/README.md
├── ventures/<slug>.md
├── tasks/{focus,now,blocked,due-this-week}.md
├── agents/
│   ├── INDEX.md
│   ├── ventures/<role>.md
│   └── projects/<slug>.md
├── producers/INDEX.md
├── timeline/YYYY-MM-DD.md
└── system/
    ├── learning.md                     ← nightly autonomy digest
    ├── agent-incidents.md              ← watchdog output
    └── agent-costs.md                  ← per-agent daily/weekly

32.4 schemas/ JSON Schema catalog

Every node type + edge type + CLI command + webhook has a schema. Flat (no $ref to externals). Consumed by LangChain's strict tool mode.

32.5 hq CLI (§28)


33. Agent tool registry & per-agent curation (#69, #70)

Decision #70 takes the shell-invocable surface (#67, §28) and adds a structured layer: every worker action and every hq verb is registered once as an Anthropic-format tool definition (JSON Schema, strict: true, additionalProperties: false, optional input_examples), exposed natively to agent runtimes (Claude Agent SDK, Hermes, LangChain). Decision #69 sets the write-contract on top of that surface: agents write directly within their domain scope; gating is post-hoc and reserved for blast-radius actions only.

33.1 Single source of truth

src/tools/
├── registry.ts                    ← canonical TypeScript tool spec — one entry per tool
├── handlers/
│   ├── hq_task.ts                 ← per-tool handler + types
│   ├── worker_run.ts
│   └── … (one file per tool)
├── exporters/
│   ├── anthropic.ts               ← → tools[] for Claude Agent SDK
│   ├── hermes.ts                  ← → ./.hermes/plugins/<name>.md
│   └── langchain.ts               ← → BaseTool[]
└── schemas/tools.json             ← auto-generated, committed for diffing
schemas/tools.md                   ← human-readable docs auto-rendered by `hq tools docs`

One source. Three exporters. Schema doc auto-published. Adding a 19th tool = one PR touching one directory. No drift across consumers.

33.2 The 18-tool catalog

Consolidated by domain — action enums collapse what would otherwise be 50+ verbs.

Tool Type Purpose
hq_search read Polymorphic search across all node kinds (slug, text, vector, hybrid). Always-on.
hq_describe read Get one node's full state and direct edges. Always-on.
hq_timeline read Chronological event/interaction stream for an entity, project, or engagement.
hq_trace read Replay an LLM reasoning trace — inputs, decision, evidence, replay, explain.
hq_memory read Query agent memory by role, salience, time window, free-text.
hq_producer read Producer health — last_seen_at, throughput, error rate per data source.
hq_playbook read List/show playbooks with status (canary / active / decayed / archived).
hq_tools meta Discovery. Returns the full catalog with per-tool access status (granted / request). Cheap (~200 tokens). Loaded for every agent.
hq_task write `action='create'
hq_engagement write `action='create'
hq_event_log write Record event.kind ∈ {action, development, state_change, milestone}. Always-on for non-read agents.
hq_entity write `action='find'
hq_interaction write Log a manual interaction (note / call / in-person). Ingest workers log automatically.
hq_review write `action='list'
hq_proposal gated The path for the four gated cases (destructive / cross-scope / system-behavior / heuristic-flag — see §33.6 via #69). `action='propose'
hq_autonomy control `action='status'
hq_agent control `action='list'
hq_examples_find read RAG retrieval over the success-examples DB (#75 §5). --action-type X --tags Y --query Z --top N. Both agents call this before high-stakes drafts.
hq_action_log / hq_edit_log read Admin queries on the action ledger and Wilson-edit ledger. Restricted to operator.
hq_examples_pin write Manual pin of a success example so it never decays (operator invokes on Wilson's request).
worker_run execute Trigger any registered worker on a specific input. Workers also run autonomously — see §33.5 for the full catalog.

Total tokens fully loaded ≈ 12.6 K. With per-agent curation (§33.4) the median agent loads ~5.5 K (~55% reduction).

33.3 Tool definition shape

Every tool follows the Anthropic tool-definition contract:

{
  "name": "hq_<noun>",
  "description": "<3-5 detailed sentences — the single biggest performance lever per Anthropic>",
  "strict": true,
  "input_schema": {
    "type": "object",
    "properties": { "action": { "enum": ["..."] }, "...": { "..." } },
    "required": ["action"],
    "additionalProperties": false
  },
  "input_examples": [ { "..." }, { "..." } ]
}
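For concreteness, here is how the hq_tools entry could be instantiated under this contract — the list / request actions and the tool / reason parameters come from §33.4 and §33.8; the description wording is illustrative:

{
  "name": "hq_tools",
  "description": "Discovery over the tool catalog. action='list' returns every registered tool with per-tool access status (granted / request). action='request' escalates for access to a tool outside this agent's curated set; the request lands as a proposal for Wilson (#69 cross-scope class).",
  "strict": true,
  "input_schema": {
    "type": "object",
    "properties": {
      "action": { "enum": ["list", "request"] },
      "tool":   { "type": "string", "description": "required when action='request'" },
      "reason": { "type": "string", "description": "required when action='request'" }
    },
    "required": ["action"],
    "additionalProperties": false
  },
  "input_examples": [
    { "action": "list" },
    { "action": "request", "tool": "hq_proposal", "reason": "need to file a rollback proposal" }
  ]
}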

Output contract — every handler returns this shape:

{
  "ok": true,
  "action": "create",
  "data": { "slug": "...", "id": "..." },
  "audit_event_id": "evt_01HXY...",
  "trace_id": "trc_01HXY...",
  "error": null
}

High-signal returns only — slugs, UUIDs, counts. Never raw rows. audit_event_id enables one-command rollback via hq_proposal rollback; trace_id feeds hq_trace explain.

33.4 Per-agent curation (v1.8 collapsed roster)

Each agent's markdown config declares ## Tools with base: true (always-on) and a role-specific add: [...] list. The agent-supervisor (§22, #68) reads this at boot and assembles the registry subset passed as tools=[...] on every API request.

Base set (every agent, ~2 K tokens): hq_search, hq_describe, hq_event_log, hq_tools.

Agent Adds (on top of base) Total Tokens (≈)
chief hq_timeline, hq_entity (read), hq_engagement (read+propose), hq_interaction, hq_proposal, hq_examples_find, hq_action_log (read-only on own trajectories), hq_trace_show, hq_project_run, gmail_thread_read, gmail_search (full archive), gmail_send, calendar_read, calendar_create_event, telegram_send_to_wilson, kb_search (read-only), Edit + Write (path allow-list), drive-mcp (read), calendar-mcp (read history), kb-mcp (read-only) 26 ~14.0 K
operator hq_timeline, hq_entity (read+write), hq_engagement, hq_task, hq_proposal, worker_run, hq_trace, hq_memory, hq_producer, hq_autonomy, hq_agent, hq_playbook, hq_examples_find, hq_examples_promote, hq_examples_pin, hq_action_log, hq_edit_log, migration_plan, migration_apply, kb_ingest_run, graph_reconcile, finance_import 26 ~16.5 K
project-subagent (template, loaded on demand) hq_timeline, hq_task, worker_run, hq_proposal, hq_examples_find 9 ~6.0 K

Loaded totals: chief ~14.0 K / operator ~16.5 K (both grew with #77 / #76 deltas). Discovery: any tool not in an agent's set is one hq_tools(action='list') hop away; access expansion via hq_tools(action='request', tool='X', reason='Y') lands as a proposal for Wilson approval (#69 cross-scope class).

Note (v1.8 tool count): both agents now cross the ~20-tool embedding-search threshold (#74 amended #67; #77 expands chief). Loading strategy below (§33.7) opts both into Phase 2 (Tool Search Tool with defer_loading=true for cold-tier tools).

Edit/Write for chief — path allow-list (#77 §6): the runtime middleware enforces a path allow-list on chief's Edit and Write calls. Allowed paths: outputs/proposals/, outputs/contracts/ (in-draft only — post-dispatch routes through request_approval), outputs/briefs/, outputs/post-mortems/, outputs/status-reports/, outputs/customer-facing/<customer>/, agents/chief/personality.md, agents/chief/MEMORY.md. Writes outside the allow-list raise a typed cross_scope_violation error and a hint to ask operator. Violations are tracked as cross_scope_violation_count_14d (target: 0).
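A sketch of that middleware check, assuming a simple prefix match — the function name and error shape beyond the cross_scope_violation code are illustrative:

// Enforce chief's Edit/Write path allow-list (#77 §6). Only the paths and the
// cross_scope_violation error name come from the plan; the rest is a sketch.
const CHIEF_WRITE_ALLOWLIST = [
  "outputs/proposals/",
  "outputs/contracts/",        // in-draft only; post-dispatch routes through request_approval
  "outputs/briefs/",
  "outputs/post-mortems/",
  "outputs/status-reports/",
  "outputs/customer-facing/",  // per-customer subdirectories
  "agents/chief/personality.md",
  "agents/chief/MEMORY.md",
];

export function assertChiefWriteAllowed(path: string): void {
  const p = path.replace(/^\.\//, "");
  const allowed =
    !p.includes("..") &&       // no traversal out of the allow-listed roots
    CHIEF_WRITE_ALLOWLIST.some((a) => (a.endsWith("/") ? p.startsWith(a) : p === a));
  if (!allowed) {
    // Typed error; each trip also increments cross_scope_violation_count_14d (target: 0)
    throw Object.assign(new Error(`cross_scope_violation: ${p}`), {
      code: "cross_scope_violation",
      hint: "ask operator to make this write, or route it via request_approval",
    });
  }
}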

33.5 Dual-run workers — autonomous + tool

Decision #70 keeps autonomous worker execution intact and adds the tool surface as a second invocation path. Same handler, two callers.

Worker Autonomous trigger Tool surface
email_ingest IMAP IDLE (always-on) worker_run(worker='email_ingest', params={force_refresh:true})
whatsapp_ingest daemon push worker_run(worker='whatsapp_ingest', params={since_ts})
meeting_transcribe inotify on hot folder worker_run(worker='meeting_transcribe', params={file_path})
enrich BullMQ on interaction.created worker_run(worker='enrich', params={interaction_id}) for re-enrich
summarize BullMQ on payloads >50 K tokens worker_run(worker='summarize', params={interaction_id})
reconcile_llm enrichment writer sub-call worker_run(worker='reconcile_llm', params={candidate_set})
triage BullMQ on interaction.enriched worker_run(worker='triage', params={interaction_id})
view_renderer BullMQ on every ops.outbox event worker_run(worker='view_renderer', params={node_type, slug}) debug
embed_worker BullMQ on nodes.text_changed worker_run(worker='embed_worker', params={node_ids})
kb_indexer inotify on wiki/ worker_run(worker='kb_indexer', params={path})
profile_worker BullMQ on entity-touching outbox event worker_run(worker='profile_worker', params={entity_id})
review_applier BullMQ on review.resolved worker_run(worker='review_applier', params={review_id})
semantic_dedup nightly cron worker_run(worker='semantic_dedup', params={window_days})
playbook_proposer nightly cron worker_run(worker='playbook_proposer', params={trigger_kind})
calibration_analyzer weekly cron worker_run(worker='calibration_analyzer', params={since:'7d'})
calibration_applier event on calibration_proposal.created worker_run(worker='calibration_applier', params={proposal_id})
agent_research 22:00 daily cron worker_run(worker='agent_research', params={window:'24h'})
proposal_analytics_mirror 5-min poll worker_run(worker='proposal_analytics_mirror', params={customer_slug})
toconline_sync 30-min poll worker_run(worker='toconline_sync', params={since_ts})
shopify_sync webhook + 1-h poll worker_run(worker='shopify_sync', params={shop, since_ts})
todoist_mirror webhook + 5-min poll worker_run(worker='todoist_mirror', params={direction})

The producer registry (#26) attributes both: triggered_by: 'cron:agent_research' vs triggered_by: 'agent:project:gopecauto'.
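A sketch of the dual-run pattern for one worker — the handler body and registration details are illustrative; what the plan fixes is one handler, two callers, and triggered_by attribution:

import { Worker } from "bullmq";

// Shared handler — identical logic regardless of caller (#70).
async function enrich(params: { interaction_id: string }, ctx: { triggered_by: string }) {
  // ... enrichment body; every resulting write carries ctx.triggered_by
}

// Caller 1 — autonomous: BullMQ consumer on interaction.created
new Worker(
  "enrich",
  (job) => enrich(job.data, { triggered_by: "bullmq:interaction.created" }),
  { connection: { host: "localhost", port: 6379 } },
);

// Caller 2 — tool surface: worker_run(worker='enrich', params={interaction_id})
export function workerRunTool(input: { worker: string; params: unknown; caller: string }) {
  if (input.worker === "enrich") {
    return enrich(input.params as { interaction_id: string }, {
      triggered_by: `agent:${input.caller}`,   // e.g. 'agent:project:gopecauto'
    });
  }
  // ... dispatch to the remaining workers
}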

33.6 Direct-write default — friction-floor-zero (#69)

Decision #69 sets the write contract on top of the tool registry. Default = direct write with audit trail. Every tool call lands in ops.outbox + ops.llm_call_log + agent_runs; audit_event_id enables one-command rollback (hq_proposal rollback <id>).

Synchronous gating reserved for four classes only:

  1. Destructive actions — financial / legal / compliance writes, schema removals, edge-orphaning merges, deletion of another agent's writes.
  2. Cross-scope writes — project-A agent touching project-B's subgraph; any agent touching company-level state or another agent's config.
  3. System-behavior changes — autonomy thresholds, agent prompts, model routing, canary fractions (continues through #49).
  4. Heuristic flags — magnitude cap exceeded, novelty, conflicting evidence, security-scan hit.

These four route via the request_approval custom tool to Wilson (#69 + #74). The handoff writes a proposal node (audit trail) and a Telegram nudge; Wilson's accept/reject lands as an ops.wilson_edits row tied to the originating action. There is no terminal-writer chokepoint for routine writes; the action ledger (#75) provides the audit plane.
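A sketch of the gate in code — the four class names and the fallthrough-to-direct-write are from #69; the predicates and helper names are placeholders:

type GateClass = "destructive" | "cross_scope" | "system_behavior" | "heuristic_flag";

interface ProposedAction { tool: string; action: string; target: string; actor: string }

// Placeholder predicates — the real checks live in the tool handlers / harness.
declare function isDestructive(a: ProposedAction): boolean;         // financial/legal/compliance, schema removals, ...
declare function crossesScope(a: ProposedAction): boolean;          // another agent's subgraph, company-level state
declare function changesSystemBehavior(a: ProposedAction): boolean; // thresholds, prompts, routing, canary fractions
declare function trippedHeuristic(a: ProposedAction): boolean;      // magnitude cap, novelty, conflict, security hit
declare function directWrite(a: ProposedAction): Promise<unknown>;
declare function requestApproval(a: ProposedAction, cls: GateClass): Promise<unknown>;

function gateClass(a: ProposedAction): GateClass | null {
  if (isDestructive(a)) return "destructive";
  if (crossesScope(a)) return "cross_scope";
  if (changesSystemBehavior(a)) return "system_behavior";
  if (trippedHeuristic(a)) return "heuristic_flag";
  return null;                                    // default: direct write + audit trail
}

async function executeOrPropose(a: ProposedAction) {
  const cls = gateClass(a);
  if (cls === null) return directWrite(a);        // lands in ops.outbox + llm_call_log + agent_runs
  return requestApproval(a, cls);                 // proposal node + Telegram nudge to Wilson
}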

33.7 Loading strategy

Phase When Strategy Token cost
Phase 1 — Now ≤ 25 tools per agent's curated set Load all curated tools directly. No Tool Search Tool. Median 5.5 K / agent
Phase 2 When agent-specific catalogs exceed 30 tools Tool Search Tool — keep hq_search / hq_task / hq_describe / worker_run / hq_event_log always loaded; defer the rest with defer_loading: true. ~85% reduction
Phase 3 Bulk orchestration (e.g., operator's 22:00 loop mining 200 incidents + 50 wilson_edits) Programmatic Tool Calling — model writes Python in code-execution sandbox; intermediate results never enter context. ~37% reduction on bulk tasks
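A sketch of the Phase 2 assembly, reusing the ToolSpec/toAnthropic sketch from §33.1 — the hot tier is the list above; defer_loading follows Anthropic's Tool Search Tool beta, and the exact beta field/header names should be confirmed against the current API docs:

const HOT_TIER = new Set(["hq_search", "hq_task", "hq_describe", "worker_run", "hq_event_log"]);

const tools = registry.map((t) => ({
  ...toAnthropic(t),
  // Cold-tier tools stay discoverable via the Tool Search Tool but are not
  // loaded into context until the model requests them.
  ...(HOT_TIER.has(t.name) ? {} : { defer_loading: true }),
}));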

33.8 Restrictions matrix (v1.8)

Tool / verb Granted to
hq_autonomy (any verb) operator (with Wilson confirmation via request_approval for freeze and kill)
hq_agent write verbs (pause/resume/restart/handover) operator
hq_proposal rollback operator (chief proposes via mailbox handoff)
hq_entity merge nobody direct — always proposal via request_approval
gmail_send chief only (operator has no outbound email permission). Default mode: approval-required — every call wrapped by request_approval per #74 §9. Direct send only after Wilson runs hq agent ungate chief --action=email_send [--scope=…].
migration_apply operator only, plus request_approval if migration is irreversible

Anyone needing a restricted tool escalates via hq_tools(action='request', tool='X', reason='Y') — request becomes a proposal Wilson approves.

33.9 What this changes in the existing plan

33.10 Inter-agent CLI mailbox (#71)

Full rationale in DECISIONS.md #71. CLI verbs:

hq agent send <to> <message>                   # fire-and-forget
hq agent ask <to> <message> --timeout 30s      # sync RPC, blocks on reply (hard ceiling 5 min)
hq agent broadcast <group> <message>           # @ventures | @projects | @all
hq agent reply <message-id> <body>
hq agent inbox [--unread] [--from <agent>]
hq agent roster [--scope <team|project>]
hq agent presence <role>                       # last-seen, current state

Wire model. hq agent ask writes a row into ops.agent_inbox (extended per #60 amendment: from_agent TEXT, correlation_id UUID, expects_reply BOOLEAN) and emits an agent.message outbox event. Supervisor (#68) routes the event to the recipient — fork-execing it cold if not warm. Recipient calls hq agent reply <message-id> <body>, which emits agent.reply.<correlation_id>. Caller's runtime LISTENs on that channel and unblocks. Every agent stays a top-level SDK process (no SDK nesting; preserves #50). Sub-30 ms transport when both ends are warm; ~1.5 s when target is cold.
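A sketch of the blocking ask path under these assumptions — ops.agent_inbox, the agent.reply.<correlation_id> channel, and the 5-minute ceiling come from the plan; the pg client usage and function name are illustrative, and the agent.message outbox emission (trigger/relay) is elided:

import { Client } from "pg";
import { randomUUID } from "node:crypto";

// hq agent ask <to> <message> --timeout 30s  (hard ceiling 5 min)
export async function agentAsk(db: Client, from: string, to: string, body: string, timeoutMs = 30_000) {
  const correlationId = randomUUID();
  await db.query(
    `INSERT INTO ops.agent_inbox (from_agent, to_agent, body, correlation_id, expects_reply)
     VALUES ($1, $2, $3, $4, true)`,    // from_agent required for agents; NULL means Wilson
    [from, to, body, correlationId],
  );
  // Supervisor (#68) routes the agent.message outbox event, fork-execing the
  // recipient cold if it is not warm. We block on the reply channel:
  await db.query(`LISTEN "agent.reply.${correlationId}"`);
  return new Promise<string | undefined>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("ask timeout")), Math.min(timeoutMs, 300_000));
    db.on("notification", (msg) => {
      if (msg.channel === `agent.reply.${correlationId}`) {
        clearTimeout(timer);
        resolve(msg.payload);
      }
    });
  });
}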

SDK surface. Inside each agent's session, a custom SendMessage tool (Anthropic-pattern JSON-schema, strict: true, per #70) wraps hq agent send / hq agent ask as a subprocess. This is NOT in the base toolset; agents declare it explicitly via add: [send_message]. from_agent is required on all agent-to-agent rows; NULL means Wilson. from_agent='wilson' magic strings are forbidden (CHECK constraint).

Coordination class split (amends #50). Synchronous Q&A and short-form delegation use the CLI mailbox (the fast path). Multi-step proposals, cross-tier reviews, and anything that must survive a process exit or be replayed by hq state rebuild continue through the graph (the primary path). Rule of thumb: if you'd want it replayable, it goes through the graph.


Part VII — Safety, Governance, and Learning

34. Autonomy framework (#49) — four layers

Every self-improvement (playbook promotion, calibration change, prompt tweak, config change) flows through:

Layer 1: Pre-apply gates
  ✓ dangerousness check (destructive → review queue)
  ✓ sample size (n ≥ threshold)
  ✓ magnitude cap (per-week delta limits)
  ✓ security scan (invisible Unicode, prompt-injection patterns, fenced context)
  ✓ freeze state (hq autonomy freeze → hold)
         │ passes
         ▼
Layer 2: Canary rollout
  • test_fraction = 20% traffic (default)
  • adaptive window: 48h min, 8 uses target, 14d max
  • canary_id on every reasoning_trace
         │ window closes
         ▼
Layer 3: Auto-decide
  • regression → rollback + alert
  • no change → shelved (30d cooldown)
  • improvement → promote + digest note
         │ promoted
         ▼
Layer 4: Drift monitor (continuous, 7-day rolling)
  • metric degradation > tolerance for 48h → auto-rollback
  • reason logged + Telegram to Wilson
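The Layer 3 decision rule is small enough to state as code — a sketch, with the canary metric delta and tolerance left abstract (both field names are assumptions):

type CanaryVerdict = "rollback" | "shelve" | "promote";

// delta = canary metric minus baseline; tolerance = allowed noise band
function autoDecide(delta: number, tolerance: number): CanaryVerdict {
  if (delta < -tolerance) return "rollback"; // regression → rollback + alert
  if (delta <= tolerance) return "shelve";   // no change → shelved, 30d cooldown
  return "promote";                          // improvement → promote + digest note
}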

Destructive carve-out (always reviews, never canary):

Emergency controls:

35. Confidence gating

Confidence Action
≥ 0.95 Auto-apply deterministically
0.85 – 0.95 Auto-apply + soft review (compounds to profile)
0.70 – 0.85 Auto-apply with high scrutiny (canary eligible)
< 0.70 Queue as review node; human decides
Destructive, any confidence Queue as review node
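As code, the gate is a straightforward ladder with the destructive override applied first — a sketch; the enum names are assumptions:

type GateAction = "auto_apply" | "auto_apply_soft_review" | "auto_apply_high_scrutiny" | "review_queue";

function confidenceGate(confidence: number, destructive: boolean): GateAction {
  if (destructive) return "review_queue";                    // destructive, any confidence
  if (confidence >= 0.95) return "auto_apply";               // deterministic
  if (confidence >= 0.85) return "auto_apply_soft_review";   // compounds to profile
  if (confidence >= 0.70) return "auto_apply_high_scrutiny"; // canary eligible
  return "review_queue";                                     // human decides
}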

36. Watchdog tiers (#64)

Tier Trigger Action
Soft First timeout, suspected loop DM agent: "check in" — 2 min grace
Medium Confirmed loop, repeated timeout, error burst Force-stop current query, fresh session on next trigger
Hard Crash, budget runaway, RSS explosion systemd restart + review
Critical >5 hard trips in 1 hour systemd-stop + quarantine status + review

Every trip writes an agent_incident node. A meta-watchdog guards the watchdog itself.

37. Direct-write discipline (#69, #74 — replaces single-writer)

The v1.7 single-writer model (terminal writes only by triage-dispatcher) is dropped. With only two top agents, write-race risk is dominated by accidental overlap rather than anything that needs deliberate deduplication machinery, and it is solved with simpler tooling:

  1. Per-agent scope. chief writes outbound (email, customer artifacts). operator writes inward (graph, files, code, finance). The tool registry (#70) blocks cross-scope writes at the harness layer.
  2. Friction-floor-zero with destructive gate (#69). Each agent writes directly within its own scope. Destructive, cross-scope, system-behavior, and heuristic-flagged actions route to the request_approval custom tool, which becomes a proposal node visible to Wilson.
  3. Action ledger as single source of truth (#75). Every write produces one ops.agent_actions row, regardless of agent. Conflicts surface as duplicate-target rows with overlapping timestamps; a nightly reconcile job (operator @ 03:00) flags them for the morning brief.
  4. Inter-agent coordination via mailbox (#71). When chief needs a graph mutation, it hq agent ask operator. Operator owns the write. The handoff is logged as parent/child action rows.

Prevents write races (scope isolation), duplicate work (mailbox handoff), and actor-attribution confusion (action ledger).

38. Memory invariants

39. Action ledger + trajectory + success examples (#75 + #76) + Daily learning loop (#65 amended by #74)

The v1.7 "daily agent-research" agent collapses into operator's 22:00 learning loop. The loop runs four passes — trajectory annotation review, success-pattern promotion (text + procedure), incident mining, and roll-up — as a single Opus session, gated by friction-floor-zero (#69), and writes into the three-layer ledger: actions (#75 §1) + trajectories (#76) + success examples (#75 §3 + #76 trajectory_summary).

39.1 Action ledger schema (#75 §1, amended by #76 §1)

ops.agent_actions(id, actor, session_id, task_id, action_type, target_kind, target_id, input, output, status, model, cost_usd, duration_ms, parent_id, embedding, created_at). Every tool call by either top agent or any subagent appends one row. The task_id UUID groups all actions within a single agent task (one inbound event → one task → one task_id; subagents inherit task_id from their parent). The agent_runtime PostToolUse hook writes asynchronously to a Valkey stream; ledger-flusher drains the stream into Postgres in 1-second batches. Storage cost ~50 KB/row × 50 rows/day × 365 days ≈ 900 MB/year. Kept forever.
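A sketch of the hook → stream → flusher path — the stream key and batch size are assumptions; Valkey speaks the Redis protocol, so ioredis works as a client:

import Redis from "ioredis";
import { Client } from "pg";

const valkey = new Redis();           // Valkey is Redis-protocol-compatible
const STREAM = "hq:agent_actions";    // hypothetical stream key

// PostToolUse hook — fire-and-forget append; never blocks the agent turn
export function onPostToolUse(row: Record<string, string>) {
  void valkey.xadd(STREAM, "*", ...Object.entries(row).flat());
}

// ledger-flusher — drain the stream into ops.agent_actions in 1-second batches
export async function flushLoop(db: Client) {
  let lastId = "$";
  for (;;) {
    const res = await valkey.xread("BLOCK", 1000, "COUNT", 500, "STREAMS", STREAM, lastId);
    if (!res) continue;                         // BLOCK timed out; poll again
    const [, entries] = res[0];
    // ... map stream fields → columns and bulk-insert into ops.agent_actions (elided)
    lastId = entries[entries.length - 1][0];    // resume after the last drained entry
  }
}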

39.2 Wilson edit log (#75 §2)

ops.wilson_edits(id, action_id, edit_type, final_output, diff_summary, diff_score, notes, edited_at). Captured via three paths:

Path Detection edit_type values
Email send Outbox watcher diffs chief's draft against the actually-sent message in Gmail accepted, tweaked, rewrote, rejected
Task / proposal / file edit hq CLI wrappers on every Wilson-driven mutation persist pre/post snapshots accepted, tweaked, rewrote
Acceptance with no change Outbox watcher emits edit_type='accepted', diff_score=0.0 after 24 h with no Wilson modification accepted
Rejection hq action reject <id> --reason=… rejected, abandoned

Diff summary: 1-line Haiku 4.5 generation (~$0.0002 per diff). Diff score: deterministic cosine distance on Voyage 3.5-lite embeddings (ADR #82) — the same embedding service that powers nodes.embedding. Each diff scoring call is ~2 short embed requests (draft + sent) at $0.02/M tokens; ~50 diffs/day × ~500 tokens each = ~$0.0005/day, rounded into the embedding line below.
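The diff score itself is deterministic given the two embeddings — a sketch; embed() stands in for the Voyage embedding client and is an assumption:

// diff_score = cosine distance between draft and sent embeddings (ADR #82).
// 0.0 = same direction (accepted unchanged); larger = heavier rewrite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Inside an async handler:
//   const diffScore = cosineDistance(await embed(draft), await embed(sentMessage));
// where embed() calls the Voyage 3.5-lite service and returns a 512-dim vector.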

39.2.5 Trajectory capture and per-action annotation (#76)

ops.email_reply_sessions — one row per email reply attempt. Captures: task_id, thread_id, customer_slug, inbound_thread (full snapshot), classification, draft_output, draft_model, trajectory_action_ids[] (ordered list of action_ids — the procedure), retrieved_example_ids[] (which success-examples chief used as RAG anchors), approval_status, final_output, final_diff_score. Schema is generalizable to other artifact types (proposal_sessions, adr_sessions, kb_ingest_sessions) in later waves; the email case is Wave 1 priority.

ops.action_annotations — Wilson's per-action grades. Captures: action_id, task_id, grade ∈ {good, bad, missing, unnecessary}, note, annotator, created_at. Annotations are written from two paths:

Path Trigger
Inline approval UI Wilson clicks thumbs/comment on any action in the trajectory pane while reviewing a draft (request_approval page)
Retrospective CLI hq action annotate <action_id> --grade=… --note="…"

For grade='missing', the annotation is attached to the closest preceding action_id with a note describing what should have happened; the runtime renders this as an interleaved gap when displaying the trajectory.

Approval UI three-pane layout (#76 §4): left = inbound thread; center = chief's draft + retrieved success examples; right = ordered trajectory list with per-action thumbs-up/thumbs-down/missing-step buttons + comment boxes. Wilson can: (1) approve/edit/reject the draft (writes wilson_edits), (2) grade any action (writes action_annotations), (3) insert a missing step (writes action_annotations with grade='missing').

CLI surfaces:

hq trace show <task_id>                 # render full trajectory + annotations
hq trace gaps --customer=<slug> --since=14d  # all `missing` annotations grouped by pattern
hq action annotate <action_id> --grade=… --note="…"
hq examples find --include-trajectory   # default true for chief and operator (#76 §6)

39.3 Success examples DB and auto-promotion (#75 §3, amended by #76 §7)

Operator at 22:00 (Opus, plan-mode) runs the two-axis auto-promotion pipeline against ops.wilson_edits × ops.action_annotations from the past 24h. The text axis (Wilson edited the artifact) and the trajectory axis (Wilson graded the procedure) compose:

Edit type × trajectory annotations Promotion
accepted AND no bad or missing annotations auto_promoted after 7 days (clean text + clean process)
accepted AND ≥1 missing annotation auto_promoted_with_caveat — the procedure embeds the missing note as a corrective; future retrievals see "next time also do X" inline
tweaked AND diff_score < 0.20 AND no bad annotations auto_promoted (Wilson liked both procedure and bones)
tweaked with ≥1 bad annotation NOT promoted — extracted as anti-pattern lesson with the specific bad action highlighted
rewrote OR rejected NOT promoted — anti-pattern lesson; trajectory annotations included in the lesson body
wilson_pinned (manual via hq examples pin) Bypasses all rules; never decays

The auto_promoted_with_caveat class is novel and important: it captures the case where Wilson said "the email was fine, but next time also check X." Too valuable to lose, not a clean exemplar — so the markdown mirror includes the missing step as a prescriptive instruction in the procedure section.
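The rule table reduces to a small decision function — a sketch; the type names are assumptions, the thresholds are from the table above, and the accepted-with-bad combination (not covered by the table) is left unpromoted:

type EditType = "accepted" | "tweaked" | "rewrote" | "rejected" | "wilson_pinned";
type Grade = "good" | "bad" | "missing" | "unnecessary";
type Promotion = "auto_promoted" | "auto_promoted_with_caveat" | "anti_pattern" | "pinned";

function promote(edit: { type: EditType; diffScore: number }, grades: Grade[]): Promotion | null {
  const bad = grades.includes("bad");
  const missing = grades.includes("missing");
  if (edit.type === "wilson_pinned") return "pinned";             // bypasses all rules; never decays
  if (edit.type === "accepted") {
    if (!bad && !missing) return "auto_promoted";                 // after 7 days: clean text + clean process
    if (missing && !bad) return "auto_promoted_with_caveat";      // embeds the missing note as a corrective
    return null;                                                  // accepted-with-bad: not promoted
  }
  if (edit.type === "tweaked") {
    if (bad) return "anti_pattern";                               // with the bad action highlighted
    if (edit.diffScore < 0.20) return "auto_promoted";
    return null;
  }
  return "anti_pattern";                                          // rewrote / rejected
}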

Promoted examples mirror to /memories/success-examples/<action_type>/<id>.md with PII redaction (Haiku strips names and addresses; entity names → <customer>, email addresses → <email>, person names → <contact>). The mirror format includes a "What I queried (the trajectory)" section listing each action with its grade (✓ / ✗ / +missing) and Wilson's note — see #76 §5 for the full template.

39.4 Retrieval (used at draft time, trajectory-aware per #76 §6)

hq examples find --action-type <type> --tags <…> --query "<…>" --top <N> [--include-trajectory=true] returns top-N markdown cards. The --include-trajectory flag (default true for chief and operator; opt-out for cheap lookups) returns the full trajectory section — the procedure that produced the validated artifact — alongside the artifact itself.

Both agents are required (per their workspace CLAUDE.md) to call it before any high-stakes draft (§33.2).

The retrieved cards are appended to the agent's prompt as a <past-wilson-validated-trajectories> block. The agent is instructed to follow the procedure in the retrieved cards before drafting (run the same queries, ask operator/chief in the same order, retrieve the same kinds of context), not just to mimic the text. Tone is the surface; procedure is the substance. This is trajectory-RAG over the agent's own validated outputs — no synthetic training data, no fine-tuning.

39.5 Incident mining (the other half of the 22:00 loop)

The same Opus pass also mines agent_incident nodes, ops.agent_error_log, the past 24h of action-ledger rows where status='failed', and operator's own self-reflection. Outputs four kinds of proposals (unchanged from v1.7):

Kind Routes to Example
Prompt tweak ops.improvement_proposals, #49 canary "chief hit 12 activity-timeouts; add 'if no new info in 5 turns, task-complete' rule"
New skill operator drafts; Wilson approves "3 actions independently derived Portuguese deadline extraction — create extract-deadline skill"
Config change #49 canary "project:garq-pdm exceeded budget 5/7 days; raise hard cap $5→$7 OR add summarize gate"
Seed case / test agents/operator/incident_corpus/ "This loop pattern becomes an eval fixture"

39.6 Weekly and monthly aggregations

Operator on Sunday 04:00 (Haiku — cheap rollup) concatenates the week's success-examples + lessons into a single markdown index at /memories/success-examples/_weekly/<YYYY-Www>.md. Month-end concatenates four weeks. The morning brief on the first weekday of each week pulls the prior week's index as a "What we learned" section.

39.7 Cost and budget

Component Daily cost (estimate)
Diff summary (50 actions/day × Haiku ~$0.0002) $0.01
Trajectory summary generation per email_reply_session (~30/day × Haiku ~$0.0005) $0.015
Embedding (50 actions × Voyage 3.5-lite, ~500 tokens each, $0.02/M) ~$0.0005 (≈ $0.02/month — included in main Voyage spend line)
Auto-promotion pipeline (Opus, ~35 K tokens — slightly larger than v1.8.1 because trajectory annotations are now an input) $0.50
Markdown mirror generation with trajectory section (Haiku, ~12 K tokens) $0.006
Mailbox + watchdog overhead $0.00
Total ~$0.53/day, capped $0.60

Hard cap raised from $0.50 to $0.60 to cover trajectory-summary generation. Enforced by daily_usd_hard on operator's budget plus per-call cost telemetry.

40. Playbooks (#47)

40.1 Creation

Nightly playbook-proposer (driven by operator's 22:00 learning loop) scans for:

Draft body: ≤4096 chars. Security scan: invisible Unicode, prompt-injection patterns, financial/legal/compliance keyword flags. Store at wiki/playbooks/<category>/<slug>.md; mirrored as playbook node.

40.2 Lifecycle

Canary at 20% → auto-decide: ≥75% → promote; <50% → archive. Decay: 90d unused → demote; 180d → archive. Drift monitor: 7d rolling.

40.3 Consumption

Hybrid search in buildContext() returns top-2 status IN (canary, active) playbooks. Fenced as <playbook-context>[System note]...</playbook-context> to guard against prompt injection.

41. Calibration loop (#48)

41.1 Observation tables (append-only)

41.2 Pipeline

Weekly calibration-analyzer (Sun 09:00) detects override patterns; emits calibration_proposal to ops.pending_proposals. Hourly calibration-applier runs proposals through #49 framework. Config hot-reload on SIGHUP (no restart).

41.3 Tunables

All markdown — hot-reloadable by runtime on SIGHUP.
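The hot-reload itself could be as small as this — a sketch; loadTunables is a placeholder for parsing the markdown tunables:

// Hot-reload markdown tunables on SIGHUP — no process restart (#48).
declare function loadTunables(): Record<string, unknown>;  // placeholder parser

let tunables = loadTunables();
process.on("SIGHUP", () => {
  tunables = loadTunables();
  console.info("calibration tunables reloaded");
});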

42. Schema evolution (#38)

Enrichment output includes pending_schema_proposals[] (new enum values seen). Monthly (1st Sunday) schema-analyzer aggregates, emits to review queue. Approved → enum migration + re-enrichment of historical interactions (12-month window).