Part III — Data Model
8. Graph substrate
The data layer is fully specified in docs/architecture/data-layer.md. Summary of what v1.5 locks:
Core schema:
CREATE TABLE nodes (
  id               uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  type             text NOT NULL,
  status           text,                        -- projection column
  priority         text,                        -- projection column
  occurred_at      timestamptz,                 -- projection column
  archived_at      timestamptz,                 -- soft delete
  props            jsonb NOT NULL DEFAULT '{}',
  tags             text[] DEFAULT '{}',
  full_text        tsvector,
  embedding        vector(512),                 -- voyage-3.5-lite output (ADR #82)
  producer_id      uuid NOT NULL REFERENCES nodes(id),  -- provenance
  owner_id         uuid REFERENCES nodes(id),
  external_source  text,
  external_id      text,
  valid_during     tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  created_at       timestamptz DEFAULT now(),
  updated_at       timestamptz DEFAULT now(),
  redaction_policy text DEFAULT 'none',
  confidence       real
);

CREATE TABLE edges (
  id           uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  from_id      uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  to_id        uuid NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT,
  type         text NOT NULL,
  props        jsonb NOT NULL DEFAULT '{}',
  valid_during tstzrange NOT NULL DEFAULT tstzrange(now(), null, '[)'),
  producer_id  uuid NOT NULL REFERENCES nodes(id),
  confidence   real,
  created_at   timestamptz DEFAULT now()
);
Bitemporal shadow tables (history_nodes, history_edges) record every mutation with op (insert/update/delete/archive), actor (from hq.actor GUC), reason (from hq.reason GUC), full prior row as JSONB.
Indexes: btree on id/external_id; gin on props (jsonb_path_ops), full_text, tags; hnsw on embedding; gist on valid_during; partial btree on (type, status), (type, priority), (type, occurred_at).
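A minimal DDL sketch of that index set, assuming pgvector's `vector_cosine_ops` operator class for the HNSW index and `archived_at IS NULL` as the partial-index predicate; index names are illustrative and the locked definitions live in data-layer.md:

```sql
-- Illustrative only; names and the partial predicate are assumptions.
CREATE INDEX nodes_external_id_idx   ON nodes (external_id);
CREATE INDEX nodes_props_gin         ON nodes USING gin  (props jsonb_path_ops);
CREATE INDEX nodes_full_text_gin     ON nodes USING gin  (full_text);
CREATE INDEX nodes_tags_gin          ON nodes USING gin  (tags);
CREATE INDEX nodes_embedding_hnsw    ON nodes USING hnsw (embedding vector_cosine_ops);
CREATE INDEX nodes_valid_during_gist ON nodes USING gist (valid_during);
CREATE INDEX nodes_type_status_idx   ON nodes (type, status)      WHERE archived_at IS NULL;
CREATE INDEX nodes_type_priority_idx ON nodes (type, priority)    WHERE archived_at IS NULL;
CREATE INDEX nodes_type_occurred_idx ON nodes (type, occurred_at) WHERE archived_at IS NULL;
```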
Extensions used: pgcrypto, pg_trgm, btree_gin, pg_stat_statements, pg_cron, AGE (Cypher fallback), pgvector, pgvectorscale, timescaledb-apache, pg_partman, pg_duckdb, pg_search, auto_explain.
9. Node types (complete catalog — 28 types)
CRM core:
| Type | Purpose |
|---|---|
| `entity` | Person or organization (customers, partners, prospects, own ventures) |
| `contact` | Individual contact; multi-entity-capable |
| `engagement` | Commercial unit — one per proposed deal; stages discovery→proposal→contract→delivery (or partner/declined) |
Execution hierarchy:
| Type | Purpose |
|---|---|
| `project` | Execution wrapper; priority axis + status + tech stack |
| `feature` | Discrete deliverable within a project; complexity + acceptance criteria |
| `task` | Unit of work; dev_* + ops_* fields; Todoist mirror |
| `deliverable` | Shipped artifact (URL, repo commit, document) — distinct from document |
Communication:
| Type | Purpose |
|---|---|
| `interaction` | Every conversation — email, WhatsApp, phone, meeting |
| `conversation` | Session wrapper around agent runs, meetings, or email threads |
Commercial flow:
| Type | Purpose |
|---|---|
| `quote` | Proposal draft before engagement promotion |
| `invoice` | Mirrored from TOConline |
| `payment` | Settles invoice; partial/full/credit note |
| `expense` | Charged to engagement |
Signal & value layer (the v1.4b unlock):
| Type | Purpose |
|---|---|
| `intent` | What a customer is asking for (ask_reply, ask_budget, ask_document, ask_development, ask_fix, ask_meeting, ask_decision, inform, confirm, approve, complain, churn_signal, expand_signal, thank) |
| `expectation` | Accountability unit — what's owed, by whom, when; we_owe / they_owe / mutual |
| `turn_state` | Per-conversation state: theirs / ours / third_party / blocked / closed |
Knowledge & outputs:
| Type | Purpose |
|---|---|
| `document` | File, PDF, transcript, invoice, contract |
| `memo` | Synthesized output (WBR, briefing, proposal text) |
| `kb_article` | Wiki mirror from ~/knowledge-base/wiki/ |
| `decision` | ADR; supersedable |
| `risk` / `open_question` | Unresolved concerns affecting projects |
Agent & governance:
| Type | Purpose |
|---|---|
| `producer` | Intake registry — every data source is a first-class node |
| `event` | Timeline marker (milestone / development / action / state_change) |
| `agent_run` | One tool-calling session of an agent |
| `agent_session` | Supervisor-spawned session; parent of agent_runs |
| `agent_handover_memo` | Memory bridge between sessions |
| `memory_entry` | Distilled fact, pattern, preference, anti_pattern, watch_item — with salience + decay |
| `reasoning_trace` | LLM call audit (inputs, output, confidence, canary_id) |
| `agent_incident` | Watchdog-detected issue |
| `review` | Confidence-gated or destructive decision awaiting human/agent resolution |
| `playbook` | Procedural memory (auto-proposed, canary-rolled, auto-demoted) |
| `proposal` | Agent's proposed write; awaits dispatcher resolution |
| `triage_decision` | Companion node capturing triage-proposed values vs. actual writes |
| `alert` | Prometheus-fired alert |
| `shopify_order` | Imported from Shopify |
10. Edge types (complete catalog)
Provenance & participation:
- `produced_by` — every node → producer (mandatory)
- `authored_by` — document/memo → contact/agent_run
- `from` / `to` / `cc` / `has_participant` — interaction → contact
- `touched` — agent_run → any node read or written during the run
Containment & composition:
- `part_of` — interaction/agent_run → conversation; event → conversation
- `has_feature` — project → feature
- `has_task` — feature → task
- `belongs_to` — task → feature (primary)
Relationships:
- `about` — interaction/event/memo → entity/project/feature/engagement
- `mentions` — document/memo → entity/contact/task/kb_article
- `owned_by` — project/contact → entity
- `works_for` — contact → entity
- `reports_to` — contact → contact
Commercial:
- `fulfills` — project → engagement (M:N)
- `covers` — engagement → entity
- `proposes` — quote → engagement
- `billed_to` — invoice → entity
- `for_work` — invoice → engagement / project / feature
- `settles` — payment → invoice
- `charged_to` — expense → engagement
Dependencies & dependency semantics (v1.5 richer than v1.3.1):
`blocked_by`, `blocks`, `depends_on`, `related_to`, `supersedes`, `superseded_by`, `duplicates`, `duplicate_of`, `spawned_from`
Intent & expectation flow:
- `extracted_from` — intent → interaction
- `addressed_to` — intent → contact
- `fulfilled_by` — intent/expectation → interaction/document/event/task
- `spawned` — intent/event/interaction → task
- `owed_by` / `owed_to` — expectation → contact/entity
Agent & learning:
- `input_to` — reasoning_trace → context nodes
- `decided` — reasoning_trace → affected nodes
- `validated_by` — memory_entry → reasoning_trace
- `applied_by` — playbook → agent_run
- `resolved_by` — review → contact/agent_run
- `linked_trace` — agent_incident → reasoning_trace
Multi-role assignment (DACI):
- `driven_by` — task → contact/agent (exactly 1)
- `accountable` — task → contact (exactly 1)
- `consulted` — task → contact/agent (0..N)
- `informed` — task → contact (0..N)
11. Producer registry (#26)
Every fact carries produced_by → producer. Five kinds:
| Kind | Example slugs | Attribution |
|---|---|---|
| `conversation_channel` | `email:zoho-wilson`, `whatsapp:wilson-personal`, `meeting:zoom`, `phone:meo` | Per-channel ingest worker |
| `agent_session` | `claude-code:vps`, `claude-code:laptop`, `hermes:vps`, `langchain:vps` | Per-session; agent_run rolled up |
| `project` | `project:garq-pdm-consulta`, `project:capitao-command-center` | Semantic emission only (events, not raw) |
| `external_system` | `toconline:capitao`, `shopify:petvitaclub`, `todoist:wilson`, `github:wcapitao` | Per-system poller / webhook |
| `internal_worker` | `cc:view-renderer`, `cc:triage-worker`, `cc:agent:ventures:chief`, `cc:agent:ventures:operator`, `cc:agent:project:garq-pdm` | Self-referential |
Each agent is a `producer.kind='internal_worker'` node with a `role` prop (`ventures:chief`, `ventures:operator`, `project:garq-pdm`, etc.). This enables per-agent queries: "show me everything chief produced this week" is a single `WHERE producer.slug = 'cc:agent:ventures:chief'` filter. Subagent attribution flows through the action ledger's `parent_id` (#75 §1).
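A sketch of the per-agent provenance query this enables, using the `producer_id` column from §8; storing the slug under `props->>'slug'` on the producer node is an assumption:

```sql
-- "Everything chief produced this week": illustrative query shape.
SELECT n.id, n.type, n.created_at
FROM nodes n
JOIN nodes p ON p.id = n.producer_id AND p.type = 'producer'
WHERE p.props->>'slug' = 'cc:agent:ventures:chief'
  AND n.created_at >= now() - interval '7 days'
ORDER BY n.created_at DESC;
```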
12. Intent + expectation + turn_state — the value unlock
These three node types elevate the schema from "record of what happened" to "state of obligations and turns."
12.1 Intent
Every interaction produces 0..N intents via the enrichment worker.
intent
kind ask_reply | ask_budget | ask_document | ask_development
| ask_fix | ask_meeting | ask_decision | ask_intro
| inform | confirm | approve | complain
| churn_signal | expand_signal | thank
urgency blocker | impactful | nice
explicit_due_at timestamptz (when the sender stated a deadline)
confidence 0.0-1.0 (enrichment confidence)
evidence_span text quote
status open | fulfilled | abandoned | superseded
edges:
extracted_from → interaction
about → entity | engagement | project | feature
addressed_to → contact
fulfilled_by → interaction | document | event | task
12.2 Expectation
Every promise — ours or theirs — is an expectation.
expectation
kind commitment_made | ask_received | sla | recurring_obligation
| deliverable_promised
direction we_owe | they_owe | mutual
asked_at timestamptz
due_at timestamptz (nullable)
resolved_at timestamptz (nullable)
status open | overdue | resolved | abandoned | superseded
severity blocker | impactful | nice
description_md short text
sla_source engagement | policy | explicit | derived
edges:
about → entity | engagement | project | feature
spawned_from → intent | event | interaction
owed_by → contact | entity
owed_to → contact | entity
fulfilled_by → interaction | document | event | task
supersedes → expectation (when replaced)
Recurring obligations (monthly invoice, weekly standup, quarterly review) use `kind='recurring_obligation'` plus an RRULE. pg_cron materializes the next instance as its predecessor closes.
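A hedged sketch of that job: `cron.schedule` is pg_cron's actual API, while the helper function it calls is hypothetical.

```sql
-- Nightly pass; hq.materialize_recurring_expectations() is a hypothetical
-- helper that expands each obligation's RRULE and inserts the next open
-- instance once its predecessor is resolved.
SELECT cron.schedule(
  'materialize-recurring-expectations',
  '15 0 * * *',
  $$ SELECT hq.materialize_recurring_expectations() $$
);
```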
12.3 turn_state
Maintained per conversation by the enrichment worker.
turn_state (1:1 with conversation)
state theirs | ours | third_party | blocked | closed
last_turn_at timestamptz
last_turn_by contact_id | agent_id
turnaround_sla_hours integer (inherited from engagement)
overdue_at timestamptz (computed: last_turn_at + sla when state='ours')
One query answers "who am I ignoring right now?":
SELECT conversation.id, entity.name, now() - last_turn_at AS waiting
FROM turn_state JOIN conversations USING (conversation_id)
JOIN entities ON ...
WHERE state = 'ours' AND overdue_at < now()
ORDER BY overdue_at ASC;
13. Memory persistence contract (#63, amended by #71)
Every persistent agent's memory is three-layered. Full rationale in DECISIONS.md #71 and #63 (amended).
13.0 Three-layer model
- Static layer — per-agent `.md` files in the workspace (`agents/<scope>/<slug>/CLAUDE.md`, `playbook.md`, `personality.md`; project agents add `customer-profile.md`, `domain-knowledge.md`). Committed to git, rarely changes, token-budgeted (see §17.x). Loaded verbatim as static context at session start.
- Dynamic working layer — `MEMORY.md` per agent. Anthropic-standard 25 KB cap. Written by the agent during a session via the `memory: project` SDK frontmatter. Mirrored from the graph nightly by `memory-tender`. Not authoritative — a cache only.
- Long-term graph-backed layer — `memory_entry`, `agent_handover_memo`, `reasoning_trace`, `agent_session` nodes (unchanged, see §13.1–§13.3 below). Authoritative source of truth.
Invariant C11': MEMORY.md is a cache. Agent code MUST NOT treat MEMORY.md as authoritative. Missing or stale files MUST be rebuilt by memory-tender from the graph. agent-watchdog (#64) enforces this: if MEMORY.md is absent or stale beyond decay_after_days on session open, a tender pass runs before any event is processed.
Context-budget alarm (from #59) lowered from 75% → 70% to compensate for the static layer.
Every persistent agent's long-term memory lives in the graph. Three node types form the contract:
13.1 agent_session
One per agent run lifecycle. Created on spawn; finalized on task-complete or rotation.
agent_session
agent_id
started_at / ended_at
turns_taken
cost_usd
model_used
trigger_kind beat | outbox | dm
task_summary_md short narrative of what this session did
snapshot_md the handover memory snapshot
reason_closed from <task-complete reason="..."/>
edges:
produced_by → agent's producer node
part_of → conversation (if multi-session task)
13.2 memory_entry
Distilled facts the agent carries forward.
memory_entry
kind fact | pattern | preference | anti_pattern | watch_item
body_md ≤500 chars, concise
salience 0.0-1.0
created_in_session uuid
last_validated_at timestamptz
decay_after_days int (default: 30 facts, 90 patterns, 180 prefs, infinite anti_patterns)
tags text[]
edges:
about → entity | project | feature | engagement (optional)
produced_by → agent_producer
validated_by → reasoning_trace (when agent reconfirms in later session)
superseded_by → memory_entry (when replaced)
13.3 agent_handover_memo
Written at session close.
agent_handover_memo
body_md 200-400 words, human-readable
open_tasks[] uuid array
open_subscriptions[]
pending_proposals[]
edges:
produced_by → agent_producer
part_of → agent_session
13.4 Reconstruction header injected on fresh session
# Session context reconstruction
You are agent:<role> starting a fresh session after the previous one
closed with reason: "{{ last_session.reason_closed }}".
## Recent memory entries (salience ≥ 0.3, last 30 days)
{{ hq memory query --role <role> --limit 40 --min-salience 0.3 }}
## Last handover memo
{{ last_handover_memo.body_md }}
## Currently open tasks where you are the driver
{{ hq task list --driven-by agent:<role> --status in_progress }}
## Open proposals you emitted still pending
{{ hq proposal list --by-agent <role> --status pending }}
## Open subscriptions
{{ last_handover_memo.open_subscriptions }}
---
Current trigger: {{ current_trigger.summary }}
Proceed. For older context: `hq memory search --role <role> --query ...`
~2-8k tokens total. Bounded. Deterministic. Cheaper than replaying a full transcript.
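The memory query behind that header is roughly the following; the prop locations are assumptions consistent with §13.2, and the producer slug stands in for the requesting agent:

```sql
-- `hq memory query --role ventures:chief --limit 40 --min-salience 0.3`, illustratively.
SELECT n.props->>'kind'     AS kind,
       n.props->>'body_md'  AS body,
       n.props->>'salience' AS salience
FROM nodes n
JOIN nodes p ON p.id = n.producer_id AND p.type = 'producer'
WHERE n.type = 'memory_entry'
  AND n.archived_at IS NULL
  AND p.props->>'slug' = 'cc:agent:ventures:chief'   -- the requesting agent's producer
  AND (n.props->>'salience')::real >= 0.3
  AND n.created_at >= now() - interval '30 days'
ORDER BY (n.props->>'salience')::real DESC
LIMIT 40;
```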
13.5 Memory decay and reconciliation
Weekly memory-tender job (Haiku, ~$0.05/week). As of #71, gains a sync pass:
- Reads each agent's MEMORY.md, promotes durable facts into `memory_entry` nodes, prunes the file back under the 25 KB cap, rewrites it from the top-N graph entries by salience. Graph remains source of truth.
- Walks each agent's memory pool; archives entries past `decay_after_days` with `last_validated_at < now() - decay`.
- Flags contradicting entries as `memory-conflict` reviews.
- Merges near-duplicate entries (embedding similarity > 0.95 AND compatible content).
- Must be idempotent and crash-safe: single transaction, per-agent row lock, atomic MEMORY.md write (`mv` from temp).
Keeps memory pool under ~200 active entries per agent — fits comfortably in the reconstruction header.
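A sketch of the decay-archive pass in SQL, assuming the soft-delete convention from §8 (`archived_at`) and the prop names from §13.2:

```sql
-- Archive memory entries whose decay window lapsed without revalidation.
-- Illustrative; the real pass runs inside memory-tender's single transaction
-- with a per-agent row lock.
UPDATE nodes
SET    archived_at = now()
WHERE  type = 'memory_entry'
  AND  archived_at IS NULL
  AND  props->>'decay_after_days' IS NOT NULL        -- assumption: anti_patterns carry no decay value
  AND  coalesce((props->>'last_validated_at')::timestamptz, created_at)
         < now() - ((props->>'decay_after_days') || ' days')::interval;
```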
13.6 Invariants (enforced in code)
- No `<task-complete/>` finalizes without at least one `memory_entry` write OR an explicit `memory_entry_count=0` reason.
- No fresh session starts without a successful reconstruction query.
- No `memory_entry` can be deleted; only archived.
- Every `memory_entry` has `produced_by → reasoning_trace`, so "why did the agent believe this?" is one hop.
14. Bitemporal audit (#16)
Every mutation writes a history_* row via AFTER triggers.
CREATE TABLE history_nodes (
  history_id  bigserial,
  id          uuid NOT NULL,
  op          text NOT NULL CHECK (op IN ('insert','update','delete','archive')),
  actor       text NOT NULL,                 -- hq.actor GUC
  reason      text,                          -- hq.reason GUC
  recorded_at timestamptz DEFAULT now(),
  "row"       jsonb NOT NULL,                -- full prior row image
  PRIMARY KEY (history_id, recorded_at)      -- partition key must be part of the PK
) PARTITION BY RANGE (recorded_at);          -- monthly partitions
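A minimal sketch of the trigger behind this; the function and trigger names are illustrative, the GUC reads follow the actor convention below, and the `archive` op is assumed to be derived in the real function rather than taken from `TG_OP`:

```sql
CREATE OR REPLACE FUNCTION history_nodes_audit() RETURNS trigger AS $$
BEGIN
  INSERT INTO history_nodes (id, op, actor, reason, "row")
  VALUES (
    coalesce(OLD.id, NEW.id),
    lower(TG_OP),                                      -- 'insert' | 'update' | 'delete'
    coalesce(current_setting('hq.actor', true), 'system:unknown'),
    current_setting('hq.reason', true),
    CASE WHEN TG_OP = 'INSERT' THEN to_jsonb(NEW)      -- no prior row on insert
         ELSE to_jsonb(OLD) END                        -- full prior row otherwise
  );
  RETURN NULL;                                         -- AFTER trigger: return value ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER nodes_audit
AFTER INSERT OR UPDATE OR DELETE ON nodes
FOR EACH ROW EXECUTE FUNCTION history_nodes_audit();
```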
Actor convention:
- `user:wilson` — human Wilson action
- `agent:team:<role>` — Ventures team agent
- `agent:project:<slug>` — project agent
- `worker:<name>` — infrastructure worker
- `system:migration` — schema migrations
- `system:rollback` — autonomy framework auto-rollback
Time-travel query: hq as-of <timestamp> describe <slug> calls public.nodes_as_of(timestamptz) which unions current nodes with history rows matching the window.
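A simplified sketch of what `nodes_as_of` can look like, assuming `updated_at` is maintained on every write; rows created after the timestamp are excluded because their earliest history row is an `insert`:

```sql
CREATE OR REPLACE FUNCTION public.nodes_as_of(ts timestamptz)
RETURNS SETOF nodes AS $$
  -- Current rows untouched since ts ...
  SELECT * FROM nodes
  WHERE created_at <= ts AND updated_at <= ts
  UNION ALL
  -- ... plus, for rows mutated after ts, the prior image stored in the
  -- first history row recorded after ts.
  SELECT (jsonb_populate_record(NULL::nodes, h."row")).*
  FROM (
    SELECT DISTINCT ON (id) op, "row"
    FROM history_nodes
    WHERE recorded_at > ts
    ORDER BY id, recorded_at
  ) h
  WHERE h.op <> 'insert';
$$ LANGUAGE sql STABLE;
```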
15. Materialized projections + gap views
Materialized views (refreshed nightly unless noted):
- `analytics.mv_entity_score` — composite: recency (30%) + revenue (25%) + activity (15%) + stage (10%) + linguistic (10%) + anti_match (10%); a sketch of the composite follows this list
- `analytics.mv_inbox_counts` — unprocessed interactions + tasks per owner/entity/priority
- `analytics.mv_pipeline` — engagements by stage + age + revenue-at-risk + forecast
- `analytics.mv_entity_focus_count` — count with priority='focus' (enforces hard cap of 3 via trigger)
- `analytics.mv_health_per_entity` — temperature, response latency, expectation counts
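A sketch of the composite score's shape; the helper view holding the per-component 0..1 subscores is hypothetical, and only the weights are the locked ones above:

```sql
-- Illustrative shape of analytics.mv_entity_score.
CREATE MATERIALIZED VIEW analytics.mv_entity_score AS
SELECT e.id AS entity_id,
       0.30 * s.recency
     + 0.25 * s.revenue
     + 0.15 * s.activity
     + 0.10 * s.stage
     + 0.10 * s.linguistic
     + 0.10 * s.anti_match AS score       -- sign/handling of anti_match per the locked definition
FROM nodes e
JOIN analytics.entity_subscores s         -- hypothetical helper view, 0..1 per component
  ON s.entity_id = e.id
WHERE e.type = 'entity' AND e.archived_at IS NULL;
```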
Gap views (new in v1.5 — the "what's missing" engine):
- `gap_overdue_replies` — turn_state='ours' AND overdue_at < now()
- `gap_stale_proposals` — engagement at 'proposal' with no activity 14d
- `gap_billing` — engagement at 'contract' ≥7d with no invoice
- `gap_stalled_projects` — project in_progress with no task activity 30d
- `gap_incomplete_specs` — feature in_progress with NULL acceptance_criteria
- `gap_overdue_expectations` — expectation status='open' AND due_at < now()
- `gap_recurring_missed` — recurring_obligation not materialized on schedule
- `gap_unresolved_intents` — intent status='open' past SLA with no fulfilling edge
- `gap_abandoned_tasks` — task in_progress with updated_at < now() - 7d
- `gap_unread_documents` — proposal document sent >14d ago, no read event (via proposal-analytics)
- `gap_quiet_customers` — engagement active but last interaction >30d
- `gap_cost_anomalies` — agent_run cost burst (>2× 7d mean)
~20 deterministic gap views. Each agent queries its domain's gap view first, acts second.
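As an example of the deterministic style, a sketch of the first view; where the turn_state props live and how a turn_state links to its conversation (`part_of`) are assumptions consistent with §12.3:

```sql
-- gap_overdue_replies: conversations where the ball is in our court past SLA.
CREATE OR REPLACE VIEW gap_overdue_replies AS
SELECT ts.id   AS turn_state_id,
       conv.id AS conversation_id,
       now() - (ts.props->>'last_turn_at')::timestamptz AS waiting
FROM nodes ts
JOIN edges pe   ON pe.from_id = ts.id AND pe.type = 'part_of'   -- assumed link to the conversation
JOIN nodes conv ON conv.id = pe.to_id AND conv.type = 'conversation'
WHERE ts.type = 'turn_state'
  AND ts.archived_at IS NULL
  AND ts.props->>'state' = 'ours'
  AND (ts.props->>'overdue_at')::timestamptz < now();
```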
Part IV — Agent Organization
16. Capitão Ventures team — two operational agents (#74)
The team is two always-on operational generalists, not a roster of specialists. They differ only in which side of the system they face.
Capitão Ventures team
├── chief — Outward. Customers, prospects, partners, Wilson.
└── operator — Inward. Data, code, files, finance, KB, the graph.
On-demand subagents (spawned in-process by either top agent)
├── chief spawns: email-drafter, proposal-writer, customer-brief, pipeline-analyst, Explore
└── operator spawns: project-agent:<slug>, code-reviewer, test-engineer, debugger,
database-specialist, kb-ingest, Explore
Infrastructure layer (horizontal, unchanged from v1.7)
├── agent-supervisor — event routing, concurrency cap, RAM-aware spawning
├── agent-watchdog — heartbeat, loop detection, budget, incident reporting
├── meta-watchdog — watches the watchdog
└── memory-tender — weekly memory + workspace reconciliation
16.1 Why two agents (not 10, not 3)
v1.7 specified 10 persistent agents differentiated by domain (account-manager, project-manager, sales-bd, …). v1.8 rejects that model on two grounds:
- Operational coherence beats specialization. A solo founder running 12 projects across 3 ventures needs an agent that knows everything about a thread, not 10 agents that each know one slice. The cookbook's "single agent with rich context" pattern beats the multi-agent split-brain pattern at this scale.
- Always-on presence is the load-bearing feature. Differentiated cron schedules (08:00 account-manager, 08:30 project-manager, …) are anti-presence. Two warm agents that respond in seconds beat ten cold agents that respond in minutes.
The chief / operator split exists for safety isolation: customer-facing speech (chief) runs separately from system-mutating action (operator), so a model regression in one workspace cannot accidentally compromise the other. Both can read the full graph; only operator can mutate it. Both can converse; only chief speaks outward.
16.2 Shared workspace skeleton (#71 amended)
Both agents inherit the same workspace shape:
agents/
├── _shared/
│ ├── CLAUDE.md # mission + 5 invariants + voice rules (≤ 900 tok)
│ ├── ventures-index.md # 3 ventures + 12 projects, 1 line (≤ 600 tok)
│ ├── customers-index.md # 1 line per active engagement (≤ 600 tok)
│ ├── peer-card.md # how to reach the other agent (mailbox API) (≤ 300 tok)
│ ├── glossary.md # node kinds, slugs, conventions (≤ 400 tok)
│ └── opus-triggers.md # the mandatory-Opus list (#74 §2) (≤ 300 tok)
├── chief/
│ ├── agent.md # frontmatter only (name, model, tools, memory, opus_triggers)
│ ├── CLAUDE.md # role, voice, ownership, walk-throughs (≤ 1800 tok)
│ ├── playbook.md # standard operating procedures (≤ 1500 tok)
│ ├── personality.md # tone, style, customer-by-customer notes (≤ 900 tok)
│ └── MEMORY.md # dynamic cache, decay-managed (≤ 1200 tok)
└── operator/
    ├── agent.md
    └── CLAUDE.md, playbook.md, personality.md, MEMORY.md
The Anthropic memory tool (memory_20250818) mounts /memories/ on both agents read-write. The directory tree under /memories/ mirrors agents/_shared/ and agents/<agent>/ exactly, so workspace-as-source and memory-as-runtime stay byte-identical.
Project workspaces live at /memories/projects/<slug>/CLAUDE.md and are loaded on demand by operator when it spawns a project subagent (see §18.9). They are not loaded into either top agent's static context.
Token budget at spawn. chief ≈ 22 K tokens (≈11% of 200 K); operator ≈ 22 K. The pre-commit hook tools/check-agent-budget.py (cl100k tokenizer via tiktoken) enforces the per-file caps above and the per-agent total of 22 K. Hook failure blocks the commit; an --override-budget path requires explicit Wilson approval in the commit trailer.
16.3 Model policy (#74 §2)
Both agents share the same model policy. Default is Sonnet; Opus is mandatory when any of these triggers fire:
| Trigger | Model |
|---|---|
| Multi-step plan with ≥3 sequenced actions | Opus 4.7 |
| Customer-facing artifact (proposal, contract, brief, post-mortem) | Opus 4.7 |
| ADR drafting / decision-ledger entry | Opus 4.7 |
| Morning brief synthesis (chief, 07:00) | Opus 4.7 |
| Daily learning loop (operator, 22:00) | Opus 4.7 |
| Destructive-action 4-class review | Opus 4.7 |
| Routine email reply / task update / file edit | Sonnet 4.6 |
| Single-step lookup, classification, triage | Haiku 4.5 |
Implementation. Each agent's agent.md declares an opus_triggers list. The runtime evaluates triggers in priority order before each turn and overrides the default model per turn via the SDK's query(options={"model": ...}) parameter. Trigger evaluation runs in <50 ms (regex + JSON predicates over the current event batch); cost is negligible.
16.4 Always-on lifecycle
Both agents run as systemd USER units under athena (#73 A3) with Restart=always. The supervisor (#68) holds an in-memory presence table; an agent is warm when its query loop is mid-task or within the 90-second post-task-complete cooldown.
| State | Definition | Latency to first token |
|---|---|---|
| Warm | Query loop alive, prompt cache hot | <200 ms |
| Cooldown | Within 90 s of <task-complete/>, prompt cache hot | <200 ms |
| Cold | systemd active, query loop dormant | ~1.5 s |
| Stopped | systemd inactive (manual or watchdog kill) | ~3 s + restart cost |
The supervisor's RAM-aware scheduler (#68) holds spawns when /proc/meminfo shows <600 MB available; under the 8 GB envelope (#66) this is a rare event because typical resident usage with both agents warm is 1.8–2.2 GB.
16.5 Inter-agent CLI mailbox (two-party, #71 amended)
Inter-agent communication collapses from N-party to two-party:
hq agent ask <peer> "<msg>" # synchronous RPC, blocks for response (default 30 s, configurable)
hq agent send <peer> "<msg>" # async FYI, no wait
hq agent reply <id> "<msg>" # response to an outstanding ask
hq agent inbox # list unread messages
hq agent presence # is the peer warm? returns {warm|cooldown|cold|stopped}
<peer> is exactly one of chief or operator. The supervisor enforces the peer set; unknown peers raise unknown_peer. The mailbox is implemented over ops.agent_inbox + Postgres LISTEN/NOTIFY (#71) — no Redis pub/sub, no external broker.
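A sketch of the plumbing under those commands; the LISTEN/NOTIFY transport and the `ops.agent_inbox` table are per #71, while the exact column set and channel naming here are assumptions:

```sql
-- Illustrative two-party mailbox table.
CREATE TABLE ops.agent_inbox (
  id         bigserial PRIMARY KEY,
  from_agent text NOT NULL CHECK (from_agent IN ('chief', 'operator', 'wilson')),
  to_agent   text NOT NULL CHECK (to_agent   IN ('chief', 'operator')),
  kind       text NOT NULL CHECK (kind IN ('ask', 'send', 'reply')),
  reply_to   bigint REFERENCES ops.agent_inbox(id),
  body       text NOT NULL,
  created_at timestamptz DEFAULT now(),
  read_at    timestamptz
);

-- Wake the supervisor's listener when a message lands.
CREATE OR REPLACE FUNCTION ops.agent_inbox_notify() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('agent_inbox_' || NEW.to_agent, NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER agent_inbox_notify
AFTER INSERT ON ops.agent_inbox
FOR EACH ROW EXECUTE FUNCTION ops.agent_inbox_notify();
```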
Typical handoff. chief receives an email asking for a project status update → hq agent ask operator "current state of guisoft ticketing dashboard?" → operator queries graph, returns a 3-paragraph summary → chief drafts the reply (Sonnet, or Opus if customer artifact threshold) → chief sends. Both halves of the handoff are persisted in ops.agent_actions (#75); the handoff itself is captured as two action rows linked by parent_id.
16.6 Cross-cutting protocols
- Destructive actions route through the friction-floor-zero 4-class gate (#69). Either agent emitting a destructive action calls the `request_approval` custom tool, which writes a `proposal` node and a Telegram nudge to Wilson. Wilson is the gate.
- Cross-scope writes (chief touching graph data; operator drafting customer-visible text) are blocked by per-agent tool registry curation (#70). The blocked action returns a typed error and a hint to ask the peer via `hq agent ask`.
- Action capture is non-negotiable. Every tool call by either agent (or any subagent) writes a row to `ops.agent_actions` per #75 §1; downstream Wilson edits are captured per #75 §2 and folded into the success-examples database per #75 §3–§5.
17. chief — outward-facing operator
Workspace: agents/chief/. Mounted memory directory: /memories/agents/chief/.
17.1 Role and ownership
chief is the one and only outward-facing voice of Capitão Ventures. It owns every artifact a customer, prospect, partner, or Wilson would read.
Owns (full customer-outcome surface, per #74 amended by #77):
- Email outbox — drafting, threading; approval-required by default per #74 §9; manual graduation per scope via `hq agent ungate chief --action=email_send`.
- Prospect pipeline tracking and proposal drafting (Opus on artifact creation).
- Customer artifact lifecycle, end-to-end (#77 §2): drafting → in-flight edits → post-dispatch amendments → archival. Includes proposals, contracts, briefs, post-mortems, status reports, and any other outward-facing document chief produces or revises.
- Multi-source deep-dive context gathering (#77 §1): reads across project folders, the knowledge base, the email archive, drive files, calendar history, meeting transcripts, and external customer resources. Spawns `customer-deep-dive` for any non-trivial multi-source synthesis so chief's main context stays clean.
- Customer-driven project work (#77 §3): when a customer asks for a change scoped inside an existing project (copy fix on the marketing site, status-report regeneration, deliverable tweak), chief spawns `project-agent:<slug>` directly via `hq project run` and reviews the subagent's output before publishing. `code-reviewer` / `test-engineer` subagents are available to chief for verifying customer-driven code changes.
- Morning brief at 07:00 (Opus, plan-mode) — the single Wilson-facing daily summary, including graduation-readiness and capability-utilization signals.
- Customer-success follow-ups, renewal nudges, billing-relationship questions.
- Outbound calendar invitations and meeting prep notes (calendar invites with external attendees route through `request_approval` until graduated).
- The voice rules in `/memories/agents/chief/personality.md` — tone, language defaults (PT-PT for Portuguese contacts, EN for international), per-customer style notes.
Does not own (system-level state — routes to operator):
- Graph entity props, engagement props (chief reads; operator writes).
- Code outside the customer-artifact path allow-list (system code, runtime, `agents/<other>/`, `src/`, `migrations/`) — operator owns; chief asks via `hq agent ask operator`.
- Schema migrations, KB taxonomy changes, finance ledger writes — operator (with `request_approval` for irreversibles).
- KB ingest pipeline runs (operator); chief reads the KB but does not ingest.
- Direct Postgres writes (operator only).
- Direct email send / calendar send while in approval-required mode for the relevant scope.
The clean rule: chief owns customer-facing outcomes; operator owns system-level state. When in doubt, ask operator — the round-trip is cheap.
17.2 Triggers
| Kind | Value |
|---|---|
| Beat | 0 7 * * * (07:00 morning brief, Opus, plan-mode) |
| Outbox topics | email.received, email.thread.updated, entity.temperature_changed, engagement.stage_changed, proposal.draft_requested, customer.churn_signal, wilson.dm, prospect.created, interaction.overdue |
| Wilson inbox | enabled (always) |
| Calendar | webhook on event creation/cancellation if attendees include external contacts |
17.3 Tools (#77 expanded surface)
SDK built-ins: Read, Grep, Glob, Bash (curated allow-list), Agent, WebSearch, WebFetch, Monitor, Edit, Write (NEW per #77 — scoped to customer-artifact paths via runtime path allow-list; writes outside the allow-list raise typed errors and route to operator).
Path allow-list for Edit/Write (enforced by runtime middleware):
- `outputs/proposals/`, `outputs/contracts/` (in-draft only — post-dispatch routes through `request_approval`)
- `outputs/briefs/`, `outputs/post-mortems/`, `outputs/status-reports/`
- `outputs/customer-facing/<customer>/`
- `agents/chief/personality.md` (per-customer voice notes)
- `agents/chief/MEMORY.md` (own dynamic memory)
Capitão registry tools (from src/tools/registry.ts, see §33):
- Base set (every agent): `hq_search`, `hq_describe`, `hq_event_log`, `hq_tools`.
- Outward set: `hq_entity` (read), `hq_engagement` (read+propose), `hq_interaction`, `hq_proposal`, `hq_timeline`.
- Communication set: `gmail_thread_read`, `gmail_search` (full archive, all customers), `gmail_send`, `calendar_read`, `calendar_create_event`, `telegram_send_to_wilson`.
- Deep-dive set (NEW per #77): `kb_search` (read-only KB query), `hq_action_log` (read-only on own trajectories), `hq_trace_show` (read-only).
- Project set (NEW per #77): `hq_project_run` (spawn `project-agent:<slug>` for customer-driven work).
- Learning set: `hq_examples_find` (#75 + #76 — trajectory-aware RAG retrieval).
MCP servers (in-process via create_sdk_mcp_server, #67 amended):
- `gmail-mcp` — wraps the IMAP/SMTP poller as MCP tools so the SDK can stream message bodies without re-authenticating per call. Includes `gmail_search` across the full archive (#77).
- `pipeline-mcp` — exposes the prospect-pipeline view + per-stage transitions as typed MCP tools.
- `drive-mcp` (NEW per #77) — wraps `mcp__claude_ai_Google_Drive__*` for searching and reading drive files in service of customer questions.
- `calendar-mcp` (NEW per #77) — extends calendar reads beyond `calendar_read` to historical event search.
- `kb-mcp` (read-only mode for chief; full mode for operator) — KB queries.
17.4 Subagents allowed (max 2 in-process + 1 project subagent, per #77)
Existing (v1.8):
- `email-drafter` — drafts a single email reply.
- `proposal-writer` — drafts a structured proposal (Opus when invoked).
- `customer-brief` — synthesizes a 1-page customer-context brief.
- `pipeline-analyst` — pipeline-state analysis.
- `Explore` — fast read-only codebase / KB search.
New per #77 (load-bearing for deep work):
- `customer-deep-dive` — multi-source synthesizer. Reads across project folders, KB, email archive, drive files, calendar history, meeting transcripts; returns a structured brief (see #77 §5 for format). Chief invokes for any non-trivial customer question requiring >2 sources.
- `kb-search` — read-only deep KB query subagent (read paths only, no ingest); deeper than `Explore` for KB-specific synthesis.
- `project-agent:<slug>` — project subagent loaded from `/memories/projects/<slug>/CLAUDE.md`. Chief spawns directly via `hq project run` when work is customer-driven (e.g., "fix the typo on the landing page that the prospect noticed"). The project subagent's writes are scoped to its own project; commits to `main` route through `code-reviewer` per #74. One project subagent at a time on top of the 2 in-process cap.
- `code-reviewer`, `test-engineer` — available to chief when chief commissioned a customer-driven change via `project-agent:<slug>`. Used to verify before publishing/PRing.
Subagents are loaded via the Agent tool from agents/chief/subagents/<name>.md.
17.5 Budget
| Control | Value |
|---|---|
| `daily_usd_soft` | $2.50 |
| `daily_usd_hard` | $7.00 |
| `tokens_per_run_cap` | 200 000 |
| `max_turns_per_query` | 30 |
| `opus_turns_per_day_cap` | 12 (alarms at 10) |
17.6 Permissions (#77 expanded)
- Scope: `read-everywhere-relevant`, `write-customer-artifacts`, `draft-outbound`, `spawn-project-subagent-for-customer-work`, `propose-graph-mutations`.
- Read-everywhere (#77 §1): all project folders read-only, KB read-only, full email archive, drive files, calendar history, meeting transcripts, action ledger / own trajectories.
- Direct writes allowed (path allow-list enforced by runtime — see §17.3): customer-artifact files in `outputs/proposals/`, `outputs/contracts/` (in-draft only), `outputs/briefs/`, `outputs/post-mortems/`, `outputs/status-reports/`, `outputs/customer-facing/<customer>/`, plus own workspace files (`agents/chief/personality.md`, `agents/chief/MEMORY.md`). Entity-temperature observations as `event.kind='development'` via the `hq` CLI. Calendar invite drafts (proposed via `request_approval` if attendees are external).
- Spawn rights expanded (#77 §3): chief may spawn `project-agent:<slug>` directly for customer-driven work; chief may spawn `code-reviewer` and `test-engineer` to verify the project subagent's output. System-driven project work still routes through operator.
- Approval-required (default for outbound, #74 §9): all outbound emails (`gmail_send`) and calendar invites with external attendees (`calendar_create_event`) route through `request_approval` until Wilson runs `hq agent ungate chief --action=<email_send|calendar_send> [--scope=…]` for the relevant scope.
- Approval-required (cross-scope, #69): edits to post-dispatch / executed contracts route as proposed amendments via `request_approval`. Customer-driven changes to system code (anything outside the path allow-list) route through operator.
- Direct writes blocked (route through operator): graph entity properties, engagement props (chief proposes), system code paths (`src/`, `agents/<other>/`, `migrations/`, `cmd/`), finance ledgers, KB taxonomy.
- Destructive actions (`mass_email`, `entity_delete`, `engagement_close`, contract dispatch post-execution, etc.): friction-floor-zero 4-class gate (#69), `request_approval` to Wilson, default deny — unaffected by graduation status.
- `hq_actor`: `agent:ventures:chief`. Spawned subagents inherit `parent_actor=agent:ventures:chief`; project subagents spawned by chief use `agent:project:<slug>` with `parent_actor=agent:ventures:chief` (operator-spawned project subagents use `parent_actor=agent:ventures:operator` — the action ledger distinguishes).
17.7 Success metrics
Text-axis (artifact-level, from #75):
- `proposal_acceptance_rate_30d` (target: >70%).
- `email_reply_clean_accept_rate_30approvals` (target: ≥70% of approval-required drafts shipped without Wilson edit). Graduation signal #1 (#74 §9).
- `email_reply_diff_score_p50_30approvals` (target: ≤0.15 — minor tweaks only). Graduation signal #2.
- `email_reply_rejection_count_14d` (target: 0). Graduation signal #3.
- `wilson_pinned_examples_count_per_action_type` (target: ≥5). Graduation signal #4.
Trajectory-axis (procedure-level, from #76) — equally load-bearing:
- `trajectory_clean_rate_30tasks` (target: ≥70% of email_reply tasks have zero `bad` or `missing` annotations; tracked via `ops.action_annotations`).
- `bad_action_count_14d` (target: ≤2; declining trend more important than absolute number).
- `missing_step_count_14d` (target: ≤3; high values mean chief is consistently skipping a context-gathering step — surface the most frequent gap pattern in the morning brief).
- `auto_promoted_with_caveat_share_30d` (informational: share of promotions that needed a corrective; declining over time means chief is internalizing the procedures).
Operational:
- `customer_response_latency_p50_hours` (target: <4 h end-to-end including approval queue when in Tier 1).
- `morning_brief_freshness` (target: 100% delivered by 07:30).
- `approval_queue_age_p50_hours` (if p50 > 4 h, surface an alert in morning brief recommending direct-send graduation for low-stakes scopes).
- `trajectory_capture_completeness` (target: 100% — every email_reply task writes one `email_reply_sessions` row; gap = P1 incident).
Capability-utilization (added per #77 §9):
- `customer_deep_dive_invocation_rate_30d` (target: 30-50% of email replies; too low = chief is missing context, too high = chief is being lazy and outsourcing context-gathering it could do directly).
- `customer_artifact_edit_rate_30d` (number of `outputs/customer-facing/*` writes per week; tracks whether chief is actually using the new write surface).
- `project_subagent_spawn_by_chief_rate_30d` (number of `hq project run` calls by chief vs. by operator; tracks whether customer-driven project work is correctly routed through chief vs. needlessly going through operator).
- `cross_scope_violation_count_14d` (target: 0 — attempted writes outside the path allow-list, caught by runtime middleware; any non-zero is a P2 incident).
17.8 Decision tree — act / spawn / ask (#77 §4)
When an inbound customer ask arrives, chief walks this tree before doing anything else. Plain-language version; the threshold "up to 2 sources" is a heuristic about context-window hygiene, explained immediately below.
Inbound customer ask arrives
│
├── Can chief answer by reading up to 2 small sources directly?
│ ├── Yes → chief reads inline; drafts.
│ └── No → chief spawns `customer-deep-dive` with a focused question;
│ receives a structured brief; drafts on top of it.
│
├── Does the ask require changing an artifact?
│ ├── In-draft customer artifact (proposal/contract/brief still being built)
│ │ → chief edits the file directly via Edit/Write.
│ ├── Post-dispatch contract (already sent to the customer / signed)
│ │ → chief drafts an AMENDMENT (new file) + `request_approval` (cross-scope per #69 — legal binding).
│ └── Customer-driven change inside a project repo (e.g., copy fix on the marketing site)
│ → chief spawns `project-agent:<slug>` via `hq project run`.
│ (If the change is system-driven, not customer-driven, ask operator instead.)
│
└── Is any factual claim about graph state involved?
(project status, scheduling, blockers, invoice state, who-said-what-when)
→ ALWAYS `hq agent ask operator` BEFORE drafting. No exceptions.
Operator owns the graph; chief is not allowed to guess facts.
What counts as a "source". A source is one discrete chunk of context chief has to read to answer. Each of these counts as one: the inbound email thread (always source #1 by default), one entity record in the graph (hq describe entity:<…>), one project state file, one KB article, one contract/proposal file, one drive file, one past meeting transcript, one prior email thread (different from inbound), one external URL the customer linked.
Why the threshold. It's about where the reading happens:
| Sources needed | Inline cost | Subagent cost | Winner |
|---|---|---|---|
| 1 | ~3 s, ~2 K tokens added | ~10 s, ~1 K tokens added | Inline — subagent overhead doesn't pay off |
| 2 | ~6 s, ~5 K tokens added | ~15 s, ~1 K tokens added | Inline — barely; depends on source size |
| 3+ | ~15+ s, ~15-30 K tokens added (pollutes context) | ~25 s, ~1.5 K tokens added | Subagent — keeps chief's context clean for drafting |
The threshold is heuristic, not a hard rule. Chief should err toward the subagent if individual sources are large (long PDFs, multi-page contracts) even at 2 sources, and toward inline if all sources are tiny (a single timeline + 1 KB article = ~500 words total).
Concrete examples — up to 2 sources, read inline: "What time is our meeting tomorrow?" (calendar = 1), "Did João reply about the SLA last week?" (thread + 1 timeline query = 2), "What's our standard response-time SLA?" (1 KB article). Three or more sources, spawn deep-dive: "Can you summarize where we are with Frama overall?" (engagement + last 5 interactions + project state + open tasks + KB = 5+), the contract clause example from §17.1 (contract + 2 amendment precedents + KB compliance article + prior threads + operator check = 5+), "What did we promise the customer in the kickoff vs. what's in the contract?" (transcript + contract + proposal + RFP = 4).
17.9 Workspace files
agents/chief/CLAUDE.md — role, voice, ownership, the decision tree (§17.8 mirrored), the operational walk-throughs (deep-dive synthesis, contract amendment, customer-driven project work, morning brief, escalation, proposal drafting), the hq examples find usage rules, the destructive-action gate language.
agents/chief/playbook.md — standard procedures: how to triage an inbound email, how to draft a proposal, how to draft a morning brief, how to handle a customer escalation, how to handle a missed deadline.
agents/chief/personality.md — voice rules, language defaults, per-customer style notes (Frama formal+brief, PetVitaClub warm+chatty, Garq technical+precise, …).
agents/chief/MEMORY.md — dynamic cache, decay-managed by memory-tender.
agents/chief/subagents/customer-deep-dive.md, agents/chief/subagents/kb-search.md — subagent definitions per #77 §3.
18. operator — inward-facing operator
Workspace: agents/operator/. Mounted memory directory: /memories/agents/operator/.
18.1 Role and ownership
operator is the one and only system-mutating actor in Capitão Ventures. It owns the graph, the codebase, the file system, the KB, and the action ledger itself.
Owns:
- Graph data: producer/event ingestion, entity reconciliation, edge maintenance, gap-view triage.
- Code execution: spawning project subagents per `/memories/projects/<slug>/`, running test/lint/build pipelines, committing code via `code-reviewer` + `test-engineer` subagent collaboration.
- File operations: workspace structure, KB taxonomy, raw-source ingestion, archival.
- Finance operations: invoice reconciliation against TOConline (read-only source of truth), expense categorization, revenue variance flags.
- Daily learning loop at 22:00 (Opus, plan-mode) — the success-examples auto-promotion pipeline (#75 §3).
- Action ledger maintenance: nightly auto-promotion job, `wilson_pinned` example mirror generation, ledger health checks.
- Schema evolution: ADR drafting (Opus mandatory) and migration authoring (delegated to `database-specialist` subagent).
Does not own:
- Outbound communication (delegates to chief via `hq agent ask`).
- Customer-visible artifacts (delegates to chief).
- Wilson-facing summaries (those are chief's morning brief).
18.2 Triggers
| Kind | Value |
|---|---|
| Beat | 0 22 * * * (22:00 daily learning loop, Opus, plan-mode); 0 3 * * * (03:00 nightly graph + ledger reconciliation) |
| Outbox topics | producer.unmapped, event.unclassified, task.assigned_to.operator, task.assigned_to.project:*, proposal.kind=schema_change, proposal.kind=migration, feature.status_changed.*, kb.ingest.completed, finance.anomaly, agent_incident.created, wilson.dm |
| Wilson inbox | enabled (always) |
| File watchers | ~/capitao-knowledge-base/raw/, ~/capitao-command-center/proposals/ (newly emitted proposals from chief) |
18.3 Tools
SDK built-ins: Read, Write, Edit, Bash (curated allow-list with broader scope than chief), Glob, Grep, Agent, Monitor, NotebookEdit.
Capitão registry tools (from src/tools/registry.ts, see §33):
- Base set: `hq_search`, `hq_describe`, `hq_event_log`, `hq_tools`.
- Inward set: `hq_entity` (read+write), `hq_engagement` (read+write), `hq_task` (read+write), `hq_proposal` (read+write), `hq_timeline` (read+write).
- System set: `worker_run`, `migration_plan`, `migration_apply`, `kb_ingest_run`, `graph_reconcile`, `finance_import`.
- Ledger set: `hq_action_log` (admin queries on `ops.agent_actions`), `hq_edit_log` (admin queries on `ops.wilson_edits`), `hq_examples_promote` (auto-promotion driver), `hq_examples_pin` (manual curation).
MCP servers (in-process):
- `graph-mcp` — exposes the bitemporal graph as typed MCP tools (faster than CLI for complex traversals).
- `kb-mcp` — exposes KB ingest pipeline + lint queries.
- `finance-mcp` — wraps the TOConline read-only API.
18.4 Subagents allowed (max 2 concurrent + 1 project subagent)
project-agent:<slug> (one at a time per project; loaded on demand from /memories/projects/<slug>/CLAUDE.md), code-reviewer, test-engineer, debugger, database-specialist, kb-ingest, Explore.
18.5 Budget
| Control | Value |
|---|---|
| `daily_usd_soft` | $4.00 |
| `daily_usd_hard` | $10.00 |
| `tokens_per_run_cap` | 250 000 |
| `max_turns_per_query` | 50 |
| `opus_turns_per_day_cap` | 18 (alarms at 14) |
18.6 Permissions
- Scope: `read-graph`, `write-graph`, `write-fs`, `write-code`, `propose-schema`.
- Direct writes allowed: every internal node kind under #69's friction-floor-zero rule, every file under `/home/athena/capitao-*` and `/var/cache/capitao/`, code commits via the `code-reviewer` gate.
- Direct writes blocked: outbound email (chief), customer-artifact files in `outputs/customer-facing/` (chief).
- Destructive actions (`schema_drop`, `mass_delete`, `force_push`, `migration_irreversible`, `secret_export`): friction-floor-zero 4-class gate (#69), `request_approval` to Wilson, default deny.
- `hq_actor`: `agent:ventures:operator`. Spawned project subagents use `agent:project:<slug>` with `parent_actor=agent:ventures:operator`.
18.7 Success metrics
- `task_completion_rate_30d` (target: >85% of operator-driven tasks closed within SLA).
- `code_review_pass_rate_first_attempt` (target: >70%).
- `kb_ingest_freshness` (target: <24h lag from raw drop to wiki article).
- `graph_gap_resolution_p50_hours` (target: <12h for `gap_unresolved_intents`).
- `success_example_promotion_rate_30d` (target: >60% of new actions auto-promoted within 7 days).
- `learning_loop_completion` (target: 100% — 22:00 loop runs every day).
18.8 Workspace files
agents/operator/CLAUDE.md — role, ownership, the six operational walk-throughs (data-organization focus), the destructive-action gate, the action-ledger discipline.
agents/operator/playbook.md — standard procedures: how to ingest a new producer, how to reconcile entities, how to spawn a project subagent, how to run the 22:00 learning loop, how to draft an ADR.
agents/operator/personality.md — voice rules for internal artifacts (terse, citation-heavy, structured); how to write commit messages; ADR rhetoric.
agents/operator/MEMORY.md — dynamic cache, decay-managed.
18.9 Project subagents — on-demand (#74 amends #53)
Project agents are no longer persistent. They are loaded on demand by operator via the SDK Agent tool, with a system prompt assembled from three markdown files at runtime.
18.9.1 Lifecycle
For every active project node with priority IN ('focus', 'now'):
- A workspace exists at `/memories/projects/<slug>/` containing `CLAUDE.md`, `customer-profile.md`, `domain-knowledge.md`, `playbook.md`, `personality.md`, `MEMORY.md` (dynamic).
- A producer node is registered: `cc:agent:project:<slug>` (created idempotently when the workspace is first loaded).
- No systemd unit. No cron. No warm window.
When operator needs to act on a project, it calls:
hq project run <slug> "<task description>"
Sugar for Agent(subagent_type="project:<slug>", prompt="<task description>"). The Agent tool reads the workspace, composes the system prompt (/memories/_shared/CLAUDE.md + /memories/projects/<slug>/CLAUDE.md + /memories/projects/<slug>/MEMORY.md), runs to <task-complete/>, and exits.
Cold-start cost: ~2 seconds (no warm window). RAM peak: ~600 MB while running, freed on exit.
18.9.2 Workspace template
/memories/projects/_template/CLAUDE.md:
---
name: project:{{slug}}
description: Operator for project {{title}}. Reads assigned tasks, executes, reports, flags blockers.
parent: agent:ventures:operator
---
# Project agent — {{title}}
## Scope
This project only. {{description_md}}.
**MAY:** read/write own project repo (via worktrees), create/update tasks and features within this project, propose milestones/developments/actions as events, spawn Explore/code-reviewer/test-engineer/debugger subagents, flag blockers.
**MAY NOT:** touch other projects, write to entity nodes, sign off deliverables, commit to main without code-reviewer pass.
## Model policy
Inherits #74 §2 — Sonnet default; Opus on the mandatory triggers.
## Tools
Inherits operator's inward set + per-project additions per `customer-profile.md` declared `project_type`.
18.9.3 Initial roster (Wave 3)
Based on 2026-04-22 priorities (unchanged from v1.7):
Focus (hard cap 3):
- `project:capitao-command-center` (dogfood — operates its own codebase).
- `project:garq-pdm-consulta` (customer, contract stage).
- `project:frama-b2b-maintenance` (customer, delivery).
Now (~6):
- `project:popdigit-tourism` (partner).
- `project:beepenger-budget` (proposal).
- `project:gopecauto-officegest` (proposal).
- `project:capitao-consulting-site` (own marketing).
- `project:membriko` (own product).
- `project:ghostpost` (own product).
Other projects (arisilvahelenos, ferroembrasa, guisoft, safaa, personal, ti-milha) keep workspaces at /memories/projects/<slug>/ but are not loaded by operator until a triggering event arrives.
19. Infrastructure agents
19.1 agent-supervisor (#68 amended by #74)
- Language: Go (tiny RSS, fast startup)
- Always-on: yes (systemd `Restart=always`)
- Routing table (v1.8): `{chief, operator}`. Project subagents are spawned in-process via the SDK `Agent` tool by operator (§18.9); the supervisor does not route to them directly.
- Responsibilities:
  - Hold Postgres LISTEN/NOTIFY subscriptions for every outbox topic chief or operator cares about (see §17.2 and §18.2 trigger lists).
  - Watch `ops.agent_inbox` for Wilson DMs and inter-agent mailbox traffic between chief ↔ operator (§16.5).
  - On notify event: read the two-row routing table, deliver the event to the warm agent's query loop (or wake from cooldown/cold).
  - Enforce concurrency cap = 2 top-level agents running + 2 in-process subagents per top agent simultaneously (4 total query loops max).
  - Event coalescing: ≥3 events for the same agent within a 2 s window become one wake with the batch.
  - RAM-aware scheduling: read `/proc/meminfo` before waking a cold agent; if available <600 MB, hold the wake and retry every 5 s.
  - Emit Prom metrics: `supervisor_agents_warm{agent}`, `supervisor_events_queued{agent}`, `supervisor_wakes_total{agent}`, `supervisor_ram_aware_holds_total`, `supervisor_subagent_spawns_total{parent,subagent}`.
19.2 agent-watchdog (#64)
- Language: Python (SQL + Prom-scrape heavy)
- Always-on: yes
- Check cadence: every 30 seconds
- Checks: heartbeat (TTL gauge), activity timeout (no tool calls 10 min during open query), loop detection (same tool sig >4×), budget consumed, error rate, Stop integrity (no task-complete but max_turns hit), memory pool sanity, subagent leak, RSS drift
- Actions by severity: soft (DM agent to check in), medium (force-stop query + fresh session), hard (systemd restart), critical (quarantine + review)
- Writes: `agent_incident` nodes + `ops.agent_error_log`
19.3 meta-watchdog
- Language: Bash + systemd timer
- Cadence: every 60 seconds
- Check: is `agent-watchdog.service` running and healthy?
- Action: systemd restart + Telegram page Wilson if failed
19.4 memory-tender
- Cadence: weekly (Sunday 04:00)
- Duties: walk each agent's memory, archive decayed entries, merge near-duplicates, flag contradictions
- Cost: ~$0.05/week on Haiku
Part V — Runtime & Infrastructure
20. Agent runtime
20.1 AgentRuntime class — shape and responsibilities
Python module at src/runtime/agent_runtime.py (~500 LOC total). One file; all agents share it. Per-agent behavior comes from markdown config + prompt, not from code.
class AgentRuntime:
    """
    Runs one agent for one trigger batch. Exits after <task-complete/>.
    Reloaded per spawn by the supervisor.
    """

    def __init__(self, role: str, config_path: str, events_stdin: list[dict]):
        self.role = role
        self.config = MarkdownConfigParser(config_path).parse()  # §17.3 format
        self.events = events_stdin
        self.session_id = None
        self.store = PostgresSessionStore(os.environ["HQ_DB_URL"])
        self.budget = CostBudget.from_config(self.config.budget)

    async def run_once(self) -> int:
        """Entrypoint. Returns exit code (0=ok, 1=budget, 2=error, 3=watchdog-killed)."""
        await self.store.connect()
        self.session_id = await self.store.get_or_create_session(self.role)
        if self.store.is_fresh_session(self.session_id):
            reconstruction = await self._build_reconstruction_header()
        else:
            reconstruction = ""  # resumed session still has context
        prompt = reconstruction + self._render_event_batch(self.events)
        try:
            async for message in query(
                prompt=prompt,
                options=self._build_sdk_options()
            ):
                await self._on_message(message)
                if self._detect_task_complete(message):
                    await self._finalize_session(message)
                    return 0
        except BudgetExceeded:
            await self._freeze_self()
            return 1
        except KeyboardInterrupt:  # SIGTERM from watchdog
            await self._partial_finalize()
            return 3
        return 2  # fell off without task-complete
Full implementation spec is a Wave 1 artifact (§49).
20.2 PostgresSessionStore — see ADR #72
Canonical schema and adapter live in DECISIONS.md #72 (Anthropic PostgresSessionStore reference port; storage table ops.agent_sessions(id BIGSERIAL, key TEXT, entries JSONB, created_at TIMESTAMPTZ DEFAULT now()) indexed on (key, id); CI conformance gate via claude_agent_sdk.testing.run_session_store_conformance(...); local-disk primary at /var/cache/capitao/sessions, Postgres mirror async + best-effort; cold-start restore order pg_restore → disk_restore → fresh). The earlier hand-rolled schema in this section was superseded by #72 in v1.7 and removed in v1.8.
20.3 Markdown config parser
~80 lines of Python. Reads the agent's .md file; extracts:
- Frontmatter (`name`, `description`)
- `## Model policy` table → dict of condition → model
- `## Triggers` table → {beat, outbox_topics, wilson_inbox}
- `## Tools` list
- `## Subagents allowed` list
- `## Budget` table
- `## Permissions` table
- `## Session lifecycle` narrative (parsed minimally; code has defaults)
- `## Prompt` reference → loads prompt file verbatim
21. Task-complete lifecycle (#62)
21.1 The sentinel
Every agent prompt contains:
When you have truly finished your current unit of work AND are not waiting on any tool result, subagent, review decision, Wilson input, or other agent's proposal — emit on its own line:
<task-complete reason="..."/>
Only emit when truly done. If waiting for anything, stay in the turn.
21.2 Stop hook
async def _detect_task_complete(self, message) -> bool:
    """Scan final message for the sentinel."""
    for block in getattr(message, "content", []):
        if getattr(block, "type", None) == "text":
            if "<task-complete" in block.text:
                self.task_complete_reason = self._extract_reason(block.text)
                return True
    return False

async def _finalize_session(self, final_message):
    # 1. Ask SDK for a compact memory snapshot turn
    snapshot = await self._request_memory_snapshot()
    # 2. Write agent_session node
    await self.db.execute("INSERT INTO nodes (type, props, ...) VALUES ('agent_session', ...)")
    # 3. Parse snapshot into memory_entry nodes
    await self._persist_memory_entries(snapshot)
    # 4. Write agent_handover_memo
    await self._write_handover_memo(snapshot, final_message)
    # 5. Archive transcript
    await self.store.archive(self.session_id)
    # 6. Emit outbox event
    await self.db.execute("INSERT INTO ops.outbox (topic, payload) VALUES ('agent.session_closed', $1)", ...)
    # 7. Process exits (caller returns from run_once with 0)
21.3 Warm window (chief + operator only)
Both top-level agents have a 90-second warm window post-task-complete. Project subagents and on-demand worker subagents skip the warm window (full process exit per task).
if self.config.warm_window_seconds > 0:
    await self._wait_for_event_or_timeout(self.config.warm_window_seconds)
    if new_event_arrived:
        # Fresh session starts in same process
        self.session_id = None
        await self.run_once()  # recurse with new events
    else:
        return 0  # exit process
Saves cold-start overhead during bursts. Default warm_window_seconds = 90 for chief and operator; 0 for everyone else.
21.4 Fallback lifecycles
| Fallback | Trigger | Purpose |
|---|---|---|
| Auto-compaction | Context > 75% of window | In-place compaction; keep current task intact |
| Nightly rotation | 03:00 local, still-live sessions | Forced clean rollover with handover memo |
| Budget-cap rotation | Hard cap hit | Freeze + fresh session after un-freeze |
| Crash recovery | systemd restart | Resume from PostgresSessionStore |
22. The agent-supervisor process
22.1 Implementation sketch
// cmd/agent-supervisor/main.go (~200 LOC)
package main

import (
    "github.com/lib/pq"
    ...
)

type Supervisor struct {
    db          *sql.DB
    routing     map[string]AgentRoute // topic -> agent role
    running     map[string]*exec.Cmd  // role -> process
    concurrency int                   // cap = 3
    mu          sync.Mutex
}

func (s *Supervisor) Listen() {
    listener := pq.NewListener(dsn, ...)
    for _, topic := range s.subscribedTopics() {
        listener.Listen(topic)
    }
    for notif := range listener.Notify {
        events := s.coalesce(notif) // batch same-role events within 2s
        s.trySpawn(events)
    }
}

func (s *Supervisor) trySpawn(events []Event) {
    s.mu.Lock()
    defer s.mu.Unlock()
    role := events[0].Role
    if _, alreadyRunning := s.running[role]; alreadyRunning {
        s.enqueue(events) // buffer; dispatched when current finishes
        return
    }
    if len(s.running) >= s.concurrency {
        s.enqueue(events)
        return
    }
    if !s.ramAvailable(600_000_000) { // 600 MB free required
        s.enqueue(events)
        return
    }
    cmd := exec.Command("hq", "agent", "run", role)
    cmd.Stdin = strings.NewReader(eventsJSON(events))
    cmd.Start()
    s.running[role] = cmd
    go func() {
        cmd.Wait()
        s.mu.Lock()
        delete(s.running, role)
        s.drainBuffer() // dispatch queued events if slots free
        s.mu.Unlock()
    }()
}
Full implementation is Wave 1 artifact (§49.5).
23. Service matrix
23.1 Always-on services
| Service | Language | RAM steady | CPU | Purpose |
|---|---|---|---|---|
| postgresql | C | 700-1000 MB | burst | Graph store, queue, analytics |
| valkey | C | 60-100 MB | low | Cache, pub/sub, rate-limiter |
| pgbouncer | C | 5-10 MB | low | Connection pooling :6432 |
| agent-supervisor | Go | 20-30 MB | low | Event routing, concurrency |
| agent-watchdog | Python | 50-70 MB | low | Health checks |
| prometheus | Go | 100-150 MB | low | Metrics |
| next.js | Node | 180-220 MB | burst | Admin UI |
| caddy | Go | 20-30 MB | low | Reverse proxy, TLS, service wake-up |
| node_exporter | Go | 15-20 MB | low | OS metrics |
| postgres_exporter | Go | 20-30 MB | low | Postgres metrics |
| hq-exporter | Node | 35-45 MB | low | Custom metrics |
| ubuntu + systemd | — | ~300 MB | — | OS base |
Always-on total: ~1.5-1.9 GB.
23.2 On-demand services
| Service | Trigger | RAM when active | Auto-shutdown |
|---|---|---|---|
| whisper-stt | meeting-transcribe triggers | ~1.5-2 GB | on completion |
| grafana | First /grafana/* request via Caddy | ~130 MB | 10 min idle |
| next.js admin (if idle-tuned further) | First HTTP request | ~180 MB | configurable |
Voyage 3.5-lite (ADR #82) is SaaS, not on-demand local — no RAM cost, no socket activation. Reached over HTTPS by embed-worker.
On-demand services consume 0 MB of RAM when idle.
23.3 Always-on agents (chief + operator with 90 s warm window)
| Agent | RAM during warm window | RAM during active query |
|---|---|---|
| chief | ~280-400 MB (Python + SDK + workspace context) | ~500-700 MB (with 1 subagent active) |
| operator | ~280-400 MB | ~600-850 MB (with project subagent or code-reviewer subagent active) |
| agent-watchdog | ~50-70 MB | ~70-100 MB (during SQL-heavy checks) |
| agent-supervisor | ~25-35 MB (Go binary) | same |
| ledger-flusher | ~30-50 MB | ~60-90 MB (during batch flush) |
23.4 Ephemeral subagents (spawn-run-exit)
| Subagent | RAM while running | Duration typical |
|---|---|---|
| Single in-process subagent (Explore, code-reviewer, email-drafter, …) | ~150-280 MB on top of parent | 10 s - 3 min |
| Two concurrent subagents (cap) | ~300-560 MB on top of parent | rare; heavy analysis |
| Project subagent with code tasks | ~400-600 MB on top of operator | minutes |
| Worker subagent (kb-ingest, finance-import) | ~80-180 MB on top of operator | seconds to minutes |
24. Service tuning — locked day-0 flags
24.1 Postgres 17 (/etc/postgresql/17/main/postgresql.conf)
shared_buffers = 256MB
effective_cache_size = 2GB
work_mem = 8MB
maintenance_work_mem = 64MB
max_connections = 30
wal_buffers = 16MB
random_page_cost = 1.1
track_io_timing = on
jit = off
max_parallel_workers_per_gather = 2
Expected RSS: 700-1000 MB steady.
24.2 Valkey (/etc/valkey/valkey.conf)
maxmemory 96mb
maxmemory-policy allkeys-lru
save ""
appendonly no
tcp-keepalive 60
Expected RSS: 60-100 MB.
24.3 Prometheus flags
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=800MB
--query.max-samples=5000000
# scrape_interval: 30s, configured in prometheus.yml (global block) rather than as a CLI flag
24.4 Next.js
NODE_OPTIONS="--max-old-space-size=200 --no-warnings"
NEXT_TELEMETRY_DISABLED=1
24.5 Caddy site blocks (/etc/caddy/Caddyfile)
Grafana wake-up handler (socket-activated; cold start on first request):
grafana.internal.capitao {
@first_visit not header Cookie *grafana_session*
handle @first_visit {
exec systemctl start grafana.service
respond "Starting Grafana, refresh in 2s..." 202
}
reverse_proxy localhost:3000
}
Command Center UI (subdomain, #78). Day-0 binds to Tailscale; F27 lock at Wave 2 may swap bind tailscale0 for a public posture (OAuth or IP allow-list):
command-center.capitao.consulting {
bind tailscale0 # Wave 1: Tailscale-only. Removed on F27 lock.
encode gzip zstd
log {
output file /var/log/caddy/command-center.log
format json
}
@md query format=md
handle @md {
header Content-Type "text/markdown; charset=utf-8"
reverse_proxy 127.0.0.1:3001
}
reverse_proxy 127.0.0.1:3001
}
The @md matcher implements #25's ?format=md symmetry: HTML and markdown share one upstream (the Next.js process) and the route handler decides which view to render. Drift detection: curl …/roadmap?format=md byte-equals state/roadmap.md (modulo whitespace).
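A minimal sketch of that drift check in Python: fetch the rendered markdown view, read the mirrored file, and compare after collapsing whitespace. The URL, file path, and use of `requests` are illustrative assumptions; the real check may live in a cron job or CI.

```python
"""Drift check sketch: the ?format=md view must match state/roadmap.md
modulo whitespace. URL and file path are illustrative assumptions."""
import re
from pathlib import Path

import requests  # assumption: available where the check runs

ROADMAP_URL = "https://command-center.capitao.consulting/roadmap?format=md"  # assumed route
ROADMAP_FILE = Path("state/roadmap.md")

def normalize(text: str) -> str:
    # Collapse all whitespace runs so only content differences remain.
    return re.sub(r"\s+", " ", text).strip()

def roadmap_drifted() -> bool:
    rendered = requests.get(ROADMAP_URL, timeout=10).text
    mirrored = ROADMAP_FILE.read_text(encoding="utf-8")
    return normalize(rendered) != normalize(mirrored)

if __name__ == "__main__":
    print("drift detected" if roadmap_drifted() else "in sync")
```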
24.6 systemd socket activation for Whisper
(TEI socket activation removed per ADR #82 — embedding inference moved off-host to Voyage 3.5-lite. Whisper stays local.)
# /etc/systemd/system/whisper-stt.socket
[Socket]
ListenStream=127.0.0.1:8210
[Install]
WantedBy=sockets.target
# /etc/systemd/system/whisper-stt.service
[Service]
ExecStart=/usr/local/bin/whisper-start-wrapper
EnvironmentFile=/etc/capitao/secrets.env
The wrapper script starts Whisper, keeps it alive through 5 minutes of idle, then stops it. VOYAGE_API_KEY lives in the same secrets.env (mode 0600, athena:athena) and is loaded by embed-worker via EnvironmentFile= in its own systemd unit.
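A hedged sketch of what such a wrapper could look like, under the stated 5-minute idle rule. The `whisper-server` command, the psutil-based idle detection, and the 10-second poll are assumptions, not the deployed script.

```python
"""Sketch of whisper-start-wrapper: launch the STT server, stop it again
after 5 minutes with no active requests. Command name, psutil idle
detection, and the poll interval are illustrative assumptions."""
import subprocess
import time

import psutil  # assumption: available on the host

PORT = 8210
IDLE_LIMIT_S = 5 * 60

def busy(port: int) -> bool:
    # Any ESTABLISHED TCP connection on the Whisper port counts as activity.
    return any(
        c.laddr and c.laddr.port == port and c.status == "ESTABLISHED"
        for c in psutil.net_connections(kind="tcp")
    )

def main() -> None:
    server = subprocess.Popen(["whisper-server", "--port", str(PORT)])  # hypothetical command
    last_busy = time.monotonic()
    try:
        while time.monotonic() - last_busy < IDLE_LIMIT_S:
            time.sleep(10)
            if busy(PORT):
                last_busy = time.monotonic()
    finally:
        server.terminate()  # 5 min idle reached: stop Whisper, RSS drops to 0
        server.wait(timeout=30)

if __name__ == "__main__":
    main()
```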
24.7 cgroup limits per agent
/etc/systemd/system/capitao-agent@.service.d/limits.conf:
[Service]
MemoryMax=900M
MemorySwapMax=400M
CPUQuota=200%
Protects the box from a runaway agent.
25. RAM budget — 8 GB envelope (#66)
25.1 Realistic usage over time
| Scenario | RAM | % of 8 GB |
|---|---|---|
| Overnight (supervisor + watchdogs only) | ~1.6 GB | 20% |
| Normal business hours | ~2.0-2.8 GB | 25-35% |
| Busy afternoon (2 agents concurrent) | ~3.0-3.5 GB | 38-44% |
| 3 agents + 1 subagent each (realistic peak) | ~3.5-4.0 GB | 44-50% |
| Ceiling: 3 agents × 2 subagents + Grafana | ~4.4-4.7 GB | 55-59% |
| + Whisper transcribing (brief overshoot allowed) | ~7.0-7.5 GB | 88-94% |
Headroom at typical load: 4-5 GB free for Postgres page cache, burst absorption, Grafana sessions. Page cache keeps search queries <40ms p95.
25.2 Supervisor RAM-aware rules
if available_ram < 600 MB: hold new agent spawns; queue events
if available_ram < 400 MB: drop concurrency cap to 1 until memory frees
if swap_used > 500 MB sustained: alert Telegram + pause non-essential agents
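The production supervisor implementing these rules is the Go process from §22; the Python sketch below only illustrates the thresholds against /proc/meminfo and is not the deployed logic.

```python
"""Illustration of the §25.2 RAM rules. The real supervisor is the Go
process from §22; the returned dict stands in for its state changes."""

def read_meminfo_kb(field: str) -> int:
    # /proc/meminfo lines look like "MemAvailable:  4123456 kB".
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def ram_rules() -> dict:
    available_mb = read_meminfo_kb("MemAvailable") // 1024
    swap_used_mb = (read_meminfo_kb("SwapTotal") - read_meminfo_kb("SwapFree")) // 1024
    return {
        "hold_spawns": available_mb < 600,          # queue events instead of spawning
        "concurrency_cap": 1 if available_mb < 400 else 3,
        "swap_alert": swap_used_mb > 500,           # sustained check + Telegram left out
    }

if __name__ == "__main__":
    print(ram_rules())
```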
26. Authentication — Max OAuth (#57)
26.1 Setup (one-time per 12 months)
# On the VPS, as capitao user:
claude setup-token
# Result: prints 1-year OAuth token.
# Store in /etc/capitao/agents.env (chmod 600, owner capitao:capitao):
# CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...
26.2 systemd unit drop-in
[Service]
EnvironmentFile=/etc/capitao/agents.env
User=capitao
Applied to every agent service.
26.3 Token rotation watcher
A weekly cron job decodes the JWT, checks expires_at. If < 30 days remain, opens a review node asking Wilson to re-run claude setup-token. No silent expiry.
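A sketch of that watcher, assuming the token is JWT-shaped with a standard `exp` claim as the section implies; the review-node write is reduced to a print.

```python
"""Token-rotation watcher sketch (weekly cron). Assumes the OAuth token is a
JWT whose payload carries an `exp` claim; if fewer than 30 days remain, the
real watcher opens a review node asking Wilson to re-run `claude setup-token`."""
import base64
import json
import os
import time

def jwt_expiry(token: str) -> float:
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(payload["exp"])

def main() -> None:
    token = os.environ["CLAUDE_CODE_OAUTH_TOKEN"]
    days_left = (jwt_expiry(token) - time.time()) / 86400
    if days_left < 30:
        print(f"OAuth token expires in {days_left:.0f} days, rotation needed")

if __name__ == "__main__":
    main()
```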
26.4 License compliance
Max OAuth is authorized by Anthropic for local development and personal automation. Capitão Command Center operates Capitão Ventures internally; it is not resold. Authorized use.
If Command Center ever becomes a SaaS product, switch to API-key authentication (ANTHROPIC_API_KEY). No code changes needed — SDK auto-detects.
27. Rate-limit and cost control
27.1 Plan-level limits
Max 20× plan: 5-hour rolling windows. With 2 always-on top agents + on-demand subagents (typically 1-2 active at a time during business hours), typical spend stays under 30% of plan cap. Bursts during customer incidents or heavy code work can hit 70%+.
27.2 Mitigations (built into runtime)
- Event-driven, not cron-driven. Both top agents wake on outbox events; the only fixed cron beats are 07:00 (chief brief) and 22:00 (operator loop). Burn rate scales with workload, not with the clock.
- Supervisor concurrency cap (§19.1) — 2 top agents + max 2 in-process subagents per top agent = 4 query loops total.
- Exponential backoff on 429 — Valkey-shared rate-limiter coordinates across both agents.
- Cost-aware demotion — if 7-day moving avg trends toward plan cap, the per-turn model picker demotes routine Sonnet turns to Haiku; Opus triggers (#74 §2) remain mandatory and are never demoted.
- Circuit breaker at 90% of plan — pause operator's 22:00 learning loop and any non-emergency project subagents; keep chief live for customer-facing work.
- Per-agent daily hard caps — each agent freezes itself at its own cap and opens a review via request_approval.
- hq autonomy freeze --reason "..." — manual emergency stop.
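A minimal sketch of the shared backoff from the 429 bullet above: both agents consult one Valkey key before calling the API and push it forward on every 429. It assumes the redis-py client (Valkey speaks the same protocol) and an illustrative key name; the production limiter may track per-model windows instead.

```python
"""Valkey-shared 429 backoff sketch. One key holds the back-off horizon so
chief and operator respect the same pause. Key name is an assumption."""
import time

import redis

valkey = redis.Redis(host="127.0.0.1", port=6379)
BACKOFF_KEY = "llm:backoff_until"  # assumed key name

def wait_for_shared_backoff() -> None:
    until = float(valkey.get(BACKOFF_KEY) or 0)
    if until > time.time():
        time.sleep(until - time.time())

def record_429(attempt: int) -> None:
    # Exponential backoff: 2, 4, 8, ... seconds, capped at 5 minutes.
    delay = min(2 ** attempt, 300)
    valkey.set(BACKOFF_KEY, time.time() + delay, ex=int(delay) + 1)
```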
27.3 Cost telemetry
Every LLM call writes to ops.llm_call_log:
CREATE TABLE ops.llm_call_log (
id bigserial PRIMARY KEY,
agent_id text NOT NULL,
session_id uuid,
trace_id uuid REFERENCES nodes(id),
model text NOT NULL,
tokens_in int,
tokens_out int,
cost_usd numeric(8,4),
latency_ms int,
canary_id text,
purpose text,
started_at timestamptz DEFAULT now()
);
CREATE INDEX ON ops.llm_call_log (agent_id, started_at DESC);
Grafana panel per-agent-cost-24h + Prom gauge agent_run_cost_usd_24h{agent="..."}.
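A sketch of the per-call write, assuming psycopg and the column names from the DDL above; the real hook may batch rows or reuse the Valkey-stream path of the action ledger (§39.1).

```python
"""Per-call telemetry write into ops.llm_call_log (DDL above).
Assumes a synchronous psycopg connection; batching omitted."""
import psycopg

INSERT_SQL = """
INSERT INTO ops.llm_call_log
    (agent_id, session_id, model, tokens_in, tokens_out,
     cost_usd, latency_ms, purpose)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
"""

def log_llm_call(conn: psycopg.Connection, agent_id: str, session_id: str,
                 model: str, tokens_in: int, tokens_out: int,
                 cost_usd: float, latency_ms: int, purpose: str) -> None:
    with conn.cursor() as cur:
        cur.execute(INSERT_SQL, (agent_id, session_id, model, tokens_in,
                                 tokens_out, cost_usd, latency_ms, purpose))
    conn.commit()
```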
Part VI — Tools, Skills, and Surfaces
28. The hq CLI — canonical action surface
Form: hq <noun> <verb> [--filters] [--json | --text]
Exit codes: 0 ok / 1 user-error / 2 system-error / 3 not-found.
Universal flags: --json, --text (default), --actor='<string>', --reason='<string>'.
28.1 Reads (safe)
hq search <query>
hq describe <slug|uuid>
hq entity find --email|--phone
hq entity profile <slug>
hq engagement list --stage <stage>
hq project list --priority <focus|now|next|backlog>
hq task list --owner --priority
hq interaction list --entity --limit
hq timeline [--since] [--entity]
hq review list | show <id>
hq expectation list [--status] [--direction]
hq intent list [--kind] [--status]
hq gap list # all gap_* views
hq gap show <gap_name>
hq as-of <timestamp> describe <slug>
hq memory search --role <role> --query <terms>
hq trace {show,inputs,decided,replay,explain} <trace-id>
hq producer health [<slug>]
hq playbook {list,show}
hq proposal {list,show} --kind <kind>
hq autonomy status
hq watchdog status
hq agent {list,status,attach} [<role>]
28.2 Writes (produce outbox events)
hq entity create --name --kind person|org [--email] [--phone]
hq entity merge <loser> --into <winner> # always reviews
hq interaction log --channel --from|to|cc --subject --body-file --thread-id
hq conversation create --kind --started-at [--participant]
hq engagement create --entity --stage --name --price [--maintenance-months]
hq engagement stage <slug> --to <stage>
hq project create --entity --name --slug [--lead] [--tech-stack]
hq feature create --project --title --slug [--complexity S|M|L|XL]
hq task create --title [--project|--feature] --priority --owner --due
hq task complete <slug>
hq task assign <slug> --to <contact>
hq event create --kind --about [--impact]
hq intent create --kind --about --extracted-from [--urgency] [--due-at]
hq expectation create --kind --direction --about --owed-by --owed-to [--due-at]
hq review resolve <id> --choice <opt-N> [--note]
hq review defer <id> [--until <ts>]
hq review dismiss <id> --reason
hq proposal propose --kind --evidence <json> [--actor <agent>]
hq proposal rollback <id>
hq autonomy freeze [--reason] [--until <ts>]
hq autonomy thaw
hq autonomy kill --loop <playbook|calibration>
hq playbook archive <slug> --reason
28.3 Agent control (new in v1.5)
hq agent run <role> # supervisor entrypoint; reads events from stdin
hq agent send <role> <message> # DM an agent; tails response
hq agent attach <role> # live-tail transcript
hq agent pause <role> [--for <duration>]
hq agent resume <role>
hq agent restart <role> [--fresh-session]
hq agent handover <role> # force session rotation now
hq agent status <role>
28.4 MCP fallback
hq mcp-serve # only enabled on hot paths; socket-activated
Not used in default config. Reserved for measured need.
29. Skills catalog
29.1 Mandatory skills (every agent)
| Skill | Grunt / state | Purpose |
|---|---|---|
| caveman | Full | Token compression (internal reasoning + inter-agent writes) |
| caveman-compress | installed | Compresses long memory files |
| hq-actor-attribution | always on | Ensures hq.actor GUC set on every write |
| cost-budget-guard | always on | Enforces daily caps; aborts on overrun |
| session-distill | always on | Stop-hook: reads transcript, proposes events |
29.2 Role-specific skills (catalog reference) — v1.8 collapsed
Full catalog in .skills/INDEX.md. The v1.7 per-role skill split (10 sets × 4 skills) collapses into two larger sets owned by the two top agents. Many skills still exist; ownership simplifies.
| Agent | Skills |
|---|---|
| chief | entity-brief, interaction-log, draft-outreach, relationship-temperature, pipeline-report, proposal-draft, stage-advance, visitor-analytics-digest, renewal-watch, upsell-probe, nps-signal, action-now-render, morning-brief, weekly-digest, examples-find |
| operator | project-health, roadmap-show, blocker-probe, scope-diff, search, adr-draft, dependency-audit, complexity-review, invoice-chase, recurring-materialize, revenue-variance, playbook-draft, kb-gap-scan, kb-ingest, kb-query, daily-learning-loop, examples-promote, examples-find, incident-cluster, prompt-propose, config-propose, seed-case-author |
| Project subagent (template) | search, scope-diff, adr-draft, complexity-review, blocker-probe (loaded from /memories/projects/<slug>/playbook.md) |
29.3 Community skills via gh skill
gh skill install JuliusBrussee/caveman caveman
gh skill install JuliusBrussee/caveman caveman-compress
gh skill update --all # weekly cron
Own skills authored locally; not gh skill published (internal only).
30. Caveman policy (#55)
30.1 Default state
All agents operate under caveman full for internal reasoning, tool calls, inter-agent writes (proposal bodies, reasoning_trace notes, enrichment output).
30.2 Mandatory carve-outs — switch to normal mode
Every agent's prompt embeds:
Before producing ANY artifact intended for Wilson or a customer — memo,
task.title, task.description, review.question_text, email body, proposal
text, invoice line items — emit `normal mode` on its own line, produce
the artifact in clear human English (or Portuguese), then emit
`/caveman full` on its own line before continuing.
NEVER apply caveman to:
- memo.content_md
- task.title / task.description (human-visible)
- review.question_text / review.options[]
- interaction.body (outbound)
- playbook.body_md (read by LLMs AND Wilson)
- any document body (contracts, proposals, invoices)
30.3 Language fallback
- Portuguese customer → memo in Portuguese
- English codebase → task titles in English
- Set on entity.props.preferred_language
31. MCP policy (#67)
Default: no persistent MCP server. Agents invoke hq <verb> via Bash.
Conditions for enabling hq mcp-serve:
- A tool is called >50× per agent per hour, measured over 7 days
- Subprocess-spawn latency (>100ms p95) measurably harms agent latency
- Observed in production, not theoretical
When enabled: socket-activated; starts on first request; exits after 5 min idle. Same on-demand pattern as Whisper STT.
32. The five agent surfaces (#21)
32.1 AGENTS.md hierarchy
- Root AGENTS.md (~200 lines): what this is, orientation order, canonical commands, where state lives, forbidden actions, pointers
- MEMORY.md (Hermes-sized, ~2000 chars): minimum context for bounded-memory agents
- Nested AGENTS.md in .skills/, src/cli/, src/workers/, src/runtime/, migrations/, state/, schemas/, agents/
32.2 .skills/ catalog
agentskills.io-compliant. Symlinked to ~/.claude/skills/capitao/. See §29.
32.3 state/ filesystem mirror
Maintained by view-renderer worker. ≤5s lag. Read-only for agents.
Layout:
state/
├── INDEX.md
├── focus.md · now.md · next.md · backlog.md
├── action-now.md ← the killer view (§40 of workflows)
├── projects/<slug>/README.md
├── ventures/<slug>.md
├── tasks/{focus,now,blocked,due-this-week}.md
├── agents/
│ ├── INDEX.md
│ ├── ventures/<role>.md
│ └── projects/<slug>.md
├── producers/INDEX.md
├── timeline/YYYY-MM-DD.md
└── system/
├── learning.md ← nightly autonomy digest
├── agent-incidents.md ← watchdog output
└── agent-costs.md ← per-agent daily/weekly
32.4 schemas/ JSON Schema catalog
Every node type + edge type + CLI command + webhook has a schema. Flat (no $ref to externals). Consumed by LangChain's strict tool mode.
32.5 hq CLI (§28)
33. Agent tool registry & per-agent curation (#69, #70)
Decision #70 takes the shell-invocable surface (#67, §28) and adds a structured layer: every worker action and every hq verb is registered once as an Anthropic-format tool definition (JSON Schema, strict: true, additionalProperties: false, optional input_examples), exposed natively to agent runtimes (Claude Agent SDK, Hermes, LangChain). Decision #69 sets the write-contract on top of that surface: agents write directly within their domain scope; gating is post-hoc and reserved for blast-radius actions only.
33.1 Single source of truth
src/tools/
├── registry.ts ← canonical TypeScript tool spec — one entry per tool
├── handlers/
│ ├── hq_task.ts ← per-tool handler + types
│ ├── worker_run.ts
│ └── … (one file per tool)
├── exporters/
│ ├── anthropic.ts ← → tools[] for Claude Agent SDK
│ ├── hermes.ts ← → ./.hermes/plugins/<name>.md
│ └── langchain.ts ← → BaseTool[]
├── schemas/tools.json ← auto-generated, committed for diffing
└── schemas/tools.md ← human-readable docs auto-rendered by `hq tools docs`
One source. Three exporters. Schema doc auto-published. Adding a 19th tool = one PR touching one directory. No drift across consumers.
33.2 The 18-tool catalog
Consolidated by domain — action enums collapse what would otherwise be 50+ verbs.
| Tool | Type | Purpose |
|---|---|---|
| hq_search | read | Polymorphic search across all node kinds (slug, text, vector, hybrid). Always-on. |
| hq_describe | read | Get one node's full state and direct edges. Always-on. |
| hq_timeline | read | Chronological event/interaction stream for an entity, project, or engagement. |
| hq_trace | read | Replay an LLM reasoning trace — inputs, decision, evidence, replay, explain. |
| hq_memory | read | Query agent memory by role, salience, time window, free-text. |
| hq_producer | read | Producer health — last_seen_at, throughput, error rate per data source. |
| hq_playbook | read | List/show playbooks with status (canary / active / decayed / archived). |
| hq_tools | meta | Discovery. Returns the full catalog with per-tool access status (granted / request). Cheap (~200 tokens). Loaded for every agent. |
| hq_task | write | `action='create' \| …` |
| hq_engagement | write | `action='create' \| …` |
| hq_event_log | write | Record event.kind ∈ {action, development, state_change, milestone}. Always-on for non-read agents. |
| hq_entity | write | `action='find' \| …` |
| hq_interaction | write | Log a manual interaction (note / call / in-person). Ingest workers log automatically. |
| hq_review | write | `action='list' \| …` |
| hq_proposal | gated | The path for the four gated cases (destructive / cross-scope / system-behavior / heuristic-flag — see §33.6 via #69). `action='propose' \| …` |
| hq_autonomy | control | `action='status' \| …` |
| hq_agent | control | `action='list' \| …` |
| hq_examples_find | read | RAG retrieval over the success-examples DB (#75 §5). --action-type X --tags Y --query Z --top N. Both agents call this before high-stakes drafts. |
| hq_action_log / hq_edit_log | read | Admin queries on the action ledger and Wilson-edit ledger. Restricted to operator. |
| hq_examples_pin | write | Manual pin of a success example so it never decays (operator invokes on Wilson's request). |
| worker_run | execute | Trigger any of the 16 workers on a specific input. Workers also run autonomously — see §33.5. |
Total tokens fully loaded ≈ 12.6 K. With per-agent curation (§33.4) the median agent loads ~5.5 K (~55% reduction).
33.3 Tool definition shape
Every tool follows the Anthropic tool-definition contract:
{
"name": "hq_<noun>",
"description": "<3-5 detailed sentences — the single biggest performance lever per Anthropic>",
"strict": true,
"input_schema": {
"type": "object",
"properties": { "action": { "enum": ["..."] }, "...": { "..." } },
"required": ["action"],
"additionalProperties": false
},
"input_examples": [ { "..." }, { "..." } ]
}
Output contract — every handler returns this shape:
{
"ok": true,
"action": "create",
"data": { "slug": "...", "id": "..." },
"audit_event_id": "evt_01HXY...",
"trace_id": "trc_01HXY...",
"error": null
}
High-signal returns only — slugs, UUIDs, counts. Never raw rows. audit_event_id enables one-command rollback via hq_proposal rollback; trace_id feeds hq_trace explain.
33.4 Per-agent curation (v1.8 collapsed roster)
Each agent's markdown config declares ## Tools with base: true (always-on) and a role-specific add: [...] list. The agent-supervisor (§22, #68) reads this at boot and assembles the registry subset passed as tools=[...] on every API request.
Base set (every agent, ~2 K tokens): hq_search, hq_describe, hq_event_log, hq_tools.
| Agent | Adds (on top of base) | Total | Tokens (≈) |
|---|---|---|---|
| chief | hq_timeline, hq_entity (read), hq_engagement (read+propose), hq_interaction, hq_proposal, hq_examples_find, hq_action_log (read-only on own trajectories), hq_trace_show, hq_project_run, gmail_thread_read, gmail_search (full archive), gmail_send, calendar_read, calendar_create_event, telegram_send_to_wilson, kb_search (read-only), Edit + Write (path allow-list), drive-mcp (read), calendar-mcp (read history), kb-mcp (read-only) | 26 | ~14.0 K |
| operator | hq_timeline, hq_entity (read+write), hq_engagement, hq_task, hq_proposal, worker_run, hq_trace, hq_memory, hq_producer, hq_autonomy, hq_agent, hq_playbook, hq_examples_find, hq_examples_promote, hq_examples_pin, hq_action_log, hq_edit_log, migration_plan, migration_apply, kb_ingest_run, graph_reconcile, finance_import | 26 | ~16.5 K |
| project-subagent (template, loaded on demand) | hq_timeline, hq_task, worker_run, hq_proposal, hq_examples_find | 9 | ~6.0 K |
Median: chief ~14.0 K / operator ~16.5 K (both grew with #77 / #76 deltas). Discovery: any tool not in an agent's set is one hq_tools(action='list') hop away; access expansion via hq_tools(action='request', tool='X', reason='Y') lands as a proposal for Wilson approval (#69 cross-scope class).
Note (v1.8 tool count): both agents now cross the ~20-tool embedding-search threshold (#74 amended #67; #77 expands chief). Loading strategy below (§33.7) opts both into Phase 2 (Tool Search Tool with defer_loading=true for cold-tier tools).
Edit/Write for chief — path allow-list (#77 §6): the runtime middleware enforces a path allow-list on chief's Edit and Write calls. Allowed paths: outputs/proposals/, outputs/contracts/ (in-draft only — post-dispatch routes through request_approval), outputs/briefs/, outputs/post-mortems/, outputs/status-reports/, outputs/customer-facing/<customer>/, agents/chief/personality.md, agents/chief/MEMORY.md. Writes outside the allow-list raise a typed cross_scope_violation error and a hint to ask operator. Violations are tracked as cross_scope_violation_count_14d (target: 0).
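A sketch of the §33.4 assembly step: combine the base set with the agent's add: list and pull the matching definitions out of schemas/tools.json. The file path, registry shape, and error handling are illustrative assumptions.

```python
"""Sketch of §33.4 tool assembly: base set + the agent's add: list, filtered
against schemas/tools.json and passed as tools=[...] on each API request."""
import json
from pathlib import Path

BASE_TOOLS = ["hq_search", "hq_describe", "hq_event_log", "hq_tools"]

def load_registry(path: str = "schemas/tools.json") -> dict[str, dict]:
    tools = json.loads(Path(path).read_text())
    return {t["name"]: t for t in tools}

def curated_tools(agent_adds: list[str]) -> list[dict]:
    registry = load_registry()
    wanted = BASE_TOOLS + [t for t in agent_adds if t not in BASE_TOOLS]
    missing = [t for t in wanted if t not in registry]
    if missing:
        raise ValueError(f"unknown tools in agent config: {missing}")
    return [registry[name] for name in wanted]

# e.g. tools=curated_tools(["hq_timeline", "hq_task", "worker_run"]) per request
```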
33.5 Dual-run workers — autonomous + tool
Decision #70 keeps autonomous worker execution intact and adds the tool surface as a second invocation path. Same handler, two callers.
| Worker | Autonomous trigger | Tool surface |
|---|---|---|
| email_ingest | IMAP IDLE (always-on) | worker_run(worker='email_ingest', params={force_refresh:true}) |
| whatsapp_ingest | daemon push | worker_run(worker='whatsapp_ingest', params={since_ts}) |
| meeting_transcribe | inotify on hot folder | worker_run(worker='meeting_transcribe', params={file_path}) |
| enrich | BullMQ on interaction.created | worker_run(worker='enrich', params={interaction_id}) for re-enrich |
| summarize | BullMQ on payloads >50 K tokens | worker_run(worker='summarize', params={interaction_id}) |
| reconcile_llm | enrichment writer sub-call | worker_run(worker='reconcile_llm', params={candidate_set}) |
| triage | BullMQ on interaction.enriched | worker_run(worker='triage', params={interaction_id}) |
| view_renderer | BullMQ on every ops.outbox event | worker_run(worker='view_renderer', params={node_type, slug}) debug |
| embed_worker | BullMQ on nodes.text_changed | worker_run(worker='embed_worker', params={node_ids}) |
| kb_indexer | inotify on wiki/ | worker_run(worker='kb_indexer', params={path}) |
| profile_worker | BullMQ on entity-touching outbox event | worker_run(worker='profile_worker', params={entity_id}) |
| review_applier | BullMQ on review.resolved | worker_run(worker='review_applier', params={review_id}) |
| semantic_dedup | nightly cron | worker_run(worker='semantic_dedup', params={window_days}) |
| playbook_proposer | nightly cron | worker_run(worker='playbook_proposer', params={trigger_kind}) |
| calibration_analyzer | weekly cron | worker_run(worker='calibration_analyzer', params={since:'7d'}) |
| calibration_applier | event on calibration_proposal.created | worker_run(worker='calibration_applier', params={proposal_id}) |
| agent_research | 22:00 daily cron | worker_run(worker='agent_research', params={window:'24h'}) |
| proposal_analytics_mirror | 5-min poll | worker_run(worker='proposal_analytics_mirror', params={customer_slug}) |
| toconline_sync | 30-min poll | worker_run(worker='toconline_sync', params={since_ts}) |
| shopify_sync | webhook + 1-h poll | worker_run(worker='shopify_sync', params={shop, since_ts}) |
| todoist_mirror | webhook + 5-min poll | worker_run(worker='todoist_mirror', params={direction}) |
The producer registry (#26) attributes both: triggered_by: 'cron:agent_research' vs triggered_by: 'agent:project:gopecauto'.
33.6 Direct-write default — friction-floor-zero (#69)
Decision #69 sets the write contract on top of the tool registry. Default = direct write with audit trail. Every tool call lands in ops.outbox + ops.llm_call_log + agent_runs; audit_event_id enables one-command rollback (hq_proposal rollback <id>).
Synchronous gating reserved for four classes only:
- Destructive actions — financial / legal / compliance writes, schema removals, edge-orphaning merges, deletion of another agent's writes.
- Cross-scope writes — project-A agent touching project-B's subgraph; any agent touching company-level state or another agent's config.
- System-behavior changes — autonomy thresholds, agent prompts, model routing, canary fractions (continues through #49).
- Heuristic flags — magnitude cap exceeded, novelty, conflicting evidence, security-scan hit.
These four route via the request_approval custom tool to Wilson (#69 + #74). The handoff writes a proposal node (audit trail) and a Telegram nudge; Wilson's accept/reject lands as an ops.wilson_edits row tied to the originating action. There is no terminal-writer chokepoint for routine writes; the action ledger (#75) provides the audit plane.
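A sketch of that routing decision under the four-class rule above. The predicate fields on the planned action are assumptions; the point is only that anything outside the four classes is a direct write plus an audit row.

```python
"""Sketch of the §33.6 write contract: default is a direct write with an
audit row; only the four gated classes route through request_approval.
The fields on PlannedAction are illustrative assumptions."""
from dataclasses import dataclass, field

GATED_CLASSES = {"destructive", "cross_scope", "system_behavior", "heuristic_flag"}

@dataclass
class PlannedAction:
    tool: str
    scope: str                       # e.g. "project:garq-pdm"
    agent_scope: str                 # the calling agent's own scope
    destructive: bool = False
    changes_system_behavior: bool = False
    heuristic_flags: list[str] = field(default_factory=list)

def gating_class(a: PlannedAction) -> str | None:
    if a.destructive:
        return "destructive"
    if a.scope != a.agent_scope:
        return "cross_scope"
    if a.changes_system_behavior:
        return "system_behavior"
    if a.heuristic_flags:            # magnitude cap, novelty, conflicting evidence, security hit
        return "heuristic_flag"
    return None                      # direct write + ops.agent_actions audit row

# None -> execute the tool directly
# any GATED_CLASSES value -> call request_approval (proposal node + Telegram nudge)
```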
33.7 Loading strategy
| Phase | When | Strategy | Token cost |
|---|---|---|---|
| Phase 1 — Now | ≤ 25 tools per agent's curated set | Load all curated tools directly. No Tool Search Tool. | Median 5.5 K / agent |
| Phase 2 | When agent-specific catalogs exceed 30 tools | Tool Search Tool — keep hq_search / hq_task / hq_describe / worker_run / hq_event_log always loaded; defer the rest with defer_loading: true. | ~85% reduction |
| Phase 3 | Bulk orchestration (e.g., operator's 22:00 loop mining 200 incidents + 50 wilson_edits) | Programmatic Tool Calling — model writes Python in code-execution sandbox; intermediate results never enter context. | ~37% reduction on bulk tasks |
33.8 Restrictions matrix (v1.8)
| Tool / verb | Granted to |
|---|---|
| hq_autonomy (any verb) | operator (with Wilson confirmation via request_approval for freeze and kill) |
| hq_agent write verbs (pause/resume/restart/handover) | operator |
| hq_proposal rollback | operator (chief proposes via mailbox handoff) |
| hq_entity merge | nobody direct — always proposal via request_approval |
| gmail_send | chief only (operator has no outbound email permission). Default mode: approval-required — every call wrapped by request_approval per #74 §9. Direct send only after Wilson runs hq agent ungate chief --action=email_send [--scope=…]. |
| migration_apply | operator only, plus request_approval if migration is irreversible |
Anyone needing a restricted tool escalates via hq_tools(action='request', tool='X', reason='Y') — request becomes a proposal Wilson approves.
33.9 What this changes in the existing plan
- Replaces the wording in superseded #51/#52 (Tier 1 / 10-agent) — both top agents now write directly within their respective scopes; request_approval (#69) is the gate for the four destructive classes.
- Extends #54 ("Workers as tools") — workers stay shell-invocable, AND are first-class registry tools with structured schemas. Same handler, two surfaces.
- Updates the agent-config ## Tools markdown section in §17.3 + §18.3 — now distinguishes SDK built-ins, Capitão registry tools, and in-process MCP servers.
33.10 Inter-agent CLI mailbox (#71)
Full rationale in DECISIONS.md #71. CLI verbs:
hq agent send <to> <message> # fire-and-forget
hq agent ask <to> <message> --timeout 30s # sync RPC, blocks on reply (hard ceiling 5 min)
hq agent broadcast <group> <message> # @ventures | @projects | @all
hq agent reply <message-id> <body>
hq agent inbox [--unread] [--from <agent>]
hq agent roster [--scope <team|project>]
hq agent presence <role> # last-seen, current state
Wire model. hq agent ask writes a row into ops.agent_inbox (extended per #60 amendment: from_agent TEXT, correlation_id UUID, expects_reply BOOLEAN) and emits an agent.message outbox event. Supervisor (#68) routes the event to the recipient — fork-execing it cold if not warm. Recipient calls hq agent reply <message-id> <body>, which emits agent.reply.<correlation_id>. Caller's runtime LISTENs on that channel and unblocks. Every agent stays a top-level SDK process (no SDK nesting; preserves #50). Sub-30 ms transport when both ends are warm; ~1.5 s when target is cold.
SDK surface. Inside each agent's session, a custom SendMessage tool (Anthropic-pattern JSON-schema, strict: true, per #70) wraps hq agent send / hq agent ask as a subprocess. This is NOT in the base toolset; agents declare it explicitly via add: [send_message]. from_agent is required on all agent-to-agent rows; NULL means Wilson. from_agent='wilson' magic strings are forbidden (CHECK constraint).
Coordination class split (amends #50). Synchronous Q&A and short-form delegation use the CLI mailbox (the fast path). Multi-step proposals, cross-tier reviews, and anything that must survive a process exit or be replayed by hq state rebuild continue through the graph (the primary path). Rule of thumb: if you'd want it replayable, it goes through the graph.
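A sketch of the ask path described in the wire model, assuming asyncpg for the LISTEN/NOTIFY leg; the inbox columns beyond those named in the #60 amendment (to_agent, body) are assumptions, and the 5-minute hard ceiling is enforced upstream of the timeout shown here.

```python
"""Sketch of `hq agent ask`: insert the message into ops.agent_inbox, emit an
agent.message outbox event, then block on agent.reply.<correlation_id>.
Column and channel names follow the text but are not verified against migrations."""
import asyncio
import json
import uuid

import asyncpg

async def agent_ask(pool: asyncpg.Pool, from_agent: str, to_agent: str,
                    message: str, timeout_s: float = 30.0) -> str:
    correlation_id = uuid.uuid4()
    reply: asyncio.Future = asyncio.get_running_loop().create_future()

    async with pool.acquire() as conn:
        def on_reply(_conn, _pid, _channel, payload):
            if not reply.done():
                reply.set_result(payload)

        channel = f"agent.reply.{correlation_id}"
        await conn.add_listener(channel, on_reply)
        await conn.execute(
            """INSERT INTO ops.agent_inbox (from_agent, to_agent, body,
                                            correlation_id, expects_reply)
               VALUES ($1, $2, $3, $4, true)""",
            from_agent, to_agent, message, correlation_id)
        await conn.execute(
            "INSERT INTO ops.outbox (topic, payload) VALUES ('agent.message', $1)",
            json.dumps({"to": to_agent, "correlation_id": str(correlation_id)}))
        try:
            return await asyncio.wait_for(reply, timeout=timeout_s)
        finally:
            await conn.remove_listener(channel, on_reply)
```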
Part VII — Safety, Governance, and Learning
34. Autonomy framework (#49) — four layers
Every self-improvement (playbook promotion, calibration change, prompt tweak, config change) flows through:
Layer 1: Pre-apply gates
✓ dangerousness check (destructive → review queue)
✓ sample size (n ≥ threshold)
✓ magnitude cap (per-week delta limits)
✓ security scan (invisible Unicode, prompt-injection patterns, fenced context)
✓ freeze state (hq autonomy freeze → hold)
│ passes
▼
Layer 2: Canary rollout
• test_fraction = 20% traffic (default)
• adaptive window: 48h min, 8 uses target, 14d max
• canary_id on every reasoning_trace
│ window closes
▼
Layer 3: Auto-decide
• regression → rollback + alert
• no change → shelved (30d cooldown)
• improvement → promote + digest note
│ promoted
▼
Layer 4: Drift monitor (continuous, 7-day rolling)
• metric degradation > tolerance for 48h → auto-rollback
• reason logged + Telegram to Wilson
Destructive carve-out (always reviews, never canary):
- Entity merges orphaning ≥5 edges
- Prop changes to fields tagged financial / legal / compliance
- Node deletions
- Schema migrations removing enum values
- Prompt changes removing confidence-gating
- Playbooks activating on financial workflows
- Agent config changes affecting write scope or budget hard caps
Emergency controls:
- hq autonomy freeze [--reason "..."] [--until <ts>]
- hq autonomy thaw
- hq autonomy kill --loop {playbook|calibration|all}
- hq proposal rollback <id>
35. Confidence gating
| Confidence | Action |
|---|---|
| ≥ 0.95 | Auto-apply deterministically |
| 0.85 – 0.95 | Auto-apply + soft review (compounds to profile) |
| 0.70 – 0.85 | Auto-apply with high scrutiny (canary eligible) |
| < 0.70 | Queue as review node; human decides |
| Destructive, any confidence | Queue as review node |
36. Watchdog tiers (#64)
| Tier | Trigger | Action |
|---|---|---|
| Soft | First timeout, suspected loop | DM agent: "check in" — 2 min grace |
| Medium | Confirmed loop, repeated timeout, error burst | Force-stop current query, fresh session on next trigger |
| Hard | Crash, budget runaway, RSS explosion | systemd restart + review |
| Critical | >5 hard trips in 1 hour | systemd-stop + quarantine status + review |
Every trip writes an agent_incident node. A meta-watchdog guards the watchdog itself.
37. Direct-write discipline (#69, #74 — replaces single-writer)
The v1.7 single-writer model (terminal writes only by triage-dispatcher) is dropped. With only two top agents, write-race risk is dominated by accidental overlap, not by deliberate dedup, and is solved with simpler tooling:
- Per-agent scope. chief writes outbound (email, customer artifacts). operator writes inward (graph, files, code, finance). The tool registry (#70) blocks cross-scope writes at the harness layer.
- Friction-floor-zero with destructive gate (#69). Each agent writes directly within its own scope. Destructive, cross-scope, system-behavior, and heuristic-flagged actions route to the request_approval custom tool, which becomes a proposal node visible to Wilson.
- Action ledger as single source of truth (#75). Every write produces one ops.agent_actions row, regardless of agent. Conflicts surface as duplicate-target rows with overlapping timestamps; a nightly reconcile job (operator @ 03:00) flags them for the morning brief.
- Inter-agent coordination via mailbox (#71). When chief needs a graph mutation, it runs hq agent ask operator. Operator owns the write. The handoff is logged as parent/child action rows.
Prevents write races (scope isolation), duplicate work (mailbox handoff), and actor-attribution confusion (action ledger).
38. Memory invariants
- No <task-complete/> finalizes without ≥1 memory_entry OR an explicit memory_entry_count=0 reason
- No fresh session starts without a successful reconstruction query
- No memory_entry is deletable (archive only)
- Every memory_entry has produced_by → reasoning_trace
- Memory pool per agent capped at ~200 entries (enforced by memory-tender)
39. Action ledger + trajectory + success examples (#75 + #76) + Daily learning loop (#65 amended by #74)
The v1.7 "daily agent-research" agent collapses into operator's 22:00 learning loop. The loop runs four passes — trajectory annotation review, success-pattern promotion (text + procedure), incident mining, and roll-up — as a single Opus session, gated by friction-floor-zero (#69), and writes into the three-layer ledger: actions (#75 §1) + trajectories (#76) + success examples (#75 §3 + #76 trajectory_summary).
39.1 Action ledger schema (#75 §1, amended by #76 §1)
ops.agent_actions(id, actor, session_id, task_id, action_type, target_kind, target_id, input, output, status, model, cost_usd, duration_ms, parent_id, embedding, created_at). Every tool call by either top agent or any subagent appends one row. The task_id UUID groups all actions within a single agent task (one inbound event → one task → one task_id; subagents inherit task_id from their parent). The agent_runtime PostToolUse hook writes asynchronously to a Valkey stream; ledger-flusher drains the stream into Postgres in 1-second batches. Storage cost ~50 KB/row × 50 rows/day × 365 days ≈ 900 MB/year. Kept forever.
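A sketch of the ledger-flusher drain loop, assuming redis-py and psycopg; the stream key, consumer group, and payload field layout are illustrative, and only a subset of the ops.agent_actions columns is inserted.

```python
"""Sketch of the §39.1 flusher: the PostToolUse hook XADDs one entry per tool
call onto a Valkey stream; this loop drains it into ops.agent_actions in
1-second batches. Stream key, group, and payload shape are assumptions."""
import json

import psycopg
import redis

STREAM, GROUP, CONSUMER = "agent_actions", "ledger-flusher", "flusher-1"

def flush_forever(valkey: redis.Redis, pg: psycopg.Connection) -> None:
    try:
        valkey.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError:
        pass  # consumer group already exists
    while True:
        batch = valkey.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=500, block=1000)
        if not batch:
            continue
        entries = batch[0][1]  # [(entry_id, {field: value}), ...]
        rows = [json.loads(fields[b"payload"]) for _, fields in entries]
        with pg.cursor() as cur:
            cur.executemany(
                """INSERT INTO ops.agent_actions
                       (actor, session_id, task_id, action_type,
                        target_kind, target_id, status)
                   VALUES (%(actor)s, %(session_id)s, %(task_id)s, %(action_type)s,
                           %(target_kind)s, %(target_id)s, %(status)s)""",
                rows)
        pg.commit()
        valkey.xack(STREAM, GROUP, *[eid for eid, _ in entries])
```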
39.2 Wilson edit log (#75 §2)
ops.wilson_edits(id, action_id, edit_type, final_output, diff_summary, diff_score, notes, edited_at). Captured via three paths:
| Path | Detection | edit_type values |
|---|---|---|
| Email send | Outbox watcher diffs chief's draft against the actually-sent message in Gmail | accepted, tweaked, rewrote, rejected |
| Task / proposal / file edit | hq CLI wrappers on every Wilson-driven mutation persist pre/post snapshots | accepted, tweaked, rewrote |
| Acceptance with no change | Outbox watcher emits edit_type='accepted', diff_score=0.0 after 24 h with no Wilson modification | accepted |
| Rejection | hq action reject <id> --reason=… | rejected, abandoned |
Diff summary: 1-line Haiku 4.5 generation (~$0.0002 per diff). Diff score: deterministic cosine distance on Voyage 3.5-lite embeddings (ADR #82) — the same embedding service that powers nodes.embedding. Each diff scoring call is ~2 short embed requests (draft + sent) at $0.02/M tokens; ~50 diffs/day × ~500 tokens each = ~$0.0005/day, rounded into the embedding line below.
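A sketch of that diff score, assuming the voyageai Python client and the model name from ADR #82; caching, truncation, and error handling are omitted.

```python
"""Diff-score sketch from §39.2: cosine distance between the draft and the
actually-sent message, using Voyage 3.5-lite embeddings (ADR #82).
Assumes the voyageai client with VOYAGE_API_KEY in the environment."""
import math

import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY

def diff_score(draft: str, sent: str) -> float:
    a, b = vo.embed([draft, sent], model="voyage-3.5-lite").embeddings
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm  # 0.0 = identical, larger = heavier Wilson edit
```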
39.2.5 Trajectory capture and per-action annotation (#76)
ops.email_reply_sessions — one row per email reply attempt. Captures: task_id, thread_id, customer_slug, inbound_thread (full snapshot), classification, draft_output, draft_model, trajectory_action_ids[] (ordered list of action_ids — the procedure), retrieved_example_ids[] (which success-examples chief used as RAG anchors), approval_status, final_output, final_diff_score. Schema is generalizable to other artifact types (proposal_sessions, adr_sessions, kb_ingest_sessions) in later waves; the email case is Wave 1 priority.
ops.action_annotations — Wilson's per-action grades. Captures: action_id, task_id, grade ∈ {good, bad, missing, unnecessary}, note, annotator, created_at. Annotations are written from two paths:
| Path | Trigger |
|---|---|
| Inline approval UI | Wilson clicks thumbs/comment on any action in the trajectory pane while reviewing a draft (request_approval page) |
| Retrospective CLI | hq action annotate <action_id> --grade=… --note="…" |
For grade='missing', the annotation is attached to the closest preceding action_id with a note describing what should have happened; the runtime renders this as an interleaved gap when displaying the trajectory.
Approval UI three-pane layout (#76 §4): left = inbound thread; center = chief's draft + retrieved success examples; right = ordered trajectory list with per-action thumbs-up/thumbs-down/missing-step buttons + comment boxes. Wilson can: (1) approve/edit/reject the draft (writes wilson_edits), (2) grade any action (writes action_annotations), (3) insert a missing step (writes action_annotations with grade='missing').
CLI surfaces:
hq trace show <task_id> # render full trajectory + annotations
hq trace gaps --customer=<slug> --since=14d # all `missing` annotations grouped by pattern
hq action annotate <action_id> --grade=… --note="…"
hq examples find --include-trajectory # default true for chief and operator (#76 §6)
39.3 Success examples DB and auto-promotion (#75 §3, amended by #76 §7)
Operator at 22:00 (Opus, plan-mode) runs the two-axis auto-promotion pipeline against ops.wilson_edits × ops.action_annotations from the past 24h. The text axis (Wilson edited the artifact) and the trajectory axis (Wilson graded the procedure) compose:
| Edit type × trajectory annotations | Promotion |
|---|---|
| accepted AND no bad or missing annotations | auto_promoted after 7 days (clean text + clean process) |
| accepted AND ≥1 missing annotation | auto_promoted_with_caveat — the procedure embeds the missing note as a corrective; future retrievals see "next time also do X" inline |
| tweaked AND diff_score < 0.20 AND no bad annotations | auto_promoted (Wilson liked both procedure and bones) |
| tweaked with ≥1 bad annotation | NOT promoted — extracted as anti-pattern lesson with the specific bad action highlighted |
| rewrote OR rejected | NOT promoted — anti-pattern lesson; trajectory annotations included in the lesson body |
| wilson_pinned (manual via hq example pin) | Bypasses all rules; never decays |
The auto_promoted_with_caveat class is novel and important: it captures the case where Wilson said "the email was fine, but next time also check X." Too valuable to lose, not a clean exemplar — so the markdown mirror includes the missing step as a prescriptive instruction in the procedure section.
Promoted examples mirror to /memories/success-examples/<action_type>/<id>.md with PII redaction (Haiku strips names + addresses; replaces entity → <customer>, email_address → <email>, person names → <contact>). The mirror format includes a "What I queried (the trajectory)" section listing each action with its grade (✓ / ✗ / +missing) and Wilson's note — see #76 §5 for the full template.
39.4 Retrieval (used at draft time, trajectory-aware per #76 §6)
hq examples find --action-type <type> --tags <…> --query "<…>" --top <N> [--include-trajectory=true] returns top-N markdown cards. The --include-trajectory flag (default true for chief and operator; opt-out for cheap lookups) returns the full trajectory section — the procedure that produced the validated artifact — alongside the artifact itself.
Both agents are required (per their workspace CLAUDE.md) to call it before:
- chief: every outbound email draft (regardless of approval-required vs direct mode).
- chief: every customer-artifact draft (Opus tier).
- operator: every meaningful file restructure or code commit.
- operator: every ADR draft.
The retrieved cards are appended to the agent's prompt as a <past-wilson-validated-trajectories> block. The agent is instructed to follow the procedure in the retrieved cards before drafting (run the same queries, ask operator/chief in the same order, retrieve the same kinds of context), not just to mimic the text. Tone is the surface; procedure is the substance. This is trajectory-RAG over the agent's own validated outputs — no synthetic training data, no fine-tuning.
39.5 Incident mining (the other half of the 22:00 loop)
The same Opus pass also mines agent_incident nodes, ops.agent_error_log, the past 24h of action-ledger rows where status='failed', and operator's own self-reflection. Outputs four kinds of proposals (unchanged from v1.7):
| Kind | Routes to | Example |
|---|---|---|
| Prompt tweak | ops.improvement_proposals, #49 canary | "chief hit 12 activity-timeouts; add 'if no new info in 5 turns, task-complete' rule" |
| New skill | operator drafts; Wilson approves | "3 actions independently derived Portuguese deadline extraction — create extract-deadline skill" |
| Config change | #49 canary | "project:garq-pdm exceeded budget 5/7 days; raise hard cap $5→$7 OR add summarize gate" |
| Seed case / test | agents/operator/incident_corpus/ | "This loop pattern becomes an eval fixture" |
39.6 Weekly and monthly aggregations
Operator on Sunday 04:00 (Haiku — cheap rollup) concatenates the week's success-examples + lessons into a single markdown index at /memories/success-examples/_weekly/<YYYY-Www>.md. Month-end concatenates four weeks. The morning brief on the first weekday of each week pulls the prior week's index as a "What we learned" section.
39.7 Cost and budget
| Component | Daily cost (estimate) |
|---|---|
| Diff summary (50 actions/day × Haiku ~$0.0002) | $0.01 |
| Trajectory summary generation per email_reply_session (~30/day × Haiku ~$0.0005) | $0.015 |
| Embedding (50 actions × Voyage 3.5-lite, ~500 tokens each, $0.02/M) | ~$0.0005 (≈ $0.02/month — included in main Voyage spend line) |
| Auto-promotion pipeline (Opus, ~35 K tokens — slightly larger than v1.8.1 because trajectory annotations are now an input) | $0.50 |
| Markdown mirror generation with trajectory section (Haiku, ~12 K tokens) | $0.006 |
| Mailbox + watchdog overhead | $0.00 |
| Total | ~$0.53/day, capped $0.60 |
Hard cap raised from $0.50 to $0.60 to cover trajectory-summary generation. Enforced by daily_usd_hard on operator's budget plus per-call cost telemetry.
40. Playbooks (#47)
40.1 Creation
Nightly playbook-proposer (driven by operator's 22:00 learning loop) scans for:
- reconcile-llm ≥0.90 confidence + ≥3 evidence interactions + no reversal within 7d
- review resolved with annotation
- Chained outbox sequence completed cleanly
- Error recovery after retries
Draft body: ≤4096 chars. Security scan: invisible Unicode, prompt-injection patterns, financial/legal/compliance keyword flags. Store at wiki/playbooks/<category>/<slug>.md; mirrored as playbook node.
40.2 Lifecycle
Canary 20% → auto-decide ≥75% → promote; <50% → archive. Decay: 90d unused → demote, 180d → archive. Drift monitor 7d rolling.
40.3 Consumption
Hybrid search in buildContext() returns top-2 status IN (canary, active) playbooks. Fenced as <playbook-context>[System note]...</playbook-context> to guard prompt injection.
41. Calibration loop (#48)
41.1 Observation tables (append-only)
- ops.llm_call_log
- ops.triage_override_log
- ops.reconcile_auto_apply_log
- ops.review_feedback_log
- ops.schema_proposals_log
- ops.applied_proposals_log
41.2 Pipeline
Weekly calibration-analyzer (Sun 09:00) detects override patterns; emits calibration_proposal to ops.pending_proposals. Hourly calibration-applier runs proposals through #49 framework. Config hot-reload on SIGHUP (no restart).
41.3 Tunables
- config/reconcile.md — reconcile confidence thresholds
- config/prompts.md — prompt versions per agent
- config/triage-rules.md — triage priority/owner rules
All markdown — hot-reloadable by runtime on SIGHUP.
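A sketch of that hot-reload path: a SIGHUP handler re-reads the three files into memory, with the markdown-to-threshold parsing left out.

```python
"""Sketch of the §41.2/§41.3 hot reload: on SIGHUP the runtime re-reads the
tunable markdown files without a restart. Parsing into thresholds is omitted."""
import signal
from pathlib import Path

CONFIG_FILES = ["config/reconcile.md", "config/prompts.md", "config/triage-rules.md"]
_config: dict[str, str] = {}

def reload_config(*_args) -> None:
    for path in CONFIG_FILES:
        _config[path] = Path(path).read_text(encoding="utf-8")

signal.signal(signal.SIGHUP, reload_config)  # calibration-applier sends SIGHUP after applying
reload_config()  # initial load at boot
```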
42. Schema evolution (#38)
Enrichment output includes pending_schema_proposals[] (new enum values seen). Monthly (1st Sunday) schema-analyzer aggregates, emits to review queue. Approved → enum migration + re-enrichment of historical interactions (12-month window).