Agent Coordination

Here is the thing about multi-agent coordination that most systems get catastrophically wrong: they treat it as a scheduling problem. You have N agents and M tasks, so you build a scheduler, add a queue, maybe sprinkle in some load balancing, and declare victory. Then two agents edit the same file simultaneously and everything catches fire.

Obsidian treats coordination as what it actually is — a governance problem. Work must be discovered, claimed, and executed under rules that prevent conflict without requiring a central authority to micromanage every decision. The organizing principle is deceptively simple: if work is on your hook, you run it. No waiting for instructions. No asking permission. Just execute.

This is Constitution Principle 4 made operational: autonomy under constraint. Agent sovereignty is real, but it operates within boundaries — file reservations, claim protocols, capacity limits — that make autonomous action safe.

Coordination Architecture

┌─────────────────────────────────────────────────────────────┐
│                  WORK DISCOVERY                              │
│   Task Queue ──▶ Claim ──▶ Reserve Files ──▶ Execute        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  THE VAULT                                   │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐       │
│  │ Sessions │ │  Memory   │ │  Tasks   │ │ Contacts │       │
│  │ (JSONL)  │ │(MD+Vector)│ │ (Beads)  │ │ (Graph)  │       │
│  └──────────┘ └───────────┘ └──────────┘ └──────────┘       │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  COMMUNICATION                               │
│   Agent Mail ◄──▶ Courier ◄──▶ Strike Coordination          │
└─────────────────────────────────────────────────────────────┘

Work Discovery and the GUPP Principle

GUPP stands for “Got it? U Process it. Period.” It is not a suggestion. When work appears on an agent’s queue, the agent is responsible for that work. The agent must process it or explicitly delegate it. There is no third option involving waiting around to see if someone else picks it up.

This principle emerges directly from Sovereign Autonomy — genuine decision-making power means genuine responsibility. An agent that can choose how to work but not whether to work is an agent that actually ships things.

Obsidian supports three work discovery modes. Pull mode means the agent actively polls for available work at a configurable interval. Push mode means tasks are assigned directly to a specific agent’s queue with overflow handling (reject, redirect, or buffer). Hybrid mode combines both — prefer push, fall back to pull after a timeout. Most production deployments use hybrid, because real workloads are neither perfectly predictable nor perfectly chaotic.
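
The three modes can be pictured as one decision function. The sketch below is a minimal illustration, not Obsidian's implementation: `nextTask`, `WorkSources`, `waitedMs`, and `pushTimeoutMs` are hypothetical names, and overflow handling is left out.

```typescript
// Hypothetical types for illustration; not Obsidian's actual API.
type Task = { id: string };
type DiscoveryMode = "pull" | "push" | "hybrid";

interface WorkSources {
  pushed: Task[]; // tasks assigned directly to this agent
  shared: Task[]; // shared queue available for pull
}

// Decide where the next task comes from.
// `waitedMs` is how long the agent has been idle waiting for pushed work.
function nextTask(
  mode: DiscoveryMode,
  sources: WorkSources,
  waitedMs: number,
  pushTimeoutMs: number,
): Task | undefined {
  // In push and hybrid modes, directly assigned work always comes first.
  if (mode !== "pull" && sources.pushed.length > 0) {
    return sources.pushed.shift();
  }
  // Pull mode always polls; hybrid polls only after the push wait times out.
  const mayPull =
    mode === "pull" || (mode === "hybrid" && waitedMs >= pushTimeoutMs);
  return mayPull ? sources.shared.shift() : undefined;
}
```

Under these assumptions, a hybrid agent with an empty push queue stays idle until the timeout elapses and only then reaches into the shared queue.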

The flow itself is a four-stage pipeline: a task appears (from a user request, another agent, a scheduled job, or an event trigger), the task router analyzes requirements against agent skills and availability, the task is either pushed to a specific agent or placed in a shared queue for pull, and then the claiming agent takes atomic ownership.

Task Claiming

Claiming a task is an atomic operation: two agents can never hold a claim on the same task. The protocol works like a compare-and-swap. It checks that the task exists and is pending, verifies that no active claim exists, confirms that the agent is eligible and has capacity, and then performs an atomic claim that records the agent, timestamp, and expiration.

Claims expire. If an agent claims a task but does not begin executing within five minutes, the claim is released automatically. This prevents orphaned claims: work that appears assigned but is actually sitting in limbo because the claiming agent crashed or stalled.

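interface TaskClaim {
  taskId: string;     // Task being claimed
  agentId: string;    // Agent requesting or holding the claim
  claimTime: string;  // When the claim was recorded
  expiresAt: string;  // Claim expires if the task has not started by this time
  status: "pending" | "granted" | "denied" | "expired" | "released";
}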
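
The claim steps can be sketched as a single check-then-set. This is an illustration under stated assumptions, not the actual protocol. `ClaimStore`, `tryClaim`, `hasCapacity`, and `CLAIM_TTL_MS` are hypothetical names. Atomicity here comes only from running synchronously in one thread, where a real deployment would need a transactional store or a true compare-and-swap. The sketch also does not model the start of execution, so a claim simply lapses five minutes after it is granted. `TaskClaim` is repeated so the block stands alone.

```typescript
interface TaskClaim {
  taskId: string;
  agentId: string;
  claimTime: string;
  expiresAt: string;
  status: "pending" | "granted" | "denied" | "expired" | "released";
}

const CLAIM_TTL_MS = 5 * 60 * 1000; // five-minute window from the text

// Hypothetical in-memory store; names are assumptions for illustration.
class ClaimStore {
  private claims = new Map<string, TaskClaim>();
  private pendingTasks: Set<string>;

  constructor(pendingTasks: Set<string>) {
    this.pendingTasks = pendingTasks;
  }

  // Synchronous check-then-set: nothing can interleave inside this method
  // in a single-threaded runtime, which stands in for a real atomic CAS.
  tryClaim(taskId: string, agentId: string, hasCapacity: boolean, now: Date): TaskClaim {
    const claimTime = now.toISOString();
    const existing = this.claims.get(taskId);
    const active =
      existing !== undefined &&
      existing.status === "granted" &&
      Date.parse(existing.expiresAt) > now.getTime();

    if (!this.pendingTasks.has(taskId) || active || !hasCapacity) {
      return { taskId, agentId, claimTime, expiresAt: claimTime, status: "denied" };
    }
    const claim: TaskClaim = {
      taskId,
      agentId,
      claimTime,
      expiresAt: new Date(now.getTime() + CLAIM_TTL_MS).toISOString(),
      status: "granted",
    };
    this.claims.set(taskId, claim);
    return claim;
  }
}
```

A second agent asking for the same task inside the window is denied. Once the window has passed, the stale claim no longer blocks a new one.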

Assignment strategies determine which agent gets which task. Round-robin distributes evenly. Least-loaded assigns to the agent with the fewest active tasks. Skill-match picks the agent with the highest skill overlap (minimum match score configurable). Affinity prefers agents who recently worked on related files. Load balancing ensures no agent drifts more than a configurable percentage above the mean workload.
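
Two of these strategies are small enough to sketch directly. The code below is illustrative only: `AgentInfo`, `leastLoaded`, `matchScore`, and `skillMatch` are hypothetical names. Scoring overlap as the fraction of required skills an agent covers is an assumption, since the text does not define the match score.

```typescript
// Hypothetical agent shape for illustration only.
interface AgentInfo {
  id: string;
  activeTasks: number;
  skills: string[];
}

// Least-loaded: the agent with the fewest active tasks (first one wins ties).
function leastLoaded(agents: AgentInfo[]): AgentInfo | undefined {
  return agents.reduce<AgentInfo | undefined>(
    (best, a) => (best === undefined || a.activeTasks < best.activeTasks ? a : best),
    undefined,
  );
}

// Assumed scoring: fraction of required skills the agent covers, in [0, 1].
function matchScore(required: string[], agent: AgentInfo): number {
  if (required.length === 0) return 1;
  const have = new Set(agent.skills);
  return required.filter((s) => have.has(s)).length / required.length;
}

// Skill-match: best-scoring agent at or above the configurable minimum.
function skillMatch(
  required: string[],
  agents: AgentInfo[],
  minScore: number,
): AgentInfo | undefined {
  let best: AgentInfo | undefined;
  let bestScore = -1;
  for (const a of agents) {
    const score = matchScore(required, a);
    if (score >= minScore && score > bestScore) {
      best = a;
      bestScore = score;
    }
  }
  return best;
}
```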

File Reservations

File reservations are the mechanism that prevents the most common multi-agent failure mode: two agents editing the same file at the same time, producing conflicts that require manual resolution and erode trust in the entire system.

A reservation is a lock with three types. Exclusive means only the holding agent can write — this is the default for edits. Shared allows multiple agents to read concurrently. Intent is a soft lock that signals planned modification without preventing reads.

The conflict resolution logic is straightforward: if no reservation exists, grant it. If a shared reservation exists and the request is also shared, grant it (multiple readers are fine). Otherwise, deny with a reason. Denied agents can wait (up to 60 seconds), queue their request (up to 5 deep), or escalate to the Warden after a timeout.
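
The three rules translate almost line for line into code. The sketch below applies them literally. `Reservation`, `Decision`, and `resolve` are hypothetical names, and the wait, queue, and escalate paths are omitted. The text does not say how intent reservations interact with the other types, so here any non-shared combination is denied.

```typescript
// Illustrative sketch; type and function names are assumptions.
type ReservationType = "exclusive" | "shared" | "intent";

interface Reservation {
  path: string;
  agentId: string;
  type: ReservationType;
}

type Decision =
  | { granted: true }
  | { granted: false; reason: string };

// Grant if the file is free, grant shared-on-shared, deny everything else.
function resolve(existing: Reservation[], request: Reservation): Decision {
  const held = existing.filter((r) => r.path === request.path);

  // Rule 1: no reservation on this file.
  if (held.length === 0) return { granted: true };

  // Rule 2: multiple readers are fine.
  if (request.type === "shared" && held.every((r) => r.type === "shared")) {
    return { granted: true };
  }

  // Rule 3: deny with a reason naming the current holders.
  const holders = held.map((r) => `${r.type} by ${r.agentId}`).join(", ");
  return { granted: false, reason: `${request.path} is reserved (${holders})` };
}
```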

Reservations release automatically on task completion, agent idle, or agent error. The Warden can forcibly break a reservation when necessary, notifying the original holder and attempting to preserve uncommitted changes. This is Systemic Integrity in practice: the system protects itself from coordination deadlocks even when individual agents fail.

# Reserve a file for exclusive editing
obsidian reserve file src/component.tsx --agent=worker-alpha --type=exclusive

# List active reservations
obsidian reserve list

# Release when done
obsidian reserve release <reservation-id>

Agent Mail

The mail system enables asynchronous communication between agents. This is not a chat system — it is a structured message-passing protocol with types, priorities, threading, delivery confirmation, and read receipts.

Every message has a type that declares its intent: info for general information, request for action needed, response for replies, notification for system events, handoff for work transfers, escalation for problem elevation, and alert for critical issues. Priorities range from low through normal and high to urgent. Messages can carry attachments referencing files, tasks, memory documents, or sessions.
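
A message along these lines might be typed as follows. The type and priority values come from the paragraph above, but the field names (`threadId`, `attachments`, `kind`, `ref`) and the `byPriority` helper are assumptions for illustration, not the actual wire format.

```typescript
// Hypothetical message shape; field names are assumptions based on the prose.
type MailType =
  | "info" | "request" | "response" | "notification"
  | "handoff" | "escalation" | "alert";

// Ordered from least to most urgent.
const PRIORITIES = ["low", "normal", "high", "urgent"] as const;
type Priority = (typeof PRIORITIES)[number];

interface Attachment {
  kind: "file" | "task" | "memory" | "session";
  ref: string; // path or identifier of the referenced item
}

interface MailMessage {
  id: string;
  from: string;
  to: string;
  type: MailType;
  priority: Priority;
  subject: string;
  body: string;
  threadId?: string; // set on replies so they thread together
  attachments?: Attachment[];
}

// Sort a copy of the inbox so the most urgent messages come first.
// Array.prototype.sort is stable, so equal priorities keep arrival order.
function byPriority(inbox: MailMessage[]): MailMessage[] {
  const rank = (p: Priority): number => PRIORITIES.indexOf(p);
  return [...inbox].sort((a, b) => rank(b.priority) - rank(a.priority));
}
```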

The delivery flow is rigorous: compose and send, validate and route through the Courier, confirm delivery, issue a read receipt, and thread replies. Failed deliveries are tracked, and messages expire after a configurable TTL. Nothing disappears silently: if it isn’t observable, it doesn’t exist.

Mail groups provide addressing for common patterns: all agents, all workers, domain-specific teams (frontend, backend), role-specific groups (operations), and on-call rotations backed by schedule configuration.

Strikes: Coordinated Multi-Agent Operations

A Strike is a coordinated effort where multiple agents work together on a task too large for any single agent. Think of it as Fractal Delegation applied to a specific campaign — a coordinator decomposes the work into scoped phases, agents claim their segments, and sync points ensure the pieces fit together.

A strike configuration declares the composition (which agents, what roles, what file scopes), the phases (analysis, implementation, testing, integration), phase dependencies, sync points, communication channels, and conflict resolution rules.
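
Phase dependencies imply an execution order, which a coordinator could derive with a topological sort. The sketch below is one way to do that. The real strike schema is not shown in this document, so `StrikePhase`, `dependsOn`, and `phaseOrder` are hypothetical.

```typescript
// Hypothetical configuration shape; the real strike schema is not shown here.
interface StrikePhase {
  name: string;
  dependsOn: string[];
}

// Order phases so each runs after its dependencies (depth-first
// topological sort). Throws on unknown or cyclic dependencies.
function phaseOrder(phases: StrikePhase[]): string[] {
  const byName = new Map(phases.map((p) => [p.name, p] as const));
  const done = new Set<string>();
  const visiting = new Set<string>();
  const order: string[] = [];

  const visit = (name: string): void => {
    if (done.has(name)) return;
    const phase = byName.get(name);
    if (phase === undefined) throw new Error(`unknown phase: ${name}`);
    if (visiting.has(name)) throw new Error(`dependency cycle at: ${name}`);
    visiting.add(name);
    for (const dep of phase.dependsOn) visit(dep);
    visiting.delete(name);
    done.add(name);
    order.push(name);
  };

  for (const p of phases) visit(p.name);
  return order;
}
```

Rejecting cycles when the configuration is loaded means a malformed strike fails before any agent is reserved.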

Strike Lifecycle

1. INITIATION
   Coordinator creates strike, reserves agents, opens channel

2. MOBILIZATION
   Agents join, receive briefing and scope, claim file reservations

3. EXECUTION (per phase)
   Agents execute their scope, report progress, sync at phase end

4. SYNCHRONIZATION
   All agents pause, coordinator reviews, conflicts resolved

5. COMPLETION
   Final integration, release reservations, generate report

The lifecycle ensures that parallel work converges cleanly. After each phase, a sync point forces all participating agents to pause while the coordinator reviews progress and resolves any conflicts before proceeding. This is not a bottleneck — it is a circuit breaker that prevents small misalignments from compounding into large ones.

# Launch a strike from configuration
obsidian strike launch config/strikes/large-refactor.yaml

# Monitor progress
obsidian strike status <strike-id>

# Manual sync point
obsidian strike sync <strike-id>

# View results
obsidian strike report <strike-id>

The Vault

The Vault is the shared knowledge repository for all agents — the collective memory of the system. It stores four categories of information, each optimized for its access pattern.

Sessions are stored as JSONL (JSON Lines) for efficient append-only logging. Every session records the agent, timestamps, actions taken, decisions made, and outcomes achieved. This is the audit trail that makes coordination traceable. Every event — task claimed, file written, mail sent, error encountered, decision recorded — is a line in the log.
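
JSONL works well for append-only logs because each event is a complete JSON object on its own line. A minimal sketch of writing and reading such a log follows. The `SessionEvent` fields are assumptions, since the actual session schema is not specified here.

```typescript
// Hypothetical event shape; the actual session schema is not specified here.
interface SessionEvent {
  ts: string; // ISO 8601 timestamp
  agentId: string;
  kind: "task_claimed" | "file_written" | "mail_sent" | "error" | "decision";
  detail: string;
}

// One JSON object per line. JSON.stringify escapes embedded newlines,
// so each event always occupies exactly one line.
function toJsonl(events: SessionEvent[]): string {
  return events.map((e) => JSON.stringify(e) + "\n").join("");
}

// Parse a JSONL log back into events, skipping blank lines.
function fromJsonl(log: string): SessionEvent[] {
  return log
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionEvent);
}
```

Appending an event is plain string concatenation: `log + toJsonl([event])` is still a valid log, and earlier lines never need rewriting.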

Memory combines human-readable Markdown documents with vector embeddings for semantic search. Organized by topic with tags and relationships, memory is where agents store learnings, patterns, and institutional knowledge. The dual format means humans can read and edit the knowledge base directly while agents can search it semantically.

Tasks use the Beads format — a tracking structure with status lifecycle (backlog → pending → claimed → in_progress → review → completed), parent-child relationships, dependency chains, priority levels, and full event history. Every status change is attributed to an agent and timestamped.
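
The status lifecycle can be read as a small state machine. The sketch below assumes only single-step forward moves are legal. That is a simplification: the text does not list the allowed transitions, and a real implementation would presumably also need backward moves, such as a task returning to pending when a claim expires. `advance` and `StatusEvent` are hypothetical names.

```typescript
// Status names come from the lifecycle above; everything else is assumed.
const LIFECYCLE = [
  "backlog", "pending", "claimed", "in_progress", "review", "completed",
] as const;
type TaskStatus = (typeof LIFECYCLE)[number];

// Every status change is attributed to an agent and timestamped.
interface StatusEvent {
  from: TaskStatus;
  to: TaskStatus;
  agentId: string;
  at: string;
}

// Assumption: a task may only move to the immediately next status.
function advance(
  current: TaskStatus,
  to: TaskStatus,
  agentId: string,
  at: string,
): StatusEvent {
  if (LIFECYCLE.indexOf(to) !== LIFECYCLE.indexOf(current) + 1) {
    throw new Error(`illegal transition: ${current} -> ${to}`);
  }
  return { from: current, to, agentId, at };
}
```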

Contacts maintain a graph of agent relationships — nodes with attributes (type, role, domain, skills, reliability score) and edges with interaction metrics (message count, collaboration frequency, last interaction). This graph powers affinity-based task assignment and helps agents know who to contact for domain-specific questions.

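Obsidian uses hybrid search that combines BM25 keyword matching with vector-based semantic search and merges the two rankings through Reciprocal Rank Fusion (RRF). RRF scores each result by summing 1/(k + rank) across the two rankings, where k is a smoothing constant, so a result ranked highly by either method surfaces near the top. This matters because keyword search finds exact matches while semantic search finds conceptually related content, and production queries need both.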

Hybrid Search Pipeline

                  ┌──────────┐
                  │  Query   │
                  └────┬─────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
 ┌────────────────┐       ┌────────────────┐
 │  BM25 Search   │       │ Vector Search  │
 │  (Keywords)    │       │  (Semantic)    │
 └───────┬────────┘       └───────┬────────┘
         │                        │
         └───────────┬────────────┘
                     ▼
        ┌────────────────────────┐
        │  Reciprocal Rank      │
        │  Fusion (RRF)         │
        │                       │
        │  Score = 1/(k+rank₁)  │
        │       + 1/(k+rank₂)   │
        └───────────┬───────────┘
                    ▼
        ┌────────────────────────┐
        │   Ranked Results       │
        └────────────────────────┘

The default fusion weights favor semantic search (0.6) over keyword search (0.4), because in practice agents are more likely to search for concepts than exact strings. Both weights are configurable. Results include scores and highlights, with filters available for domain, topic, agent, date range, and tags.
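
The diagram shows the unweighted RRF formula, and the text does not say how the weights enter it. One plausible reading is that each weight scales its list's reciprocal-rank term, as sketched below. That reading, the value k = 60 (a common choice for RRF), 1-based ranks, and the `fuse` function are all assumptions rather than Obsidian's documented behavior.

```typescript
// Weighted Reciprocal Rank Fusion over two ranked lists of document IDs.
// Assumptions: weights scale each list's term, k = 60, ranks are 1-based.
function fuse(
  keyword: string[],
  semantic: string[],
  weights = { keyword: 0.4, semantic: 0.6 },
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  // Add weight / (k + rank) for every document in one ranked list.
  const add = (ranked: string[], weight: number): void => {
    ranked.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  };
  add(keyword, weights.keyword);
  add(semantic, weights.semantic);

  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With keyword results `["a", "b"]` and semantic results `["b", "c"]`, document `b` ranks first because it appears in both lists. Under the default weights, `c` (second in the semantic list) then outranks `a` (first in the keyword list).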

# Hybrid search across the vault
obsidian vault search "authentication refactor" --mode=hybrid

# Domain-scoped search
obsidian vault search "react hooks" --domain=frontend --limit=20

Context Priming

When an agent starts a session, it does not begin with a blank slate. Context priming loads relevant knowledge from the Vault — recent sessions from the same domain, relevant memory documents, active tasks, unread mail, and recent learnings — synthesized into a context block that fits within a configurable token limit.

Each agent’s priming configuration declares which vault queries to run, which memory topics to always include, how many recent sessions to surface, and how much mail to load. The result is an agent that begins every session already aware of what happened recently, what decisions were made, and what work is waiting.
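
Fitting primed context into a token limit is a packing problem. The sketch below uses a greedy pass in priority order, which is an assumption: the text says only that the context is synthesized to fit a configurable limit. `ContextItem`, `priority`, and `primeContext` are hypothetical, and token counts are supplied by the caller.

```typescript
// Hypothetical item shape; token counts are supplied by the caller.
interface ContextItem {
  source: "session" | "memory" | "task" | "mail" | "learning";
  text: string;
  tokens: number;
  priority: number; // higher is more important
}

// Greedy packing: walk items from highest to lowest priority and keep
// each one that still fits within the remaining token budget.
function primeContext(items: ContextItem[], tokenLimit: number): ContextItem[] {
  const picked: ContextItem[] = [];
  let used = 0;
  const ordered = [...items].sort((a, b) => b.priority - a.priority);
  for (const item of ordered) {
    if (used + item.tokens <= tokenLimit) {
      picked.push(item);
      used += item.tokens;
    }
  }
  return picked;
}
```

Greedy selection is not optimal in general, but it is predictable: the highest-priority item that fits is always included.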

This is the operational consequence of Constitution Principle 7: learn continuously from production reality. An agent that starts fresh every session is an agent that repeats mistakes. An agent that primes from the Vault is an agent that accumulates institutional knowledge — and that knowledge compounds.

Command Reference

# Work discovery and claiming
obsidian work discover --agent=worker-alpha
obsidian work claim <task-id> --agent=worker-alpha
obsidian work release <task-id>

# File reservations
obsidian reserve file <path> --agent=<id> --type=exclusive
obsidian reserve list
obsidian reserve release <reservation-id>

# Agent mail
obsidian mail send --from=<id> --to=<id> --subject="..." --body="..."
obsidian mail list --agent=<id>
obsidian mail read <mail-id>

# Vault operations
obsidian vault session list
obsidian vault memory search --query="..."
obsidian vault task list --status=pending
obsidian vault contact graph --format=dot

# Strikes
obsidian strike launch <config-path>
obsidian strike status <strike-id>
obsidian strike sync <strike-id>
obsidian strike report <strike-id>

If work is on your hook, you run it. That is not a management philosophy — it is a coordination protocol, and the entire system is built to enforce it.