Fractal Architecture

The Shape of Scale

Most orchestration systems scale by getting bigger. More servers, more load balancers, more configuration files that no one fully understands. Obsidian scales by getting deeper.

The fractal architecture is exactly what it sounds like: an Obsidian instance can spawn child instances, which can spawn their own children, which can spawn theirs in turn, forming a self-similar hierarchy of arbitrary depth. Each level looks the same. Each level operates the same. The pattern that works for one instance works for a thousand.

This isn’t architectural cleverness for its own sake. This is how you run a planetary-scale agent orchestration system without a single point of failure, without a single human bottleneck, and without a configuration management team that outnumbers your engineering org.

The Hierarchy

Every instance in the fractal hierarchy occupies one of three roles:

Citadel — The root instance. There is exactly one. It owns everything: every token, every agent slot, every permission. The Citadel doesn’t do work itself — it delegates downward and adjudicates upward. Think of it as the constitutional monarch of your infrastructure.

Branch — Mid-level instances that both receive delegation from above and delegate further downward. A branch managing US-East production might have children for backend, frontend, and data pipeline teams. Branches are where organizational structure maps to computational structure.

Leaf — Terminal instances that do actual work. No children, no delegation — just agents executing tasks. Leaves are where tokens get spent and results get produced.
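
To make the division of labor concrete, here is a rough sketch of the three roles as data. This is a hypothetical Python model, not Obsidian's own types:

# Hypothetical sketch of the three roles and what each is allowed to do.
# Class and field names are illustrative, not Obsidian's actual API.
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    CITADEL = "citadel"   # root: owns everything, delegates and adjudicates
    BRANCH = "branch"     # mid-level: receives delegation, delegates further down
    LEAF = "leaf"         # terminal: runs agents and spends tokens

@dataclass(frozen=True)
class RoleRules:
    has_parent: bool          # receives delegation from above
    may_have_children: bool   # delegates downward
    runs_agents: bool         # does the actual work

ROLE_RULES = {
    Role.CITADEL: RoleRules(has_parent=False, may_have_children=True, runs_agents=False),
    Role.BRANCH: RoleRules(has_parent=True, may_have_children=True, runs_agents=False),
    Role.LEAF: RoleRules(has_parent=True, may_have_children=False, runs_agents=True),
}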

Namespace Addressing

Every instance has a dot-separated namespace that encodes its position in the hierarchy:

citadel                        # Root
citadel.us-east                # Regional branch
citadel.us-east.prod           # Environment branch
citadel.us-east.prod.backend   # Team leaf

Namespaces support relative addressing — ../staging from citadel.us-east.prod resolves to citadel.us-east.staging. This means instances can reference siblings without knowing the full tree structure. Move a subtree, and internal references still work.
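
A minimal sketch of how that resolution might work, assuming a hypothetical resolve helper (the relative syntax comes from the example above; the function is not Obsidian's actual API):

# Hypothetical resolver for dot-separated namespaces with relative addressing.
def resolve(base: str, ref: str) -> str:
    if not ref.startswith("."):
        return ref                      # already absolute, e.g. "citadel.us-east"
    parts = base.split(".")
    for segment in ref.split("/"):
        if segment == "..":
            parts.pop()                 # step up one level in the hierarchy
        elif segment and segment != ".":
            parts.append(segment)       # descend into the named child
    return ".".join(parts)

assert resolve("citadel.us-east.prod", "../staging") == "citadel.us-east.staging"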

Communication Patterns

Instances communicate in exactly three directions, and the direction determines the semantics:

Upward Escalation (Child → Parent)

Children escalate what they can’t handle: errors requiring parent intervention, resource limit increases, permission elevation requests, health status reports. This is how problems flow toward the entity with authority to solve them.

Downward Delegation (Parent → Child)

Parents delegate what they won’t handle themselves: task assignments, configuration updates, resource allocation changes, upgrade commands, termination requests. This is how work flows toward the entity positioned to execute it.

Lateral Sharing (Sibling ↔ Sibling via Parent)

Here’s the rule that makes the whole thing work: siblings never communicate directly. All lateral communication routes through the common ancestor. Child A sends to Parent, Parent routes to Child B. This sounds inefficient until you realize it means the Parent maintains complete visibility over all cross-cutting communication. No shadow channels. No surprise dependencies. Every interaction is observable at the level that owns both participants.
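
A sketch of the routing rule under those constraints. The helper names are hypothetical; the point is that a lateral send becomes an upward hop followed by a downward hop, so the parent observes it:

# Sketch of the three legal message directions. Names are illustrative.
def parent_of(ns: str) -> str:
    return ns.rsplit(".", 1)[0]

def route(sender: str, recipient: str) -> list[str]:
    """Return the chain of namespaces a message visits, sender excluded."""
    if recipient == parent_of(sender):
        return [recipient]                        # upward escalation
    if parent_of(recipient) == sender:
        return [recipient]                        # downward delegation
    if parent_of(recipient) == parent_of(sender):
        return [parent_of(sender), recipient]     # lateral, via the shared parent
    raise ValueError("parent/child and sibling-via-parent are the only legal hops")

assert route("citadel.us-east.prod", "citadel.us-east.staging") == [
    "citadel.us-east", "citadel.us-east.staging"]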

Resource Inheritance

Resources flow downward through the hierarchy like a budget through a corporate org chart — except this one actually enforces its constraints.

The Citadel starts with the total allocation. It distributes to regional branches. Branches distribute to environment instances. At every level, four rules hold:

  1. Children cannot exceed parent allocation. Ever.
  2. Sum of children cannot exceed parent total. Conservation of tokens.
  3. Parent keeps a reserve for emergencies. Default 20%.
  4. Unused allocation can be reclaimed. Idle resources get redistributed to siblings that need them.
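
A sketch of how a parent might validate an allocation against rules 1 through 3 before handing tokens down (rule 4, reclamation, shows up in the reallocation sketch below). Names and numbers are illustrative; only the 20% reserve default comes from the rules above:

# Hypothetical check of the inheritance rules at one level of the hierarchy.
def validate_allocations(parent_total: int, children: dict[str, int],
                         reserve_percent: int = 20) -> None:
    reserve = parent_total * reserve_percent // 100
    distributable = parent_total - reserve
    for child, tokens in children.items():
        if tokens > parent_total:                  # rule 1: child never exceeds parent
            raise ValueError(f"{child} exceeds parent allocation")
    if sum(children.values()) > distributable:     # rules 2 + 3: conservation, reserve intact
        raise ValueError("children exceed distributable tokens")

validate_allocations(1_000_000, {"prod": 600_000, "staging": 150_000, "dev": 50_000})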

Dynamic Reallocation

When Child B is maxed out and Child A is sitting idle, the parent doesn’t wait for a human to notice. It identifies the imbalance, reallocates from A to B, and logs the decision. Constraints: you can’t reduce allocation below current usage, every child keeps a minimum, and the reserve is sacred except for genuine emergencies.
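
In code, a single reallocation step might look like this sketch. All names are hypothetical; the constraints are the ones just listed, and the parent's reserve never enters the pool being moved:

# Sketch of one reallocation decision between two children of the same parent.
from dataclasses import dataclass

@dataclass
class Child:
    name: str
    allocated: int
    used: int

def reallocate(idle: Child, busy: Child, amount: int, child_minimum: int) -> int:
    """Move up to `amount` tokens from an idle child to a saturated one."""
    # Cannot cut below current usage, and every child keeps a minimum floor.
    movable = min(amount, idle.allocated - max(idle.used, child_minimum))
    if movable <= 0:
        return 0
    idle.allocated -= movable
    busy.allocated += movable
    print(f"reallocated {movable} tokens: {idle.name} -> {busy.name}")  # log the decision
    return movable

a = Child("citadel.us-east.dev", allocated=200_000, used=20_000)
b = Child("citadel.us-east.prod", allocated=600_000, used=600_000)
reallocate(a, b, amount=150_000, child_minimum=50_000)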

This is the constitutional principle in action — autonomy under constraint. Each instance operates independently within its budget. When the budget needs to change, the parent adjusts. No tickets. No approval chains. Just math.

Autonomy and Disconnection

What happens when a child loses contact with its parent? It keeps working.

Each instance has a configurable disconnection tolerance — by default, five minutes of autonomous operation. During disconnection, the child can execute queued tasks, respond to agents, and log locally. What it cannot do: spawn new children, modify its own configuration, or access parent secrets.

When the connection restores, the child syncs everything that happened during the outage: queued escalations, health status, task completions, local metrics. The parent processes the backlog and the hierarchy reconverges. No data loss. No inconsistent state. Just a brief period of reduced capability followed by automatic recovery.

This is graceful degradation by design. The system doesn’t crash when the network hiccups — it narrows its operational scope and waits for reconnection with exponential backoff.
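
A sketch of that narrowing, assuming hypothetical helpers. The five-minute tolerance and the allow/deny lists mirror the autonomy configuration shown later; the backoff loop itself is illustrative:

# Sketch of the offline capability gate plus backoff reconnection.
import time

OFFLINE_ALLOW = {"execute_queued_tasks", "respond_to_agents", "local_logging"}
OFFLINE_DENY = {"spawn_children", "modify_config", "access_parent_secrets"}

def permitted_while_offline(action: str) -> bool:
    return action not in OFFLINE_DENY and action in OFFLINE_ALLOW

def reconnect(try_connect, tolerance_s: float = 300.0, base_delay_s: float = 1.0) -> bool:
    """Retry the parent connection with exponential backoff until the tolerance expires."""
    deadline = time.monotonic() + tolerance_s
    delay = base_delay_s
    while True:
        if try_connect():
            return True                      # reconnected; sync the offline backlog next
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False                     # tolerance exhausted
        time.sleep(min(delay, remaining))
        delay *= 2                           # double the wait between attempts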

Self-Improvement

Here’s where the fractal architecture earns its most unsettling capability: Obsidian instances can upgrade themselves.

The upgrade flows bottom-up with a canary gate:

  1. Registry announces a new version
  2. Citadel decides to upgrade (or an operator triggers it)
  3. Canary phase — 10% of children upgrade first
  4. Monitor canary — health OK? Error rate normal? Performance acceptable?
  5. If yes: rolling upgrade across remaining children
  6. If no: rollback canary, abort everything
  7. After all children succeed: Citadel upgrades itself — snapshot state, start new process, transfer connections, terminate old process

Maximum self-upgrade downtime: 5 seconds. If anything goes wrong, automatic rollback triggers at a 5% error rate or 3 consecutive health check failures.
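
A sketch of the canary gate using those thresholds. The function names are hypothetical; the 10% cohort, 5% error-rate trigger, and three-failure trigger are the documented values:

# Sketch of the canary decision. Names are illustrative, thresholds are from the docs.
def pick_canaries(children: list[str], fraction: float = 0.10) -> list[str]:
    count = max(1, int(len(children) * fraction))
    return children[:count]                   # the first 10% of children upgrade first

def should_proceed(error_rate: float, consecutive_health_failures: int) -> bool:
    if error_rate >= 0.05:                    # automatic rollback threshold
        return False
    if consecutive_health_failures >= 3:      # automatic rollback threshold
        return False
    return True

children = [f"citadel.us-east.prod.worker-{i}" for i in range(20)]
canaries = pick_canaries(children)            # 2 of 20 upgrade in the canary phase
if should_proceed(error_rate=0.01, consecutive_health_failures=0):
    remaining = [c for c in children if c not in canaries]
    # ...rolling upgrade across `remaining`, then the parent upgrades itself last
else:
    ...                                       # roll back the canaries, abort the upgrade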

The system that manages your agents can update itself without human intervention, using the same chaos-tested, canary-gated, automatically-reversible deployment patterns it uses for everything else.

Instance Configuration

A complete instance is configured through a single YAML file that declares its identity, hierarchy position, resource budgets, autonomy rules, communication settings, upgrade strategy, observability pipeline, and security posture:

instance:
  namespace: "citadel.us-east.prod"
  role: "branch"

hierarchy:
  parent:
    namespace: "citadel.us-east"
    endpoint: "grpc://us-east.obsidian.internal:9000"
    auth:
      type: "mtls"

resources:
  tokens:
    total: 1000000
    per_hour: 50000
    reserve_percent: 20
  agents:
    max: 100
    max_concurrent_tasks: 50
  instances:
    max_children: 50
    max_depth: 5

autonomy:
  disconnection_tolerance: "5m"
  offline_mode:
    allow: [execute_queued_tasks, respond_to_agents, local_logging]
    deny: [spawn_children, modify_config, access_parent_secrets]

Environment-specific overrides layer on top. Production gets stricter Warden enforcement, higher token limits, lower tracing sample rates. The base configuration stays readable. The overrides stay minimal.
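
For example, a production override might look like the fragment below. The resources keys mirror the base file; the warden and observability keys are illustrative stand-ins for the enforcement and tracing settings mentioned above:

# Hypothetical production override, layered on top of the base configuration above.
resources:
  tokens:
    total: 5000000            # higher production token ceiling
    per_hour: 250000

warden:
  enforcement: "strict"       # stricter policy enforcement in production

observability:
  tracing:
    sample_rate: 0.01         # lower sampling rate at production volume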

CLI Commands

# Instance lifecycle
obs instance list --namespace "citadel.us-east.*"
obs instance spawn --namespace "citadel.us-east.prod.backend" --config ./backend.yaml
obs instance terminate citadel.us-east.dev.temp

# Resource management
obs resources show citadel.us-east.prod
obs resources allocate --from citadel.us-east --to citadel.us-east.prod --tokens 1000000
obs resources reclaim --from citadel.us-east.dev --idle-threshold "10m"

# Upgrades
obs instance upgrade --namespace "citadel.us-east.*" --version "2.1.0" --strategy canary
obs instance upgrade-status <upgrade-id>
obs instance rollback <upgrade-id>

# Communication
obs message send --to parent --type escalate.error --payload '{"error": "db connection failed"}'
obs message send --to ../staging --type share.data --payload '{"key": "shared-state"}'
obs message queue --direction upward

Operational Patterns

The fractal architecture enables deployment patterns that would be architectural fever dreams in traditional systems:

Regional Deployment — Citadel at the top, regional branches underneath, environment instances below those. Each region owns its data locality. Cross-region communication routes through the hierarchy. Move a region by repointing one parent reference.

Team Isolation — Each team gets a branch instance with its own resource budget, agent pool, and secret scope. Teams can’t interfere with each other because the hierarchy enforces boundaries structurally, not culturally.

Graduated Environments — Dev spawns from staging, which spawns from production. Promotion means moving tasks up the hierarchy. Rollback means pointing them back down. The environments aren’t separate clusters — they’re branches of the same tree.
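
In practice, standing up an isolated team branch is a few of the CLI calls from the commands section above. The config file and token figure here are illustrative:

# Spawn a team branch with its own budget, then confirm the allocation
obs instance spawn --namespace "citadel.us-east.prod.data" --config ./data-team.yaml
obs resources allocate --from citadel.us-east.prod --to citadel.us-east.prod.data --tokens 250000
obs resources show citadel.us-east.prod.data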

Why This Works

The fractal architecture works because it applies a single set of principles at every level. A leaf instance managing three agents uses the same resource inheritance, the same communication patterns, and the same upgrade strategy as the Citadel managing fifty regional branches.

This is Constitution Principle #5: self-similarity at every scale. What works for one agent works for a thousand. What works for one instance works for an arbitrary hierarchy of them.

The alternative — special cases for different scales, different protocols for different levels, different tooling for different environments — is how you end up with an orchestration system that requires its own orchestration system to manage. Obsidian manages Obsidian. Infinite depth. Infinite scale. One pattern.