Only the LLM Should Be Slow

Principle XII: Only the LLM Should Be Slow

Only the LLM should be slow.

There is exactly one component in a language-model-powered system that has a legitimate excuse for taking more than a few milliseconds: the language model itself. Everything else — the CLI, the routing layer, the task pipeline, the agent lifecycle management — should be so fast that users forget it exists.

Performance as Architecture

Performance in Obsidian is not a future optimization. It is not a “nice to have” that gets addressed after the features are complete. It is an architectural constraint at the same level as constitutional compliance. Every operation has a time budget. Every new feature must demonstrate that it fits within that budget. Regressions are not tolerated.

This is why Obsidian is written in Rust. Not because Rust is fashionable (though it is), but because a system that orchestrates language model calls — operations measured in seconds — cannot afford to add hundreds of milliseconds of orchestration overhead. The orchestration layer should be invisible. The user should experience the latency of the model, not the latency of the system that talks to the model.

The obs CLI Standard

The obs command-line interface is the primary human interface to Obsidian, and it must feel instant. Not “fast for a CLI tool.” Instant. Sub-50ms for local operations. Streaming output for anything that involves model calls. Fail fast — if something is going to fail, fail in milliseconds, not after a five-second timeout.

This standard exists because CLI tools that waste human time erode trust. A developer who waits two seconds for a command to parse arguments and validate configuration is a developer who writes a shell script to avoid using your tool. Performance is UX.

Budget Every Operation

Obsidian’s approach to performance is quantitative, not qualitative. Every operation type has an explicit latency budget:

  • CLI startup: Under 50ms
  • Task submission: Under 100ms (excluding model calls)
  • Agent lifecycle events: Under 50ms
  • Pipeline stage transitions: Under 20ms
  • Constitutional compliance checks: Under 10ms

These are not aspirations. They are SLAs. The Warden monitors system performance as part of systemic integrity — because a system that is constitutionally correct but unusably slow has failed its users just as thoroughly as one that is fast but wrong.

Fail Fast on Regressions

Performance regressions are treated as bugs. Not low-priority bugs. Not “we’ll get to it” bugs. Blocking bugs. A PR that adds 50ms to CLI startup does not merge. A refactor that doubles pipeline latency does not ship. The performance budget is enforced with the same rigor as the Constitution.

Implications

Every component in Obsidian must be performance-aware. Lazy initialization for features that may not be used. Parallel execution where dependencies permit. Zero unnecessary allocations in hot paths. JSON output and exit codes for automation, because scriptability at machine speed is a non-negotiable requirement.

Relationship to Other Principles

Performance is what makes Leverage Over Effort (Principle VI) feel real — leverage that takes five seconds to invoke is not leverage, it is a queue. It enables Autonomy Under Constraint (Principle VII) by ensuring that constraint checking never becomes the bottleneck. And it is the user-facing manifestation of the entire constitutional project: a system that is powerful and simple, because those are not opposites.