# Architecture

## Overview
Working Mind is a local-first terminal AI agent built on a single-agent loop with tool calling. The core is pack-agnostic -- it knows nothing about any domain. Packs define the domain through prompts, tools, skills, and personas.
## Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Runtime | Node.js / Bun | Cross-platform, npm distribution |
| LLM SDK | OpenAI SDK (openai) | Unified API for all OpenAI-compatible providers. Anthropic, Google, and Ollama use their own adapters but the OpenAI SDK handles the majority of endpoints including OpenRouter and any OpenAI-compatible server. |
| TUI | OpenTUI (@opentui/core + @opentui/react) | React-based terminal UI framework. Renders the chat interface, tool call panels, sidebar, and input bar. Built on React 19 with Ink-compatible rendering. |
| MCP | @modelcontextprotocol/sdk | Official MCP SDK for stdio transport. Handles tool discovery, server lifecycle, and message routing. |
| Memory | Native SQLite (bun:sqlite / better-sqlite3 / sql.js) | Built-in knowledge graph. No MCP server dependency. Stores entities, relations, observations with FTS5 search, temporal validity, and contradiction detection. |
| Language | TypeScript | Type-safe agent logic, strict null checks, esbuild for fast bundling. |
The OpenAI SDK is the backbone of the provider layer. Working Mind uses it for:
- Streaming chat completions with tool calling
- Structured function call formatting
- OpenRouter and any OpenAI-compatible endpoint (Ollama's OpenAI-compat API, vLLM, LM Studio)
Anthropic and Google have their own API formats, so Working Mind uses dedicated adapters for those providers while the OpenAI SDK covers everything else.
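The adapter routing described above can be sketched as a small dispatcher. This is a minimal illustration, not Working Mind's actual code: the provider ids and adapter names are assumptions, and Ollama's native adapter is omitted for brevity.

```typescript
type AdapterKind = "openai-compatible" | "anthropic" | "google";

// Hypothetical dispatcher: provider ids and adapter names are illustrative.
function resolveAdapter(provider: string): AdapterKind {
  switch (provider) {
    case "anthropic":
      return "anthropic"; // dedicated adapter: Anthropic's own API format
    case "google":
      return "google"; // dedicated adapter: Gemini API format
    default:
      // openai, openrouter, vllm, lmstudio, Ollama's OpenAI-compat
      // endpoint, any other OpenAI-compatible server -- all served
      // by the OpenAI SDK
      return "openai-compatible";
  }
}
```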
## Agent Loop
The main loop runs in `runAgent()`:
- Stream -- Send the conversation (system prompt + history) to the LLM. Stream the response.
- Check for tool calls -- If the LLM responds with tool calls, execute them. If not, return the response to the user.
- Execute tools -- For each tool call: parse arguments, get user approval, execute, push result to conversation.
- Repeat -- Continue the loop until the LLM responds without tool calls or the turn budget is exhausted.
The loop has guardrails:
- Turn budget -- Maximum iterations (default: 20, configurable)
- Context compaction -- When conversation exceeds 100K characters, older messages are compacted into a synthetic summary
- Orphan cleanup -- Tool result messages without matching tool calls are removed before each turn
- Cancellation -- Ctrl+C aborts the current LLM request
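The loop and its turn-budget guardrail can be sketched as follows. The message shape and callbacks are simplified stand-ins, and the approval gate, streaming, compaction, and orphan cleanup are omitted; this is an illustration of the control flow, not the real implementation.

```typescript
type Message = {
  role: string;
  content: string;
  toolCalls?: { id: string; name: string; args: string }[];
};

// Sketch of the runAgent() loop: stream, check for tool calls,
// execute tools, repeat until done or the turn budget runs out.
async function runAgent(
  messages: Message[],
  callLLM: (msgs: Message[]) => Promise<Message>,
  execTool: (name: string, args: string) => Promise<string>,
  maxTurns = 20, // turn budget guardrail (default: 20)
): Promise<Message> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callLLM(messages); // 1. stream the response
    messages.push(reply);
    if (!reply.toolCalls?.length) return reply; // 2. no tool calls: done
    for (const call of reply.toolCalls) {
      // 3. execute each tool and push its result onto the conversation
      const result = await execTool(call.name, call.args);
      messages.push({ role: "tool", content: result });
    }
    // 4. repeat
  }
  return { role: "assistant", content: "[turn budget exhausted]" };
}
```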
## System Prompt Assembly
The system prompt is assembled from multiple sources:
- Pack prompt -- From the pack's `prompt.md` (or `config.systemPrompts.default` if no pack)
- Current task -- Directive from slash commands (e.g., `/ingest` sets `currentTask`)
- Available skills -- List of inactive skills the user can activate
- Active skills -- Instructions from currently active skills
- Turn budget -- Information about the available tool-call turns
- Knowledge index -- Truncated list of entities in the knowledge graph
- Project rules -- From `AGENTS.md`, `LAB.md`, or `WBRAIN.md` in the current directory
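A sketch of how these sections might be joined into one prompt. The function name and section headers are hypothetical; only the list of sources comes from this document, and empty sections are simply skipped.

```typescript
// Hypothetical assembler for the prompt sections listed above.
function assembleSystemPrompt(sections: {
  packPrompt: string;
  currentTask?: string;
  availableSkills?: string[];
  activeSkills?: string[];
  turnBudget?: number;
  knowledgeIndex?: string;
  projectRules?: string;
}): string {
  const parts: string[] = [sections.packPrompt];
  if (sections.currentTask) parts.push(`# Current Task\n${sections.currentTask}`);
  if (sections.availableSkills?.length)
    parts.push(`# Available Skills\n${sections.availableSkills.join(", ")}`);
  if (sections.activeSkills?.length) parts.push(sections.activeSkills.join("\n\n"));
  if (sections.turnBudget) parts.push(`You have ${sections.turnBudget} tool-call turns.`);
  if (sections.knowledgeIndex) parts.push(`# Known Entities\n${sections.knowledgeIndex}`);
  if (sections.projectRules) parts.push(`# Project Rules\n${sections.projectRules}`);
  return parts.join("\n\n");
}
```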
## MCP Integration
MCP servers provide tools. The flow:
- Pack declares servers -- `pack.json` lists required and optional servers
- Registry connects -- On startup, Working Mind connects to each MCP server via stdio transport
- Tools discovered -- Each server exposes tools (e.g., `mcp__brave-search__web_search`)
- Tool filtering -- Packs can filter which tools are available (via `toolFilter` in personas)
- Execution -- When the LLM calls a tool, Working Mind routes it to the correct MCP server
All MCP servers run as local child processes. No remote connections. Stdio transport sidesteps the security vulnerabilities of remote MCP servers.
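Routing a call back to the right server follows from the namespaced tool names above (`mcp__brave-search__web_search` suggests a `mcp__<server>__<tool>` convention). The parser below is illustrative, not the real router.

```typescript
// Sketch: split a namespaced MCP tool name into server and tool parts
// so the call can be routed to the owning server's child process.
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  const parts = name.split("__");
  if (parts.length < 3 || parts[0] !== "mcp") return null; // not an MCP tool
  // tool names may themselves contain "__", so rejoin the remainder
  return { server: parts[1], tool: parts.slice(2).join("__") };
}
```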
## Pack Loading
Packs are loaded at startup:
- Resolve pack names -- From `--pack` flags (default: `starter`)
- Find pack directory -- Check the builtin `packs/` directory, then `~/.wmind/packs/`
- Read `pack.json` -- Parse manifest, validate schema
- Read `prompt.md` -- Load the system prompt
- Read personas, skills, commands -- Load from subdirectories
- Register MCP servers -- Declare servers in the MCP registry
- Create agent -- Assemble system prompt, merge tools, store on agent instance
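The pack-directory lookup above can be sketched as below, with the existence check injected so the logic is self-contained. The function shape is hypothetical; the two search paths come from this document.

```typescript
// Sketch: builtin packs/ wins over ~/.wmind/packs/; unknown packs return null.
function resolvePackDir(
  packName: string,
  exists: (path: string) => boolean, // injected stand-in for a filesystem check
  builtinRoot = "packs",
  userRoot = "~/.wmind/packs",
): string | null {
  const builtin = `${builtinRoot}/${packName}`;
  if (exists(builtin)) return builtin;
  const user = `${userRoot}/${packName}`;
  if (exists(user)) return user;
  return null; // pack not found
}
```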
## Tool Resolution
When the agent is created, tools come from three sources:
- Pack tools -- Defined in the pack's `tools` array (usually empty for declarative packs)
- Native tools -- Built-in tools like `memory_*` (12 knowledge graph tools)
- MCP tools -- Discovered from connected MCP servers
Persona tool filters can restrict which tools are available:
- `preset: "all"` -- all tools available
- `preset: "readonly"` -- only read tools
- `preset: "none"` -- no tools
- `include: [...]` -- only these tools
- `exclude: [...]` -- all tools except these
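Applying these filters might look like the following sketch. The field names come from this document; the precedence (explicit `include`/`exclude` lists over `preset`) and the read-tool classification are assumptions.

```typescript
type ToolFilter = {
  preset?: "all" | "readonly" | "none";
  include?: string[];
  exclude?: string[];
};

// Sketch: restrict the merged tool list by a persona's tool filter.
function filterTools(all: string[], f: ToolFilter, readTools: Set<string>): string[] {
  if (f.include) return all.filter((t) => f.include!.includes(t)); // only these
  if (f.exclude) return all.filter((t) => !f.exclude!.includes(t)); // all but these
  switch (f.preset) {
    case "none":
      return [];
    case "readonly":
      return all.filter((t) => readTools.has(t));
    default:
      return all; // "all" or unset
  }
}
```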
## Session Persistence
Sessions are saved to `~/.wmind/sessions/` as JSON files. Each session stores:
- Conversation messages
- Agent persona and pack name
- Pack system prompt (for correct reconstruction on resume)
- Active skills
- Creation and update timestamps
When you resume a session, Working Mind reconstructs the agent with the same pack prompt, tools, and conversation history.
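The stored fields suggest a record shape like the sketch below. The field names are paraphrases of the bullet list above, not the real schema, and the JSON round trip stands in for the save/resume path.

```typescript
// Hypothetical shape of a session file; field names are illustrative.
interface SessionFile {
  messages: { role: string; content: string }[];
  persona: string;
  pack: string;
  packSystemPrompt: string; // stored so resume reconstructs the same prompt
  activeSkills: string[];
  createdAt: string;
  updatedAt: string;
}

const session: SessionFile = {
  messages: [{ role: "user", content: "hello" }],
  persona: "default",
  pack: "starter",
  packSystemPrompt: "You are...",
  activeSkills: [],
  createdAt: new Date(0).toISOString(),
  updatedAt: new Date(0).toISOString(),
};

// Written to ~/.wmind/sessions/<id>.json on save; a resume parses it back:
const restored: SessionFile = JSON.parse(JSON.stringify(session));
```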
## Context Compaction
When the conversation exceeds 100K characters, older messages are replaced with a synthetic summary:
- Identify old messages -- Everything before the most recent tool-call sequence
- Preserve tool pairs -- Tool calls and their results must stay together
- Generate summary -- Replace old messages with a note summarizing what happened
- Continue -- The agent operates on the compacted context
This prevents context window overflow while maintaining conversation coherence.
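The compaction rule can be sketched as follows, assuming a `hasToolCalls` flag marks assistant messages that issued tool calls (an assumption; the real message shape differs). The sketch cuts everything before the most recent tool-call sequence and keeps tool calls and results together.

```typescript
type Msg = { role: "user" | "assistant" | "tool" | "system"; content: string; hasToolCalls?: boolean };

// Sketch: replace everything before the most recent tool-call sequence
// with a synthetic summary message.
function compact(messages: Msg[], summarize: (old: Msg[]) => string): Msg[] {
  // Walk backwards to find where the most recent tool-call sequence starts.
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "tool" || messages[i].hasToolCalls) cut = i;
    else if (cut !== messages.length) break; // left the sequence
  }
  if (cut === messages.length || cut <= 1) return messages; // nothing to compact
  const old = messages.slice(0, cut);
  const summary: Msg = {
    role: "system",
    content: `[Summary of earlier conversation] ${summarize(old)}`,
  };
  return [summary, ...messages.slice(cut)];
}
```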
## Key Design Decisions
- Single agent, not multi-agent -- One well-configured agent with the right tools matches multi-agent systems at lower token cost
- MCP as tool layer -- MCP servers are sensors (search, read) and actuators (create, write, scrape)
- Local-first -- No cloud dependency for the core. API keys go to LLM providers, not to us
- Declarative packs -- No code required. Anyone can create a pack by editing markdown files
- Approval gate -- Tool calls require user approval by default. Auto-approve is opt-in