# Architecture

## Overview
Working Mind is a local-first terminal AI agent built on a single-agent loop with tool calling. The core is pack-agnostic -- it knows nothing about any domain. Packs define the domain through prompts, tools, skills, and personas.
## Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Runtime | Node.js / Bun | Cross-platform, npm distribution |
| LLM SDK | OpenAI SDK (openai) | Unified API for all OpenAI-compatible providers. Anthropic, Google, and Ollama use their own adapters but the OpenAI SDK handles the majority of endpoints including OpenRouter and any OpenAI-compatible server. |
| TUI | OpenTUI (@opentui/core + @opentui/react) | React-based terminal UI framework. Renders the chat interface, tool call panels, sidebar, and input bar. Built on React 19 with Ink-compatible rendering. |
| MCP | @modelcontextprotocol/sdk | Official MCP SDK for stdio transport. Handles tool discovery, server lifecycle, and message routing. |
| Memory | Native SQLite (bun:sqlite / better-sqlite3 / sql.js) | Built-in knowledge graph. No MCP server dependency. Stores entities, relations, observations with FTS5 search, temporal validity, and contradiction detection. |
| Language | TypeScript | Type-safe agent logic, strict null checks, esbuild for fast bundling. |
The OpenAI SDK is the backbone of the provider layer. Working Mind uses it for:
- Streaming chat completions with tool calling
- Structured function call formatting
- OpenRouter and any OpenAI-compatible endpoint (Ollama's OpenAI-compat API, vLLM, LM Studio)
Anthropic and Google have their own API formats, so Working Mind uses dedicated adapters for those providers while the OpenAI SDK covers everything else.
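The adapter routing described above can be sketched as a small dispatcher. This is a minimal illustration, not Working Mind's actual code: the provider ids and adapter names are assumptions, and Ollama's native adapter is omitted for brevity.

```typescript
type AdapterKind = "openai-compatible" | "anthropic" | "google";

// Hypothetical dispatcher: provider ids and adapter names are illustrative.
function resolveAdapter(provider: string): AdapterKind {
  switch (provider) {
    case "anthropic":
      return "anthropic"; // dedicated adapter: Anthropic's own API format
    case "google":
      return "google"; // dedicated adapter: Gemini API format
    default:
      // openai, openrouter, vllm, lmstudio, Ollama's OpenAI-compat
      // endpoint, any other OpenAI-compatible server -- all served
      // by the OpenAI SDK
      return "openai-compatible";
  }
}
```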
## Agent Loop
The main loop runs in `runAgent()`:
- Stream -- Send the conversation (system prompt + history) to the LLM. Stream the response.
- Check for tool calls -- If the LLM responds with tool calls, execute them. If not, return the response to the user.
- Execute tools -- For each tool call: parse arguments, get user approval, execute, push result to conversation.
- Repeat -- Continue the loop until the LLM responds without tool calls or the turn budget is exhausted.
The loop has guardrails:
- Turn budget -- Maximum iterations (default: 20, configurable)
- Context compaction -- When conversation exceeds 100K characters, older messages are compacted into a synthetic summary
- Orphan cleanup -- Tool result messages without matching tool calls are removed before each turn
- Cancellation -- Ctrl+C aborts the current LLM request
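The loop and its turn-budget guardrail can be sketched as follows. The message shape and callbacks are simplified stand-ins, and the approval gate, streaming, compaction, and orphan cleanup are omitted; this is an illustration of the control flow, not the real implementation.

```typescript
type Message = {
  role: string;
  content: string;
  toolCalls?: { id: string; name: string; args: string }[];
};

// Sketch of the runAgent() loop: stream, check for tool calls,
// execute tools, repeat until done or the turn budget runs out.
async function runAgent(
  messages: Message[],
  callLLM: (msgs: Message[]) => Promise<Message>,
  execTool: (name: string, args: string) => Promise<string>,
  maxTurns = 20, // turn budget guardrail (default: 20)
): Promise<Message> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callLLM(messages); // 1. stream the response
    messages.push(reply);
    if (!reply.toolCalls?.length) return reply; // 2. no tool calls: done
    for (const call of reply.toolCalls) {
      // 3. execute each tool and push its result onto the conversation
      const result = await execTool(call.name, call.args);
      messages.push({ role: "tool", content: result });
    }
    // 4. repeat
  }
  return { role: "assistant", content: "[turn budget exhausted]" };
}
```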
## System Prompt Assembly
The system prompt is assembled from multiple sources:
- Pack prompt -- From the pack's `prompt.md` (or `config.systemPrompts.default` if no pack)
- Current task -- Directive from slash commands (e.g., `/ingest` sets `currentTask`)
- Available skills -- List of inactive skills the user can activate
- Active skills -- Instructions from currently active skills
- Turn budget -- Information about the available tool-call turns
- Knowledge index -- Truncated list of entities in the knowledge graph
- Project rules -- From `AGENTS.md`, `LAB.md`, or `WBRAIN.md` in the current directory
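A sketch of how these sections might be joined into one prompt. The function name and section headers are hypothetical; only the list of sources comes from this document, and empty sections are simply skipped.

```typescript
// Hypothetical assembler for the prompt sections listed above.
function assembleSystemPrompt(sections: {
  packPrompt: string;
  currentTask?: string;
  availableSkills?: string[];
  activeSkills?: string[];
  turnBudget?: number;
  knowledgeIndex?: string;
  projectRules?: string;
}): string {
  const parts: string[] = [sections.packPrompt];
  if (sections.currentTask) parts.push(`# Current Task\n${sections.currentTask}`);
  if (sections.availableSkills?.length)
    parts.push(`# Available Skills\n${sections.availableSkills.join(", ")}`);
  if (sections.activeSkills?.length) parts.push(sections.activeSkills.join("\n\n"));
  if (sections.turnBudget) parts.push(`You have ${sections.turnBudget} tool-call turns.`);
  if (sections.knowledgeIndex) parts.push(`# Known Entities\n${sections.knowledgeIndex}`);
  if (sections.projectRules) parts.push(`# Project Rules\n${sections.projectRules}`);
  return parts.join("\n\n");
}
```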
## MCP Integration
MCP servers provide tools. The flow:
- Pack declares servers -- `pack.json` lists required and optional servers
- Registry connects -- On startup, Working Mind connects to each MCP server via stdio transport
- Tools discovered -- Each server exposes tools (e.g., `mcp__brave-search__web_search`)
- Tool filtering -- Packs can filter which tools are available (via `toolFilter` in personas)
- Execution -- When the LLM calls a tool, Working Mind routes it to the correct MCP server
All MCP servers run as local child processes. No remote connections. Stdio transport sidesteps the security vulnerabilities of remote MCP servers.
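Routing a call back to the right server follows from the namespaced tool names above (`mcp__brave-search__web_search` suggests a `mcp__<server>__<tool>` convention). The parser below is illustrative, not the real router.

```typescript
// Sketch: split a namespaced MCP tool name into server and tool parts
// so the call can be routed to the owning server's child process.
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  const parts = name.split("__");
  if (parts.length < 3 || parts[0] !== "mcp") return null; // not an MCP tool
  // tool names may themselves contain "__", so rejoin the remainder
  return { server: parts[1], tool: parts.slice(2).join("__") };
}
```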
## Pack Loading
Packs are loaded at startup:
- Resolve pack names -- From `--pack` flags (default: `starter`)
- Find pack directory -- Check the builtin `packs/` directory, then `~/.wmind/packs/`
- Read `pack.json` -- Parse manifest, validate schema
- Read `prompt.md` -- Load the system prompt
- Read personas, skills, commands -- Load from subdirectories
- Register MCP servers -- Declare servers in the MCP registry
- Create agent -- Assemble system prompt, merge tools, store on agent instance
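The pack-directory lookup above can be sketched as below, with the existence check injected so the logic is self-contained. The function shape is hypothetical; the two search paths come from this document.

```typescript
// Sketch: builtin packs/ wins over ~/.wmind/packs/; unknown packs return null.
function resolvePackDir(
  packName: string,
  exists: (path: string) => boolean, // injected stand-in for a filesystem check
  builtinRoot = "packs",
  userRoot = "~/.wmind/packs",
): string | null {
  const builtin = `${builtinRoot}/${packName}`;
  if (exists(builtin)) return builtin;
  const user = `${userRoot}/${packName}`;
  if (exists(user)) return user;
  return null; // pack not found
}
```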
## Tool Resolution
When the agent is created, tools come from three sources:
- Pack tools -- Defined in the pack's `tools` array (usually empty for declarative packs)
- Native tools -- Built-in tools like `memory_*` (12 knowledge graph tools)
- MCP tools -- Discovered from connected MCP servers
Persona tool filters can restrict which tools are available:
- `preset: "all"` -- all tools available
- `preset: "readonly"` -- only read tools
- `preset: "none"` -- no tools
- `include: [...]` -- only these tools
- `exclude: [...]` -- all tools except these
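Applying these filters might look like the following sketch. The field names come from this document; the precedence (explicit `include`/`exclude` lists over `preset`) and the read-tool classification are assumptions.

```typescript
type ToolFilter = {
  preset?: "all" | "readonly" | "none";
  include?: string[];
  exclude?: string[];
};

// Sketch: restrict the merged tool list by a persona's tool filter.
function filterTools(all: string[], f: ToolFilter, readTools: Set<string>): string[] {
  if (f.include) return all.filter((t) => f.include!.includes(t)); // only these
  if (f.exclude) return all.filter((t) => !f.exclude!.includes(t)); // all but these
  switch (f.preset) {
    case "none":
      return [];
    case "readonly":
      return all.filter((t) => readTools.has(t));
    default:
      return all; // "all" or unset
  }
}
```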
## Session Persistence
Sessions are saved to `~/.wmind/sessions/` as JSON files. Each session stores:
- Conversation messages
- Agent persona and pack name
- Pack system prompt (for correct reconstruction on resume)
- Active skills
- Creation and update timestamps
When you resume a session, Working Mind reconstructs the agent with the same pack prompt, tools, and conversation history.
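The stored fields suggest a record shape like the sketch below. The field names are paraphrases of the bullet list above, not the real schema, and the JSON round trip stands in for the save/resume path.

```typescript
// Hypothetical shape of a session file; field names are illustrative.
interface SessionFile {
  messages: { role: string; content: string }[];
  persona: string;
  pack: string;
  packSystemPrompt: string; // stored so resume reconstructs the same prompt
  activeSkills: string[];
  createdAt: string;
  updatedAt: string;
}

const session: SessionFile = {
  messages: [{ role: "user", content: "hello" }],
  persona: "default",
  pack: "starter",
  packSystemPrompt: "You are...",
  activeSkills: [],
  createdAt: new Date(0).toISOString(),
  updatedAt: new Date(0).toISOString(),
};

// Written to ~/.wmind/sessions/<id>.json on save; a resume parses it back:
const restored: SessionFile = JSON.parse(JSON.stringify(session));
```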
## Context Compaction
When the conversation exceeds 100K characters, older messages are replaced with a synthetic summary:
- Identify old messages -- Everything before the most recent tool-call sequence
- Preserve tool pairs -- Tool calls and their results must stay together
- Generate summary -- Replace old messages with a note summarizing what happened
- Continue -- The agent operates on the compacted context
This prevents context window overflow while maintaining conversation coherence.
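The compaction rule can be sketched as follows, assuming a `hasToolCalls` flag marks assistant messages that issued tool calls (an assumption; the real message shape differs). The sketch cuts everything before the most recent tool-call sequence and keeps tool calls and results together.

```typescript
type Msg = { role: "user" | "assistant" | "tool" | "system"; content: string; hasToolCalls?: boolean };

// Sketch: replace everything before the most recent tool-call sequence
// with a synthetic summary message.
function compact(messages: Msg[], summarize: (old: Msg[]) => string): Msg[] {
  // Walk backwards to find where the most recent tool-call sequence starts.
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "tool" || messages[i].hasToolCalls) cut = i;
    else if (cut !== messages.length) break; // left the sequence
  }
  if (cut === messages.length || cut <= 1) return messages; // nothing to compact
  const old = messages.slice(0, cut);
  const summary: Msg = {
    role: "system",
    content: `[Summary of earlier conversation] ${summarize(old)}`,
  };
  return [summary, ...messages.slice(cut)];
}
```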
## Key Design Decisions
- Single agent, not multi-agent -- One well-configured agent with the right tools matches multi-agent systems at lower token cost
- MCP as tool layer -- MCP servers are sensors (search, read) and actuators (create, write, scrape)
- Local-first -- No cloud dependency for the core. API keys go to LLM providers, not to us
- Declarative packs -- No code required. Anyone can create a pack by editing markdown files
- Approval gate -- Tool calls require user approval by default. Auto-approve is opt-in