Architecture overview
How a keystroke becomes pixels — the layered model from MCP server through mux through pty through font shaper to GPU. For contributors and the curious.
2026-05-03T00:00:00.000Z
One process, three control planes
Unterm runs as a single OS process per window. That process simultaneously hosts three control planes that all talk to the same in-memory Mux:
- The GUI / GPU renderer on the foreground thread — owns the OpenGL (or WebGPU) context, paints frames, dispatches OS input events.
- The MCP server on a worker thread — TCP listener bound to
127.0.0.1:19876(preferred port; falls back to 19877+ if taken). Speaks line-delimited JSON-RPC 2.0, gates writes behind a UUID auth token. - The HTTP web settings server on a worker thread — bound to
127.0.0.1:19877(preferred), serves the Tailwind+Alpine SPA at/and a REST surface at/api/*.
All three are started from async_run_terminal_gui in wezterm-gui/src/main.rs:416. The MCP server is brought up first because it owns the UUID token; the HTTP server is then handed the same token so both servers share auth state. Both servers carry an Arc<McpHandler> and call directly into the Mux API — no IPC, no async runtime, no extra socket hop:
┌──────────────────────────────────────────┐
│ unterm (one process) │
│ │
user keypress ───► ┌───┴────┐ ┌────────┐ ┌───────────────┐ │
│ GUI │◄──►│ │◄──►│ local pty │──┼─► child shell
│ thread │ │ Mux │ │ reader thread │ │ (zsh, bash, …)
└───┬────┘ │ │ └───────────────┘ │
│ │ │ │
agent (Claude/CLI) ─┐ │ │ shared │ │
TCP 19876 ──────────┼──►│ MCP │ state │ │
│ │ thread │ │ │
│ └─────────┤ │ │
browser ────────────┐ │ │ │
TCP 19877 ──────────┼────────────►│ HTTP │ │
│ │ thread │ │
│ └────────┘ │
└──────────────────────────────────────────────┘
Both ports plus the auth token land in ~/.unterm/server.json on launch. Per-instance metadata lands in ~/.unterm/instances/<nato>.json. That file is the canonical handshake — every external tool, including unterm-cli, reads it to find the live instance.
This is the central architectural decision: the agent surface is inside the same binary as the renderer, sharing memory with Mux. There is no “AI process” to crash separately, no message bus, no second source of truth for terminal state.
Process model
unterm (GUI process — one per window)
├── thread: main (window event loop, GL, paint)
├── thread: mux-server (Unix socket listener for wezterm-client)
├── thread: mcp-server (TCP 19876)
├── thread: web-settings (TCP 19877)
├── thread: update-check (GitHub poller, 6h cycle)
└── thread: read-pty-N (one per pane, blocking read on the master fd)
│
└── child process: $SHELL (the user's zsh / bash / pwsh)
A few things worth pointing out:
unterm-mux(binary built fromwezterm-mux-server/) is a separate binary for headless multiplexing — daemon mode for SSH-style “attach later from the GUI” workflows. The default desktop launch does not use it. The GUI process embeds its ownMuxdirectly. The Unix socket at$WEZTERM_RUNTIME_DIR/gui-sock-<pid>is opened by a worker thread inside the GUI process (wezterm-gui/src/main.rs:721) so a secondunterminvocation can talk to the first.unterm-cli(binary built fromwezterm/) is short-lived. It opens a TCP socket to127.0.0.1:19876, callsauth.loginwith the token fromserver.json, runs one or more JSON-RPC methods, and exits. It never embeds a Mux of its own.- One window = one process. Spawning a second window with
Cmd-Nfrom the menu actually re-execsuntermand the second process registers itself asbravo(or the next free NATO name) in~/.unterm/instances/. Tabs and panes are inside one window/process; new windows are new processes.
The crate stack
The workspace has ~40 crates inherited from WezTerm. The ones you’ll touch most:
| Crate | Responsibility |
|---|---|
window/ | OS-level windows, GL/Metal/WebGPU contexts, raw key/mouse events. The ::window::Window trait is what the GUI thread holds. |
wezterm-font/ | Font discovery (FreeType/CoreText/DirectWrite), glyph rasterization, HarfBuzz shaping. FontConfiguration lives here. |
termwiz/ | ”Terminal Wizardry.” The escape-sequence parser, cell model, surface diffing. wezterm-escape-parser is the low-level state machine. |
term/ | The virtual terminal emulator core — interprets vtparse actions into a 2D cell grid with attributes. Output of read_from_pane_pty flows through here. |
vtparse/ | Pure VT500-family escape parser. Stateless, table-driven, no allocations on the hot path. |
mux/ | The terminal multiplexer — owns all panes, tabs, windows, the per-pane PTY reader threads, and the notification fan-out. Mux::get() is the global. |
pty/ | Cross-platform PTY allocation (openpty on Unix, ConPTY on Windows). portable_pty is its public alias. |
wezterm-client/ | Client side of the mux protocol — used by unterm GUI to attach to a unterm-mux daemon over Unix socket / TLS. |
wezterm-mux-server-impl/ | Server side of the same mux protocol. Used both by the standalone unterm-mux daemon and by the in-process socket worker the GUI starts. |
config/ | Loads wezterm.lua, exposes ConfigHandle. The Lua API surface for user config. |
wezterm-gui/ | The product layer. Everything Unterm-specific lives here: mcp/, web_settings/, server_info.rs, system_proxy.rs, update_check.rs, i18n/, recording/, overlay/* (settings menu, theme picker, command palette), termwindow/ (the window controller). The binary built from this crate is unterm. |
wezterm/ | The CLI binary, built as unterm-cli. Hosts both the legacy cli subcommands (mux RPC) and the new unterm_cli/ subcommands (MCP RPC). |
Everything below wezterm-gui/ is a near-verbatim WezTerm crate; everything in wezterm-gui/src/{mcp,web_settings,server_info,system_proxy,update_check,i18n,recording} is Unterm-specific.
The keystroke-to-pixel pipeline
Pick a single keypress — the user types a. Trace it:
1. OS: NSWindow / X11 / Wayland delivers a key event to the window crate.
2. window: Connection event handler converts to KeyEvent, fires window callback.
3. termwindow::keyevent::raw_key_event_impl (wezterm-gui/src/termwindow/keyevent.rs:430)
4. inputmap::lookup_key — does it match a configured key binding?
If yes: dispatch KeyAssignment (paste, spawn, …) and stop.
If no: continue.
5. encode_kitty_input / encode_win32_input — optional protocol encoding for
apps that requested it via DECRPM (vim, helix, etc).
Falls back to raw bytes for plain mode.
6. pane.writer().write_all(b"a") — writes to the pty master fd.
7. Kernel copies bytes through pty into the slave fd.
8. Shell reads from its stdin, processes (echoing in canonical mode), writes
its rendering of the cursor advance back to the master fd.
9. read_from_pane_pty (mux/src/lib.rs:299) — the per-pane blocking reader
thread reads bytes off the master fd into a 32K buffer.
10. parse_buffered_data — feeds bytes to vtparse, which feeds Action events to
term::Terminal::perform_actions. Mutates the cell grid.
11. Mux fires MuxNotification::PaneOutput → invalidate the GUI window.
12. GUI thread next paint cycle calls termwindow::render::paint::paint_pass
(wezterm-gui/src/termwindow/render/paint.rs:162).
13. Per-line shaping: wezterm-font shapes runs of cells with HarfBuzz and
caches GlyphCache entries.
14. Quads are pushed to the GL layer. glium (or webgpu) submits the draw call.
15. SwapBuffers / Metal present.
The interesting hot paths:
- Steps 6-8 cross the kernel. This is where most “feels laggy” complaints come from. There’s no work Unterm can do here.
- Step 9 lives on a dedicated thread per pane. It does blocking reads. If a pane spews multi-megabyte output (
yes,cat largefile), this thread can saturate;parse_buffered_datais the buffer between it and the grid mutator. - Step 11’s notification is coalesced. The GUI doesn’t repaint per-byte — it repaints once per frame (vsync) and absorbs all mutations that landed in the meantime.
- Step 13 is the most allocation-heavy step.
wezterm-font/src/shaper/cachesGlyphInfoaggressively; cache misses on cold lines (resize, theme switch) are the biggest single cost on a slow paint.
End-to-end target on a healthy machine is well under 16 ms (one vsync). Pathological cases — big terminal, complex CJK shaping, fresh font load — push into 30-60 ms. That’s normal; below 16 ms is “feels native,” above 33 ms is “feels laggy.”
The MCP control plane
The MCP server is a thin TCP-and-JSON wrapper around McpHandler, which is itself a thin wrapper around Mux::get(). Listener at wezterm-gui/src/mcp/server.rs:51; dispatch table at wezterm-gui/src/mcp/handler.rs:101.
The flow for a remote session.input (the agent types into a pane):
agent → 127.0.0.1:19876
│
▼
mcp-server thread (run_server)
│ one thread::spawn per accepted connection
▼
mcp-client thread (handle_client)
│ parses one JSON line; checks auth_token
▼
McpHandler::handle("session.input", params)
│
▼
Mux::get_pane(id) ← same global Mux the GUI is reading from
│
▼
pane.writer().write_all(bytes) ← pty master fd
That last step is identical to step 6 of the keystroke pipeline. From the shell’s point of view, agent input and human input are byte-identical.
State that the MCP plane needs to mutate but the Mux doesn’t naturally own — proxy settings, command policy, audit log, recording state — lives in a OnceLock<Mutex<McpState>> inside handler.rs. This is deliberately not in the Mux because it’s product surface, not terminal-emulation surface; the Mux remains close to upstream.
When the MCP plane changes pane state (spawn, resize, kill), it dispatches the same MuxNotification events the GUI key path does. The GUI repaints, the agent never has to “tell the GUI to update.” One Mux, one notification stream, three control planes reading and writing the same in-memory state.
The Web Settings server
wezterm-gui/src/web_settings/server.rs is a hand-rolled HTTP/1.1 server — no async runtime, no Hyper, no Actix. Routes at wezterm-gui/src/web_settings/server.rs:8:
GET / — SPA shell (assets::INDEX_HTML)
GET /static/<name> — bundled Tailwind / Alpine / app.js
GET /bootstrap.json — token + ports (no auth — so the SPA can self-bootstrap)
GET /api/health — liveness
GET /api/state — aggregate snapshot (theme, proxy, recording, instances)
POST /api/proxy — proxy_configure / proxy_disable
POST /api/theme — writes ~/.unterm/theme.json
POST /api/recording/start — recording::start_recording
POST /api/recording/stop — recording::stop_recording
GET /api/sessions — list recorded sessions on disk
GET /api/sessions/:id/markdown — read a redacted session transcript
No keep-alive, no chunked encoding, no upgrade dance. Browsers handle that fine for a localhost-only SPA.
The SPA itself is index.html + app.js + style.css + vendored tailwind.js and alpine.js. All five files live in wezterm-gui/assets/settings/ and are inlined into the binary at build time via include_str! (see wezterm-gui/src/web_settings/assets.rs:8). There is no JS build step. Editing the SPA is “edit the file, rebuild the binary.” This is intentional: it keeps the contributor onboarding ramp on the SPA exactly the same as the Rust ramp.
When the user clicks “Apply theme” in the SPA:
browser POST /api/theme {name: "dracula"}
│
▼
HTTP handler thread
│
▼
write ~/.unterm/theme.json ← config-watcher in `config/` crate notices
│
▼
ConfigHandle reload fires ← all subscribers in the GUI get a notification
│
▼
GUI repaints with new colors
The HTTP plane never touches GL state directly. It writes a config file; the existing config-reload pipeline does the rest.
Discovery and multi-instance
wezterm-gui/src/server_info.rs is the discovery layer. Each launched unterm claims the lowest free NATO-phonetic name (alpha, bravo, charlie … zulu, then alpha2, bravo2, …) via O_EXCL atomic file creation. Conflicts when two instances start in the same millisecond resolve by retry on EEXIST.
On disk:
~/.unterm/
├── instances/
│ ├── alpha.json { id, mcp_port, http_port, auth_token, pid, started_at, … }
│ ├── bravo.json
│ └── charlie.json
├── server.json — pointer to the active instance (back-compat for single-instance agents)
├── active.json — explicit active-instance pointer
└── theme.json, lang.json, proxy.json, update_check.json, …
The MCP instance.list / instance.info / instance.focus / instance.set_title methods enumerate these. unterm-cli session list walks the directory and offers each instance to the user. active.json is updated only when the previous active instance dies — not on every focus event — so disk IO stays minimal.
For more on multi-instance discovery and the active-instance handoff, see the multi-instance doc (separate page).
Why a fork, not a plugin
The obvious alternative would have been a WezTerm Lua plugin. Several Unterm features can’t be implemented as Lua:
- Local TCP servers. Lua-on-WezTerm has no
TcpListener. Spawning a JSON-RPC server from inside the Lua sandbox is not on offer. - HTTP server with bundled SPA. Same constraint — and even if you could open the listener from Lua, embedding ~150 KB of Tailwind+Alpine assets at compile time wants
include_str!, which is a Rust-side concern. - Recording with token redaction. The redaction pipeline (
wezterm-gui/src/recording/redact.rs) wants to scan raw byte streams as they flow throughparse_buffered_data— Lua-side hooks fire after escape parsing, too late for byte-level masking. - macOS code signing + notarization + the WiX MSI.
ci/sign-macos.shandinstaller/Unterm.wxsneed a binary they own —weztermcodesigned binaries with the user’s plugin loaded would require either a separate signed binary anyway (which is the fork) or running unsigned (which is a non-starter for distribution). - i18n. The translated string set spans menus, modals, settings UI, CLI output, and toast notifications. Threading nine locale dictionaries through
wezterm.luawould be its own product. - Multi-instance discovery and per-instance NATO naming. Has to be in the binary so that the GUI process knows its own identity from the moment it boots.
So the fork is honest about what it is: a thin product layer on top of WezTerm’s renderer, font, mux, and SSH stack, owning the build pipeline, the binary identity, and the user-facing surface. Upstream WezTerm is left untouched in our tree (everything outside wezterm-gui/src/{mcp,web_settings,server_info,system_proxy,update_check,i18n,recording} and the renamed binaries is essentially upstream).
What changes upstream from WezTerm
The Unterm-specific surface, top to bottom:
| Path | Purpose |
|---|---|
wezterm-gui/src/mcp/ | MCP JSON-RPC server: server.rs (TCP listener), handler.rs (~50 methods). |
wezterm-gui/src/web_settings/ | Hand-rolled HTTP/1.1 server + bundled SPA loader. |
wezterm-gui/src/server_info.rs | Multi-instance NATO naming + on-disk discovery. |
wezterm-gui/src/system_proxy.rs | OS proxy auto-detection (scutil / gsettings / registry / port scan). |
wezterm-gui/src/update_check.rs | Background GitHub release poller. |
wezterm-gui/src/i18n/ | 9-locale translation table with embedded JSON dictionaries. |
wezterm-gui/src/recording/ | Pane-byte recording with token/secret redaction → markdown. |
wezterm-gui/src/session_state.rs | Last-window position/size persistence (the menu bar restore). |
wezterm-gui/src/overlay/{settings_menu,theme_selector,proxy_settings,shell_selector,tab_context_menu}.rs | Unterm-specific overlay screens. |
wezterm-gui/assets/settings/ | The web settings SPA — index.html, app.js, style.css, vendored Tailwind+Alpine. |
wezterm/src/unterm_cli/ | CLI subcommands that dispatch over MCP rather than the legacy mux protocol. |
ci/sign-macos.sh, ci/release-mac.sh | Codesign + notarize pipeline. |
installer/Unterm.wxs | Windows MSI definition. |
assets/macos/, assets/icon/ | Renamed bundle ID (com.unzoo.unterm), Unterm.icns. |
| Binary names | unterm (GUI), unterm-cli, unterm-mux. The crate names were not renamed (wezterm-gui still builds the unterm binary) so upstream merges stay tractable. |
Where to start contributing
By intent:
Adding a new MCP method — wezterm-gui/src/mcp/handler.rs. Add a match arm in McpHandler::handle (~line 101), implement the handler. If the method needs new persistent state, it goes either in ~/.unterm/<feature>.json (write through serde_json::to_writer) or in the McpState struct if it’s per-process and ephemeral. Add the method to server_capabilities (~line 280) so introspection works.
Adding a Web Settings page — three places: a route in wezterm-gui/src/web_settings/server.rs:8, a section in wezterm-gui/assets/settings/index.html, an Alpine component in wezterm-gui/assets/settings/app.js. The data plumbing should call existing McpHandler methods rather than reimplementing the operation — the SPA is a UI for the MCP surface, not a parallel implementation.
Adding a new CLI subcommand — wezterm/src/unterm_cli/<name>.rs, registered in wezterm/src/unterm_cli/mod.rs:8. The new command should be a thin client over the MCP method you already added — see proxy.rs for the pattern.
Fixing a render bug — wezterm-gui/src/termwindow/render/. The frame entry point is paint_pass at paint.rs:162. Per-line shaping lives in wezterm-font/src/shaper/. If the bug is in the cell grid (wrong character, wrong attribute), it’s upstream of paint — look at term/ and termwiz/.
Fixing a font / shaping bug — wezterm-font/. Shapers under shaper/, font discovery under locator/, rasterizers under rasterizer/. The HarfBuzz wrapper is hbwrap.rs, FreeType is ftwrap.rs. CoreText is the macOS preferred path; FreeType+HarfBuzz is the cross-platform fallback.
Changing the multi-instance discovery semantics — wezterm-gui/src/server_info.rs. The on-disk format is documented at the top of that file. Be careful with active.json semantics: the design lock-in (2026-05-02) is that it updates only on instance death, not on focus, so the disk write rate stays bounded.
Touching the MCP auth model — wezterm-gui/src/mcp/server.rs:114. The auth-login dance happens once per connection. The token is regenerated on every launch and lives in ~/.unterm/server.json with mode 0600. Don’t add a way to read it from the LAN.
Adding a translation — wezterm-gui/src/i18n/locales/. Copy en.json, translate values, register the new code in LOCALE_CODES and LOCALE_BUNDLES in wezterm-gui/src/i18n/mod.rs:29. The CLI, GUI menus, and Web Settings dictionaries all read from the same embedded data.
Touching session recording — wezterm-gui/src/recording/. recorder.rs is the lifecycle, redact.rs is the token-masking pass, render.rs is the markdown emission, index.rs is the on-disk session listing. Recordings are byte streams, not key streams — the redactor sees what the shell actually wrote, not what the user typed.
If you’re filing a PR and you’re not sure where it fits, the rule of thumb: anything that an external agent should be able to do, the MCP handler is the right entry point, and everything else (CLI, Web Settings, GUI menu) should layer on top of that handler rather than reimplement it. The MCP handler is the single product API; the rest are presentations.