
Architecture documented by Eric Cheng and his AI assistant. This page visualizes their original writeup. See v2 for Discord-first, Qwen3-TTS, and Mac mini adaptations.

OpenClaw → iMessage Flow

A complete pipeline that lets the owner command an AI assistant over Telegram; the assistant then sends text and TTS voice messages to contacts via iMessage. Everything runs on dedicated, isolated hardware with standalone accounts.

High-Level Architecture

      graph LR
        OWNER["Owner<br/>Telegram App"]
        GW["Dedicated MacBook Pro<br/>OpenClaw Gateway"]
        AGENT["Main Agent<br/>Claude Opus 4.6"]
        PROFILE["Contact Profile<br/>memory/contacts/"]
        KOKORO["Kokoro TTS<br/>82M ONNX"]
        FFMPEG["ffmpeg<br/>WAV → M4A"]
        IMSG["imsg CLI<br/>Messages.app"]
        RECV["Recipient<br/>iMessage"]

        OWNER -->|"command"| GW
        GW --> AGENT
        AGENT -->|"reads"| PROFILE
        AGENT -->|"text"| KOKORO
        KOKORO -->|"WAV"| FFMPEG
        FFMPEG -->|"M4A"| IMSG
        AGENT -->|"text only"| IMSG
        IMSG -->|"iMessage"| RECV
        AGENT -->|"confirms"| GW
        GW -->|"reply"| OWNER

        classDef owner fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
        classDef mac fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
        classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
        classDef recv fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0

        class OWNER owner
        class GW,AGENT,PROFILE mac
        class KOKORO,FFMPEG tts
        class IMSG imsg
        class RECV recv

Telegram (command surface)

OpenClaw (processing)

TTS pipeline

iMessage (delivery)

Hardware & Account Isolation

Dedicated MacBook Pro

Runs OpenClaw 24/7 — not the owner's personal machine

Standalone Apple ID (not the owner's personal account)
Dedicated email address for the assistant
Dedicated eSIM on a standalone iPhone
Home network + Tailscale for remote access

Why Physical Isolation Matters

Security through separation

iMessage requires a real Apple ID signed into Messages.app
Full Disk Access granted to OpenClaw.app process only
No personal data leakage — assistant's machine has assistant's accounts only
External boot drive doubles as a physical kill switch: unplugging it takes the assistant offline

Message Flow — Step by Step

      graph TD
        CMD["Owner sends command<br/>'Send X a voice message about Y'"]
        ACK["Agent reacts 👀<br/>reads contact profile + permissions"]
        COMPOSE["Compose message<br/>language, tone, dialect per profile"]
        PREVIEW["Draft preview<br/>sent to owner on Telegram"]
        APPROVE{"Owner approves?"}
        ITERATE["Iterate<br/>'more grit' / 'shorter'"]
        TTS_CHECK{"Voice message?"}
        TTS["6a. Kokoro TTS<br/>generate speech"]
        CONVERT["6b. ffmpeg<br/>WAV → M4A"]
        SEND_TEXT["imsg send --text"]
        SEND_VOICE["imsg send --file"]
        LOG["Log to outbound-log.md<br/>timestamp, recipient, approval ref"]
        CONFIRM["Confirm delivery<br/>to owner on Telegram"]

        CMD --> ACK
        ACK --> COMPOSE
        COMPOSE --> PREVIEW
        PREVIEW --> APPROVE
        APPROVE -->|"no / revise"| ITERATE
        ITERATE --> COMPOSE
        APPROVE -->|"yes"| TTS_CHECK
        TTS_CHECK -->|"yes"| TTS
        TTS --> CONVERT
        CONVERT --> SEND_VOICE
        TTS_CHECK -->|"no"| SEND_TEXT
        SEND_TEXT --> LOG
        SEND_VOICE --> LOG
        LOG --> CONFIRM

        classDef cmd fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
        classDef agent fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
        classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
        classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef send fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
        classDef log fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0

        class CMD,PREVIEW,CONFIRM,ITERATE cmd
        class ACK,COMPOSE agent
        class APPROVE,TTS_CHECK decision
        class TTS,CONVERT tts
        class SEND_TEXT,SEND_VOICE send
        class LOG log

Telegram interaction

Agent processing

TTS / decision

iMessage send

Audit

Step Details

1. Owner sends command on Telegram

Natural language: "Send María a voice message wishing her happy birthday". Telegram is the only inbound command surface — the owner never touches the MacBook directly.

2. Agent reads context

Reacts with 👀, then loads memory/contacts/<name>.md for language, tone, voice preferences. Checks PERMISSIONS.md to verify outbound is allowed for this contact.
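
The context-loading step above could be sketched roughly as follows. The writeup does not show the layout of PERMISSIONS.md, so this sketch assumes a hypothetical format with one `- <name>: outbound` bullet per allowed contact; the function names are illustrative as well.

```python
import re
from pathlib import Path

CONTACTS_DIR = Path("memory/contacts")  # directory named in the writeup


def profile_path(name: str) -> Path:
    """Map a contact name to memory/contacts/<name>.md, rejecting path tricks."""
    slug = name.strip().lower()
    if not re.fullmatch(r"[\w\-]+", slug):
        raise ValueError(f"unsafe contact name: {name!r}")
    return CONTACTS_DIR / f"{slug}.md"


def outbound_allowed(permissions_text: str, name: str) -> bool:
    """Default deny: allow only if an explicit '- <name>: outbound' line exists.

    The line format is an assumption made for this sketch.
    """
    wanted = name.strip().lower()
    for line in permissions_text.splitlines():
        match = re.fullmatch(r"-\s*([\w\-]+)\s*:\s*outbound", line.strip().lower())
        if match and match.group(1) == wanted:
            return True
    return False
```

Returning `False` for anything that is not explicitly listed keeps the check consistent with the default-deny rule described in the security section.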

3. Compose message

Writes the message respecting the contact's language (e.g. Castilian Spanish with distinción), humor level, and relationship dynamic. Adapts tone per profile.

4. Draft preview → Owner on Telegram

The full draft is sent back for review. The owner can approve, request changes ("shorter", "more casual"), or cancel. Nothing leaves the Mac until explicitly approved.
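
The approve / revise / cancel loop can be sketched as a small classifier plus a driver. The approval phrases are the ones the writeup lists for the approval step; the cancel phrases, the function names, and the rule that any other reply counts as a revision request are assumptions made for illustration.

```python
APPROVE = {"yes", "send it", "perfect"}     # phrases quoted in the writeup
CANCEL = {"cancel", "stop", "never mind"}   # assumed for this sketch


def classify_reply(reply: str) -> str:
    """Return 'approve', 'cancel', or 'revise' for an owner reply."""
    text = reply.strip().lower().rstrip(".!")
    if text in APPROVE:
        return "approve"
    if text in CANCEL:
        return "cancel"
    # Anything else is treated as feedback, so nothing is sent by accident.
    return "revise"


def review_loop(draft, replies, revise):
    """Feed owner replies through the loop; return the approved draft or None."""
    for reply in replies:
        decision = classify_reply(reply)
        if decision == "approve":
            return draft
        if decision == "cancel":
            return None
        draft = revise(draft, reply)  # back to the compose step
    return None  # no approval received, so nothing is sent
```

The driver only ever returns a draft on an explicit approval, which mirrors the rule that nothing leaves the Mac until the owner approves.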

5. Owner approves

"yes" / "send it" / "perfect". If the owner wants changes, the loop returns to step 3. Multiple rounds are normal.

6. TTS generation (voice messages only)

Kokoro-82M generates speech using the contact's assigned voice (e.g. em_alex, a Spanish male voice). The WAV output is then converted by ffmpeg to M4A (AAC at 128 kbps) for iMessage compatibility. Generation takes roughly 2.3 s per sentence on Apple Silicon.
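
A minimal sketch of the conversion half of this step, assuming ffmpeg is on the PATH. The exact flags used in the original setup are not given; the ones below are standard ffmpeg options for encoding AAC at 128 kbps. The Kokoro synthesis call is left out, and the function names are illustrative.

```python
import subprocess
from pathlib import Path


def ffmpeg_args(wav: Path, m4a: Path, bitrate: str = "128k") -> list:
    """Build the ffmpeg command that re-encodes a WAV file as AAC in an M4A container."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(wav),    # input: WAV produced by the TTS step
        "-c:a", "aac",     # ffmpeg's built-in AAC encoder
        "-b:a", bitrate,   # audio bitrate (128k per the writeup)
        str(m4a),
    ]


def convert(wav: Path) -> Path:
    """Run ffmpeg and return the path of the resulting .m4a file."""
    m4a = wav.with_suffix(".m4a")
    subprocess.run(ffmpeg_args(wav, m4a), check=True)
    return m4a
```

Keeping the argument list in its own function makes the command easy to inspect or log before anything is executed.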

7. Send via iMessage

Text: imsg send --to "+1..." --text "message"
Voice: imsg send --to "+1..." --file output.m4a
The CLI automates Messages.app — recipient sees a normal iMessage.
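
The two invocations above could be wrapped along these lines. Only the `--to`, `--text`, and `--file` flags shown in the writeup are used; the wrapper names and the placeholder phone number in the usage note are hypothetical.

```python
import subprocess
from typing import Optional


def imsg_send_args(to: str, text: Optional[str] = None, file: Optional[str] = None) -> list:
    """Build an `imsg send` argument list for a text and/or file message."""
    if not text and not file:
        raise ValueError("need text, a file, or both")
    args = ["imsg", "send", "--to", to]
    if text:
        args += ["--text", text]
    if file:
        args += ["--file", file]
    return args


def send(to: str, text: Optional[str] = None, file: Optional[str] = None) -> None:
    """Invoke the CLI; raises CalledProcessError if imsg exits non-zero."""
    subprocess.run(imsg_send_args(to, text, file), check=True)
```

For example, `send("+15555550100", file="output.m4a")` would deliver a voice message. Passing an argument list rather than a shell string means the message text never goes through shell quoting.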

8. Log & confirm

Every outbound message is logged to memory/outbound-log.md with timestamp, channel, recipient, summary, and approval reference. A delivery confirmation is then sent to the owner on Telegram.
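
One way the audit entry might be written is sketched below. The fields are the ones listed above; the single-line, pipe-separated layout and the function names are assumptions, since the writeup does not show the log format.

```python
from datetime import datetime, timezone


def format_log_entry(when: datetime, channel: str, recipient: str,
                     summary: str, approval_ref: str) -> str:
    """Render one outbound message as a single Markdown bullet."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    summary = " ".join(summary.split())  # keep the entry on one line
    return f"- {stamp} | {channel} | {recipient} | {summary} | approval: {approval_ref}\n"


def append_log(path: str, entry: str) -> None:
    """Append-only write, so earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry)
```

Opening the file in append mode keeps the log a simple chronological record that is easy to review by hand.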

TTS Pipeline — Text to Voice

      graph LR
        TEXT["Approved text<br/>Castilian Spanish"]
        PROFILE["Voice profile<br/>em_alex, speed 1.0"]
        KOKORO["Kokoro-82M<br/>ONNX · Python 3.12"]
        WAV["output.wav<br/>~2.3s/sentence"]
        FFMPEG["ffmpeg<br/>AAC 128k"]
        M4A["output.m4a<br/>iMessage-ready"]

        TEXT --> KOKORO
        PROFILE --> KOKORO
        KOKORO --> WAV
        WAV --> FFMPEG
        FFMPEG --> M4A

        classDef input fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
        classDef process fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef output fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0

        class TEXT,PROFILE input
        class KOKORO,FFMPEG process
        class WAV,M4A output

Kokoro-82M

Apache 2.0 · ONNX runtime

82M parameter model — runs on CPU
~2.3 seconds per sentence on Apple Silicon
8 languages: EN, ES, FR, IT, PT, JA, ZH, HI
Per-contact voice assignment via profiles

Voice Assignment

Contact profiles control voice selection

Default English: af_heart (female)
Per-contact voices in memory/contacts/*.md
Language derived from the profile's dialect setting
Speed, pitch customizable per contact
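
Voice selection with a fallback to the default could look like the sketch below. The default voice af_heart comes from the list above; the dictionary keys and the `en-us` language code are assumptions for illustration.

```python
DEFAULT_VOICE = {"voice": "af_heart", "lang": "en-us", "speed": 1.0}


def resolve_voice(profile: dict) -> dict:
    """Overlay a contact's voice settings on the defaults."""
    settings = dict(DEFAULT_VOICE)
    for key in settings:
        if profile.get(key) is not None:
            settings[key] = profile[key]
    settings["speed"] = float(settings["speed"])  # profiles may store it as text
    return settings
```

A contact without any voice settings gets the default English voice, while a profile that sets only `voice` and `lang` still inherits the default speed.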

Security Architecture — 5 Layers

      graph TD
        REQ["Outbound message request"]
        L1["Layer 1: SOUL.md<br/>Hard rules · Default DENY"]
        L2["Layer 2: AGENTS.md<br/>Mandatory permission check"]
        L3["Layer 3: PERMISSIONS.md<br/>Per-contact allowlists"]
        L4["Layer 4: Channel Config<br/>dmPolicy: allowlist<br/>groupPolicy: disabled"]
        L5["Layer 5: Audit Log<br/>outbound-log.md"]
        SEND["✅ Message sent"]
        BLOCK["❌ Blocked"]

        REQ --> L1
        L1 -->|"pass"| L2
        L1 -->|"deny"| BLOCK
        L2 -->|"pass"| L3
        L2 -->|"deny"| BLOCK
        L3 -->|"approved contact"| L4
        L3 -->|"unknown contact"| BLOCK
        L4 -->|"pass"| L5
        L5 --> SEND

        classDef req fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
        classDef layer fill:#fff1f2,stroke:#be123c,stroke-width:1.5px,color:#881337
        classDef pass fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
        classDef block fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d

        class REQ req
        class L1,L2,L3,L4,L5 layer
        class SEND pass
        class BLOCK block
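
The layered, default-deny evaluation in the diagram can be illustrated with a short chain of checks. This is a sketch of the general pattern, not OpenClaw's actual enforcement code; the `Request` fields and the layer predicates are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Request(NamedTuple):
    contact: str
    channel: str
    is_group: bool


Check = Callable[[Request], bool]


def evaluate(request: Request, layers: List[Tuple[str, Check]]) -> Tuple[bool, str]:
    """Run layers in order; the first failing layer blocks the request."""
    if not layers:
        return False, "no layers configured"  # default deny
    for name, check in layers:
        try:
            ok = check(request)
        except Exception:
            ok = False  # a crashing check counts as a deny
        if not ok:
            return False, name
    return True, "all layers passed"
```

Each layer is a `(name, predicate)` pair, so the result names the layer that blocked the request, which is convenient when writing the audit entry.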

Permission Tiers

Tier	Policy	Applies To
T1 — Autonomous	No approval needed	Read messages, triage, 2FA for own accounts
T2 — Pre-approved	Allowed for specific contacts + message types	Not yet configured — future expansion
T3 — Approval Required	Draft → preview → approve → send	All outbound messages (current default)
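
The tier table can be read as a lookup that falls through to T3. In the sketch below the action names are hypothetical labels for the table entries, and the pre-approved set is empty because T2 is not yet configured.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    AUTONOMOUS = 1         # T1
    PRE_APPROVED = 2       # T2
    APPROVAL_REQUIRED = 3  # T3


AUTONOMOUS_ACTIONS = {"read_messages", "triage", "own_account_2fa"}
PRE_APPROVED: set = set()  # (contact, message_type) pairs; empty for now


def tier_for(action: str, contact: Optional[str] = None,
             message_type: Optional[str] = None) -> Tier:
    """Return the tier governing an action; unknown actions need approval."""
    if action in AUTONOMOUS_ACTIONS:
        return Tier.AUTONOMOUS
    if action == "send" and (contact, message_type) in PRE_APPROVED:
        return Tier.PRE_APPROVED
    return Tier.APPROVAL_REQUIRED
```

Because the final branch is the catch-all, any action that is not explicitly listed ends up requiring approval.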

Inbound Monitoring — Reply Detection

      graph LR
        HB["⏰ Heartbeat<br/>every 15 min"]
        CHATS["imsg chats<br/>--limit 10"]
        HIST["imsg history<br/>--chat-id N"]
        TRIAGE{"Triage"}
        URGENT["🔴 URGENT<br/>relay immediately"]
        NOTABLE["🟡 NOTABLE<br/>relay to owner"]
        SKIP["⚪ SKIP<br/>routine, ignore"]

        HB --> CHATS
        CHATS --> HIST
        HIST --> TRIAGE
        TRIAGE --> URGENT
        TRIAGE --> NOTABLE
        TRIAGE --> SKIP

        classDef hb fill:#eef2ff,stroke:#6366f1,stroke-width:1.5px,color:#3730a3
        classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:1.5px,color:#e2e8f0
        classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
        classDef urgent fill:#fef2f2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d
        classDef notable fill:#fffbeb,stroke:#ca8a04,stroke-width:1.5px,color:#713f12
        classDef skip fill:#f3f4f6,stroke:#6b7280,stroke-width:1px,color:#374151

        class HB hb
        class CHATS,HIST imsg
        class TRIAGE decision
        class URGENT urgent
        class NOTABLE notable
        class SKIP skip
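
A heartbeat pass might assemble its CLI calls and route triage results as sketched below. Only the `imsg chats --limit` and `imsg history --chat-id` forms from the diagram are used. Parsing the CLI output and the triage classification itself (handled by a sub-agent according to the Software Stack section) are out of scope here, and the action names are hypothetical.

```python
def heartbeat_commands(chat_ids, limit=10):
    """Commands for one heartbeat: list recent chats, then fetch each history."""
    commands = [["imsg", "chats", "--limit", str(limit)]]
    commands += [["imsg", "history", "--chat-id", str(cid)] for cid in chat_ids]
    return commands


ACTIONS = {
    "URGENT": "relay_immediately",
    "NOTABLE": "relay",
    "SKIP": "ignore",
}


def action_for(label: str) -> str:
    """Map a triage label to an action; unrecognized labels are relayed, not dropped."""
    return ACTIONS.get(label.strip().upper(), "relay")
```

Relaying on an unrecognized label is a deliberately cautious assumption: a misclassified message reaches the owner instead of disappearing.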

What unauthorized contacts experience

No indication that an AI is involved

They receive an iMessage from the assistant's Apple ID (looks like a normal person)
If they reply, the message lands in Messages.app on the Mac
OpenClaw's iMessage channel silently drops it: the agent is not invoked, and no response or error goes back to the sender
The reply still sits in Messages.app, so the assistant can read it via the imsg CLI during heartbeats and relay it to the owner
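
The silent-drop behavior of an allowlist policy can be illustrated as follows. This is a sketch of the pattern rather than OpenClaw's implementation; the function names, the `allow_from` parameter, and the handle normalization rules are assumptions.

```python
from typing import Iterable, Optional


def normalize(handle: str) -> str:
    """Lowercase emails; reduce phone numbers to '+' and digits."""
    value = handle.strip().lower()
    if "@" in value:
        return value
    return "".join(ch for ch in value if ch.isdigit() or ch == "+")


def route_inbound(sender: str, text: str, allow_from: Iterable[str]) -> Optional[dict]:
    """Return an event for allowlisted senders, or None for everyone else."""
    allowed = {normalize(handle) for handle in allow_from}
    if normalize(sender) not in allowed:
        return None  # silent drop: no reply, no error, nothing for the agent
    return {"sender": normalize(sender), "text": text}
```

Returning `None` instead of raising keeps the drop invisible to the sender, which matches the experience described above.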

Software Stack

OpenClaw Gateway

v2026.3.2 · LaunchAgent

Main agent: Claude Opus 4.6 (Anthropic)
Sub-agents: Kimi K2.5 via OpenRouter (heartbeats, triage)
Config: ~/.openclaw/openclaw.json
Process: /Applications/OpenClaw.app

Telegram Channel

Inbound command surface

DM policy: pairing (one-time approval code for unknowns)
Role: Admin channel — commands issued and confirmed here
Owner's sender ID allowlisted

iMessage Channel

Outbound messaging surface

CLI: imsg (Homebrew)
DM policy: allowlist (silently drops unknowns)
Group policy: disabled
Full Disk Access (FDA): granted to OpenClaw.app only

TTS Stack

Local, no cloud dependency

Kokoro-82M: ONNX, Python 3.12 venv
ffmpeg: WAV → M4A conversion
License: Apache 2.0 (commercial OK)
Languages: EN, ES, FR, IT, PT, JA, ZH, HI

Contact Profiles

Stored in `memory/contacts/<name>.md`

Each contact gets personalized handling

Phone / handles — contact details for each channel
Language & dialect — e.g. Castilian Spanish with distinción, vosotros
TTS voice — assigned voice ID, language, speed
Tone — playful, formal, technical — calibrated per relationship
Relationship context — dynamics, inside jokes, communication history
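
The profile fields listed above could be modeled as a small record type. The writeup does not show the actual layout of a profile file, so the `- key: value` bullet format, the field names, and the defaults other than af_heart are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ContactProfile:
    name: str
    handles: dict = field(default_factory=dict)  # channel -> phone or handle
    language: str = "en"
    voice: str = "af_heart"
    speed: float = 1.0
    tone: str = "neutral"
    notes: str = ""  # relationship context


def parse_profile(name: str, markdown: str) -> ContactProfile:
    """Parse '- key: value' bullets; unknown keys and other lines are ignored."""
    profile = ContactProfile(name=name)
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or ":" not in line:
            continue
        key, value = line[2:].split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key in ("imessage", "sms", "telegram"):
            profile.handles[key] = value
        elif key == "speed":
            profile.speed = float(value)
        elif key in ("language", "voice", "tone", "notes"):
            setattr(profile, key, value)
    return profile
```

Defaults on the dataclass mean a sparse profile still yields usable settings for composing and voicing a message.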

Future Expansion

Tier 2 Autonomy

Pre-approve specific contacts + message types. E.g. "confirm plans with X" without requiring approval each time.

Agent Mode

Allow the assistant to respond autonomously to approved contacts using personality profiles — a conversational AI with the owner's voice.

SMS & WhatsApp

SMS is available via the dedicated iPhone's eSIM (imsg send --service sms). WhatsApp would run as a linked device on the same Mac. Both would use the same permission framework.

Higher-Quality TTS

Fish Speech S1-mini is under consideration, pending license approval. It offers more natural prosody and voice cloning.

Core principle: The owner is always in control. Every outbound message requires explicit approval. The assistant has the capability to send messages autonomously — but the permission defaults to deny. Trust is earned incrementally, tier by tier.