Architecture documented by Eric Cheng and his AI assistant. This page visualizes their original writeup. See v2 for Discord-first, Qwen3-TTS, and Mac mini adaptations.

OpenClaw → iMessage Flow

A complete pipeline that lets the owner command their AI assistant via Telegram. The assistant then sends messages (text and TTS voice) to contacts via iMessage, all running on dedicated, isolated hardware with standalone accounts.

High-Level Architecture

      graph LR
        OWNER["Owner<br/>Telegram App"]
        GW["Dedicated MacBook Pro<br/>OpenClaw Gateway"]
        AGENT["Main Agent<br/>Claude Opus 4.6"]
        PROFILE["Contact Profile<br/>memory/contacts/"]
        KOKORO["Kokoro TTS<br/>82M ONNX"]
        FFMPEG["ffmpeg<br/>WAV → M4A"]
        IMSG["imsg CLI<br/>Messages.app"]
        RECV["Recipient<br/>iMessage"]
        OWNER -->|"command"| GW
        GW --> AGENT
        AGENT -->|"reads"| PROFILE
        AGENT -->|"text"| KOKORO
        KOKORO -->|"WAV"| FFMPEG
        FFMPEG -->|"M4A"| IMSG
        AGENT -->|"text only"| IMSG
        IMSG -->|"iMessage"| RECV
        AGENT -->|"confirms"| GW
        GW -->|"reply"| OWNER
        classDef owner fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
        classDef mac fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
        classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
        classDef recv fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
        class OWNER owner
        class GW,AGENT,PROFILE mac
        class KOKORO,FFMPEG tts
        class IMSG imsg
        class RECV recv
Telegram (command surface)
OpenClaw (processing)
TTS pipeline
iMessage (delivery)

Hardware & Account Isolation

Dedicated MacBook Pro

Runs OpenClaw 24/7 — not the owner's personal machine
  • Standalone Apple ID (not the owner's personal account)
  • Dedicated email address for the assistant
  • Dedicated eSIM on a standalone iPhone
  • Home network + Tailscale for remote access

Why Physical Isolation Matters

Security through separation
  • iMessage requires a real Apple ID signed into Messages.app
  • Full Disk Access granted to OpenClaw.app process only
  • No personal data leakage — assistant's machine has assistant's accounts only
  • External boot drive = physical kill switch

Message Flow — Step by Step

      graph TD
        CMD["Owner sends command<br/>'Send X a voice message about Y'"]
        ACK["Agent reacts 👀<br/>reads contact profile + permissions"]
        COMPOSE["Compose message<br/>language, tone, dialect per profile"]
        PREVIEW["Draft preview<br/>sent to owner on Telegram"]
        APPROVE{"Owner approves?"}
        ITERATE["Iterate<br/>'more grit' / 'shorter'"]
        TTS_CHECK{"Voice message?"}
        TTS["6a. Kokoro TTS<br/>generate speech"]
        CONVERT["6b. ffmpeg<br/>WAV → M4A"]
        SEND_TEXT["imsg send --text"]
        SEND_VOICE["imsg send --file"]
        LOG["Log to outbound-log.md<br/>timestamp, recipient, approval ref"]
        CONFIRM["Confirm delivery<br/>to owner on Telegram"]
        CMD --> ACK
        ACK --> COMPOSE
        COMPOSE --> PREVIEW
        PREVIEW --> APPROVE
        APPROVE -->|"no / revise"| ITERATE
        ITERATE --> COMPOSE
        APPROVE -->|"yes"| TTS_CHECK
        TTS_CHECK -->|"yes"| TTS
        TTS --> CONVERT
        CONVERT --> SEND_VOICE
        TTS_CHECK -->|"no"| SEND_TEXT
        SEND_TEXT --> LOG
        SEND_VOICE --> LOG
        LOG --> CONFIRM
        classDef cmd fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
        classDef agent fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
        classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
        classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef send fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
        classDef log fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
        class CMD,PREVIEW,CONFIRM,ITERATE cmd
        class ACK,COMPOSE agent
        class APPROVE,TTS_CHECK decision
        class TTS,CONVERT tts
        class SEND_TEXT,SEND_VOICE send
        class LOG log
Telegram interaction
Agent processing
TTS / decision
iMessage send
Audit

Step Details

1. Owner sends command on Telegram

Natural language: "Send María a voice message wishing her happy birthday". Telegram is the only inbound command surface — the owner never touches the MacBook directly.

2. Agent reads context

Reacts with 👀, then loads memory/contacts/<name>.md for language, tone, and voice preferences. Checks PERMISSIONS.md to verify that outbound messaging is allowed for this contact.
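
A minimal sketch of the profile lookup in this step. The writeup does not show the profile file format, so the "key: value" lines parsed here are an assumption, and parse_profile and profile_path are hypothetical helper names.

```python
from pathlib import Path


def profile_path(name: str, root: Path = Path("memory/contacts")) -> Path:
    """Map a contact name to memory/contacts/<name>.md, rejecting path tricks."""
    if not name or name.startswith(".") or "/" in name or "\\" in name:
        raise ValueError(f"unsafe contact name: {name!r}")
    return root / f"{name}.md"


def parse_profile(text: str) -> dict[str, str]:
    """Collect 'key: value' lines (assumed format); skip headings and free text."""
    profile: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("-•* ").strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        profile[key.strip().lower()] = value.strip()
    return profile
```

Rejecting separators in the name keeps a crafted contact name from reading files outside the contacts directory.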

3. Compose message

Writes the message respecting the contact's language (e.g. Castilian Spanish with distinción), humor level, and relationship dynamic. Adapts tone per profile.

4. Draft preview → Owner on Telegram

The full draft is sent back for review. The owner can approve, request changes ("shorter", "more casual"), or cancel. Nothing leaves the Mac until explicitly approved.

5. Owner approves

"yes" / "send it" / "perfect". If the owner wants changes, the loop returns to step 3. Multiple rounds are normal.

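The draft, preview, and approve loop of steps 3 to 5 can be sketched as follows. This is an illustration, not OpenClaw's implementation: the approval phrases come from the text above, while the "cancel" keyword, the round limit, and the compose/ask_owner callables are assumptions.

```python
from typing import Callable, Optional

APPROVALS = {"yes", "send it", "perfect"}  # phrases quoted in step 5
CANCELS = {"cancel"}                       # assumed cancel keyword


def classify_reply(reply: str) -> str:
    """Return 'approve', 'cancel', or 'revise' for an owner reply."""
    normalized = reply.strip().lower().rstrip("!.")
    if normalized in APPROVALS:
        return "approve"
    if normalized in CANCELS:
        return "cancel"
    return "revise"  # anything else is treated as feedback


def approval_loop(
    compose: Callable[[Optional[str]], str],
    ask_owner: Callable[[str], str],
    max_rounds: int = 10,
) -> Optional[str]:
    """Return the approved draft, or None if cancelled or never approved."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = compose(feedback)   # step 3
        reply = ask_owner(draft)    # step 4
        decision = classify_reply(reply)
        if decision == "approve":   # step 5
            return draft
        if decision == "cancel":
            return None
        feedback = reply            # loop back to step 3
    return None                     # default deny: nothing is sent
```

Returning None on every path except explicit approval mirrors the rule that nothing leaves the Mac until approved.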
6. TTS generation (voice messages only)

Kokoro-82M generates speech using the contact's assigned voice (e.g. em_alex for Spanish male). It outputs a WAV file, which ffmpeg converts to M4A (AAC, 128 kbps) for iMessage compatibility. Generation takes about 2.3 s per sentence on Apple Silicon.
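
The conversion half of this step can be sketched as an ffmpeg argument list. The flags are standard ffmpeg options (-c:a aac, -b:a 128k); the function name and the choice to write the .m4a next to the .wav are assumptions, and the Kokoro synthesis call itself is left out.

```python
from pathlib import Path


def ffmpeg_m4a_command(wav_path: str, bitrate: str = "128k") -> list[str]:
    """Build the ffmpeg argv that encodes a WAV file to AAC in an M4A container."""
    wav = Path(wav_path)
    m4a = wav.with_suffix(".m4a")
    return [
        "ffmpeg",
        "-y",              # overwrite an existing output file
        "-i", str(wav),    # input WAV produced by the TTS step
        "-c:a", "aac",     # AAC audio codec
        "-b:a", bitrate,   # audio bitrate, 128k per the text
        str(m4a),
    ]
```

The list can be passed to subprocess.run(cmd, check=True); building it as a list rather than a shell string avoids quoting problems with file names.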

7. Send via iMessage

Text: imsg send --to "+1..." --text "message"
Voice: imsg send --to "+1..." --file output.m4a
The CLI automates Messages.app — recipient sees a normal iMessage.
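
A small wrapper around the two invocations above, as a sketch. Only the flags shown in this step (--to, --text, --file) are used; the function name is hypothetical, and the phone number in the usage note is a placeholder.

```python
from typing import Optional


def imsg_send_command(
    to: str, text: Optional[str] = None, file: Optional[str] = None
) -> list[str]:
    """Build the argv for a text send or a file (voice) send, never both."""
    if (text is None) == (file is None):
        raise ValueError("provide exactly one of text or file")
    command = ["imsg", "send", "--to", to]
    if text is not None:
        command += ["--text", text]
    else:
        command += ["--file", str(file)]
    return command
```

For example, imsg_send_command("+15555550100", file="output.m4a") yields the voice variant, ready for subprocess.run.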

8. Log & confirm

Every outbound message is logged to memory/outbound-log.md with timestamp, channel, recipient, summary, and approval reference. A delivery confirmation is then sent to the owner on Telegram.
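
One way the audit entry could be written, as a sketch. The five fields are the ones listed above; the one-line, pipe-separated layout and the function names are assumptions, since the writeup does not show the log format.

```python
from datetime import datetime, timezone
from typing import Optional


def format_log_entry(
    channel: str,
    recipient: str,
    summary: str,
    approval_ref: str,
    when: Optional[datetime] = None,
) -> str:
    """Render one outbound message as a single Markdown list item."""
    when = when or datetime.now(timezone.utc)
    stamp = when.isoformat(timespec="seconds")
    one_line = " ".join(summary.split())  # keep the entry on one line
    return f"- {stamp} | {channel} | {recipient} | {one_line} | approval: {approval_ref}"


def append_log(path: str, entry: str) -> None:
    """Append the entry; the log is only ever added to, never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```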

TTS Pipeline — Text to Voice

      graph LR
        TEXT["Approved text<br/>Castilian Spanish"]
        PROFILE["Voice profile<br/>em_alex, speed 1.0"]
        KOKORO["Kokoro-82M<br/>ONNX · Python 3.12"]
        WAV["output.wav<br/>~2.3s/sentence"]
        FFMPEG["ffmpeg<br/>AAC 128k"]
        M4A["output.m4a<br/>iMessage-ready"]
        TEXT --> KOKORO
        PROFILE --> KOKORO
        KOKORO --> WAV
        WAV --> FFMPEG
        FFMPEG --> M4A
        classDef input fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
        classDef process fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
        classDef output fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
        class TEXT,PROFILE input
        class KOKORO,FFMPEG process
        class WAV,M4A output

Kokoro-82M

Apache 2.0 · ONNX runtime
  • 82M parameter model — runs on CPU
  • ~2.3 seconds per sentence on Apple Silicon
  • 8 languages: EN, ES, FR, IT, PT, JA, ZH, HI
  • Per-contact voice assignment via profiles

Voice Assignment

Contact profiles control voice selection
  • Default English: af_heart (female)
  • Per-contact voices in memory/contacts/*.md
  • Language auto-detected from profile dialect setting
  • Speed, pitch customizable per contact
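
Voice selection with a fallback to the default can be sketched as below. The "voice" and "speed" profile keys are assumed names, and pitch is not modeled because the writeup does not say how it is applied.

```python
from dataclasses import dataclass

DEFAULT_VOICE = "af_heart"  # default English voice named above
DEFAULT_SPEED = 1.0


@dataclass(frozen=True)
class VoiceSettings:
    voice: str = DEFAULT_VOICE
    speed: float = DEFAULT_SPEED


def resolve_voice(profile: dict) -> VoiceSettings:
    """Use the contact's voice and speed when present, else the defaults."""
    voice = profile.get("voice") or DEFAULT_VOICE
    try:
        speed = float(profile.get("speed", DEFAULT_SPEED))
    except (TypeError, ValueError):
        speed = DEFAULT_SPEED  # ignore malformed values
    return VoiceSettings(voice=voice, speed=speed)
```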

Security Architecture — 5 Layers

      graph TD
        REQ["Outbound message request"]
        L1["Layer 1: SOUL.md<br/>Hard rules · Default DENY"]
        L2["Layer 2: AGENTS.md<br/>Mandatory permission check"]
        L3["Layer 3: PERMISSIONS.md<br/>Per-contact allowlists"]
        L4["Layer 4: Channel Config<br/>dmPolicy: allowlist<br/>groupPolicy: disabled"]
        L5["Layer 5: Audit Log<br/>outbound-log.md"]
        SEND["✅ Message sent"]
        BLOCK["❌ Blocked"]
        REQ --> L1
        L1 -->|"pass"| L2
        L1 -->|"deny"| BLOCK
        L2 -->|"pass"| L3
        L2 -->|"deny"| BLOCK
        L3 -->|"approved contact"| L4
        L3 -->|"unknown contact"| BLOCK
        L4 -->|"pass"| L5
        L5 --> SEND
        classDef req fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
        classDef layer fill:#fff1f2,stroke:#be123c,stroke-width:1.5px,color:#881337
        classDef pass fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
        classDef block fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d
        class REQ req
        class L1,L2,L3,L4,L5 layer
        class SEND pass
        class BLOCK block
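
The layered, default-deny flow in the diagram can be sketched as a chain of checks that stops at the first failure. The layer names come from the diagram; what each layer actually inspects is not specified in the writeup, so the request fields and lambdas below are placeholders. Layer 5 is a recording step rather than a gate, so it is not part of the chain.

```python
from typing import Callable

Request = dict
Check = Callable[[Request], bool]

ALLOWED_CONTACTS = {"maria"}  # hypothetical allowlist contents


def evaluate(request: Request, layers: list[tuple[str, Check]]) -> tuple[bool, str]:
    """Return (True, 'sent') or (False, name of the layer that blocked)."""
    for name, check in layers:
        try:
            passed = bool(check(request))
        except Exception:
            passed = False  # a broken check denies rather than allows
        if not passed:
            return False, name
    return True, "sent"


LAYERS: list[tuple[str, Check]] = [
    ("SOUL.md", lambda r: r.get("owner_approved") is True),
    ("AGENTS.md", lambda r: r.get("permission_checked") is True),
    ("PERMISSIONS.md", lambda r: r.get("contact") in ALLOWED_CONTACTS),
    ("Channel Config", lambda r: not r.get("is_group", False)),
]
```

An empty request fails at the first layer, which is the default-deny behavior the diagram describes.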

Permission Tiers

Tier                  | Policy                                        | Applies To
T1: Autonomous        | No approval needed                            | Read messages, triage, 2FA for own accounts
T2: Pre-approved      | Allowed for specific contacts + message types | Not yet configured (future expansion)
T3: Approval Required | Draft → preview → approve → send              | All outbound messages (current default)
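
The tier table can be read as a lookup that falls through to T3. This is a sketch: the action labels and the (contact, action) pre-approval set are hypothetical, and the pre-approval set stays empty while T2 is not configured.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    T1_AUTONOMOUS = 1
    T2_PREAPPROVED = 2
    T3_APPROVAL_REQUIRED = 3


# Hypothetical labels for the T1 row of the table.
T1_ACTIONS = frozenset({"read_messages", "triage", "own_account_2fa"})


def tier_for(
    action: str,
    contact: Optional[str] = None,
    preapproved: frozenset = frozenset(),
) -> Tier:
    """Classify an action; anything unrecognized lands in T3."""
    if action in T1_ACTIONS:
        return Tier.T1_AUTONOMOUS
    if (contact, action) in preapproved:
        return Tier.T2_PREAPPROVED
    return Tier.T3_APPROVAL_REQUIRED


def needs_owner_approval(tier: Tier) -> bool:
    return tier is Tier.T3_APPROVAL_REQUIRED
```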

Inbound Monitoring — Reply Detection

      graph LR
        HB["⏰ Heartbeat<br/>every 15 min"]
        CHATS["imsg chats<br/>--limit 10"]
        HIST["imsg history<br/>--chat-id N"]
        TRIAGE{"Triage"}
        URGENT["🔴 URGENT<br/>relay immediately"]
        NOTABLE["🟡 NOTABLE<br/>relay to owner"]
        SKIP["⚪ SKIP<br/>routine, ignore"]
        HB --> CHATS
        CHATS --> HIST
        HIST --> TRIAGE
        TRIAGE --> URGENT
        TRIAGE --> NOTABLE
        TRIAGE --> SKIP
        classDef hb fill:#eef2ff,stroke:#6366f1,stroke-width:1.5px,color:#3730a3
        classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:1.5px,color:#e2e8f0
        classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
        classDef urgent fill:#fef2f2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d
        classDef notable fill:#fffbeb,stroke:#ca8a04,stroke-width:1.5px,color:#713f12
        classDef skip fill:#f3f4f6,stroke:#6b7280,stroke-width:1px,color:#374151
        class HB hb
        class CHATS,HIST imsg
        class TRIAGE decision
        class URGENT urgent
        class NOTABLE notable
        class SKIP skip
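
A sketch of one heartbeat pass, using only the two imsg invocations shown in the diagram. The triage label itself is produced elsewhere (the writeup assigns triage to a sub-agent), so only the routing of a label is shown. Treating NOTABLE as a relay that is not immediate is an assumption.

```python
from enum import Enum

HEARTBEAT_SECONDS = 15 * 60  # "every 15 min" in the diagram


class Triage(Enum):
    URGENT = "urgent"
    NOTABLE = "notable"
    SKIP = "skip"


def heartbeat_commands(chat_ids: list[int], limit: int = 10) -> list[list[str]]:
    """List recent chats, then fetch history for each chat id of interest."""
    commands = [["imsg", "chats", "--limit", str(limit)]]
    commands.extend(["imsg", "history", "--chat-id", str(cid)] for cid in chat_ids)
    return commands


def relay_decision(label: Triage) -> tuple[bool, bool]:
    """Return (relay_to_owner, immediately) for a triage label."""
    if label is Triage.URGENT:
        return True, True
    if label is Triage.NOTABLE:
        return True, False
    return False, False  # SKIP: routine, ignore
```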

What unauthorized contacts experience

No indication that an AI is involved

Software Stack

OpenClaw Gateway

v2026.3.2 · LaunchAgent
  • Main agent: Claude Opus 4.6 (Anthropic)
  • Sub-agents: Kimi K2.5 via OpenRouter (heartbeats, triage)
  • Config: ~/.openclaw/openclaw.json
  • Process: /Applications/OpenClaw.app

Telegram Channel

Inbound command surface
  • DM policy: pairing (one-time approval code for unknowns)
  • Role: Admin channel — commands issued and confirmed here
  • Owner's sender ID allowlisted

iMessage Channel

Outbound messaging surface
  • CLI: imsg (Homebrew)
  • DM policy: allowlist (silently drops unknowns)
  • Group policy: disabled
  • Full Disk Access (FDA): granted to OpenClaw.app only
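
The difference between the two DM policies (pairing on Telegram, allowlist on iMessage) can be sketched as a single decision function. The policy names follow the text; the function, the enum, and the handling of unmodeled policies are assumptions, not OpenClaw's actual configuration semantics.

```python
from enum import Enum


class InboundAction(Enum):
    ACCEPT = "accept"
    DROP = "drop"        # silent: no reply reaches the sender
    PAIRING = "pairing"  # reply with a one-time approval code


def decide_inbound(
    sender: str,
    allowlist: set[str],
    dm_policy: str,
    is_group: bool = False,
    group_policy: str = "disabled",
) -> InboundAction:
    """Decide what happens to an inbound message on a channel."""
    if is_group:
        if group_policy == "disabled":
            return InboundAction.DROP
        raise ValueError(f"group policy not modeled: {group_policy!r}")
    if sender in allowlist:
        return InboundAction.ACCEPT
    if dm_policy == "pairing":
        return InboundAction.PAIRING
    return InboundAction.DROP  # 'allowlist' and anything unknown: deny
```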

TTS Stack

Local, no cloud dependency
  • Kokoro-82M: ONNX, Python 3.12 venv
  • ffmpeg: WAV → M4A conversion
  • License: Apache 2.0 (commercial OK)
  • Languages: EN, ES, FR, IT, PT, JA, ZH, HI

Contact Profiles

Stored in memory/contacts/<name>.md

Each contact gets personalized handling

Future Expansion

Tier 2 Autonomy

Pre-approve specific contacts + message types. E.g. "confirm plans with X" without requiring approval each time.

Agent Mode

Allow the assistant to respond autonomously to approved contacts using personality profiles — a conversational AI with the owner's voice.

SMS & WhatsApp

SMS is available via the dedicated iPhone's eSIM (imsg --service sms), and WhatsApp via a linked device on the same Mac. Both would use the same permission framework.

Higher-Quality TTS

Fish Speech S1-mini is pending license approval. It would bring more natural prosody and voice cloning capabilities.

Core principle: The owner is always in control. Every outbound message requires explicit approval. The assistant has the capability to send messages autonomously — but the permission defaults to deny. Trust is earned incrementally, tier by tier.