A complete pipeline that lets the owner command their AI assistant via Telegram. The assistant then sends text and TTS voice messages to contacts via iMessage. Everything runs on dedicated, isolated hardware with standalone accounts.
High-Level Architecture
```mermaid
graph LR
OWNER["Owner
Telegram App"]
GW["Dedicated MacBook Pro
OpenClaw Gateway"]
AGENT["Main Agent
Claude Opus 4.6"]
PROFILE["Contact Profile
memory/contacts/"]
KOKORO["Kokoro TTS
82M ONNX"]
FFMPEG["ffmpeg
WAV → M4A"]
IMSG["imsg CLI
Messages.app"]
RECV["Recipient
iMessage"]
OWNER -->|"command"| GW
GW --> AGENT
AGENT -->|"reads"| PROFILE
AGENT -->|"text"| KOKORO
KOKORO -->|"WAV"| FFMPEG
FFMPEG -->|"M4A"| IMSG
AGENT -->|"text only"| IMSG
IMSG -->|"iMessage"| RECV
AGENT -->|"confirms"| GW
GW -->|"reply"| OWNER
classDef owner fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
classDef mac fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
classDef recv fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
class OWNER owner
class GW,AGENT,PROFILE mac
class KOKORO,FFMPEG tts
class IMSG imsg
class RECV recv
```
Hardware & Account Isolation
Message Flow — Step by Step
```mermaid
graph TD
CMD["Owner sends command
'Send X a voice message about Y'"]
ACK["Agent reacts 👀
reads contact profile + permissions"]
COMPOSE["Compose message
language, tone, dialect per profile"]
PREVIEW["Draft preview
sent to owner on Telegram"]
APPROVE{"Owner approves?"}
ITERATE["Iterate
'more grit' / 'shorter'"]
TTS_CHECK{"Voice message?"}
TTS["6a. Kokoro TTS
generate speech"]
CONVERT["6b. ffmpeg
WAV → M4A"]
SEND_TEXT["imsg send --text"]
SEND_VOICE["imsg send --file"]
LOG["Log to outbound-log.md
timestamp, recipient, approval ref"]
CONFIRM["Confirm delivery
to owner on Telegram"]
CMD --> ACK
ACK --> COMPOSE
COMPOSE --> PREVIEW
PREVIEW --> APPROVE
APPROVE -->|"no / revise"| ITERATE
ITERATE --> COMPOSE
APPROVE -->|"yes"| TTS_CHECK
TTS_CHECK -->|"yes"| TTS
TTS --> CONVERT
CONVERT --> SEND_VOICE
TTS_CHECK -->|"no"| SEND_TEXT
SEND_TEXT --> LOG
SEND_VOICE --> LOG
LOG --> CONFIRM
classDef cmd fill:#111e30,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
classDef agent fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef send fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
classDef log fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
class CMD,PREVIEW,CONFIRM,ITERATE cmd
class ACK,COMPOSE agent
class APPROVE,TTS_CHECK decision
class TTS,CONVERT tts
class SEND_TEXT,SEND_VOICE send
class LOG log
```
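The compose → preview → approve cycle in the flow above can be sketched as a small loop. This is a minimal illustration, not OpenClaw's implementation: `approval_loop`, its callables, and the `max_rounds` cap are hypothetical names chosen for the example. The point it shows is structural: the only exit that leads to sending is an explicit approval.

```python
from typing import Callable, Literal, Optional

Decision = Literal["approve", "revise", "cancel"]


def approval_loop(
    compose: Callable[[Optional[str]], str],
    ask_owner: Callable[[str], str],
    decide: Callable[[str], Decision],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the approved draft, or None if the owner cancels or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = compose(feedback)   # compose, or revise using the owner's feedback
        reply = ask_owner(draft)    # draft preview sent to the owner on Telegram
        decision = decide(reply)    # interpret the owner's reply
        if decision == "approve":
            return draft            # only an approved draft moves on to TTS/sending
        if decision == "cancel":
            return None
        feedback = reply            # "revise": loop back to compose with the feedback
    return None                     # nothing is sent if no approval arrives
```

Returning `None` on cancellation or exhaustion, rather than a draft, keeps "nothing leaves the Mac until approved" a property of the control flow itself.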
Step Details
Step 1 (Command): The owner sends a natural-language request on Telegram, e.g. "Send María a voice message wishing her happy birthday". Telegram is the only inbound command surface; the owner never touches the MacBook directly.
Step 2 (Acknowledge and check permissions): The agent reacts with 👀, then loads memory/contacts/<name>.md for the contact's language, tone, and voice preferences. It checks PERMISSIONS.md to verify that outbound messages are allowed for this contact.
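The document does not show the format of PERMISSIONS.md, so the following is only a sketch under an assumed format: one `- Name: action, action` bullet per contact. `parse_permissions`, `is_allowed`, and action names such as `outbound-voice` are hypothetical. What it illustrates is the default-deny lookup: a contact or action that is not listed is refused.

```python
def parse_permissions(markdown: str) -> dict[str, set[str]]:
    """Parse hypothetical '- Name: action, action' lines into contact -> actions."""
    allow: dict[str, set[str]] = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # headings, prose, and blank lines are ignored
        name, _, actions = line[2:].partition(":")
        allow[name.strip().lower()] = {a.strip() for a in actions.split(",") if a.strip()}
    return allow


def is_allowed(allow: dict[str, set[str]], contact: str, action: str) -> bool:
    # Default DENY: unknown contacts and unlisted actions are both refused.
    return action in allow.get(contact.strip().lower(), set())
```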
Step 3 (Compose): The agent writes the message in the contact's language (e.g. Castilian Spanish with distinción), adapting humor level, tone, and relationship dynamic to the contact's profile.
Step 4 (Preview): The full draft is sent back to the owner on Telegram for review. The owner can approve it, request changes ("shorter", "more casual"), or cancel. Nothing leaves the Mac until the owner explicitly approves.
Step 5 (Approve): The owner approves with a reply such as "yes", "send it", or "perfect". If the owner wants changes, the loop returns to step 3. Multiple rounds are normal.
Step 6 (Voice messages only): Kokoro-82M generates speech with the contact's assigned voice (e.g. em_alex, a Spanish male voice) and writes a WAV file (6a). ffmpeg then converts the WAV to M4A (AAC, 128 kbps) for iMessage compatibility (6b). Generation takes about 2.3 s per sentence on Apple Silicon.
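Step 6b amounts to a single ffmpeg invocation. The sketch below builds it from Python using the standard flags `-i` (input), `-c:a aac` (ffmpeg's built-in AAC encoder), and `-b:a 128k` (audio bitrate). The helper names are hypothetical, and `-y` (overwrite existing output) is a choice made for this example.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_m4a_cmd(wav: Path, m4a: Path, bitrate: str = "128k") -> list[str]:
    # -y overwrites an existing output; -c:a aac selects the AAC encoder; -b:a sets bitrate.
    return ["ffmpeg", "-y", "-i", str(wav), "-c:a", "aac", "-b:a", bitrate, str(m4a)]


def convert_wav_to_m4a(wav: Path, m4a: Path) -> Path:
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(ffmpeg_m4a_cmd(wav, m4a), check=True, capture_output=True)
    return m4a
```

Usage: `convert_wav_to_m4a(Path("output.wav"), Path("output.m4a"))`.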
Step 7 (Send): The imsg CLI automates Messages.app, so the recipient sees a normal iMessage.
Text: `imsg send --to "+1..." --text "message"`
Voice: `imsg send --to "+1..." --file output.m4a`
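The two send commands can be wrapped so the agent never builds shell strings by hand. This is a minimal sketch: `imsg_send_cmd` and `send` are hypothetical helpers, and they use only the `--to`, `--text`, and `--file` flags shown above. Requiring exactly one of text or file mirrors the two branches of the flow diagram and is a design choice of this sketch, not a documented limit of imsg.

```python
import subprocess
from pathlib import Path
from typing import Optional


def imsg_send_cmd(to: str, text: Optional[str] = None, file: Optional[Path] = None) -> list[str]:
    """Build argv for one outbound message: text or voice file, not both."""
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text= or file=")
    cmd = ["imsg", "send", "--to", to]
    if text is not None:
        cmd += ["--text", text]
    else:
        cmd += ["--file", str(file)]
    return cmd


def send(to: str, text: Optional[str] = None, file: Optional[Path] = None) -> None:
    # An argv list (no shell=True) means quotes or emoji in the message need no escaping.
    subprocess.run(imsg_send_cmd(to, text=text, file=file), check=True)
```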
Step 8 (Log and confirm): Every outbound message is logged to memory/outbound-log.md with timestamp, channel, recipient, summary, and approval reference. A delivery confirmation is then sent to the owner on Telegram.
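The exact layout of outbound-log.md is not specified, so this sketch assumes an append-only markdown file with one bullet per message carrying the five fields listed above. `log_outbound` and the approval-reference format are hypothetical. The optional `now` parameter makes the output reproducible in tests.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_outbound(
    log_path: Path,
    channel: str,
    recipient: str,
    summary: str,
    approval_ref: str,
    now: Optional[datetime] = None,
) -> str:
    """Append one entry (hypothetical format) and return the line written."""
    ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    summary = summary.replace("\n", " ")  # keep each entry on a single line
    entry = f"- {ts} | {channel} | {recipient} | {summary} | approval: {approval_ref}\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:  # append-only audit trail
        fh.write(entry)
    return entry
```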
TTS Pipeline — Text to Voice
```mermaid
graph LR
TEXT["Approved text
Castilian Spanish"]
PROFILE["Voice profile
em_alex, speed 1.0"]
KOKORO["Kokoro-82M
ONNX · Python 3.12"]
WAV["output.wav
~2.3s/sentence"]
FFMPEG["ffmpeg
AAC 128k"]
M4A["output.m4a
iMessage-ready"]
TEXT --> KOKORO
PROFILE --> KOKORO
KOKORO --> WAV
WAV --> FFMPEG
FFMPEG --> M4A
classDef input fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
classDef process fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef output fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
class TEXT,PROFILE input
class KOKORO,FFMPEG process
class WAV,M4A output
```
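The TTS stage can be sketched with the `kokoro-onnx` package, whose `Kokoro(model_path, voices_path).create(text, voice=..., speed=..., lang=...)` call returns audio samples and a sample rate. Several details here are assumptions: that this package is the one in use, the model and voice file names, and the mapping from Kokoro's voice-ID prefix (first letter is the language, second the gender, so `em_alex` is Spanish male) to espeak-style language codes. `parse_voice` and `synthesize` are hypothetical helpers.

```python
from pathlib import Path

# Assumed mapping from Kokoro voice-ID prefixes to espeak-style language codes.
LANG_BY_PREFIX = {"a": "en-us", "b": "en-gb", "e": "es"}
GENDER_BY_CODE = {"f": "female", "m": "male"}


def parse_voice(voice: str) -> tuple[str, str]:
    """Return (language code, gender) for a voice ID such as 'em_alex'."""
    prefix = voice.split("_", 1)[0]
    if len(prefix) != 2 or prefix[0] not in LANG_BY_PREFIX or prefix[1] not in GENDER_BY_CODE:
        raise ValueError(f"unrecognized Kokoro voice id: {voice!r}")
    return LANG_BY_PREFIX[prefix[0]], GENDER_BY_CODE[prefix[1]]


def synthesize(
    text: str,
    voice: str = "em_alex",
    speed: float = 1.0,
    out_wav: Path = Path("output.wav"),
    model_path: str = "kokoro-v1.0.onnx",   # assumed file name
    voices_path: str = "voices-v1.0.bin",   # assumed file name
) -> Path:
    # Imported lazily so parse_voice() works without the TTS dependencies installed.
    import soundfile as sf
    from kokoro_onnx import Kokoro

    lang, _ = parse_voice(voice)
    kokoro = Kokoro(model_path, voices_path)
    samples, sample_rate = kokoro.create(text, voice=voice, speed=speed, lang=lang)
    sf.write(str(out_wav), samples, sample_rate)
    return out_wav
```

The WAV written here is the input to the ffmpeg AAC conversion described in step 6b.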
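Assigned voices include em_alex (Spanish male) and af_heart (female); each contact's voice is set in its profile under memory/contacts/*.md.
Security Architecture — 5 Layers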
```mermaid
graph TD
REQ["Outbound message request"]
L1["Layer 1: SOUL.md
Hard rules · Default DENY"]
L2["Layer 2: AGENTS.md
Mandatory permission check"]
L3["Layer 3: PERMISSIONS.md
Per-contact allowlists"]
L4["Layer 4: Channel Config
dmPolicy: allowlist
groupPolicy: disabled"]
L5["Layer 5: Audit Log
outbound-log.md"]
SEND["✅ Message sent"]
BLOCK["❌ Blocked"]
REQ --> L1
L1 -->|"pass"| L2
L1 -->|"deny"| BLOCK
L2 -->|"pass"| L3
L2 -->|"deny"| BLOCK
L3 -->|"approved contact"| L4
L3 -->|"unknown contact"| BLOCK
L4 -->|"pass"| L5
L5 --> SEND
classDef req fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
classDef layer fill:#fff1f2,stroke:#be123c,stroke-width:1.5px,color:#881337
classDef pass fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
classDef block fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d
class REQ req
class L1,L2,L3,L4,L5 layer
class SEND pass
class BLOCK block
```
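The layered gate can be modeled as an ordered chain of checks in which the first denial wins and only a fully passed request reaches the audit log. This sketch is an illustration only: the real layers are agent instructions and gateway configuration, not Python functions. `Request`, `build_checks`, and `evaluate` are hypothetical. Layer 2 (the mandatory check) is represented by `evaluate` always running every check in order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    contact: str
    channel: str
    owner_approved: bool = False
    is_group: bool = False


# A check returns None to pass, or a denial reason to block.
Check = Callable[[Request], Optional[str]]


def build_checks(contact_allowlist: set[str], channel_allowlist: set[str]) -> list[tuple[str, Check]]:
    def hard_rules(r: Request) -> Optional[str]:      # Layer 1: default DENY
        return None if r.owner_approved else "no owner approval"

    def permissions(r: Request) -> Optional[str]:     # Layer 3: per-contact allowlist
        return None if r.contact in contact_allowlist else "unknown contact"

    def channel_config(r: Request) -> Optional[str]:  # Layer 4: dmPolicy / groupPolicy
        if r.is_group:
            return "groups disabled"
        return None if r.contact in channel_allowlist else "not on channel allowlist"

    return [("L1", hard_rules), ("L3", permissions), ("L4", channel_config)]


def evaluate(req: Request, checks: list[tuple[str, Check]], audit: list[str]) -> tuple[bool, str]:
    # Layer 2 (mandatory check) is modeled by always running every check in order.
    for name, check in checks:
        reason = check(req)
        if reason is not None:
            return False, f"{name}: {reason}"
    audit.append(f"SEND {req.contact} via {req.channel}")  # Layer 5: audit log
    return True, "ok"
```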
| Tier | Policy | Applies To |
|---|---|---|
| T1 — Autonomous | No approval needed | Read messages, triage, 2FA for own accounts |
| T2 — Pre-approved | Allowed for specific contacts + message types | Not yet configured — future expansion |
| T3 — Approval Required | Draft → preview → approve → send | All outbound messages (current default) |
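The tier table reads naturally as a lookup that falls through to the strictest tier. This is a minimal sketch: the action names and the `tier_for` helper are hypothetical, and `PRE_APPROVED` is left empty because T2 is not yet configured.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    T1_AUTONOMOUS = 1
    T2_PRE_APPROVED = 2
    T3_APPROVAL_REQUIRED = 3


# Hypothetical action names for the T1 row of the table.
AUTONOMOUS_ACTIONS = {"read_messages", "triage", "own_2fa"}
# (contact, message_type) pairs; empty because T2 is not yet configured.
PRE_APPROVED: set[tuple[str, str]] = set()


def tier_for(action: str, contact: Optional[str] = None, message_type: Optional[str] = None) -> Tier:
    if action in AUTONOMOUS_ACTIONS:
        return Tier.T1_AUTONOMOUS
    if action == "send" and (contact, message_type) in PRE_APPROVED:
        return Tier.T2_PRE_APPROVED
    return Tier.T3_APPROVAL_REQUIRED  # default, including every outbound send today
```

Falling through to T3 means a newly introduced action needs approval until it is deliberately classified otherwise.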
Inbound Monitoring — Reply Detection
```mermaid
graph LR
HB["⏰ Heartbeat
every 15 min"]
CHATS["imsg chats
--limit 10"]
HIST["imsg history
--chat-id N"]
TRIAGE{"Triage"}
URGENT["🔴 URGENT
relay immediately"]
NOTABLE["🟡 NOTABLE
relay to owner"]
SKIP["⚪ SKIP
routine, ignore"]
HB --> CHATS
CHATS --> HIST
HIST --> TRIAGE
TRIAGE --> URGENT
TRIAGE --> NOTABLE
TRIAGE --> SKIP
classDef hb fill:#eef2ff,stroke:#6366f1,stroke-width:1.5px,color:#3730a3
classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:1.5px,color:#e2e8f0
classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
classDef urgent fill:#fef2f2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d
classDef notable fill:#fffbeb,stroke:#ca8a04,stroke-width:1.5px,color:#713f12
classDef skip fill:#f3f4f6,stroke:#6b7280,stroke-width:1px,color:#374151
class HB hb
class CHATS,HIST imsg
class TRIAGE decision
class URGENT urgent
class NOTABLE notable
class SKIP skip
```
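One heartbeat pass might look like the sketch below. Several parts are assumptions, not documented behavior: that `imsg chats`/`imsg history` accept `--json` and print one JSON object per line, the field names `id`, `text`, and `is_from_me`, and the keyword-based `triage` rules (the real agent triages with the model). The helper names are hypothetical. A production version would also remember the last message seen per chat so that the same reply is not relayed on every heartbeat.

```python
import json
import subprocess

HEARTBEAT_SECONDS = 15 * 60  # "every 15 min"
URGENT_WORDS = ("urgent", "emergency", "asap", "call me")  # hypothetical keywords


def triage(text: str, known_contact: bool) -> str:
    lowered = text.lower()
    if any(word in lowered for word in URGENT_WORDS):
        return "URGENT"   # relay immediately
    if known_contact:
        return "NOTABLE"  # relay to owner
    return "SKIP"         # routine, ignore


def imsg_json(*args: str) -> list[dict]:
    # Assumes one JSON object per output line when --json is passed.
    out = subprocess.run(["imsg", *args, "--json"], check=True, capture_output=True, text=True).stdout
    return [json.loads(line) for line in out.splitlines() if line.strip()]


def heartbeat(known_chat_ids: set[int]) -> list[tuple[str, str]]:
    relays = []
    for chat in imsg_json("chats", "--limit", "10"):
        chat_id = chat["id"]  # assumed field name
        for msg in imsg_json("history", "--chat-id", str(chat_id), "--limit", "5"):
            if msg.get("is_from_me"):  # assumed field; skip our own outbound messages
                continue
            text = msg.get("text") or ""
            level = triage(text, chat_id in known_chat_ids)
            if level != "SKIP":
                relays.append((level, text))
    return relays
```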
Replies are read with the imsg CLI and relayed to the owner during heartbeats.
Software Stack
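| Component | Setting |
|---|---|
| Gateway config | ~/.openclaw/openclaw.json |
| Gateway app | /Applications/OpenClaw.app |
| Telegram DM policy | pairing (one-time approval code for unknowns) |
| iMessage CLI | imsg (Homebrew) |
| iMessage DM policy | allowlist (silently drops unknowns) |
| Group policy | disabled |
Contact Profiles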
Each contact has a profile at memory/contacts/<name>.md describing their language, tone, and voice preferences.
Future Expansion
Pre-approve specific contacts and message types (T2), so that a request such as "confirm plans with X" no longer needs approval each time.
Allow the assistant to reply autonomously to approved contacts using personality profiles, acting as a conversational AI in the owner's voice.
SMS is available via the dedicated iPhone's eSIM (imsg --service sms), and WhatsApp via a linked device on the same Mac. Both would use the same permission framework.
Fish Speech S1-mini is pending license approval; it offers more natural prosody and voice-cloning capabilities.