Command your AI assistant via Discord or Telegram. It sends messages (text or AI-generated voice) to your contacts via iMessage. Speech synthesis and message delivery run locally on a Mac: no cloud TTS, no third-party message routing.
What changed in v2
| Component | v1 | v2 |
|---|---|---|
| Command surface | Telegram only | Discord (primary) + Telegram |
| Hardware | MacBook Pro | Mac mini (always-on, headless) |
| TTS engine | Kokoro-82M (fixed presets) | Qwen3-TTS 0.6B/1.7B (voice cloning, natural language control) |
| Voice profiles | Preset voice IDs | Voice design (describe it) or voice clone (3s sample) |
| Languages | 8 | 10 (added Korean, Russian) |
| Security | 5 layers | 5 layers + FileVault, macOS Firewall, Tailscale |
High-Level Architecture
graph LR
DISCORD["Discord
Primary command surface"]
TELEGRAM["Telegram
Secondary / mobile"]
GW["Mac mini
OpenClaw Gateway
always-on, headless"]
AGENT["Main Agent
Claude Opus"]
PROFILE["Contact Profiles
language · tone · voice"]
QWEN["Qwen3-TTS
0.6B / 1.7B · local"]
FFMPEG["ffmpeg
WAV → M4A"]
IMSG["imsg CLI
Messages.app"]
RECV["Recipient
iMessage"]
DISCORD -->|"command"| GW
TELEGRAM -->|"command"| GW
GW --> AGENT
AGENT -->|"reads"| PROFILE
AGENT -->|"text"| QWEN
PROFILE -->|"voice desc / clone ref"| QWEN
QWEN -->|"WAV"| FFMPEG
FFMPEG -->|"M4A"| IMSG
AGENT -->|"text only"| IMSG
IMSG -->|"iMessage"| RECV
AGENT -->|"confirms"| GW
GW -->|"reply"| DISCORD
classDef discord fill:#151a2e,stroke:#7289da,stroke-width:2px,color:#e2e8f0
classDef tg fill:#111e30,stroke:#60a5fa,stroke-width:1.5px,color:#e2e8f0
classDef mac fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef imsg fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
classDef recv fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
class DISCORD discord
class TELEGRAM tg
class GW,AGENT,PROFILE mac
class QWEN,FFMPEG tts
class IMSG imsg
class RECV recv
Hardware & Account Isolation
Message Flow — Step by Step
graph TD
CMD["Owner sends command
'Send X a voice message about Y'"]
ACK["Agent reads context
contact profile + permissions"]
COMPOSE["Compose message
language · tone · dialect per profile"]
PREVIEW["Draft preview
sent to owner for approval"]
APPROVE{"Approve?"}
ITERATE["Owner gives direction
'more grit' · 'shorter' · 'add a joke'"]
TTS_CHECK{"Voice message?"}
TTS["6a. Qwen3-TTS
voice design or clone"]
CONVERT["6b. ffmpeg
WAV → M4A"]
SEND_TEXT["imsg send --text"]
SEND_VOICE["imsg send --file"]
LOG["Log to outbound-log.md"]
CONFIRM["Confirm on Discord"]
CMD --> ACK
ACK --> COMPOSE
COMPOSE --> PREVIEW
PREVIEW --> APPROVE
APPROVE -->|"revise"| ITERATE
ITERATE --> COMPOSE
APPROVE -->|"send it"| TTS_CHECK
TTS_CHECK -->|"yes"| TTS
TTS --> CONVERT
CONVERT --> SEND_VOICE
TTS_CHECK -->|"text only"| SEND_TEXT
SEND_TEXT --> LOG
SEND_VOICE --> LOG
LOG --> CONFIRM
classDef cmd fill:#151a2e,stroke:#7289da,stroke-width:2px,color:#e2e8f0
classDef agent fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
classDef tts fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef send fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
classDef log fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
class CMD,PREVIEW,CONFIRM,ITERATE cmd
class ACK,COMPOSE agent
class APPROVE,TTS_CHECK decision
class TTS,CONVERT tts
class SEND_TEXT,SEND_VOICE send
class LOG log
The owner gives a command in natural language, for example "Send María a voice message wishing her happy birthday". Discord is the primary surface: a dedicated channel the owner uses for all assistant interaction. Telegram works too, especially on mobile.
The agent loads memory/contacts/<name>.md for language, tone, voice preferences, and relationship context, then checks PERMISSIONS.md to verify that outbound messaging is allowed for this contact.
The agent writes the message to fit the contact's language (e.g. Castilian Spanish with distinción), humor level, and relationship dynamic. It brings its own personality; it's not just a relay.
The full draft is sent back for review. Nothing leaves the Mac until explicitly approved. The owner can iterate with tone direction — "more casual", "add humor", "shorter". Multiple rounds are normal and encouraged.
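The review cycle described above can be sketched as a small loop. This is only an illustration: `compose` and `get_feedback` are hypothetical callables standing in for the agent and the owner's reply, and the `max_rounds` cap is an assumption added so the loop always terminates. The "send it" reply mirrors the label used in the flow diagram.

```python
from typing import Callable, Optional


def review_loop(
    compose: Callable[[list[str]], str],
    get_feedback: Callable[[str], str],
    max_rounds: int = 10,
) -> Optional[str]:
    """Return the approved draft, or None if no draft was approved in time."""
    directions: list[str] = []  # owner directions accumulated across rounds
    for _ in range(max_rounds):
        draft = compose(directions)
        feedback = get_feedback(draft)
        if feedback == "send it":
            return draft  # only an explicit approval releases the draft
        directions.append(feedback)  # e.g. "shorter", "more casual"
    return None  # nothing is sent without approval
```

Returning `None` rather than the last draft keeps the default on the side of not sending.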
Qwen3-TTS generates speech using either voice design (describe the voice: "warm male, slight accent, moderate pace") or voice clone (a 3-second reference audio sample). It runs 100% locally on Apple Silicon. The output WAV is converted to M4A by ffmpeg for iMessage.
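The WAV to M4A step could look roughly like the sketch below. It assumes ffmpeg is on the PATH and uses its built-in AAC encoder at the 128k bitrate shown in the diagram; the function names and the choice to write the output next to the input are illustrative, not taken from the project.

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_to_m4a_args(wav_path: str, bitrate: str = "128k") -> list[str]:
    """Build the ffmpeg argument list that re-encodes a WAV file as AAC in an M4A file."""
    src = Path(wav_path)
    dst = src.with_suffix(".m4a")  # same name and directory, new extension
    return [
        "ffmpeg",
        "-y",             # overwrite an existing output file
        "-i", str(src),   # input WAV produced by the TTS step
        "-c:a", "aac",    # encode audio with ffmpeg's AAC encoder
        "-b:a", bitrate,  # target audio bitrate
        str(dst),
    ]


def convert(wav_path: str) -> Path:
    """Run the conversion and return the M4A path; raises if ffmpeg fails."""
    args = ffmpeg_wav_to_m4a_args(wav_path)
    subprocess.run(args, check=True, capture_output=True)
    return Path(args[-1])
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.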
Text: imsg send --to "+1..." --text "message". Voice: imsg send --to "+1..." --file output.m4a. The CLI automates Messages.app, so the recipient sees a normal iMessage from the assistant's Apple ID.
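A minimal sketch of building those two commands from code, using only the `send`, `--to`, `--text`, and `--file` arguments quoted above. The helper name and the rule that exactly one of text or file must be given are assumptions for illustration.

```python
from typing import Optional


def imsg_send_args(
    to: str, text: Optional[str] = None, file: Optional[str] = None
) -> list[str]:
    """Build an argument list for `imsg send` with either a text body or a file."""
    if (text is None) == (file is None):
        raise ValueError("provide exactly one of text or file")
    args = ["imsg", "send", "--to", to]
    if text is not None:
        args += ["--text", text]
    else:
        args += ["--file", file]
    return args
```

Passing the result to `subprocess.run(args, check=True)` as a list, rather than as a shell string, means quotes or other special characters in the message body need no escaping.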
Every outbound message is logged to memory/outbound-log.md with timestamp, channel, recipient, summary, and approval reference. A delivery confirmation is sent to the owner on Discord.
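One way to write such a log line is sketched below. The five fields come from the description above; the pipe-separated bullet layout, the UTC timestamp format, and the function names are assumptions, since the actual layout of outbound-log.md is not shown.

```python
from datetime import datetime, timezone


def format_log_entry(
    when: datetime, channel: str, recipient: str, summary: str, approval_ref: str
) -> str:
    """Format one outbound message as a single Markdown bullet line (assumed layout)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clean = " ".join(summary.split())  # collapse newlines so the entry stays on one line
    return f"- {stamp} | {channel} | {recipient} | {clean} | approval: {approval_ref}\n"


def append_log(path: str, entry: str) -> None:
    """Append an entry to the log file, creating the file if it does not exist."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening the file in append mode keeps earlier entries intact, which matters for an audit trail.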
Qwen3-TTS — Voice Generation
graph LR
TEXT["Approved text"]
DESC["Voice description
'warm, friendly, slight accent'"]
CLONE["Reference audio
3-10s clip"]
QWEN["Qwen3-TTS
0.6B or 1.7B
local ONNX"]
WAV["output.wav"]
FFMPEG["ffmpeg
AAC 128k"]
M4A["output.m4a
iMessage-ready"]
TEXT --> QWEN
DESC -->|"voice design"| QWEN
CLONE -->|"voice clone"| QWEN
QWEN --> WAV
WAV --> FFMPEG
FFMPEG --> M4A
classDef input fill:#0d1f18,stroke:#34d399,stroke-width:1.5px,color:#e2e8f0
classDef voice fill:#eef0ff,stroke:#5865f2,stroke-width:1.5px,color:#3b40a0
classDef process fill:#1a1806,stroke:#fbbf24,stroke-width:1.5px,color:#e2e8f0
classDef output fill:#0d1f14,stroke:#4ade80,stroke-width:2px,color:#e2e8f0
class TEXT input
class DESC,CLONE voice
class QWEN,FFMPEG process
class WAV,M4A output
Natural language control: "A deep, calm voice with warmth. Professional but approachable. Moderate pace." No pre-made voices to browse — just describe it.
Zero-shot cloning from a short reference clip. Store a sample per contact and the system reproduces their voice, or yours. Use clear audio with minimal background noise.
Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian, plus dialectal voice profiles. Language is set per contact, not globally.
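The choice between voice design and voice clone, together with the per-contact language setting, might be resolved along these lines. The `VoiceProfile` fields and the rule that a clone reference wins over a description are assumptions for illustration; only the language list is taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_LANGUAGES = {
    "Chinese", "English", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
}


@dataclass
class VoiceProfile:
    """Hypothetical per-contact voice settings."""
    language: str
    description: Optional[str] = None  # voice design: free-text description
    clone_ref: Optional[str] = None    # voice clone: path to a reference clip


def resolve_voice(profile: VoiceProfile) -> tuple[str, str]:
    """Return (mode, value), where mode is "clone" or "design"."""
    if profile.language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {profile.language}")
    if profile.clone_ref:
        return ("clone", profile.clone_ref)  # assumed precedence: clone first
    if profile.description:
        return ("design", profile.description)
    raise ValueError("profile has neither a clone reference nor a description")
```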
Command Surfaces — Discord + Telegram
#assistant channel · "yes" · dmPolicy: "pairing" (one-time code for unknowns)
Security Architecture — 5 Layers + OS Hardening
graph TD
REQ["Outbound message request"]
L1["Layer 1: SOUL.md
Hard rules · Default DENY"]
L2["Layer 2: AGENTS.md
Mandatory permission check"]
L3["Layer 3: PERMISSIONS.md
Per-contact allowlists"]
L4["Layer 4: Channel Config
dmPolicy: allowlist
groupPolicy: disabled"]
L5["Layer 5: Audit Log
outbound-log.md"]
OS["OS Hardening
FileVault · Firewall · Tailscale"]
SEND["✅ Message sent"]
BLOCK["❌ Blocked"]
REQ --> L1
L1 -->|"pass"| L2
L1 -->|"deny"| BLOCK
L2 -->|"pass"| L3
L2 -->|"deny"| BLOCK
L3 -->|"approved"| L4
L3 -->|"unknown"| BLOCK
L4 -->|"pass"| L5
L5 --> SEND
OS -.->|"protects all layers"| L1
classDef req fill:#0e1119,stroke:#1a1d24,stroke-width:1.5px,color:#e2e8f0
classDef layer fill:#fff1f2,stroke:#be123c,stroke-width:1.5px,color:#881337
classDef os fill:#eef2ff,stroke:#6366f1,stroke-width:1.5px,color:#3730a3
classDef pass fill:#0d1f18,stroke:#34d399,stroke-width:2px,color:#e2e8f0
classDef block fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d
class REQ req
class L1,L2,L3,L4,L5 layer
class OS os
class SEND pass
class BLOCK block
| Tier | Policy | Applies To |
|---|---|---|
| T1 | Autonomous | Read messages, triage, 2FA |
| T2 | Pre-approved | Specific contacts + types |
| T3 | Approval required | All outbound (default) |
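The default-deny chain and the tier table can be condensed into a single decision function, sketched below. The data structures (a set of allowlisted contacts and a set of pre-approved contact and message-type pairs) are hypothetical; the real rules live in SOUL.md, AGENTS.md, and PERMISSIONS.md, whose formats are not shown here.

```python
from enum import Enum


class Decision(Enum):
    SEND = "send"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


def decide_outbound(
    contact: str,
    message_type: str,
    allowlist: set[str],
    preapproved: set[tuple[str, str]],
    owner_approved: bool,
) -> Decision:
    """Apply default deny, then T2 pre-approval, then T3 approval-required."""
    if contact not in allowlist:
        return Decision.BLOCK  # unknown contacts are blocked outright
    if (contact, message_type) in preapproved:
        return Decision.SEND  # T2: this contact and message type is pre-approved
    # T3 (default): every other outbound message needs explicit owner approval
    return Decision.SEND if owner_approved else Decision.NEEDS_APPROVAL
```

Checking the allowlist first means a pre-approval entry for a contact who has since been removed from the allowlist has no effect.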
imsg CLI and relay notable ones to the owner
Inbound Monitoring — Reply Detection
graph LR
CRON["⏰ Scheduled check
cron job"]
CHATS["imsg chats
--limit 10"]
HIST["imsg history
--chat-id N"]
TRIAGE{"Triage"}
URGENT["🔴 URGENT
relay immediately"]
NOTABLE["🟡 NOTABLE
relay to owner"]
SKIP["⚪ SKIP
ignore"]
CRON --> CHATS
CHATS --> HIST
HIST --> TRIAGE
TRIAGE --> URGENT
TRIAGE --> NOTABLE
TRIAGE --> SKIP
classDef cron fill:#eef2ff,stroke:#6366f1,stroke-width:1.5px,color:#3730a3
classDef imsg fill:#edfcf0,stroke:#16a34a,stroke-width:1.5px,color:#14532d
classDef decision fill:#1a1806,stroke:#fbbf24,stroke-width:2px,color:#e2e8f0
classDef urgent fill:#fef2f2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d
classDef notable fill:#fffbeb,stroke:#ca8a04,stroke-width:1.5px,color:#713f12
classDef skip fill:#f3f4f6,stroke:#6b7280,stroke-width:1px,color:#374151
class CRON cron
class CHATS,HIST imsg
class TRIAGE decision
class URGENT urgent
class NOTABLE notable
class SKIP skip
A scheduled cron job checks imsg chats for recent activity. New replies are triaged: urgent items are relayed immediately to the owner on Discord, notable items are batched into the next check, and routine messages are skipped. The monitoring agent runs on a cheaper model to avoid burning tokens on routine checks.
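The routing half of that triage can be sketched as follows. In the system described, a model makes the urgent, notable, or skip call; the keyword rule here is only a stand-in so the example runs on its own, and `URGENT_KEYWORDS`, `triage`, and `route` are hypothetical names.

```python
from enum import Enum


class Priority(Enum):
    URGENT = "urgent"
    NOTABLE = "notable"
    SKIP = "skip"


URGENT_KEYWORDS = ("urgent", "emergency", "asap")  # placeholder rule, not from the source


def triage(text: str, from_known_contact: bool) -> Priority:
    """Classify one inbound message (stand-in for the model's judgment)."""
    lowered = text.lower()
    if any(word in lowered for word in URGENT_KEYWORDS):
        return Priority.URGENT
    if from_known_contact and lowered.strip():
        return Priority.NOTABLE
    return Priority.SKIP


def route(messages: list[tuple[str, bool]]) -> tuple[list[str], list[str]]:
    """Split messages into (relay immediately, batch for the next check)."""
    relay_now: list[str] = []
    batch: list[str] = []
    for text, known in messages:
        priority = triage(text, known)
        if priority is Priority.URGENT:
            relay_now.append(text)
        elif priority is Priority.NOTABLE:
            batch.append(text)
        # SKIP messages are dropped without notifying the owner
    return relay_now, batch
```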
Contact Profiles
memory/contacts/<name>.md
Software Stack
~/.openclaw/openclaw.json · imsg (Homebrew) · allowlist · disabled
Future Expansion
Pre-approve specific contacts and message types, so that a request like "Confirm plans with María" goes out without owner approval each time. Trust is earned incrementally.
Allow the assistant to respond autonomously to approved contacts using their personality profile: a conversational AI with the owner's voice and context.
SMS via the dedicated iPhone's eSIM (imsg --service sms). WhatsApp via a linked device on the same Mac. The same permission framework applies.
Build a library of 3-second voice samples per contact. The assistant picks the right voice automatically, or blends characteristics from multiple references.