When AI assistants operate in high-stakes domains over extended periods, behavioral governance systems emerge through human correction rather than top-down design. This paper documents and compares that emergence process in two systems: one managing personal financial planning (tax optimization, immigration timelines, portfolio strategy) and another coordinating audiovisual production workflows. We present 33 days of governance formation data from the financial system and parallel observations from the audiovisual system, examining how domain-specific pressures produce structurally different constraint architectures despite identical underlying platforms. The study is observational and prospective; both systems continue to evolve under real operational conditions.
Personal AI assistants that manage consequential decisions (financial planning, legal strategy, healthcare coordination) develop behavioral rules over time. These rules don't come from a design document. They emerge from failures, corrections, and the gradual accumulation of "never do that again" constraints.
The question we're investigating: does the domain shape the governance, or does the platform? If two AI systems run on the same infrastructure (OpenClaw with Claude Opus) but operate in different domains (finance vs. audiovisual production), do they converge on similar governance structures, or do the domains force them apart?
Our hypothesis is divergence. Financial governance should be dominated by negative constraints ("never apply X rate to Y income") and numerical precision gates, while audiovisual governance should emphasize process coordination, asset provenance, and aesthetic judgment rules. The architecture of the rules, not just their content, should differ.
Our deep research across 10 parallel agents examined every major AI memory system (CrewAI, mem0, Letta/MemGPT, napkin, OpenViking, Google Mariner, Anthropic Memory Tool, LangGraph, AutoGPT) and found a consistent gap: everyone builds memory architectures; nobody documents how behavioral constraints form, grow, compress, and diverge across domains. No existing system closes the mistake→rule→persistence loop. Documenting how that loop closes, and how it diverges across domains, is this paper's primary contribution.
Operator: Cristian Dominguez (Creative Technology Lead, Meta). Platform: OpenClaw with Claude Opus 4. Domain: International relocation planning, stock portfolio optimization, immigration timeline management, tax strategy under Spain's Beckham Law regime.
Atlas has been running continuously since February 14, 2026. It manages interconnected decisions where a single incorrect tax rate produces five-figure consequences. The system operates across 4 specialized sub-agents (Researcher, Strategist, Portfolio, Tax) that share a common knowledge vault of 1,210+ verified documents.
Governance formation was reactive: 14 days of failures with no structural changes, followed by a constitutional moment on March 1, when a fabricated citation triggered the creation of the entire hard-rules system. Current state: 20 active rules across 4 categories, compressed from an initial 38.
Operator: Javier Herreros (PhD Candidate). Platform: OpenClaw with Claude Opus. Domain: Audiovisual production management, media workflow coordination, asset pipeline governance.
Suzanne operates within the AMASIA framework (5-layer governance model: Governance → Orchestration → Execution → Memory → Production). The system has developed its own constraint architecture optimized for creative production workflows, where the failure modes are different: asset versioning conflicts, codec mismatches, licensing provenance, and deadline coordination across multiple production stages.
Governance architecture: 9 active rules after compression (4 designed at inception, 5 emergent from production failures). Compression occurred March 15: ~30+ rules → 9 principles (~70% reduction). Auxiliary constraints distributed to COMMANDS.md, TOOLS.md, and HEARTBEAT.md. 50-line cap enforced.
Key divergence from Atlas: Governance triggers are primarily cost-based (token waste, model misallocation) rather than social (trust violations). The P6 "Opus Burn Gate" emerged from a 45-minute browser automation loop that should have been delegated to Sonnet. Atlas’s equivalent rules emerged from citation fabrication and data errors — interpersonal trust failures, not resource waste.
Production data: 98 JSONL sprint records covering March 1–16 (62.3h across 10 projects). 86.5% success rate, 20.8% correction rate. Models: Opus (orchestration), Codex/GPT-5.4 (execution), Sonnet (review), Kimi K2.5 (research), MiniMax M2.5 (bulk).
Theoretical grounding: Giddens (structuration theory), Engeström (activity theory), Hevner (design science). Complements Atlas’s Ostrom, Argyris, and Miller foundations.
The full governance emergence history is documented in a separate 22KB file (atlas-governance-emergence.md). What follows is the structural analysis.
| Metric | Value |
|---|---|
| Total governance events documented | 72 |
| Preemptive rule formation rate | 0% (all reactive) |
| Trust violation → rule conversion rate | 100% |
| Mean time failure→rule (trust violation) | 0.3 days |
| Mean time failure→rule (repeated failure) | 3.1 days |
| Mean time failure→rule (near-miss) | 14.7 days |
| Compression ratio (Mar 17) | 47.4% (38→20) |
| Domain-specific rules | 25% (5/20) |
| Rule taxonomy | Process 50%, Quality 30%, Knowledge 10%, Ops 10% |
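The latency figures above can be derived mechanically from the event log. A minimal sketch, assuming hypothetical event records (the field names `trigger`, `failure_day`, and `rule_day` are illustrative, not the actual Atlas log schema):

```python
from statistics import mean

# Hypothetical governance-event records; values chosen to mirror the
# reported means, not actual Atlas data.
events = [
    {"trigger": "trust_violation", "failure_day": 16.0, "rule_day": 16.2},
    {"trigger": "trust_violation", "failure_day": 20.0, "rule_day": 20.4},
    {"trigger": "repeated_failure", "failure_day": 5.0, "rule_day": 8.1},
    {"trigger": "near_miss", "failure_day": 3.0, "rule_day": 17.7},
]

def mean_latency(events, trigger):
    """Mean days between a failure and the rule it produced, per trigger type."""
    gaps = [e["rule_day"] - e["failure_day"] for e in events if e["trigger"] == trigger]
    return mean(gaps)

print(round(mean_latency(events, "trust_violation"), 2))   # 0.3
print(round(mean_latency(events, "repeated_failure"), 2))  # 3.1
print(round(mean_latency(events, "near_miss"), 2))         # 14.7
```

Grouping by trigger type is what surfaces the order-of-magnitude gap between trust violations (same-day rules) and near-misses (two-week lag).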
| Metric | Atlas (Financial) | AMASIA (Audiovisual) |
|---|---|---|
| Active rules (post-compression) | 20 | 9 |
| Peak rules (pre-compression) | 38 | ~30+ |
| Compression ratio | 47.4% | ~70% |
| Designed rules | 0 (all emergent) | 4 (at inception) |
| Emergent rules | 20 | 5 |
| Preemptive rule rate | 0% | 0% (of emergent) |
| Primary trigger type | Trust violations (social) | Resource waste (cost) |
| Dual-encoding present | Yes (CRITICAL_FACTS + SOUL) | Yes (hardened/ + SOUL) |
| Sprint data | 16 records, 12.8h | 98 records, 62.3h |
| Correction rate | 50% (8/16) | 20.8% |
Ostrom's commons governance (1990): Atlas exhibits 6 of 8 Ostrom design principles. The SOUL.md system functions as a common-pool resource with clear boundaries, collective-choice arrangements, and graduated sanctions (rules added after failures, removed when internalized).
Argyris double-loop learning (1978): The March 17 compression event is textbook double-loop learning — changing the rules themselves, not just behavior within the rules. The SOUL.md meta-instruction ("If it grows past 20, compress") is deutero-learning: learning how to learn.
Miller's chunking (1956): 20 rules ≈ 3–4 chunks of 5–6 items. The stable equilibrium at ~20 aligns with cognitive science predictions for expert chunking capacity.
| Operation | Count | Example |
|---|---|---|
| Merge overlapping | 8 pairs → 8 singles | 3 visibility rules → "Stay visible: checkpoint every ~5 min" |
| Cut internalized | 6 removed | "Read memory file endings" (now automatic) |
| Move to MEMORY.md | 4 moved | Factual entries (data sources, report ordering) |
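The three operations account exactly for the March 17 compression. A quick arithmetic check on the numbers in the table:

```python
# Sanity check on the March 17 compression accounting.
peak_rules = 38
merged_away = 8      # 8 overlapping pairs merged into 8 single rules (net -8)
cut_internalized = 6  # rules removed because the behavior is now automatic
moved_to_memory = 4   # factual entries relocated to MEMORY.md

remaining = peak_rules - merged_away - cut_internalized - moved_to_memory
compression_ratio = (peak_rules - remaining) / peak_rules

print(remaining)                   # 20
print(f"{compression_ratio:.1%}")  # 47.4%
```

The 38 → 20 reduction and the 47.4% compression ratio reported earlier both fall out of this accounting.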
Separate from SOUL.md's behavioral rules, Atlas maintains a file of non-negotiable factual constraints that every pipeline agent must read before producing output. A redacted version of this file, shared with this study, documents its design principles.
Key structural features: absolute language over probabilistic, negative constraints over positive guidance ("NEVER use X" rather than "prefer Y"), open questions declared explicitly, and every section traceable to a specific formation incident. The dual-encoding strategy (critical facts in both CRITICAL_FACTS.md and SOUL.md) exists because some error classes are too expensive for single-point-of-failure protection.
Observational, not experimental. Both systems continue operating under real conditions. No behavioral freezes, no injected test scenarios, no artificial constraints. Changes during the study are data points, not confounds.
Prospective with retrospective baseline. Atlas provides 33 days of pre-study governance formation data. Going forward, both systems instrument governance events as they occur.
JSONL Sprint Metrics. Per-task structured logs: task class, model executor, duration, human interventions, corrections, quality scores (AMASIA-Q 5-dimension rubric + provenance), governance events. First batch: 16 entries delivered.
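Aggregate rates like the success and correction figures reported above fall directly out of these logs. A minimal sketch, assuming illustrative field names (`success`, `corrections`) that may not match the actual sprint-record schema:

```python
import io
import json

# Three fabricated sprint records standing in for a real JSONL log file.
sample = io.StringIO(
    '{"task": "grade-footage", "model": "sonnet", "success": true, "corrections": 0}\n'
    '{"task": "tax-memo", "model": "opus", "success": true, "corrections": 1}\n'
    '{"task": "codec-check", "model": "codex", "success": false, "corrections": 1}\n'
)

records = [json.loads(line) for line in sample]
# Success rate: fraction of tasks marked successful.
success_rate = sum(r["success"] for r in records) / len(records)
# Correction rate: fraction of tasks that needed at least one human correction.
correction_rate = sum(r["corrections"] > 0 for r in records) / len(records)

print(f"success {success_rate:.1%}, corrections {correction_rate:.1%}")
```

One record per line keeps the log append-only, so governance events can be written as they happen without rewriting the file.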
AMASIA-Q Rubric. Five dimensions (accuracy, completeness, consistency, timeliness, communication) plus a 6th provenance dimension for the financial domain. Each scored 1–5 with domain-specific anchors.
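The rubric fixes the dimensions and the 1–5 anchors; how the dimensions aggregate into one score is not specified above, so the simple mean below is an assumption for illustration:

```python
from statistics import mean

# The five shared AMASIA-Q dimensions; provenance is the optional sixth,
# applied only to financial-domain tasks.
DIMENSIONS = ("accuracy", "completeness", "consistency", "timeliness", "communication")

def amasia_q(scores, provenance=None):
    """Aggregate per-dimension 1-5 scores into one number (simple mean;
    the aggregation rule is an assumption, not the published rubric)."""
    dims = [scores[d] for d in DIMENSIONS]
    if provenance is not None:
        dims.append(provenance)
    if not all(1 <= s <= 5 for s in dims):
        raise ValueError("scores must be on the 1-5 anchor scale")
    return round(mean(dims), 2)

print(amasia_q({"accuracy": 5, "completeness": 4, "consistency": 5,
                "timeliness": 3, "communication": 4}, provenance=5))  # 4.33
```

Keeping provenance as an optional sixth dimension lets one scoring function serve both domains without duplicating the rubric.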
Redaction boundary. Task categories shared; financial specifics never shared. Governance rules shareable (already public). Account numbers, portfolio values, and tax specifics never cross the boundary.
Five hypotheses formulated from the comparative framework analysis:
H1 (Divergence): Rule architecture will differ structurally between domains, not just in content.
H2 (Negative Bias): Both systems will show >70% negative constraints.
H3 (Zero Preemptive): Neither system will show preemptive rule formation. (Strongest prediction.)
H4 (Compression): Both systems will compress toward a stable equilibrium ≈ Miller range.
H5 (Trust Velocity): Trust violations will produce rules faster than repeated failures in both domains.
| Milestone | Target | Status |
|---|---|---|
| Governance emergence history | Mar 18 | Done |
| Redacted CRITICAL_FACTS.md | Mar 18 | Done |
| Deep research (10 agents, 3 waves) | Mar 19 | Done |
| JSONL sprint metrics (batch 1) | Mar 19 | Done |
| Paper draft v1 (~9,100 words) | Mar 19 | Done |
| Quantitative dataset (72 events) | Mar 19 | Done |
| LangGraph governance prototype | Mar 19 | Done |
| AMASIA governance data | TBD | Waiting on Suzanne |
| OSF pre-registration | TBD | Suzanne owns |
| 30-day parallel datasets | Mid-Apr | Pending |
| Paper revision + AMASIA data integration | May 2026 | Pending |
| arXiv submission | May 2026 | Pending |
Primary vehicle: arXiv preprint (fast, citable, no gatekeeping). Javier's PhD affiliation provides institutional backing. Parallel readable summary on this page, updated as the paper progresses. Stretch: CHI 2027 / CSCW workshop submission (~September 2026 deadlines).
Authors: Javier Herreros Riaza (first author) and Cristian Dominguez (co-author). AI systems (Atlas, Suzanne) credited as research instruments.
Data sharing: Observational only. Task categories shared; financial and personal specifics never shared. Governance rules shareable.
No behavioral freezes. Both systems continue evolving. Changes during the study are data.