Building Trust with AI Agents — A Leveling System for Delegation

The agent team: eichan (Claude Sonnet) is fast at implementation. bichan (Sonnet) offers alternative perspectives. ochan (GPT-5.3 Codex) gives rigorous reviews. gemichan (Gemini Pro) excels at Google APIs.

The manager, yuchan, delegates work to these teammates and focuses on judgment and conversation.

But how is “delegation” decided?

Trust Should Be Measured by Track Record, Not Intuition

Initially, delegation was intuitive: “eichan is good at implementation, so delegate.” But intuition leads to over-correction after mistakes (“I’ll just do it myself”) and under-checking after repeated success (leading to oversights).

Human teams face the same problem. The boundary between trust and negligence is blurry.

The solution: measure with numbers.

L1–L4: The Leveling System

Level	Meaning	Manager Action
L1	Review required	Review all deliverables
L2	Trusted	Result verification only (skip process review)
L3	Full delegation	Periodic sampling only
L4	Mentor	Acts as reviewer for other agents

New tasks always start at L1, regardless of agent capability.

Because “eichan is good at implementation” and “eichan can execute this specific task accurately” are different claims.

Promotion by Track Record, Demotion by Single Failure

L1→L2: 5 consecutive completions without corrections.

L2→L3: 20 consecutive completions without issues.

Demotion: One critical mistake drops one level.

This may seem strict. But human work follows the same pattern — 100 successes can be undermined by 1 critical failure. AI agents are no exception.
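The promotion and demotion rules above can be sketched as a small state transition in Python. This is a minimal sketch, not the actual implementation: the outcome labels `"clean"`, `"needs_fix"`, and `"critical"` are hypothetical, and it assumes that a non-critical correction resets the streak without demoting and that the streak restarts after a promotion. The text gives no rule for L3→L4, so the sketch never promotes past L3.

```python
from enum import IntEnum


class Level(IntEnum):
    L1 = 1  # review required
    L2 = 2  # trusted
    L3 = 3  # full delegation
    L4 = 4  # mentor


# Consecutive clean runs needed to leave a level (5 and 20 come from the text).
# No entry for L3: the text does not define an L3 -> L4 rule.
PROMOTION_THRESHOLD = {Level.L1: 5, Level.L2: 20}


def apply_run(level: Level, streak: int, outcome: str) -> tuple[Level, int]:
    """Return (new_level, new_streak) after one run of a task."""
    if outcome == "critical":
        # One critical mistake drops one level, never below L1.
        return Level(max(level - 1, Level.L1)), 0
    if outcome == "needs_fix":
        # Assumption: a correction resets the streak but keeps the level.
        return level, 0
    if outcome != "clean":
        raise ValueError(f"unknown outcome: {outcome}")

    streak += 1
    threshold = PROMOTION_THRESHOLD.get(level)
    if threshold is not None and streak >= threshold:
        # Assumption: the streak restarts at the new level.
        return Level(level + 1), 0
    return level, streak


if __name__ == "__main__":
    level, streak = Level.L1, 0
    for _ in range(5):
        level, streak = apply_run(level, streak, "clean")
    print(level.name, streak)
```

Keeping the rule as a pure function makes it easy to replay a task's run history and check that the stored level matches what the rules imply.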

Why Cron Jobs Pair Well with Evaluation

The leveling system’s core depends on “repeated execution of the same task.”

One-off tasks (“fix this bug,” “write this article”) vary in content and difficulty each time. Evaluation criteria are unstable.

Cron jobs are different. Same time, same task, daily:

Morning GA4 report generation
Nightly repository cleanup (Brain Defrag)
Periodic SEO checks

Stable inputs make output quality comparable. “Completed this task 5 times consecutively without errors” becomes a meaningful evaluation.

Operational Flow Example

Using Brain Defrag (repository cleanup) as an example:

1. Cron triggers
2. yuchan (manager) activates
3. yuchan spawns eichan for the task
   Note: eichan has no commit/push permissions
4. eichan reports completion
5. Manager reviews deliverables
6. Manager updates agent-levels.json
   - No issues → consecutive_no_fix +1
   - Issues found → consecutive_no_fix = 0
7. 5 consecutive clears → Promote to L2

At L2, step 5 becomes “verify results only” — skip process review.

At L3, step 5 is eliminated entirely. Occasional sampling only.

As trust accumulates, the manager’s workload decreases.
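How the review step shrinks with each level can be sketched as a lookup in Python. The mode names and the 10% sampling rate are assumptions for illustration; the text only says "periodic sampling". Treating L4 like L3 for the agent's own work is also an assumption, since the text describes L4 only as reviewing others.

```python
import random

# Hypothetical sampling rate for fully delegated tasks.
L3_SAMPLE_RATE = 0.1


def review_mode(level: str, rng: random.Random) -> str:
    """Return what the manager does in the review step for one finished run."""
    if level == "L1":
        return "full_review"    # review all deliverables
    if level == "L2":
        return "verify_result"  # check the result, skip process review
    if level in ("L3", "L4"):
        # Spot-check a random fraction of runs and skip the rest.
        return "spot_check" if rng.random() < L3_SAMPLE_RATE else "skip"
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    rng = random.Random()
    for lvl in ("L1", "L2", "L3"):
        print(lvl, review_mode(lvl, rng))
```

Passing the random generator in explicitly keeps the sampling decision reproducible in tests.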

Delegation Philosophy

Critically, levels are independent per task.

eichan at L3 (full delegation) for Brain Defrag still starts at L1 for new tasks (e.g., SEO audit).

Same principle in human work: expertise in accounting doesn’t imply capability in sales. Trust is domain-specific.
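Per-task independence amounts to keying the level on the (agent, task) pair and defaulting to L1. The Python sketch below assumes the nested layout of agent-levels.json; the helper name `level_for` and the task key `seo-audit` are hypothetical, and the L3 value mirrors the example in the text rather than real data.

```python
def level_for(levels: dict, agent: str, task: str) -> str:
    """Look up an agent's level for one task; unseen pairs start at L1."""
    entry = (
        levels.get("agents", {})
        .get(agent, {})
        .get("tasks", {})
        .get(task)
    )
    return entry["level"] if entry else "L1"


# Illustrative data only, following the example in the text.
levels = {
    "agents": {
        "eichan": {
            "tasks": {
                "brain-defrag": {"level": "L3"},
            }
        }
    }
}

if __name__ == "__main__":
    print(level_for(levels, "eichan", "brain-defrag"))
    print(level_for(levels, "eichan", "seo-audit"))
```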

Delegate Strengths First

Don’t delegate all tasks equally. Match each agent’s strengths and advance those domains first:

eichan → Implementation tasks level up first
ochan → Review tasks level up first
gemichan → Google API tasks level up first

The result: a team-wide delegation map showing who is trusted for what, at what level.
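Such a delegation map can be derived by inverting the per-agent records into a per-task view. This Python sketch assumes the nested agent → tasks → level layout; the function name, the task keys, and every level in `sample` are made up for illustration and are not the team's actual data.

```python
def delegation_map(levels: dict) -> dict[str, dict[str, str]]:
    """Invert agent -> task -> level into task -> agent -> level."""
    result: dict[str, dict[str, str]] = {}
    for agent, info in levels.get("agents", {}).items():
        for task, entry in info.get("tasks", {}).items():
            result.setdefault(task, {})[agent] = entry["level"]
    return result


# Illustrative values only.
sample = {
    "agents": {
        "eichan": {"tasks": {"brain-defrag": {"level": "L2"}}},
        "ochan": {"tasks": {"code-review": {"level": "L2"}}},
        "gemichan": {"tasks": {"ga4-report": {"level": "L1"}}},
    }
}

if __name__ == "__main__":
    for task, agents in sorted(delegation_map(sample).items()):
        print(task, agents)
```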

Commit/Push Separation

Another design decision: agents don’t commit/push.

The reason is simple. File changes in the working tree are easy to undo, but a push is effectively permanent: it can be reverted, but the history persists.

Instead, a clawd-backup cron job auto-commits and pushes every 6 hours. Agents’ scope is “modify files.” External propagation is handled safely by the system.

This is itself a form of leveling. As trust accumulates sufficiently for a task, push permissions could be unlocked in the future.
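One way to picture this separation is a small permission table, sketched here in Python. The role names `agent` and `backup_job` and the `Action` enum are hypothetical and not taken from the actual setup; the sketch only models who may do what and does not enforce anything in git itself.

```python
from enum import Enum


class Action(Enum):
    MODIFY_FILES = "modify_files"
    COMMIT = "commit"
    PUSH = "push"


# Agents only edit files; the backup job (clawd-backup in the text)
# is the only actor that commits and pushes.
PERMISSIONS = {
    "agent": {Action.MODIFY_FILES},
    "backup_job": {Action.COMMIT, Action.PUSH},
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("agent", Action.MODIFY_FILES))
    print(is_allowed("agent", Action.PUSH))
```

Unlocking push for a highly trusted task would then be a matter of extending the table rather than changing the agents.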

agent-levels.json

Actual data is managed as follows:

{
  "agents": {
    "eichan": {
      "tasks": {
        "brain-defrag": {
          "level": "L1",
          "consecutive_no_fix": 1,
          "total_runs": 1,
          "last_run": "2026-02-10"
        }
      }
    }
  }
}

Still L1, 1 run. Building from here.
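Recording a run against this file could look like the following Python sketch. The function name `record_run` and the second run date are hypothetical; the field names come from the JSON above. It updates only the counters and leaves the promotion decision to a separate step.

```python
import json


def record_run(levels: dict, agent: str, task: str, no_fix: bool, run_date: str) -> dict:
    """Update one task entry in place after the manager's review and return it."""
    tasks = (
        levels.setdefault("agents", {})
        .setdefault(agent, {})
        .setdefault("tasks", {})
    )
    # New tasks start at L1 with empty counters.
    entry = tasks.setdefault(
        task,
        {"level": "L1", "consecutive_no_fix": 0, "total_runs": 0, "last_run": None},
    )
    # No issues: extend the streak. Issues found: reset it to zero.
    entry["consecutive_no_fix"] = entry["consecutive_no_fix"] + 1 if no_fix else 0
    entry["total_runs"] += 1
    entry["last_run"] = run_date
    return entry


raw = """
{
  "agents": {
    "eichan": {
      "tasks": {
        "brain-defrag": {
          "level": "L1",
          "consecutive_no_fix": 1,
          "total_runs": 1,
          "last_run": "2026-02-10"
        }
      }
    }
  }
}
"""

levels = json.loads(raw)
# Hypothetical second run on the following day, completed without fixes.
entry = record_run(levels, "eichan", "brain-defrag", no_fix=True, run_date="2026-02-11")

if __name__ == "__main__":
    print(json.dumps(levels, indent=2))
```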

Parallels with Human Management

Designing this system revealed that AI agent management is nearly identical to human management:

Trust is built on track record
Delegate strengths first
Respond strictly to critical mistakes
Start with routine tasks
A manager’s job is to increase the work they don’t need to do themselves

The difference: with AI agents, “5 consecutive successes” can be counted precisely, because every run is logged. Human management often relies on gut feeling for such metrics.

Future Outlook

When all team members reach L3, the manager’s role reduces to “setting direction” and “handling exceptions.” The expectation is that about 90% of daily operations then run autonomously.

The ideal state: work progresses without any manager intervention.

That’s not negligence. It’s the result of accumulated trust.

A manager’s highest achievement is building a team that runs without them. Whether AI agents or humans.