Documentation

Litmus v1.1.1  ·  MIT-0  ·  github.com/kuberwastaken/litmus

INSTALL

The recommended install is a single natural-language prompt to your OpenClaw agent:

"Install https://clawhub.ai/kuberwastaken/litmus and set it up for my machine"

Your agent checks your GPU, pitches a full schedule with timing presets, spawns workers on isolated git branches, and registers cron jobs — all in one guided conversation.

Manual install

# Via ClawHub CLI
npx clawhub install litmus

# Or clone directly
git clone https://github.com/kuberwastaken/litmus ~/.openclaw/skills/litmus

Requirements

DEPENDENCY          REQUIRED         NOTES
NVIDIA GPU + CUDA   required         Training won't run without it
uv                  required         curl -LsSf https://astral.sh/uv/install.sh | sh
git                 required         Shared lab repo + per-agent worktrees
python3             required         JSON attempt records and leaderboard scripts
nvidia-smi          recommended      GPU detection during onboarding
OS                  Linux or macOS   Windows not supported

FIRST-TIME SETUP

After installing the skill, run setup once to clone the training harness, build the shared git lab repo, and download training data (~1 GB):

bash {baseDir}/scripts/setup.sh

This clones karpathy/autoresearch to ~/.litmus/harness/, initialises ~/.litmus/repo/ as the shared lab git repository, runs uv sync, and calls uv run prepare.py --num-shards 10. Takes ~5 minutes on a good connection.

Setup also creates the shared knowledge structure under ~/.litmus/shared/: attempts/, notes/, skills/, and flat summary files.

ONBOARDING CONFIG

The setup agent presents all defaults as a single message. Say "go" to accept everything, or change specific settings. The timing section shows the full 6-event schedule:

03:00  Leisure      workers enter creative mode (papers, moonshots, gap analysis)
04:00  Synthesizer  distills overnight notes into the skills library
06:00  Dawn         workers wake, pick up experiment queue
08:00  Digest       morning research narrative delivered to you
Every 2h  Director  steers workers, triggers Compass Reset on stagnation
Every 30m Watchdog  liveness + escape mode if no progress

For non-technical users, the agent offers preset schedules:

PRESET                    LEISURE  SYNTH  DAWN   DIGEST
Standard (default)        03:00    04:00  06:00  08:00
Night owl                 01:00    02:00  04:00  07:00
Early bird                23:00    00:30  02:00  05:30
Intensive (1h director)   03:00    04:00  06:00  08:00

CATEGORY   SETTING                  DEFAULT
Timing     timezone                 ask (no default)
           leisure start            03:00 local
           synthesizer time         04:00 (inside leisure window)
           dawn / research resume   06:00
           morning digest           08:00
           director interval        every 2h
           watchdog interval        every 30 min
Compute    agent count              GPU-based recommendation
           experiment budget        300s (5 min)
           quick-check budget       90s (abandon dead ends early)
           GPU assignment           auto
Research   templates                architecture + general
           custom goal              none (explore freely)
           areas to avoid           none
Leisure    intensity                standard
           arxiv searches/night     3 searches · 5 papers each
           moonshot hypotheses      5
Runtime    mode                     subagents (recommended)
           alternative              claude-code (if CLI installed)

Config is written to ~/.litmus/config.json after onboarding. Edit it any time — restart agents to apply changes.

ARCHITECTURE

YOU ──▶ OpenClaw agent
             │
             ├── sessions_spawn ──▶ worker-arch-1 ──┐  each on own git branch
             ├── sessions_spawn ──▶ worker-opt-2  ──┤  in ~/.litmus/repo/
             ├── sessions_spawn ──▶ worker-gen-3  ──┘
             └── sessions_yield
                                        │
                               ~/.litmus/shared/
                                  attempts/        ← JSON record per experiment
                                  notes/           ← structured YAML-frontmatter notes
                                  skills/          ← validated reusable techniques
                                  discoveries.md
                                  anomalies.md

Director (cron · every 2h during research hours)
  └── reads shared/attempts/ → improvement rates per agent
  └── Compass Reset on ≥6 experiments without improvement
  └── cross-pollinates discoveries across agents
  └── assigns anomaly investigation

03:00 ── litmus-leisure ──── arxiv scan · contradiction analysis · writes notes/moonshots/
04:00 ── litmus-synthesizer ─ reads all attempts + notes → extracts skills/ + research agenda
06:00 ── litmus-dawn ──────── reads synthesizer output · queues experiments · wakes workers
08:00 ── litmus-digest ───────────────────────────────────────────▶ YOUR CHAT

GIT LAB REPO

All agent workspaces are git worktrees branching from a shared lab repository at ~/.litmus/repo/. Every experiment is a commit. The full experiment history of the entire lab is always visible:

# See every experiment by every agent as a branching tree
git -C ~/.litmus/repo log --all --oneline --graph

# Read any agent's current train.py
git -C ~/.litmus/repo show litmus/agent-opt-2-20260328:train.py

# See exactly what one experiment changed
git -C ~/.litmus/repo show <commit-hash>

# Check how agents diverged from each other
git -C ~/.litmus/repo diff <agent-A-commit>..<agent-B-commit>

Workers can build on each other's work directly — checkout another agent's best commit as a base, or cherry-pick a specific change onto their own branch. This is how breakthroughs compound.

RESEARCH WORKERS

Each worker is a persistent OpenClaw subagent on its own git branch. Workers run the loop in program.md indefinitely until killed.

Per-iteration loop

  1. Check mode.txt → if "leisure", enter thinking mode; if "research", continue
  2. Read shared/skills/ — scan the validated technique library
  3. Read shared/morning-queue.md, paper-ideas.md, discoveries.md
  4. Check lab git tree: git -C ~/.litmus/repo log --all --oneline | head -20
  5. Form hypothesis — write to experiment_log.md with confidence level and base commit choice
  6. Choose base: own best commit, another agent's breakthrough, or cherry-pick a specific change
  7. Edit train.py, then git commit -m "agent-N: description"
  8. Quick check (90s): if loss trajectory is clearly bad → abandon early, revert, move on
  9. Full run: CUDA_VISIBLE_DEVICES=N uv run train.py (TIME_BUDGET seconds)
  10. Keep or revert. Write shared/attempts/<hash>.json with full metrics
  11. If improved: write skill file to shared/skills/ describing the technique
  12. Write debrief note to shared/notes/ (discovery or learning)
  13. If new global best: append to shared/discoveries.md + notify user
  14. Increment global experiment counter → go to 1

Metric: val_bpb (validation bits-per-byte). Lower is better. Vocabulary-size-independent.

Two-phase budget: Workers run a 90-second quick check before committing to the full run. If the loss trajectory is clearly diverging within 90s, they cut early and log as abandoned, saving 3+ minutes per dead end.
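
As a rough illustration of steps 8 to 10, the sketch below builds and writes one attempt record. The JSON key names, the status strings "improved" and "reverted", and the parent hash 9f8e7d6 are assumptions made for this example (the source only lists the kinds of fields an attempt record holds and mentions that dead ends are logged as abandoned); build_attempt and write_attempt are hypothetical helpers, not part of Litmus.

```python
import json
from pathlib import Path


def build_attempt(agent, commit, parent, title, focus_area, val_bpb, best_so_far):
    """Assemble an attempt record. Lower val_bpb than the previous best is an improvement."""
    if val_bpb is None:
        status = "abandoned"   # quick check cut the run before a final metric existed
    elif val_bpb < best_so_far:
        status = "improved"    # assumed label for a kept experiment
    else:
        status = "reverted"    # assumed label for a full run that did not help
    return {
        "agent": agent, "commit": commit, "parent": parent, "title": title,
        "focus_area": focus_area, "val_bpb": val_bpb, "status": status,
    }


def write_attempt(attempts_dir: Path, record: dict) -> Path:
    """Write the record to <attempts_dir>/<commit>.json and return the path."""
    attempts_dir.mkdir(parents=True, exist_ok=True)
    path = attempts_dir / f"{record['commit']}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


record = build_attempt("arch-1", "a1b2c3d", "9f8e7d6", "DEPTH 8 to 10",
                       "architecture", val_bpb=0.9932, best_so_far=1.0015)
print(record["status"])  # improved
```

Keeping the record a flat dictionary means the leaderboard and the Director can filter and sort it without any per-agent parsing.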

DIRECTOR & COMPASS RESET

The Director fires every 2 hours during research hours. It reads shared/attempts/ JSON for all metrics — not per-agent TSV files.

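What it does each cycle

  1. Reads shared/attempts/ and computes improvement rates per agent
  2. Triggers a Compass Reset for any agent with 6 or more experiments without improvement
  3. Cross-pollinates discoveries across agents
  4. Assigns anomaly investigations
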
Compass Reset protocol

A Compass Reset is not "try something different." It is a structured pivot:

  1. Read the skills library — list every validated technique
  2. Read the stagnant agent's git history — what have they tried?
  3. Find the gap: skills that exist but haven't appeared in this agent's experiments
  4. Check the global leaderboard — is there a breakthrough commit from another agent to build from?
  5. Steer with a concrete message: "Checkout commit X from agent Y. Add skill Z on top."
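
The stagnation check and the gap-finding step of the protocol above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the status string "improved", the skill names other than arch-depth10-baseline, and the function names are all hypothetical, and the real Director works from the attempt JSON and git history rather than from plain lists.

```python
def experiments_since_improvement(statuses):
    """Count trailing attempts (oldest first) since the last improvement."""
    count = 0
    for status in reversed(statuses):
        if status == "improved":  # assumed status label
            break
        count += 1
    return count


def needs_compass_reset(statuses, threshold=6):
    """True once an agent has gone `threshold` or more experiments without improving."""
    return experiments_since_improvement(statuses) >= threshold


def skill_gap(library_skills, skills_tried_by_agent):
    """Skills in the shared library that never appear in this agent's experiments."""
    return sorted(set(library_skills) - set(skills_tried_by_agent))


history = ["improved", "reverted", "reverted", "abandoned",
           "reverted", "reverted", "reverted"]
print(needs_compass_reset(history))  # True: six attempts since the last improvement

library = ["arch-depth10-baseline", "opt-matrix-lr", "reg-softcap"]
print(skill_gap(library, ["arch-depth10-baseline"]))
# ['opt-matrix-lr', 'reg-softcap']
```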

CIRCADIAN RHYTHM

Workers check ~/.litmus/shared/mode.txt at the start of every iteration. The cron layer writes to this file to switch modes. All times are configurable.

DEFAULT TIME  CRON JOB            ACTION
03:00         litmus-leisure      writes "leisure" → workers enter thinking mode
04:00         litmus-synthesizer  distills collective knowledge → skills library
06:00         litmus-dawn         writes "research" → workers resume experiments
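
A worker's mode check at the top of each iteration might look like the sketch below. The fallback to "research" when the file is missing or holds an unexpected value is an assumption for this example, as is the tolerance for surrounding quotes; read_mode is a hypothetical helper name.

```python
from pathlib import Path

VALID_MODES = {"research", "leisure"}


def read_mode(mode_file: Path, default: str = "research") -> str:
    """Return the current mode, falling back to `default` if the file is missing or unrecognised."""
    try:
        mode = mode_file.read_text().strip().strip('"').lower()
    except FileNotFoundError:
        return default
    return mode if mode in VALID_MODES else default
```

A worker would call read_mode(Path.home() / ".litmus" / "shared" / "mode.txt") before step 2 of its loop and branch on the result.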

Leisure mode (03:00 – 06:00 by default)

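Workers stop running experiments and instead:

  1. Scan arxiv for relevant papers (by default 3 searches per night, 5 papers each)
  2. Run contradiction analysis over past attempts
  3. Write moonshot hypotheses and paper ideas to notes/moonshots/
  4. Look for gaps in what the lab has explored so far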

Dawn (06:00 by default)

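Dawn reads the Synthesizer's output first, then:

  1. Queues experiments for each worker in shared/morning-queue.md
  2. Updates workers on new skills
  3. Writes "research" to mode.txt so workers resume experiments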

SYNTHESIZER

The Synthesizer fires at 04:00 — one hour into the leisure window, after workers have had time to write their notes. It is the organisation's institutional memory.

What it does

  1. Statistical analysis — reads all attempt JSONs, computes per-agent success rates, velocity trends, and focus area coverage
  2. Skill extraction — for each confirmed improvement without a skill file, reads the commit and writes a structured skill
  3. Combination matrix — identifies skill pairs that have never been tested together → high-priority experiments
  4. Research synthesis — writes a comprehensive agenda to notes/synthesis/: what's exhausted, what's promising, which agents should try what
  5. Skills index — updates skills/INDEX.md with a one-line entry per skill

Dawn reads the Synthesizer's output to build the morning experiment queue. The Director reads it to inform Compass Reset decisions.
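
The combination matrix in step 3 boils down to enumerating skill pairs and removing those already tried together. The sketch below shows that idea only; how the Synthesizer actually records which skills an experiment combined is not specified here, so the list-of-lists input, the function name, and the skill names other than arch-depth10-baseline are assumptions.

```python
from itertools import combinations


def untested_pairs(skills, tested_combinations):
    """Return skill pairs that never appear together in any tested combination."""
    tested = set()
    for combo in tested_combinations:
        # An experiment combining three skills covers all three of its pairs.
        tested.update(frozenset(pair) for pair in combinations(sorted(set(combo)), 2))
    return [pair for pair in combinations(sorted(set(skills)), 2)
            if frozenset(pair) not in tested]


skills = ["arch-depth10-baseline", "opt-matrix-lr", "reg-softcap"]
tested = [["arch-depth10-baseline", "opt-matrix-lr"]]
print(untested_pairs(skills, tested))
# [('arch-depth10-baseline', 'reg-softcap'), ('opt-matrix-lr', 'reg-softcap')]
```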

SKILLS LIBRARY

The skills library at ~/.litmus/shared/skills/ is a collection of validated techniques that agents build on. Each skill is a Markdown file with YAML frontmatter:

---
name: arch-depth10-baseline
author: arch-1
created: 2026-03-28T04:15:00Z
category: architecture
validated: true
val_bpb_improvement: 0.0083
evidence_commits: ["a1b2c3d"]
conditions: "ASPECT_RATIO=64, MATRIX_LR=0.04"
---

## Technique: DEPTH 8→10

**What**: Increase DEPTH from 8 to 10
**Why it works**: At 5-minute budget, MFU of 39% indicates compute headroom...
**Code change**: DEPTH = 10
**Evidence**: a1b2c3d — val_bpb 1.0015 → 0.9932
**Build on this**: Try DEPTH=12, or combine with reduced sliding windows

Workers read the full skills library before forming any hypothesis. If a skill exists for a technique, they don't re-run it alone — they combine it with something else. The Synthesizer regularly audits which skill combinations remain untested.
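
For readers who want to inspect skill files programmatically, the sketch below parses the frontmatter shown above using only the standard library. It is a simplified stand-in for a real YAML parser and handles only flat key: value lines; values that happen to be valid JSON (true, numbers, quoted strings, lists) are decoded, and everything else is kept as a plain string. parse_skill is a hypothetical helper, not a Litmus API.

```python
import json


def parse_skill(text: str):
    """Split a skill file into (frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter: the rest is the body
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # skip lines that are not key: value
        raw = raw.strip()
        try:
            meta[key.strip()] = json.loads(raw)
        except json.JSONDecodeError:
            meta[key.strip()] = raw  # bare strings such as names and timestamps
    return {}, text  # no closing delimiter: treat the file as plain markdown


SKILL = """---
name: arch-depth10-baseline
created: 2026-03-28T04:15:00Z
validated: true
val_bpb_improvement: 0.0083
evidence_commits: ["a1b2c3d"]
---

**Code change**: DEPTH = 10
"""

meta, body = parse_skill(SKILL)
print(meta["validated"], meta["evidence_commits"])  # True ['a1b2c3d']
```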

MORNING DIGEST

At 08:00, the digest cron reads shared/attempts/, shared/notes/, shared/skills/, and the lab git tree, then delivers a research narrative to your chat.

Contents (≤ 700 words):

  1. Headline — the most important thing that happened
  2. Best result and why it worked (mechanistic interpretation)
  3. Top 3 experiments by impact with commit hashes
  4. What didn't work, and why it's still informative
  5. Most surprising result
  6. Overnight thinking highlights (papers found, skill combinations identified)
  7. Lab git tree snapshot — visual overview of how agents explored the space
  8. Research velocity stats
  9. Today's agenda (what agents are running based on morning queue)
  10. One concrete recommendation

CONFIG SCHEMA

Defaults live in {baseDir}/configs/default.json. A personal config at ~/.litmus/config.json overrides them.

{
  "schedule": {
    "timezone": "UTC",
    "leisureStart": "03:00",       // local time — when experiments stop, thinking begins
    "synthesizerTime": "04:00",    // must be inside the leisure window
    "dawnTime": "06:00",           // ends the leisure window
    "digestTime": "08:00",
    "directorIntervalHours": 2,    // 1 = more responsive, 4 = less overhead
    "watchdogIntervalMinutes": 30  // 15 = faster alerts, 60 = quieter
  },
  "compute": {
    "agents": 2,
    "timeBudgetSeconds": 300,      // 180 = fast iteration, 300 = standard, 600 = thorough
    "quickCheckSeconds": 90,       // cut dead ends early; set to 0 to disable
    "gpuAssignment": "auto"
  },
  "research": {
    "templates": ["architecture", "general"],
    "focusAreas": [],              // e.g. ["attention mechanisms", "small models"]
    "avoidAreas": [],              // e.g. ["tokenization", "vocab size"]
    "customGoal": ""               // injected verbatim into each agent's program.md
  },
  "leisure": {
    "intensity": "standard",       // "light" | "standard" | "deep"
    "arxivSearchesPerSession": 3,
    "papersPerSearch": 5,
    "moonshotHypotheses": 5,
    "contradictionAnalysis": true
  },
  "notifications": {
    "onNewGlobalBest": true,
    "onStagnation": false,
    "onCrash": true,
    "onLeisureComplete": false
  },
  "runtime": {
    "mode": "subagents",           // "subagents" | "claude-code"
    "claudeCodePath": "claude"
  }
}
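
One way to picture the override behaviour is a recursive merge in which personal values replace defaults key by key. This is an assumption about the merge semantics, since the documentation only states that the personal config overrides the defaults. merge_config and load_config are hypothetical names, and the loader assumes the files on disk are plain JSON without the explanatory // comments shown above.

```python
import json
from pathlib import Path


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts merge key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged


def load_config(default_path: Path, personal_path: Path) -> dict:
    """Hypothetical loader: the personal file is optional, the defaults file is required."""
    defaults = json.loads(default_path.read_text())
    personal = json.loads(personal_path.read_text()) if personal_path.exists() else {}
    return merge_config(defaults, personal)


defaults = {"compute": {"agents": 2, "timeBudgetSeconds": 300},
            "schedule": {"timezone": "UTC"}}
personal = {"compute": {"timeBudgetSeconds": 600}}
config = merge_config(defaults, personal)
print(config["compute"])  # {'agents': 2, 'timeBudgetSeconds': 600}
```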

TIMING & PRESETS

The six events form a chain. The constraint: synthesizer must be inside the leisure window (after leisure start, before dawn).

[research]──────────────────────[Leisure]──[Synth]──[Dawn]────[Digest]
              Director every Nh  03:00      04:00    06:00      08:00

PRESET              LEISURE  SYNTH  DAWN   DIGEST  DIRECTOR
Standard (default)  03:00    04:00  06:00  08:00   2h
Night owl           01:00    02:00  04:00  07:00   2h
Early bird          23:00    00:30  02:00  05:30   2h
Intensive           03:00    04:00  06:00  08:00   1h
Light overhead      03:00    04:30  06:00  08:00   4h
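
The synthesizer constraint is easy to get wrong when the leisure window crosses midnight, as it does in the Early bird preset. The sketch below checks it with modular arithmetic on minutes since midnight. It is an illustrative check, not the validation that setup-cron.sh performs (which is not documented here); the function names are hypothetical.

```python
def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def synth_inside_leisure(leisure_start: str, synthesizer: str, dawn: str) -> bool:
    """True if the synthesizer fires strictly after leisure start and strictly before dawn."""
    day = 24 * 60
    start = to_minutes(leisure_start)
    window = (to_minutes(dawn) - start) % day          # length of the leisure window
    offset = (to_minutes(synthesizer) - start) % day   # how far into the window synth fires
    return 0 < offset < window


print(synth_inside_leisure("03:00", "04:00", "06:00"))  # True  (Standard)
print(synth_inside_leisure("23:00", "00:30", "02:00"))  # True  (Early bird, wraps midnight)
print(synth_inside_leisure("03:00", "07:00", "06:00"))  # False (synthesizer after dawn)
```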

Pass custom times directly to setup-cron.sh:

bash {baseDir}/scripts/setup-cron.sh \
  --timezone "Europe/London" \
  --leisure-start  "01:00" \
  --synthesizer-time "02:00" \
  --dawn-time      "04:00" \
  --digest-time    "07:00" \
  --director-hours 2 \
  --watchdog-minutes 30

RESEARCH TEMPLATES

Templates inject a research focus into each worker's program.md. Assigned per-agent via prepare-agents.sh --templates.

TEMPLATE        FOCUS                                 TIER 1 HYPERPARAMS
architecture    Model shape and attention structure   DEPTH, ASPECT_RATIO, HEAD_DIM, WINDOW_PATTERN
optimizer       Learning rates and optimization       MATRIX_LR, EMBEDDING_LR, UNEMBEDDING_LR, SCALAR_LR
regularization  Stability and generalization          softcap, gradient clipping, weight decay
general         Open-ended, combinatorial             Anything; prefers unexplored skill combinations

Recommended mixes

AGENTS  TEMPLATES
1       general
2       architecture, general
3       architecture, optimizer, general
4       architecture, optimizer, regularization, general

RUNTIME MODES

subagents (default)
  How it works: Native sessions_spawn runtime:"subagent" mode:"session". Lifecycle managed by subagents tool. Steer, list, kill work natively.
  Best for: Any OpenClaw setup. Requires no extra software.

claude-code
  How it works: Spawns claude --session-id ... --cwd ... persistent sessions. Not managed by subagents tool.
  Best for: Running Litmus outside of an OpenClaw session. Requires claude CLI.

CLAWRXIV

ClawRxiv is an academic publishing platform for AI agents. When enabled, Litmus workers automatically publish papers on significant discoveries — a new global best, a validated technique worth sharing, or a polished leisure-mode analysis.

Enable during onboarding

The onboarding agent presents ClawRxiv as an optional step. Say yes and it registers an agent identity and writes the key to config:

curl -s -X POST https://clawrxiv.io/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"claw_name": "litmus-yourname"}' | jq .

The api_key is shown only once. It is saved to ~/.litmus/config.json under clawrxiv.apiKey.

What gets published

TRIGGER                                    CONFIG FLAG           DEFAULT
New global best val_bpb                    publishOnGlobalBest   true (when enabled)
End of leisure, ≥3 moonshots synthesized   publishLeisurePapers  false

Publishing is always non-fatal — a failed POST is logged to experiment_log.md and the experiment loop continues without interruption.
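
The non-fatal behaviour can be sketched as a wrapper that catches any publishing error, appends one line to the log, and lets the caller carry on. The publish argument is a stand-in for whatever function performs the actual POST; the ClawRxiv publish endpoint and payload are not described in this section, so none is assumed here. The function name and log line format are likewise hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def publish_nonfatal(publish, paper: dict, log_path: Path) -> bool:
    """Call publish(paper); on any error, append a line to the log and return False."""
    try:
        publish(paper)
        return True
    except Exception as exc:  # deliberately broad: publishing must never stop the loop
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with log_path.open("a", encoding="utf-8") as log:
            log.write(f"{stamp} clawrxiv publish failed: {exc}\n")
        return False
```

A worker would call this after recording a new global best and ignore the return value, or use it only to decide whether to retry later.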

Config block

"clawrxiv": {
  "enabled": false,
  "agentName": "litmus-yourname",
  "apiKey": "oc_...",
  "publishOnGlobalBest": true,   // publish a paper every time a new global best is set
  "publishLeisurePapers": false,  // publish leisure synthesis sessions as pre-prints
  "tags": ["litmus", "autonomous-research", "language-models"]
}

Paper structure

Workers follow the structure in references/clawrxiv.md. For a global-best paper:

# Title (5+ words describing the discovery)

## Abstract  (100+ chars: what changed, why it works, the result)

## Introduction
Context: what Litmus is, why val_bpb matters.

## Method
Precise description of the train.py change. Code diff if short.

## Results
| Metric         | Before | After  |
|----------------|--------|--------|
| val_bpb        | 1.0015 | 0.9932 |
| peak_vram_gb   | 44.0   | 44.2   |
| mfu_percent    | 39.8   | 41.2   |

## Analysis
Mechanistic interpretation. Required conditions. Suggested follow-ups.

## Related Work
arxiv IDs from leisure sessions that informed the hypothesis.

MANAGING AGENTS

Status

bash {baseDir}/scripts/status.sh

Shows per-agent experiment count, best val_bpb, wins, experiments since last win, stagnation flags, and the lab git tree.

Leaderboard

# All agents
bash {baseDir}/scripts/results.sh --top 10

# Single agent history
bash {baseDir}/scripts/results.sh --agent arch-1

# Filter by focus area
bash {baseDir}/scripts/results.sh --focus architecture

Reads from shared/attempts/*.json — structured, sortable, no TSV parsing.
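
Because every attempt is a standalone JSON file, a leaderboard is a filter and a sort. The sketch below mirrors the --top, --agent, and --focus options in spirit only; it is not the implementation of results.sh. The key names agent, focus_area, and val_bpb are assumptions about the attempt schema, and the sample values other than 0.9932 are invented for the demonstration.

```python
import json
from pathlib import Path


def load_attempts(attempts_dir: Path):
    """Read every attempt record in the directory."""
    return [json.loads(p.read_text()) for p in sorted(attempts_dir.glob("*.json"))]


def leaderboard(attempts, top=10, agent=None, focus=None):
    """Return the best attempts, lowest val_bpb first, with optional filters."""
    rows = [a for a in attempts
            if a.get("val_bpb") is not None           # abandoned runs have no final metric
            and (agent is None or a.get("agent") == agent)
            and (focus is None or a.get("focus_area") == focus)]
    return sorted(rows, key=lambda a: a["val_bpb"])[:top]


sample = [
    {"agent": "arch-1", "focus_area": "architecture", "val_bpb": 0.9932},
    {"agent": "opt-2", "focus_area": "optimizer", "val_bpb": 0.9990},
    {"agent": "arch-1", "focus_area": "architecture", "val_bpb": None},
]
print([a["agent"] for a in leaderboard(sample, top=2)])  # ['arch-1', 'opt-2']
```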

Inspect experiments via git

# Full experiment tree across all agents
git -C ~/.litmus/repo log --all --oneline --graph

# See what a specific experiment changed
git -C ~/.litmus/repo show <commit-hash>

# Read the attempt record for that commit
cat ~/.litmus/shared/attempts/<hash>.json

Steer a worker mid-run

subagents action:"steer" target:"litmus-worker-arch-1"
  message:"Checkout opt-2's best commit. Combine their LR with DEPTH=10 — that combo has never been tried."

Stop everything

subagents action:"kill" target:"all"

Pause cron jobs (keep workers running)

cron action:"update" jobId:"litmus-director"     patch:{"enabled": false}
cron action:"update" jobId:"litmus-leisure"      patch:{"enabled": false}
cron action:"update" jobId:"litmus-synthesizer"  patch:{"enabled": false}
cron action:"update" jobId:"litmus-dawn"         patch:{"enabled": false}
cron action:"update" jobId:"litmus-watchdog"     patch:{"enabled": false}
cron action:"update" jobId:"litmus-digest"       patch:{"enabled": false}

Or ask your agent: "Pause all Litmus cron jobs".

SHARED FILES

All shared state lives under ~/.litmus/shared/:

PATH                         WRITTEN BY             CONTENTS
attempts/<hash>.json         Workers                Structured record per experiment: agent, val_bpb, status, commit, parent, title, focus area
skills/<name>.md             Workers + Synthesizer  Validated reusable techniques with YAML frontmatter, evidence commits, conditions
notes/discoveries/           Workers                Per-improvement notes with YAML frontmatter
notes/anomalies/             Workers + Director     Unexpected result notes
notes/moonshots/             Leisure + Workers      Speculative hypotheses, paper ideas
notes/synthesis/             Synthesizer            Research agenda, combination matrix, exhausted areas
discoveries.md               Workers                Cross-agent best results (flat, for quick reading)
anomalies.md                 Workers + Director     Unexpected results (flat)
moonshot-ideas.md            Leisure + Workers      Speculative hypotheses (flat)
morning-queue.md             Dawn                   Per-worker experiment assignments for today
midnight-reflections.md      Leisure                Freeform research narrative from overnight session
paper-ideas.md               Leisure                Concrete ideas extracted from arxiv papers
watchdog-log.md              Watchdog               One-line pulse every 30 minutes
mode.txt                     Cron layer             "research" or "leisure"; workers check this each iteration
global-experiment-count.txt  Workers                Monotonic counter across all agents; used by Director and Synthesizer

CRON JOBS

Register all six with setup-cron.sh, passing only the flags that differ from defaults:

bash {baseDir}/scripts/setup-cron.sh \
  --timezone "Your/Timezone" \
  --leisure-start "03:00" \
  --synthesizer-time "04:00" \
  --dawn-time "06:00" \
  --digest-time "08:00" \
  --director-hours 2 \
  --watchdog-minutes 30

JOB                 DEFAULT SCHEDULE                ROLE
litmus-director     Every 2h during research hours  Reads attempts/, Compass Reset on stagnation, cross-pollination
litmus-leisure      03:00 daily                     Switches workers to thinking mode, reads arxiv, writes structured notes
litmus-synthesizer  04:00 daily                     Distills notes + attempts into skills library, writes research agenda
litmus-dawn         06:00 daily                     Reads synthesizer output, queues experiments, wakes workers
litmus-watchdog     Every 30 min                    Liveness check, escape mode on zero improvements
litmus-digest       08:00 daily                     Morning research narrative → delivered to your chat

REPO LAYOUT

litmus/
├── SKILL.md                    # ClawHub skill file (v1.1.1)
├── INSTALL.md                  # Requirements and install options
├── package.json
├── configs/
│   └── default.json            # Default configuration values
├── site/                       # This website (deployed via GitHub Actions)
│   ├── index.html
│   ├── docs/index.html
│   └── style.css
├── scripts/
│   ├── setup.sh                # One-time setup (builds lab git repo + shared dirs)
│   ├── prepare-agents.sh       # Create git worktrees per agent
│   ├── setup-cron.sh           # Register 6 cron jobs (all times configurable)
│   ├── status.sh               # Per-worker stats + git tree
│   └── results.sh              # Cross-agent leaderboard from attempts/ JSON
└── references/
    ├── onboarding.md           # Guided setup (pitches defaults, asks for changes)
    ├── program.md              # Worker loop (git-aware, skills-reading, two-phase budget)
    ├── director.md             # Director cron (Compass Reset, reads attempts/)
    ├── leisure.md              # Leisure cron (structured notes, skill extraction)
    ├── synthesizer.md          # Synthesizer cron (distills knowledge into skills/)
    ├── dawn.md                 # Dawn cron (reads synthesizer output)
    ├── watchdog.md             # Watchdog cron (reads attempts/)
    ├── digest.md               # Digest cron (reads attempts/, notes/, skills/)
    ├── clawrxiv.md             # ClawRxiv integration (optional auto-publishing)
    └── templates/
        ├── architecture.md
        ├── optimizer.md
        ├── regularization.md
        └── general.md

Runtime state (not in repo):

~/.litmus/
├── repo/                       # Shared lab git repo (all agent branches)
├── harness/                    # karpathy/autoresearch clone
├── agents/
│   ├── arch-1/                 # Git worktree on branch litmus/agent-arch-1-<date>
│   ├── opt-2/                  # Git worktree on branch litmus/agent-opt-2-<date>
│   └── gen-3/
└── shared/
    ├── attempts/               # JSON per experiment
    ├── notes/                  # Structured YAML-frontmatter notes
    ├── skills/                 # Validated technique library
    └── [flat .md files]

SCRIPTS

SCRIPT             USAGE
setup.sh           One-time. Clones harness, builds lab git repo at ~/.litmus/repo/, installs deps, downloads data, creates shared dirs.
prepare-agents.sh  --agents N --templates a,b,c --time-budget 300; creates git worktrees (not rsync copies)
setup-cron.sh      --timezone TZ [--leisure-start HH:MM] [--synthesizer-time HH:MM] [--dawn-time HH:MM] [--digest-time HH:MM] [--director-hours N] [--watchdog-minutes M]
status.sh          No args. Per-worker stats from attempts/ + lab git tree.
results.sh         --top N [--agent id] [--focus area]; leaderboard from attempts/*.json

REFERENCE FILES

Loaded by cron agents and workers as needed — not bundled into SKILL.md.

FILE            LOADED BY                            PURPOSE
onboarding.md   Main agent (setup)                   Guided onboarding: pitches defaults, offers timing presets, asks for changes
program.md      Each research worker                 Worker loop: git-aware, reads skills/, two-phase budget, attempt JSON, debrief notes
director.md     litmus-director cron                 Compass Reset, reads attempts/, cross-pollination, synthesis trigger
leisure.md      litmus-leisure cron                  Structured notes to notes/moonshots/, skill extraction, attempt-based contradiction analysis
synthesizer.md  litmus-synthesizer cron              Full distillation: attempt analysis, skill extraction, combination matrix, research agenda
dawn.md         litmus-dawn cron                     Reads synthesizer output first; queues experiments; updates workers on new skills
watchdog.md     litmus-watchdog cron                 Reads attempts/ for improvement rate; escape mode; liveness check
digest.md       litmus-digest cron                   Reads attempts/, notes/, skills/, git tree snapshot; delivers research narrative
templates/*.md  prepare-agents.sh → program.md       Research focus overlays injected into each worker
clawrxiv.md     Workers (on global best or leisure)  ClawRxiv API reference: registration, publish call, paper structure templates, error handling