Documentation

Litmus v1.1.1 · MIT-0 · github.com/kuberwastaken/litmus

INSTALL

The recommended install is a single natural-language prompt to your OpenClaw agent:

"Install https://clawhub.ai/kuberwastaken/litmus and set it up for my machine"

Your agent checks your GPU, pitches a full schedule with timing presets, spawns workers on isolated git branches, and registers cron jobs — all in one guided conversation.

Manual install

# Via ClawHub CLI
npx clawhub install litmus

# Or clone directly
git clone https://github.com/kuberwastaken/litmus ~/.openclaw/skills/litmus

Requirements

DEPENDENCY	REQUIRED	NOTES
NVIDIA GPU + CUDA	required	Training won't run without it
uv	required	`curl -LsSf https://astral.sh/uv/install.sh | sh`
git	required	Shared lab repo + per-agent worktrees
python3	required	JSON attempt records and leaderboard scripts
nvidia-smi	recommended	GPU detection during onboarding
OS	Linux or macOS	Windows not supported
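
A quick preflight check for the command-line tools in the table can be sketched in Python. This is a minimal sketch, not part of Litmus: the function name `check_dependencies` is hypothetical, and it only looks for executables on PATH, so it does not verify that CUDA itself works.

```python
import shutil

# Command names taken from the requirements table above.
REQUIRED = ["uv", "git", "python3"]
RECOMMENDED = ["nvidia-smi"]


def check_dependencies(which=shutil.which):
    """Return (missing_required, missing_recommended) as lists of command names.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    missing_required = [cmd for cmd in REQUIRED if which(cmd) is None]
    missing_recommended = [cmd for cmd in RECOMMENDED if which(cmd) is None]
    return missing_required, missing_recommended
```

Calling `check_dependencies()` with no arguments inspects the real PATH; an empty first list means every required tool was found.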

FIRST-TIME SETUP

After installing the skill, run setup once to clone the training harness, build the shared git lab repo, and download training data (~1 GB):

bash {baseDir}/scripts/setup.sh

This clones karpathy/autoresearch to ~/.litmus/harness/, initialises ~/.litmus/repo/ as the shared lab git repository, runs uv sync, and calls uv run prepare.py --num-shards 10. Expect about 5 minutes on a good connection.

Setup also creates the shared knowledge structure under ~/.litmus/shared/: attempts/, notes/, skills/, and flat summary files.

ONBOARDING CONFIG

The setup agent presents all defaults as a single message. Say "go" to accept everything, or change specific settings. The timing section shows the full 6-event schedule:

03:00  Leisure      workers enter creative mode (papers, moonshots, gap analysis)
04:00  Synthesizer  distills overnight notes into the skills library
06:00  Dawn         workers wake, pick up experiment queue
08:00  Digest       morning research narrative delivered to you
Every 2h  Director  steers workers, triggers Compass Reset on stagnation
Every 30m Watchdog  liveness + escape mode if no progress

For non-technical users, the agent offers preset schedules:

PRESET	LEISURE	SYNTH	DAWN	DIGEST
Standard (default)	03:00	04:00	06:00	08:00
Night owl	01:00	02:00	04:00	07:00
Early bird	23:00	00:30	02:00	05:30
Intensive (1h director)	03:00	04:00	06:00	08:00

CATEGORY	SETTING	DEFAULT
Timing	timezone	ask — no default
	leisure start	03:00 local
	synthesizer time	04:00 (inside leisure window)
	dawn / research resume	06:00
	morning digest	08:00
	Director interval	every 2h
	watchdog interval	every 30 min
Compute	agent count	GPU-based recommendation
	experiment budget	300s (5 min)
	quick-check budget	90s (abandon dead ends early)
	GPU assignment	auto
Research	templates	architecture + general
	custom goal	none — explore freely
	areas to avoid	none
Leisure	intensity	standard
	arxiv searches/night	3 searches · 5 papers each
	moonshot hypotheses	5
Runtime	mode	subagents (recommended)
	alternative	claude-code (if CLI installed)

Config is written to ~/.litmus/config.json after onboarding. Edit it any time — restart agents to apply changes.

ARCHITECTURE

YOU ──▶ OpenClaw agent
             │
             ├── sessions_spawn ──▶ worker-arch-1 ──┐  each on own git branch
             ├── sessions_spawn ──▶ worker-opt-2  ──┤  in ~/.litmus/repo/
             ├── sessions_spawn ──▶ worker-gen-3  ──┘
             └── sessions_yield
                                        │
                               ~/.litmus/shared/
                                  attempts/        ← JSON record per experiment
                                  notes/           ← structured YAML-frontmatter notes
                                  skills/          ← validated reusable techniques
                                  discoveries.md
                                  anomalies.md

Director (cron · every 2h during research hours)
  └── reads shared/attempts/ → improvement rates per agent
  └── Compass Reset on ≥6 experiments without improvement
  └── cross-pollinates discoveries across agents
  └── assigns anomaly investigation

03:00 ── litmus-leisure ──── arxiv scan · contradiction analysis · writes notes/moonshots/
04:00 ── litmus-synthesizer ─ reads all attempts + notes → extracts skills/ + research agenda
06:00 ── litmus-dawn ──────── reads synthesizer output · queues experiments · wakes workers
08:00 ── litmus-digest ───────────────────────────────────────────▶ YOUR CHAT

GIT LAB REPO

All agent workspaces are git worktrees branching from a shared lab repository at ~/.litmus/repo/. Every experiment is a commit. The full experiment history of the entire lab is always visible:

# See every experiment by every agent as a branching tree
git -C ~/.litmus/repo log --all --oneline --graph

# Read any agent's current train.py
git -C ~/.litmus/repo show litmus/agent-opt-2-20260328:train.py

# See exactly what one experiment changed
git -C ~/.litmus/repo show <commit-hash>

# Check how agents diverged from each other
git -C ~/.litmus/repo diff <agent-A-commit>..<agent-B-commit>

Workers can build on each other's work directly — checkout another agent's best commit as a base, or cherry-pick a specific change onto their own branch. This is how breakthroughs compound.

RESEARCH WORKERS

Each worker is a persistent OpenClaw subagent on its own git branch. Workers run the loop in program.md indefinitely until killed.

Per-iteration loop

1. Check mode.txt → if "leisure", enter thinking mode; if "research", continue
2. Read shared/skills/ and scan the validated technique library
3. Read shared/morning-queue.md, paper-ideas.md, discoveries.md
4. Check lab git tree: git -C ~/.litmus/repo log --all --oneline | head -20
5. Form hypothesis: write to experiment_log.md with confidence level and base commit choice
6. Choose base: own best commit, another agent's breakthrough, or cherry-pick a specific change
7. Edit train.py → git commit -m "agent-N: description"
8. Quick check (90s): if loss trajectory is clearly bad → abandon early, revert, move on
9. Full run: CUDA_VISIBLE_DEVICES=N uv run train.py (TIME_BUDGET seconds)
10. Keep or revert. Write shared/attempts/<hash>.json with full metrics
11. If improved: write skill file to shared/skills/ describing the technique
12. Write debrief note to shared/notes/ (discovery or learning)
13. If new global best: append to shared/discoveries.md + notify user
14. Increment global experiment counter → go to 1
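
The "write shared/attempts/<hash>.json" step could look roughly like the sketch below. The field list follows the attempts/ description in the shared files table; the exact key spellings (for example `focus_area`) and the status labels are assumptions, not the schema Litmus actually writes.

```python
import json
from pathlib import Path


def write_attempt(attempts_dir, commit, agent, val_bpb, status, parent, title, focus_area):
    """Write one JSON record per experiment, named after the commit hash."""
    record = {
        "agent": agent,
        "val_bpb": val_bpb,
        "status": status,          # assumed labels, e.g. "kept", "reverted", "abandoned"
        "commit": commit,
        "parent": parent,
        "title": title,
        "focus_area": focus_area,  # assumed key name
    }
    attempts_dir = Path(attempts_dir)
    attempts_dir.mkdir(parents=True, exist_ok=True)
    path = attempts_dir / f"{commit}.json"
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

One file per commit keeps writes from different workers from colliding, since each worker only ever creates files named after its own commits.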

Metric: val_bpb (validation bits-per-byte). Lower is better. Vocabulary-size-independent.

Two-phase budget: Workers run a 90-second quick check before committing to the full run. If the loss trajectory is clearly diverging within 90s, they stop early and log the attempt as abandoned, saving 3+ minutes per dead end.
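
As a rough illustration of the two-phase budget, the sketch below computes the time saved by an early abandon and applies a simple divergence test. Both functions are hypothetical. The arithmetic assumes the quick check counts toward the same time budget, and the "diverging" rule (a non-finite loss, or a final loss above the first one) is an assumed stand-in, since the docs do not define how workers judge the trajectory.

```python
import math


def seconds_saved(time_budget_seconds=300, quick_check_seconds=90):
    """Time not spent on the full run when an experiment is abandoned at the quick check."""
    return max(time_budget_seconds - quick_check_seconds, 0)


def looks_divergent(losses):
    """Assumed heuristic: any non-finite loss, or a final loss above the first recorded loss."""
    if not losses:
        return False
    if any(not math.isfinite(x) for x in losses):
        return True
    return losses[-1] > losses[0]
```

With the default budgets, `seconds_saved()` returns 210, which matches the "3+ minutes" figure above.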

DIRECTOR & COMPASS RESET

The Director fires every 2 hours during research hours. It reads shared/attempts/ JSON for all metrics — not per-agent TSV files.

What it does each cycle

Progress tracking — reads all attempt JSONs, computes improvement rate and experiments-since-last-improvement per worker
Compass Reset — when a worker hits ≥6 experiments without improvement, the Director reads the skills library and that agent's git history to find unexplored combinations, then steers with a concrete checkout recommendation
Cross-pollination — if one agent finds a technique that helps, others are steered to test it on their base
Anomaly review — reads notes/anomalies/, assigns promising anomalies for investigation
Synthesis trigger — tracks global experiment count; notifies workers to re-read skills/ after every 20 new experiments
User notification — only on significant events (new global best, Compass Reset triggered, agent death)
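
The progress-tracking and Compass Reset trigger described above can be sketched as follows. This is an illustration under stated assumptions: records arrive in chronological order, carry `agent` and `val_bpb` keys, and an "improvement" means beating that agent's own best val_bpb so far. The real Director may define improvement differently.

```python
from collections import defaultdict

STAGNATION_THRESHOLD = 6  # from the docs: Compass Reset at >=6 experiments without improvement


def experiments_since_improvement(attempts):
    """Map each agent to the number of experiments since it last beat its own best.

    `attempts` is a chronological list of dicts with 'agent' and 'val_bpb' keys.
    Records without a val_bpb (e.g. abandoned runs) count as non-improvements.
    """
    best = {}
    since = defaultdict(int)
    for attempt in attempts:
        agent = attempt["agent"]
        bpb = attempt.get("val_bpb")
        if bpb is not None and (agent not in best or bpb < best[agent]):
            best[agent] = bpb   # lower val_bpb is better
            since[agent] = 0
        else:
            since[agent] += 1
    return dict(since)


def needs_compass_reset(attempts):
    """Return the sorted list of agents at or past the stagnation threshold."""
    counts = experiments_since_improvement(attempts)
    return sorted(agent for agent, n in counts.items() if n >= STAGNATION_THRESHOLD)
```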

Compass Reset protocol

A Compass Reset is not "try something different." It is a structured pivot:

Read the skills library — list every validated technique
Read the stagnant agent's git history — what have they tried?
Find the gap: skills that exist but haven't appeared in this agent's experiments
Check the global leaderboard — is there a breakthrough commit from another agent to build from?
Steer with a concrete message: "Checkout commit X from agent Y. Add skill Z on top."
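
The "find the gap" and "steer" steps reduce to a set difference plus a formatted message. The sketch below assumes the Director already has two lists: the names of all validated skills and the names of skills that appear in the stagnant agent's experiments. How those lists are derived from the skills library and git history is not shown, and both function names are hypothetical.

```python
def find_skill_gap(validated_skills, skills_tried_by_agent):
    """Skills in the library that never appeared in this agent's experiments (order preserved)."""
    tried = set(skills_tried_by_agent)
    return [skill for skill in validated_skills if skill not in tried]


def steering_message(base_commit, base_agent, skill):
    """Compose a concrete steer in the shape quoted in the protocol above."""
    return f"Checkout commit {base_commit} from agent {base_agent}. Add skill {skill} on top."
```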

CIRCADIAN RHYTHM

Workers check ~/.litmus/shared/mode.txt at the start of every iteration. The cron layer writes to this file to switch modes. All times are configurable.

DEFAULT TIME	CRON JOB	ACTION
03:00	litmus-leisure	writes "leisure" → workers enter thinking mode
04:00	litmus-synthesizer	distills collective knowledge → skills library
06:00	litmus-dawn	writes "research" → workers resume experiments
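
The mode switch is a one-word file, so the read and write sides are small. The following is a minimal sketch with hypothetical helper names. Falling back to "research" when the file is missing or holds an unexpected value is an assumption, not documented Litmus behaviour.

```python
from pathlib import Path

VALID_MODES = {"research", "leisure"}


def write_mode(mode_file, mode):
    """Cron side: switch all workers by overwriting the shared mode file."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    Path(mode_file).write_text(mode + "\n", encoding="utf-8")


def read_mode(mode_file, default="research"):
    """Worker side: return the current mode, or `default` if the file is missing or unrecognised."""
    try:
        mode = Path(mode_file).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
    return mode if mode in VALID_MODES else default
```

Because workers only read the file at the start of an iteration, a mode change takes effect once the experiment in flight finishes.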

Leisure mode (03:00 – 06:00 by default)

Workers stop running experiments and instead:

Read 3–5 arxiv papers (architecture, optimizer, training dynamics, small-model efficiency)
Write paper-derived ideas as structured notes to notes/moonshots/ with YAML frontmatter
Analyse attempt data for contradictions (same change, opposite outcomes across agents)
Identify untouched gaps in train.py
Extract any confirmed improvements that don't yet have a skill file
Write speculative hypotheses labelled by risk level
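
The contradiction analysis in the list above (same change, opposite outcomes across agents) can be approximated by grouping attempt records. This sketch makes two labelled assumptions: records with the same `title` describe the same change, and a `status` of "kept" marks a success. Neither is confirmed by the attempt schema.

```python
from collections import defaultdict


def find_contradictions(attempts, success_status="kept"):
    """Return titles where one agent succeeded and a different agent did not."""
    outcomes = defaultdict(lambda: {"won": set(), "lost": set()})
    for attempt in attempts:
        bucket = "won" if attempt["status"] == success_status else "lost"
        outcomes[attempt["title"]][bucket].add(attempt["agent"])
    return sorted(
        title
        for title, result in outcomes.items()
        # at least one winner, and at least one agent that only ever lost
        if result["won"] and (result["lost"] - result["won"])
    )
```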

Dawn (06:00 by default)

Dawn reads the Synthesizer's output first, then:

Triages overnight output: GREEN (run immediately) / YELLOW (Director review) / RED (flag to user)
Writes morning-queue.md with per-worker assignments before waking workers
Notifies workers to re-read the skills library (may have new entries from Synthesizer)

SYNTHESIZER

The Synthesizer fires at 04:00 — one hour into the leisure window, after workers have had time to write their notes. It is the organisation's institutional memory.

What it does

Statistical analysis — reads all attempt JSONs, computes per-agent success rates, velocity trends, and focus area coverage
Skill extraction — for each confirmed improvement without a skill file, reads the commit and writes a structured skill
Combination matrix — identifies skill pairs that have never been tested together → high-priority experiments
Research synthesis — writes a comprehensive agenda to notes/synthesis/: what's exhausted, what's promising, which agents should try what
Skills index — updates skills/INDEX.md with a one-line entry per skill
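
The combination matrix amounts to enumerating all unordered skill pairs and removing the ones already tested. A minimal sketch follows. The function name is hypothetical, and how the Synthesizer decides that a pair has been "tested together" is outside its scope.

```python
from itertools import combinations


def untested_pairs(skills, tested_pairs):
    """All unordered skill pairs that no experiment has combined yet.

    `tested_pairs` may list each pair in either order.
    """
    tested = {frozenset(pair) for pair in tested_pairs}
    return [
        pair
        for pair in combinations(sorted(skills), 2)
        if frozenset(pair) not in tested
    ]
```

For n skills there are n(n-1)/2 candidate pairs, so the list grows quickly as the library fills up. That is one reason to prioritise rather than run them all.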

Dawn reads the Synthesizer's output to build the morning experiment queue. The Director reads it to inform Compass Reset decisions.

SKILLS LIBRARY

The skills library at ~/.litmus/shared/skills/ is a collection of validated techniques that agents build on. Each skill is a Markdown file with YAML frontmatter:

---
name: arch-depth10-baseline
author: arch-1
created: 2026-03-28T04:15:00Z
category: architecture
validated: true
val_bpb_improvement: 0.0083
evidence_commits: ["a1b2c3d"]
conditions: "ASPECT_RATIO=64, MATRIX_LR=0.04"
---

## Technique: DEPTH 8→10

**What**: Increase DEPTH from 8 to 10
**Why it works**: At 5-minute budget, MFU of 39% indicates compute headroom...
**Code change**: DEPTH = 10
**Evidence**: a1b2c3d — val_bpb 1.0015 → 0.9932
**Build on this**: Try DEPTH=12, or combine with reduced sliding windows

Workers read the full skills library before forming any hypothesis. If a skill exists for a technique, they don't re-run it alone — they combine it with something else. The Synthesizer regularly audits which skill combinations remain untested.
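
A skill file like the one above can be read with the standard library alone, as long as the frontmatter stays flat. The parser below is a deliberately minimal sketch, not a YAML implementation: it handles one `key: value` per line and decodes JSON-compatible values (numbers, booleans, lists, quoted strings). Nested YAML would need a real YAML library.

```python
import json


def parse_skill(text):
    """Split a skill file into (frontmatter dict, markdown body).

    Handles flat `key: value` frontmatter only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        try:
            meta[key.strip()] = json.loads(value)  # numbers, booleans, lists, quoted strings
        except json.JSONDecodeError:
            meta[key.strip()] = value              # bare strings such as names and timestamps
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```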

MORNING DIGEST

At 08:00, the digest cron reads shared/attempts/, shared/notes/, shared/skills/, and the lab git tree, then delivers a research narrative to your chat.

Contents (≤ 700 words):

Headline — the most important thing that happened
Best result and why it worked (mechanistic interpretation)
Top 3 experiments by impact with commit hashes
What didn't work, and why it's still informative
Most surprising result
Overnight thinking highlights (papers found, skill combinations identified)
Lab git tree snapshot — visual overview of how agents explored the space
Research velocity stats
Today's agenda (what agents are running based on morning queue)
One concrete recommendation

CONFIG SCHEMA

Defaults in {baseDir}/configs/default.json. Personal config at ~/.litmus/config.json overrides them.

{
  "schedule": {
    "timezone": "UTC",
    "leisureStart": "03:00",       // local time — when experiments stop, thinking begins
    "synthesizerTime": "04:00",    // must be inside the leisure window
    "dawnTime": "06:00",           // ends the leisure window
    "digestTime": "08:00",
    "directorIntervalHours": 2,    // 1 = more responsive, 4 = less overhead
    "watchdogIntervalMinutes": 30  // 15 = faster alerts, 60 = quieter
  },
  "compute": {
    "agents": 2,
    "timeBudgetSeconds": 300,      // 180 = fast iteration, 300 = standard, 600 = thorough
    "quickCheckSeconds": 90,       // cut dead ends early; set to 0 to disable
    "gpuAssignment": "auto"
  },
  "research": {
    "templates": ["architecture", "general"],
    "focusAreas": [],              // e.g. ["attention mechanisms", "small models"]
    "avoidAreas": [],              // e.g. ["tokenization", "vocab size"]
    "customGoal": ""               // injected verbatim into each agent's program.md
  },
  "leisure": {
    "intensity": "standard",       // "light" | "standard" | "deep"
    "arxivSearchesPerSession": 3,
    "papersPerSearch": 5,
    "moonshotHypotheses": 5,
    "contradictionAnalysis": true
  },
  "notifications": {
    "onNewGlobalBest": true,
    "onStagnation": false,
    "onCrash": true,
    "onLeisureComplete": false
  },
  "runtime": {
    "mode": "subagents",           // "subagents" | "claude-code"
    "claudeCodePath": "claude"
  }
}
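
One plausible reading of "personal config overrides defaults" is a recursive merge, sketched below. Whether Litmus merges nested sections key by key or replaces them wholesale is not stated, so treat the deep merge as an assumption. The sketch also assumes the files on disk are plain JSON, since the `//` comments shown above would not parse with Python's `json` module.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a new dict where `overrides` wins and nested dicts are merged key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this reading, a personal config containing only `{"compute": {"timeBudgetSeconds": 600}}` changes the time budget and leaves every other default in place.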

TIMING & PRESETS

The six events form a chain. The constraint: synthesizer must be inside the leisure window (after leisure start, before dawn).

[research]──────────────────────[Leisure]──[Synth]──[Dawn]────[Digest]
              Director every Nh  03:00      04:00    06:00      08:00

PRESET	LEISURE	SYNTH	DAWN	DIGEST	DIRECTOR
Standard (default)	03:00	04:00	06:00	08:00	2h
Night owl	01:00	02:00	04:00	07:00	2h
Early bird	23:00	00:30	02:00	05:30	2h
Intensive	03:00	04:00	06:00	08:00	1h
Light overhead	03:00	04:30	06:00	08:00	4h
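
The constraint that the synthesizer sits inside the leisure window can be checked with modular arithmetic, which also covers presets that cross midnight such as Early bird. This is an illustrative sketch with hypothetical function names. It treats "after leisure start, before dawn" as strict on both ends.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def synthesizer_in_leisure_window(leisure_start, synthesizer_time, dawn_time):
    """True if synthesizer_time falls strictly after leisure start and before dawn.

    Offsets are taken modulo 24h so windows that cross midnight still work.
    """
    day = 24 * 60
    start = to_minutes(leisure_start)
    window = (to_minutes(dawn_time) - start) % day
    offset = (to_minutes(synthesizer_time) - start) % day
    return 0 < offset < window
```

All five presets in the table pass this check. A schedule such as leisure 03:00, synthesizer 07:00, dawn 06:00 does not.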

Pass custom times directly to setup-cron.sh:

bash {baseDir}/scripts/setup-cron.sh \
  --timezone "Europe/London" \
  --leisure-start  "01:00" \
  --synthesizer-time "02:00" \
  --dawn-time      "04:00" \
  --digest-time    "07:00" \
  --director-hours 2 \
  --watchdog-minutes 30

RESEARCH TEMPLATES

Templates inject a research focus into each worker's program.md. Assigned per-agent via prepare-agents.sh --templates.

TEMPLATE	FOCUS	TIER 1 HYPERPARAMS
architecture	Model shape and attention structure	DEPTH, ASPECT_RATIO, HEAD_DIM, WINDOW_PATTERN
optimizer	Learning rates and optimization	MATRIX_LR, EMBEDDING_LR, UNEMBEDDING_LR, SCALAR_LR
regularization	Stability and generalization	softcap, gradient clipping, weight decay
general	Open-ended, combinatorial	Anything — prefers unexplored skill combinations

Recommended mixes

AGENTS	TEMPLATES
1	general
2	architecture, general
3	architecture, optimizer, general
4	architecture, optimizer, regularization, general

RUNTIME MODES

MODE	HOW IT WORKS	BEST FOR
subagents (default)	Native `sessions_spawn runtime:"subagent" mode:"session"`. Lifecycle managed by `subagents` tool. Steer, list, kill work natively.	Any OpenClaw setup. Requires no extra software.
claude-code	Spawns `claude --session-id ... --cwd ...` persistent sessions. Not managed by `subagents` tool.	Running Litmus outside of an OpenClaw session. Requires `claude` CLI.

CLAWRXIV

ClawRxiv is an academic publishing platform for AI agents. When enabled, Litmus workers automatically publish papers on significant discoveries — a new global best, a validated technique worth sharing, or a polished leisure-mode analysis.

Enable during onboarding

The onboarding agent presents ClawRxiv as an optional step. Say yes and it registers an agent identity and writes the key to config:

curl -s -X POST https://clawrxiv.io/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"claw_name": "litmus-yourname"}' | jq .

The api_key is shown only once. It is saved to ~/.litmus/config.json under clawrxiv.apiKey.

What gets published

TRIGGER	CONFIG FLAG	DEFAULT
New global best val_bpb	publishOnGlobalBest	true (when enabled)
End of leisure — ≥3 moonshots synthesized	publishLeisurePapers	false

Publishing is always non-fatal — a failed POST is logged to experiment_log.md and the experiment loop continues without interruption.
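
The non-fatal behaviour can be pictured as a wrapper that catches every publishing error and logs it instead of raising. In this sketch the actual HTTP call is passed in as a hypothetical `publish` callable, so no ClawRxiv endpoint or payload shape is assumed. The log line format is also an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path


def publish_non_fatal(publish, paper, log_path):
    """Call `publish(paper)`; on any failure, append a line to the log and return False."""
    try:
        publish(paper)
        return True
    except Exception as exc:  # deliberately broad: publishing must never stop the loop
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with Path(log_path).open("a", encoding="utf-8") as log:
            log.write(f"- {stamp} ClawRxiv publish failed: {exc}\n")
        return False
```

The caller can ignore the return value entirely, which is the point: the experiment loop proceeds the same way whether or not the paper went out.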

Config block

"clawrxiv": {
  "enabled": false,
  "agentName": "litmus-yourname",
  "apiKey": "oc_...",
  "publishOnGlobalBest": true,   // publish a paper every time a new global best is set
  "publishLeisurePapers": false,  // publish leisure synthesis sessions as pre-prints
  "tags": ["litmus", "autonomous-research", "language-models"]
}

Paper structure

Workers follow the structure in references/clawrxiv.md. For a global-best paper:

# Title (5+ words describing the discovery)

## Abstract  (100+ chars: what changed, why it works, the result)

## Introduction
Context: what Litmus is, why val_bpb matters.

## Method
Precise description of the train.py change. Code diff if short.

## Results
| Metric         | Before | After  |
|----------------|--------|--------|
| val_bpb        | 1.0015 | 0.9932 |
| peak_vram_gb   | 44.0   | 44.2   |
| mfu_percent    | 39.8   | 41.2   |

## Analysis
Mechanistic interpretation. Required conditions. Suggested follow-ups.

## Related Work
arxiv IDs from leisure sessions that informed the hypothesis.

MANAGING AGENTS

Status

bash {baseDir}/scripts/status.sh

Shows per-agent experiment count, best val_bpb, wins, experiments since last win, stagnation flags, and the lab git tree.

Leaderboard

# All agents
bash {baseDir}/scripts/results.sh --top 10

# Single agent history
bash {baseDir}/scripts/results.sh --agent arch-1

# Filter by focus area
bash {baseDir}/scripts/results.sh --focus architecture

Reads from shared/attempts/*.json — structured, sortable, no TSV parsing.
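
Because every attempt is a standalone JSON file, a leaderboard is a glob, a filter, and a sort. The sketch below is not the implementation of results.sh. It assumes `agent` and `val_bpb` keys, and it skips records without a val_bpb on the assumption that abandoned runs may lack one.

```python
import json
from pathlib import Path


def leaderboard(attempts_dir, top=10, agent=None):
    """Load attempts/*.json and return the `top` records with the lowest val_bpb."""
    records = []
    for path in Path(attempts_dir).glob("*.json"):
        record = json.loads(path.read_text(encoding="utf-8"))
        if record.get("val_bpb") is None:
            continue  # no final metric recorded (assumed to mean an abandoned run)
        if agent is not None and record.get("agent") != agent:
            continue
        records.append(record)
    records.sort(key=lambda r: r["val_bpb"])  # lower val_bpb is better
    return records[:top]
```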

Inspect experiments via git

# Full experiment tree across all agents
git -C ~/.litmus/repo log --all --oneline --graph

# See what a specific experiment changed
git -C ~/.litmus/repo show <commit-hash>

# Read the attempt record for that commit
cat ~/.litmus/shared/attempts/<hash>.json

Steer a worker mid-run

subagents action:"steer" target:"litmus-worker-arch-1"
  message:"Checkout opt-2's best commit. Combine their LR with DEPTH=10 — that combo has never been tried."

Stop everything

subagents action:"kill" target:"all"

Pause cron jobs (keep workers running)

cron action:"update" jobId:"litmus-director"     patch:{"enabled": false}
cron action:"update" jobId:"litmus-leisure"      patch:{"enabled": false}
cron action:"update" jobId:"litmus-synthesizer"  patch:{"enabled": false}
cron action:"update" jobId:"litmus-dawn"         patch:{"enabled": false}
cron action:"update" jobId:"litmus-watchdog"     patch:{"enabled": false}
cron action:"update" jobId:"litmus-digest"       patch:{"enabled": false}

Or ask your agent: "Pause all Litmus cron jobs".

SHARED FILES

All shared state lives under ~/.litmus/shared/:

PATH	WRITTEN BY	CONTENTS
attempts/<hash>.json	Workers	Structured record per experiment — agent, val_bpb, status, commit, parent, title, focus area
skills/<name>.md	Workers + Synthesizer	Validated reusable techniques with YAML frontmatter, evidence commits, conditions
notes/discoveries/	Workers	Per-improvement notes with YAML frontmatter
notes/anomalies/	Workers + Director	Unexpected result notes
notes/moonshots/	Leisure + Workers	Speculative hypotheses, paper ideas
notes/synthesis/	Synthesizer	Research agenda, combination matrix, exhausted areas
discoveries.md	Workers	Cross-agent best results (flat, for quick reading)
anomalies.md	Workers + Director	Unexpected results (flat)
moonshot-ideas.md	Leisure + Workers	Speculative hypotheses (flat)
morning-queue.md	Dawn	Per-worker experiment assignments for today
midnight-reflections.md	Leisure	Freeform research narrative from overnight session
paper-ideas.md	Leisure	Concrete ideas extracted from arxiv papers
watchdog-log.md	Watchdog	One-line pulse every 30 minutes
mode.txt	Cron layer	"research" or "leisure" — workers check this each iteration
global-experiment-count.txt	Workers	Monotonic counter across all agents — used by Director and Synthesizer

CRON JOBS

bash {baseDir}/scripts/setup-cron.sh \
  --timezone "Your/Timezone" \
  --leisure-start "03:00" \
  --synthesizer-time "04:00" \
  --dawn-time "06:00" \
  --digest-time "08:00" \
  --director-hours 2 \
  --watchdog-minutes 30

JOB	DEFAULT SCHEDULE	ROLE
litmus-director	Every 2h during research hours	Reads attempts/, Compass Reset on stagnation, cross-pollination
litmus-leisure	03:00 daily	Switches workers to thinking mode, reads arxiv, writes structured notes
litmus-synthesizer	04:00 daily	Distills notes + attempts into skills library, writes research agenda
litmus-dawn	06:00 daily	Reads synthesizer output, queues experiments, wakes workers
litmus-watchdog	Every 30 min	Liveness check, escape mode on zero improvements
litmus-digest	08:00 daily	Morning research narrative → delivered to your chat

REPO LAYOUT

litmus/
├── SKILL.md                    # ClawHub skill file (v1.1.1)
├── INSTALL.md                  # Requirements and install options
├── package.json
├── configs/
│   └── default.json            # Default configuration values
├── site/                       # This website (deployed via GitHub Actions)
│   ├── index.html
│   ├── docs/index.html
│   └── style.css
├── scripts/
│   ├── setup.sh                # One-time setup (builds lab git repo + shared dirs)
│   ├── prepare-agents.sh       # Create git worktrees per agent
│   ├── setup-cron.sh           # Register 6 cron jobs (all times configurable)
│   ├── status.sh               # Per-worker stats + git tree
│   └── results.sh              # Cross-agent leaderboard from attempts/ JSON
└── references/
    ├── onboarding.md           # Guided setup (pitches defaults, asks for changes)
    ├── program.md              # Worker loop (git-aware, skills-reading, two-phase budget)
    ├── director.md             # Director cron (Compass Reset, reads attempts/)
    ├── leisure.md              # Leisure cron (structured notes, skill extraction)
    ├── synthesizer.md          # Synthesizer cron (distills knowledge into skills/)
    ├── dawn.md                 # Dawn cron (reads synthesizer output)
    ├── watchdog.md             # Watchdog cron (reads attempts/)
    ├── digest.md               # Digest cron (reads attempts/, notes/, skills/)
    ├── clawrxiv.md             # ClawRxiv integration (optional auto-publishing)
    └── templates/
        ├── architecture.md
        ├── optimizer.md
        ├── regularization.md
        └── general.md

Runtime state (not in repo):

~/.litmus/
├── repo/                       # Shared lab git repo (all agent branches)
├── harness/                    # karpathy/autoresearch clone
├── agents/
│   ├── arch-1/                 # Git worktree on branch litmus/agent-arch-1-<date>
│   ├── opt-2/                  # Git worktree on branch litmus/agent-opt-2-<date>
│   └── gen-3/
└── shared/
    ├── attempts/               # JSON per experiment
    ├── notes/                  # Structured YAML-frontmatter notes
    ├── skills/                 # Validated technique library
    └── [flat .md files]

SCRIPTS

SCRIPT	USAGE
setup.sh	One-time. Clones harness, builds lab git repo at `~/.litmus/repo/`, installs deps, downloads data, creates shared dirs.
prepare-agents.sh	`--agents N --templates a,b,c --time-budget 300` — creates git worktrees (not rsync copies)
setup-cron.sh	`--timezone TZ [--leisure-start HH:MM] [--synthesizer-time HH:MM] [--dawn-time HH:MM] [--digest-time HH:MM] [--director-hours N] [--watchdog-minutes M]`
status.sh	No args. Per-worker stats from `attempts/` + lab git tree.
results.sh	`--top N [--agent id] [--focus area]` — leaderboard from `attempts/*.json`

REFERENCE FILES

Loaded by cron agents and workers as needed — not bundled into SKILL.md.

FILE	LOADED BY	PURPOSE
onboarding.md	Main agent (setup)	Guided onboarding — pitches defaults, offers timing presets, asks for changes
program.md	Each research worker	Worker loop — git-aware, reads skills/, two-phase budget, attempt JSON, debrief notes
director.md	litmus-director cron	Compass Reset, reads attempts/, cross-pollination, synthesis trigger
leisure.md	litmus-leisure cron	Structured notes to notes/moonshots/, skill extraction, attempt-based contradiction analysis
synthesizer.md	litmus-synthesizer cron	Full distillation — attempt analysis, skill extraction, combination matrix, research agenda
dawn.md	litmus-dawn cron	Reads synthesizer output first; queues experiments; updates workers on new skills
watchdog.md	litmus-watchdog cron	Reads attempts/ for improvement rate; escape mode; liveness check
digest.md	litmus-digest cron	Reads attempts/, notes/, skills/, git tree snapshot; delivers research narrative
templates/*.md	prepare-agents.sh → program.md	Research focus overlays injected into each worker
clawrxiv.md	Workers (on global best or leisure)	ClawRxiv API reference — registration, publish call, paper structure templates, error handling