I built a platform engineer because I couldn't be arsed being one. 4 months of operational scars across 11 agents, 91 cron jobs, and 7 Telegram groups — packaged into one command.
OpenClaw agents are bloody good at brute-forcing work. But for the big stuff — upgrades across breaking changes, debugging a 46-job cron cascade, auditing fleet-wide health — they can't see the whole board.
Over 4 months, I kept solving the same operational problems. Upgrade gotchas. Cron failures with shared root causes. Context bloat creeping into agent workspaces. Eventually the knowledge compounded into something reusable — and Cyclawps became a product by accident.
These are real operations from a live fleet. Not demos — actual production runs with real numbers.
v2026.3.24 to v2026.4.2. Three intermediate releases, each with infrastructure-specific risks. Cyclawps mapped every release against the live config, identified 3 critical risks before they hit, and orchestrated the whole thing.
46 failing jobs. Cyclawps ran automated probes, categorized failures by type, and found 25 of them sharing a single root cause — skills bypassing message delivery in isolated cron sessions. One fix, single-digit errors after.
Context bloat: 117KB stray file in one workspace, 224KB in another. Domain leakage across agent memory files — advisory data showing up in platform agent context. Information staleness affecting decision quality. The kind of drift that doesn't trigger alerts but quietly degrades everything.
Type /cyclawps, talk to your platform engineer. Ask anything about your instance.
Install via npm. Get tools, CLI commands, and health checks integrated into your OpenClaw instance.
Runs on a heartbeat. Monitors your instance, catches drift, and alerts you when something needs attention.
Everything learned from running an 11-agent production instance across 4 months, packaged as references.
Every gotcha was learned the hard way. These are real incidents from a live 11-agent fleet.
Auto-updater left stale plugin paths. Gateway entered a crash loop. Config validation caught it -- eventually. Now it catches it before deploy.
Skills calling the message tool internally fail in isolated cron sessions. Entire automation fleet went offline. One root cause, 46 casualties.
66 symlinks silently failed. Every agent responded as a generic assistant for 2 days before anyone noticed. Identity files must be real files, not symlinks.
65K chars of tool results pushed real data out of the attention window. Reports showed 0 customers when the actual count was 16. Context window management is not optional.
Outbound handler failed to register after a cold start. Error caching made it permanent. All channels went silent until a full gateway restart.
30+ incidents documented. 43 gotchas extracted. Zero left to surprise you.
/cyclawps,Free. Open source. MIT licensed. No account, no API key, no sign-up.
Or ask your agent: "Set up cyclawps for me"