lauren@terminal:~/blog$

$ cat my-ai-handles-ops-so-i-dont-have-to.md

My AI handles ops so I don't have to

Last week I read an article about what someone was doing with their NanoClaw agent. Every night, it runs a self-reflection loop. It scores its own performance, writes lessons learned, and builds new skills from what it discovers. The AI gets smarter while you sleep.

I read that and my first thought was I want that.

I already have a pretty elaborate setup. Claude Code with persistent memory, session logs, a Neo4j knowledge graph, full-text search over months of meeting transcripts. I've been building this infrastructure for a while now, stitching things together with bash and Python and cron jobs across two machines. It works. So the natural question was whether a reflection loop would be the thing that takes it from "works" to "works and gets better on its own."

I did what any reasonable person would do. I asked Claude directly. "Hey, given everything you have access to right now, what gaps would a self-reflection loop actually fill?"

Claude's response was, paraphrasing, "mostly none." Now, these AIs tend to be supremely confident, much like a teenager who is sure their parents know nothing and will never understand them. So I asked it to run an audit to back up the claim.

The memory system already handles session-to-session continuity. When I close a terminal and open a new one tomorrow, Claude can read the session logs and pick up where we left off. The knowledge graph handles deep context. If I need to pull up a pricing conversation from two months ago, the search layer finds it. The things a reflection loop is supposed to fix (forgetting what happened, losing context, not learning from mistakes) were already covered by infrastructure I'd built piece by piece over months.

But then Claude said something that stuck with me. The real gaps weren't about intelligence. They were about operations.

Things break silently. A cron job fails at 3am and I won't know until I happen to check the logs three days later. A watchdog script crashes and the service it was monitoring keeps running fine, so there's no symptom. Lock files pile up from failed jobs. Memory entries go stale because nobody reviews them. The whole system works, until one piece quietly stops working, and everything downstream of it degrades without any obvious signal.

I'd been asking the wrong question. I didn't need an AI that reflects on itself. I needed an AI that takes out the trash.

When I mapped "nightly self-reflection loop" onto my actual problems, the self-reflection part was the wrong thing to copy. What I actually needed was the nightly part. A scheduled agent that wakes up, looks around, notices what's broken, fixes what it can, and tells me about the rest.

Less navel-gazing, more keeping the lights on.

This is a distinction that matters if you're building with AI tools and trying to figure out where to invest your time. The version for clicks and likes is "autonomous AI that learns and improves itself." The useful version is "automated AI that checks whether your stuff is still running." I went with useful (probably why I'll never be an influencer).

The system has three layers, and the boring ones matter most.

The first layer was already running. I had watchdog scripts monitoring things like my Neo4j database and transcript pipeline. They'd send Slack alerts when something died. What they didn't do was write anything down. So I gave them a side effect. Every time a watchdog detects something (a service down, a backup that didn't run, a job that timed out), it appends a line to a shared signals log. Just a JSONL file. One line per event, timestamped, with a category and a message. Dead simple.
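The side effect is small enough to sketch. This is a hypothetical version, not my actual script: the file name (`signals.jsonl`), the field names, and the Neo4j port check are all illustrative assumptions.

```shell
#!/usr/bin/env sh
# Hypothetical watchdog sketch. Paths, field names, and the port
# check are assumptions, not the real setup.
SIGNALS="$HOME/ops/signals.jsonl"
mkdir -p "$(dirname "$SIGNALS")"

emit_signal() {
  # $1 = category, $2 = message; one timestamped JSON object per line
  printf '{"ts":"%s","category":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$SIGNALS"
}

# Example check: the existing Slack alert path stays as-is;
# the appended line is the only new behavior.
if ! nc -z localhost 7687 2>/dev/null; then
  emit_signal "service_down" "neo4j not answering on port 7687"
fi
```

The point is that the watchdog doesn't get smarter. It just leaves a paper trail something else can read later.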

The second layer filled the gaps I'd been ignoring. Service health probes that check whether my search index and knowledge graph are actually responding, not just running. Pipeline failure detection that catches things like "the transcript pipeline ran but produced zero output." And a SessionEnd hook that fires when Claude Code exits and captures a quick summary of what happened in that session. All of this writes to the same signals log.
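A probe in this spirit makes a real request instead of checking whether a process exists. Here's a hedged sketch; the URLs, ports, and signals path are assumptions, and the real endpoints may differ.

```shell
#!/usr/bin/env sh
# Hypothetical health probe: "running" is not "responding", so the
# probe issues an actual request. Endpoints and paths are assumptions.
SIGNALS="$HOME/ops/signals.jsonl"
mkdir -p "$(dirname "$SIGNALS")"

probe() {
  # $1 = service name, $2 = URL; log a signal when no usable answer arrives
  if ! curl -fsS --max-time 5 "$2" >/dev/null 2>&1; then
    printf '{"ts":"%s","category":"probe_failed","msg":"%s not responding"}\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$SIGNALS"
  fi
}

probe "search-index" "http://localhost:8108/health"
probe "knowledge-graph" "http://localhost:7474"
```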

The third layer is the actual agent. It runs at 5am as a scheduled Claude Code session. It reads the signals log, checks service health directly, reviews memory entries for staleness, and then does what it can. Safe stuff. It clears orphaned lock files, restarts crashed services, retries failed pipeline jobs. Anything it can't safely handle on its own gets escalated into a Telegram message that I read with my coffee.
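The shape of that safe-fix/escalate split can be sketched in a few lines. The real 5am run is a Claude Code session, so this is only an illustration of the policy around it; the category names and the restart path are hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical triage sketch: the safe-fix vs. escalate split.
# Categories and the restart command are assumptions; the message
# fields are parsed out of the signals log upstream.
ESCALATIONS="$HOME/ops/escalations.txt"
mkdir -p "$(dirname "$ESCALATIONS")"

handle_signal() {
  # $1 = category, $2 = message
  case "$1" in
    stale_lock)
      rm -f "$2" ;;                      # safe: clear an orphaned lock file
    service_down)
      echo "restarting: $2" ;;           # safe: e.g. restart via supervisor
    *)
      echo "$1: $2" >> "$ESCALATIONS" ;; # unknown: a human reads this at coffee
  esac
}
```

Anything that falls through the known-safe cases lands in the escalation file, which in my case becomes the Telegram message.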

That's it. That's the whole thing.

I built this with a few principles I've learned the hard way from running 20+ scheduled jobs across two machines.

The agent is an observer that can do safe fixes. It can restart a crashed service. It can't delete data, can't modify configurations, can't decide to "improve" something. The line between "helpful automation" and "rogue process that ate my database" is drawn at destructive operations, and the agent stays on the safe side of it.

The monitoring is event-driven and cheap. The watchdog scripts are plain bash. They run on cron, check a thing, write a line to a file. No LLM calls, no API costs. The expensive part (Claude actually reading and reasoning about the signals) only happens once a day, during the 5am synthesis. You could even run the detection layer on a Raspberry Pi.
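The cost split falls out of the schedule itself. An illustrative crontab (paths and cadences are assumptions):

```shell
# Illustrative crontab sketch: cheap bash detection runs often,
# the one expensive Claude call runs once a day.

# detection: plain bash, no LLM calls, no API cost
*/5 * * * *  $HOME/ops/watchdog.sh
# health probes, writing to the same signals log
*/15 * * * * $HOME/ops/probes.sh
# the 5am synthesis: the only step that costs tokens
0 5 * * *    $HOME/ops/nightly_agent.sh
```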

And the whole thing layers on top of what I already had. I didn't replace my watchdogs. I didn't rebuild my pipeline monitoring. I just gave everything a common place to write signals, and added a nightly job that reads them. The existing infrastructure kept working exactly as it was. It just gained a signal bus.

the boring stuff is the stuff that works

I think there's a pattern here that goes beyond my specific setup.

The most useful AI infrastructure I've built isn't the knowledge graph. It's not the search layer or the session memory system. Those are great, and I use them constantly, but they're not what keeps the whole operation from falling apart. What keeps it running is a bash script that writes a JSON line to a file when something breaks, and a nightly job that reads those lines and tells me what happened.

That's not impressive. Nobody's writing "thought leadership" posts about JSONL files and cron jobs. But it's the piece that was actually missing, and it's the piece that makes everything else reliable enough to trust.

My stack right now is 20-something scheduled jobs across two machines, a knowledge graph, vector search (well, Typesense now, probs should write about that), meeting transcript pipelines, content mining, Slack integrations, session logging. All of it stitched together with bash, Python, and cron. The ops agent doesn't make any of that smarter. It just makes sure it keeps running. The AI that handles your operations isn't going to win any innovation awards. But it's the one you'll actually notice when it's not there.

$ _

© 2026 Lauren Out Loud. All rights reserved.

System: Retro Terminal v2.1.0

Uptime: No script, still running | Coffee: ∞
