$ cat your-ai-skills-are-rotting.md
Your AI skills are rotting, let's fix that

Apparently, I'm like a crow. Instead of collecting baubles and bits, I collect skills. I had created over 25 custom skills in Claude Code. Built them over a few months as I needed them. Things like a meeting recorder, a memory system, a session handoff tool, image generators, blog publishers, transcript processors. Each one solved a real problem when I wrote it.
Then I actually looked at them. At least six were dead weight. My meeting recorder skill was still rigged up for BlackHole audio routing and manual Whisper transcription. A workflow I'd completely replaced three months ago with an automated pipeline. I had a custom memory system that Claude Code now handles natively. A session handoff tool that made more sense when context windows were smaller.
The problem wasn't building too many skills. The problem was having no feedback loop. I never knew which skills I actually reached for and which ones I'd built once and forgotten.
Measuring what gets used
The fix turned out to be small. Claude Code has a hooks system where you can run commands before or after any tool executes. I added a PostToolUse hook that fires every time a skill runs. It logs one line of JSON: the skill name, a timestamp, and which project I was in.
{"skill":"publish","ts":"2026-03-28T14:32:01Z","cwd":"/Users/lauren/blog"}
{"skill":"save-transcript","ts":"2026-03-28T15:01:44Z","cwd":"/Users/lauren/ai-brain"}
That's it. One line per invocation, appended to a file, async so it doesn't slow anything down. After a week you can run jq against it and see exactly what's earning its keep.
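For example, a quick tally of invocations per skill looks like this (the log path is hypothetical; sample lines are written first so the pipeline runs as-is):

```shell
# Count invocations per skill from the JSONL log (path is an assumption).
LOG=/tmp/skill-usage-sample.jsonl
printf '%s\n' \
  '{"skill":"publish","ts":"2026-03-28T14:32:01Z","cwd":"/Users/lauren/blog"}' \
  '{"skill":"publish","ts":"2026-03-29T09:12:07Z","cwd":"/Users/lauren/blog"}' \
  '{"skill":"save-transcript","ts":"2026-03-28T15:01:44Z","cwd":"/Users/lauren/ai-brain"}' \
  > "$LOG"

# Most-used skills first.
jq -r '.skill' "$LOG" | sort | uniq -c | sort -rn
```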
Raw usage data is useful, but it's also just counting. The more interesting questions are about the skills themselves. What should be a skill that isn't one yet? And what skills exist that duplicate something else?
So I built a second piece: a weekly agent that runs every Friday afternoon. It's a claude -p one-shot session that reads the usage log and all existing skill files, checks git history for repeated patterns, and writes a review. The review includes a usage table, flags stale skills, identifies gaps, and, if the signal is strong enough, drafts a new skill candidate. Like I said, I'm a crow.
The first run had zero usage data (the hook had just gone in) and it still found useful things. It flagged six stale skills by cross-referencing what each skill does against what tools and pipelines I actually have running now. It noticed three skills that worked fine but weren't registered as slash commands, so I could never invoke them. And it drafted a health-check skill for my scheduled jobs that I immediately wanted.
What surprised me
The agent didn't just find unused skills. It found replaced skills. What I mean is it found things that were doing real work three months ago but got quietly superseded. My meeting recorder wasn't broken. It worked perfectly. I just stopped using it because something better came along, and the old one sat there taking up space in the skill list and adding noise every time Claude tried to figure out which skill matched what I was asking for.
That's the part that matters for anyone managing a growing set of custom tools. Skills don't announce their own retirement. They just stop being relevant, and unless you're measuring, you won't notice until the list is long enough to be confusing.
The whole thing is three files:
- A hook config in settings.json that logs skill invocations to a JSONL file
- A prompt file that tells the review agent what to analyze and how to report
- A shell script that runs claude -p with that prompt on a schedule
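A sketch of that runner, with hypothetical paths (the prompt file name and output directory are assumptions; swap in your own):

```shell
#!/bin/sh
# Weekly review runner sketch (paths are assumptions; adapt to your setup).
PROMPT_FILE="${PROMPT_FILE:-$HOME/.claude/skill-review-prompt.md}"
OUT_FILE="${OUT_FILE:-$HOME/.claude/reviews/$(date +%Y-%m-%d).md}"
mkdir -p "$(dirname "$OUT_FILE")"

if command -v claude >/dev/null 2>&1; then
  # One-shot, non-interactive session: print mode reads the prompt,
  # writes the review, and exits.
  claude -p "$(cat "$PROMPT_FILE")" > "$OUT_FILE"
else
  echo "claude CLI not found; skipping this week's review" >&2
fi
```

I schedule mine with cron (a line like `0 16 * * 5 /path/to/runner.sh` for Friday afternoon), but launchd or any recurring-task tool works the same way.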
I published the config and prompt as a gist: skill quality tracking for Claude Code. Copy the hook, save the prompt, schedule the runner however you handle recurring tasks. The first review runs fine even with no usage data. It audits what you have against what it can infer from your codebase.
After a few weeks of usage data, the reviews get sharper. You start seeing actual patterns. Now I can remain a crow, just a better organized one.
$ _