AI Engineering, Agent Frameworks10 min read

Inside Hermes Agent: How Self-Improving Skills Work

How Hermes Agent turns finished sessions into reusable skills, using a background review agent, on-demand skill memory, and a four-layer memory system.

Inside Hermes Agent: How Self-Improving Skills Work

Inside Hermes Agent: How Self-Improving Skill Memory Works

TL;DR: Hermes Agent, built by Nous Research, saves a successful workflow as a reusable skill after a session, then reloads it when a similar task appears. A background review agent inspects each finished session and writes or patches skills through the skill_manage tool. Skills live as Markdown files under ~/.hermes/skills/ in the agentskills.io format. A four-layer memory system keeps facts, past sessions, skills, and user modeling separate. Progressive disclosure loads only a skill's name until the full file is needed, so token cost stays roughly flat even with hundreds of skills.

Key Takeaways

  • Hermes Agent's self-improvement runs on a background review agent that inspects each finished session and updates the skill library, so learning happens without the user asking for it.
  • The review agent is biased toward action. Its prompt tells it that most sessions produce at least one skill update, even a small one.
  • A session is worth saving when it shows a non-trivial technique, a fix or workaround, a recovery from an error, or an explicit correction from the user.
  • Skills are stored as Markdown in ~/.hermes/skills/ following the agentskills.io standard, and the skill_manage tool supports create, edit, patch, and write_file operations.
  • Hermes prefers patch over a full rewrite when updating a skill, which keeps each edit small and lowers the risk of breaking a skill that already works.
  • Memory is split into four layers (prompt memory, session search, skills, and the Honcho user model) so a single context never has to hold everything at once.
  • Progressive disclosure keeps only skill names and one-line summaries in the system prompt; the full SKILL.md loads on demand through skill_view, so token use stays roughly constant as the skill count grows.

What makes Hermes Agent self-improving?

Hermes Agent is an AI agent built by Nous Research [1]. Most assistants answer a request and forget it. Hermes does the same job, then looks back at how it solved the task and saves the useful part as a skill it can run again [2].

That replay is the whole point. The next time a similar task shows up, the agent does not start from a blank slate. It loads the skill it wrote earlier and follows steps it has already proven work. The rest of this article traces that loop through the source code: how a session gets reviewed, when a skill is created, where skills and other memory live, and how a skill loads back into context without costing much on every turn.

How does the background review loop work?

The self-improvement runs on a loop that touches every session [2]. The part worth noticing is that the agent does not wait for you to tell it what it learned. After a turn ends, Hermes starts the review on its own.

In the code, this lives in background_review.py, in a function called spawn_background_review [3]. When a turn finishes, the system launches a daemon thread. That thread copies the current session context and starts a separate review agent in the background, so the review never blocks the conversation you are having.

The review agent gets a direct instruction. Here is part of _SKILL_REVIEW_PROMPT [3]:

Review the conversation above and update the skill library. Be ACTIVE: most sessions produce at least one skill update, even if small. Signals to look for: Non-trivial technique, fix, workaround, debugging path, or tool-usage pattern emerged that a future session would benefit from. Capture it.

So the review agent decides, on its own, whether the session produced a new workflow worth keeping, corrected an earlier mistake, or turned up a new way to use a tool.

How does Hermes decide a skill is worth saving?

The review agent does not save everything. It saves when the session crossed a bar [2]. The usual triggers are a task that took more than five tool calls, a recovery from an error, or an explicit correction from the user.

When the bar is met, the agent calls a built-in tool named skill_manage and writes the skill itself [2]. No human approves it. The skill is a Markdown file under ~/.hermes/skills/, in the open agentskills.io format [2]. Each file holds a name, a description, and the concrete steps and tool calls the agent should run next time.

skill_manager_tool.py shows what skill_manage can do [4]:

  • create writes a new skill, including its SKILL.md and directory structure.
  • edit rewrites the contents of an existing skill.
  • patch makes a targeted find-and-replace inside SKILL.md or a support file.
  • write_file adds or overwrites a support file, such as a reference doc, template, or script.

Skills are not frozen once written. If the agent loads a skill in a later session and finds a better path, or hits an error the skill did not predict, it can update the skill mid-task [2]. It prefers patch over edit here, because a small find-and-replace is safer and cheaper than rewriting the whole file [2]. The review prompt sets the same priority order: update the skill that is already loaded first, then an existing skill in the right category, and only then create a new one [3].

Why does Hermes split memory into four layers?

Putting everything an agent knows into one large context makes it slower and more prone to hallucination. Hermes avoids that by keeping four kinds of memory in four different places [2], each with its own trigger and job.

The split maps onto a simple idea. Facts go in prompt memory, what happened goes in session search, how to do things goes in skills, and who the user is goes in Honcho. Keeping them apart is what lets the next piece, progressive disclosure, keep the context window small.

How does Hermes reproduce a skill the same way every time?

Two things make the replay reliable. The agent loads the right skill at the right time, and the instructions around skills are strict enough that different models still behave the same way.

Start with loading. The agent never puts every skill's full text in the prompt, since that would burn the token budget fast [2]. The system prompt carries only each skill's name and a one-line summary [2]. When the agent judges a skill relevant to the task, it calls skill_view to read the full SKILL.md and any support files. Because the heavy content loads only when needed, token use stays roughly flat whether the agent has ten skills or a few hundred [2].

The strict part lives in prompt_builder.py. The system prompt is assembled in three layers: stable, context, and volatile [5]. The stable layer carries a hard rule about saving skills, called SKILLS_GUIDANCE [5]:

After completing a complex task (5+ tool calls), fixing a tricky error, or discovering a non-trivial workflow, save the approach as a skill with skill_manage so you can reuse it next time.

prompt_builder.py also injects model-specific operational guidance for providers such as OpenAI, Google, and Alibaba [5]. This sets execution discipline, like verifying before an edit and not inventing details. The result is that a skill runs the same way even when the underlying model changes.

What can you take from Hermes Agent's design?

Hermes Agent's self-improvement is not magic. It comes down to a few plain engineering decisions stacked together.

A background agent reads each finished session and pulls out the part worth keeping, without bothering the user. That part gets written as a structured skill with a description, steps, and support files, in a format other tools can read. Memory is split by type so no single context has to hold everything, and skills load only when needed, so the cost of owning many skills stays low. The agent can also patch its own skills when it learns something new mid-task.

The thread running through all of it is one move: turn the log of what happened into a reusable method for next time. That is why a Hermes Agent used for months is more capable than one booted up an hour ago. It has written itself a library.

FAQ

What is Hermes Agent?

Hermes Agent is an AI agent from Nous Research that improves itself over time. After a session it can save the workflow it used as a reusable skill, then load that skill again when a similar task comes up. The skills accumulate, so the agent gets more capable the more it is used.

How does Hermes Agent save a skill automatically?

A background review agent runs after each session, started by spawn_background_review in background_review.py. It reads the conversation, and if the session shows a non-trivial workflow, a fix, an error recovery, or a user correction, it calls the skill_manage tool to write a skill as a Markdown file under ~/.hermes/skills/. No human has to approve it.

Why does Hermes Agent prefer patch over edit when updating a skill?

A patch is a targeted find-and-replace, so it changes only the lines that need to change. A full edit rewrites the whole SKILL.md, which is slower and more likely to break a skill that already works. The review prompt also tells the agent to update the loaded skill first before creating a new one.

How does Hermes Agent keep token costs flat as it gains more skills?

Through progressive disclosure. The system prompt holds only each skill's name and a one-line summary, not its full text. The agent loads the complete SKILL.md with skill_view only when a skill is relevant, so adding more skills barely changes the per-turn token cost.

Where does Hermes Agent store its skills and memory?

Skills are Markdown files under ~/.hermes/skills/ in the agentskills.io format. Other memory is split across MEMORY.md and USER.md for global facts, a SQLite database with FTS5 for searchable session history, and the Honcho platform for an evolving model of the user.

References

[1] Nous Research. Hermes Agent Docs. https://hermes-agent.nousresearch.com/docs/

[2] Mr. Ånand. Inside Hermes Agent: How a Self-Improving AI Agent Actually Works. https://mranand.substack.com/p/inside-hermes-agent-how-a-self-improving

[3] Nous Research. hermes-agent/agent/background_review.py. https://github.com/NousResearch/hermes-agent/blob/main/agent/background_review.py

[4] Nous Research. hermes-agent/tools/skill_manager_tool.py. https://github.com/NousResearch/hermes-agent/blob/main/tools/skill_manager_tool.py

[5] Nous Research. hermes-agent/agent/prompt_builder.py. https://github.com/NousResearch/hermes-agent/blob/main/agent/prompt_builder.py

Subscribe to the newsletter

By subscribing, you agree to our Terms of Service and Privacy Policy.

About the Author

Aaron is an engineering leader, software architect, and founder with 18 years building distributed systems and cloud infrastructure. Now focused on LLM-powered platforms, agent orchestration, and production AI. He shares hands-on technical guides and framework comparisons at fp8.co.

Cite this Article

Aaron. "Inside Hermes Agent: How Self-Improving Skills Work." fp8.co, June 16, 2026. https://fp8.co/articles/Inside-Hermes-Agent-Self-Improving-Skill-Memory

Related Articles

OpenClaw vs Hermes: Context Compression Cuts Cost 75%

See how two top AI agents cut token costs ~75% using prompt caching, frozen memory, and 5-phase context compression — with real source code.

AI Engineering, Agent Frameworks

How to Build Claude Code Skills: 5 Examples (2026)

Build custom Claude Code Skills with 5 ready-to-use examples. Covers SKILL.md spec, security controls, plugin distribution, and team sharing workflows.

AI Development Tools, Developer Productivity, Claude Code

Context Engineering for AI Agents: 6 Techniques That Cut Our Costs 10x

One misplaced timestamp invalidated our entire KV cache and 10x'd our bill. Here are 6 context engineering patterns from Manus and production agent teams that prevent exactly this -- with code examples.

AI Engineering, Agent Frameworks

AI Agent Memory: Why Binding Matters More Than Recall

Discover why AI agent memory fails at binding, not recall. 500+ experiments reveal architecture patterns that fix context-action gaps.

AI Engineering, Agent Frameworks