
Two weeks ago I fixed an authentication bug. Today I can't find the note about it.
I search for "login problems". Nothing. I search for "auth". Nothing, because the note says "fixed session token validation in middleware".
Grep is useless when you don't remember the exact words you used. But that's a small problem. The real one is bigger.
I run a dozen AI agents. Each one starts fresh every session. No memory of yesterday's decisions. No context from last week's architecture change. Every morning, my first 30 minutes go to re-pasting context that existed yesterday but vanished overnight. When AI agents became central to my workflow, memory stopped being optional. I needed infrastructure that remembers.
This is why I built Mesh.
How I got here
In December I built mem-cli -- a CLI tool backed by PostgreSQL. It worked but was rough: 600 documents, basic tagging, no auto-organization. Three months later, Mesh is a different system. 3,500+ documents, auto-tagging, project markers, version tracking. It's the memory layer for everything I build. And now it's open source.
The tipping point: automatic tagging
This is what changed my behavior.
When you save a document, Mesh adds tags automatically: date:2026-02-03, source:api. Then it looks at similar existing documents and infers type, topic, and project:
- type:worklog -- because other similar notes are worklogs
- topic:authentication -- because the content is about auth
- project:mesh -- because the project marker matches
No manual tagging. No folder hierarchies. Just save and forget. Mesh organizes it for you.
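As a rough sketch of how this kind of neighbor-based inference can work (the helper below is mine, not Mesh's actual internals): take the most similar existing documents and let their tags vote, keeping one winner per tag prefix.

```python
from collections import Counter

def infer_tags(neighbor_tags, min_votes=2):
    """Infer type/topic/project tags by majority vote over the tags
    of the most similar existing documents.

    neighbor_tags: one tag list per similar document,
                   e.g. [["type:worklog", "topic:auth"], ...]
    """
    votes = Counter(tag for tags in neighbor_tags for tag in tags)
    inferred = {}
    for tag, count in votes.most_common():
        prefix = tag.split(":", 1)[0]  # "type", "topic", "project"
        # keep only the winning tag per prefix, and only when enough
        # neighbors agree -- weak agreement is where mistakes creep in
        if count >= min_votes and prefix not in inferred:
            inferred[prefix] = tag
    return sorted(inferred.values())

print(infer_tags([
    ["type:worklog", "topic:authentication", "project:mesh"],
    ["type:worklog", "topic:authentication"],
    ["type:decision", "project:mesh"],
]))
# -> ['project:mesh', 'topic:authentication', 'type:worklog']
```

A vote threshold like `min_votes` is one way to trade recall for precision: raising it means fewer wrong tags but more documents left untagged.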
The inference isn't perfect -- 85% accuracy. Sometimes it tags a debugging note as type:decision. Sometimes it infers the wrong project. I fix maybe 5 tags a day. That's 5 manual corrections vs 30 fully manual tags.
85% automatic is better than 100% manual when you're doing it 30 times a day.
This removed the last barrier to writing things down. Before Mesh: 5-10 notes per week. After: 30+. Not because I got more disciplined. Because the friction disappeared.
How semantic search finds notes by meaning
Instead of matching exact words, match meaning.
You write "fixed session token validation in middleware". Mesh converts it into a vector -- a numerical representation of what the sentence means. When you search for "login problems", that query also becomes a vector. If the vectors are close in meaning, it's a match.
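"Close in meaning" usually boils down to cosine similarity between the two vectors. A toy illustration with 3-dimensional vectors (real embeddings have 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (same meaning), near 0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up vectors standing in for real embeddings
query = [0.9, 0.1, 0.0]   # "login problems"
note  = [0.8, 0.2, 0.1]   # "fixed session token validation in middleware"
other = [0.0, 0.1, 0.9]   # an unrelated note

print(cosine_similarity(query, note))   # high -> match
print(cosine_similarity(query, other))  # low  -> no match
```

Search, then, is just "embed the query, rank all stored vectors by this score, return the top hits."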
```shell
# Search by meaning
curl -X POST localhost:8000/search \
  -d '{"query": "login problems"}'
# -> finds "fixed session token validation in middleware"

# Search by tag
curl localhost:8000/bytag/topic:authentication
# -> all authentication-related documents
```
The best note-taking system is the one where you don't have to remember how you wrote something.
Giving AI agents persistent memory across sessions
Back to those dozen agents. Each one starts fresh every session, and the first question is always: "what did we do last time?"
Without memory, every session is a blank slate. No context, no history, no decisions from yesterday.
Last week I opened Claude Code on a project I hadn't touched in three weeks. First thing it did was mesh search "recent decisions for this project". Five seconds later it had the full context: we'd switched from Redis to PostgreSQL for the queue, the migration was half done, and there was a known bug in the retry logic. Without that search, I would have spent ten minutes re-explaining everything.
When Brin (my routing agent) starts a new session, it runs mesh find and immediately has every decision, every bug, every architecture note. No re-asking. No re-pasting. The agent picks up exactly where it left off.
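That bootstrap step amounts to: fetch the top matches, fold them into a preamble, and hand it to the agent before anything else. A sketch (the field names here are hypothetical, not Mesh's actual response format):

```python
def build_context(results, max_docs=5):
    """Fold search results into a preamble an agent reads at session
    start, instead of asking "what did we do last time?".

    results: list of dicts like {"tags": [...], "text": "..."},
             assumed already sorted by relevance.
    """
    lines = ["Context from memory:"]
    for doc in results[:max_docs]:
        tags = " ".join(doc.get("tags", []))
        lines.append(f"- [{tags}] {doc['text']}")
    return "\n".join(lines)

print(build_context([
    {"tags": ["type:decision"], "text": "Switched the queue from Redis to PostgreSQL."},
    {"tags": ["type:worklog"], "text": "Migration half done; known bug in retry logic."},
]))
```

Capping at `max_docs` matters: agents have limited context windows, so you want the most relevant handful, not the whole archive.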
3,500+ documents indexed. Every agent has access to the full history of every project. Serving 50+ requests per minute with zero data loss in 3 months.
Link notes to projects with MEMORY.md markers
Every project on my server has a MEMORY.md file with a unique ID:
```
# Memory
guid: a1b2c3d4
created: 2025-12-29
```
When I save a note while working in that project directory, Mesh automatically links it to that project. Later, mesh find a1b2c3d4 shows the entire history: worklogs, decisions, research notes. Everything related to that project in one query.
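Resolving which project a note belongs to can be as simple as walking up from the working directory until a MEMORY.md with a guid line turns up. A sketch of that lookup (mine, not necessarily how Mesh implements it):

```python
from pathlib import Path

def find_project_guid(start_dir):
    """Walk up from start_dir looking for a MEMORY.md marker file;
    return its guid, or None if no marker exists on the path to root."""
    start = Path(start_dir)
    for directory in [start, *start.parents]:
        marker = directory / "MEMORY.md"
        if marker.is_file():
            for line in marker.read_text().splitlines():
                if line.startswith("guid:"):
                    return line.split(":", 1)[1].strip()
    return None
```

Because the search walks upward, a note saved anywhere inside the project tree (src/, tests/, a scratch subfolder) still resolves to the same marker.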
I have 15 projects with markers. When I switch between them, the first thing I do is run mesh find. Instant context. No manual bookkeeping.
From 600 to 3,500 documents: three months of production use
The first version (mem-cli) was a weekend project. 600 documents, basic CLI. What's different now:
Scale. 600 documents in December. 3,500+ now. Search still returns in under 100ms. PostgreSQL with pgvector handles this well.
Auto-tagging. Didn't exist in v1. I was tagging everything manually. Adding auto-inference cut my tagging effort by 80%.
Version tracking. Mesh can find earlier versions of a document by comparing vectors. When a decision changes, I can trace back to the original.
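Conceptually, an "earlier version" is just a stored document whose vector sits very close to the new one but carries an older timestamp. A sketch with a made-up similarity threshold:

```python
import math

def earlier_versions(new_vec, new_ts, docs, threshold=0.9):
    """Return ids of documents that look like earlier versions of
    new_vec: near-identical vectors saved before new_ts, newest first.

    docs: list of (doc_id, vector, timestamp) tuples.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    hits = [(ts, doc_id) for doc_id, vec, ts in docs
            if ts < new_ts and cos(new_vec, vec) >= threshold]
    return [doc_id for ts, doc_id in sorted(hits, reverse=True)]
```

The threshold is the knob: close to 1.0 catches only near-verbatim edits, lower values also catch substantial rewrites of the same decision.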
Multi-agent access. In December, only I used it through CLI. Now it's an MCP server that Claude Code, Brin, and other tools query directly. The API handles 50+ requests per minute without issues.
Why multilingual-e5-small
Every embedding is computed locally by multilingual-e5-small -- a 384-dimension model that runs on CPU. No OpenAI API key. No data leaving your network.
Why this model specifically? It handles mixed-language text well (I write in English, Russian, and Ukrainian in the same document). It's small enough to run without GPU. And 384 dimensions is sufficient for document-level search -- you'd need 768 or 1536 dimensions for fine-grained passage retrieval, but for "find the decision about Redis vs PostgreSQL", smaller vectors work fine.
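One practical detail if you use this model family yourself: e5 models are trained with role prefixes, so queries and stored documents should be prefixed differently before embedding. The helper below is my own illustration, not part of Mesh:

```python
def e5_inputs(texts, kind="passage"):
    """multilingual-e5 models expect a role prefix on every input:
    'query: ' for searches, 'passage: ' for stored documents."""
    assert kind in ("query", "passage")
    return [f"{kind}: {t}" for t in texts]

# With the sentence-transformers library this would look roughly like:
#   model = SentenceTransformer("intfloat/multilingual-e5-small")
#   doc_vecs   = model.encode(e5_inputs(documents, "passage"))
#   query_vecs = model.encode(e5_inputs(["login problems"], "query"))
print(e5_inputs(["login problems"], kind="query"))
# -> ['query: login problems']
```

Skipping the prefixes still produces vectors, but retrieval quality degrades because the model never saw unprefixed text during training.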
If you're sending your internal notes, architecture decisions, and debugging logs to a third-party embedding API, you're sending your entire engineering context to someone else's server. Mesh runs in a single Docker container. Your notes stay where they belong.
What Mesh Memory is (and what it replaces)
Mesh is infrastructure. The memory layer that sits beneath your agents, your CLI tools, your automation. It finds documents. What you do with them is up to you.
It's not RAG: it does retrieval without generation. The G in RAG is where hallucinations live. You don't want an AI summarizing your architecture decisions -- you want to read the actual decision yourself.
It's not a second brain app like Obsidian or Notion. No UI for browsing, no pretty cards. It's an API that other tools call.
| | Mesh Memory | Obsidian + plugins | Chroma / Pinecone | Plain grep |
|---|---|---|---|---|
| Self-hosted | Yes | Yes | Managed / self | Yes |
| Semantic search | Yes | Plugin needed | Yes | No |
| Auto-tagging | Yes | No | No | No |
| AI agent REST API | Yes | No | Yes | No |
| Setup time | 60 seconds | 30 min | 15 min | 0 |
| Data stays local | Yes | Yes | Depends | Yes |
| Cost | Free | Free | $25+/mo | Free |
| Search latency (p95) | <100ms | Plugin-dependent | 50-200ms | Instant |
| Realistic doc limit | ~10K | Unlimited | Unlimited | Unlimited |
Limitations
Mesh works well for what I built it for. Here's where it doesn't:
Auto-tagging is 85%, not 100%. I manually correct about 5 tags per day. If your workflow requires perfect categorization, you'll need manual review.
No relational queries. Mesh finds documents by meaning. It doesn't do "show me all decisions that led to bugs" -- that requires a graph database, not a vector store.
Embedding bias. Small models are less precise on highly specialized domains. If your notes are about quantum chemistry, multilingual-e5-small might not distinguish between related but different concepts well.
Scaling ceiling. I run 3,500 documents comfortably. Realistic limit is ~10K on a single PostgreSQL instance. Beyond that, you'd need connection pooling or sharding. For most individual developers, this is plenty.
Three months of production: what surprised me
The biggest productivity gain is not search. It's that I write things down now. Before Mesh, I rarely wrote things down because finding them later was hard. Now I write everything down because finding them later is easy.
Auto-tagging was the tipping point. Manual tagging is friction. Even typing "type:worklog" is friction when you're doing it 30 times a day. Auto-inference removed that last barrier.
Agents changed more than I did. The biggest difference isn't how I use Mesh -- it's how my agents use it. Every Claude Code session, every Brin routing decision, every Rein workflow now starts with memory context. The agents went from amnesiacs to colleagues who remember.
Get started: self-hosted, MIT license, single container
```shell
git clone https://github.com/dklymentiev/mesh-memory.git && cd mesh-memory && docker compose up -d
```
First startup takes 2 minutes (downloads the embedding model). After that, 5 seconds. Full API reference and CLI docs are in the repo README.
GitHub: github.com/dklymentiev/mesh-memory
Using this for something unexpected? Open an issue -- I want to see what people build on top of it. The API is intentionally minimal so you can extend it.
I've been running this in production for 3 months across 15 projects with zero data loss.
Next post: how I wired Mesh into Claude Code as an MCP server, so every AI session automatically starts with 3,500+ documents of context. No manual pasting. Four separate agents sharing one memory without conflicts.