Developer Tooling

Filling the Memory Gap: Building MCPMem to Fix AI Assistant Forgetfulness

How I hacked together a semantic memory system for AI assistants with the Model Context Protocol

Jay Simons


The Problem: Assistants With Goldfish Memory

You’ve probably run into this: you’re mid-project, bouncing ideas off Cursor, Claude, or whatever AI assistant you like. After hours of hashing through architecture choices and debugging strategies, you start a fresh session and… everything’s gone.

No history. No context. No sense of continuity. You’re left re-explaining the same project details that should have been “obvious” from earlier conversations.

That’s the context window problem. AI assistants don’t actually remember anything — they just replay what’s in the current conversation buffer. Once that buffer’s gone, so is your context.

Why the Current Fixes Don’t Cut It

Plenty of tools try to patch this problem, but none of them really solves it:

  • File context injection → fine for raw code, useless for design decisions
  • Project summaries → stale as soon as the code changes
  • Chat history → bounded by token limits and resets every new session
  • Manual notes → slow, brittle, not semantic

What we actually need is memory that sticks — and more importantly, memory that understands meaning instead of just matching keywords.

Introducing MCPMem!

That’s why I built MCPMem: a Model Context Protocol (MCP) server that gives AI assistants a way to store and retrieve memories semantically.

Why it’s different

  • Stores and searches by meaning (via OpenAI embeddings)
  • Persists across sessions (your assistant actually remembers)
  • MCP-native — integrates with any MCP-capable assistant
  • Fast vector search via SQLite + sqlite-vec
  • Minimal setup, works out of the box

It’s basically a lightweight memory layer you can drop in and instantly upgrade your assistant.

Under the Hood

1. Semantic Embeddings

Every memory gets embedded with OpenAI’s text-embedding-3-small, so searches return relevant context even if the words don’t match exactly.
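Conceptually, the storage path looks something like this. It's a minimal sketch using the official openai Node SDK; the embed helper is my illustration, not MCPMem's actual code:

TS
import OpenAI from "openai";

// The SDK reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Turn a memory string into a vector. text-embedding-3-small
// produces 1536-dimensional embeddings.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}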

2. SQLite Vector Search

Memories and embeddings live in SQLite with sqlite-vec. Queries come back in milliseconds, even across thousands of entries.
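The query side follows the usual sqlite-vec pattern. Here's a sketch assuming better-sqlite3 as the driver; the vec_memories table and its layout are illustrative, not MCPMem's real schema:

TS
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("mcpmem.db");
sqliteVec.load(db); // registers the vec0 virtual-table module

// One float32 vector per memory row (1536-d to match the embedding model).
db.exec(`CREATE VIRTUAL TABLE IF NOT EXISTS vec_memories
         USING vec0(embedding float[1536])`);

// KNN search: the five stored memories closest to the query embedding.
function nearest(queryEmbedding: number[]) {
  return db
    .prepare(
      `SELECT rowid, distance
         FROM vec_memories
        WHERE embedding MATCH ?
        ORDER BY distance
        LIMIT 5`
    )
    .all(JSON.stringify(queryEmbedding));
}

Everything lives in one file-backed database, so there's no separate vector service to deploy or keep in sync.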

3. MCP Integration

Because it’s an MCP server, assistants can call it directly as part of the flow. Store, search, and retrieve are just standard MCP tool calls.
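Wiring that up with the official TypeScript SDK looks roughly like this. The tool names and the storeMemory/searchMemories stubs are hypothetical stand-ins for the real handlers:

TS
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical stubs; the real handlers wrap the embed + SQLite logic above.
async function storeMemory(text: string): Promise<void> {}
async function searchMemories(query: string): Promise<string[]> { return []; }

const server = new McpServer({ name: "mcpmem", version: "1.0.0" });

server.tool("store", { text: z.string() }, async ({ text }) => {
  await storeMemory(text);
  return { content: [{ type: "text" as const, text: "Memory stored." }] };
});

server.tool("search", { query: z.string() }, async ({ query }) => {
  const hits = await searchMemories(query);
  return { content: [{ type: "text" as const, text: hits.join("\n") }] };
});

// Assistants like Cursor or Claude spawn the server over stdio.
await server.connect(new StdioServerTransport());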

How It Feels in Practice

  • Project knowledge base: store architecture decisions, bug fixes, team agreements — pull them up later with semantic queries (see the example after this list).
  • Learning log: stash notes, patterns, gotchas — search them when you hit similar problems.
  • Team memory: assistants can keep track of past discussions, design calls, and decisions without rehashing them.
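For instance, using the same CLI commands shown in the setup below (the memory itself is made up), a later query doesn't need to share a single keyword with what was stored:

BASH
# Stored weeks ago, in another session:
mcpmem store "We chose Postgres over MongoDB because we need transactions"

# Retrieved later with zero keyword overlap:
mcpmem search "which database did we pick and why"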

Setup

It’s dead simple:

BASH
npm install -g mcpmem
export OPENAI_API_KEY=your-key-here
mcpmem store "Remember: use strict TypeScript mode"
mcpmem search "typescript config"

Or wire it into your MCP config for Cursor/Claude:

JSON
{
  "mcpServers": {
    "mcpmem": {
      "command": "npx",
      "args": ["mcpmem"],
      "env": {
        "OPENAI_API_KEY": "sk-svcacct-...",
        "OPENAI_MODEL": "text-embedding-3-small",
        "MCPMEM_DB_PATH": "/Users/johndoe/mcpmem/mcpmem.db"
      }
    }
  }
}

[Screenshot: Cursor MCP configuration example]

What Changed for Me

Before MCPMem

  • Explaining the same context repeatedly
  • Losing details between sessions
  • Wasting time writing notes I’d never search

After MCPMem

  • My assistant remembers context across chats
  • Semantic search brings back the right info fast
  • Project knowledge actually compounds over time

Lessons Learned

  • Semantic > keyword search. It feels like cheating once you’ve used it.
  • MCP is a surprisingly clean way to extend assistants.
  • You don’t need Pinecone or a massive vector DB — SQLite does just fine.
  • UX trumps everything. If memory isn’t seamless, you won’t use it.

Roadmap

  • Local embedding generation (no API calls needed)
  • Memory clustering and tagging
  • Import/export for team knowledge bases
  • Multi-modal memory (code, docs, images)

If you’re sick of AI tools that reset every conversation, MCPMem gives them something closer to real memory.

👉 GitHub repo
👉 NPM


Thank you for reading! Please visit my portfolio site when you have some free time!

https://yaa.bz

Also, read my blog:

https://blog.designly.biz

I post regular articles about full-stack development and systems administration.

