Describe the bug
MemMachine: latest release deployed locally using docker-compose, integrated with Ollama.
When using kiro-cli 1.28.1 with the MCP server, the context gets polluted even by a simple name + surname retrieval.
Steps to reproduce
a) In one kiro-cli session, ask it to remember your name: "Can you remember, that my name is Karol Piatek?"
b) In a separate kiro-cli session, ask it to retrieve your name: "Do you remember, what is my name?"
The MCP server returns the uids and all fields from both short-term and long-term memory.
A more detailed analysis of the code is attached below.
Expected behavior
Context reaches 33% after all tools are loaded and the model invokes, for example, ls.
Environment
Ubuntu 24.04 LTS, 4 vCPU, 16 GB of RAM, m6a.xlarge instance
MemMachine 1.33
Additional context
The context jumped from 10% to 39% because the search_memory tool returned a large JSON response
containing all your stored memories, both episodic (conversation history) and semantic
(profile information). Even though your question was simple, the memory search retrieved the
full memory structure with metadata, which consumed significant context space.
This is a common issue with memory systems: they can return verbose data structures even
for simple queries. The tool retrieved your name correctly but also pulled along all the
associated memory data, timestamps, IDs, and metadata that weren't needed for the simple
question.
Root causes, in order of impact:
- top_k defaults to 20 in the MCP tool
In mcp.py, line ~492:
    async def mcp_search_memory(
        query: str,
        ...
        top_k: int = 20,  # fetches up to 20 episodes
Every search call retrieves up to 20 results by default, even for a
simple question.
- The response returns the full SearchResult object, raw and unfiltered
_search_target_memories in service.py returns the entire SearchResult
Pydantic model, which includes:
  - episodic_memory → long_term_memory.episodes[] +
    short_term_memory.episodes[] + episode_summary[]
  - semantic_memory → list of SemanticFeature objects
Each EpisodeResponse (extends EpisodeEntry) carries: uid, content,
producer_id, producer_role, produced_for_id, episode_type, content_type,
filterable_metadata, metadata, created_at, score. That's a lot of fields
per episode × 20 episodes.
- Both memory types are always searched
In memory.py (client SDK):
    types=[MemoryType.Episodic, MemoryType.Semantic],  # Search both types
And in mcp.py, ALL_MEMORY_TYPES is passed, so every search hits both
episodic (graph DB) and semantic (profile SQL) stores and returns results
from both.
- No response trimming before returning to the LLM
The MCP tool returns the raw SearchResult directly. There's no step that
strips metadata, scores, timestamps, or unused fields before the result
lands in the LLM context.
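
To illustrate the missing trimming step, here is a minimal sketch of what it could look like. It is not MemMachine code: trim_search_result is a hypothetical helper, the dict layout is only my reading of the SearchResult structure described above (in particular, where episode_summary sits is a guess), and the sample values are made up. Semantic features are passed through unchanged because their fields are not listed in this issue.

```python
import json
from typing import Any


def trim_search_result(result: dict[str, Any], max_items: int = 5) -> dict[str, Any]:
    """Hypothetical helper: keep only uid + content for each episode.

    Assumes the result was already converted to a plain dict and that
    episodes live under episodic_memory.<tier>.episodes.
    """
    episodic = result.get("episodic_memory") or {}
    seen_uids = set()
    episodes = []
    for tier in ("short_term_memory", "long_term_memory"):
        for ep in (episodic.get(tier) or {}).get("episodes") or []:
            uid = ep.get("uid")
            # The same episode can show up in both tiers; keep it once.
            if uid is not None and uid in seen_uids:
                continue
            seen_uids.add(uid)
            episodes.append({"uid": uid, "content": ep.get("content")})
    return {
        "episodes": episodes[:max_items],
        "semantic_memory": result.get("semantic_memory") or [],
    }


def _sample_episode(i: int) -> dict[str, Any]:
    # Made-up values; only the field names come from the issue text.
    return {
        "uid": f"ep-{i}", "content": f"message {i}",
        "producer_id": "user", "producer_role": "user",
        "produced_for_id": "assistant", "episode_type": "message",
        "content_type": "string", "filterable_metadata": {},
        "metadata": {}, "created_at": "2025-01-01T00:00:00Z", "score": 0.5,
    }


raw = {
    "episodic_memory": {
        "long_term_memory": {"episodes": [_sample_episode(i) for i in range(20)]},
        "short_term_memory": {
            "episodes": [_sample_episode(i) for i in range(3)],
            "episode_summary": [],
        },
    },
    "semantic_memory": [],
}
trimmed = trim_search_result(raw)
# Compare serialized sizes as a rough proxy for context usage.
print(len(json.dumps(raw)), "->", len(json.dumps(trimmed)))
```

Character count is only a rough stand-in for tokens, but it makes the difference between the full payload and a uid + content payload easy to see.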
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quick wins to reduce context bloat:
- Lower top_k default from 20 to 5 in mcp_search_memory
- Return a simplified/summarized response from the MCP tool instead of
the raw SearchResult (e.g., just content strings + uids)
- Only search the relevant memory type when the query intent is clear
(e.g., skip semantic if it's a conversational recall)
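
As a rough sketch of the first and third points, the search arguments could be planned before the call is made. Everything here is an assumption for illustration: plan_search and the keyword lists are hypothetical, the MemoryType enum is a local stand-in for MemMachine's own, and a keyword match is a deliberately naive way to guess query intent.

```python
from enum import Enum
from typing import Any, Optional


class MemoryType(Enum):
    # Local stand-in for MemMachine's MemoryType, so the sketch runs on its own.
    Episodic = "episodic"
    Semantic = "semantic"


DEFAULT_TOP_K = 5  # proposed lower default (the MCP tool currently uses 20)

# Hypothetical keyword hints; a real implementation would need something better.
PROFILE_HINTS = ("my name", "who am i")
RECALL_HINTS = ("did we", "last time", "earlier", "yesterday")


def plan_search(query: str, top_k: Optional[int] = None) -> dict[str, Any]:
    """Pick top_k and the memory types to search for a given query."""
    text = query.lower()
    wants_profile = any(hint in text for hint in PROFILE_HINTS)
    wants_recall = any(hint in text for hint in RECALL_HINTS)

    if wants_profile and not wants_recall:
        types = [MemoryType.Semantic]
    elif wants_recall and not wants_profile:
        types = [MemoryType.Episodic]
    else:
        # Intent unclear (or mixed): fall back to searching both stores.
        types = [MemoryType.Episodic, MemoryType.Semantic]

    return {
        "query": query,
        "top_k": DEFAULT_TOP_K if top_k is None else top_k,
        "types": types,
    }


plan = plan_search("Do you remember, what is my name?")
print(plan["top_k"], [t.value for t in plan["types"]])
```

Falling back to both memory types when the intent is unclear keeps today's behavior as the safe default, so the narrowing only applies to queries the heuristic is reasonably sure about.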