[Bug]: Context creeping, even after simple search, #1278

@plasticity-cloud

Describe the bug

MemMachine: latest release, deployed locally with docker-compose and integrated with Ollama.

When using kiro-cli 1.28.1 with the MCP server, the context gets polluted even by a simple first name + surname retrieval.

Steps to reproduce

a) In one kiro-cli session, ask it to remember your name: "Can you remember, that my name is Karol Piatek?"
b) In a separate kiro-cli session, ask it to retrieve your name: "Do you remember, what is my name?"

The MCP server returns the uids and fields from both short-term and long-term memory.

A more detailed analysis of the code is attached below.

Expected behavior

Context reaches 33% after loading all tools and, for example, having the model invoke ls.

Environment

Ubuntu 24.04 LTS, 4 vCPU, 16 GB of RAM, m6a.xlarge instance
MemMachine 1.33

Additional context

The context jumped to 39%, from 10%, because the search_memory tool returned a large JSON
response containing all your stored memories: both episodic (conversation history) and
semantic (profile information). Even though your question was simple, the memory search
retrieved the full memory structure with metadata, which consumed significant context space.

This is a common issue with memory systems - they can return verbose data structures even
for simple queries. The tool retrieved your name correctly but also pulled along all the
associated memory data, timestamps, IDs, and metadata that wasn't needed for the simple
question.

Root causes, in order of impact:

  1. top_k defaults to 20 in the MCP tool

In mcp.py line ~492:
```python
async def mcp_search_memory(
    query: str,
    ...
    top_k: int = 20,  # ← fetches up to 20 episodes
```

Every search call retrieves up to 20 results by default, even for a
simple question.

  2. The response returns the full SearchResult object: raw, unfiltered

_search_target_memories in service.py returns the entire SearchResult
Pydantic model, which includes:

  • episodic_memory → long_term_memory.episodes[] +
    short_term_memory.episodes[] + episode_summary[]
  • semantic_memory → list of SemanticFeature objects

Each EpisodeResponse (extends EpisodeEntry) carries: uid, content,
producer_id, producer_role, produced_for_id, episode_type, content_type,
filterable_metadata, metadata, created_at, score. That's a lot of fields
per episode × 20 episodes.
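
To get a rough feel for that overhead, the sketch below builds 20 placeholder episodes carrying the fields listed above and compares the serialized size against a content-only list. Only the field names come from this issue; every value is made up for illustration, so the printed numbers are indicative at best.

```python
import json


def make_episode(i: int) -> dict:
    """Build a placeholder episode; field names are from the issue, values are invented."""
    return {
        "uid": f"episode-{i:04d}",
        "content": "My name is Karol Piatek.",
        "producer_id": "user",
        "producer_role": "user",
        "produced_for_id": "assistant",
        "episode_type": "message",
        "content_type": "string",
        "filterable_metadata": {},
        "metadata": {},
        "created_at": "2025-01-01T00:00:00Z",
        "score": 0.5,
    }


# top_k = 20 means up to 20 such objects per search call.
episodes = [make_episode(i) for i in range(20)]

full_size = len(json.dumps(episodes))
content_only_size = len(json.dumps([e["content"] for e in episodes]))

print(f"full payload: {full_size} chars")
print(f"content only: {content_only_size} chars")
```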

  3. Both memory types are always searched

In memory.py (client SDK):
```python
types=[MemoryType.Episodic, MemoryType.Semantic],  # Search both types
```

And in mcp.py, ALL_MEMORY_TYPES is passed — so every search hits both
episodic (graph DB) and semantic (profile SQL) stores and returns results
from both.

  4. No response trimming before returning to the LLM

The MCP tool returns the raw SearchResult directly. There's no step that
strips metadata, scores, timestamps, or unused fields before the result
lands in the LLM context.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Quick wins to reduce context bloat:

  • Lower top_k default from 20 to 5 in mcp_search_memory
  • Return a simplified/summarized response from the MCP tool instead of
    the raw SearchResult (e.g., just content strings + uids)
  • Only search the relevant memory type when the query intent is clear
    (e.g., skip semantic if it's a conversational recall)
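
As a minimal sketch of the last point, the snippet below picks memory types from a crude keyword check and falls back to searching both when the intent is unclear. The `MemoryType` enum here is a local stand-in for the SDK enum quoted above, and the hint lists, `DEFAULT_TOP_K`, and `pick_memory_types` are all hypothetical; a real implementation would likely need a better intent signal than keywords.

```python
from enum import Enum
from typing import List


class MemoryType(Enum):
    """Local stand-in for the SDK enum referenced in memory.py."""
    Episodic = "episodic"
    Semantic = "semantic"


DEFAULT_TOP_K = 5  # proposed lower default instead of 20

# Hypothetical keyword hints; not part of MemMachine.
PROFILE_HINTS = ("my name", "about me", "my preference")
RECALL_HINTS = ("did we", "last time", "earlier", "previous conversation")


def pick_memory_types(query: str) -> List[MemoryType]:
    """Choose which stores to search; search both when intent is unclear."""
    q = query.lower()
    if any(hint in q for hint in PROFILE_HINTS):
        return [MemoryType.Semantic]   # profile-style question
    if any(hint in q for hint in RECALL_HINTS):
        return [MemoryType.Episodic]   # conversational recall
    return [MemoryType.Episodic, MemoryType.Semantic]
```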

Labels: performance, security
