Skip to content

Fix MemoryCache negative _cacheSize drift that permanently latches a size-limited cache#129215

Open
sablancoleis wants to merge 13 commits into
dotnet:mainfrom
sablancoleis:fix/memorycache-sizelimit-negative-drift
Open

Fix MemoryCache negative _cacheSize drift that permanently latches a size-limited cache#129215
sablancoleis wants to merge 13 commits into
dotnet:mainfrom
sablancoleis:fix/memorycache-sizelimit-negative-drift

Conversation

@sablancoleis

@sablancoleis sablancoleis commented Jun 10, 2026

Copy link
Copy Markdown

Summary

Fixes #129186. When MemoryCacheOptions.SizeLimit is set, the internal _cacheSize counter can drift negative under concurrent Set/Get/Remove on string keys. Once negative, the capacity check (ulong)... > (ulong)sizeLimit is permanently true, so every subsequent Set is silently rejected (no exception) and nothing is retained for the lifetime of the cache — CurrentEntryCount stuck at 0, every Get a miss. This is permanent, silent data loss.

Regression source

Introduced by #103931 (merged for 9.0, "Subtract prior entry size when adding entry to cache"), which fixed #36039 — allowing an entry at the size limit to be replaced by a same-or-smaller one. That PR made two changes to the replace path:

  1. It made the capacity check prior-aware (newSize -= priorEntry.Size) so a same-or-smaller replace fits at the limit. This is correct and desirable.
  2. It also moved the _cacheSize decrement of the prior entry out of the post-swap if (entryAdded) block and into the speculative capacity computation, before the TryUpdate swap.

Change (2) is the bug. Pre-#103931 (i.e., 8.x), the prior entry's size was decremented only after a successful TryUpdate, atomically tied to the swap, so a concurrent removal of the prior entry could not double-count it.

Root cause

In SetEntry, the prior entry's size is subtracted speculatively inside UpdateCacheSizeExceedsCapacity, before the entry is actually swapped in by TryUpdate. If another thread runs RemoveEntry(priorEntry) (expiration scan, explicit Remove, or eviction) in the window between that subtraction and the TryUpdate, priorEntry.Size is subtracted twice — once speculatively and once by the real removal in CoherentState.RemoveEntry. Each leaked decrement drives _cacheSize negative, and the (ulong) cast then latches the cache permanently.

Fix

Keep #103931's prior-aware capacity check (so the #36039 behavior and its test are preserved), but commit only +entry.Size to _cacheSize there. Decrement priorEntry.Size exactly once, atomically with the TryUpdate that performs the swap — restoring the pre-#103931 (8.x) ordering. On the failure path, roll back only entry.Size.

This intentionally keeps the existing CompareExchange retry loop and the "check capacity before committing" ordering, so it avoids both problems that closed #124430 (overflow-rollback corruption, and false rejections from a transiently-inflated size). The only transient window now over-counts by priorEntry.Size (positive), which is self-correcting and can never latch the cache.

Validation

Reproduced and validated against the genuine MemoryCache source compiled standalone (concurrent Set/Get/Remove storm on a small string keyspace under a generous SizeLimit, with a fresh-key retention probe):

  • Before: _cacheSize goes negative, 0/512 fresh entries retained.
  • After: no drift (size stays >= 0), 512/512 retained across repeated runs; SizeLimit eviction still enforced (a tiny limit still bounds count/size); and the 2nd call to MemoryCache.Set() with the same key erases entry if cache is full #36039 replace-at-limit behavior (ReplaceOldEntryWithSameSizeOrLessNewEntryAtSizeLimitCapacity, new value size 6/5/2 at limit 6) still passes.

Bisection (same harness): 8.0.x not affected; 9.0.x and 10.0.x affected, on both net462 and net10.0 builds — consistent with #103931 shipping in 9.0.

A single-threaded pure-replace microbenchmark (the only path this change touches) shows the difference vs the current code is within run-to-run noise (~100 ns/op either way) — per-replace cost is dominated by entry allocation and dictionary operations; the fix only restores one Interlocked.Add on the replace path. Happy to add a Set-throughput benchmark to dotnet/performance if desired (per the discussion on #111959).

Tests

Adds MemoryCacheConcurrentSizeTrackingTests with:

  • a concurrency regression test asserting the tracked size never goes negative and the cache keeps retaining entries after a Set/Get/Remove storm, and
  • a guard that SizeLimit eviction is still enforced.

Related: #36039 (behavior preserved), #103931 (regression source), #111959, #124430.

Decrement the prior entry's size exactly once, atomically with the TryUpdate
that swaps it out, instead of speculatively inside UpdateCacheSizeExceedsCapacity
before the swap. The speculative subtraction races with a concurrent RemoveEntry
of the prior entry (expiration/explicit Remove/eviction), double-counts the
decrement, drives _cacheSize negative, and permanently latches the cache into
silently rejecting all inserts. Restores the .NET 8 accounting semantics.

Fixes dotnet#129186
@cincuranet cincuranet self-assigned this Jun 10, 2026
The first revision removed priorEntry.Size from the capacity check as well as
the commit, which regressed CapacityTests.ReplaceOldEntryWithSameSizeOrLessNew
EntryAtSizeLimitCapacity (a same-or-smaller replace at the size limit was
falsely rejected). Restore the prior-aware capacity decision while still
committing only entry.Size to _cacheSize; the prior entry's size is decremented
exactly once, atomically with the TryUpdate swap, which is what fixes the race.
Back the concurrency workers with dedicated threads via
TaskCreationOptions.LongRunning instead of Task.Run so the storm cannot
saturate the shared ThreadPool and starve timing-sensitive post-eviction
callbacks in sibling tests. Sample CurrentEstimatedSize inline (dropping
the busy-spin monitor task), bound the work to a fixed iteration count,
and gate the test on PlatformDetection.IsThreadingSupported.
@sablancoleis sablancoleis force-pushed the fix/memorycache-sizelimit-negative-drift branch from 4b2cb2e to 9fddf79 Compare June 10, 2026 22:01
@sablancoleis

Copy link
Copy Markdown
Author

@dotnet-policy-service agree company="Microsoft"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

2 participants