
feat(agents-server-ui): stream model reasoning into the UI #4508

Open
kevin-dp wants to merge 15 commits into main from kevin/reasoning-content

Conversation

@kevin-dp

@kevin-dp kevin-dp commented Jun 4, 2026

Contributor

Summary

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing Thinking shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to ▸ Thought for 12s — click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

Implementation (end-to-end)

  • Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas`. Strictly additive.
  • Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to the text path. Reasoning counter added to `OutboundIdSeed`.
  • Adapter: `pi-adapter.ts` routes pi-ai's `thinking_start` / `thinking_delta` / `thinking_end` events to the bridge. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles a late `thinking_delta` without a preceding `thinking_start`, and closes an open reasoning row on `message_end` (e.g. provider abort).
  • Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via the same delta-join pattern as `EntityTimelineTextItem.content`.
  • UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`:
    • Live: faded markdown via Streamdown with ThinkingIndicator heading + summary title + elapsed-time ticker
    • Settled: ▸ Thought for Ns with click-to-expand. Closure duration snapshotted from Date.now() - timestamp using the same sawStreamingRef trick from the elapsed-time PR — accurate for in-session settles, stays a bare Thought for rows already settled on first mount (no real end timestamp available client-side).
    • Redacted: Anthropic safety-filter payloads render ⊘ Reasoning redacted by provider safety filters. The encrypted payload is still persisted server-side so the model gets it back on the next turn.
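
The start / delta / end routing with its two defensive cases can be sketched roughly as follows. This is only an illustration under assumptions: the `ThinkingEvent` and `ReasoningBridge` shapes and the `createReasoningRouter` helper are hypothetical stand-ins, not the actual pi-ai or `OutboundBridge` types, and ignoring a duplicate `thinking_start` is a choice made for the sketch.

```ts
// Hypothetical event and bridge shapes; the real pi-ai / OutboundBridge types may differ.
type ThinkingEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; delta: string }
  | { type: "thinking_end" }
  | { type: "message_end" };

interface ReasoningBridge {
  onReasoningStart(): void;
  onReasoningDelta(delta: string): void;
  onReasoningEnd(): void;
}

function createReasoningRouter(bridge: ReasoningBridge): (event: ThinkingEvent) => void {
  // Tracks whether a reasoning row is currently open.
  let open = false;

  return (event: ThinkingEvent): void => {
    switch (event.type) {
      case "thinking_start":
        // Assumption for this sketch: a duplicate start while a row is open is ignored.
        if (!open) {
          bridge.onReasoningStart();
          open = true;
        }
        break;
      case "thinking_delta":
        // Late delta without a preceding start: open a row first.
        if (!open) {
          bridge.onReasoningStart();
          open = true;
        }
        bridge.onReasoningDelta(event.delta);
        break;
      case "thinking_end":
      case "message_end":
        // message_end also closes a row left open, e.g. after a provider abort.
        if (open) {
          bridge.onReasoningEnd();
          open = false;
        }
        break;
    }
  };
}
```

Keeping the open/closed flag inside the router means the bridge never sees a delta or an end for a row that was not started.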

Reference

Patterns informed by reading OpenCode's reasoning implementation:

  • 3-event streaming protocol (reasoning-start / reasoning-delta / reasoning-end)
  • ReasoningPart storage shape including encrypted for Anthropic round-trip
  • reasoningSummary() headline parser (5-line regex, OpenAI Responses only)
  • Collapsed-by-default UX with click-to-expand
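
For illustration, a headline parser for the `**Title**\n\n<body>` shape might look like the sketch below. The function name `splitSummaryTitle`, the `ParsedReasoning` type, and the exact regex are assumptions made here; they are not copied from OpenCode's `reasoningSummary()` or from `pi-adapter.ts`.

```ts
// Hypothetical result shape for this sketch.
interface ParsedReasoning {
  summaryTitle: string | null;
  body: string;
}

// Splits a leading "**Title**" line, followed by a blank line, from the rest.
// Text without that heading is returned unchanged with a null title.
function splitSummaryTitle(text: string): ParsedReasoning {
  const match = /^\*\*(.+?)\*\*\r?\n\r?\n([\s\S]*)$/.exec(text);
  if (match === null) {
    return { summaryTitle: null, body: text };
  }
  const [, title = "", body = ""] = match;
  return { summaryTitle: title.trim(), body };
}
```

With this sketch, providers that never emit the bold heading fall through to the no-match branch, which matches the "no-op for others" behaviour described above.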

Test plan

  • pnpm typecheck clean in agents-runtime + agents-server-ui
  • pnpm test outbound-bridge pi-adapter entity-timeline in agents-runtime (95 passed: 18 bridge + 21 adapter + 56 timeline)
  • pnpm test in agents-server-ui (66 passed)
  • pnpm -C packages/agents-runtime build — dist artifacts emit cleanly
  • Manual: prompt Anthropic Claude with extended-thinking enabled; verify streaming reasoning appears faded above the answer with elapsed ticker, then collapses to Thought for Ns on settle
  • Manual: multi-step tool-using turn; verify each step's reasoning renders as a separate collapsible row

Notes

  • Cached AgentResponse (the non-Live path used for old scrollback sections) doesn't yet surface reasoning — historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
  • The pre-existing runtime-dsl.test.ts 401 failures (and dispatch-policy-routing.test.ts 500 failures) reproduce identically on clean main and were not introduced by this PR.

🤖 Generated with Claude Code

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1
reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent
response now shows the reasoning text faded above the answer, with the
existing `Thinking` shimmer heading + elapsed-time ticker. Once the
reasoning settles, it collapses to `▸ Thought for 12s` — click to
expand. Multiple reasoning rows per run render independently in order
(one per LLM step in tool-using turns).

End-to-end plumbing:

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic
  redacted blocks must round-trip back to the model), and
  `summary_title` (extracted at write time). New `reasoningDeltas`
  collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta`
  / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` /
  `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading
  once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>`
  on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in
  `AgentResponseLive`. Streamdown body, click-to-expand on settle,
  redacted-block placeholder for opaque Anthropic payloads.
@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Electric Agents Desktop Builds

Build artifacts for commit c813bcd.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

@codecov

codecov Bot commented Jun 4, 2026


Codecov Report

❌ Patch coverage is 55.89354% with 116 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.22%. Comparing base (618810c) to head (c813bcd).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ents-server-ui/src/components/ReasoningSection.tsx | 0.00% | 32 Missing ⚠️ |
| .../agents-server-ui/src/components/AgentResponse.tsx | 0.00% | 30 Missing ⚠️ |
| packages/agents-runtime/src/outbound-bridge.ts | 49.12% | 29 Missing ⚠️ |
| packages/agents-runtime/src/pi-adapter.ts | 53.06% | 23 Missing ⚠️ |
| packages/agents/src/model-catalog.ts | 92.59% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4508       +/-   ##
===========================================
- Coverage   74.83%   58.22%   -16.62%     
===========================================
  Files          54      371      +317     
  Lines        7300    40887    +33587     
  Branches     2353    11594     +9241     
===========================================
+ Hits         5463    23806    +18343     
- Misses       1820    17006    +15186     
- Partials       17       75       +58     
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 71.74% <92.59%> (?) |
| packages/agents-mcp | 77.54% <ø> (?) |
| packages/agents-mobile | 75.49% <ø> (?) |
| packages/agents-runtime | 82.11% <70.11%> (?) |
| packages/agents-server | 74.86% <ø> (+0.02%) ⬆️ |
| packages/agents-server-ui | 6.22% <0.00%> (?) |
| packages/electric-ax | 46.42% <ø> (?) |
| packages/experimental | 87.73% <ø> (?) |
| packages/react-hooks | 86.48% <ø> (?) |
| packages/start | 82.83% <ø> (?) |
| packages/typescript-client | 91.71% <ø> (?) |
| packages/y-electric | 56.05% <ø> (?) |
| typescript | 58.22% <55.89%> (-16.62%) ⬇️ |
| unit-tests | 58.22% <55.89%> (-16.62%) ⬇️ |


@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Electric Agents Mobile Build

Local mobile checks ran for commit c813bcd.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

Previously, `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with an
explicit `reasoningEffort` (anything other than `auto`) had no effect:
no `thinking` parameter was added to the request, Anthropic ran in
standard mode, and the model emitted no `thinking_delta` events. The
inbound reasoning plumbing that landed in the same PR was correct but
unreachable from Anthropic without this.

Now: when the chosen model is Anthropic-capable for reasoning AND
`reasoningEffort` is explicit (minimal/low/medium/high), inject

  thinking: { type: "enabled", budget_tokens: <by effort> }

into the payload. Budgets follow Anthropic's docs (≥ 1024 floor):
minimal=1024, low=2048, medium=8192, high=24576. `auto` leaves
thinking disabled, so default sessions don't silently incur the extra
reasoning tokens.
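
As a rough sketch of the mapping this commit describes (effort to `budget_tokens`, with `auto` opting out), assuming a hypothetical `ReasoningEffort` type and helper name rather than the real `model-catalog.ts` code:

```ts
// Hypothetical names for illustration; only the budget values come from the commit message.
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high";

const ANTHROPIC_BUDGET_TOKENS: Record<Exclude<ReasoningEffort, "auto">, number> = {
  minimal: 1024, // Anthropic's documented floor
  low: 2048,
  medium: 8192,
  high: 24576,
};

// Returns the `thinking` payload fragment, or undefined when thinking stays off.
function anthropicThinkingFor(
  effort: ReasoningEffort,
): { type: "enabled"; budget_tokens: number } | undefined {
  if (effort === "auto") {
    return undefined; // `auto` does not enable thinking in this commit
  }
  return { type: "enabled", budget_tokens: ANTHROPIC_BUDGET_TOKENS[effort] };
}
```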

@KyleAMathews KyleAMathews left a comment

Contributor


Lovely! Could you add a screenshot of the UI to the PR body?

kevin-dp and others added 3 commits June 8, 2026 14:53
Three latent bugs in the reasoning-content branch that together made
extended thinking and the assistant's answer text fail to render:

1. **Alias collision in the timeline live query** —
   `entity-timeline.ts` had two correlated sub-queries (one for
   `items.text.content`, one for `reasoning.content`) both using
   `chunk` as the `from({...})` alias. TanStack DB silently
   mis-bound the correlation when both were active in the same run
   projection, so `items.text.content` came back as an empty string
   even though the deltas were present in `db.collections.textDeltas`.
   Reasoning won the binding; the answer didn't render at all.

   Fix: rename the inner alias to `textChunk`, and hoist the union
   row's text fields to top-level scalars (`text_key`, `text_run_id`,
   …) so the correlation references a top-level field instead of a
   nested `item.text.key` (also a source of empty joins).

2. **Anthropic thinking always-on instead of opt-in** —
   `withProviderPayloadDefaults` short-circuited for Anthropic when
   `reasoningEffort` was `auto`, so no `thinking` parameter ever
   reached the API. The OpenAI branch already defaulted `auto` to
   `minimal`; Anthropic now does the same (1024-token budget). `low`
   / `medium` / `high` scale the budget exactly as before.

3. **Anthropic `thinking` merge order** — pi-ai writes
   `thinking: { type: "disabled" }` into the request body by default.
   Our `onPayload` was merging `existingThinking` _last_, so the
   default `type: "disabled"` clobbered our `type: "enabled"` and
   the API rejected `budget_tokens` with
   `thinking.disabled.budget_tokens: Extra inputs are not permitted`.
   Spread `existingThinking` first now, then `type` + `budget_tokens`.
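
The merge-order fix in item 3 comes down to object-spread precedence: later keys win. A minimal sketch, where `mergeThinking` is a hypothetical helper and not the actual `onPayload` code:

```ts
// Hypothetical helper illustrating the spread order; not the real onPayload implementation.
function mergeThinking(
  existingThinking: Record<string, unknown> | undefined,
  budgetTokens: number,
): Record<string, unknown> {
  // Buggy order (for contrast): { type: "enabled", budget_tokens, ...existingThinking }
  // would let a pre-existing `type: "disabled"` overwrite ours.
  return {
    ...existingThinking, // defaults written upstream go first
    type: "enabled", // our values come last, so they win
    budget_tokens: budgetTokens,
  };
}
```

Any unrelated keys already present on the existing object survive the merge, since only `type` and `budget_tokens` are overridden.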

Tests:
- `entity-timeline.test.ts` — regression test exercises
  `createEntityTimelineQuery` end-to-end with text and reasoning rows
  in the same run; fails on the alias collision, passes with the
  rename + flat-field projection.
- `model-catalog.test.ts` — adds Anthropic-side coverage that mirrors
  the existing OpenAI tests: always-on minimal budget on `auto`,
  scaled budget on explicit effort, and `type: disabled` override
  for pre-existing `thinking` in the payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…eltas

The reasoning sub-collection's `content` field — projected via
`concat(toArray(<correlated delta-join>))` — went stale in the
running app after the row's status flipped to `completed`, surfacing
`content: null` in the live query even though the deltas were still
present in the local DB. The expand-thought-block view rendered an
empty body until the user navigated away and back (forcing a fresh
live-query subscription), at which point the join evaluated cleanly.

Unit tests for the same projection pattern all pass — the bug only
reproduces in the running app, against an established live-query
graph with overlapping text/reasoning subscriptions. The sub-query
itself is correct (data is there after a fresh subscription), but
something about the long-lived subscription state makes the
correlated row binding stale.

Sidestep the unreliable projection entirely:

- **Timeline query** — drop the `content` field from
  `EntityTimelineReasoningItem`. Expose `run.reasoningDeltas` as a
  parallel sub-collection (mirroring `run.reasoning`), surfacing the
  raw deltas keyed by `reasoning_id`.
- **UI** — `AgentResponseLive` subscribes to both `run.reasoning` and
  `run.reasoningDeltas`, builds a `Map<reasoning_id, content>` from
  the deltas client-side, and merges it onto the reasoning rows
  before handing them to `<ReasoningSection>`. Reactive on every
  delta arrival, no stale state.
- **State lift** — `expanded` for the collapsed "Thought for Ns"
  toggle moves from `ReasoningEntryView` (per-entry) up to
  `ReasoningSection` (keyed by `entry.key`), so the user's choice
  survives any spurious unmount of the entry view (virtualizer
  measurement passes, brief entries-empty states, etc.).
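
The client-side assembly described in the UI bullet can be sketched as below. The field names on `ReasoningDelta` and `ReasoningRow`, and the assumptions that deltas arrive already ordered and that a row's `key` equals its deltas' `reasoning_id`, are illustrative guesses rather than the actual timeline types.

```ts
// Hypothetical shapes; the real timeline item types may carry more fields.
interface ReasoningDelta {
  reasoning_id: string;
  delta: string;
}

interface ReasoningRow {
  key: string;
  status: string;
}

// Concatenates deltas per reasoning row, assuming the input is already in stream order.
function contentByReasoningId(deltas: readonly ReasoningDelta[]): Map<string, string> {
  const content = new Map<string, string>();
  for (const d of deltas) {
    content.set(d.reasoning_id, (content.get(d.reasoning_id) ?? "") + d.delta);
  }
  return content;
}

// Merges the assembled content onto the rows; rows with no deltas get an empty string.
function withContent<T extends ReasoningRow>(
  rows: readonly T[],
  deltas: readonly ReasoningDelta[],
): Array<T & { content: string }> {
  const content = contentByReasoningId(deltas);
  return rows.map((row) => ({ ...row, content: content.get(row.key) ?? "" }));
}
```

Because the map is rebuilt from the raw deltas on every change, the result does not depend on any projected `content` field staying fresh.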

Tests:
- New regressions in `entity-timeline.test.ts` exercise the deltas
  sub-collection with the same shape as the failing production
  scenario: reasoning + text together, multi-step run-row updates,
  status transitions.

Follow-up: investigate why the original correlated sub-query goes
stale only against long-lived live-query graphs (passes in tests).
The `content` projection has been left commented-out in case we
want to restore it after fixing the underlying TanStack DB issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The original `reasoning.content` projection used
`concat(toArray(<correlated delta-join>))`, which TanStack DB compiles
to a `buildIncludesSubquery(..., 'concat')` node — a specialized
differential-dataflow operator that incrementally maintains a
string-concatenation of a child query's projection.

Unit tests of the same projection shape pass cleanly: a fresh
`createLiveQueryCollection` evaluates the join correctly on initial
preload, and again after status flips. Tests do not reproduce the
production failure mode (long-lived subscription where `content`
silently goes from populated → null after the row's status flips,
recovering only after a full live-query teardown).

Leaving a placeholder test as a marker — when we have a repro, drop
the body in here and restore the `content` field in
`entity-timeline.ts:buildEntityTimelineQuery`. The current fix
sidesteps the issue by exposing `run.reasoningDeltas` and assembling
content client-side, which is reliable but bypasses what should be
a working server-side projection.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
kevin-dp and others added 2 commits June 9, 2026 09:57
Restore the original nested-text shape on `runItemsSource`
(`text: caseWhen(text.key, {...})` and `textContent: concat(toArray(...))`
projected together on the union row) and undo the flat-scalar
hoist (`text_key`, `text_run_id`, `text_order`, `text_status`).
The `textChunk` alias on the delta-join stays, since that's the
load-bearing change that actually fixed the original `chunk`
alias collision with the reasoning sub-query.

When fixing the original alias-collision bug I made two changes in
one commit:

1. Renamed the text delta-join alias `chunk` → `textChunk` so it
   no longer collided with the `chunk` used in reasoning content.
2. Hoisted text fields to flat scalars on the union row so the join
   could move out of `runItemsSource`'s select and into the items
   consumer's select.

I never bisected the two. Turns out (1) alone is sufficient: the
nested `text: caseWhen(text.key, {...})` + co-located `textContent`
projection works fine once the alias collision is gone. The flat-
scalar hoist was unnecessary churn that just made the code harder
to read for no behavioral benefit.

Tested by reverting (2), running unit tests (60 still pass), and
verifying in the running app that text content still streams in
and renders correctly through a full Claude exchange.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ection

Reverts the client-side `run.reasoningDeltas` workaround in favor of
the server-side `concat(toArray(...))` projection on
`run.reasoning.content`.

Currently broken in production against `@tanstack/[email protected]` —
documented in `packages/agents-runtime/test/entity-timeline.test.ts`'s
`reasoning content remains populated after status flips to completed`
and friends. Unit tests against the projection pass cleanly; the bug
only surfaces in a long-lived stream-backed live query after the
parent row's `.update()`, with the field silently becoming `null`
even though deltas are present in the local DB. A fresh subscription
(navigate-away + back, or reload) recovers.

Holding this branch as a draft PR so the work isn't lost. Merge once
TanStack DB ships an upstream fix that makes the placeholder tests
pass against a long-lived production live query.

Diff vs `kevin/reasoning-content`:

- `entity-timeline.ts` — add `content: concat(toArray(<delta-join>))`
  back to `reasoning.select(...)`, drop the parallel
  `reasoningDeltas` sub-collection. Alias stays `reasoningChunk`
  (not the generic `chunk`) to avoid the alias-collision class of bug.
- `EntityTimelineReasoningItem` — `content: string` reinstated;
  `EntityTimelineReasoningDeltaItem` removed.
- `client.ts` — drop `EntityTimelineReasoningDeltaItem` export.
- `AgentResponseLive` — drop the `run.reasoningDeltas` subscription
  + client-side concat; `reasoningEntries` reads `content` straight
  off the projected row.
- Tests — three reasoning-content tests assert `reasoning[0].content`
  (rather than concatenating raw deltas).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@netlify

netlify Bot commented Jun 9, 2026


Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | ee2b9d4 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a2ab9bdf0b4920008ba55b5 |
| 😎 Deploy Preview | https://deploy-preview-4508--electric-next.netlify.app |

Tracks down and fixes the bug that's been driving the
client-side-concat workaround in #4508 and blocking #4532.

## Root cause

TanStack DB's "includes" (fields whose value is a sub-query like
`concat(toArray(...))`) are deferred. A row carrying an include
arrives with the field set to `null` and a hidden
`Symbol(includesRouting)` marker describing how to compute it. The
include is only materialized when something downstream reads it
*in the right way*.

The empirical rule (figured out via DevTools probes: `.toArray` on
the sub-collection always showed the populated string, while
`useLiveQuery` output had `content: null`):

  **An include is materialized only when it's referenced inside a
  `caseWhen` object body in a downstream `.select(...)`. A bare
  top-level reference doesn't trigger it; the include is just
  aliased forward, still deferred.**

This is why `items.text.content` has always worked and reasoning
hasn't. The items consumer derefs `item.textContent` inside the
`text: caseWhen(item.text.key, { ..., content: item.textContent })`
body. The reasoning consumer had `content: concat(toArray(...))`
(or, after the source/consumer split,
`content: r.reasoningContent`) at the top level of its select.
`useLiveQuery` handed the row to React with `content: null`.

## Fix

Wrap the include reference inside a `caseWhen` object body, mirroring
items:

```ts
reasoning: q
  .from({ r: runReasoningSource })
  ...
  .select(({ r }) => ({
    key: r.key,
    run_id: r.run_id,
    order: r.order,
    status: r.status,
    body: caseWhen(r.key, {
      content: r.reasoningContent,
    }),
    summary_title: r.summary_title,
    encrypted: r.encrypted,
  }))
```

`r.key` is always truthy on a real row, so the `caseWhen` is
effectively unconditional; its only purpose is being an object body
that forces the include reference to materialize.

UI reads `entry.body?.content` (via the type) and `AgentResponseLive`
maps it back into a flat `content: string` on `ReasoningEntry` so
`ReasoningSection`'s API is unchanged.

This drops the need for the client-side concat workaround that was
the original target of #4532.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Comment thread packages/agents-runtime/src/outbound-bridge.ts
@kevin-dp

kevin-dp commented Jun 9, 2026

Contributor Author

@KyleAMathews here are some screenshots showing how it displays while it's thinking and how it displays when it's done thinking (the "Thought for 2s" block is expandable on click).

[Screenshots: thinking-block, thought-block]

kevin-dp and others added 6 commits June 9, 2026 12:30
The entity-stream-db mock omitted the reasoning and reasoningDeltas
collections, so loadOutboundIdSeed crashed when reading
db.collections.reasoning.toArray under three process-wake scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
# Conflicts:
#	packages/agents-runtime/src/entity-timeline.ts
#	packages/agents-runtime/test/entity-timeline.test.ts
#	packages/agents-server-ui/src/components/AgentResponse.tsx
# Conflicts:
#	packages/agents-runtime/src/entity-timeline.ts
# Conflicts:
#	packages/agents/src/model-catalog.ts
#	packages/agents/test/model-catalog.test.ts
…m with run items (#4570)

## Summary

Two fixes for the reasoning-stream UI added in #4508 (note: that feature
is on `kevin/reasoning-content`, not yet on `main`, so this PR targets
the feature branch):

1. **No more empty thinking blocks.** Some models report that they
reasoned but never expose the tokens (e.g. OpenAI codex models) —
`pi-adapter.ts` deliberately opens a reasoning row on `thinking_start`
even when no delta ever arrives, so the UI rendered a blank live block
that settled into an empty `▸ Thought` row. `AgentResponseLive` now
filters out rows with no content client-side. Anthropic redacted rows
(`encrypted` set) are kept and still render their placeholder, and a
genuinely-streaming block appears as soon as its first delta lands.
Persistence is untouched — empty rows are still recorded (they can carry
the encrypted payload that must round-trip to the model).

2. **Reasoning blocks interleave with the response instead of stacking
at the top.** Previously all of a run's reasoning rows rendered in one
`<ReasoningSection>` above every text/tool-call item, so in multi-step
tool-using runs step-3 thinking appeared above step-1 output. Reasoning
rows already carry the same `_timeline_order` as text/tool-call rows, so
`AgentResponseLive` now merges both streams into one ordered render list
— each block renders at the position the model emitted it (think → write
→ call tool → think → …). On an order tie (legacy rows without
`_timeline_order`), reasoning sorts before output.
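
Both fixes can be pictured as one filter-then-sort pass over a merged list. The sketch below is illustrative only: it assumes a numeric `order` in place of the real `TimelineOrder`, a simplified entry union, and hypothetical helper names (`isVisible`, `compareRenderEntries`, `buildRenderList`) rather than the actual `compareLiveRenderEntries` / `compareLiveRunItems` code.

```ts
// Simplified stand-in for the LiveRenderEntry union; `order` is assumed numeric here.
type RenderEntry =
  | { kind: "reasoning"; key: string; order: number; content: string; encrypted?: string }
  | { kind: "item"; key: string; order: number };

// Fix 1: hide reasoning rows with no content, but keep redacted (encrypted) rows.
function isVisible(entry: RenderEntry): boolean {
  if (entry.kind !== "reasoning") {
    return true;
  }
  return entry.content.length > 0 || entry.encrypted !== undefined;
}

// Fix 2: order by timeline position; on a tie, reasoning sorts before output.
function compareRenderEntries(a: RenderEntry, b: RenderEntry): number {
  if (a.order !== b.order) {
    return a.order - b.order;
  }
  if (a.kind === b.kind) {
    return 0; // the real code delegates item-vs-item ties to compareLiveRunItems
  }
  return a.kind === "reasoning" ? -1 : 1;
}

function buildRenderList(entries: readonly RenderEntry[]): RenderEntry[] {
  // filter() returns a new array, so sorting does not mutate the input.
  return entries.filter(isVisible).sort(compareRenderEntries);
}
```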

## Implementation

- `ReasoningSection` → `ReasoningBlock`: the component now renders a
single entry; expand/collapse state is lifted to `AgentResponseLive`
(keyed by row key) so it still survives the block unmounting/remounting,
same as before.
- `ReasoningEntry` gains an `order` field (same `TimelineOrder` space as
run items).
- New `LiveRenderEntry` union + `compareLiveRenderEntries` comparator;
item-vs-item ties keep delegating to `compareLiveRunItems`.
- The `.root` width wrapper in `ReasoningSection.module.css` is gone —
blocks are now direct children of the `AgentResponse` root, which
applies the same width treatment, so they align with text items.
- The streaming flag for the last text item now compares against
`lastItem` by identity instead of array index (the index no longer maps
1:1 once reasoning entries are interleaved).

## Test plan

- [x] `pnpm typecheck` clean in `agents-server-ui`
- [x] `pnpm test` in `agents-server-ui` (88 passed)
- [ ] Manual: codex-model run shows no empty thought block; multi-step
Anthropic extended-thinking run shows blocks interleaved between
text/tool calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <[email protected]>

2 participants