feat(report): add multi-backend summarizer and LLM task grouping by leecoder · Pull Request #633 · junhoyeo/tokscale

leecoder · 2026-05-29T06:05:32Z

Summary

Add multi-backend LLM summarizer support and automatic task grouping to the tokscale report command.

Changes

Multi-backend summarizer

Support claude, codex, gemini, kiro as summarizer backends in addition to the default apple-fm
New --summarizer flag to select backend (default: apple-fm)
Batch processing with progress indicator for CLI-based backends
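
A minimal sketch of how a `--summarizer` value could be mapped to a backend, with unknown values rejected rather than silently ignored. The `SummarizerBackend` type and `is_cli_backend` helper are hypothetical names for illustration, not the PR's actual code. The sketch also assumes every backend other than `apple-fm` is CLI-based.

```rust
use std::str::FromStr;

/// Hypothetical enum of the backends listed above; the PR's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SummarizerBackend {
    AppleFm,
    Claude,
    Codex,
    Gemini,
    Kiro,
}

impl FromStr for SummarizerBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "apple-fm" => Ok(Self::AppleFm),
            "claude" => Ok(Self::Claude),
            "codex" => Ok(Self::Codex),
            "gemini" => Ok(Self::Gemini),
            "kiro" => Ok(Self::Kiro),
            // Surface a real error instead of returning success and skipping work.
            other => Err(format!("unknown summarizer backend: {other}")),
        }
    }
}

impl SummarizerBackend {
    /// Assumption: everything except apple-fm is driven through an external CLI.
    fn is_cli_backend(self) -> bool {
        !matches!(self, Self::AppleFm)
    }
}
```

With this shape, `"claude".parse::<SummarizerBackend>()` succeeds, while a typo such as `"cluade"` yields an `Err` that the command can report to the user.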

LLM task grouping (2nd pass)

After summarization, a second LLM pass clusters all titled sessions into 3–8 high-level task groups
Groups are displayed in the report table with sub-session titles indented below
Results cached in wiki DB — subsequent runs skip already-grouped sessions
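
The grouped display described above can be sketched roughly as follows. `TitledSession` and `render_groups` are hypothetical names, and the `(ungrouped)` bucket is an assumption made for this example; the report's real table layout is not reproduced here.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of a cached session row.
struct TitledSession {
    title: String,
    task_group: Option<String>,
}

/// Returns one line per task group, followed by that group's session titles
/// indented by two spaces. Sessions without a cached group are collected
/// under "(ungrouped)" (an assumption for this sketch).
fn render_groups(sessions: &[TitledSession]) -> Vec<String> {
    // BTreeMap keeps groups in a stable, sorted order.
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for s in sessions {
        let key = s.task_group.as_deref().unwrap_or("(ungrouped)");
        groups.entry(key).or_default().push(s.title.as_str());
    }

    let mut lines = Vec::new();
    for (group, titles) in groups {
        lines.push(group.to_string());
        for title in titles {
            lines.push(format!("  {title}"));
        }
    }
    lines
}
```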

Date-scoped operations

Summarization now respects --week, --since/--until date filters
New --rebuild flag resets cached summaries within the date range and re-summarizes from scratch
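
One way to keep date-scoped operations consistent is to use a single half-open range everywhere, so that summarizing, rebuilding, and querying all agree on which sessions are in scope. The sketch below assumes timestamps are epoch milliseconds and that days are fixed 24-hour UTC days. `DateRange` is a hypothetical type, and the PR's own boundary arithmetic may differ.

```rust
const MS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Half-open range [start_ms, end_ms) in epoch milliseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DateRange {
    start_ms: i64,
    end_ms: i64,
}

impl DateRange {
    /// Covers whole days from the day starting at `since_day_start_ms`
    /// through the day starting at `until_day_start_ms`. The end bound is
    /// the start of the following day, so the final day is fully included.
    fn whole_days(since_day_start_ms: i64, until_day_start_ms: i64) -> Self {
        Self {
            start_ms: since_day_start_ms,
            end_ms: until_day_start_ms + MS_PER_DAY,
        }
    }

    /// Inclusive lower bound, exclusive upper bound.
    fn contains(&self, ts_ms: i64) -> bool {
        self.start_ms <= ts_ms && ts_ms < self.end_ms
    }
}
```

Because the upper bound is exclusive, a session stamped in the last millisecond of the final day is still inside the range, while one stamped at the next midnight is not.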

Schema changes

Add task_group column to wiki_entries table with auto-migration
Add reset_summaries_in_range and get_unsummarized_session_ids_in_range DB methods

Documentation

Add Task-Attributed Report section to README with usage examples, backend table, and sample output

Usage

tokscale report --week --summarizer claude
tokscale report --rebuild --summarizer codex
tokscale report --week --json

Summary by cubic

Adds a task-attributed usage report with multi-backend summarization and optional LLM task grouping. Introduces the tokscale report command with model/task breakdowns, daily/monthly views, and a local wiki DB cache.

New Features
- tokscale report shows model and task-group breakdowns; daily view for --week/--month, session list for today. Results cached in a local wiki DB; --rebuild re-summarizes in range. Auto-migration adds task_group.
- --summarizer backends: apple-fm (default), claude, codex, gemini, kiro; batch mode for CLI backends. A second LLM pass clusters titled sessions into 3–8 task groups. Requires a CLI backend; apple-fm skips grouping with a clear message.
- Filters/output: --today/--week/--month, --since/--until (exclusive end), --workspace, --client, --no-summarize, --json, --rebuild.
Bug Fixes
- Date range accuracy: until is exclusive using next-day 00:00 minus 1ms; DB queries now use < to match.
- Unknown --summarizer now errors; DB errors from summary/task-group updates propagate to the CLI.
- wiki-summarizer.py: validate task_category/complexity, fix missing-ID crash; correct wiki DB fallback path resolution.

Written for commit f00d277. Summary will update on new commits.

vercel · 2026-05-29T06:05:40Z

The latest updates on your projects.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
tokscale	Ignored	Preview	Jun 1, 2026 9:01am

cubic-dev-ai

10 issues found across 8 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-core/src/content_extractor.rs">

<violation number="1" location="crates/tokscale-core/src/content_extractor.rs:262">
P2: The truncation guard uses byte length instead of character count, which incorrectly appends ellipses for non-ASCII input.</violation>
</file>

<file name="crates/tokscale-cli/src/commands/report.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/report.rs:208">
P2: Unknown `--summarizer` values return `Ok(())` instead of an error, causing silent misconfiguration and skipped summarization.</violation>

<violation number="2" location="crates/tokscale-cli/src/commands/report.rs:250">
P1: DB errors are swallowed when writing summaries, which can silently lose summarization results while reporting success.</violation>

<violation number="3" location="crates/tokscale-cli/src/commands/report.rs:327">
P2: Task grouping is silently skipped for unsupported backends (including the default `apple-fm`), so the feature can appear to run successfully while never producing task groups.</violation>

<violation number="4" location="crates/tokscale-cli/src/commands/report.rs:347">
P2: DB errors are swallowed when saving `task_group`, so grouping can silently fail while the command reports success.</violation>

<violation number="5" location="crates/tokscale-cli/src/commands/report.rs:727">
P2: End-of-day filtering truncates the last 999ms, so some sessions at the end of the `--until` day are incorrectly excluded.</violation>
</file>

<file name="crates/tokscale-core/src/wiki.rs">

<violation number="1" location="crates/tokscale-core/src/wiki.rs:116">
P2: Fallback config path uses an unexpanded `~`, which can write the wiki DB to an unintended location.</violation>

<violation number="2" location="crates/tokscale-core/src/wiki.rs:375">
P2: The `until` filter is inclusive in `query_entries` but exclusive in other range methods, causing inconsistent date-scoped behavior.</violation>
</file>

<file name="scripts/wiki-summarizer.py">

<violation number="1" location="scripts/wiki-summarizer.py:129">
P2: Validate `task_category`/`complexity` against allowed values before storing. Right now any model output string is accepted, even when it violates the declared schema.</violation>

<violation number="2" location="scripts/wiki-summarizer.py:136">
P2: The recovery path can crash because `session['session_id']` is dereferenced inside the exception handler. A malformed session then raises a second `KeyError` and aborts summarization.</violation>
</file>


cubic-dev-ai

6 issues found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-cli/src/commands/optimize.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/optimize.rs:195">
P2: `partial_cmp(...).unwrap()` on `f64` can panic if `total_cost` is NaN. Use the safer pattern already established in the codebase.</violation>

<violation number="2" location="crates/tokscale-cli/src/commands/optimize.rs:455">
P2: Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</violation>
</file>

<file name="crates/tokscale-cli/src/main.rs">

<violation number="1" location="crates/tokscale-cli/src/main.rs:763">
P1: `--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:683">
P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</violation>
</file>
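
For the `partial_cmp(...).unwrap()` issue flagged in `optimize.rs`, one NaN-safe option is `f64::total_cmp`. This is only a sketch: the review refers to a pattern already established in the codebase, which is not shown here and may differ. `ModelCost` is a hypothetical row type.

```rust
/// Hypothetical row type; only `total_cost` matters for the sort.
#[derive(Debug, Clone, PartialEq)]
struct ModelCost {
    model: String,
    total_cost: f64,
}

/// Sorts by descending cost without panicking when a cost is NaN.
/// `total_cmp` defines a total order over all f64 values, so NaN entries
/// end up at one end of the slice instead of aborting the sort.
fn sort_by_cost_desc(rows: &mut [ModelCost]) {
    rows.sort_by(|a, b| b.total_cost.total_cmp(&a.total_cost));
}
```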


cubic-dev-ai · 2026-05-29T06:58:26Z

+                week,
+                month,
+            });
+            if optimize && result.is_ok() {


P1: --optimize currently runs during --json report output, appending human-formatted text and corrupting JSON output for automation.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/tokscale-cli/src/main.rs, line 763: <comment>`--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</comment> <file context> @@ -745,6 +759,38 @@ fn main() -> Result<()> { week, month, + }); + if optimize && result.is_ok() { + let _ = commands::optimize::run_optimize(commands::optimize::OptimizeOptions { + json: false, </file context>

Suggested change

- if optimize && result.is_ok() {
+ if optimize && !json && result.is_ok() {

cubic-dev-ai · 2026-05-29T06:58:26Z

+    println!("  {}", "─".repeat(68));
+    for m in report.model_insights.iter().take(8) {
+        let model_display: String = if m.model.len() > 28 {
+            format!("{}…", &m.model[..27])


P2: Truncating String with [..27] can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At crates/tokscale-cli/src/commands/optimize.rs, line 455: <comment>Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</comment> <file context> @@ -0,0 +1,554 @@ + println!(" {}", "─".repeat(68)); + for m in report.model_insights.iter().take(8) { + let model_display: String = if m.model.len() > 28 { + format!("{}…", &m.model[..27]) + } else { + m.model.clone() </file context>

Suggested change

- format!("{}…", &m.model[..27])
+ format!("{}…", m.model.chars().take(27).collect::<String>())

cubic-dev-ai · 2026-05-29T06:58:26Z

+Tokscale can analyze your usage patterns and generate actionable recommendations to reduce costs and improve productivity.
+
+```bash
+# Standalone optimization analysis (defaults to today)


P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At README.md, line 683: <comment>The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</comment> <file context> @@ -674,6 +675,65 @@ tokscale report --workspace my-project --client opencode +Tokscale can analyze your usage patterns and generate actionable recommendations to reduce costs and improve productivity. + +```bash +# Standalone optimization analysis (defaults to today) +tokscale optimize + </file context>

Suggested change

- # Standalone optimization analysis (defaults to today)
+ # Standalone optimization analysis (uses all cached sessions by default)

- Group sessions by model and task title in summary tables
- Show daily breakdown for --week/--month, session list for --today
- Integrate Apple FM summarizer for session classification
- Add wiki DB for caching session summaries

- Support claude, codex, gemini, kiro as summarizer backends (in addition to apple-fm)
- Add 2nd LLM pass to cluster sessions into high-level task groups
- Add --summarizer flag to select backend, --rebuild to reset cached summaries
- Scope summarization to date range (--week, --since/--until)
- Add task_group column to wiki DB with migration
- Update README with Task-Attributed Report documentation

…agate DB errors
- Return error instead of Ok(()) for unknown --summarizer values
- Propagate DB errors from update_summary and update_task_group
- Clarify skip message when task grouping backend is unsupported (apple-fm)
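
The error-propagation change in this commit can be illustrated with a small sketch. `DbError`, this `update_summary` stand-in, and `save_all` are hypothetical and do not reproduce the real wiki DB API. The point is only that `?` returns the first failure to the caller instead of discarding it.

```rust
use std::fmt;

/// Hypothetical error type standing in for the real DB error.
#[derive(Debug)]
struct DbError(String);

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wiki DB error: {}", self.0)
    }
}

impl std::error::Error for DbError {}

/// Stand-in for a DB write; fails on an empty session id to simulate an error.
fn update_summary(session_id: &str, _summary: &str) -> Result<(), DbError> {
    if session_id.is_empty() {
        return Err(DbError("missing session id".to_string()));
    }
    Ok(())
}

/// Saves every (session_id, summary) pair, stopping at the first failure.
/// Using `?` here (rather than `let _ = ...`) means the CLI can report the
/// error instead of claiming success with results lost.
fn save_all(items: &[(&str, &str)]) -> Result<usize, DbError> {
    for (id, summary) in items {
        update_summary(id, summary)?;
    }
    Ok(items.len())
}
```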

- Fix end-of-day filtering: use next_day_00:00 - 1ms instead of 23:59:59
- Fix wiki.rs fallback path: use dirs::home_dir() instead of literal '~/.config'
- Fix until filter inconsistency: query_entries now uses '<' (exclusive) matching other range methods
- Validate task_category/complexity against allowed values in wiki-summarizer.py
- Fix recovery path crash: use session.get('session_id') instead of session['session_id'] in exception handler
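
The fallback-path fix matters because filesystem APIs treat `~` as an ordinary directory name; only a shell expands it. A rough sketch of the idea follows, with the home directory passed in explicitly so the example needs no external crate. The function name and the `tokscale` subdirectory are assumptions, not taken from the PR.

```rust
use std::path::{Path, PathBuf};

/// Builds the fallback config directory under an explicit home directory.
/// Returns None when no home directory is known, rather than falling back
/// to a literal "~" that would create a directory named "~" in the CWD.
fn fallback_config_dir(home: Option<&Path>) -> Option<PathBuf> {
    // Assumption: the DB lives under `<home>/.config/tokscale`.
    home.map(|h| h.join(".config").join("tokscale"))
}
```

In the real code the commit message indicates the home directory comes from `dirs::home_dir()`; its result could be passed here as `home.as_deref()`.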

- content_extractor: use chars().count() instead of byte len() for truncation guard
- wiki.rs: rename from_str to parse to avoid clippy::should_implement_trait
- report.rs: use div_ceil() instead of manual ceiling division
- usage/copilot.rs: use strip_prefix() instead of manual prefix stripping
- usage/minimax.rs: remove redundant closure
- usage/zai.rs: fix reference-to-reference pattern
- usage/mod.rs: extract type alias for complex type, use .ok() instead of manual match
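
A sketch of a character-based truncation guard along the lines of the `content_extractor` fix. `truncate_chars` is a hypothetical helper, not the function in the PR. Both the guard and the cut count characters, so multi-byte text is neither given a spurious ellipsis nor sliced in the middle of a code point.

```rust
/// Truncates `s` to at most `max_chars` characters, appending an ellipsis
/// only when something was actually cut off.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    // Compare character counts, not byte lengths: "héllo" is 5 chars but 6 bytes.
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    // `chars().take(n)` always ends on a character boundary, unlike `&s[..n]`.
    let mut out: String = s.chars().take(max_chars).collect();
    out.push('…');
    out
}
```

The same approach applies to the model-name truncation flagged in `optimize.rs`, where a byte-index slice such as `[..27]` can panic on non-ASCII names.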

cubic-dev-ai

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-cli/src/main.rs">

<violation number="1" location="crates/tokscale-cli/src/main.rs:763">
P1: `--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</violation>
</file>

<file name="crates/tokscale-cli/src/commands/optimize.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/optimize.rs:455">
P2: Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:683">
P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</violation>
</file>

<file name=".opencode/skill/deploy.md">

<violation number="1" location=".opencode/skill/deploy.md:49">
P2: `docker compose exec` allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use `-T` to disable pseudo-TTY allocation.</violation>
</file>


cubic-dev-ai · 2026-06-01T03:07:00Z

+  --instance-ids "i-078fe82953c3047b5" \
+  --document-name "AWS-RunShellScript" \
+  --region ap-northeast-2 \
+  --parameters 'commands=["export HOME=/home/ubuntu && cd /home/ubuntu/tokscale/self-host && docker compose exec app npx drizzle-kit push --force"]' \


P2: docker compose exec allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use -T to disable pseudo-TTY allocation.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At .opencode/skill/deploy.md, line 49: <comment>`docker compose exec` allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use `-T` to disable pseudo-TTY allocation.</comment> <file context> @@ -0,0 +1,61 @@ + --instance-ids "i-078fe82953c3047b5" \ + --document-name "AWS-RunShellScript" \ + --region ap-northeast-2 \ + --parameters 'commands=["export HOME=/home/ubuntu && cd /home/ubuntu/tokscale/self-host && docker compose exec app npx drizzle-kit push --force"]' \ + --timeout-seconds 120 \ + --output json </file context>

cubic-dev-ai Bot reviewed May 29, 2026


leecoder force-pushed the feat/report-task-grouping branch 3 times, most recently from 2cf54a9 to 36c4ce5 Compare May 29, 2026 09:47

leecoder added 5 commits June 1, 2026 11:58

leecoder force-pushed the feat/report-task-grouping branch from 36c4ce5 to d6c5432 Compare June 1, 2026 03:00

cubic-dev-ai Bot reviewed Jun 1, 2026


leecoder force-pushed the feat/report-task-grouping branch 2 times, most recently from 7657640 to f00d277 Compare June 1, 2026 09:01

