
feat(report): add multi-backend summarizer and LLM task grouping#633

Open
leecoder wants to merge 5 commits into junhoyeo:main from leecoder:feat/report-task-grouping


leecoder (Contributor) commented May 29, 2026

Summary

Add multi-backend LLM summarizer support and automatic task grouping to the tokscale report command.

Changes

Multi-backend summarizer

  • Support claude, codex, gemini, kiro as summarizer backends in addition to the default apple-fm
  • New --summarizer flag to select backend (default: apple-fm)
  • Batch processing with progress indicator for CLI-based backends
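A minimal sketch of the batching described above, assuming sessions can be summarized in fixed-size chunks. `run_in_batches`, the batch size, and the `summarize` closure (a stand-in for one CLI backend call) are hypothetical, not code from the PR:

```rust
/// Split `items` into batches of at most `batch_size`, call `summarize`
/// once per batch, and print a "[done/total]" progress line to stderr.
/// `summarize` is a hypothetical stand-in for one CLI backend invocation.
fn run_in_batches<T, F>(items: &[T], batch_size: usize, mut summarize: F) -> Vec<String>
where
    F: FnMut(&[T]) -> Vec<String>,
{
    assert!(batch_size > 0, "batch_size must be non-zero");
    // Number of batches, rounded up (usize::div_ceil, stable since Rust 1.73).
    let total_batches = items.len().div_ceil(batch_size);
    let mut results = Vec::with_capacity(items.len());
    for (i, batch) in items.chunks(batch_size).enumerate() {
        eprintln!("[{}/{}] summarizing {} sessions", i + 1, total_batches, batch.len());
        results.extend(summarize(batch));
    }
    results
}
```

Batching amortizes the per-process startup cost of CLI backends; the last batch may be shorter than `batch_size`.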

LLM task grouping (2nd pass)

  • After summarization, a second LLM pass clusters all titled sessions into 3–8 high-level task groups
  • Groups are displayed in the report table with sub-session titles indented below
  • Results cached in wiki DB — subsequent runs skip already-grouped sessions
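A minimal sketch of the cache check and write-back for the grouping pass, using a hypothetical in-memory `Entry` in place of a wiki DB row. Only the `task_group` name comes from the PR; how the LLM's answer is turned into the `assignments` map is left out:

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of a wiki entry: a title from the
/// first (summarization) pass, a task_group from the second pass.
#[derive(Debug, Clone)]
struct Entry {
    session_id: String,
    title: Option<String>,
    task_group: Option<String>,
}

/// Sessions eligible for grouping: titled but not yet grouped.
/// Skipping already-grouped rows is what keeps reruns cheap.
fn needs_grouping(entries: &[Entry]) -> Vec<&Entry> {
    entries
        .iter()
        .filter(|e| e.title.is_some() && e.task_group.is_none())
        .collect()
}

/// Apply session_id -> group assignments to ungrouped entries only.
/// Returns how many entries were updated.
fn apply_groups(entries: &mut [Entry], assignments: &HashMap<String, String>) -> usize {
    let mut updated = 0;
    for e in entries.iter_mut() {
        if e.task_group.is_none() {
            if let Some(group) = assignments.get(&e.session_id) {
                e.task_group = Some(group.clone());
                updated += 1;
            }
        }
    }
    updated
}
```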

Date-scoped operations

  • Summarization now respects --week, --since/--until date filters
  • New --rebuild flag resets cached summaries within the date range and re-summarizes from scratch

Schema changes

  • Add task_group column to wiki_entries table with auto-migration
  • Add reset_summaries_in_range and get_unsummarized_session_ids_in_range DB methods
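An in-memory sketch of what the two new DB methods are described as doing, and how `--rebuild` could combine them. The storage layout and signatures are assumptions, not the real wiki DB API; the `until` bound is treated as exclusive to match the range fix listed under Bug Fixes:

```rust
/// Hypothetical in-memory stand-in for the wiki DB.
/// Each row is (session_id, started_at in epoch ms, cached summary).
struct WikiStore {
    rows: Vec<(String, i64, Option<String>)>,
}

impl WikiStore {
    /// Clear cached summaries where since <= started_at < until.
    /// Returns how many summaries were actually cleared.
    fn reset_summaries_in_range(&mut self, since: i64, until: i64) -> usize {
        let mut cleared = 0;
        for (_, ts, summary) in self.rows.iter_mut() {
            if *ts >= since && *ts < until && summary.take().is_some() {
                cleared += 1;
            }
        }
        cleared
    }

    /// Session IDs in the same half-open range that have no summary yet.
    fn get_unsummarized_session_ids_in_range(&self, since: i64, until: i64) -> Vec<String> {
        self.rows
            .iter()
            .filter(|(_, ts, s)| *ts >= since && *ts < until && s.is_none())
            .map(|(id, _, _)| id.clone())
            .collect()
    }
}
```

Under this sketch, a rebuild is a reset followed by the unsummarized-ID query over the same range; sessions outside the range keep their cached summaries.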

Documentation

  • Add Task-Attributed Report section to README with usage examples, backend table, and sample output

Usage

```bash
tokscale report --week --summarizer claude
tokscale report --rebuild --summarizer codex
tokscale report --week --json
```

Summary by cubic

Adds a task-attributed usage report with multi-backend summarization and optional LLM task grouping. Introduces the tokscale report command with model/task breakdowns, daily/monthly views, and a local wiki DB cache.

  • New Features

    • tokscale report shows model and task-group breakdowns; daily view for --week/--month, session list for --today. Results cached in a local wiki DB; --rebuild re-summarizes in range. Auto-migration adds task_group.
    • --summarizer backends: apple-fm (default), claude, codex, gemini, kiro; batch mode for CLI backends. A second LLM pass clusters titled sessions into 3–8 task groups. Requires a CLI backend; apple-fm skips grouping with a clear message.
    • Filters/output: --today/--week/--month, --since/--until (exclusive end), --workspace, --client, --no-summarize, --json, --rebuild.
  • Bug Fixes

    • Date range accuracy: until is exclusive using next-day 00:00 minus 1ms; DB queries now use < to match.
    • Unknown --summarizer now errors; DB errors from summary/task-group updates propagate to the CLI.
    • wiki-summarizer.py: validate task_category/complexity, fix missing-ID crash; correct wiki DB fallback path resolution.
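A minimal sketch of backend selection that rejects unknown names instead of returning `Ok(())`, and that makes the apple-fm grouping skip explicit. The `Summarizer` enum and its methods are hypothetical; only the backend names come from the PR:

```rust
/// Hypothetical enum for the --summarizer values listed in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Summarizer {
    AppleFm,
    Claude,
    Codex,
    Gemini,
    Kiro,
}

impl Summarizer {
    /// Map a flag value to a backend; unknown names are an error,
    /// not a silent success.
    fn parse(name: &str) -> Result<Self, String> {
        match name {
            "apple-fm" => Ok(Self::AppleFm),
            "claude" => Ok(Self::Claude),
            "codex" => Ok(Self::Codex),
            "gemini" => Ok(Self::Gemini),
            "kiro" => Ok(Self::Kiro),
            other => Err(format!(
                "unknown --summarizer '{other}' (expected apple-fm, claude, codex, gemini, or kiro)"
            )),
        }
    }

    /// Only CLI backends take part in the task-grouping pass.
    fn supports_grouping(self) -> bool {
        !matches!(self, Self::AppleFm)
    }
}
```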

Written for commit f00d277. Summary will update on new commits.


vercel (bot) commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
tokscale Ignored Ignored Preview Jun 1, 2026 9:01am


cubic-dev-ai (bot) left a comment

10 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-core/src/content_extractor.rs">

<violation number="1" location="crates/tokscale-core/src/content_extractor.rs:262">
P2: The truncation guard uses byte length instead of character count, which incorrectly appends ellipses for non-ASCII input.</violation>
</file>

<file name="crates/tokscale-cli/src/commands/report.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/report.rs:208">
P2: Unknown `--summarizer` values return `Ok(())` instead of an error, causing silent misconfiguration and skipped summarization.</violation>

<violation number="2" location="crates/tokscale-cli/src/commands/report.rs:250">
P1: DB errors are swallowed when writing summaries, which can silently lose summarization results while reporting success.</violation>

<violation number="3" location="crates/tokscale-cli/src/commands/report.rs:327">
P2: Task grouping is silently skipped for unsupported backends (including the default `apple-fm`), so the feature can appear to run successfully while never producing task groups.</violation>

<violation number="4" location="crates/tokscale-cli/src/commands/report.rs:347">
P2: DB errors are swallowed when saving `task_group`, so grouping can silently fail while the command reports success.</violation>

<violation number="5" location="crates/tokscale-cli/src/commands/report.rs:727">
P2: End-of-day filtering truncates the last 999ms, so some sessions at the end of the `--until` day are incorrectly excluded.</violation>
</file>

<file name="crates/tokscale-core/src/wiki.rs">

<violation number="1" location="crates/tokscale-core/src/wiki.rs:116">
P2: Fallback config path uses an unexpanded `~`, which can write the wiki DB to an unintended location.</violation>

<violation number="2" location="crates/tokscale-core/src/wiki.rs:375">
P2: The `until` filter is inclusive in `query_entries` but exclusive in other range methods, causing inconsistent date-scoped behavior.</violation>
</file>

<file name="scripts/wiki-summarizer.py">

<violation number="1" location="scripts/wiki-summarizer.py:129">
P2: Validate `task_category`/`complexity` against allowed values before storing. Right now any model output string is accepted, even when it violates the declared schema.</violation>

<violation number="2" location="scripts/wiki-summarizer.py:136">
P2: The recovery path can crash because `session['session_id']` is dereferenced inside the exception handler. A malformed session then raises a second `KeyError` and aborts summarization.</violation>
</file>

Reply with feedback, questions, or to request a fix.
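Two of the findings (report.rs:250, P1, and report.rs:347) are about write errors being discarded. A minimal sketch of the propagating pattern, with a hypothetical in-memory `update_summary` standing in for the real wiki DB method (its signature here is an assumption):

```rust
/// Hypothetical fallible write standing in for the wiki DB's update_summary.
fn update_summary(db: &mut Vec<(String, String)>, id: &str, title: &str) -> Result<(), String> {
    if id.is_empty() {
        return Err("empty session id".to_string());
    }
    db.push((id.to_string(), title.to_string()));
    Ok(())
}

/// Persist every summary, stopping at the first failure and returning it
/// to the caller (via `?`) instead of discarding it with `let _ = ...`.
fn save_all(db: &mut Vec<(String, String)>, results: &[(&str, &str)]) -> Result<usize, String> {
    for (id, title) in results {
        update_summary(db, id, title)?;
    }
    Ok(results.len())
}
```

With `?`, the CLI exits with an error when a write fails rather than printing a report that silently lacks the new summaries.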


cubic-dev-ai (bot) left a comment

6 issues found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-cli/src/commands/optimize.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/optimize.rs:195">
P2: `partial_cmp(...).unwrap()` on `f64` can panic if `total_cost` is NaN. Use the safer pattern already established in the codebase.</violation>

<violation number="2" location="crates/tokscale-cli/src/commands/optimize.rs:455">
P2: Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</violation>
</file>

<file name="crates/tokscale-cli/src/main.rs">

<violation number="1" location="crates/tokscale-cli/src/main.rs:763">
P1: `--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:683">
P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</violation>
</file>
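For the `partial_cmp(...).unwrap()` finding at optimize.rs:195, one NaN-safe option is `f64::total_cmp`, sketched below with a hypothetical `ModelInsight` struct. Whether this matches the "safer pattern already established in the codebase" that the reviewer mentions is not visible in this thread:

```rust
/// Hypothetical per-model row; only `model` and `total_cost` are
/// named in the review.
#[derive(Debug)]
struct ModelInsight {
    model: String,
    total_cost: f64,
}

/// Sort by cost, highest first. f64::total_cmp is a total order that
/// includes NaN, so it never panics the way partial_cmp(..).unwrap() can.
/// A NaN cost still lands at one end of the list rather than being dropped.
fn sort_by_cost_desc(items: &mut [ModelInsight]) {
    items.sort_by(|a, b| b.total_cost.total_cmp(&a.total_cost));
}
```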


Comment thread crates/tokscale-cli/src/main.rs Outdated
week,
month,
});
if optimize && result.is_ok() {

P1: --optimize currently runs during --json report output, appending human-formatted text and corrupting JSON output for automation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/tokscale-cli/src/main.rs, line 763:

<comment>`--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</comment>

<file context>
@@ -745,6 +759,38 @@ fn main() -> Result<()> {
                 week,
                 month,
+            });
+            if optimize && result.is_ok() {
+                let _ = commands::optimize::run_optimize(commands::optimize::OptimizeOptions {
+                    json: false,
</file context>
Suggested change
- if optimize && result.is_ok() {
+ if optimize && !json && result.is_ok() {

Comment thread crates/tokscale-cli/src/commands/optimize.rs Outdated
println!(" {}", "─".repeat(68));
for m in report.model_insights.iter().take(8) {
let model_display: String = if m.model.len() > 28 {
format!("{}…", &m.model[..27])

P2: Truncating String with [..27] can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/tokscale-cli/src/commands/optimize.rs, line 455:

<comment>Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</comment>

<file context>
@@ -0,0 +1,554 @@
+    println!("  {}", "─".repeat(68));
+    for m in report.model_insights.iter().take(8) {
+        let model_display: String = if m.model.len() > 28 {
+            format!("{}…", &m.model[..27])
+        } else {
+            m.model.clone()
</file context>
Suggested change
- format!("{}…", &m.model[..27])
+ format!("{}…", m.model.chars().take(27).collect::<String>())
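The same byte-versus-character issue appears in content_extractor.rs:262. A minimal sketch of a shared helper, assuming "at most N characters, ending in `…` only when something was cut" is the desired behavior; `truncate_display` is a hypothetical name:

```rust
/// Truncate `s` to at most `max_chars` characters (assumes max_chars >= 1),
/// appending "…" only when something was actually removed.
/// Counting with chars() avoids the byte-slicing panic and the false
/// ellipsis on short non-ASCII strings. Note that chars() counts Unicode
/// scalar values, not grapheme clusters or display width.
fn truncate_display(s: &str, max_chars: usize) -> String {
    if s.chars().count() > max_chars {
        // Keep max_chars - 1 characters so the result, ellipsis included,
        // is exactly max_chars long.
        let kept: String = s.chars().take(max_chars.saturating_sub(1)).collect();
        format!("{kept}…")
    } else {
        s.to_string()
    }
}
```

With `max_chars = 28` this reproduces the suggested change above (27 characters plus `…`).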

Comment thread README.md Outdated
Tokscale can analyze your usage patterns and generate actionable recommendations to reduce costs and improve productivity.

```bash
# Standalone optimization analysis (defaults to today)
```

P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At README.md, line 683:

<comment>The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</comment>

<file context>
@@ -674,6 +675,65 @@ tokscale report --workspace my-project --client opencode
+Tokscale can analyze your usage patterns and generate actionable recommendations to reduce costs and improve productivity.
+
+```bash
+# Standalone optimization analysis (defaults to today)
+tokscale optimize
+
</file context>
Suggested change
- # Standalone optimization analysis (defaults to today)
+ # Standalone optimization analysis (uses all cached sessions by default)

leecoder force-pushed the feat/report-task-grouping branch 3 times, most recently from 2cf54a9 to 36c4ce5 on May 29, 2026 09:47
leecoder added 5 commits June 1, 2026 11:58
- Group sessions by model and task title in summary tables
- Show daily breakdown for --week/--month, session list for --today
- Integrate Apple FM summarizer for session classification
- Add wiki DB for caching session summaries
- Support claude, codex, gemini, kiro as summarizer backends (in addition to apple-fm)
- Add 2nd LLM pass to cluster sessions into high-level task groups
- Add --summarizer flag to select backend, --rebuild to reset cached summaries
- Scope summarization to date range (--week, --since/--until)
- Add task_group column to wiki DB with migration
- Update README with Task-Attributed Report documentation
…agate DB errors

- Return error instead of Ok(()) for unknown --summarizer values
- Propagate DB errors from update_summary and update_task_group
- Clarify skip message when task grouping backend is unsupported (apple-fm)
- Fix end-of-day filtering: use next_day_00:00 - 1ms instead of 23:59:59
- Fix wiki.rs fallback path: use dirs::home_dir() instead of literal '~/.config'
- Fix until filter inconsistency: query_entries now uses '<' (exclusive) matching other range methods
- Validate task_category/complexity against allowed values in wiki-summarizer.py
- Fix recovery path crash: use session.get('session_id') instead of session['session_id'] in exception handler
- content_extractor: use chars().count() instead of byte len() for truncation guard
- wiki.rs: rename from_str to parse to avoid clippy::should_implement_trait
- report.rs: use div_ceil() instead of manual ceiling division
- usage/copilot.rs: use strip_prefix() instead of manual prefix stripping
- usage/minimax.rs: remove redundant closure
- usage/zai.rs: fix reference-to-reference pattern
- usage/mod.rs: extract type alias for complex type, use .ok() instead of manual match
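The end-of-day fix (report.rs:727) and the `until` consistency fix (wiki.rs:375) both come down to treating the date range as half-open. A minimal sketch, assuming timestamps are Unix epoch milliseconds and day boundaries are taken in UTC (the real command may use local time). It uses the next day's 00:00 itself as the exclusive bound instead of subtracting 1 ms, which is a swapped-in variant, not the PR's exact code. `days_from_civil` is Howard Hinnant's civil-date algorithm; the other names are hypothetical:

```rust
const DAY_MS: i64 = 86_400_000;

/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let (m, d) = (i64::from(m), i64::from(d));
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March-based month
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parse "YYYY-MM-DD" with basic range checks on month and day.
fn parse_ymd(s: &str) -> Option<(i64, u32, u32)> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    let valid = parts.next().is_none() && (1..=12).contains(&m) && (1..=31).contains(&d);
    valid.then_some((y, m, d))
}

/// [since 00:00, day after until 00:00) as UTC epoch milliseconds.
fn date_range_ms(since: &str, until: &str) -> Option<(i64, i64)> {
    let (y1, m1, d1) = parse_ymd(since)?;
    let (y2, m2, d2) = parse_ymd(until)?;
    let start = days_from_civil(y1, m1, d1) * DAY_MS;
    let end_exclusive = (days_from_civil(y2, m2, d2) + 1) * DAY_MS;
    Some((start, end_exclusive))
}

/// Inclusive start, exclusive end: the same comparison everywhere.
fn in_range(ts_ms: i64, (start, end_exclusive): (i64, i64)) -> bool {
    ts_ms >= start && ts_ms < end_exclusive
}
```

A session at 23:59:59.500 on the `--until` day falls inside this range, whereas an inclusive 23:59:59 bound on millisecond timestamps would drop it.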
leecoder force-pushed the feat/report-task-grouping branch from 36c4ce5 to d6c5432 on June 1, 2026 03:00
cubic-dev-ai (bot) left a comment

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/tokscale-cli/src/main.rs">

<violation number="1" location="crates/tokscale-cli/src/main.rs:763">
P1: `--optimize` currently runs during `--json` report output, appending human-formatted text and corrupting JSON output for automation.</violation>
</file>

<file name="crates/tokscale-cli/src/commands/optimize.rs">

<violation number="1" location="crates/tokscale-cli/src/commands/optimize.rs:455">
P2: Truncating `String` with `[..27]` can panic on non-ASCII model names due to invalid UTF-8 boundary slicing.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:683">
P2: The new optimize example claims it defaults to today, but the command actually uses all sessions unless a date flag is provided.</violation>
</file>

<file name=".opencode/skill/deploy.md">

<violation number="1" location=".opencode/skill/deploy.md:49">
P2: `docker compose exec` allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use `-T` to disable pseudo-TTY allocation.</violation>
</file>


Comment thread .opencode/skill/deploy.md Outdated
--instance-ids "i-078fe82953c3047b5" \
--document-name "AWS-RunShellScript" \
--region ap-northeast-2 \
--parameters 'commands=["export HOME=/home/ubuntu && cd /home/ubuntu/tokscale/self-host && docker compose exec app npx drizzle-kit push --force"]' \

P2: docker compose exec allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use -T to disable pseudo-TTY allocation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .opencode/skill/deploy.md, line 49:

<comment>`docker compose exec` allocates a TTY by default; in non-interactive environments like AWS SSM this can fail with "the input device is not a TTY". Use `-T` to disable pseudo-TTY allocation.</comment>

<file context>
@@ -0,0 +1,61 @@
+  --instance-ids "i-078fe82953c3047b5" \
+  --document-name "AWS-RunShellScript" \
+  --region ap-northeast-2 \
+  --parameters 'commands=["export HOME=/home/ubuntu && cd /home/ubuntu/tokscale/self-host && docker compose exec app npx drizzle-kit push --force"]' \
+  --timeout-seconds 120 \
+  --output json
</file context>

leecoder force-pushed the feat/report-task-grouping branch 2 times, most recently from 7657640 to f00d277 on June 1, 2026 09:01

Labels: None yet
Projects: None yet
1 participant