| description | Guide for reducing token consumption in agentic workflows — DataOps, gh-proxy, inline sub-agents, caveman experiments, and audit-based measurement. |
|---|
Apply these in order — each check can halve costs:
- Cheap triage first: classify duplicates, stale items, low-value events, and known cases before escalating
- Frontier model as planner: use frontier models for planning, synthesis, ambiguous decisions, and final judgment — not bulk extraction
- DataOps: Move data fetching into
steps:— agent reads compact JSON, not raw API responses - gh-proxy: Set
tools.github.mode: gh-proxy— skips Docker MCP server startup and extra tool definitions - cli-proxy: Mount additional MCP servers as CLIs via
cli-proxy: true— agent pipes output throughjqbefore it enters context - Sub-agents: Delegate repetitive per-item tasks to
model: smallsub-agents (~10–20× cheaper) - Prompt size: Strip redundant instructions, examples, and pleasantries from the prompt body
- Dynamic context: Inject only required fields —
${{ github.event.issue.number }}not the full event payload - Pull context on demand: query logs/data only after a hypothesis forms; avoid preloading large raw dumps into the initial prompt
- Prompt caching: Put stable instructions before dynamic content to maximize cache hits
- Context hygiene: keep the orchestrator context compact; prefer short worker summaries over raw output
- Cadence: If the result is not time-sensitive, schedule less often (
hourly→daily,daily→weekly) - Batching: Prefer scheduled batch processing over reactive events when delayed processing is acceptable
- Telemetry: Configure
observability.otlpso token usage and run phases are measurable outside individual run logs - AgenticOps: Add
copilot-token-audit/copilot-token-optimizerworkflows so the repository keeps finding waste automatically - Measure first: Back every change with an
experiments:field andmetric: "aic"before promoting
Using a more capable frontier model can reduce total cost when the workflow architecture prevents unnecessary invocations and keeps expensive context narrow.
Guidance:
- use the frontier model for planning, hypothesis selection, synthesis, ambiguous decisions, and final judgment
- do not spend frontier-model turns on repetitive extraction, duplicate detection, or broad first-pass scanning
- add a cheap triage stage for known/duplicate/stale/low-value events and stop with
noopor another safe output when escalation is unnecessary - escalate to the frontier model only when triage is uncertain or the case is genuinely new/high-value
- cap sub-agent fan-out so escalations cannot recurse or expand without bound
Do not claim that frontier models are always cheaper. Cost wins come from architecture and selective execution, not model tier alone.
Avoid front-loading large raw context into the initial prompt when data can be fetched on demand.
Prefer:
- deterministic pre-steps that materialize compact files under
/tmp/gh-aw/ gh+ filtering commands (jq,grep, focused selectors) before context is exposed to the model- query interfaces and pre-aggregated summaries instead of full API payloads
- directed tool calls issued after the agent forms a hypothesis
Warning on anchoring: if authors preselect raw logs or large payloads too early, the model may over-focus on that material and miss the actual cause elsewhere.
gh aw audit <run-id> --jsonKey fields in the output:
agent_usage.aic— AI Credits (AIC), the normalized cost metric (1 AIC = $0.01; accounts for model price differences and cache discounts)agent_usage.input_tokens/agent_usage.output_tokens— raw token countsagent_usage.cache_read_tokens/agent_usage.cache_write_tokens— tokens served from the prompt cache
Equivalent via MCP: audit tool with run_id: <run-id>.
Treat optimization as successful only when quality remains acceptable. A quality regression is a failure even if AI Credits decrease.
gh aw audit <base-run-id> <optimized-run-id> --json
# Or compare multiple variants at once:
gh aw audit <base-run-id> <variant-a-run-id> <variant-b-run-id> --jsonThe diff highlights changes in AI credits, tool calls, and safe outputs between runs. Equivalent via MCP: audit tool with run_ids_or_urls: ["<base-run-id>", "<optimized-run-id>"].
gh aw audit <run-id> downloads all artifacts into logs/run-<run-id>/. Read the per-call breakdown from there:
gh aw audit <run-id>
cat logs/run-<run-id>/firewall-audit-logs/api-proxy-logs/token-usage.jsonlEach line is one API call with model, input_tokens, output_tokens, cache_read_tokens, and cache_write_tokens — useful for finding the most expensive calls.
The single biggest optimization. Replace agentic data fetching with deterministic shell commands in steps:. Shell steps run outside the AI sandbox (no tokens) and produce structured output the agent reads directly.
---
engine: copilot
tools:
github:
mode: gh-proxy
toolsets: [default, pull_requests]
---
Fetch all open PRs in ${{ github.repository }}, compute the merge rate, identify authors with the most contributions, and create a weekly summary discussion.---
engine: copilot
tools:
github:
mode: gh-proxy
bash: ["*"]
steps:
- name: Fetch and aggregate PR data
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
mkdir -p /tmp/gh-aw/data
gh pr list --repo "${{ github.repository }}" \
--state all --limit 100 \
--json number,title,state,author,createdAt,mergedAt,additions,deletions \
> /tmp/gh-aw/data/prs.json
jq '{
total: length,
merged: [.[] | select(.state=="MERGED")] | length,
open: [.[] | select(.state=="OPEN")] | length,
top_authors: ([.[].author.login] | group_by(.) | map({author:.[0], count:length}) | sort_by(-.count) | .[0:5])
}' /tmp/gh-aw/data/prs.json > /tmp/gh-aw/data/stats.json
safe-outputs:
create-discussion:
title-prefix: "[weekly-pr] "
category: "General"
close-older-discussions: true
---
Read the pre-computed stats at `/tmp/gh-aw/data/stats.json` and `/tmp/gh-aw/data/prs.json`.
Create a concise weekly PR summary discussion.Shell steps run outside the AI sandbox (zero tokens); the agent only reads compact aggregated JSON.
Best practices:
- Write one JSON file per data source; use
jqto pre-aggregate - Store files under
/tmp/gh-aw/— this directory is available to the agent - Document file locations and schema in the prompt body so the agent doesn't need to explore
See also: DataOps pattern docs
tools:
github:
mode: gh-proxy # ✅ preferred — pre-authenticated gh CLI, no MCP server startup
toolsets: [default]The agent reads GitHub data with gh issue list, gh pr view, etc., and can pipe through jq before the data enters context. The alternative mode: local starts a Docker-based MCP server with startup latency and verbose tool results.
When a workflow uses additional MCP servers (e.g., a custom Notion or Slack MCP), cli-proxy: true mounts each server as a standalone CLI tool on PATH:
tools:
cli-proxy: true
github:
mode: gh-proxy
my-custom-mcp:
...With cli-proxy, the agent calls my-custom-mcp <tool> <args> from bash and can pipe output through jq or grep to extract only the fields it needs — instead of receiving the full MCP tool response in the conversation context.
Summary:
| Mode | Docker startup | Extra tool definitions | Agent output processing |
|---|---|---|---|
mode: local + MCP tools |
Yes | Yes | Tool result (full JSON) |
mode: gh-proxy + bash |
No | No | Agent pipes through jq |
cli-proxy: true + bash |
Yes (once) | Reduced | Agent pipes through jq |
Sub-agents with model: small cost 10–20× less than the parent model. Use them for classification, one-sentence summarization, structured extraction, and scoring; reserve the large model for synthesis.
steps: → deterministic shell (zero AI tokens)
sub-agents: → small model per item (cheap, parallelizable)
main agent: → synthesizes compact sub-agent results (one high-quality pass)
---
engine: copilot
tools:
cli-proxy: true
github:
mode: gh-proxy
bash: ["*"]
steps:
- name: Split issues into per-item files
run: |
mkdir -p /tmp/gh-aw/issues
gh issue list --repo "${{ github.repository }}" \
--state open --limit 50 \
--json number,title,body,labels \
| jq -c '.[]' \
| while IFS= read -r issue; do
num=$(echo "$issue" | jq -r '.number')
echo "$issue" > /tmp/gh-aw/issues/issue-${num}.json
done
---
## Step 1 — classify each issue
For every `/tmp/gh-aw/issues/issue-*.json`, use the `classifier` agent.
Write output to `/tmp/gh-aw/issues/cat-<number>.json`.
## Step 2 — synthesize
Read all `cat-*.json` files and create a triage report grouped by category.
## agent: `classifier`
---
description: Classifies a GitHub issue into a single category
model: small
---
Read the JSON file provided. Return only:
`{"number": <n>, "category": "bug|feature|question|docs|security|other"}`
Nothing else.Why this saves tokens: sub-agents run on the cheap small model; the main agent only reads compact {"number":…, "category":…} JSON; sub-agent dispatches can run in parallel.
Sub-agent model aliases:
| Alias | Use when |
|---|---|
small |
Classification, extraction, one-sentence summaries, scoring |
large |
Complex reasoning, multi-step synthesis, code generation |
inherited |
Sub-agent needs same capability as the parent (default) |
Always use aliases, not model IDs — aliases resolve to the best available model per provider.
See also: Inline Sub-Agents
Use an A/B experiment to compare a verbose prompt against a stripped-down minimal one. If the minimal variant produces equally useful output, adopt it permanently.
experiments:
prompt_style: [verbose, minimal]{{#if experiments.prompt_style == "verbose" }}
Please analyze all of the open issues in this repository and provide a comprehensive, detailed report covering: the number of open issues, any significant trends or patterns you observe, the most frequently occurring labels, the oldest unresolved issues, a prioritized list of the most critical items, and any recommendations for the team.
{{#else}}
List open issues by priority. Top 5 critical items. Be brief.
{{/if}}Measure AI Credits (AIC) in each variant's run summary or via gh aw audit. If the minimal variant uses fewer AI Credits at acceptable quality, promote it as the baseline.
Declare an experiment before making any prompt or configuration change, and compare before/after cost and quality. Run ≥ 20 cycles per variant for statistical significance on high-frequency workflows.
experiments:
optimization_v1:
variants: [control, optimized]
description: "DataOps refactor — move issue fetching to steps:"
metric: "aic"
issue: "123"Reference the active variant in the prompt:
{{#if experiments.optimization_v1 == "optimized" }}
Read the pre-fetched data from `/tmp/gh-aw/data/`.
{{#else}}
Fetch open issues from ${{ github.repository }} using the GitHub tools.
{{/if}}After enough runs:
- Compare variants using
gh aw audit <control-run-id> <optimized-run-id> - Inspect
aic,input_tokens,output_tokens,cache_read_tokens, andcache_write_tokens - Validate output quality and decision accuracy against the control run
- If the optimized variant wins on cost and quality, rewrite the baseline prompt and remove the
experiments:field
Key experiment dimensions for token optimization:
| Dimension | Example variants |
|---|---|
| Prompt verbosity | verbose / concise / minimal |
| Data source | agentic-fetch / dataops-steps |
| Model tier | Run separate workflows for each engine |
| Sub-agent usage | single-agent / with-subagents |
| Tool mode | mcp-local / gh-proxy |
See also: A/B Testing Experiments
The cheapest run is the one you don't execute. If a workflow doesn't need near-real-time feedback, run it less often and batch items in one pass.
hourly→daily on weekdaysfor team-facing summaries or auditsdaily→weeklyfor trend reports, optimization reviews, and backlog hygieneevery N hours→ a daily or weekly batch when the workflow only produces guidance or reports
Reactive triggers (issues:, pull_request:, comment commands) suit immediate feedback. Otherwise prefer schedule: daily on weekdays and batch work. Typical batch-friendly tasks: triage summaries, stale backlog review, token audits, security digests. Combine with cache-memory or repo-memory to track processed items so each run only handles new ones.
Export telemetry automatically and add workflows that keep finding token waste over time.
Add workflow-level OpenTelemetry export so each run emits token and phase data to your observability backend:
observability:
otlp:
endpoint: ${{ secrets.GH_AW_OTEL_ENDPOINT }}
headers: ${{ secrets.GH_AW_OTEL_HEADERS }}Setup, agent, and conclusion spans carry token usage attributes — compare runs over time and validate optimizations post-rollout. See Frontmatter syntax.
copilot-token-audit— scheduled audit of token usage across workflowscopilot-token-optimizer— scheduled follow-up that identifies one expensive workflow and proposes concrete savings
Loop: export OTEL → summarize usage → open optimization issues → re-measure. See .github/workflows/ for examples.
Prompt caching is automatic via the AWF gateway. Cached input tokens are weighted at 0.1 versus 1.0 for uncached input — repeated context (system prompt, shared preamble) costs ~10× less when cached.
To maximize cache hits:
- Keep stable content at the top of the prompt — instructions that don't change between runs (role, output format, schema) before dynamic content (issue body, event context).
- Use
cache-memoryfor workflows that re-read the same large knowledge base across runs; avoids duplicate context every turn. - Minimize dynamic context — inject only the fields the agent needs:
${{ github.event.issue.number }}instead of the full event payload.
Two top-level frontmatter fields enforce AI Credit budgets directly, independent of the techniques above. Both accept an integer or a K/M short-form string (e.g. 100M, 500K). Typical workflow range: 100 to 2500.
max-ai-credits:— Per-run AI credit budget enforced by the AWF firewall/API proxy (default1000). The agent is steered to stay within budget; set a negative value to disable enforcement and steering.max-daily-ai-credits:— Per-user 24-hour guardrail. At activation, gh-aw sums the triggering user's AI credits across their runs of this workflow over the last 24 hours and blocks execution once the total exceeds the threshold. Enabled by default with a system default threshold; set-1to disable, or an explicit value to override the default.
max-ai-credits: 100M # per-run cap (short-form string)
max-daily-ai-credits: 500M # per-user 24h cap; -1 disablesFor custom or private models, the top-level models: frontmatter field supplies pricing in the same structure as models.json (keyed providers.<provider>.models.<model>.cost with input/output/cache_read/cache_write per-token costs). Entries are merged with the built-in models.json at runtime — they override matching models and fill gaps for unknown ones — so AI Credit accounting stays accurate for models gh-aw does not price by default.
| Topic | File |
|---|---|
| Inline sub-agents syntax | subagents.md |
| A/B experiments | experiments.md |
| Persistent memory | memory.md |
| DataOps pattern | DataOps guide |
| Audit command reference | cli-commands.md |
| Frontmatter syntax | syntax.md |