Surgical, fidelity-preserving DOCX editing for AI agents — and for you.
One deterministic core that edits OOXML directly (unzip → patch XML → rezip), exposed through three thin faces: an MCP server, a Python package (docxengine), and a JS/TS package (@docxengine/core). Agents see a token-efficient, Markdown-like projection with content-hash-anchored paragraph IDs — never raw XML.
Quickstart · Concepts · Tool reference · Architecture · MCP server · Docs · Roadmap
- Overview
- Features
- Why DocxEngine
- What DocxEngine is not
- Architecture
- The agent view
- Getting started
- Documentation
- Repository layout
- Roadmap & status
- Contributing
- Community & support
- License
Every mainstream DOCX library has a disqualifying gap for agent use: python-docx has no tracked-changes support (open since 2016), docx-js is generation-focused, docxtemplater is template-bound, Pandoc round-trips are lossy, and LibreOffice headless is heavyweight. The only approach that preserves tracked changes, comments, and footnotes is editing the OOXML directly — the same strategy Anthropic's docx skill and the strongest MCP servers converged on.
DocxEngine packages that strategy as a reusable engine:
- A deterministic core (no LLM inside) that models the OPC/ZIP package, patches the XML DOM, coalesces split runs, writes real `w:ins`/`w:del` redlines, and validates every edit against OOXML before saving, so Word never silently "repairs" your file.
- An agent-computer interface of ~16 high-leverage, namespaced tools (`docx_search`, `docx_replace`, `docx_revision`, …) with structured, corrective errors and idempotent semantics.
- Stable addressing via content-hash anchors (`P12#a7b2`), because `w14:paraId` is not spec-guaranteed stable across Word save cycles and is absent from documents written by non-Word tools.
- A verification loop: render-to-PDF/PNG previews (via a pluggable LibreOffice adapter) so agents can self-check their edits.
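To make the "coalesces split runs" point concrete, here is a minimal sketch of why a per-run search misses text and how searching the joined run text finds it. It is an illustration only, not DocxEngine's implementation: `find_across_runs` is a hypothetical helper, and the sample assumes each run holds exactly one `w:t` element.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# One paragraph whose visible text "Effective Date" is split over three runs,
# as Word often does after formatting or spell-check passes.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>Effec</w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>tive </w:t></w:r>
  <w:r><w:t>Date</w:t></w:r>
</w:p>
""".strip()


def find_across_runs(paragraph, needle):
    """Locate needle in the joined run text.

    Returns ((first_run, offset), (last_run, offset)) for the first and last
    matched character, or None when the text is absent. Hypothetical helper.
    """
    if not needle:
        return None
    texts = [t.text or "" for t in paragraph.findall("w:r/w:t", NS)]
    joined = "".join(texts)
    start = joined.find(needle)
    if start < 0:
        return None

    def locate(pos):
        # Map a position in the joined text back to (run index, offset in run).
        consumed = 0
        for i, text in enumerate(texts):
            if pos < consumed + len(text):
                return i, pos - consumed
            consumed += len(text)
        raise ValueError("position outside paragraph text")

    return locate(start), locate(start + len(needle) - 1)


paragraph = ET.fromstring(PARAGRAPH_XML)
span = find_across_runs(paragraph, "Effective Date")
```

A real replace would then rewrite only the runs inside that span, which is what keeps the formatting of the surrounding runs intact.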
- Fidelity-preserving surgical edits: replace, insert, delete, and rewrite paragraphs in arbitrary existing documents without disturbing tracked changes, comments, footnotes, styles, or media.
- Real redlines: first-class tracked-change writing (`track_changes: true, author: "..."`), plus accept/reject filtered by author or date.
- Token-efficient reading: outline first, then paginated, Markdown-like projections with only salient formatting; raw OOXML is never shown by default.
- Hash-anchored addressing: every paragraph gets a `P{index}#{hash}` anchor validated before each edit; edits return fresh anchors so agents never re-list mid-batch.
- Always-on validation gate: ID uniqueness, orphaned relationships, dangling footnotes, and content-type errors are caught before save, with auto-repair where safe.
- Comments, tables, styles, sections, lists, media, fields, templates: the full capability surface is implemented, including threaded comments with resolve state, style-definition edits, mustache template merge with loops, Markdown↔docx conversion, and field-code insertion.
- Triple distribution: MCP server (stdio + Streamable HTTP), `pip install docxengine`, `npm install @docxengine/core`; the published JSON Schemas plug into any framework, with a thin OpenAI function-calling adapter included.
- One conformance-tested contract: the Python and TypeScript implementations are kept honest by a shared JSON tool contract and a cross-implementation conformance corpus.
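The hash-anchored addressing described in the list above can be sketched as follows. This is a rough illustration under stated assumptions: the hash function (SHA-1) and the four-hex-digit length are guesses based on sample anchors such as `P12#a7b2`, and `make_anchor` / `check_anchor` are hypothetical names rather than the package's API.

```python
import hashlib


def make_anchor(index, text):
    """Build a P{index}#{hash} anchor; SHA-1 and 4 hex digits are assumptions."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:4]
    return f"P{index}#{digest}"


def check_anchor(anchor, paragraphs):
    """Return the 0-based position an anchor points at, or raise if it is stale."""
    label, _, _ = anchor.partition("#")
    index = int(label[1:])  # strip the leading "P"
    if not 1 <= index <= len(paragraphs):
        raise LookupError(f"{anchor}: no paragraph {index}")
    current = make_anchor(index, paragraphs[index - 1])
    if current != anchor:
        # The text changed since the agent last read it: refuse the edit and
        # hand back the fresh anchor so the caller can retry deliberately.
        raise ValueError(f"{anchor} is stale; current anchor is {current}")
    return index - 1


paragraphs = ["Master Services Agreement", "This Agreement is entered into..."]
anchor = make_anchor(2, paragraphs[1])
position = check_anchor(anchor, paragraphs)  # valid while the text is unchanged
```

Validating the hash before every edit is what turns a positional index into a guard: an edit aimed at text the agent has not seen fails loudly instead of landing on the wrong paragraph.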
Agents are a new class of end-user, and tools must be designed for them rather than wrapped from existing APIs (SWE-agent, NeurIPS 2024). Raw OOXML is distracting context; agents can't "see" the rendered page; and naive find-and-replace fails because Word fragments text across run boundaries. DocxEngine applies the resulting design principles end to end:
| Principle | How DocxEngine applies it |
|---|---|
| Simple, few, high-leverage tools | ~16 namespaced tools across 5 groups, not a 1:1 API wrapper |
| Guarded actions | every edit is hash-validated and OOXML-validated before it lands |
| Token economy | outline → windowed reads, concise/detailed formats, ~25k-token response cap |
| Feedback loops | structured corrective errors + render-based visual self-check |
| Determinism | the core contains no LLM; the same call on the same document yields the same bytes |
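The determinism row can be illustrated with the unzip → patch → rezip cycle itself. The sketch below uses only the standard-library `zipfile` module and pins every timestamp so that the same patch on the same input yields identical bytes. The helper names and the fixed timestamp are assumptions for illustration, not the engine's actual packaging code.

```python
import io
import zipfile

# Pinned timestamp (the ZIP format's minimum) so output bytes do not depend
# on when the edit ran. The value is an arbitrary choice for this sketch.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_sample_package():
    """Create a tiny two-part archive standing in for a .docx (not a valid one)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", b"<Types/>")
        z.writestr("word/document.xml", b"<w:document>Hello</w:document>")
    return buf.getvalue()


def rezip_with_patch(package, part_name, patch):
    """Return new package bytes with one part rewritten by patch(bytes) -> bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():  # keep the original part order
            data = src.read(info.filename)
            if info.filename == part_name:
                data = patch(data)
            fresh = zipfile.ZipInfo(info.filename, date_time=FIXED_TIME)
            fresh.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(fresh, data)
    return out.getvalue()


package = build_sample_package()
edited = rezip_with_patch(
    package, "word/document.xml", lambda xml: xml.replace(b"Hello", b"Goodbye")
)
```

Every part other than the patched one is copied byte for byte, which is the property that keeps untouched comments, footnotes, and media intact.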
- Not a renderer. Fields, TOC entries, and page numbers only materialize when Word or LibreOffice renders; the engine inserts and updates field codes and tells agents so explicitly.
- Not a template DSL. `docx_template_fill` covers mustache-style merge with loops and conditions, but DocxEngine's center of gravity is arbitrary surgical edits of existing documents.
- Not a python-docx/docx-js wrapper. Those libraries drop the document features this project exists to preserve; they appear at most in narrow create paths.
- Not Word automation. No COM, no Office.js host, no GUI — server-side and offline by design.
┌──────────────────────────────────────────────────────────────┐
│ Integration faces (thin) │
│ 1. MCP server (stdio + streamable-HTTP) │
│ 2. Python package (docxengine) — JSON-in/JSON-out + native │
│ 3. JS/TS package (@docxengine) — JSON-in/JSON-out + native │
│ + OpenAI function-calling adapter (thin) │
├──────────────────────────────────────────────────────────────┤
│ Core engine (deterministic, no LLM) │
│ • OPC/ZIP package model • Style cascade resolver │
│ • XML DOM patcher • Numbering resolver │
│ • Run-coalescing find/replace• Tracked-change writer │
│ • Content-hash anchor index • Comment/footnote part manager │
│ • Markdown projector (read) • OOXML validator + repairer │
│ • Render adapter (LibreOffice/Word) for verification │
└──────────────────────────────────────────────────────────────┘
v1 ships parallel Python and TypeScript implementations against a shared JSON contract (spec/) and a shared conformance corpus — pure-pip and pure-npm installs with zero native toolchain. A Rust/WASM core unification is a v2 evaluation. The full reasoning, including the addressing design and tool surface, is in ARCHITECTURE.md.
Agents never see raw OOXML. Reads return a Markdown-like projection annotated with stable anchors and only the formatting that matters:
[P1#a7b2 H1] Master Services Agreement
[P2#f3c1] This Agreement is entered into as of {{EffectiveDate}}...
[P3#b2c4 H2] 1. Definitions
[P4#d4e5] "Confidential Information" means... [comment:C1 by J.Doe]
[T1 3×4 @after:P5] | Term | Value | ... |
[P12#e7f8 List:ol L1] First obligation
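For client code that needs to pull anchors back out of a projection like the one above, a small parser might look like this. The line grammar here is inferred only from the sample lines shown; the real projection format is defined in the project's docs, and `parse_paragraph_line` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class ParagraphLine(NamedTuple):
    index: int    # 1-based paragraph number, e.g. 12
    digest: str   # content-hash suffix, e.g. "e7f8"
    tags: tuple   # formatting tags such as ("List:ol", "L1")
    text: str     # projected paragraph text


# "[P<index>#<hash> <optional tags>] <text>", a grammar inferred from the sample.
_LINE = re.compile(r"^\[P(\d+)#([0-9a-f]+)((?: [^\]\s]+)*)\] ?(.*)$")


def parse_paragraph_line(line: str) -> Optional[ParagraphLine]:
    """Parse one paragraph line; return None for tables and other line kinds."""
    match = _LINE.match(line)
    if match is None:
        return None
    index, digest, tags, text = match.groups()
    return ParagraphLine(int(index), digest, tuple(tags.split()), text)


heading = parse_paragraph_line("[P1#a7b2 H1] Master Services Agreement")
table = parse_paragraph_line("[T1 3×4 @after:P5] | Term | Value | ... |")  # None
```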
A typical edit flow:
→ docx_revision {"doc_id":"d1","op":"accept","filter":{"author":"Jane Doe"}}
← {"accepted":12,"remaining_by_author":{"Bob":3},"note":"Resolved <w:ins>/<w:del> for Jane Doe; Bob's 3 revisions untouched."}
See Concepts for anchors, projection, and the validation gate, and the tool reference for all tools.
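What accepting one author's revisions means at the XML level can be sketched as below: an accepted `w:ins` wrapper is removed while its runs are kept, and an accepted `w:del` is dropped with its content. This is a simplified illustration, not the engine's code. It handles only run-level insertions and deletions and ignores moves, formatting changes, and deleted paragraph marks; `accept_revisions` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, AUTHOR = f"{{{W}}}ins", f"{{{W}}}del", f"{{{W}}}author"

SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>Fee is </w:t></w:r>
  <w:del w:id="1" w:author="Jane Doe"><w:r><w:delText>100</w:delText></w:r></w:del>
  <w:ins w:id="2" w:author="Jane Doe"><w:r><w:t>120</w:t></w:r></w:ins>
  <w:ins w:id="3" w:author="Bob"><w:r><w:t> USD</w:t></w:r></w:ins>
</w:p>
""".strip()


def accept_revisions(element, author):
    """Accept run-level w:ins / w:del by one author, in place; return the count."""
    accepted = 0
    kept = []
    for child in list(element):
        # Resolve nested revisions first so unwrapped content is already clean.
        accepted += accept_revisions(child, author)
        mine = child.get(AUTHOR) == author
        if mine and child.tag == INS:
            kept.extend(list(child))  # keep the inserted runs, drop the wrapper
            accepted += 1
        elif mine and child.tag == DEL:
            accepted += 1             # deletion accepted: the content goes away
        else:
            kept.append(child)
    element[:] = kept
    return accepted


paragraph = ET.fromstring(SAMPLE)
count = accept_revisions(paragraph, "Jane Doe")
visible = "".join(t.text for t in paragraph.iter(f"{{{W}}}t"))
```

Revisions by other authors are left wrapped, which matches the "Bob's 3 revisions untouched" behavior reported in the response above.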
Not yet published to PyPI/npm — install from source. All 24 tools work today; see the Quickstart and examples/.
git clone https://github.com/ruwadgroup/docxengine.git && cd docxengine
# Python (+ the MCP server entry point)
pip install -e python
# JS/TS
pnpm install && pnpm --dir js build
# MCP (Claude Desktop / any MCP client) — stdio
docxengine-mcp
# Claude Code
claude mcp add docx -- docxengine-mcp
Over MCP the engine is file-first: tools take a file path and every edit is validated and saved back automatically, with no handles to track and no save step. The Python/JS packages keep an in-memory doc_id/bytes handle (the right fit for embedding, including browser JS); see the SDK docs.
| Lane | What you'll find |
|---|---|
| Start | Installation, quickstart flows, core concepts |
| Core | OOXML pitfalls, anchors, projection, tracked changes, validation, rendering |
| Tools | The full agent-computer interface, group by group, plus error design |
| MCP | Transports, resources, session state, scaling |
| SDKs | Python & JS packages, framework adapters |
| Conformance | Round-trip fidelity corpus, agent task benchmark |
| Research | Prior art, key findings, competitive landscape |
| Reference | Glossary, tool schemas, error codes |
Start at docs/README.md.
docxengine/
├── spec/ # Language-agnostic JSON tool contract (the source of truth)
├── python/ # docxengine — Python implementation (pip)
├── js/ # @docxengine/core — TypeScript implementation (npm)
├── conformance/ # Shared corpus + cross-implementation harness
├── examples/ # End-to-end agent flows
├── docs/ # Design docs, tool reference, guides
└── .github/ # CI, release, security scanning, templates
Phases 0–2 complete; current phase: 3 — Hardening. All 24 tools are implemented and conformance-tested in both languages: 455 Python tests, 342 TS tests, 31/31 cross-implementation parity cases, and a 10-task agent benchmark passing end-to-end over the file-first MCP server with zero tool errors and zero Word-repair events. Remaining: benchmark comparisons against the python-docx and raw-XML baselines, fuzzing, large-document streaming, and cross-renderer fidelity. Full plan with decision thresholds: ROADMAP.md.
Contributions are welcome — especially conformance corpus documents, OOXML edge-case reports, and benchmark tasks. Read CONTRIBUTING.md for the ground rules (the invariants), development setup, and commit conventions (Conventional Commits with enforced scopes).
- Bugs & features — GitHub issues (structured templates)
- Security reports — privately, per SECURITY.md
- Governance — GOVERNANCE.md
Apache-2.0. DocxEngine optionally shells out to external renderers/converters under their own licenses — see LICENSING.md.