DocxEngine

Surgical, fidelity-preserving DOCX editing for AI agents — and for you.

One deterministic core that edits OOXML directly (unzip → patch XML → rezip), exposed through three thin faces: an MCP server, a Python package (docxengine), and a JS/TS package (@docxengine/core). Agents see a token-efficient, Markdown-like projection with content-hash-anchored paragraph IDs — never raw XML.

License: Apache-2.0 · CI · Python ≥3.12 · Node ≥22 · MCP · Conventional Commits · Release

Quickstart · Concepts · Tool reference · Architecture · MCP server · Docs · Roadmap


Overview

Every mainstream DOCX library has a disqualifying gap for agent use: python-docx has no tracked-changes support (open since 2016), docx-js is generation-focused, docxtemplater is template-bound, Pandoc round-trips are lossy, and LibreOffice headless is heavyweight. The only approach that preserves tracked changes, comments, and footnotes is editing the OOXML directly — the same strategy Anthropic's docx skill and the strongest MCP servers converged on.

DocxEngine packages that strategy as a reusable engine:

  • A deterministic core (no LLM inside) that models the OPC/ZIP package, patches the XML DOM, coalesces split runs, writes real w:ins/w:del redlines, and validates every edit against OOXML before saving — so Word never silently "repairs" your file.
  • An agent-computer interface of ~16 high-leverage, namespaced tools (docx_search, docx_replace, docx_revision, …) with structured, corrective errors and idempotent semantics.
  • Stable addressing via content-hash anchors (P12#a7b2) — because w14:paraId is not spec-guaranteed stable across Word save cycles and is absent from docs written by non-Word tools.
  • A verification loop: render-to-PDF/PNG previews (via a pluggable LibreOffice adapter) so agents can self-check their edits.
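The unzip → patch XML → rezip cycle described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not DocxEngine's implementation: `patch_docx` is a hypothetical helper, it only finds text that sits inside a single `w:t` element, and ElementTree may rewrite namespace prefixes other than `w:`, which a fidelity-preserving engine would have to keep stable.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Keep the conventional "w:" prefix on output instead of ElementTree's "ns0:".
ET.register_namespace("w", W_NS)


def patch_docx(data: bytes, old: str, new: str) -> bytes:
    """Hypothetical helper: patch word/document.xml, copy every other part as-is."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            payload = src.read(name)
            if name == "word/document.xml":
                root = ET.fromstring(payload)
                # Only text held in one <w:t> is matched; split runs are not handled here.
                for t in root.iter(f"{{{W_NS}}}t"):
                    if t.text and old in t.text:
                        t.text = t.text.replace(old, new)
                payload = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Media, styles, comments, and relationships pass through byte-for-byte.
            dst.writestr(name, payload)
    return out.getvalue()
```

Because every part other than `word/document.xml` is copied unchanged, features the patcher does not understand survive the round trip, which is the point of editing the package directly.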

Features

  • Fidelity-preserving surgical edits — replace, insert, delete, and rewrite paragraphs in arbitrary existing documents without disturbing tracked changes, comments, footnotes, styles, or media.
  • Real redlines — first-class tracked-change writing (track_changes: true, author: "..."), plus accept/reject filtered by author or date.
  • Token-efficient reading — outline first, then paginated, Markdown-like projections with only salient formatting; raw OOXML is never shown by default.
  • Hash-anchored addressing — every paragraph gets a P{index}#{hash} anchor validated before each edit; edits return fresh anchors so agents never re-list mid-batch.
  • Always-on validation gate — ID uniqueness, orphaned relationships, dangling footnotes, and content-type errors are caught before save, with auto-repair where safe.
  • Comments, tables, styles, sections, lists, media, fields, templates — the full capability surface is implemented: threaded comments with resolve state, style-definition edits, mustache template merge with loops, Markdown↔docx conversion, and field-code insertion.
  • Triple distribution — MCP server (stdio + Streamable HTTP), pip install docxengine, npm install @docxengine/core; the published JSON Schemas plug into any framework, with a thin OpenAI function-calling adapter included.
  • One conformance-tested contract — the Python and TypeScript implementations are kept honest by a shared JSON tool contract and a cross-implementation conformance corpus.
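The hash-anchored addressing listed above can be illustrated with a short sketch. The hash function, the four-hex-digit truncation, and the 1-based index are assumptions made for this example; the text above only fixes the `P{index}#{hash}` shape. `make_anchor` and `resolve_anchor` are hypothetical names.

```python
import hashlib
import re

ANCHOR_RE = re.compile(r"^P(\d+)#([0-9a-f]{4})$")


def make_anchor(index: int, text: str) -> str:
    """Build a P{index}#{hash} anchor (assumed: SHA-256, first 4 hex digits)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"P{index}#{digest[:4]}"


def resolve_anchor(anchor: str, paragraphs: list[str]) -> int:
    """Return the 0-based list position for an anchor, or raise if it is stale."""
    match = ANCHOR_RE.match(anchor)
    if match is None:
        raise ValueError(f"malformed anchor: {anchor!r}")
    index = int(match.group(1))
    if not 1 <= index <= len(paragraphs):
        raise LookupError(f"{anchor}: no paragraph at index {index}")
    # Recompute the hash from current content; a mismatch means the text changed.
    if make_anchor(index, paragraphs[index - 1]) != anchor:
        raise LookupError(f"{anchor}: content changed since the anchor was issued")
    return index - 1
```

Validating the hash before an edit is what lets a stale anchor fail loudly instead of silently editing the wrong paragraph.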

Why DocxEngine

Agents are a new class of end-user, and tools must be designed for them rather than wrapped from existing APIs (SWE-agent, NeurIPS 2024). Raw OOXML is distracting context; agents can't "see" the rendered page; and naive find-and-replace fails because Word fragments text across run boundaries. DocxEngine applies the resulting design principles end to end:

| Principle | How DocxEngine applies it |
| --- | --- |
| Simple, few, high-leverage tools | ~16 namespaced tools across 5 groups, not a 1:1 API wrapper |
| Guarded actions | every edit is hash-validated and OOXML-validated before it lands |
| Token economy | outline → windowed reads, concise/detailed formats, ~25k-token response cap |
| Feedback loops | structured corrective errors + render-based visual self-check |
| Determinism | the core contains no LLM; the same call on the same document yields the same bytes |
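The run-fragmentation problem mentioned above is easy to reproduce: Word may store one visible phrase as several runs, so a per-run search finds nothing. The sketch below models runs as plain strings and shows one common workaround, searching the joined text and mapping the match back onto the runs. `replace_across_runs` is a hypothetical helper, not the engine's run-coalescing routine, and it ignores per-run formatting.

```python
def replace_across_runs(runs: list[str], old: str, new: str) -> list[str]:
    """Replace the first occurrence of `old`, even when it spans several runs.

    The replacement text lands in the first run the match touches; the
    matched portion of every later run is trimmed away.
    """
    joined = "".join(runs)
    start = joined.find(old)
    if start < 0:
        return list(runs)
    end = start + len(old)

    result: list[str] = []
    pos = 0
    inserted = False
    for text in runs:
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # run lies entirely outside the match
            continue
        head = text[: max(start - run_start, 0)]  # text before the match
        tail = text[max(end - run_start, 0):]     # text after the match
        if inserted:
            result.append(head + tail)
        else:
            result.append(head + new + tail)
            inserted = True
    return result


runs = ["{{Effective", "Da", "te}} is the start date."]
print(any("{{EffectiveDate}}" in run for run in runs))  # naive per-run search
print(replace_across_runs(runs, "{{EffectiveDate}}", "2025-01-01"))
```

Keeping the run count unchanged, rather than merging runs, means each surviving fragment keeps whatever formatting its run carried.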

What DocxEngine is not

  • Not a renderer. Fields, TOC entries, and page numbers only materialize when Word or LibreOffice renders; the engine inserts and updates field codes and tells agents so explicitly.
  • Not a template DSL. docx_template_fill covers mustache-style merge with loops and conditions, but DocxEngine's center of gravity is arbitrary surgical edits of existing documents.
  • Not a python-docx/docx-js wrapper. Those libraries drop the document features this project exists to preserve; they appear at most in narrow create paths.
  • Not Word automation. No COM, no Office.js host, no GUI — server-side and offline by design.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Integration faces (thin)                                      │
│  1. MCP server (stdio + streamable-HTTP)                       │
│  2. Python package  (docxengine)   — JSON-in/JSON-out + native │
│  3. JS/TS package   (@docxengine)  — JSON-in/JSON-out + native │
│     + OpenAI function-calling adapter (thin)                   │
├──────────────────────────────────────────────────────────────┤
│  Core engine (deterministic, no LLM)                           │
│   • OPC/ZIP package model      • Style cascade resolver        │
│   • XML DOM patcher            • Numbering resolver            │
│   • Run-coalescing find/replace• Tracked-change writer         │
│   • Content-hash anchor index  • Comment/footnote part manager │
│   • Markdown projector (read)  • OOXML validator + repairer    │
│   • Render adapter (LibreOffice/Word) for verification         │
└──────────────────────────────────────────────────────────────┘

v1 ships parallel Python and TypeScript implementations against a shared JSON contract (spec/) and a shared conformance corpus — pure-pip and pure-npm installs with zero native toolchain. A Rust/WASM core unification is a v2 evaluation. The full reasoning, including the addressing design and tool surface, is in ARCHITECTURE.md.

The agent view

Agents never see raw OOXML. Reads return a Markdown-like projection annotated with stable anchors and only the formatting that matters:

[P1#a7b2  H1]            Master Services Agreement
[P2#f3c1]                This Agreement is entered into as of {{EffectiveDate}}...
[P3#b2c4  H2]            1. Definitions
[P4#d4e5]                "Confidential Information" means... [comment:C1 by J.Doe]
[T1  3×4 @after:P5]      | Term | Value | ... |
[P12#e7f8  List:ol L1]   First obligation

A typical edit flow:

→ docx_revision {"doc_id":"d1","op":"accept","filter":{"author":"Jane Doe"}}
← {"accepted":12,"remaining_by_author":{"Bob":3},"note":"Resolved <w:ins>/<w:del> for Jane Doe; Bob's 3 revisions untouched."}
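To make the accept-by-author call above concrete, here is a minimal sketch of what accepting revisions means at the XML level: an accepted `w:ins` is unwrapped so its content becomes ordinary text, and an accepted `w:del` is removed together with its content. It covers only run-level insertions and deletions and assumes nothing about how DocxEngine implements `docx_revision`; `accept_revisions` and `remaining_by_author` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, AUTHOR = f"{{{W_NS}}}ins", f"{{{W_NS}}}del", f"{{{W_NS}}}author"


def accept_revisions(parent: ET.Element, author: str) -> int:
    """Accept one author's w:ins/w:del under `parent`; return how many were resolved."""
    accepted = 0
    i = 0
    while i < len(parent):
        child = parent[i]
        mine = child.get(AUTHOR) == author
        if mine and child.tag == DEL:
            parent.remove(child)  # accepted deletion: the deleted content goes away
            accepted += 1
            continue
        if mine and child.tag == INS:
            kept = list(child)
            parent.remove(child)  # accepted insertion: keep content, drop the wrapper
            for offset, node in enumerate(kept):
                parent.insert(i + offset, node)
            accepted += 1
            continue  # revisit position i so nested revisions are also handled
        accepted += accept_revisions(child, author)
        i += 1
    return accepted


def remaining_by_author(root: ET.Element) -> dict[str, int]:
    """Count the revisions still present, grouped by author."""
    return dict(Counter(el.get(AUTHOR) for el in root.iter() if el.tag in (INS, DEL)))
```

Revisions by other authors are left in place, which mirrors the `remaining_by_author` field in the response shown above.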

See Concepts for anchors, projection, and the validation gate, and the tool reference for all tools.

Getting started

Not yet published to PyPI/npm — install from source. All 24 tools work today; see the Quickstart and examples/.

git clone https://github.com/ruwadgroup/docxengine.git && cd docxengine

# Python (+ the MCP server entry point)
pip install -e python

# JS/TS
pnpm install && pnpm --dir js build

# MCP (Claude Desktop / any MCP client) — stdio
docxengine-mcp

# Claude Code
claude mcp add docx -- docxengine-mcp

Over MCP the engine is file-first: tools take a file path and every edit is validated and saved back automatically — no handles to track, no save step. The Python/JS packages keep an in-memory doc_id/bytes handle (the right fit for embedding, including browser JS); see the SDK docs.

Documentation

| Lane | What you'll find |
| --- | --- |
| Start | Installation, quickstart flows, core concepts |
| Core | OOXML pitfalls, anchors, projection, tracked changes, validation, rendering |
| Tools | The full agent-computer interface, group by group, plus error design |
| MCP | Transports, resources, session state, scaling |
| SDKs | Python & JS packages, framework adapters |
| Conformance | Round-trip fidelity corpus, agent task benchmark |
| Research | Prior art, key findings, competitive landscape |
| Reference | Glossary, tool schemas, error codes |

Start at docs/README.md.

Repository layout

docxengine/
├── spec/            # Language-agnostic JSON tool contract (the source of truth)
├── python/          # docxengine — Python implementation (pip)
├── js/              # @docxengine/core — TypeScript implementation (npm)
├── conformance/     # Shared corpus + cross-implementation harness
├── examples/        # End-to-end agent flows
├── docs/            # Design docs, tool reference, guides
└── .github/         # CI, release, security scanning, templates

Roadmap & status

Phases 0–2 complete; current phase: 3 — Hardening. All 24 tools are implemented and conformance-tested in both languages: 455 Python tests, 342 TS tests, 31/31 cross-implementation parity cases, and a 10-task agent benchmark passing end-to-end over the file-first MCP server with zero tool errors and zero Word-repair events. Remaining: benchmark comparisons against the python-docx and raw-XML baselines, fuzzing, large-document streaming, and cross-renderer fidelity. Full plan with decision thresholds: ROADMAP.md.

Contributing

Contributions are welcome — especially conformance corpus documents, OOXML edge-case reports, and benchmark tasks. Read CONTRIBUTING.md for the ground rules (the invariants), development setup, and commit conventions (Conventional Commits with enforced scopes).

License

Apache-2.0. DocxEngine optionally shells out to external renderers/converters under their own licenses — see LICENSING.md.
