xtract.bot
POST /api/docx-to-html

Convert a Microsoft Word .docx file to semantic HTML. Headings, lists, tables, and bold / italic formatting are preserved; embedded images are kept only when inlineImages is true; complex Word features are dropped.

Reads a .docx Word document and returns it as clean HTML. Headings, paragraphs, lists (ordered and unordered), tables, bold, italic, underline, and links are preserved. Embedded images are inlined as data URIs when inlineImages is true and dropped otherwise (the default). Word-specific concepts that have no clean HTML equivalent (track changes, comments, complex page layout, footnotes, math equations) are dropped silently. The output is plain semantic HTML that you can paste into a CMS or feed to a Markdown converter.

Inputs

Name          Type     Default  Description
docx*         file     -        Input .docx bytes.
inlineImages  boolean  false    When true, embed images inline as data: URLs. When false, images are dropped.
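
As a rough illustration of how the two inputs might be sent, here is a minimal Python sketch that builds the POST request without sending it. Several details are assumptions not stated on this page: the host (https://xtract.bot), the use of multipart/form-data with field names matching the parameter names, and the absence of any authentication header. The helper name build_docx_to_html_request is hypothetical.

```python
import urllib.request
import uuid

# Assumption: the API is served from this host; the page only gives the path.
BASE_URL = "https://xtract.bot"

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def build_docx_to_html_request(docx_bytes, inline_images=False, filename="input.docx"):
    """Build (but do not send) a POST request for /api/docx-to-html.

    Assumption: inputs travel as multipart/form-data fields named after the
    documented parameters ("docx" and "inlineImages").
    """
    boundary = uuid.uuid4().hex

    # File part carrying the raw .docx bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="docx"; filename="{filename}"\r\n'
        f"Content-Type: {DOCX_MIME}\r\n\r\n"
    ).encode("utf-8")

    # Plain text part carrying the boolean flag as "true" or "false".
    flag_value = "true" if inline_images else "false"
    flag_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="inlineImages"\r\n\r\n'
        f"{flag_value}\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = file_header + docx_bytes + b"\r\n" + flag_part + closing

    return urllib.request.Request(
        f"{BASE_URL}/api/docx-to-html",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to urllib.request.urlopen would send it. How the json or binary response mode is selected is not described here, so the sketch leaves response handling out.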

Response

Modes: json, binary. Cache: yes (24h TTL).