xtract.bot
POST /api/html-sanitize

Sanitize untrusted HTML: strip `<script>`, event handlers, and dangerous attributes. Whitelist-based — only known-safe elements and attributes survive.

Takes potentially unsafe HTML (from a rich-text editor, an email body, a CMS, etc.) and returns markup that is safe to render. The sanitizer is whitelist-based: only known-safe elements survive (paragraphs, headings, lists, tables, common inline formatting, links, images; the exact set depends on `preset`), and only known-safe attributes are preserved on each. `<script>`, `<style>`, `<iframe>`, `onerror=`, `onclick=`, `javascript:` URLs, and similar constructs never reach the output as live markup: disallowed tags are escaped as text by default, or dropped when `stripDisallowed` is true. Use this anywhere you embed user-supplied HTML in your own page.
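
To make the whitelist idea concrete, here is a minimal local sketch in Python. It is not the service's implementation: the `ALLOWED` map, the `SAFE_SCHEMES` set, and the helper names are all hypothetical, and the sketch always drops disallowed tags rather than escaping them. It only illustrates the principle that output is built from what is explicitly allowed, and it is not a substitute for a maintained sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag -> allowed attribute names.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Tags whose text content is dropped along with the tag itself.
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """Accept relative URLs and a few schemes; reject javascript: and the like."""
    # Remove spaces and ASCII control characters (code points up to U+0020).
    cleaned = "".join(ch for ch in url if ch > " ").lower()
    scheme, sep, _ = cleaned.partition(":")
    if not sep or any(c in scheme for c in "/?#"):
        return True  # no scheme, so the URL is relative
    return scheme in SAFE_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # disallowed tag: emit nothing
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag]:
                continue  # drops onclick=, onerror=, style=, ...
            if name == "href" and not is_safe_url(value or ""):
                continue
            kept.append(f' {name}="{escape(value or "", quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # text is always re-escaped


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitize('<p>Hello <script>alert("xss")</script> world!</p>'))
# -> <p>Hello  world!</p>
```

The design choice worth noting is that nothing from the input is copied through verbatim: tags and attributes are re-serialized only if they appear in the whitelist, and text is re-escaped on the way out.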

Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| html* | string | | Untrusted HTML to sanitize. |
| preset | enum (`strict` or `rich`) | "strict" | `strict` (default) for user-generated content; `rich` for article-style content (allows headings, code blocks, tables). |
| allowedTags | string | | Optional JSON object mapping tag → array of allowed attributes. When set, replaces the preset's tag list entirely. |
| stripDisallowed | boolean | false | When true, drop disallowed tags entirely (with their content for `<script>`/`<style>`). When false (default), escape them as text. |
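
The inputs above can be assembled into a request body as in the following Python sketch. The helper name `build_sanitize_payload` is hypothetical. Because `allowedTags` is listed with type string, the sketch assumes the tag-to-attributes mapping should be JSON-encoded into a string before being placed in the body; if the endpoint also accepts a nested object, that step would be unnecessary.

```python
import json


def build_sanitize_payload(html, preset="strict", allowed_tags=None, strip_disallowed=False):
    """Build a request body for POST /api/html-sanitize (hypothetical helper)."""
    if preset not in ("strict", "rich"):
        raise ValueError("preset must be 'strict' or 'rich'")
    payload = {
        "html": html,
        "preset": preset,
        "stripDisallowed": strip_disallowed,
    }
    if allowed_tags is not None:
        # Assumption: allowedTags is a string field, so the mapping is sent as JSON text.
        payload["allowedTags"] = json.dumps(allowed_tags)
    return payload


# Allow only links (with href) and bold text, and drop everything else outright.
body = build_sanitize_payload(
    '<p>See <a href="https://example.com" onclick="x()">this</a></p>',
    allowed_tags={"a": ["href"], "b": []},
    strip_disallowed=True,
)
print(json.dumps(body, indent=2))
```

Note that setting `allowedTags` replaces the preset's tag list entirely, so in this example `<p>` would no longer be allowed even though `preset` is still sent.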

Response

Modes: json. Cache: yes (24h TTL).

Code samples

Built from the strip-script example.


curl -X POST https://api.xtract.bot/api/html-sanitize \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-Account-Id: $XTRACT_ACCOUNT_ID" \
  -H "X-Api-Key: $XTRACT_API_KEY" \
  -d '{
  "html": "<p>Hello <script>alert(\"xss\")</script> world!</p>",
  "preset": "strict"
}'
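
For callers who prefer Python, the same request can be sketched with the standard library alone. The URL, headers, and body mirror the curl sample; the function names `build_request` and `sanitize_remote` are hypothetical, and since the response schema is not documented here, the sketch simply returns the decoded JSON without assuming any field names.

```python
import json
import os
import urllib.request

API_URL = "https://api.xtract.bot/api/html-sanitize"


def build_request(html, preset="strict", account_id=None, api_key=None):
    """Build (but do not send) the POST request shown in the curl sample."""
    body = json.dumps({"html": html, "preset": preset}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Credentials fall back to the same environment variables as the curl sample.
            "X-Account-Id": account_id or os.environ.get("XTRACT_ACCOUNT_ID", ""),
            "X-Api-Key": api_key or os.environ.get("XTRACT_API_KEY", ""),
        },
    )


def sanitize_remote(html, preset="strict", timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    req = build_request(html, preset)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `sanitize_remote('<p>Hello <script>alert("xss")</script> world!</p>')` sends the same request as the curl command; `urlopen` raises `urllib.error.HTTPError` on a non-2xx status, which callers will likely want to catch.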