xtract.bot
POST /api/document-prepare-for-ocr

Prepare a document scan for OCR: deskew, binarize, denoise, contrast-stretch. Improves recognition rates on noisy phone-camera captures and skewed scans.

Cleans up a document image so the OCR engine has the best possible chance of recognising the text. Apply this before `document-ocr` for noisy inputs: phone-camera captures, photographed receipts, faxed pages. The pipeline is:

- Deskew (auto-detect rotation and straighten; detection is capped at `maxAngle`, ±15° by default).
- Convert to greyscale.
- Threshold to a clean black-on-white binary image (`thresholdMode`: Otsu by default, or adaptive).
- Despeckle (remove isolated stray pixels).

Output is a PNG ready to feed straight into `document-ocr`.
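
To make the thresholding and despeckle steps concrete, here is a minimal pure-Python sketch on a tiny greyscale grid. It is an illustration of the general techniques (Otsu's method and an 8-neighbour speckle filter), not the service's actual implementation; the function names and the sample `page` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def otsu_threshold(grey: Image) -> int:
    """Return the grey level that maximises between-class variance."""
    hist = [0] * 256
    for row in grey:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of grey values at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(grey: Image, t: int) -> Image:
    """Map pixels <= t to black (0) and everything else to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in grey]


def despeckle(binary: Image) -> Image:
    """Turn black pixels that have no black 8-neighbour into white."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0:
                continue
            has_black_neighbour = any(
                binary[ny][nx] == 0
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                out[y][x] = 255
    return out


# Hypothetical 4x5 "scan": a 2x2 dark blob plus one stray dark pixel.
page = [
    [200, 210, 205, 198, 202],
    [201,  30,  35, 207, 199],
    [203,  28,  32, 204, 200],
    [205, 206, 202, 201,  25],
]
binary = binarize(page, otsu_threshold(page))
clean = despeckle(binary)
```

After these calls the 2x2 blob stays black in `clean`, while the stray pixel in the bottom-right corner, black in `binary`, has been turned white.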

Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| image* | file | | Input document image. |
| deskew | boolean | true | Rotate to correct skew. |
| maxAngle | number (0…90) | 15 | Cap on detected skew magnitude (degrees). |
| thresholdMode | enum (otsu \| adaptive) | "otsu" | Binarisation strategy. |
| denoise | boolean | true | Light Gaussian blur before threshold. |
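
As a rough sketch of how a client might assemble a call with these inputs, the following builds a `multipart/form-data` POST using only the Python standard library. Several things here are assumptions not confirmed by this page: the host `https://xtract.bot`, the multipart encoding, booleans sent as `"true"`/`"false"` strings, and the absence of authentication. The helper name `build_prepare_request` is hypothetical.

```python
import uuid
import urllib.request

BASE_URL = "https://xtract.bot"  # assumption: API host taken from the page header
ENDPOINT = "/api/document-prepare-for-ocr"


def build_prepare_request(
    image_bytes: bytes,
    filename: str = "scan.jpg",
    *,
    deskew: bool = True,
    max_angle: float = 15,
    threshold_mode: str = "otsu",
    denoise: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the endpoint."""
    # Mirror the documented ranges so bad values fail before any network call.
    if not 0 <= max_angle <= 90:
        raise ValueError("maxAngle must be between 0 and 90")
    if threshold_mode not in ("otsu", "adaptive"):
        raise ValueError("thresholdMode must be 'otsu' or 'adaptive'")

    boundary = uuid.uuid4().hex
    fields = {
        "deskew": "true" if deskew else "false",  # assumption: string booleans
        "maxAngle": str(max_angle),
        "thresholdMode": threshold_mode,
        "denoise": "true" if denoise else "false",
    }

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part carries the required `image` input.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + image_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Example: a request for an adaptive-threshold run on placeholder bytes.
req = build_prepare_request(b"placeholder image bytes", threshold_mode="adaptive")
```

Passing `req` to `urllib.request.urlopen` would send it; the example stops short of that so it can be inspected without network access.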

Response

Modes: binary, json. Cache: yes (24h TTL).
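
Since the endpoint offers two response modes, a client may want to branch on what it receives. The sketch below assumes that binary mode returns `image/png` and JSON mode returns `application/json`; neither content type, nor the shape of the JSON payload, nor how a mode is selected is stated on this page. `decode_response` is a hypothetical helper.

```python
import json
from typing import Any, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_response(content_type: str, body: bytes) -> Tuple[str, Any]:
    """Return ("binary", png_bytes) or ("json", parsed_payload)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return "json", json.loads(body.decode("utf-8"))
    if media_type == "image/png":
        if not body.startswith(PNG_SIGNATURE):
            raise ValueError("body does not start with a PNG signature")
        return "binary", body
    raise ValueError(f"unexpected content type: {content_type!r}")
```

Checking the PNG signature is a cheap guard against saving an error page to disk as if it were an image.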