PDF → text
POST /api/pdf-extract-text
Extract plain text from a PDF over HTTP. Multi-page documents, encrypted-but-readable PDFs, and embedded fonts are all handled. Image-only / scanned PDFs return empty (use OCR for those).
Inputs
| Name | Type | Default | Description |
|---|---|---|---|
| pdf* | file | — | PDF document bytes. |
| keepLayout | boolean | false | Pass `-layout` to the underlying extraction tool to preserve the visual layout. |
| quiet | boolean | true | Pass `-q` to the underlying extraction tool to suppress its non-fatal stderr output. |
Response
Modes: text, json. Cache: yes (24h TTL).
Code samples
Built from the hello example.
# Download or substitute the example input:
# curl -O https://xtract.bot/examples/pdf-extract-text/hello.pdf
PDF=$(base64 -w0 < hello.pdf)
curl -X POST https://api.xtract.bot/api/pdf-extract-text \
  -H "Content-Type: application/json" \
  -H "Accept: text/plain" \
  -H "X-Account-Id: $XTRACT_ACCOUNT_ID" \
  -H "X-Api-Key: $XTRACT_API_KEY" \
  -d '{
    "keepLayout": false,
    "quiet": true,
    "pdf": "'"$PDF"'"
  }'
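
For larger PDFs, embedding the base64 string directly in the `-d` argument can exceed the shell's argument-length limit. The following is a minimal sketch, not part of the official samples, that streams the JSON body to curl on stdin instead. The function names `build_payload` and `extract_text` are hypothetical helpers introduced here for illustration; the endpoint, headers, and field names are taken from the sample above.

```shell
#!/usr/bin/env bash
# Sketch only: build_payload and extract_text are hypothetical helper names,
# not part of the xtract.bot API.

# build_payload FILE [KEEP_LAYOUT] [QUIET]
# Writes the JSON request body to stdout, streaming the base64 data so the
# encoded PDF never has to fit into a single command-line argument.
build_payload() {
  local file=$1 keep_layout=${2:-false} quiet=${3:-true}
  printf '{"keepLayout": %s, "quiet": %s, "pdf": "' "$keep_layout" "$quiet"
  # tr removes line wraps, so this works with both GNU and BSD base64.
  base64 < "$file" | tr -d '\n'
  printf '"}'
}

# extract_text FILE [KEEP_LAYOUT]
# POSTs the body read from stdin (--data-binary @-). --fail makes curl exit
# non-zero on HTTP error statuses instead of printing the error body.
extract_text() {
  build_payload "$1" "${2:-false}" |
    curl --fail --silent --show-error -X POST \
      https://api.xtract.bot/api/pdf-extract-text \
      -H "Content-Type: application/json" \
      -H "Accept: text/plain" \
      -H "X-Account-Id: $XTRACT_ACCOUNT_ID" \
      -H "X-Api-Key: $XTRACT_API_KEY" \
      --data-binary @-
}

# Usage (requires network access and valid credentials):
#   extract_text hello.pdf true > hello.txt
```

Keeping payload construction separate from the HTTP call makes it possible to inspect the request body locally before sending anything.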