xtract.bot
POST /api/pdf-inspect

Inspect a PDF without parsing its contents: page count, page dimensions, title, author, subject, keywords, creator, producer, creation and modification dates.

Returns metadata about a PDF without extracting its content:

- Page count and per-page dimensions (width × height in points).
- Document info dictionary: title, author, subject, keywords, creator (the application that produced the source), producer (the library that wrote the PDF), creation / modification dates.
- Encryption flags and permission bits.
- Whether the PDF is linearized (web-optimised).

Useful for content-management workflows that need to display metadata without paying the cost of full text extraction.
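Page dimensions are reported in points, which usually need converting before they are shown to users. The sketch below assumes the default PDF user unit (1 pt = 1/72 inch); `pt_to_mm` is a hypothetical helper and not part of the API, and no response field names are assumed.

```shell
# pt_to_mm POINTS: convert a length in PDF points to millimetres.
# Assumes the default PDF user unit: 1 pt = 1/72 inch = 25.4/72 mm.
pt_to_mm() {
  awk -v pt="$1" 'BEGIN { printf "%.1f\n", pt * 25.4 / 72 }'
}

# Example: a 612 x 792 pt page is US Letter.
pt_to_mm 612   # prints 215.9
pt_to_mm 792   # prints 279.4
```

A page that sets a custom user unit would need the conversion factor scaled accordingly.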

Inputs

Name   Type   Default   Description
pdf*   file             PDF document bytes.

Response

Modes: json. Cache: yes (24h TTL).

Code samples

Built from the hello example.

# Download or substitute the example input:
#   curl -O https://xtract.bot/examples/pdf-inspect/hello.pdf
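# Base64-encode onto a single line. tr strips the line wraps that GNU base64
# adds by default, so this also works with BSD/macOS base64 (which lacks -w0).
PDF=$(base64 < hello.pdf | tr -d '\n')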

curl -X POST https://api.xtract.bot/api/pdf-inspect \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-Account-Id: $XTRACT_ACCOUNT_ID" \
  -H "X-Api-Key: $XTRACT_API_KEY" \
  -d '{
  "pdf": "'"$PDF"'"
}'
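The sample above passes the base64 string on the command line, which can exceed the operating system's argument-length limit for larger PDFs. A minimal sketch of one way around that is to stream the JSON body into a temporary file and let curl read it with `--data-binary @file`. The function names `make_payload` and `inspect_pdf` are hypothetical; the endpoint, headers, and `pdf` field are taken from the sample above.

```shell
# make_payload FILE: print {"pdf": "<base64 of FILE>"} to stdout.
# The output is streamed, so the encoded PDF is never held in a shell variable.
# Base64 output needs no JSON escaping; tr removes any line wraps.
make_payload() {
  printf '{"pdf": "'
  base64 < "$1" | tr -d '\n'
  printf '"}\n'
}

# inspect_pdf FILE: POST FILE to the endpoint and print the JSON response.
# --fail makes curl exit non-zero on HTTP error responses.
inspect_pdf() {
  local body rc
  body=$(mktemp) || return 1
  make_payload "$1" > "$body"
  curl -sS --fail -X POST https://api.xtract.bot/api/pdf-inspect \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "X-Account-Id: $XTRACT_ACCOUNT_ID" \
    -H "X-Api-Key: $XTRACT_API_KEY" \
    --data-binary @"$body"
  rc=$?
  rm -f "$body"
  return "$rc"
}

# Usage (requires XTRACT_ACCOUNT_ID and XTRACT_API_KEY to be set):
#   inspect_pdf hello.pdf
```

Writing the body to a file also keeps the encoded document out of the process list, where command-line arguments are visible to other users on the machine.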