(Re-)Fetching Processing Results and Artifacts

Even though the POST /processing/{workflow} endpoint already returns the processing results in its response, you may need to re-retrieve individual results of previously submitted documents later on.

For this reason, we provide a set of additional GET /processing/results/{processing_id}/... endpoints that give individual access to each available result of a successful processing request.

Hint

The endpoints on this page return single processing results and other workflow artifacts. If you want to re-retrieve the full response as returned by the POST /processing/{workflow_key} endpoint, please have a look at this section.

Common Signature

Parameters

All of the available sub-result endpoints require the following common mandatory parameter:

Field	Type	Description
`{processing_id}`	UUID	Passed as part of the URL path; identifies a successful processing request. This is the same `processing_id` that is returned in the response body of `POST /processing/{workflow}`.
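Since `{processing_id}` is a UUID embedded in the URL path, a client can reject malformed IDs before sending any request. Below is a minimal Python sketch using the standard `uuid` module. The helper `result_path` and the `KNOWN_SUB_RESULTS` set are illustrative names, with the sub-result names collected from the endpoints on this page. Normalizing the ID to its canonical lowercase form is an assumption about what the API accepts.

```python
import uuid

# Illustrative: sub-result names taken from the endpoints documented on this page.
KNOWN_SUB_RESULTS = {
    "ocr", "hocr", "extractions", "page-images",
    "thumbnail", "document-splitting", "split-pdfs",
}


def result_path(processing_id: str, sub_result: str) -> str:
    """Build /processing/results/{processing_id}/{sub_result} after validating both parts."""
    # uuid.UUID raises ValueError for malformed IDs; str() yields the canonical
    # lowercase, hyphenated form (assumed to be what the API expects).
    canonical_id = str(uuid.UUID(processing_id))
    # Only the first segment is checked, so "page-images/2" is accepted too.
    if sub_result.split("/", 1)[0] not in KNOWN_SUB_RESULTS:
        raise ValueError(f"unknown sub-result: {sub_result!r}")
    return f"/processing/results/{canonical_id}/{sub_result}"
```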

Responses

All sub-result endpoints respond with the following HTTP status codes, in line with POST /processing/{workflow}, with one exception: instead of HTTP 201, the result endpoints respond with HTTP 200 when successfully returning the result.

HTTP Status Code	Description
200	Processing result is available and returned.
202	Processing request is still in progress.
401	Credentials could not be validated.
402	Reached a processing limit based on the account's plan.
403	Credentials could not be authorized for this processing request.
404	No such workflow.
413	Uploaded document is too large or has too many pages.
422	Validation Error (most likely regarding the request payload).
429	Temporary rate limit exceeded or uploaded document too large.
500	Processing request failed to complete.

Please check the section about advanced response handling for further details about how to avoid or handle the different response codes.
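Of the codes above, 202 means the result is not ready yet and 429 signals a temporary rate limit. Both can reasonably be retried; the others will not resolve by waiting. Below is a minimal polling sketch with exponential backoff. It assumes a caller-supplied `fetch` function (hypothetical) that performs the GET request and returns the status code and body. The delays and attempt limit are arbitrary defaults, not documented API values.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: performs the GET request and returns (status_code, body).
Fetch = Callable[[], Tuple[int, bytes]]

# 202: still in progress; 429: temporary rate limit. The status table also lists
# 429 for oversized uploads, which retrying cannot fix; the attempt limit below
# bounds the number of retries in that case.
RETRYABLE_STATUS = {202, 429}


def wait_for_result(
    fetch: Fetch,
    max_attempts: int = 10,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll a sub-result endpoint until it returns HTTP 200, backing off exponentially."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS:
            # 401, 402, 403, 404, 413, 422 and 500 will not change by waiting.
            raise RuntimeError(f"request failed with HTTP {status}")
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Injecting `sleep` keeps the function testable without real delays. `fetch` can wrap whichever HTTP client you already use.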

Hint

Not only the semantics but also the response schemas for all HTTP status codes other than 200 are identical to those of POST /processing/{workflow}, to provide maximum consistency. On success (200), the response schema depends on the specific requested sub-result.

Hint

Please note that the GET /processing/results/{processing_id}/... endpoints described on this page differ slightly from POST /processing/{workflow}: they return HTTP 200 instead of HTTP 201 for a regular success response.

Individually Requestable Results

OCR

GET /processing/results/{processing_id}/ocr

Retrieves the OCR processing result.

Response Details

On success, this endpoint returns an application/json document. For details about its shape, please refer to OCR Format.
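The JSON sub-results (OCR here, and likewise extractions, page-images, document-splitting and split-pdfs) can be fetched with the Python standard library alone. This sketch assumes a placeholder host `https://api.example.com` and Bearer-token authentication. Replace both with your actual API base URL and credential scheme. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Hypothetical placeholder: replace with your actual API base URL.
BASE_URL = "https://api.example.com"


def build_result_request(processing_id: str, sub_result: str, token: str) -> urllib.request.Request:
    """Prepare a GET request for a JSON sub-result such as "ocr"."""
    url = f"{BASE_URL}/processing/results/{processing_id}/{sub_result}"
    # Assumption: the API accepts a Bearer token in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}", "Accept": "application/json"}
    )


def fetch_json_result(processing_id: str, sub_result: str, token: str, timeout: float = 30.0) -> Any:
    """Return the decoded JSON body of a successful (HTTP 200) sub-result response."""
    request = build_result_request(processing_id, sub_result, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx codes but returns
    # normally for 202, so "still in progress" has to be checked explicitly.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status == 202:
            raise RuntimeError("processing request is still in progress (HTTP 202)")
        return json.load(response)
```

For example, `fetch_json_result(processing_id, "ocr", token)` would return the parsed OCR result, and passing `"extractions"` instead would return the extractions result.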

HOCR

GET /processing/results/{processing_id}/hocr

Retrieves the OCR information for the given processing request, formatted as HTML. It contains a visual representation of the same information that the OCR result provides.

Response Details

On success, this endpoint returns a text/html document.
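If the returned HTML follows the hOCR conventions, individual words are marked with the class `ocrx_word`. This is an assumption based on the endpoint name; the exact markup is not specified on this page. A small sketch using Python's built-in `html.parser` that collects the text of those words (the class name `HocrWordCollector` is hypothetical):

```python
from html.parser import HTMLParser
from typing import List


class HocrWordCollector(HTMLParser):
    """Collect the text of elements carrying the hOCR class "ocrx_word"."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[str] = []
        self._depth = 0  # > 0 while inside an ocrx_word element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocrx_word" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # the ocrx_word element itself just closed
                word = "".join(self._buffer).strip()
                if word:
                    self.words.append(word)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)
```

Feed the decoded response body with `parser.feed(html_text)`, call `parser.close()`, and read `parser.words`.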

Extractions

GET /processing/results/{processing_id}/extractions

Retrieves the extractions processing result. The available extraction formats depend on the workflow that produced the extractions and possibly also on the detected class of the submitted document. See also Extractions Format.

Response Details

On success, this endpoint returns an application/json document. For details about its shape, please refer to Extractions Format.

List of Generated Page Images

GET /processing/results/{processing_id}/page-images

Regardless of its original upload format, each document is usually converted into a standard image format that is suitable for page-by-page processing by our AI. This endpoint retrieves the page-images processing result, which contains a list of paths from which each of these page images can be downloaded. See GET /processing/results/{processing_id}/page-images/{page_num}.

Response Details

On success, this endpoint returns an application/json document of the following form:

Response JSON

{
  "pages": [
    "/processing/results/{processing_id}/page-images/1",
    "/processing/results/{processing_id}/page-images/2",
    ...
  ]
}
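The returned paths are relative, so a client has to prefix them with the API base URL before downloading. A minimal sketch follows. It assumes the paths are relative to the same base URL as the other endpoints; the base URL and the function name are hypothetical.

```python
import json
from typing import Dict


def page_image_urls(response_body: str, base_url: str) -> Dict[int, str]:
    """Map each page number to the absolute download URL of its page image."""
    paths = json.loads(response_body)["pages"]
    urls = {}
    for path in paths:
        # The last path segment is the page number, e.g. ".../page-images/2" -> 2.
        page_num = int(path.rstrip("/").rsplit("/", 1)[1])
        # Plain concatenation keeps any path prefix that base_url may carry.
        urls[page_num] = base_url.rstrip("/") + path
    return urls
```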

Individual Page Images

GET /processing/results/{processing_id}/page-images/{page_num}

Retrieves a single page image generated during processing, specified by the given page number. This endpoint corresponds to the download paths returned by GET /processing/results/{processing_id}/page-images.

Additional Parameters

Field	Type	Description
`{page_num}`	int	Passed as part of the URL path; identifies the page number for which the page image should be retrieved.
`width`	int (optional)	Can be passed as a query parameter to scale the image down to the given width while maintaining the aspect ratio. If omitted and `height` is provided, the image is scaled with respect to `height`.
`height`	int (optional)	Can be passed as a query parameter to scale the image down to the given height while maintaining the aspect ratio. If omitted and `width` is provided, the image is scaled with respect to `width`.
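Both scaling options are plain query parameters, so either can be omitted. The sketch below appends only the parameters that are set. The helper name is hypothetical, and the positive-value check is a client-side assumption rather than documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode


def page_image_path(
    processing_id: str,
    page_num: int,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> str:
    """Build the page-image path, adding width/height query parameters only when given."""
    params = {}
    for name, value in (("width", width), ("height", height)):
        if value is None:
            continue  # omitted: the server scales by the other dimension, if given
        # Client-side check (assumption): reject values that cannot be valid sizes.
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        params[name] = value
    path = f"/processing/results/{processing_id}/page-images/{page_num}"
    return f"{path}?{urlencode(params)}" if params else path
```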

Hint

If the given width and height don't match the original aspect ratio, the image will be scaled with respect to the larger dimension while keeping the aspect ratio.

Response Details

On success, this endpoint returns a file of Content-Type: image/* representing the respective document page as it was used for further processing.
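Because the concrete image type is only known from the `image/*` Content-Type header, a client that stores page images or thumbnails on disk needs to derive a file extension from it. Below is a sketch with a hypothetical helper name. Its explicit mapping covers a few common formats chosen for illustration, since this page does not say which image formats are produced.

```python
import mimetypes

# Common image types, chosen for illustration; the actual format is not specified here.
_EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/tiff": ".tif",
    "image/webp": ".webp",
}


def image_extension(content_type: str) -> str:
    """Derive a file extension from a Content-Type header such as "image/png"."""
    # Drop parameters after ";" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith("image/"):
        raise ValueError(f"expected an image/* response, got {content_type!r}")
    # Fall back to the system MIME database, then to a generic extension.
    return _EXTENSIONS.get(media_type) or mimetypes.guess_extension(media_type) or ".img"
```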

Thumbnail

GET /processing/results/{processing_id}/thumbnail

Retrieves a thumbnail that is automatically generated from the first page image of the document.

Response Details

On success, this endpoint returns a file of Content-Type: image/*.

Document Splitting

GET /processing/results/{processing_id}/document-splitting

Retrieves the generated document splitting result. This result is only available for workflows that implement document splitting.

Response Details

On success, this endpoint returns an application/json document. For details about its shape, please refer to Document Splitting Format.

Split PDFs

GET /processing/results/{processing_id}/split-pdfs

Retrieves the split PDFs result, which is generated as part of the document splitting process and provides URL maps for downloading the split PDFs. Like the document splitting result, it is only available for workflows that implement document splitting.

Response Details

On success, this endpoint returns an application/json document. For details about its shape, please refer to Split PDF Format.

Individual Split PDFs

GET /processing/results/{processing_id}/split-pdfs/{index}

Retrieves a PDF file for a sub-document based on the results of the document splitting process, specified by the given split point index. This endpoint corresponds to the download paths returned by GET /processing/results/{processing_id}/split-pdfs.

Additional Parameters

Field	Type	Description
`return_pages_with_ocr_data`	bool (optional)	Defines whether the pages should be rendered with OCR overlay information instead of returning the split original PDF pages. If set to `true`, the pages will be rendered based on the pre-processing results of cropping and OCR. To customize image quality and archival compliance of the rendered pages, this parameter must be set to `true`. The default value is `false`.
`jpeg_quality`	int (optional)	Defines the quality of the PDF images. The higher the value, the better the image quality and the larger the output file size. Valid range is from 1 to 100. Only applicable if `return_pages_with_ocr_data` is `true`.
`pdfa_compliant`	bool (optional)	Defines if the PDF should be made PDF/A compliant. Only applicable if `return_pages_with_ocr_data` is `true`.
`image_height`	int (optional)	The size of the rendered images' long side in pixels. For portrait orientation images, this corresponds to the height. If `image_height` is not given, it will be calculated from `dpi`, `page_height` and `page_width`. If `page_height` and `page_width` are not given, `image_height` will be derived from the original image aspect ratio. Only applicable if `return_pages_with_ocr_data` is `true`.
`image_width`	int (optional)	The size of the rendered images' short side in pixels. For portrait orientation images, this corresponds to the width. If `image_width` is not given, it will be calculated from `dpi`, `page_height` and `page_width`. If `page_height` and `page_width` are not given, `image_width` will be derived from the original image aspect ratio. Only applicable if `return_pages_with_ocr_data` is `true`.
`page_height`	float (optional)	The size of the PDF pages' long side in millimeters. For portrait orientation pages, this corresponds to the height. If specified, `page_width` will be required. Only applicable if `return_pages_with_ocr_data` is `true`.
`page_width`	float (optional)	The size of the PDF pages' short side in millimeters. For portrait orientation pages, this corresponds to the width. If specified, `page_height` will be required. Only applicable if `return_pages_with_ocr_data` is `true`.
`dpi`	int (optional)	The resolution of the rendered PDF document in pixels per inch. The default value is 220. Higher values result in better image quality but also increase file size. Only applicable if `return_pages_with_ocr_data` is `true`.
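Several dependencies in this table can be checked before a request is sent:

- the rendering options only apply when `return_pages_with_ocr_data` is `true`,
- `page_height` and `page_width` must be given together,
- `jpeg_quality` must lie between 1 and 100.

The Python sketch below rests on two assumptions. First, this page does not say that the API rejects inapplicable options, so raising an error for them is a deliberately strict client-side choice. Second, booleans are assumed to be sent as lowercase `true`/`false`. The helper name is hypothetical and the example values are arbitrary.

```python
from typing import Union
from urllib.parse import urlencode

# Parameters the table marks as "only applicable if return_pages_with_ocr_data is true".
RENDER_OPTIONS = (
    "jpeg_quality", "pdfa_compliant", "image_height", "image_width",
    "page_height", "page_width", "dpi",
)


def split_pdf_path(
    processing_id: str,
    index: int,
    return_pages_with_ocr_data: bool = False,
    **render_options: Union[int, float, bool, None],
) -> str:
    """Build the split-pdfs/{index} path and validate parameter combinations client-side."""
    unknown = set(render_options) - set(RENDER_OPTIONS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    options = {k: v for k, v in render_options.items() if v is not None}
    if options and not return_pages_with_ocr_data:
        raise ValueError("rendering options require return_pages_with_ocr_data=True")
    quality = options.get("jpeg_quality")
    if quality is not None and not 1 <= quality <= 100:
        raise ValueError("jpeg_quality must be between 1 and 100")
    if ("page_height" in options) != ("page_width" in options):
        raise ValueError("page_height and page_width must be given together")
    params = {"return_pages_with_ocr_data": return_pages_with_ocr_data, **options}
    # Assumption: booleans are serialized as lowercase "true"/"false".
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"/processing/results/{processing_id}/split-pdfs/{index}?{query}"
```

For example, `split_pdf_path(processing_id, 0, True, dpi=300, pdfa_compliant=True)` requests a rendered, PDF/A-compliant sub-document at 300 dpi.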

Hint

For self-trained splitting models that are configured to crop uploaded documents, keep in mind the following special behavior of the endpoint when the return_pages_with_ocr_data parameter is set to the default value false:

If you have trained your splitting model to split scanned pages that contain, e.g., two receipts on a single page into two sub-documents, GET /processing/results/{processing_id}/split-pdfs/{index} will return the same original PDF page, containing both receipts, for both sub-document indices. If your use case requires separate PDFs for sub-documents that are cropped from the same page, consider setting return_pages_with_ocr_data to true.

Response Details

On success, this endpoint returns an application/pdf response.
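Many HTTP clients treat HTTP 202 as a success as well, yet that response does not contain the PDF. A simple guard is to check for the `%PDF-` header that PDF files begin with before writing the body to disk. A Python sketch with hypothetical helper names:

```python
from pathlib import Path


def is_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the "%PDF-" file header."""
    return data.startswith(b"%PDF-")


def save_split_pdf(data: bytes, target: Path) -> Path:
    """Write a split PDF to disk, refusing bodies that are not PDF documents."""
    if not is_pdf(data):
        raise ValueError("response body is not a PDF (was the status 200?)")
    target.write_bytes(data)
    return target
```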

Info

The GET /processing/results/{processing_id}/sub-pdfs and GET /processing/results/{processing_id}/sub-pdfs/{page_num} endpoints have been deprecated in favor of their split-pdfs counterparts. You can find more information about the deprecated endpoints in the section about compatibility APIs for existing applications.

More to come...

Not yet implemented.

We are currently building additional result endpoints for more processing results and artifacts. Stay tuned!