(Re-)Fetching Processing Results and Artifacts#

Even though the POST /processing/{workflow} endpoint already returns processing results in its response, you may need to retrieve individual results of previously submitted documents again later.

For this reason, we provide a set of additional GET /processing/results/{processing_id}/... endpoints that give access to each available result of a successful processing request individually.

Hint

The endpoints on this page return single processing results and other workflow artifacts. If you want to re-retrieve the full response as returned by the POST /processing/{workflow_key} endpoint, please have a look at this section.

Common Signature#

Parameters#

All of the available sub-result endpoints require the following common mandatory parameter:

Field | Type | Description
{processing_id} | UUID | Passed as part of the URL path; identifies a successful processing request. This is the same processing_id that is returned in the response body of POST /processing/{workflow}.
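
As a minimal sketch of how the common path parameter fits into a request URL, the helper below validates the processing_id as a UUID and appends the name of the requested sub-result. The base URL and the function name result_url are assumptions made for illustration only; substitute the host of your actual deployment.

```python
import uuid
from urllib.parse import quote

# Hypothetical base URL (assumption); replace it with your actual API host.
BASE_URL = "https://api.example.com"


def result_url(processing_id: str, result: str, base_url: str = BASE_URL) -> str:
    """Build the URL of a sub-result endpoint for one processing request."""
    # uuid.UUID raises ValueError if processing_id is not a well-formed UUID.
    canonical_id = str(uuid.UUID(processing_id))
    # `result` may contain slashes, e.g. "page-images/1"; quote() keeps them.
    return f"{base_url.rstrip('/')}/processing/results/{canonical_id}/{quote(result)}"
```

Validating the identifier locally avoids sending a request that can only fail; whether the service itself accepts other UUID spellings is not covered here.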

Responses#

All available sub-result endpoints respond with the same HTTP status codes as POST /processing/{workflow}, with one exception: instead of HTTP 201, the result endpoints respond with HTTP 200 when they successfully return the result.

HTTP status code | Description
200 | Processing result is available and returned.
202 | Processing request is still in progress.
401 | Credentials could not be validated.
402 | Reached a processing limit based on the account's plan.
403 | Credentials could not be authorized for this processing request.
404 | No such workflow.
413 | Uploaded document is too large or has too many pages.
422 | Validation Error (most likely regarding the request payload).
429 | Temporary rate limit exceeded or uploaded document too large.
500 | Processing request failed to complete.

Please check the section about advanced response handling for further details about how to avoid or handle the different response codes.
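
The distinction between 200 (result available) and 202 (still in progress) suggests a simple polling loop. The following is only a sketch under stated assumptions: the function name wait_for_result, the fetch callable (which performs the actual GET request and returns a status code and body), the number of attempts, and the fixed delay are all hypothetical and not prescribed by the API.

```python
import time
from typing import Callable, Tuple


def wait_for_result(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `fetch` until it yields HTTP 200, retrying while it yields 202."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # processing result is available
        if status != 202:
            # Any other code (401, 402, 403, 404, 413, 422, 429, 500, ...)
            # is treated as a failure in this sketch instead of being retried.
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(delay_seconds)  # 202: still in progress, wait before retrying
    raise TimeoutError(f"still in progress after {max_attempts} attempts")
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP client. A production client might additionally back off and retry on 429, which this sketch deliberately does not do.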

Hint

Not only the semantics but also the response schemas for all HTTP status codes above 200 are identical to those of POST /processing/{workflow}, to keep the endpoints as consistent as possible. On success (200), the response schema depends on the specific sub-result requested.

Hint

Please note that the GET /processing/results/{processing_id}/... endpoints described on this page differ slightly from POST /processing/{workflow}: they return HTTP 200 instead of HTTP 201 for the regular success response.

Individually Requestable Results#

OCR#

GET /processing/results/{processing_id}/ocr

Retrieves the OCR processing result.

Response Details#

On success, this endpoint returns an application/json document. For details about its shape, please refer to OCR Format.

HOCR#

GET /processing/results/{processing_id}/hocr

Retrieves the OCR information for a given processing request, formatted as HTML. It contains a visual representation of the same information as provided by the OCR result.

Response Details#

On success, this endpoint returns a text/html document.
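
Because the OCR result is returned as application/json and the hOCR result as text/html, a client can decide how to decode a response body from its Content-Type header. The sketch below illustrates one way to do that; the function name decode_result is hypothetical, and decoding HTML as UTF-8 is an assumption rather than documented behavior.

```python
import json
from typing import Any


def decode_result(content_type: str, body: bytes) -> Any:
    """Decode a result body according to its Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)  # e.g. the OCR result
    if media_type == "text/html":
        return body.decode("utf-8")  # e.g. the hOCR result; UTF-8 is assumed
    return body  # anything else (images, PDFs) is returned as raw bytes
```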

Extractions#

GET /processing/results/{processing_id}/extractions

Retrieves the extractions processing result. The available extraction formats depend on the workflow that produced the extractions and potentially also on the detected class of the submitted document. See also Extractions Format.

Response Details#

On success, this endpoint returns an application/json document. For details about its shape, please refer to Extractions Format.

List of Generated Page Images#

GET /processing/results/{processing_id}/page-images

Each document, independent of the original upload format, will usually be converted into a standard image format that is suitable for page-by-page processing by our AI. This endpoint retrieves the page-images processing result, which contains a list of paths from which each of these page images can be downloaded. See GET /processing/results/{processing_id}/page-images/{page_num}.

Response Details#

On success, this endpoint returns an application/json document of the following form:

Response JSON
{
  "pages": [
    "/processing/results/{processing_id}/page-images/1",
    "/processing/results/{processing_id}/page-images/2",
    ...
  ]
}
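
To illustrate how the response above might be consumed, the following sketch maps each page number to its download path. It assumes that the last path segment is always the page number, as in the example response; the function name page_image_paths is hypothetical.

```python
import json
from typing import Dict


def page_image_paths(response_body: str) -> Dict[int, str]:
    """Map page numbers to download paths from a page-images response."""
    paths = json.loads(response_body)["pages"]
    # The page number is read from the last path segment (assumption based
    # on the response form shown above).
    return {int(path.rsplit("/", 1)[1]): path for path in paths}
```

Each returned path can then be appended to the API host to download the corresponding page image.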

Individual Page Images#

GET /processing/results/{processing_id}/page-images/{page_num}

Retrieves a single page image generated during processing, specified by the given page number. This endpoint corresponds to the download paths returned by GET /processing/results/{processing_id}/page-images.

Additional Parameters#

Field | Type | Description
{page_num} | int | Passed as part of the URL path; identifies the page number for which the page image should be retrieved.
width | int (optional) | Can be passed as a query parameter to scale the image down to the provided width while maintaining the aspect ratio. If left empty and height is provided, the image will be scaled with respect to height.
height | int (optional) | Can be passed as a query parameter to scale the image down to the provided height while maintaining the aspect ratio. If left empty and width is provided, the image will be scaled with respect to width.

Hint

If the given width and height don't match the original aspect ratio, the image will be scaled with respect to the larger dimension while keeping the aspect ratio.
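
As a small sketch of how the optional scaling parameters could be attached to a request, the helper below builds the request path with a query string that contains only the values that were actually provided. The function name page_image_path and the local check for positive values are assumptions for illustration; the API may apply different validation.

```python
from typing import Optional
from urllib.parse import urlencode


def page_image_path(
    processing_id: str,
    page_num: int,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> str:
    """Build the request path for one page image, with optional scaling."""
    path = f"/processing/results/{processing_id}/page-images/{page_num}"
    # Only include the query parameters that were actually given.
    params = {k: v for k, v in (("width", width), ("height", height)) if v is not None}
    for name, value in params.items():
        if value <= 0:  # local sanity check (assumption, not documented)
            raise ValueError(f"{name} must be a positive integer")
    return f"{path}?{urlencode(params)}" if params else path
```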

Response Details#

On success, this endpoint returns a file of Content-Type: image/*, representing the respective document page, as it has been used for further processing.

Thumbnail#

GET /processing/results/{processing_id}/thumbnail

Retrieves a thumbnail that is automatically generated from the first page image of the document.

Response Details#

On success, this endpoint returns a file of Content-Type: image/*.

Document Splitting#

GET /processing/results/{processing_id}/document-splitting

Retrieves the document splitting result. It is only available as part of workflows that implement document splitting.

Response Details#

On success, this endpoint returns an application/json document. For details about its shape, please refer to Document Splitting Format.

Split PDFs#

GET /processing/results/{processing_id}/split-pdfs

Retrieves the split PDFs result, which is generated as part of the document splitting process and provides URL maps for downloading the split PDFs. Like the document splitting result, this result is only available as part of workflows that implement document splitting.

Response Details#

On success, this endpoint returns an application/json document. For details about its shape, please refer to Split PDF Format.

Individual Split PDFs#

GET /processing/results/{processing_id}/split-pdfs/{index}

Retrieves a PDF file for a sub-document based on the results of the document splitting process, specified by the given split point index. This endpoint corresponds to the download paths returned by GET /processing/results/{processing_id}/split-pdfs.

Additional Parameters#

Field | Type | Description
return_pages_with_ocr_data | bool (optional) | Defines whether, instead of returning the split original PDF pages, the pages should be rendered with OCR overlay information. If set to true, the pages will be rendered based on the pre-processing results of cropping and OCR. To customize image quality and archival compliance of the rendered pages, this parameter must be set to true. The default value is false.
jpeg_quality | int (optional) | Defines the quality of the PDF images. The higher the value, the better the image quality and the bigger the output file size. Valid range is from 1 to 100. Only applicable if return_pages_with_ocr_data is true.
pdfa_compliant | bool (optional) | Defines whether the PDF should be made PDF/A compliant. Only applicable if return_pages_with_ocr_data is true.
image_height | int (optional) | The size of the rendered images' long side in pixels. For portrait orientation images, this corresponds to the height. If image_height is not given, it will be calculated from dpi, page_height and page_width. If page_height and page_width are not given, image_height will be derived from the original image aspect ratio. Only applicable if return_pages_with_ocr_data is true.
image_width | int (optional) | The size of the rendered images' short side in pixels. For portrait orientation images, this corresponds to the width. If image_width is not given, it will be calculated from dpi, page_height and page_width. If page_height and page_width are not given, image_width will be derived from the original image aspect ratio. Only applicable if return_pages_with_ocr_data is true.
page_height | float (optional) | The size of the PDF pages' long side in millimeters. For portrait orientation pages, this corresponds to the height. If specified, page_width is required as well. Only applicable if return_pages_with_ocr_data is true.
page_width | float (optional) | The size of the PDF pages' short side in millimeters. For portrait orientation pages, this corresponds to the width. If specified, page_height is required as well. Only applicable if return_pages_with_ocr_data is true.
dpi | int (optional) | The resolution of the rendered PDF document in pixels per inch. The default value is 220. Higher values result in better image quality but also increase file size. Only applicable if return_pages_with_ocr_data is true.

Hint

For self-trained splitting models that are configured to crop uploaded documents, keep in mind the following special behavior of the endpoint when the return_pages_with_ocr_data parameter is set to the default value false:

If you have trained your splitting model to split scanned pages that contain e.g. two receipts on a single page into two sub-documents, then GET /processing/results/{processing_id}/split-pdfs/{index} will return the same original PDF page containing both receipts for both sub-document indices. Please consider using the endpoint with the setting true for return_pages_with_ocr_data if your use case requires different PDFs for sub-documents that are cropped from the same page.
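
The constraints between these parameters can be checked on the client before sending a request. The sketch below encodes three rules stated in the parameter descriptions: rendering options require return_pages_with_ocr_data to be true, jpeg_quality must lie between 1 and 100, and page_height and page_width must be given together. The function name split_pdf_query and the serialization of booleans as lowercase true/false are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def split_pdf_query(
    return_pages_with_ocr_data: bool = False,
    jpeg_quality: Optional[int] = None,
    pdfa_compliant: Optional[bool] = None,
    image_height: Optional[int] = None,
    image_width: Optional[int] = None,
    page_height: Optional[float] = None,
    page_width: Optional[float] = None,
    dpi: Optional[int] = None,
) -> str:
    """Build a validated query string for the split-pdfs/{index} endpoint."""
    rendering = {
        "jpeg_quality": jpeg_quality, "pdfa_compliant": pdfa_compliant,
        "image_height": image_height, "image_width": image_width,
        "page_height": page_height, "page_width": page_width, "dpi": dpi,
    }
    rendering = {k: v for k, v in rendering.items() if v is not None}
    # Rendering options only apply when pages are rendered with OCR data.
    if rendering and not return_pages_with_ocr_data:
        raise ValueError("rendering options require return_pages_with_ocr_data=True")
    if jpeg_quality is not None and not 1 <= jpeg_quality <= 100:
        raise ValueError("jpeg_quality must be between 1 and 100")
    # page_height and page_width each require the other.
    if (page_height is None) != (page_width is None):
        raise ValueError("page_height and page_width must be given together")
    params = {"return_pages_with_ocr_data": return_pages_with_ocr_data, **rendering}
    # Booleans are written as lowercase "true"/"false" (assumed wire format).
    return urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
```

For the cropped-receipt case described above, a call such as split_pdf_query(True) produces the query string that requests rendered pages instead of the original PDF pages.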

Response Details#

On success, this endpoint returns an application/pdf response.

Info

The GET /processing/results/{processing_id}/sub-pdfs and GET /processing/results/{processing_id}/sub-pdfs/{page_num} endpoints have been deprecated in favor of their split-pdfs counterparts. You can find more information about the deprecated endpoints in the section about compatibility APIs for existing applications.

More to come... 🚀#

Not yet implemented.

We're currently working on building additional results endpoints for more processing results and artifacts, stay tuned!