Fine-tune Invoice Extraction#

Fine-tune Invoice Extraction lets you customise our generic invoice extraction model for the vendors you typically receive — so it becomes the master-of-your-trades. Your frequently received invoices will be processed with near-perfect accuracy, while the remaining invoices still benefit from the generalisability of the generic model.

When to use this workflow#

  • Use Fine-tune Invoice Extraction when your documents are invoices and you want to lift accuracy on the vendors you frequently receive without giving up coverage on the long tail.
  • Use Train-your-own Extraction Model instead when you need entities outside of our invoice schema, or for any non-invoice extraction use case.
  • Use GenAI Extraction instead when you want minimal setup (no annotation required).

At a glance#

Output: extractions (invoice schema), ocr
Annotation: Required
Training: Required
Returns confidence per field: Yes
Cost: 10 credits/page · 15 credits/document

Creating the workflow#

You can start the workflow creation either by clicking Train Your Own Model + on the API Hub and selecting Fine-tune Invoice Extraction, or directly via the wizard.

The wizard guides you through the following steps:

  1. Define the templates for which you want to fine-tune the model and upload your files accordingly.
  2. Annotate your data. The generic model already extracts entities, so all you have to do is correct any errors.
  3. Train the model on your uploaded and corrected data to get your fine-tuned model. You can then view the per-entity performance analysis to understand your model.

You can check this blog post to see the steps in detail.

Your workflow identifier is a UUID generated during the creation process. Once training is complete you can call it like any other workflow at the processing endpoint:

POST /processing/{your_workflow_identifier}

Processing your documents#

You can upload documents directly from the Dashboard tab. They can also be uploaded from the Uploads tab, which lists all documents that have been processed through this workflow.

OpenAPI Documentation#

For API usage, have a look at the Documentation tab on the workflow dashboard, where you can find OpenAPI documentation customised for your workflow.

The relevant endpoint is:

POST /processing/{workflow_key}

It processes a document with the fine-tuned model. The workflow key is the UUID of your workflow (the workflow identifier generated during creation), which also appears in the URL of the dashboard.
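As a minimal sketch, the endpoint URL can be assembled from the workflow key as shown below. The base URL and the helper name `processing_url` are assumptions made for illustration; take the actual host, authentication and upload parameters from the Documentation tab of your workflow.

```python
import uuid

# Assumed placeholder host; replace it with the one shown in your
# workflow's OpenAPI documentation.
BASE_URL = "https://api.example.com"


def processing_url(workflow_key: str, base_url: str = BASE_URL) -> str:
    """Return the URL for POST /processing/{workflow_key}.

    Raises ValueError if workflow_key is not a well-formed UUID.
    """
    # uuid.UUID raises ValueError on malformed input, which catches
    # copy-paste mistakes before any request is sent. str() normalises
    # the key to lower-case hyphenated form.
    key = str(uuid.UUID(workflow_key))
    return f"{base_url.rstrip('/')}/processing/{key}"
```

Validating the key locally is optional, but it turns a mistyped identifier into an immediate error instead of a failed request.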

The workflow returns extractions and ocr. The structure of extractions follows our standard invoice schema (same as the prebuilt Invoice Extraction workflow):

For document_type = invoice

All supported fields
  • schema_version: integer (possible values: [2])
  • document_type: string (possible values: ['invoice'])
  • customer: Customer
    • name: StringExtraction
    • address: StringExtraction
    • address_struct: Address
      • address_line_1: StringExtraction
      • address_line_2: StringExtraction
      • city: StringExtraction
      • zip: StringExtraction
      • country: CountryExtraction
    • vat_id: StringExtraction
    • tax_number: StringExtraction
    • eori_number: StringExtraction
    • customer_number: StringExtraction
    • banking_information: array of BankingInformation
      • validation_problem: boolean (deprecated)
      • note: string (deprecated)
      • confidence: number (deprecated)
      • bbox_refs: array of Reference
        • page_num: integer
        • bbox_id: integer
      • iban: StringExtraction
      • bic: StringExtraction
  • vendor: Vendor
    • name: StringExtraction
    • address: StringExtraction
    • address_struct: Address
      • address_line_1: StringExtraction
      • address_line_2: StringExtraction
      • city: StringExtraction
      • zip: StringExtraction
      • country: CountryExtraction
    • vat_id: StringExtraction
    • tax_number: StringExtraction
    • eori_number: StringExtraction
    • register_id: StringExtraction
    • banking_information: array of BankingInformation
      • validation_problem: boolean (deprecated)
      • note: string (deprecated)
      • confidence: number (deprecated)
      • bbox_refs: array of Reference
        • page_num: integer
        • bbox_id: integer
      • iban: StringExtraction
      • bic: StringExtraction
    • phone: StringExtraction
    • fax: StringExtraction
    • url: StringExtraction
    • e_mail: StringExtraction
  • currency: CurrencyExtraction
  • date: DateExtraction
  • due_date: DateExtraction
  • service_period: one of:
    • DateExtraction
    • Period
      • start_date: DateExtraction
      • end_date: DateExtraction
  • number: StringExtraction
  • order_numbers: array of StringExtraction
  • order_confirmation_numbers: array of StringExtraction
  • delivery_note_numbers: array of StringExtraction
  • payment_methods: array of PaymentMethodExtraction
  • net_amount: FloatExtraction
  • tax_amount: FloatExtraction
  • additional_cost: FloatExtraction
  • gross_amount: FloatExtraction
  • tax_calculation: array of TaxCalculation
    • tax_code: StringExtraction
    • tax_rate: FloatExtraction
    • tax_amount: FloatExtraction
    • net_amount: FloatExtraction
    • gross_amount: FloatExtraction
  • early_payment_benefit: array of EarlyPaymentBenefit
    • early_payment_date: DateExtraction
    • discount_percentage: FloatExtraction
      The percentage is only returned if it is written on the document, i.e. it is not calculated.
    • discount_amount: FloatExtraction (deprecated)
      Use new_amount instead.
    • new_amount: FloatExtraction
  • line_item: array of LineItem
    • pos_id: StringExtraction
    • article_id: StringExtraction
    • ean: StringExtraction
    • description: StringExtraction
    • quantity: FloatExtraction
    • unit_of_measure: StringExtraction
    • service_period: one of:
      • DateExtraction
      • Period
        • start_date: DateExtraction
        • end_date: DateExtraction
    • discount: FloatExtraction
    • additional_cost: FloatExtraction
    • tax_rate: FloatExtraction
    • tax_code: StringExtraction
    • unit_price: FloatExtraction
    • total_price: FloatExtraction
    • order_number: StringExtraction
    • order_confirmation_number: StringExtraction
    • delivery_note_number: StringExtraction
  • number_of_line_items: FloatExtraction
  • due_payable_amount: FloatExtraction
  • discount_amount: FloatExtraction
  • payment_reference: StringExtraction
  • barcodes: array of StringExtraction
  • key_value_pairs: array of KeyValuePair
    • key: StringExtraction
    • value: StringExtraction

Note

For the structure of each extraction object, see Extracted Values. To access individual processing results or artifacts, see Fetch Processing Results and Artifacts.

Important

The structure of extractions might contain optional paths. See this and this part of the documentation.
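Because paths can be absent, client code should avoid indexing straight into the result. The following is a small sketch of a defensive accessor. The helper `get_path`, the sample payload, and the `value` and `confidence` key names inside each extraction object are assumptions for illustration; check Extracted Values for the actual object layout.

```python
from typing import Any, Optional, Sequence, Union

PathPart = Union[str, int]


def get_path(extractions: Any, path: Sequence[PathPart]) -> Optional[Any]:
    """Walk nested dicts/lists; return None if any step is missing."""
    node = extractions
    for part in path:
        if isinstance(node, dict) and isinstance(part, str):
            node = node.get(part)
        elif (
            isinstance(node, list)
            and isinstance(part, int)
            and -len(node) <= part < len(node)
        ):
            node = node[part]
        else:
            # Wrong container type or index out of range.
            return None
        if node is None:
            return None
    return node


# Hypothetical, heavily shortened result used only to exercise the helper.
sample = {
    "vendor": {"name": {"value": "Acme Corp", "confidence": 0.98}},
    "line_item": [],
}

print(get_path(sample, ["vendor", "name", "value"]))      # Acme Corp
print(get_path(sample, ["customer", "vat_id", "value"]))  # None
print(get_path(sample, ["line_item", 0, "description"]))  # None
```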

Code Snippets#

Along with the workflow-specific OpenAPI documentation, you can find code snippets for different programming languages to help you get started with the API.

Confidence and Human-in-the-loop#

Along with each extracted value, the API returns a confidence score between 0 and 1. The closer the value is to 1, the more certain the model is about the extraction. Use these confidence values to drive a Human-in-the-loop process: send high-confidence extractions straight through to your downstream system and route lower-confidence ones to a reviewer.
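The routing described above can be sketched as follows. This is an illustration under stated assumptions: the `confidence` key name, the sample payload and the threshold of 0.9 are hypothetical, and a suitable threshold should be chosen from your own verified documents.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical cut-off; not a value recommended by the documentation.
REVIEW_THRESHOLD = 0.9


def iter_confidences(node: Any, path: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (path, confidence) for every nested object with a numeric confidence."""
    if isinstance(node, dict):
        conf = node.get("confidence")
        if isinstance(conf, (int, float)) and not isinstance(conf, bool):
            yield path, float(conf)
        for key, child in node.items():
            yield from iter_confidences(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_confidences(child, f"{path}[{index}]")


def fields_needing_review(
    extractions: Any, threshold: float = REVIEW_THRESHOLD
) -> List[str]:
    """Return the paths of all extractions whose confidence is below threshold."""
    return [p for p, c in iter_confidences(extractions) if c < threshold]


# Hypothetical, shortened result used only to exercise the helpers.
sample = {
    "vendor": {"name": {"value": "Acme Corp", "confidence": 0.98}},
    "gross_amount": {"value": 119.0, "confidence": 0.62},
    "order_numbers": [
        {"value": "PO-1", "confidence": 0.95},
        {"value": "PO-2", "confidence": 0.4},
    ],
}

print(fields_needing_review(sample))  # ['gross_amount', 'order_numbers[1]']
```

A document with an empty list can be passed straight through; any other document goes to a reviewer together with the listed paths.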

You can use the natif.ai Stand-Alone Interface or integrate our verification API endpoint to mark documents as verified. See Verification for details. For extraction workflows the relevant endpoint is:

POST /processing/results/{processing_id}/extractions/verification

Feedback#

The feedback API helps you improve your workflow iteratively by feeding processed documents back into the training data. Submit feedback on a processing result via:

POST /processing/feedback/{processing_id}

You can provide a description and a tag to categorise the document under a template (typically the vendor name):

{
  "description": "Extraction was off for this vendor",
  "tag": "Acme Corp"
}

The document is then added to the Training Data view under the corresponding template, where you can annotate it and include it in the next training run.
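As a rough sketch, a feedback request with this payload could be built with the Python standard library as shown below. The base URL, the `Authorization` header scheme and the helper name `build_feedback_request` are assumptions for illustration; the workflow-specific OpenAPI documentation is the authoritative source for the host and authentication.

```python
import json
import urllib.parse
import urllib.request

# Assumed placeholder host; replace it with the one from your OpenAPI docs.
BASE_URL = "https://api.example.com"


def build_feedback_request(
    processing_id: str,
    description: str,
    tag: str,
    api_key: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) POST /processing/feedback/{processing_id}."""
    body = json.dumps({"description": description, "tag": tag}).encode("utf-8")
    # Quote the id so unexpected characters cannot alter the URL path.
    safe_id = urllib.parse.quote(processing_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/processing/feedback/{safe_id}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header scheme; check your account's API documentation.
            "Authorization": f"ApiKey {api_key}",
        },
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`. Building the request separately from sending it keeps the payload easy to inspect and test.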

Credit cost#

A Freemium account allows up to 100 pages per month. The cost is 10 credits per page and 15 credits per document.

Note

A document is usually a bundle of 10 pages.
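For budgeting, the two rates can be turned into a quick calculation. This sketch only multiplies the stated rates; which rate applies to a given account or upload is not specified here, so the two functions are kept separate, and their names are hypothetical.

```python
CREDITS_PER_PAGE = 10        # rate stated above
CREDITS_PER_DOCUMENT = 15    # rate stated above
FREEMIUM_PAGES_PER_MONTH = 100


def credits_by_page(pages: int) -> int:
    """Credits consumed when billing is counted per page."""
    return pages * CREDITS_PER_PAGE


def credits_by_document(documents: int) -> int:
    """Credits consumed when billing is counted per document."""
    return documents * CREDITS_PER_DOCUMENT


# The full Freemium page allowance at the per-page rate.
print(credits_by_page(FREEMIUM_PAGES_PER_MONTH))  # 1000
```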

Previewing Workflow Updates#

natif.ai is constantly improving the model architectures and baselines for custom workflows, which sometimes requires (beneficial) updates to existing workflows. To avoid interfering with production use of your workflow, natif.ai informs you about such updates in advance by email and provides a preview version of the upcoming workflow update for you to try out before the automatic migration.

See the preview endpoint documentation to test the upcoming version of your workflow before it is used in production and to send us feedback.