Train-your-own Classifier#

Our platform lets you build a custom classification API tailored to your needs, without requiring deep knowledge of AI. You can use it to automate business processes such as input management (e.g., sorting and routing incoming mail to the appropriate departments), reducing the need to handle such tasks manually.

To set up your custom model for training, click on the Train Your Own Model Now button and select Custom Classifier. This brings you to the model creation wizard, where you:

  • Give your custom classifier model a name and, optionally, a description and thumbnail.
  • Define the labels to classify your documents into.
  • Define the document specification for the kinds of documents you will be uploading. You can also specify which pages are relevant for classification.

Finally, create your workflow by clicking on Create Workflow. You can step back and change settings before doing so, and settings can still be changed after the workflow is created, unless explicitly stated otherwise.

You can check this blog post to see the steps involved in preparing and training the model.

Once training is complete, this workflow provides you with classifications and OCR. Your workflow identifier is a UUID that is generated automatically during the creation process. You can call your workflow like any other workflow using the processing endpoint, i.e. POST /processing/{your_workflow_identifier}.
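As a minimal sketch of what such a call could look like, the Python snippet below builds (but does not send) a POST request to the processing endpoint. Only the path POST /processing/{your_workflow_identifier} and the fact that the identifier is a UUID come from this page. The host name, the bearer-token header, and the raw PDF body are assumptions made for illustration, so check them against your account's API reference before use.

```python
import uuid
from urllib.request import Request

# Hypothetical placeholder: substitute the real API host for your account.
BASE_URL = "https://api.example.com"


def build_processing_request(workflow_id: str, document: bytes, api_key: str) -> Request:
    """Build (but do not send) a POST /processing/{workflow_id} request."""
    # The workflow identifier is a UUID, so reject anything else early.
    # uuid.UUID raises ValueError for malformed input.
    canonical_id = str(uuid.UUID(workflow_id))
    headers = {
        # Assumption: bearer-token authentication.
        "Authorization": f"Bearer {api_key}",
        # Assumption: the document is sent as a raw PDF request body.
        "Content-Type": "application/pdf",
    }
    return Request(
        f"{BASE_URL}/processing/{canonical_id}",
        data=document,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to actually send it. Validating the identifier first means a mistyped workflow ID fails locally instead of producing a request to a non-existent workflow.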

Supported return values#

Return values are included in the response JSON automatically, unless otherwise specified via the include query parameters.

Training data#

The amount of training data required depends on the difficulty of the task: harder tasks need more training data to reach good accuracy. If your task has fewer than 5 classes, we recommend uploading 25 samples per class. If it has more classes, we recommend uploading 50 samples per class. If you think your task is difficult or complicated, or if you find that the accuracy is not good enough, try doubling the size of your training data. In general, our classifiers work better with more data.
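The recommendation above can be turned into a quick estimate of how many samples to collect. The sketch below is only an illustration of that rule of thumb: the function names are hypothetical, and treating a task with exactly 5 classes as belonging to the "more classes" case is an assumption.

```python
def recommended_samples_per_class(num_classes: int, difficult: bool = False) -> int:
    """Return the suggested number of training samples per class."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # Fewer than 5 classes: 25 samples per class; otherwise 50.
    # Assumption: exactly 5 classes counts as the larger case.
    base = 25 if num_classes < 5 else 50
    # Difficult tasks, or tasks with unsatisfying accuracy: double the data.
    return base * 2 if difficult else base


def recommended_total_samples(num_classes: int, difficult: bool = False) -> int:
    """Return the suggested total number of training samples."""
    return num_classes * recommended_samples_per_class(num_classes, difficult)
```

For example, a task with 4 classes would call for 100 samples in total, or 200 if it turns out to be difficult.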

Classification confidence#

Along with the classification result for each document, we also return a confidence value for that result. Based on our analysis, we recommend human review for classification results with a confidence of less than 80%. This corresponds to an error rate of around 3% after manual correction. The complexity of your use case strongly influences both the quality of the model and the reliability of the confidence values, so we recommend that you check manually which threshold suits your needs. For how to use the confidence values to select samples for human review or correction, please refer to the verification page.
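A minimal sketch of this kind of threshold-based routing is shown below. It assumes confidence values are expressed as fractions between 0 and 1 (so 80% is 0.80) and represents each result as a plain (document_id, label, confidence) tuple. Both are assumptions for illustration and do not reflect the actual response format.

```python
from typing import Iterable, List, Tuple

# (document_id, label, confidence): a hypothetical shape, not the API's schema.
Result = Tuple[str, str, float]

# Recommended starting point; tune it for your own use case.
REVIEW_THRESHOLD = 0.80


def split_for_review(
    results: Iterable[Result], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Result], List[Result]]:
    """Split results into (accepted, needs_review) by confidence."""
    accepted: List[Result] = []
    needs_review: List[Result] = []
    for result in results:
        confidence = result[2]
        # Results strictly below the threshold go to a human reviewer.
        if confidence < threshold:
            needs_review.append(result)
        else:
            accepted.append(result)
    return accepted, needs_review
```

Keeping the threshold as a parameter makes it easy to compare how many documents would be sent to review at different cut-offs before settling on a value.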

Credit cost#

A Freemium account allows up to 100 pages per month. The cost is 15 credits per page and 30 credits per document.

Note

A document is usually a bundle of 10 pages.
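To get a rough feel for credit usage, the sketch below estimates the cost of a batch. It assumes that the per-page and per-document charges are added together, which this page does not state explicitly, so treat the result as an estimate under that assumption rather than a billing rule. The function name is hypothetical.

```python
from typing import Sequence

CREDITS_PER_PAGE = 15
CREDITS_PER_DOCUMENT = 30


def estimate_credits(pages_per_document: Sequence[int]) -> int:
    """Estimate credits for a batch, given the page count of each document."""
    if any(pages < 0 for pages in pages_per_document):
        raise ValueError("page counts must not be negative")
    total_pages = sum(pages_per_document)
    # Assumption: page and document charges are additive.
    return (
        CREDITS_PER_PAGE * total_pages
        + CREDITS_PER_DOCUMENT * len(pages_per_document)
    )
```

Under this assumption, a single 10-page document would cost 15 × 10 + 30 = 180 credits.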