Classify Documents with Document Classification

Requires Customer/Partner License

Beginner

15 min.

Machine Learning, Beginner, SAP AI Services, Cloud, Document Classification, Tutorial, SAP Business Technology Platform, Artificial Intelligence

Find out how to classify the documents you uploaded to Document Classification using your machine learning model.

You will learn

How to classify documents using your Document Classification machine learning model
How to evaluate the performance of your model

Juliana Morais, December 11, 2023

Created by

May 18, 2020

Contributors

Based on your deployed machine learning model, you can now classify documents. For more information, see Document Classification.

Step 1: Classify documents
As your machine learning model is now deployed, it can be used to classify documents.

Click the cell below the heading Classification. The code in this cell reproduces the stratification process mentioned in the previous tutorial: it picks out the documents that were reserved for testing and were not used to train the model, and then sends these documents to the service for classification. Click Run to start the classification.
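The stratification step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the notebook's actual code: the function name `stratified_split`, its parameters, and the sample labels are all hypothetical. The idea is that a fixed seed reproduces the same split every time, so the documents held out for testing are exactly the ones the model never saw during training.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split document indices into train/test while keeping label proportions.

    `labels` holds one ground-truth label per document.
    Returns (train_indices, test_indices), both sorted.
    """
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed: the same split is reproduced on every run
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Hold out at least one document per label when the label has more than one.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels for ten documents (illustrative, not the tutorial dataset).
labels = ["German"] * 5 + ["English"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
print("held out for testing:", test_idx)
```

Only the documents at the test indices would then be sent to the service. The call to the service itself is left out here because its client API is not shown in this tutorial.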

Once the classification is done, the result for every tested document is printed, as in the image below. The result includes predictions for every characteristic, in this case only Language. For each characteristic, the service assigns a probability to every possible value. In the highlighted example below, German is predicted with a probability of 0.88.
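To make the per-value probabilities concrete, here is a minimal sketch of how the predicted value could be read from such a result. The dictionary layout and the helper `top_prediction` are assumptions made for illustration, and the actual response format of the service may differ. Only the 0.88 for German comes from the example above; the other probabilities are placeholders.

```python
def top_prediction(probabilities):
    """Return the (value, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])


# Assumed shape: one probability per possible value of the characteristic.
# Only German = 0.88 is taken from the tutorial text; the rest are placeholders.
language_probabilities = {"German": 0.88, "Both": 0.10, "English": 0.01, "Other": 0.01}

value, probability = top_prediction(language_probabilities)
print(f"Language: {value} ({probability:.2f})")  # prints: Language: German (0.88)
```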
Step 2: Display the classification and ground truth
To compare the predictions with the true values, you can print the ground truth (the true value) alongside the predictions made by the service. In this case, the ground truth is the actual language of the document.

Click the next cell and click Run. This prints out the ground truth and the prediction for one document.

In the example above, the ground truth for the language is German. The service predicted German with a probability of 0.85 and ruled out English and Other completely. The only other value it considered was Both (German and English), and only with a very small probability. In other words, the service made a very accurate prediction here.
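The comparison described above can be expressed as a small helper. This is a sketch under assumptions: `compare_with_ground_truth` and the dictionary layout are hypothetical, and only the ground truth (German) and the 0.85 probability come from the text. The value used for Both is a placeholder, since the tutorial only states that it was very small.

```python
def compare_with_ground_truth(ground_truth, probabilities):
    """Summarize one prediction against the known true value."""
    predicted = max(probabilities, key=probabilities.get)
    return {
        "ground_truth": ground_truth,
        "predicted": predicted,
        "probability": probabilities[predicted],
        "correct": predicted == ground_truth,
        # Other values that still received some probability.
        "also_considered": sorted(
            v for v, p in probabilities.items() if p > 0 and v != predicted
        ),
        # Values the service eliminated completely.
        "ruled_out": sorted(v for v, p in probabilities.items() if p == 0),
    }


# German = 0.85 is from the tutorial; the value for Both is a placeholder.
summary = compare_with_ground_truth(
    "German", {"German": 0.85, "Both": 0.15, "English": 0.0, "Other": 0.0}
)
for key, value in summary.items():
    print(f"{key}: {value}")
```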
Step 3: Measure performance in a confusion matrix
Another tool for measuring the overall performance of the model is the confusion matrix. The matrix shows how many documents were classified incorrectly, that is, how many documents the service assigned to a wrong class and which class it confused them with.

Click the cell below the heading Confusion Matrix and click Run to create the matrix.

The operation outputs a matrix as in the image below.

The y-axis of the matrix is labeled with the true languages of the documents, whereas the x-axis is labeled with the predictions that the service made. Ideally, all values would lie on the diagonal, which would mean that the service predicted the true value every time. This case comes very close, as the service confused only one document in this example: the document's true value is Both (English and German text), but it was classified as German.
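For readers who want to see how such a matrix is built, the following is a minimal sketch in plain Python. It is not the notebook's code, which may use a plotting or machine learning library, and the function name and sample data are hypothetical. Rows correspond to true values (the y-axis) and columns to predicted values (the x-axis). The sample data mimics the situation described above, with a single Both document classified as German.

```python
def confusion_matrix(true_values, predicted_values, classes):
    """Count documents per (true class, predicted class) pair.

    Rows follow the true class, columns the predicted class,
    both in the order given by `classes`.
    """
    position = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_value, predicted_value in zip(true_values, predicted_values):
        matrix[position[true_value]][position[predicted_value]] += 1
    return matrix


classes = ["Both", "English", "German", "Other"]

# Illustrative data: ten documents, one Both document misclassified as German.
true_values = ["Both", "Both"] + ["English"] * 3 + ["German"] * 3 + ["Other"] * 2
predicted_values = ["Both", "German"] + ["English"] * 3 + ["German"] * 3 + ["Other"] * 2

matrix = confusion_matrix(true_values, predicted_values, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>8}: {row}")
```

Every count off the diagonal is a misclassification. Here the only one is in the Both row and the German column.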

Nevertheless, this is a very good result, and it is consistent with the high accuracy of the model mentioned in the previous tutorial.
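The link between the confusion matrix and accuracy can be made explicit: accuracy is the fraction of correct predictions, which is the sum of the diagonal divided by the total number of documents. The sketch below uses a hypothetical helper and an illustrative matrix with one misclassified document out of ten. These are not the figures of the tutorial's model.

```python
def accuracy_from_matrix(matrix):
    """Fraction of correct predictions: diagonal sum divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Illustrative matrix (rows: true class, columns: predicted class).
example_matrix = [
    [1, 0, 1, 0],  # Both: one document classified as German
    [0, 3, 0, 0],  # English
    [0, 0, 3, 0],  # German
    [0, 0, 0, 2],  # Other
]

accuracy = accuracy_from_matrix(example_matrix)
print(f"accuracy: {accuracy:.2f}")  # prints: accuracy: 0.90
```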

You have now successfully classified documents and evaluated the performance of your machine learning model. Feel free to modify this notebook to classify your own documents or to use a different dataset.
Step 4: Test yourself
What is the ground truth?
Fraction of predictions made by a machine learning model that are correct
The true value of the document
The value that the service predicted
