Use Machine Learning to Extract Information from Documents with Swagger UI
- How to call and test Document Information Extraction
- How to access and use Swagger UI (User Interface)
- How to extract information from files with Document Information Extraction
The core functionality of Document Information Extraction is to automatically extract structured information from documents using machine learning. When you finish this tutorial, you will get field value predictions for the documents you upload to Document Information Extraction.
- Step 1
You will use Swagger UI, via any web browser, to call the Document Information Extraction APIs. Swagger UI lets developers try out every operation an API exposes directly from the browser. For more information, see Swagger UI.
In the service key you created for Document Information Extraction in the previous tutorial (Use Trial to Set Up Account for Document Information Extraction and Get Service Key, or Use Free Tier to Set Up Account for Document Information Extraction and Get Service Key), you should find, outside the uaa section of the service key, an entry called url and another entry called swagger (as highlighted in the image below).
To access the Document Information Extraction Swagger UI, append the swagger value (/document-information-extraction/v1) to the url value, paste the result in any web browser and press Enter.
To be able to use the Swagger UI endpoints you need to authorize yourself. In the top right corner, click Authorize.
Get the access_token value created in the previous tutorial: Get OAuth Access Token for Document Information Extraction Using Any Web Browser. Add Bearer in front of it and enter the result in the Value field: Bearer <access_token>
Click Authorize, and then click Close.
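The two values used in this step can also be assembled in a few lines of Python. This is only a sketch: the host in the service key excerpt is a placeholder, and the helper names swagger_ui_url and authorization_value are made up for illustration.

```python
def swagger_ui_url(service_key: dict) -> str:
    """Join the `url` and `swagger` entries of the service key."""
    base = service_key["url"].rstrip("/")
    path = service_key["swagger"].strip("/")
    return f"{base}/{path}"


def authorization_value(access_token: str) -> str:
    """Return the text to paste into the Value field of the Authorize dialog."""
    token = access_token.strip()
    if token.lower().startswith("bearer "):
        # The prefix is already there; normalize it instead of doubling it.
        return "Bearer " + token[7:].strip()
    return "Bearer " + token


# Hypothetical service key excerpt; the host is a placeholder, not a real endpoint.
service_key = {
    "url": "https://dox.example.com",
    "swagger": "/document-information-extraction/v1",
}

print(swagger_ui_url(service_key))
print(authorization_value("<access_token>"))
```

The same Bearer value is what every later request in this tutorial sends in its Authorization header.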
- Step 2
Use the GET /capabilities endpoint to see the list of document fields and enrichment data for each document type you can process with Document Information Extraction.
Click the endpoint name to expand it, click Try it out, and then Execute.
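If you prefer a script over Swagger UI, the same call could look roughly like the sketch below, which uses only the Python standard library. It assumes that the API is served under the same base address as the Swagger UI (the url value followed by the swagger value); the function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def capabilities_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET /capabilities request."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/capabilities", method="GET"
    )
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 401:
            # Same cause as in Swagger UI: missing prefix or truncated token.
            raise PermissionError(
                "401 Unauthorized: check the Bearer prefix and the copied token"
            ) from error
        raise
```

Calling send(capabilities_request(base_url, access_token)) with your own values would perform the request; nothing is sent until send is called.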
You should receive a response like the following:
If you get an error response code 401 (Unauthorized), your token is probably incorrect. Check that you have added the word Bearer before the token and that the token value is complete and has been properly copied from the access_token value you received in the previous tutorial: Get OAuth Access Token for Document Information Extraction via Web Browser.
- Step 3
When you create a service instance for Document Information Extraction, a default client is automatically created. A client is used in most of the endpoints to distinguish and separate data. Trial users can only create one client.
To see your list of clients:
Expand the GET /clients endpoint.
Click Try it out.
Enter a maximum number of clients to be listed in the limit field.
Click Execute.
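The steps above can be sketched as a request built in Python. The limit query parameter comes from the limit field in Swagger UI; the base address and the function name are assumptions made for illustration.

```python
import urllib.parse
import urllib.request


def list_clients_request(
    base_url: str, access_token: str, limit: int
) -> urllib.request.Request:
    """Build (but do not send) the GET /clients request."""
    if limit < 1:
        raise ValueError("limit must be a positive number of clients")
    query = urllib.parse.urlencode({"limit": limit})
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/clients?{query}", method="GET"
    )
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

Passing the returned object to urllib.request.urlopen would send it.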
You should receive a response like the following:
- Step 4
Use the DELETE /clients endpoint to delete the default client.
Expand the DELETE /clients endpoint.
Click Try it out.
Enter the client ID (default) you want to delete in the payload field.
Click Execute.
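As a rough script equivalent, the deletion could be built as shown below. The payload shape is an assumption: the sketch wraps the client IDs in a "value" list, so compare it with the example payload that Swagger UI displays for this endpoint before relying on it.

```python
import json
import urllib.request


def delete_clients_request(
    base_url: str, access_token: str, client_ids: list
) -> urllib.request.Request:
    """Build (but do not send) the DELETE /clients request."""
    # ASSUMPTION: the body is {"value": [<client ids>]}; verify in Swagger UI.
    body = json.dumps({"value": list(client_ids)}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/clients", data=body, method="DELETE"
    )
    request.add_header("Authorization", f"Bearer {access_token}")
    request.add_header("Content-Type", "application/json")
    return request
```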
You should receive a response like the following:
- Step 5
Use the POST /clients endpoint to create your own client. The clientId value created here will be used in other service endpoints.
Expand the POST /clients endpoint.
Click Try it out.
Enter your clientId and clientName values in the payload field in the format you see in Examples for payload parameter (c_00 and client 00, for example).
Click Execute.
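The client creation could be sketched in Python as follows. The clientId and clientName keys come from this step, but the surrounding "value" list is an assumption; the authoritative format is the one shown under Examples for payload parameter in Swagger UI.

```python
import json
import urllib.request


def create_client_request(
    base_url: str, access_token: str, client_id: str, client_name: str
) -> urllib.request.Request:
    """Build (but do not send) the POST /clients request."""
    # ASSUMPTION: client entries are wrapped in a "value" list; verify in Swagger UI.
    payload = {"value": [{"clientId": client_id, "clientName": client_name}]}
    request = urllib.request.Request(
        base_url.rstrip("/") + "/clients",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    request.add_header("Authorization", f"Bearer {access_token}")
    request.add_header("Content-Type", "application/json")
    return request
```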
You should receive a response like the following:
You can repeat step 3 to see the clientId and clientName of the client you have just created.
CAUTION:
Be aware of the following Document Information Extraction trial account limitations:
Maximum 40 uploaded document pages per week (the documents can have more than 1 page)
Maximum 1 created clientId
Maximum 10 created enrichment dataIds
- Step 6
Document Information Extraction uses a globally pre-trained machine learning model that currently obtains better accuracy results with invoices and payment advices in the languages listed in Supported Languages and Countries. The team is working to support additional document types and languages in the near future.
Use the POST /document/jobs endpoint to upload to the service any document file in PDF or single-page PNG and JPEG format that has content in headers and tables, such as an invoice.
As an alternative to uploading your own document to the service, you can use any of the following sample invoice files (right click on the link, then click Save link as to download the files locally):
Do the following:
Expand the POST /document/jobs endpoint.
Click Try it out.
Upload a document file.
In options, enter the list of fields to be extracted from the uploaded file (documentNumber, taxId, purchaseOrderNumber, shippingAmount, netAmount, senderAddress, senderName, grossAmount, for example), the client you created in step 5 (c_00, for example), and the document type (invoice, for example). In this case, you can use the following:
{
  "extraction": {
    "headerFields": [
      "documentNumber",
      "taxId",
      "purchaseOrderNumber",
      "shippingAmount",
      "netAmount",
      "senderAddress",
      "senderName",
      "grossAmount",
      "currencyCode",
      "receiverContact",
      "documentDate",
      "taxAmount",
      "taxRate",
      "receiverName",
      "receiverAddress"
    ],
    "lineItemFields": [
      "description",
      "netAmount",
      "quantity",
      "unitPrice",
      "materialNumber"
    ]
  },
  "clientId": "c_00",
  "documentType": "invoice",
  "receivedDate": "2020-02-17",
  "enrichment": {
    "sender": {
      "top": 5,
      "type": "businessEntity",
      "subtype": "supplier"
    },
    "employee": {
      "type": "employee"
    }
  }
}
Click Execute.
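Typing the options JSON by hand is error-prone, so it can help to generate it. The sketch below builds a reduced version of the options shown above and checks the file type against the formats named in this step; the helper names and the extension list are illustrative assumptions, and the optional receivedDate and enrichment entries are left out.

```python
import json
from pathlib import Path

# Extensions assumed to correspond to PDF, PNG and JPEG files.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def is_supported_file(filename: str) -> bool:
    """Check the extension only; it does not verify that an image is single-page."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


def build_options(client_id, document_type, header_fields, line_item_fields) -> str:
    """Return the JSON text to paste into the options field."""
    options = {
        "extraction": {
            "headerFields": list(header_fields),
            "lineItemFields": list(line_item_fields),
        },
        "clientId": client_id,
        "documentType": document_type,
    }
    return json.dumps(options, indent=2)


options = build_options(
    client_id="c_00",
    document_type="invoice",
    header_fields=["documentNumber", "taxId", "grossAmount"],
    line_item_fields=["description", "quantity", "unitPrice"],
)
print(options)
```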
After you have clicked Execute, you should receive a response like the following:
Copy the id from the Response body to see the result of the extraction in the next step.
- Step 7
You can now use the GET /document/jobs/{id} endpoint to receive the prediction.
Expand the GET /document/jobs/{id} endpoint.
Click Try it out.
Set extractedValues to true to get the extracted values.
Enter the id received in the POST /document/jobs endpoint as the id.
Click Execute.
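The steps above can be sketched as a request built in Python. The extractedValues parameter and the id path segment come from this step; the base address and the function name are assumptions.

```python
import urllib.parse
import urllib.request


def job_result_request(
    base_url: str, access_token: str, job_id: str, extracted_values: bool = True
) -> urllib.request.Request:
    """Build (but do not send) the GET /document/jobs/{id} request."""
    # The query string carries the flag as lowercase "true" or "false".
    query = urllib.parse.urlencode({"extractedValues": str(extracted_values).lower()})
    path = "/document/jobs/" + urllib.parse.quote(job_id, safe="")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}{path}?{query}", method="GET"
    )
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```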
You should receive a response like the following:
In the response, you will find some general information about the document you uploaded. You will find the predictions for the extracted fields in headerFields (such as documentDate and taxAmount) and in lineItems (such as description and quantity).
The prediction is made with a probability indicated by the confidence field, which represents how certain the model is about its prediction. A confidence of 1 means that the model is 100% sure about its prediction.
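One practical use of the confidence field is to flag predictions that deserve a manual check. The sketch below assumes that each extracted field is an object with name, value and confidence keys. The sample entries are invented for illustration and the 0.8 threshold is arbitrary.

```python
def fields_below(fields: list, threshold: float) -> list:
    """Return (name, confidence) pairs for predictions under the threshold."""
    return [
        (field["name"], field["confidence"])
        for field in fields
        if field["confidence"] < threshold
    ]


# Invented sample data, not an actual service response.
sample_header_fields = [
    {"name": "documentDate", "value": "2020-02-17", "confidence": 0.98},
    {"name": "taxAmount", "value": 12.5, "confidence": 0.61},
]

for name, confidence in fields_below(sample_header_fields, threshold=0.8):
    print(f"Review {name}: confidence {confidence:.2f}")
```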
If the status of the document (indicated by the status field) is PENDING instead of DONE, the service is still extracting some fields and the returned JSON does not yet contain all the requested fields. In that case, execute the request again after a short wait.
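In a script, waiting for the DONE status is usually done with a small polling loop. The sketch below takes any zero-argument callable that returns the decoded job result, so it does not depend on a particular HTTP client; the attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_done(fetch_job, max_attempts: int = 10, delay_seconds: float = 2.0) -> dict:
    """Call fetch_job until the result reports status DONE, then return it."""
    for attempt in range(max_attempts):
        result = fetch_job()
        if result.get("status") == "DONE":
            return result
        if attempt < max_attempts - 1:
            # Still processing (PENDING, for example): wait before asking again.
            time.sleep(delay_seconds)
    raise TimeoutError(f"job not DONE after {max_attempts} attempts")
```

Here fetch_job stands for whatever function performs the GET /document/jobs/{id} call and parses its JSON body.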
You have now successfully used our machine learning model to get field value predictions for the document you uploaded to the Document Information Extraction service.
You can repeat step 4 and delete the client you created in step 5.
Congratulations, you have completed this tutorial.