Use Machine Learning to Enrich Business Data with Swagger UI
- How to create, update, list and delete enrichment data using the business entity
You can also use Document Information Extraction to enrich the information extracted from documents with your own master data records. You can, for example, match enrichment data entities, such as supplier IDs, with the document Extracted Header Fields, such as sender names.
When enriching data with Document Information Extraction, you use 2 types of entities that you find in business documents. The business entity represents different kinds of organizations with which you deal as a company. It can represent, for example, suppliers and customers. The employee entity represents an employee in the company.
When you finish this tutorial, you will have explored all Data API functionalities to create, update, list and delete enrichment data using the business entity type. See Enrichment Data API documentation.
- Step 1
After performing step 1 of the tutorial Use Machine Learning to Extract Information from Documents with Swagger UI to access and authorize the Document Information Extraction Swagger UI, you need to create a client.
When you create a service instance for Document Information Extraction, a default client is automatically created. A client is used in most of the endpoints to distinguish and separate data. Trial users can only create one client. To see your list of clients:
- Expand the GET /clients endpoint.
- Click Try it out.
- Enter a maximum number of clients to be listed in the limit field.
- Click Execute.
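Outside Swagger UI, the same listing request could be prepared in a script. The following is a minimal sketch, not the service's official client: it assumes that limit is sent as a query parameter, and BASE_URL is a hypothetical placeholder that you would replace with the URL from your own service key. It only builds the URL and does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; use the URL from your own service key instead.
BASE_URL = "https://example.invalid/document-information-extraction/v1"


def clients_url(limit: int, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /clients with a maximum number of clients.

    Assumption: the limit field of Swagger UI is sent as a query parameter.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"{base_url}/clients?{urlencode({'limit': limit})}"


print(clients_url(10))
```

An authorized request would also need the access token obtained during authorization, which is left out of this sketch.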
You should receive a response like the following:
- Step 2
Use the DELETE /clients endpoint to delete the default client.
- Expand the DELETE /clients endpoint.
- Click Try it out.
- Enter in the payload field the client id (default) you want to delete.
- Click Execute.
You should receive a response like the following:
- Step 3
Use the POST /clients endpoint to create your own client. The clientId value created here will be used in other service endpoints.
- Expand the POST /clients endpoint.
- Click Try it out.
- Enter your clientId and clientName values in the payload field in the format you see in Examples for payload parameter (c_29 and client 29, for example).
- Click Execute.
You should receive a response like the following:
You can repeat step 1 to see the clientId and clientName of the client you have just created.
CAUTION:
Be aware of the following Document Information Extraction trial account limitations:
- Maximum 40 uploaded document pages per week (the documents can have more than 1 page)
- Maximum 1 created clientId
- Maximum 10 created enrichment dataIds
- Step 4
Use the POST /data/jobs endpoint to add your own master data records to the database to enrich the information extracted from documents.
- Expand the POST /data/jobs endpoint.
- Click Try it out.
- Define the data in the payload field, so that the system knows which extracted field (using, for example, supplier IDs from master data) should be enriched.

    {
      "value": [
        {
          "id": "BE0001",
          "name": "Sliced Invoices",
          "address1": "Suite 5A-1204 123 Somewhere Street Your City AZ 12345",
          "bankAccount": "DE32245443233324",
          "taxId": "DE123456788"
        },
        {
          "id": "BE0002",
          "name": "Sliced",
          "address1": "Suite 9A-1204 123 Somewhere Boulevard Your City AZ 32323",
          "bankAccount": "DE32245443233325",
          "taxId": "DE123456789"
        }
      ]
    }

- Choose the enrichment data type businessEntity.
- Enter your clientId (created in the previous step).
- When you choose the enrichment data type business entity, you have the option to choose a subtype (supplier, customer or companyCode). In this example, choose supplier.
- Click Execute.
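If you keep your master data in a script rather than typing it into Swagger UI, the payload and parameters for this step could be assembled as follows. This is a sketch under stated assumptions: the function name build_enrichment_job is hypothetical, the check for a missing id is a local sanity check rather than a documented service rule, and how the parameters are transmitted (query string or otherwise) is not covered here.

```python
import json

# Subtypes that the tutorial lists for the businessEntity type.
BUSINESS_ENTITY_SUBTYPES = {"supplier", "customer", "companyCode"}


def build_enrichment_job(records, client_id, subtype="supplier"):
    """Return (parameters, JSON payload) for POST /data/jobs.

    Hypothetical helper: it only prepares the values that the tutorial
    enters in Swagger UI and does not call the service.
    """
    if subtype not in BUSINESS_ENTITY_SUBTYPES:
        raise ValueError(f"unknown subtype: {subtype!r}")
    for record in records:
        # Local sanity check (an assumption, not a documented service rule).
        if "id" not in record:
            raise ValueError(f"record without id: {record!r}")
    params = {"type": "businessEntity", "subtype": subtype, "clientId": client_id}
    payload = json.dumps({"value": list(records)}, indent=2)
    return params, payload


params, payload = build_enrichment_job(
    [
        {"id": "BE0001", "name": "Sliced Invoices", "taxId": "DE123456788"},
        {"id": "BE0002", "name": "Sliced", "taxId": "DE123456789"},
    ],
    client_id="c_29",
)
print(params)
print(payload)
```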
What just happened?
In this example, in the payload field, several master data records (name, ID and address, for example) from 2 different suppliers (Sliced Invoices and Sliced) are provided, so this additional information can be added to the document extracted fields prediction when the information matches.
You should receive a response like the following with status PENDING:
Copy the id from the Response body to see the result of the enrichment data status in the next step.
- Step 5
Use the GET /data/jobs/{id} endpoint to see the status of the uploaded enrichment data.
- Expand the GET /data/jobs/{id} endpoint.
- Click Try it out.
- Enter the id received in the POST /data/jobs endpoint as the id.
- Click Execute.
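Because the job starts with status PENDING, a script would typically repeat the status request until the status changes. The sketch below shows one way to do that. It assumes a caller-supplied fetch_status function (hypothetical) that performs the GET /data/jobs/{id} request and returns the status string; here it is simulated so that the example runs without network access.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job status is no longer PENDING and return it."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "PENDING":
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    raise TimeoutError("job is still PENDING after the last attempt")


# Simulated status sequence standing in for real GET /data/jobs/{id} calls.
statuses = iter(["PENDING", "PENDING", "SUCCESS"])
print(wait_for_job(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing sleep as a parameter keeps the helper easy to test; in real use the default time.sleep applies.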
You should receive a response like the following with status SUCCESS:
What just happened?
The refreshedAt parameter indicates when the enrichment data job was last refreshed. A null value means that the enrichment data has not yet been refreshed.
Enrichment data is refreshed automatically every 4 hours. It might take up to 4 hours until the enrichment data prediction is available in the response.
- Step 6
Set data activation to manual, instead of using the default automatic refresh of enrichment data that takes place every 4 hours.
If you have already performed this step in the previous tutorial: Use Machine Learning to Enrich Employee Data with Swagger UI, you can skip it now. Set it to done and move directly to step 7.
- Expand the POST /configuration endpoint.
- Click Try it out.
- Enter the following in the payload field:

    {
      "value": {
        "manualDataActivation": "true"
      }
    }

- Click Execute.
You should receive a response like the following:
- Step 7
Create a data activation job record to see new or updated enrichment data in the extraction results. Only activated enrichment data will be added to the extraction results.
Expand the POST /data/activation endpoint.
Click Try it out.
Click Execute.
You should receive a response like the following:
If you have already used this endpoint recently, you should receive a response like the following:
Wait until the next data activation is possible, then perform this step again before moving to step 8.
- Step 8
Document Information Extraction uses a globally pre-trained machine learning model that currently obtains better accuracy results with invoices and payment advices in the languages listed in Supported Languages and Countries. The team is working to support additional document types and languages in the near future.
When enrichment data has been uploaded and matches a prediction, it is added to the results from the GET /document/jobs/{id} endpoint. To have the enrichment data in the prediction, you need to have the following part in the query of the POST /document/jobs endpoint (it is usually already there by default):

    "enrichment": {
      "sender": {
        "top": 5,
        "type": "businessEntity",
        "subtype": "supplier"
      },
      "employee": {
        "type": "employee"
      }
    }
Do the following:
- Expand the POST /document/jobs endpoint.
- Click Try it out.
- Right-click Sample Invoice 1, then click Save link as to download locally the document file for this enrich business data example.
  You can also upload to the service and enrich any document file in PDF or single-page PNG and JPEG format that has content in headers and tables, such as an invoice. In this case, make sure the data you define in the payload field, in step 4, matches your document fields.
- Upload the document file you want to enrich.
- In options, enter the list of fields to be extracted from the uploaded file (documentNumber, taxId, purchaseOrderNumber, shippingAmount, netAmount, senderAddress, senderName, grossAmount, for example), the client id you created in step 3 (c_29, for example), the document type (invoice, for example), receivedDate (2020-02-17, for example), the enrichment data type businessEntity and subtype supplier.

    {
      "extraction": {
        "headerFields": [
          "documentNumber",
          "taxId",
          "purchaseOrderNumber",
          "shippingAmount",
          "netAmount",
          "senderAddress",
          "senderName",
          "grossAmount",
          "currencyCode",
          "receiverContact",
          "documentDate",
          "taxAmount",
          "taxRate",
          "receiverName",
          "receiverAddress"
        ],
        "lineItemFields": [
          "description",
          "netAmount",
          "quantity",
          "unitPrice",
          "materialNumber"
        ]
      },
      "clientId": "c_29",
      "documentType": "invoice",
      "receivedDate": "2020-02-17",
      "enrichment": {
        "sender": {
          "top": 5,
          "type": "businessEntity",
          "subtype": "supplier"
        },
        "employee": {
          "type": "employee"
        }
      }
    }

- Click Execute.
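The options value for this step can also be generated instead of edited by hand, which helps keep the enrichment part consistent with the client and subtype used earlier. The following is a minimal sketch; build_options is a hypothetical helper name, and it only produces the JSON string shown in this step without uploading a document.

```python
import json


def build_options(client_id, header_fields, line_item_fields,
                  document_type="invoice", received_date=None,
                  subtype="supplier", top=5):
    """Return the options JSON string for POST /document/jobs (sketch)."""
    options = {
        "extraction": {
            "headerFields": list(header_fields),
            "lineItemFields": list(line_item_fields),
        },
        "clientId": client_id,
        "documentType": document_type,
        # Enrichment part as shown in this step of the tutorial.
        "enrichment": {
            "sender": {"top": top, "type": "businessEntity", "subtype": subtype},
            "employee": {"type": "employee"},
        },
    }
    if received_date is not None:
        options["receivedDate"] = received_date
    return json.dumps(options, indent=2)


print(build_options(
    "c_29",
    header_fields=["documentNumber", "taxId", "senderName", "grossAmount"],
    line_item_fields=["description", "netAmount", "quantity"],
    received_date="2020-02-17",
))
```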
This is what the request should look like:
And this is what the response looks like:
Copy the id from the Response body to get enrichment data prediction in the next step.
- Step 9
When enrichment data has been uploaded and matches a prediction, it is added to the results from the GET /document/jobs/{id} endpoint.
Enrichment data is refreshed automatically every 4 hours. It might take up to 4 hours until the enrichment data prediction is available in the response. If the enrichment data prediction is NOT available in the response on your first try, perform steps 6 and 7 again (some hours later). Do not perform steps 9 and 10 before you see the enrichment data prediction in the response in step 7.
- Expand the GET /document/jobs/{id} endpoint.
- Click Try it out.
- Set returnNullValues and extractedValues to true.
- Enter the id received in the POST /document/jobs endpoint as the id.
- Click Execute.
You should receive a response like the following:
What just happened?
In this example, in the response, one of the extracted fields is the sender name Sliced Invoices. This information is enriched with the supplier ID enrichment data created in step 4. The prediction suggests the supplier ID BE0001 (from sender name Sliced Invoices) with higher probability than the supplier ID BE0002 (from sender name Sliced).
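To act on the prediction in a script, you would pick the candidate with the highest probability. The sketch below is illustrative only: it assumes that the response contains an enrichment object whose sender entry is a list of candidates with id and confidence keys. That shape is an assumption not shown in this tutorial, and the confidence values in the sample are invented.

```python
from typing import Optional


def best_sender_match(result: dict) -> Optional[str]:
    """Return the id of the sender candidate with the highest confidence.

    Assumed shape (not confirmed by this tutorial):
    {"enrichment": {"sender": [{"id": ..., "confidence": ...}, ...]}}
    """
    candidates = result.get("enrichment", {}).get("sender") or []
    if not candidates:
        return None  # no enrichment prediction available yet
    return max(candidates, key=lambda c: c["confidence"])["id"]


# Illustrative fragment; the confidence values are invented for the example.
sample = {
    "enrichment": {
        "sender": [
            {"id": "BE0002", "confidence": 0.2},
            {"id": "BE0001", "confidence": 0.7},
        ]
    }
}
print(best_sender_match(sample))
```

Returning None for an empty candidate list covers the case where the enrichment data has not been activated yet.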
You have now successfully used the business entity to get enrichment data predictions for the document you uploaded to Document Information Extraction.
- Step 10
To see a list of the enrichment data entries you have created:
- Expand the GET /data endpoint.
- Click Try it out.
- Choose the enrichment data type businessEntity.
- Enter your clientId.
- Choose the enrichment data subtype supplier.
- Click Execute.
You should receive a response like the following:
- Step 11
To delete enrichment data which has been uploaded before:
- Expand the DELETE /data endpoint.
- Click Try it out.
- Define the data in the payload field, so that the system knows which data entry (using, for example, the data entry ID) should be deleted.

    {
      "value": [
        {
          "id": "BE0001"
        },
        {
          "id": "BE0002"
        }
      ]
    }

- Choose the enrichment data type businessEntity.
- Enter your clientId.
- Choose the enrichment data subtype supplier.
- Click Execute.
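When many entries need to be removed, the delete payload can be generated from a list of IDs. This is a small sketch that reproduces the payload format shown in this step; build_delete_payload is a hypothetical helper name, and removing duplicate IDs is a convenience of the sketch rather than a service requirement.

```python
import json


def build_delete_payload(ids):
    """Return the DELETE /data payload for the given data entry IDs."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order
    return json.dumps({"value": [{"id": entry_id} for entry_id in unique_ids]})


print(build_delete_payload(["BE0001", "BE0002"]))
```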
You should receive a response like the following:
You can repeat step 2 and delete the client you created in step 3.
Congratulations, you have completed this tutorial.