Create Schema for Purchase Order Documents
- How to create a schema for purchase order documents
- How to add standard and custom data fields for the header and line item information of purchase order documents
The core functionality of Document Information Extraction is to automatically extract structured information from documents using machine learning. The service supports extraction from the following standard document types out of the box: invoices, payment advices, and purchase orders. You can customize the information extracted from these document types by creating a schema and adding the specific information that you have in your documents. Additionally, you can add completely new document types.
If you are new to the Document Information Extraction UI, first try out the tutorial: Use Machine Learning to Extract Information from Documents with Document Information Extraction UI.
- Step 1
- Open the Document Information Extraction UI, as described in the tutorial: Use Trial to Set Up Account for Document Information Extraction and Go to Application or Use Free Tier to Set Up Account for Document Information Extraction and Go to Application.
If you HAVE NOT just used the Set up account for Document Information Extraction booster to create a service instance for Document Information Extraction and subscribe to the Document Information Extraction UI, observe the following:
- To access the Schema Configuration and Template features, ensure that you use the `blocks_of_100` plan to create the service instance for Document Information Extraction Trial.
- Make sure you’re assigned to the role collection `Document_Information_Extraction_UI_Templates_Admin_trial` (or `Document_Information_Extraction_UI_Templates_Admin` if you’re using the free tier option). For more details about how to assign role collections, see step 2 in the tutorial: Use Trial to Subscribe to Document Information Extraction Trial UI, or step 3 in the tutorial: Use Free Tier to Subscribe to Document Information Extraction UI.
- After assigning new role collections, Log Off from the UI application to see all features you’re now entitled to try out.
- In the left navigation pane, click Schema Configuration.
Here, you find the SAP schemas. The Document Information Extraction UI provides preconfigured SAP schemas for the following standard document types:
- Purchase order
- Payment advice
- Invoice
In addition, there’s an SAP schema for custom documents (`SAP_OCROnly_schema`). You can use SAP schemas unchanged to upload documents.
NOTE: You can’t edit or delete original SAP schemas. Always create a copy and then edit the default fields, as required.
CAUTION:
When using the free tier option for Document Information Extraction or a trial account, be aware of the technical limits listed in Free Tier Option and Trial Account Technical Constraints.
- Step 2
To create your own schema, click Create.
In the dialog that appears, enter a name for your schema, `purchase_order_schema`, for example. Note that the name cannot include blanks. Then select Purchase Order as your Document Type.
Click Create to create the schema.
Now, your schema shows up in the list. Access the schema by clicking on the row.
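The naming rule from this step can be checked before you type a name into the dialog. The following is a minimal sketch, assuming the only constraint is the one stated above (no blanks); the function name `is_valid_schema_name` is hypothetical, and the UI may enforce further rules that are not covered here.

```python
def is_valid_schema_name(name: str) -> bool:
    """Return True if the name is non-empty and contains no whitespace.

    Assumption: "no blanks" is the only rule; this is not the UI's own check.
    """
    return bool(name) and not any(ch.isspace() for ch in name)


print(is_valid_schema_name("purchase_order_schema"))  # True
print(is_valid_schema_name("purchase order schema"))  # False
```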
- Step 3
A schema contains a list of header fields and line item fields representing the target information that you want to extract from a particular type of document. You must select a schema when you add documents to the Document Information Extraction UI.
You can either create your own schema from scratch or use a preconfigured SAP schema. If you don’t want to configure your own schema, you can select the appropriate SAP schema unedited when you add a document on the Document Information Extraction UI. No configuration is needed when you use SAP schemas in this way. Alternatively, you can copy a suitable SAP schema and edit the default fields in line with your needs.
Document Information Extraction already includes a number of fields that it can extract. See here which header fields are supported and here which line item fields are supported. Additionally, you can define custom fields. In the next step, you’ll learn about both.
The image below shows an example of a purchase order. All the fields that you define in your schema in this tutorial are highlighted. The header fields represent all information outside the table that occurs once. The line item fields represent all information within the table that occurs for each product. You can, of course, extend or reduce the information that you want to extract.
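The difference between header fields and line item fields can be pictured as a small data structure: header fields appear once per document, while line item fields repeat for every product row. The sketch below is only an illustration under assumptions. The dictionary layout and all values are made up, and it is not the response format of the service; the field names are taken from later steps of this tutorial.

```python
# Hypothetical in-memory picture of one purchase order (values are invented).
purchase_order = {
    # Header fields: information outside the table, occurring once.
    "header_fields": {
        "purchaseOrderNumber": "PO-0001",
        "orderCurrency": "USD",
    },
    # Line item fields: one entry per product row in the table.
    "line_items": [
        {"skuNumber": "SKU-1", "quantity": 2},
        {"skuNumber": "SKU-2", "quantity": 5},
    ],
}

# Header values are read once; line item values are read per row.
total_quantity = sum(item["quantity"] for item in purchase_order["line_items"])
print(purchase_order["header_fields"]["purchaseOrderNumber"], total_quantity)
```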
- Step 4
To define your first header field, click Add to the right of the heading Header Fields.
For each field, you have to enter a name, a data type, a setup type, and optionally a default extractor and a description. The available data types are `string`, `number`, `date`, `discount`, `currency`, and `country/region`.
The available setup types are `auto` and `manual`. The setup type `auto` supports extraction using the service’s machine learning models. You must specify a default extractor (standard fields supported by Document Information Extraction) for this setup type. It can only be used in schemas created for standard document types. The setup type `manual` supports extraction using a template. It’s available in schemas created for standard or custom document types.
If you’d like to find out more about setup types and how they relate to document types, extraction methods, and default extractors, see Setup Types.
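The field rules described above can be summarized as a small validation routine. This is a minimal sketch under assumptions: `FieldDefinition` and `validate_field` are hypothetical names, the check runs locally and does not call the service, and it encodes only the rules stated in this step (allowed data types, allowed setup types, and the requirements for `auto`).

```python
from dataclasses import dataclass
from typing import List, Optional

DATA_TYPES = {"string", "number", "date", "discount", "currency", "country/region"}
SETUP_TYPES = {"auto", "manual"}


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical local model of one schema field (not an SAP API type)."""
    name: str
    data_type: str
    setup_type: str
    default_extractor: Optional[str] = None
    description: str = ""


def validate_field(field: FieldDefinition, standard_document_type: bool = True) -> List[str]:
    """Return a list of rule violations; an empty list means the field looks fine."""
    problems = []
    if field.data_type not in DATA_TYPES:
        problems.append(f"unknown data type: {field.data_type}")
    if field.setup_type not in SETUP_TYPES:
        problems.append(f"unknown setup type: {field.setup_type}")
    if field.setup_type == "auto":
        # 'auto' needs a default extractor and a standard document type.
        if not field.default_extractor:
            problems.append("setup type 'auto' requires a default extractor")
        if not standard_document_type:
            problems.append("setup type 'auto' is only available for standard document types")
    return problems


print(validate_field(FieldDefinition("purchaseOrderNumber", "string", "auto", "documentNumber")))
print(validate_field(FieldDefinition("purchaseOrderStatus", "string", "auto")))
```

The first call prints an empty list, and the second reports the missing default extractor.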
As your first header field, add the purchase order number, which identifies your document.
- Enter an appropriate name for your field, `purchaseOrderNumber`, for example.
- Select `string` for the Data Type. Note that a document number is a `string`, even though it consists of digits, because it is an arbitrary identifier with no numeric meaning. In contrast, price is an example of the data type `number`.
- As all business documents have a unique identification, Document Information Extraction already includes a standard field. Select `auto` for the Setup Type and then select `documentNumber` for the Default Extractor.
- Click Save to add the header field.
The field now displays in your list of header fields, where you again find all the information that you’ve just entered. You can edit or delete the field by clicking the respective icons on the right.
You’ve now added your first header field that uses a default extractor from Document Information Extraction. Next, you’ll add your first custom header field.
Click Add again to open the dialog.
- Enter an appropriate name for your field, `purchaseOrderStatus`, for example.
- Select `string` for the Data Type.
- As Document Information Extraction offers no equivalent field, select `manual` for the Setup Type. Click Save to add the field.
You’ve now added your first custom field. Go ahead and add the header fields shown in the table and image below. Pay attention to which fields have a default extractor and which don’t. Feel free to extend or reduce the list of header fields.
| Field Name | Data Type | Setup Type | Default Extractor |
| --- | --- | --- | --- |
| `purchaseOrderNumber` | string | auto | `documentNumber` |
| `purchaseOrderStatus` | string | manual | none |
| `vendor` | string | auto | `senderName` |
| `vendorSite` | string | auto | `senderAddress` |
| `shipTo` | string | auto | `shipToAddress` |
| `orderType` | string | manual | none |
| `terms` | string | auto | `paymentTerms` |
| `orderCurrency` | string | auto | `currencyCode` |
| `entryDate` | date | auto | `documentDate` |
| `shipDate` | date | auto | `deliveryDate` |
| `cancelDate` | date | manual | none |
| `totalCostNet` | number | auto | `netAmount` |
| `totalCostGross` | number | auto | `grossAmount` |
| `totalVatAmount` | number | manual | none |

NOTE: The Document Information Extraction UI also includes a feature that allows you to group schema fields by category. To use this feature, you must first activate it under UI Settings. For simplicity’s sake, we haven’t included the feature in this tutorial. If you’d like to find out more about it, see Schema Field Categories.
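To keep track of which header fields rely on a default extractor and which will need a template later, the header field list from this step can be written down as plain data. The sketch below is a local helper under assumptions: the tuple layout and the function name `split_by_setup_type` are hypothetical, and `None` stands for the "none" entries in the table.

```python
# (field name, data type, setup type, default extractor or None)
HEADER_FIELDS = [
    ("purchaseOrderNumber", "string", "auto", "documentNumber"),
    ("purchaseOrderStatus", "string", "manual", None),
    ("vendor", "string", "auto", "senderName"),
    ("vendorSite", "string", "auto", "senderAddress"),
    ("shipTo", "string", "auto", "shipToAddress"),
    ("orderType", "string", "manual", None),
    ("terms", "string", "auto", "paymentTerms"),
    ("orderCurrency", "string", "auto", "currencyCode"),
    ("entryDate", "date", "auto", "documentDate"),
    ("shipDate", "date", "auto", "deliveryDate"),
    ("cancelDate", "date", "manual", None),
    ("totalCostNet", "number", "auto", "netAmount"),
    ("totalCostGross", "number", "auto", "grossAmount"),
    ("totalVatAmount", "number", "manual", None),
]


def split_by_setup_type(fields):
    """Return (auto fields mapped to their extractor, list of manual field names)."""
    auto = {name: extractor for name, _, setup, extractor in fields if setup == "auto"}
    manual = [name for name, _, setup, _ in fields if setup == "manual"]
    return auto, manual


auto_fields, manual_fields = split_by_setup_type(HEADER_FIELDS)
print(len(auto_fields), "fields use a default extractor")
print("custom fields:", manual_fields)
```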
- Step 5
Next, you need to define the line item fields. As your first line item field, add the SKU (stock keeping unit) that uniquely identifies an article.
Click Add to the right of the heading Line Item Fields.
In the dialog, proceed as follows:
- Enter an appropriate name for your field, `skuNumber`, for example.
- Select `string` for the Data Type.
- Select `manual` for the Setup Type and click Save to add the field.
The field now displays in your list of line item fields, where you find all the information again that you’ve just entered.
You’ve now added your first line item field. Go ahead and add the line item fields shown in the table and image below. Pay attention to which fields have a default extractor and which don’t. Feel free to extend or reduce the list of line item fields.
| Field Name | Data Type | Setup Type | Default Extractor |
| --- | --- | --- | --- |
| `skuNumber` | string | manual | none |
| `description` | string | auto | `description` |
| `upcNumber` | string | manual | none |
| `quantity` | number | auto | `quantity` |
| `unitPriceNet` | number | manual | none |
| `unitPriceGross` | number | manual | none |
| `vatRate` | number | manual | none |
| `totalCost` | number | manual | none |
- Step 6
Once you’ve added all header and line item fields, you need to activate the schema so that you can use it to extract information from documents. Right now, the schema has the status `DRAFT`, indicating that it cannot be used yet.
To activate the schema, click Activate.
Now, the status of your schema changes to `ACTIVE`. To make changes to your schema, you have to Deactivate it first.
Congratulations, you’ve created and activated your own schema for purchase order documents.
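The status rules from this step can be pictured as a tiny state model: a schema starts as `DRAFT`, becomes `ACTIVE` when activated, and must be deactivated before it can be changed. The class below is a hypothetical local sketch, not the service's implementation. In particular, it assumes that deactivating returns the schema to `DRAFT`, which this tutorial does not state explicitly.

```python
class SchemaDraft:
    """Hypothetical model of the schema lifecycle described in this step."""

    def __init__(self, name):
        self.name = name
        self.status = "DRAFT"  # new schemas cannot be used for extraction yet
        self.field_names = []

    def add_field(self, field_name):
        # Changes are only allowed while the schema is not active.
        if self.status == "ACTIVE":
            raise RuntimeError("Deactivate the schema before changing it")
        self.field_names.append(field_name)

    def activate(self):
        self.status = "ACTIVE"

    def deactivate(self):
        # Assumption: a deactivated schema goes back to DRAFT.
        self.status = "DRAFT"


schema = SchemaDraft("purchase_order_schema")
schema.add_field("purchaseOrderNumber")
schema.activate()
try:
    schema.add_field("skuNumber")  # rejected while ACTIVE
except RuntimeError as err:
    print(err)
schema.deactivate()
schema.add_field("skuNumber")  # allowed again after deactivation
print(schema.status, schema.field_names)
```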
In the next tutorial: Create Template for Purchase Order Documents, you’ll create a template that uses your schema, and associate documents with your template to show the Document Information Extraction service where each field is located in the document.