Use Trial to Extract Information from Custom Documents with Generative AI and Document Information Extraction

Learn how to use Document Information Extraction with generative AI to automate the extraction of information from custom document types using large language models (LLMs).
You will learn
  • How to create and activate your own schema for custom documents
  • How to define the fields that you want to extract from a custom document
  • How to upload a custom document to the Document Information Extraction UI
  • How to get extraction results using the schema you’ve created and LLMs
Juliana Morais, January 17, 2025
Created by Juliana-Morais, November 7, 2023
Contributors: Steve-Rizza, Juliana-Morais

In this tutorial, you’ll create a schema and define the fields that you want to extract from custom document types using LLMs. You’ll then use your schema to get field value predictions for various documents that you upload to the Document Information Extraction UI, including delivery notes and birth certificates.

  • Step 1

    Before you upload a custom document for extraction, you’ll create a corresponding schema. In this tutorial, we provide sample files and settings for the following custom documents:

    • Delivery note

    • Résumé

    • Birth certificate

    • Work contract

    In the first example, you’ll use a delivery note. After working through this example, you can go on and try out the other custom document types covered in Step 5.

    1. Open the Document Information Extraction UI, as described in the tutorial: Use Trial to Set Up Account for Document Information Extraction and Go to Application.

    2. In the left navigation pane, click Schema Configuration.

    3. To create your own schema, click Create.

    4. In the dialog that opens, enter a name for your own schema – for example, delivery_note_schema. Note that the name can’t include blanks. Next, select Custom as your Document Type and Document as the OCR Engine Type.

    5. Click Create to create the schema.

    6. Your schema now appears in the list. Access the schema by clicking on it.
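
    The naming rule from item 4 (no blanks in the schema name) can be checked before you type the name into the dialog. The following is a minimal sketch, not part of the service: the function name `is_valid_schema_name` is hypothetical, and the UI may enforce further rules that are not covered here.

```python
import re


def is_valid_schema_name(name: str) -> bool:
    """Return True if the name is non-empty and contains no whitespace.

    Only the "no blanks" rule from the tutorial is checked; any other
    restriction the UI may apply is outside this sketch.
    """
    return bool(name) and re.search(r"\s", name) is None


print(is_valid_schema_name("delivery_note_schema"))  # True
print(is_valid_schema_name("delivery note schema"))  # False
```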

  • Step 2

    To add your first header field, click Add.

    You must enter a field name and data type for each custom field. The available data types are string, number, date, discount, currency, and country/region. Default extractors aren’t available for custom documents. You can also optionally add a field label (user-friendly name) and a description.

    A description can be useful if you want to include an explanation or additional context for a field.

    You can also use a description for other purposes, such as categorizing fields. For example, in the description of the field limitedContract in work contracts, you could specify yes if the contract is limited and no if it is not. Or you could specify that the line item field skillType in a résumé can be technical or language.

    See Step 5 for examples of schemas that use the description field.
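
    The rules above (name and data type required, label and description optional) could be modeled locally as follows. This is a sketch for planning a schema offline, assuming a hypothetical `SchemaField` class; it is not the data model used by Document Information Extraction itself.

```python
from dataclasses import dataclass

# Data types listed in this tutorial for custom fields.
DATA_TYPES = {"string", "number", "date", "discount", "currency", "country/region"}


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical local model of one schema field."""

    name: str              # required
    data_type: str         # required, one of DATA_TYPES
    label: str = ""        # optional user-friendly name
    description: str = ""  # optional explanation or categorization hint

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("a field name is required")
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type!r}")


# A description used to categorize values, as in the limitedContract example.
limited = SchemaField(
    name="limitedContract",
    data_type="string",
    description="yes if the contract is limited, no if it is not",
)
print(limited.name, limited.data_type)
```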

    As your first header field, add the number of the delivery note.

    1. Enter the name for your field – for example, deliveryNoteNumber.

    2. Select string as the Data Type.

    3. Use auto as the Setup Type and click Save.

    Note that when you use the setup type auto without a default extractor, LLMs are used to extract the information from the document. The setup type manual supports extraction using a template. For more details on this approach, take a look at the tutorial mission: Shape Machine Learning to Process Custom Business Documents.
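
    The relationship between setup type and extraction approach described in this note can be summarized as a small lookup. This is an illustrative sketch only: `extraction_method` is a hypothetical helper, and the branch for auto with a default extractor is an assumption, since the tutorial states only that default extractors aren't available for custom documents.

```python
def extraction_method(setup_type: str, has_default_extractor: bool = False) -> str:
    """Map a field's setup type to the extraction approach named in the tutorial."""
    if setup_type == "manual":
        return "template"
    if setup_type == "auto":
        # Assumption: with a default extractor, that extractor is used
        # instead of an LLM. Custom documents have no default extractors.
        return "default extractor" if has_default_extractor else "LLM"
    raise ValueError(f"unknown setup type: {setup_type!r}")


print(extraction_method("auto"))    # LLM
print(extraction_method("manual"))  # template
```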

    The field now appears in your list of header fields, where you can see all the information that you’ve just entered. You can edit or delete the field by clicking the respective icons on the right.

    Click Add again to open the Add Data Field dialog.

    1. Enter the name for your second header field – for example, purchaseOrderNumber.

    2. Select string as the Data Type.

    3. Use auto as the Setup Type and click Save.

    Now, go ahead and add the remaining header fields and line item fields shown in the table and image below. Pay attention to the different data types and notice that the last three fields are line item fields (not header fields). Feel free to extend or reduce the list of fields.

    Field Type       Field Name           Data Type  Setup Type
    header field     deliveryNoteNumber   string     auto
    header field     purchaseOrderNumber  string     auto
    header field     deliveryDate         date       auto
    line item field  materialNumber       string     auto
    line item field  quantity             number     auto
    line item field  unitOfMeasure        string     auto
    [Image: schema fields in the Document Information Extraction UI]
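
    If you keep a record of your schema outside the UI, the delivery note fields from the table can be written down as plain data and split into header fields and line item fields. This is a sketch under that assumption; the tuple layout and the `split_fields` helper are hypothetical and unrelated to any service API.

```python
# (field type, field name, data type, setup type), copied from the table.
DELIVERY_NOTE_FIELDS = [
    ("header", "deliveryNoteNumber", "string", "auto"),
    ("header", "purchaseOrderNumber", "string", "auto"),
    ("header", "deliveryDate", "date", "auto"),
    ("line item", "materialNumber", "string", "auto"),
    ("line item", "quantity", "number", "auto"),
    ("line item", "unitOfMeasure", "string", "auto"),
]


def split_fields(fields):
    """Return (header field names, line item field names)."""
    header = [name for kind, name, _, _ in fields if kind == "header"]
    line_items = [name for kind, name, _, _ in fields if kind == "line item"]
    return header, line_items


header_fields, line_item_fields = split_fields(DELIVERY_NOTE_FIELDS)
print(header_fields)     # ['deliveryNoteNumber', 'purchaseOrderNumber', 'deliveryDate']
print(line_item_fields)  # ['materialNumber', 'quantity', 'unitOfMeasure']
```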

    NOTE: The Document Information Extraction UI also includes a feature that allows you to group schema fields by category. To use this feature, you must first activate it under UI Settings. For simplicity’s sake, we haven’t included the feature in this tutorial. If you’d like to find out more about it, see Schema Field Categories.

    To make sure LLMs are used to extract the information from custom documents, choose auto as the setup type when you add a field to a schema.

  • Step 3

    Once you’ve added all your fields, you need to activate the schema so that you can use it to extract information from documents. Right now, the schema has the status DRAFT, indicating that it can’t be used yet.

    To activate the schema, click Activate.

    Now, the status of your schema changes to ACTIVE. To make changes to your schema, you must first Deactivate it.

    Congratulations, you’ve now created and activated your own schema for delivery note documents.
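
    The status rules in this step (a DRAFT schema can't be used for extraction, and an ACTIVE schema can't be changed until it is deactivated) can be pictured as a small state model. This is a sketch with a hypothetical `Schema` class; it assumes that a deactivated schema returns to DRAFT, which the tutorial does not state explicitly.

```python
class Schema:
    """Hypothetical local model of the schema lifecycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "DRAFT"  # new schemas start as drafts
        self.fields: list[str] = []

    def add_field(self, field_name: str) -> None:
        if self.status == "ACTIVE":
            raise RuntimeError("deactivate the schema before changing it")
        self.fields.append(field_name)

    def activate(self) -> None:
        self.status = "ACTIVE"

    def deactivate(self) -> None:
        # Assumption: deactivating puts the schema back into DRAFT.
        self.status = "DRAFT"

    def can_extract(self) -> bool:
        return self.status == "ACTIVE"


schema = Schema("delivery_note_schema")
schema.add_field("deliveryNoteNumber")
print(schema.can_extract())  # False
schema.activate()
print(schema.can_extract())  # True
```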

  • Step 4
    1. Access Document from the navigation on the left of the screen, then click + to upload the delivery note document.

    2. On the Select Document screen, choose Custom for the Document Type.

    3. Select the Schema you created (delivery_note_schema).

    4. Right-click on the link, then click Save link as to download the delivery note sample document locally.

    5. Drag and drop the file directly or click + to upload the sample document.

    6. Click Confirm.

      The document status changes from PENDING to DONE.

    7. Access the document by clicking on it. You now see the page preview of the document file you uploaded, and the information extracted from the delivery note header fields and line items using LLMs and the schema that you created.

    Congratulations, you’ve now successfully extracted information from a delivery note document using the schema configuration feature from Document Information Extraction and LLMs.
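
    In the UI you simply wait for the document status to change from PENDING to DONE. If you were to automate that wait, the logic might look like the sketch below. The `wait_until_done` helper and the `get_status` callable are hypothetical stand-ins; no real service call is made, and statuses other than PENDING and DONE are not handled.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    interval: float = 1.0,
    max_checks: int = 30,
) -> bool:
    """Check the status repeatedly; return True once it is DONE."""
    for _ in range(max_checks):
        if get_status() == "DONE":
            return True
        time.sleep(interval)  # wait before asking again
    return False  # gave up after max_checks attempts


# Stand-in status source for illustration: PENDING twice, then DONE.
statuses = iter(["PENDING", "PENDING", "DONE"])
print(wait_until_done(lambda: next(statuses), interval=0))  # True
```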

  • Step 5

    You can now repeat the steps previously described for the other custom documents: résumé, birth certificate, and work contract (using the suggested fields or your own fields).

    Résumé:

    Create the header fields shown in the table and image below. Don’t forget to add a description for degree, employer, and jobTitle (as in the image). Feel free to extend or reduce the list of fields.

    Field Type    Field Name  Data Type  Setup Type
    header field  firstName   string     auto
    header field  lastName    string     auto
    header field  degree      string     auto
    header field  employer    string     auto
    header field  jobTitle    string     auto
    [Image: schema fields in the Document Information Extraction UI]

    Birth certificate:

    Create the header fields shown in the table and image below. Pay attention to the different data types and don’t forget to add a description for name (as in the image). Feel free to extend or reduce the list of fields.

    Field Type    Field Name          Data Type  Setup Type
    header field  name                string     auto
    header field  birthDate           date       auto
    header field  motherName          string     auto
    header field  fatherName          string     auto
    header field  registrationNumber  string     auto
    [Image: schema fields in the Document Information Extraction UI]

    Work contract:

    Create the header fields shown in the table and image below. Pay attention to the different data types and don’t forget to add a description for all fields (as in the image). Feel free to extend or reduce the list of fields.

    Field Type    Field Name       Data Type  Setup Type
    header field  companyName      string     auto
    header field  employeeName     string     auto
    header field  limitedContract  string     auto
    header field  salary           number     auto
    header field  startDate        date       auto
    [Image: schema fields in the Document Information Extraction UI]
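
    Because limitedContract has the data type string, its extracted value arrives as text. If the field description asks for yes or no, as suggested in Step 2, a downstream script could turn that text into a Boolean. This is a minimal sketch assuming exactly those two answers; `parse_limited_contract` is a hypothetical helper, and the values an LLM actually returns may vary.

```python
def parse_limited_contract(value: str) -> bool:
    """Convert an extracted yes/no string into a Boolean."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    # Anything else is unexpected under the yes/no assumption.
    raise ValueError(f"expected 'yes' or 'no', got {value!r}")


print(parse_limited_contract("Yes"))   # True
print(parse_limited_contract(" no "))  # False
```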

    Congratulations, you’ve completed this tutorial. Feel free to repeat the steps using your own custom documents.