Skip to Content

GenAI Grounding Evaluations with SAP AI Core

This guide describes how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) in a Retrieval-Augmented Generation (RAG) scenario, with a specific focus on groundedness evaluation.
You will learn
  • How to configure a grounding evaluation workflow in SAP AI Core.
  • How to upload and manage RAG-based test datasets that include retrieved context.
  • How to define grounding-specific evaluation metrics for assessing LLM responses.
  • How to execute grounding evaluations and analyze the grounding results.
I321506Smita NaikApril 1, 2026
Created by
I321506
April 1, 2026
Contributors
I321506

Prerequisites

  1. BTP Account
    Set up your SAP Business Technology Platform (BTP) account.
    Create a BTP Account
  2. For SAP Developers or Employees
    Internal SAP stakeholders should refer to the following documentation: How to create BTP Account For Internal SAP Employee, SAP AI Core Internal Documentation
  3. For External Developers, Customers, or Partners
    Follow this tutorial to set up your environment and entitlements: External Developer Setup Tutorial, SAP AI Core External Documentation
  4. Create BTP Instance and Service Key for SAP AI Core
    Follow the steps to create an instance and generate a service key for SAP AI Core:
    Create Service Key and Instance
  5. AI Core Setup Guide
    Step-by-step guide to set up and get started with SAP AI Core:
    AI Core Setup Tutorial
  6. An Extended SAP AI Core service plan is required, as the Generative AI Hub is not available in the Free or Standard tiers. For more details, refer to
    SAP AI Core Service Plans
  7. Orchestration Deployment
    Ensure at least one orchestration deployment is ready to be consumed during this process.
    Refer to this tutorial understand the basic consumption of GenAI models using orchestration.
  8. Basic Knowledge
    Familiarity with the orchestration workflow is recommended
  9. Install Dependencies
    Install the required Python packages using the requirements.txt file provided.
    Download requirements.txt

In RAG-based enterprise applications, model responses must be grounded in trusted data sources such as enterprise documents, knowledge bases, or curated repositories. SAP AI Core’s evaluation capabilities allow you to systematically measure grounding quality, retrieval relevance, and alignment of generated responses with source content.

💡 Right-click the link above and choose “Save link as…” to download it directly.

Below are the Steps to Run a GenAI Evaluation in SAP AI Core

  • Step 1

    This tutorial uses a structured evaluation dataset named emanual.csv Placed inside the folder DATASET_RAG

    You can access the DATASET_RAG.zip from the GitHub repository.

    NOTE: If you download the ZIP file, extract it and navigate to the DATASET_RAG folder. Place the entire folder in your designated location for further use.

    Dataset

    It leverages the publicly available emanual.csv, which contains commonly asked emanual questions. Each entry includes:

    - topic (user query)
    - answer
    - context
    

    How it works

    • A query and its retrieved context are sent to the model.

    • The model generates a grounded response.

    • The grounding metrics evaluate if the output faithfully uses the provided context.

  • Step 2

    For hands-on execution and end-to-end reference, use the accompanying Evaluation Grounding Notebook. It includes complete Python code examples that align with each step of this tutorial — from dataset preparation and artifact registration to configuration creation, execution, and result retrieval.

    💡 Even though this tutorial provides stepwise code snippets for clarity, the notebook contains all required imports, object initializations, and helper functions to run the flow seamlessly in one place.

    To use the notebook:
    - Download and open notebook in your preferred environment (e.g., VS Code, JupyterLab).
    - Configure your environment variables such as AICORE_BASE_URL, AICORE_AUTH_TOKEN, and object store credentials .
    - Execute each cell in order to reproduce the complete Evaluation Grounding workflow demonstrated in this tutorial.

  • Step 3

    Important Note: Please note that for using the document grounding service, your request must contain the document grounding label set to true. Therefore, existing resource groups without the label won’t work.

  • Step 4
  • Step 5

    You can upload the orchestration run files, grounding test datasets, and any optional metric definitions to SAP AI Core using the Tracking API. To upload these files, you must first register an object store secret containing your object store credentials

    ⚠️ Important Note (Must Read)

    • You must create an object store secret named default to store output artifacts from orchestration runs. This is mandatory.
    • For input artifacts, you may create additional object store secrets with different names if needed.
    • If a secret named default is not configured, orchestration runs will fail due to missing output target setup.
  • Step 6

    In the next step, we create a secret that enables grounding by adding on the “labels” config. This generic secret needs to be created to provide details of the hyperscaler and bucket details so that grounding service will know how to retrieve data from it.

  • Step 7

    Before running grounding evaluations, you must create a grounding pipeline in SAP AI Core. This pipeline is responsible for reading documents from your object store, processing them, and preparing them for retrieval.

  • Step 8
  • Step 9
  • Step 10
  • Step 11
Back to top