GenAI Grounding Evaluations with SAP AI Core
This guide describes how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) in a Retrieval-Augmented Generation (RAG) scenario, with a specific focus on groundedness evaluation.
You will learn
- How to configure a grounding evaluation workflow in SAP AI Core.
- How to upload and manage RAG-based test datasets that include retrieved context.
- How to define grounding-specific evaluation metrics for assessing LLM responses.
- How to execute grounding evaluations and analyze the grounding results.
Prerequisites
- BTP Account
Set up your SAP Business Technology Platform (BTP) account.
Create a BTP Account
- For SAP Developers or Employees
Internal SAP stakeholders should refer to the following documentation: How to create BTP Account For Internal SAP Employee, SAP AI Core Internal Documentation
- For External Developers, Customers, or Partners
Follow this tutorial to set up your environment and entitlements: External Developer Setup Tutorial, SAP AI Core External Documentation
- Create BTP Instance and Service Key for SAP AI Core
Follow the steps to create an instance and generate a service key for SAP AI Core:
Create Service Key and Instance
- AI Core Setup Guide
Step-by-step guide to set up and get started with SAP AI Core:
AI Core Setup Tutorial
- Service Plan
An Extended SAP AI Core service plan is required, as the Generative AI Hub is not available in the Free or Standard tiers. For more details, refer to SAP AI Core Service Plans.
- Orchestration Deployment
Ensure at least one orchestration deployment is ready to be consumed during this process.
Refer to this tutorial to understand the basic consumption of GenAI models using orchestration.
- Basic Knowledge
Familiarity with the orchestration workflow is recommended.
- Install Dependencies
Install the required Python packages from the provided requirements.txt file (for example, with pip install -r requirements.txt).
Download requirements.txt
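Before moving on, it can help to confirm that the service key created in the prerequisites contains everything needed later. The following is a minimal sketch, not an official SAP utility. It assumes the key is a JSON document with the fields clientid, clientsecret, url, and serviceurls.AI_API_URL, and that the token endpoint and API base are formed by appending /oauth/token and /v2. The function name parse_service_key is hypothetical. Verify the field names against your own key.

```python
import json

# Top-level fields assumed to be present in an SAP AI Core service key.
REQUIRED_FIELDS = ("clientid", "clientsecret", "url")


def parse_service_key(raw: str) -> dict:
    """Validate a service key JSON string and derive connection settings.

    Hypothetical helper: the field names and URL suffixes below are
    assumptions about the key layout, not taken from this guide.
    """
    key = json.loads(raw)

    # Collect every missing field so the error lists all of them at once.
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    api_url = key.get("serviceurls", {}).get("AI_API_URL")
    if not api_url:
        missing.append("serviceurls.AI_API_URL")
    if missing:
        raise ValueError("service key is missing: " + ", ".join(missing))

    return {
        "auth_url": key["url"].rstrip("/") + "/oauth/token",  # assumed suffix
        "client_id": key["clientid"],
        "client_secret": key["clientsecret"],
        "api_base": api_url.rstrip("/") + "/v2",  # assumed suffix
    }


if __name__ == "__main__":
    # Placeholder values only; load your real key from a file kept out of
    # version control.
    sample = json.dumps({
        "clientid": "my-client-id",
        "clientsecret": "my-client-secret",
        "url": "https://example.authentication.invalid",
        "serviceurls": {"AI_API_URL": "https://api.ai.example.invalid"},
    })
    settings = parse_service_key(sample)
    print(settings["auth_url"])
    print(settings["api_base"])
```

Failing early on an incomplete key tends to be easier to debug than an authentication error in the middle of an evaluation run.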
In RAG-based enterprise applications, model responses must be grounded in trusted data sources such as enterprise documents, knowledge bases, or curated repositories. SAP AI Core’s evaluation capabilities allow you to systematically measure grounding quality, retrieval relevance, and alignment of generated responses with source content.
💡 Right-click the requirements.txt link above and choose “Save link as…” to download it directly.
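To make the idea of groundedness from the overview above more concrete, here is a toy sketch of a RAG test record and a naive token-overlap score. This is not the metric SAP AI Core computes. The record fields (question, retrieved_context, response) and the function names are hypothetical and chosen only for illustration.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_overlap(response: str, contexts: list) -> float:
    """Fraction of response tokens that also occur in the retrieved context.

    A naive illustrative proxy: 1.0 means every response token appears
    somewhere in the context, 0.0 means none do.
    """
    response_tokens = tokenize(response)
    if not response_tokens:
        return 0.0
    context_tokens = set()
    for passage in contexts:
        context_tokens |= tokenize(passage)
    return len(response_tokens & context_tokens) / len(response_tokens)


if __name__ == "__main__":
    # Hypothetical test record: a question, the passages the retriever
    # returned, and the model's answer.
    record = {
        "question": "What is the capital of France?",
        "retrieved_context": ["The capital of France is Paris."],
        "response": "Paris is the capital of France.",
    }
    score = groundedness_overlap(record["response"], record["retrieved_context"])
    print(f"overlap score: {score:.2f}")
```

Token overlap is easy to fool: an answer that swaps in a single wrong entity still shares most of its words with the context and scores high. This is one reason dedicated grounding metrics are preferable to simple lexical overlap.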
Below are the steps to run a GenAI evaluation in SAP AI Core.