Using Evaluation Service available in SAP AI Core
Beginner
45 min.
This tutorial demonstrates how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) using two different approaches: **Prompt Registry** and **Orchestration Registry**. It guides you through dataset preparation, environment setup, configuration creation, execution, and result analysis in a unified and simplified workflow.
You will learn
- How to prepare and organize datasets for evaluation.
- How to choose between Prompt Registry and Orchestration Registry approaches.
- How to configure and run evaluations in SAP AI Core.
- How to analyze and interpret aggregated evaluation results.
Prerequisites
- Setup Environment: Ensure your instance and AI Core credentials are properly configured according to the steps provided in the initial tutorial.
- Orchestration Deployment: Ensure at least one orchestration deployment is ready to be consumed during this process. Refer to this tutorial to understand the basic consumption of GenAI models using orchestration.
- Basic Knowledge: Familiarity with the orchestration workflow is recommended.
- Install Dependencies: Install the required Python packages using the requirements.txt file provided.
  Download requirements.txt
  💡 Right-click the link above and choose "Save link as…" to download it directly.
This tutorial extends the Quick Start tutorial and is intended for Application Developers and Data Scientists who already know the basics of GenAI workflows in SAP AI Core.
Below are the steps to run a GenAI evaluation in SAP AI Core.
Pre-Read
The structure of the input data should be as follows:
```
Root
├── PUT_YOUR_PROMPT_TEMPLATE_HERE
│   └── prompt_template.json
├── PUT_YOUR_DATASET_HERE
│   └── medicalqna_dataset.csv
└── PUT_YOUR_CUSTOM_METRIC_HERE
    ├── custom-llm-metric.json
    └── custom-llm-metric.jsonl
```
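Before running an evaluation, it can help to verify that all three input folders exist. The sketch below is a minimal helper, not part of the SAP AI Core SDK; the folder names come from the tree above, and the function name `missing_folders` is illustrative.

```python
from pathlib import Path

# Expected input folders, as shown in the tutorial's folder tree.
EXPECTED_FOLDERS = [
    "PUT_YOUR_PROMPT_TEMPLATE_HERE",
    "PUT_YOUR_DATASET_HERE",
    "PUT_YOUR_CUSTOM_METRIC_HERE",
]

def missing_folders(root):
    """Return the names of expected input folders absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

An empty return value means the layout matches; any names returned point to folders you still need to create.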
Dataset and Configuration:
To run this evaluation, all required input files must be placed inside the folder structure provided in the repository. You can download or clone the complete folder from the link below and place your files inside the respective folders: Download / Open Full Folder Structure
1. **Prompt Template Configuration (`PUT_YOUR_PROMPT_TEMPLATE_HERE`)**
* Place one or more prompt template configurations as JSON files in this folder.
2. **Test Dataset (`PUT_YOUR_DATASET_HERE`)**
* The test dataset should be a CSV, JSON, or JSONL file containing prompt variables, ground truth references, and other data required for evaluation.
3. **Custom Metrics (`PUT_YOUR_CUSTOM_METRIC_HERE`)**
* (Optional) You can provide custom metric definitions in a single JSON or JSONL file. For JSONL, each line should be a JSON object defining one metric. For JSON, it should be an array of metric-definition objects.
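The dataset and custom-metric formats described above can be sanity-checked locally before uploading. The following sketch assumes the formats as stated in this section (CSV/JSON/JSONL datasets; JSONL as one metric object per line, JSON as an array of metric objects); the function names and any column names are illustrative, not part of the SAP AI Core API.

```python
import csv
import json
from pathlib import Path

def load_dataset(path):
    """Load evaluation rows from a CSV, JSON, or JSONL file as a list of dicts."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".csv":
        return list(csv.DictReader(text.splitlines()))
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)  # plain JSON: expected to be an array of rows

def load_custom_metrics(path):
    """Load metric definitions: JSONL has one object per line, JSON is an array."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)
```

Running both loaders over your files before uploading catches malformed rows or metric definitions early, with clearer error messages than a failed evaluation run.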