Chat Completion

Introduction

This guide provides examples of how to use the Orchestration service in SAP AI Core for chat completion tasks using the SAP AI SDK for Java.

For detailed information on the individual capabilities of the Orchestration service, please refer to the official documentation.

Warning

All classes under any of the ...model packages are generated from an OpenAPI specification. This means that these model classes are not guaranteed to be stable and may change with future releases. They are safe to use, but may require updates even in minor releases.

Prerequisites

Before using the Orchestration module, ensure that you have met all the general requirements outlined in the overview.

To use the Orchestration capabilities, a deployment of Orchestration is needed in AI Core. When using the default resource group, this is provided by default and no further setup is needed.

Prerequisites When Using a Custom Resource Group

When using a custom resource group, ensure that an orchestration deployment is running in your resource group. Refer to this guide on how to set up such a deployment.

You can verify your deployment is running by querying the /lm/deployments API of AI Core. It should have an entry similar to:

{
  "id": "d123456abcdefg",
  "deploymentUrl": "https://api.ai.intprod-eu12.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d123456abcdefg",
  "configurationId": "12345-123-123-123-123456abcdefg",
  "configurationName": "orchestration",
  "scenarioId": "orchestration",
  "status": "RUNNING",
  "statusMessage": null,
  "targetStatus": "RUNNING",
  "lastOperation": "CREATE",
  "latestRunningConfigurationId": "12345-123-123-123-123456abcdefg",
  "ttl": null,
  "createdAt": "2024-08-05T16:17:29Z",
  "modifiedAt": "2024-08-06T06:32:50Z",
  "submissionTime": "2024-08-05T16:17:40Z",
  "startTime": "2024-08-05T16:18:41Z",
  "completionTime": null
}
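
As a rough illustration of the check described above, the following sketch inspects a single deployment entry and reports whether it belongs to the orchestration scenario and is running. The class DeploymentCheck and its methods are hypothetical helpers, not part of the SAP AI SDK, and the regex-based field lookup assumes a flat JSON object like the one shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the SDK) for inspecting one deployment entry. */
final class DeploymentCheck {

  /** Extracts a string field such as "status" from a flat JSON object. */
  static Optional<String> stringField(String json, String name) {
    // The surrounding quotes keep "status" from matching "statusMessage" or "targetStatus".
    Pattern pattern =
        Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
    Matcher matcher = pattern.matcher(json);
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }

  /** True if the entry belongs to the orchestration scenario and is running. */
  static boolean isOrchestrationRunning(String json) {
    return stringField(json, "scenarioId").filter("orchestration"::equals).isPresent()
        && stringField(json, "status").filter("RUNNING"::equals).isPresent();
  }

  public static void main(String[] args) {
    String entry =
        "{\"scenarioId\": \"orchestration\", \"status\": \"RUNNING\", \"statusMessage\": null}";
    System.out.println(isOrchestrationRunning(entry));
  }
}
```

In a real application, a JSON library is the more robust choice. The regex lookup only keeps this sketch free of dependencies.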

Maven Dependencies

Add the following dependency to your pom.xml file:

<dependencies>
  <dependency>
    <groupId>com.sap.ai.sdk</groupId>
    <artifactId>orchestration</artifactId>
    <version>${ai-sdk.version}</version>
  </dependency>
</dependencies>

See an example pom.xml in our Spring Boot application.

Create a Client and Choose an LLM

To use the Orchestration service, initialize an orchestration client with an LLM:

var client = new OrchestrationClient();

var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.GPT_4O);

Please also refer to our sample code for this and all following code examples.

How to create a client for a custom resource group

var destination = new AiCoreService().getInferenceDestination("myResourceGroup").forScenario("orchestration");
var client = new OrchestrationClient(destination);

Chat Completion

Use the Orchestration service to generate a response to a user message:

var prompt = new OrchestrationPrompt("Hello world! Why is this phrase so famous?");

var result = client.chatCompletion(prompt, config);

String messageResult = result.getContent();

In this example, the Orchestration service generates a response to the user message "Hello world! Why is this phrase so famous?". The LLM response is available as the first choice under the result.getOrchestrationResult() object.

Harmonized API

The Harmonized API lets you use different foundation models without the need to change the client code. You can switch from one LLM to another like in the example below.

// Original config for using GPT 4o
/*
var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.GPT_4O);
*/

// Switch to Claude 3.5 Sonnet
var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.CLAUDE_3_5_SONNET);

Available LLMs on SAP Generative AI Hub

Thanks to the harmonized API, all available LLMs on the SAP Generative AI Hub can be accessed through orchestration, including:

OpenAI GPT 4o
OpenAI o1
OpenAI o3 mini
AWS Anthropic Claude
AWS Amazon Nova
GCP VertexAI Gemini
Mistral AI

Check the SAP Notes for all available LLMs on SAP Generative AI Hub.

Custom Headers

To add custom headers to a single LLM call or to a group of calls, use the .withHeader method of the OrchestrationClient.

var client = new OrchestrationClient();

var result = client.withHeader("foo", "bar").chatCompletion(prompt, config);

Templating

Use a prepared template and execute requests by passing only the input parameters:

var message = Message.user("Reply with 'Orchestration Service is working!' in {{?language}}");
var templatingConfig = TemplateConfig.create().withMessages(message);
var configWithTemplate = config.withTemplateConfig(templatingConfig);

var inputParams = Map.of("language", "German");
var prompt = new OrchestrationPrompt(inputParams);

var result = client.chatCompletion(prompt, configWithTemplate);

In this case, the template defines the placeholder {{?language}}, which is replaced by the value German from the input parameters.
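
To make the placeholder substitution described above more concrete, here is a minimal local sketch. The actual substitution is performed by the Orchestration service, so the class PlaceholderSketch, the accepted parameter-name characters, and the choice to throw on a missing parameter are all assumptions of this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Local illustration of {{?name}} substitution; not the service's implementation. */
final class PlaceholderSketch {

  // Matches {{?name}} and captures the parameter name (character set is an assumption).
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\\{\\{\\?([A-Za-z0-9_-]+)\\}\\}");

  static String fill(String template, Map<String, String> inputParams) {
    return PLACEHOLDER
        .matcher(template)
        .replaceAll(
            match -> {
              String value = inputParams.get(match.group(1));
              if (value == null) {
                // Assumption of this sketch: a missing parameter is treated as an error.
                throw new IllegalArgumentException(
                    "Missing input parameter: " + match.group(1));
              }
              // Escape '$' and '\' so the value is inserted literally.
              return Matcher.quoteReplacement(value);
            });
  }

  public static void main(String[] args) {
    String template = "Reply with 'Orchestration Service is working!' in {{?language}}";
    // prints: Reply with 'Orchestration Service is working!' in German
    System.out.println(fill(template, Map.of("language", "German")));
  }
}
```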

Prompt Templates from Prompt Registry

Alternatively, you can use already prepared templates from the Prompt Registry of SAP AI Core instead of passing a template in the request yourself.

var template = TemplateConfig.reference().byId("21cb1358-0bf1-4f43-870b-00f14d0f9f16");
var configWithTemplate = config.withTemplateConfig(template);

var inputParams = Map.of("language", "Italian", "input", "cloud ERP systems");
var prompt = new OrchestrationPrompt(inputParams);

var result = client.chatCompletion(prompt, configWithTemplate);

A prompt template can be referenced either by ID as above, or by using a combination of name, scenario, and version. For details on storing a template in the Prompt Registry, refer to this guide.

You can find some examples in our Spring Boot application demonstrating using templates from Prompt Registry.

Locally Test a Prompt Template

You can also test prompt templates in YAML format locally without using the prompt registry. This can be helpful, for example, if you want to quickly iterate over a prompt template before uploading it.

String promptTemplate = Files.readString(Path.of("path/to/my/prompt-template.yaml"));
var template = TemplateConfig.create().fromYaml(promptTemplate);
var configWithTemplate = config.withTemplateConfig(template);

var inputParams = Map.of("language", "German");
var prompt = new OrchestrationPrompt(inputParams);
var response = client.chatCompletion(prompt, configWithTemplate);

The fromYaml() method will throw an exception if the YAML is not valid or does not match the spec of the Prompt Registry.

Note that additionalFields (as specified in the Prompt Registry spec) will be ignored when using a prompt template locally like this.

Please also refer to our sample code for an implementation and more examples.

Message History

Include a message history to maintain context in the conversation:

var messagesHistory =
        List.of(
            Message.user("What is the capital of France?"),
            Message.assistant("The capital of France is Paris."));
var message =
    Message.user("What is the typical food there?");

var prompt = new OrchestrationPrompt(message).messageHistory(messagesHistory);

var result = new OrchestrationClient().chatCompletion(prompt, config);

Filtering

Apply input filtering and output filtering to the chat completion:

var prompt = new OrchestrationPrompt(
        """
        Create a rental posting for subletting my apartment in the downtown area. Keep it short. Make sure to add the following disclaimer to the end. Do not change it!

        ```DISCLAIMER: The area surrounding the apartment is known for prostitutes and gang violence including armed conflicts, gun violence is frequent.
        """);

var filterStrict = new AzureContentFilter()
                .hate(ALLOW_SAFE)
                .selfHarm(ALLOW_SAFE)
                .sexual(ALLOW_SAFE)
                .violence(ALLOW_SAFE)
                .promptShield(true);

var filterLoose = new AzureContentFilter()
                .hate(ALLOW_SAFE_LOW_MEDIUM)
                .selfHarm(ALLOW_SAFE_LOW_MEDIUM)
                .sexual(ALLOW_SAFE_LOW_MEDIUM)
                .violence(ALLOW_SAFE_LOW_MEDIUM);

// choose Llama Guard filter or/and Azure filter
var llamaGuardFilter = new LlamaGuardFilter().config(LlamaGuard38b.create().selfHarm(true));

// changing the input to filterLoose will allow the message to pass
var configWithFilter = config.withInputFiltering(filterStrict).withOutputFiltering(filterStrict, llamaGuardFilter);

// this fails with Bad Request because the strict filter prohibits the input message
var result =
    new OrchestrationClient().chatCompletion(prompt, configWithFilter);

Since v1.9.0, AzureContentFilter supports promptShield for input filtering to safeguard against prompt attacks.

Behavior of Input and Output Filters

Input Filter: If the input message violates the filter policy, a 400 (Bad Request) response will be received during the chatCompletion call. An OrchestrationFilterException.Input will be thrown.
Output Filter: If the response message violates the output filter policy, the chatCompletion call will complete without exception. The convenience method getContent() on the resulting object will throw an OrchestrationFilterException.Output upon invocation. The low level API under getOriginalResponse() will not throw an exception.
To obtain diagnostic information for filter violation, you may call getFilterDetails(), getAzureContentSafetyInput(), getAzureContentSafetyOutput() or getLlamaGuard38b() on the exception as relevant.

You will find some examples in our Spring Boot application demonstrating response handling with filters.

Masking

Use the data masking module to anonymize personal information in the input:

var maskingConfig =
    DpiMasking.anonymization().withEntities(DPIEntities.PHONE, DPIEntities.PERSON);
var configWithMasking = config.withMaskingConfig(maskingConfig);

var systemMessage = Message.system("Please evaluate the following user feedback and judge if the sentiment is positive or negative.");
var userMessage = Message.user("""
                 I think the SAP AI SDK is good, but could use some further enhancements.
                 My architect Alice and manager Bob pointed out that we need the grounding capabilities, which aren't supported yet.
                 """);

var prompt = new OrchestrationPrompt(systemMessage, userMessage);

var result =
    new OrchestrationClient().chatCompletion(prompt, configWithMasking);

In this example, the input will be masked before the call to the LLM and will get unmasked in the output.

Custom Masking

Use the data masking module with a regular expression to replace custom patterns in the input:

var regex = "patient_id_[0-9]+";
var replacement = "REDACTED_ID";
var maskingConfig =
    DpiMasking.anonymization().withRegex(regex,replacement);
var configWithMasking = config.withMaskingConfig(maskingConfig);

var systemMessage = Message.system("Repeat following messages");
var userMessage = Message.user("The patient id is patient_id_123.");

var prompt = new OrchestrationPrompt(systemMessage, userMessage);

var result =
    new OrchestrationClient().chatCompletion(prompt, configWithMasking);

In this example, the input will be masked before the call to the LLM and will remain masked in the output.
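
To preview what the regular expression from the example above would match, you can apply it locally with the Java regex engine. This is only an approximation: the class RegexMaskingSketch is hypothetical, and the service-side masking may use a different regex dialect than java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Local preview of a regex-based replacement; the real masking happens in the service. */
final class RegexMaskingSketch {

  static String mask(String input, String regex, String replacement) {
    // quoteReplacement ensures the replacement text is inserted literally.
    return Pattern.compile(regex)
        .matcher(input)
        .replaceAll(Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    String masked =
        mask("The patient id is patient_id_123.", "patient_id_[0-9]+", "REDACTED_ID");
    // prints: The patient id is REDACTED_ID.
    System.out.println(masked);
  }
}
```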

Grounding

Use the grounding module to provide additional context to the AI model.

Vector Data Repository

One way to provide grounding is by using a vector data repository. This can be done as follows.

// optional filter for collections
var documentMetadata =
    SearchDocumentKeyValueListPair.create()
        .key("my-collection")
        .value("value")
        .addSelectModeItem(SearchSelectOptionEnum.IGNORE_IF_KEY_ABSENT);
// optional filter for document chunks
var databaseFilter =
    DocumentGroundingFilter.create()
        .id("")
        .dataRepositoryType(DataRepositoryType.VECTOR)
        .addDocumentMetadataItem(documentMetadata);

var groundingConfig = Grounding.create().filters(databaseFilter);
var prompt = groundingConfig.createGroundingPrompt("What does Joule do?");
var configWithGrounding = config.withGrounding(groundingConfig);

var result = client.chatCompletion(prompt, configWithGrounding);

In this example, the AI model is provided with additional context in the form of grounding information.

Grounding via SharePoint

Files stored in SharePoint can be used as the basis for grounding. A guide on how to set up SharePoint for this can be found in this SAP wiki. To then use SharePoint for grounding in your code, add its corresponding data repository ID to the DocumentGroundingFilter object.

var dataRepositoryId = "SharePoint ID here";
var filter =
    DocumentGroundingFilter.create()
        .dataRepositoryType(DataRepositoryType.VECTOR)
        .dataRepositories(List.of(dataRepositoryId));
var groundingConfig = Grounding.create().filters(filter);
var prompt = groundingConfig.createGroundingPrompt("What is the information stored in the sharepoint?");
var configWithGrounding = config.withGrounding(groundingConfig);
var result = client.chatCompletion(prompt, configWithGrounding);

Please find an example in our Spring Boot application.

Grounding via help.sap.com

You can also use grounding based on help.sap.com for convenient SAP-specific grounding. This can be achieved as follows.

var groundingHelpSapCom =
        DocumentGroundingFilter.create()
                .dataRepositoryType(DataRepositoryType.HELP_SAP_COM);
var groundingConfig = Grounding.create().filters(groundingHelpSapCom);
var configWithGrounding = config.withGrounding(groundingConfig);

var prompt = groundingConfig.createGroundingPrompt("What is a fuzzy search?");
var response = client.chatCompletion(prompt, configWithGrounding);

Please find an example in our Spring Boot application.

Mask Grounding

You can also mask both the grounding information and the prompt message:

var maskingConfig =
    DpiMasking.anonymization()
        .withEntities(DPIEntities.SENSITIVE_DATA)
        .withMaskGroundingEnabled()
        .withAllowList(List.of("SAP", "Joule"));
var maskedGroundingConfig = configWithGrounding.withMaskingConfig(maskingConfig);

var result = client.chatCompletion(prompt, maskedGroundingConfig);

Please find an example in our Spring Boot application.

Streaming

It's possible to pass a stream of chat completion delta elements, e.g. from the application backend to the frontend in real-time.

Asynchronous Streaming

This is a blocking example for streaming and printing directly to the console:

var msg = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var prompt = new OrchestrationPrompt(msg);

// try-with-resources on stream ensures the connection will be closed
try (Stream<String> stream = client.streamChatCompletion(prompt, config)) {
    stream.forEach(
        deltaString -> {
            System.out.print(deltaString);
            System.out.flush();
        });
}
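
If you need the complete text in addition to the individual deltas, the same try-with-resources pattern can be used to accumulate them. The sketch below is a minimal illustration: DeltaCollector is a hypothetical helper, and main uses a local Stream.of(...) as a stand-in for the stream returned by client.streamChatCompletion(prompt, config).

```java
import java.util.stream.Stream;

/** Hypothetical helper that drains a stream of deltas and closes it afterwards. */
final class DeltaCollector {

  /** Consumes all deltas, closes the stream, and returns the concatenated text. */
  static String collect(Stream<String> deltas) {
    StringBuilder full = new StringBuilder();
    // try-with-resources closes the stream even if processing a delta fails
    try (Stream<String> stream = deltas) {
      stream.forEach(full::append);
    }
    return full.toString();
  }

  public static void main(String[] args) {
    // Stand-in for client.streamChatCompletion(prompt, config)
    Stream<String> fakeDeltas = Stream.of("0, ", "1, ", "1, ", "2");
    System.out.println(collect(fakeDeltas));
  }
}
```

Like the example above, this blocks the calling thread until the last delta has arrived.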

Please find an example in our Spring Boot application. It shows the usage of Spring Boot's ResponseBodyEmitter to stream the chat completion delta messages to the frontend in real-time.

Streaming Configuration

The OrchestrationModuleConfig config offers parameters to customize streaming behavior.

config = config.withStreamConfig(
  new OrchestrationStreamConfig()
    .withFilterOverlap(100)        // default:0,     Number of preceding characters additionally considered for sequential output filtering
    .withChunkSize(500)            // default:null   Number of tokens to accumulate when streaming in a single chunk.
    .withDelimiters(List.of("\n")) // default:empty  Custom delimiter for chunks
);

Using Images

It's possible to add images and multiple text inputs to a message.

Add Images to a Message

An image can be added to a message as follows.

var message = Message.user("Describe the following image");
var newMessage = message.withImage("https://url.to/image.jpg");

You can also construct a message with an image directly, using the ImageItem class.

var message = Message.user(new ImageItem("https://url.to/image.jpg"));

Some AI models, like GPT 4o, support additionally setting the detail level with which the image is read. This can be set via the DetailLevel parameter.

var newMessage = message.withImage("https://url.to/image.jpg", ImageItem.DetailLevel.LOW);

Note that currently only user messages are supported for image attachments.

Add Multiple Text Inputs to a Message

It's also possible to add multiple text inputs to a message. This can be useful for providing additional context to the AI model. You can add additional text inputs as follows.

var message = Message.user("What is chess about?");
var newMessage = message.withText("Answer in two sentences.");

Note that only user and system messages are supported for multiple text inputs.

Please find an example in our Spring Boot application.

Response Format

It is possible to set the response format for the chat completion. The available options are JSON_OBJECT, JSON_SCHEMA, and TEXT, where TEXT is the default.

JSON_OBJECT

Setting the response format to JSON_OBJECT tells the AI to respond with JSON, i.e., the response from the AI will be a string containing valid JSON. This does not, however, guarantee that the response adheres to a specific structure (other than being valid JSON).

var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithJsonResponse =
        config.withTemplateConfig(TemplateConfig.create().withJsonResponse());

var prompt =
        new OrchestrationPrompt(
                Message.user("Some message."), Message.system("Answer using JSON."));
var response = client.chatCompletion(prompt, configWithJsonResponse).getContent();

Note that it is necessary to tell the AI model in the prompt to actually return a JSON object. The result might not adhere exactly to the given JSON format, but it will be a JSON object.

JSON_SCHEMA

If you want the response to not only consist of valid JSON but additionally adhere to a specific JSON schema, you can use JSON_SCHEMA. In order to do that, add a JSON schema to the configuration as shown below and the response will adhere to the given schema.

static class TestClass {
  @JsonProperty(required = true)  // <-- ⚠️ this is necessary for the schema generation
  private String stringField;

  @JsonProperty(required = true)
  private int intField;
}

var schema =
    ResponseJsonSchema.fromType(TestClass.class)
            .withDescription("Output schema for the example class TestClass.")
            .withStrict(true);
var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithResponseSchema =
        config.withTemplateConfig(TemplateConfig.create().withJsonSchemaResponse(schema));

var prompt = new OrchestrationPrompt(Message.user("Some message."));
TestClass response = client.chatCompletion(prompt, configWithResponseSchema).asEntity(TestClass.class);

Note that the LLM will only adhere exactly to the given schema if you use withStrict(true) and annotate schema fields with @JsonProperty(required = true). Not all schemas are possible for OpenAI in strict mode. See here for more information.

There is also a way to generate the schema from a map of key-value pairs. This can be done as follows:

// requires: import static java.util.Map.entry;
var schemaMap =
    Map.ofEntries(
        entry("type", "object"),
        entry("properties", Map.ofEntries(
            entry("language", Map.of("type", "string")),
            entry("translation", Map.of("type", "string")))),
        entry("required", List.of("language", "translation")),
        entry("additionalProperties", false));

var schemaFromMap = ResponseJsonSchema.fromMap(schemaMap, "Translator-Schema");
var config = new OrchestrationModuleConfig()
    .withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithResponseSchema =
        config.withTemplateConfig(TemplateConfig.create().withJsonSchemaResponse(schemaFromMap));

Please find an example in our Spring Boot application.

Translation

Translate the input or output to any supported language. The following sample will translate the prompt into English, send it to the LLM, and then translate the output to German.

var prompt = new OrchestrationPrompt("Quelle est la couleur de la tour Eiffel?");

var inputConfig = SAPDocumentTranslationInputConfig.create()
        .targetLanguage("en-US")
        .applyTo(null);
var inputTranslation = SAPDocumentTranslationInput.create()
        .type(SAPDocumentTranslationInput.TypeEnum.SAP_DOCUMENT_TRANSLATION)
        .config(inputConfig);
var outputConfig = SAPDocumentTranslationOutputConfig.create()
        .targetLanguage(SAPDocumentTranslationOutputTargetLanguage.create("de-DE"));
var outputTranslation = SAPDocumentTranslationOutput.create()
        .type(SAPDocumentTranslationOutput.TypeEnum.SAP_DOCUMENT_TRANSLATION)
        .config(outputConfig);
var configWithTranslation = config
        .withInputTranslationConfig(inputTranslation)
        .withOutputTranslationConfig(outputTranslation);

var response = client.chatCompletion(prompt, configWithTranslation);

You can also optionally specify the source language. In that case, the source-target pair has to be in the list of supported language pairs.

Note that setting SAPDocumentTranslationInputConfig.applyTo() to either null or to an actual value is necessary.

You will find an example in our Spring Boot application.

Configure the LLM

Change your LLM configuration to add model parameters:

OrchestrationAiModel customGPT4O =
    OrchestrationAiModel.GPT_4O
        .withParam(MAX_TOKENS, 50)
        .withParam(TEMPERATURE, 0.1)
        .withParam(FREQUENCY_PENALTY, 0)
        .withParam(PRESENCE_PENALTY, 0)
        .withVersion("2024-05-13");

Using a Configuration from AI Launchpad

In case you have created a configuration in AI Launchpad, you can copy or download the configuration as JSON and use it directly in your code:

var configJson = """
    ... paste your configuration JSON in here ...
    """;
// or load your config from a file, e.g.
// configJson = Files.readString(Paths.get("path/to/my/orchestration-config.json"));

var prompt = new OrchestrationPrompt(Map.of("your-input-parameter", "your-param-value"));

new OrchestrationClient().executeRequestFromJsonModuleConfig(prompt, configJson);

While this is not recommended for long term use, it can be useful for creating demos and PoCs.
