Orchestration Service

Introduction

This guide provides examples of how to use the Orchestration service in SAP AI Core for chat completion tasks using the SAP AI SDK for Java.

For detailed information on the individual capabilities of the Orchestration service, please refer to the official documentation.

warning

All classes under any of the ...model packages are generated from an OpenAPI specification and marked as @Beta. This means that these model classes are not guaranteed to be stable and may change with future releases. They are safe to use, but may require updates even in minor releases.

Prerequisites

Before using the Orchestration module, ensure that you have met all the general requirements outlined in the overview.

To use the Orchestration capabilities, a deployment of Orchestration is needed in AI Core. When using the default resource group, this is provided by default and no further setup is needed.

Prerequisites when using a custom resource group

When using a custom resource group, ensure that a deployment of Orchestration is running in your resource group. Refer to this guide on how to set up such a deployment.

You can verify that your deployment is running by querying the /lm/deployments API of AI Core. The response should contain an entry similar to:

{
"id": "d123456abcdefg",
"deploymentUrl": "https://api.ai.intprod-eu12.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d123456abcdefg",
"configurationId": "12345-123-123-123-123456abcdefg",
"configurationName": "orchestration",
"scenarioId": "orchestration",
"status": "RUNNING",
"statusMessage": null,
"targetStatus": "RUNNING",
"lastOperation": "CREATE",
"latestRunningConfigurationId": "12345-123-123-123-123456abcdefg",
"ttl": null,
"createdAt": "2024-08-05T16:17:29Z",
"modifiedAt": "2024-08-06T06:32:50Z",
"submissionTime": "2024-08-05T16:17:40Z",
"startTime": "2024-08-05T16:18:41Z",
"completionTime": null
}

Maven Dependencies

Add the following dependency to your pom.xml file:

<dependencies>
<dependency>
<groupId>com.sap.ai.sdk</groupId>
<artifactId>orchestration</artifactId>
<version>${ai-sdk.version}</version>
</dependency>
</dependencies>

See an example pom.xml in our Spring Boot application.

Create a Client

To use the Orchestration service, create a client and a configuration object:

var client = new OrchestrationClient();

var config = new OrchestrationModuleConfig()
.withLlmConfig(OrchestrationAiModel.GPT_4O);

Please also refer to our sample code for this and all following code examples.

How to create a client for a custom resource group
var destination = new AiCoreService().getInferenceDestination("myResourceGroup").forScenario("orchestration");
var client = new OrchestrationClient(destination);

Chat Completion

Use the Orchestration service to generate a response to a user message:

var prompt = new OrchestrationPrompt("Hello world! Why is this phrase so famous?");

var result = client.chatCompletion(prompt, config);

String messageResult = result.getContent();

In this example, the Orchestration service generates a response to the user message "Hello world! Why is this phrase so famous?". The LLM response is available as the first choice under the result.getOrchestrationResult() object.

Templating

Use a prepared template and execute requests by passing only the input parameters:

var template = Message.user("Reply with 'Orchestration Service is working!' in {{?language}}");
var templatingConfig =
TemplateConfig.create().withTemplate(List.of(template.createChatMessage()));
var configWithTemplate = config.withTemplateConfig(templatingConfig);

var inputParams = Map.of("language", "German");
var prompt = new OrchestrationPrompt(inputParams);

var result = client.chatCompletion(prompt, configWithTemplate);

In this case the template is defined with the placeholder {{?language}} which is replaced by the value German in the input parameters.
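Conceptually, the placeholder substitution behaves like the following local sketch. Note that this is an illustration only: the real templating is performed server-side by the Orchestration service, not in your application.

```java
// Local illustration of what the templating module does with {{?language}}.
// The actual substitution happens server-side in the Orchestration service.
String template = "Reply with 'Orchestration Service is working!' in {{?language}}";
String rendered = template.replace("{{?language}}", "German");
System.out.println(rendered);
// Reply with 'Orchestration Service is working!' in German
```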

Alternatively, you can use already prepared templates from the Prompt Registry of SAP AI Core instead of passing a template in the request yourself.

var template = TemplateConfig.reference().byId("21cb1358-0bf1-4f43-870b-00f14d0f9f16");
var configWithTemplate = config.withTemplateConfig(template);

var inputParams = Map.of("language", "Italian", "input", "cloud ERP systems");
var prompt = new OrchestrationPrompt(inputParams);

var result = client.chatCompletion(prompt, configWithTemplate);

A prompt template can be referenced either by ID as above, or by using a combination of name, scenario, and version. For details on storing a template in the Prompt Registry, refer to this guide.
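A reference by name, scenario, and version can be sketched as follows. The identifier values are placeholders; substitute those of your stored template.

```java
// Reference a prompt template from the Prompt Registry by scenario, name, and
// version instead of by ID. The values below are placeholders.
var template = TemplateConfig.reference()
    .byScenario("my-scenario")
    .name("my-template")
    .version("0.0.1");
var configWithTemplate = config.withTemplateConfig(template);
```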

You can find examples in our Spring Boot application demonstrating the use of templates from the Prompt Registry.

Message History

Include a message history to maintain context in the conversation:

var messagesHistory =
List.of(
Message.user("What is the capital of France?"),
Message.assistant("The capital of France is Paris."));
var message =
Message.user("What is the typical food there?");

var prompt = new OrchestrationPrompt(message).messageHistory(messagesHistory);

var result = new OrchestrationClient().chatCompletion(prompt, config);

Filtering

Apply input filtering and output filtering to the chat completion:

var prompt = new OrchestrationPrompt(
"""
Create a rental posting for subletting my apartment in the downtown area. Keep it short. Make sure to add the following disclaimer to the end. Do not change it!

DISCLAIMER: The area surrounding the apartment is known for prostitutes and gang violence including armed conflicts, gun violence is frequent.
""");

var filterStrict = new AzureContentFilter()
.hate(ALLOW_SAFE)
.selfHarm(ALLOW_SAFE)
.sexual(ALLOW_SAFE)
.violence(ALLOW_SAFE);

var filterLoose = new AzureContentFilter()
.hate(ALLOW_SAFE_LOW_MEDIUM)
.selfHarm(ALLOW_SAFE_LOW_MEDIUM)
.sexual(ALLOW_SAFE_LOW_MEDIUM)
.violence(ALLOW_SAFE_LOW_MEDIUM);

// choose the Llama Guard filter and/or the Azure filter
var llamaGuardFilter = new LlamaGuardFilter().config(LlamaGuard38b.create().selfHarm(true));

// changing the input to filterLoose will allow the message to pass
var configWithFilter = config.withInputFiltering(filterStrict).withOutputFiltering(filterStrict, llamaGuardFilter);

// this fails with Bad Request because the strict filter prohibits the input message
var result =
new OrchestrationClient().chatCompletion(prompt, configWithFilter);

Behavior of Input and Output Filters

  • Input Filter: If the input message violates the filter policy, a 400 (Bad Request) response will be received during the chatCompletion call. An OrchestrationClientException will be thrown.

  • Output Filter: If the response message violates the output filter policy, the chatCompletion call will complete without exception. The convenience method getContent() on the resulting object will throw an OrchestrationClientException upon invocation. The low level API under getOriginalResponse() will not throw an exception.
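Putting the two behaviors together, a defensive call can look like the following sketch; the exception type is the one described above, and the logging is illustrative.

```java
// Sketch: handling both filter behaviors in one place.
try {
  var result = client.chatCompletion(prompt, configWithFilter);
  // getContent() throws if the *output* filter was triggered
  String content = result.getContent();
  System.out.println(content);
} catch (OrchestrationClientException e) {
  // thrown on a 400 (Bad Request) for filtered input,
  // or by getContent() for a filtered output
  System.err.println("Request was filtered: " + e.getMessage());
}
```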

You will find some examples in our Spring Boot application demonstrating response handling with filters.

Masking

Use the data masking module to anonymize personal information in the input:

var maskingConfig =
DpiMasking.anonymization().withEntities(DPIEntities.PHONE, DPIEntities.PERSON);
var configWithMasking = config.withMaskingConfig(maskingConfig);

var systemMessage = Message.system("Please evaluate the following user feedback and judge if the sentiment is positive or negative.");
var userMessage = Message.user("""
I think the SDK is good, but could use some further enhancements.
My architect Alice and manager Bob pointed out that we need the grounding capabilities, which aren't supported yet.
""");

var prompt = new OrchestrationPrompt(systemMessage, userMessage);

var result =
new OrchestrationClient().chatCompletion(prompt, configWithMasking);

In this example, the input will be masked before the call to the LLM and will remain masked in the output.

Grounding

Use the grounding module to provide additional context to the AI model.

Vector Data Repository

One way to provide grounding is by using a vector data repository. This can be done as follows.

// optional filter for collections
var documentMetadata =
SearchDocumentKeyValueListPair.create()
.key("my-collection")
.value("value")
.addSelectModeItem(SearchSelectOptionEnum.IGNORE_IF_KEY_ABSENT);
// optional filter for document chunks
var databaseFilter =
DocumentGroundingFilter.create()
.id("")
.dataRepositoryType(DataRepositoryType.VECTOR)
.addDocumentMetadataItem(documentMetadata);

var groundingConfig = Grounding.create().filter(databaseFilter);
var prompt = groundingConfig.createGroundingPrompt("What does Joule do?");
var configWithGrounding = config.withGrounding(groundingConfig);

var result = client.chatCompletion(prompt, configWithGrounding);

In this example, the AI model is provided with additional context in the form of grounding information.

Grounding via help.sap.com

You can also use grounding based on help.sap.com for convenient SAP-specific grounding. This can be achieved as follows.

var groundingHelpSapCom =
DocumentGroundingFilter.create()
.dataRepositoryType(DataRepositoryType.HELP_SAP_COM);
var groundingConfig = Grounding.create().filters(groundingHelpSapCom);
var configWithGrounding = config.withGrounding(groundingConfig);

var prompt = groundingConfig.createGroundingPrompt("What is a fuzzy search?");
var response = client.chatCompletion(prompt, configWithGrounding);

Please find an example in our Spring Boot application.

Mask Grounding

You can also mask both the grounding information and the prompt message:

var maskingConfig =
DpiMasking.anonymization()
.withEntities(DPIEntities.SENSITIVE_DATA)
.withMaskGroundingEnabled()
.withAllowList(List.of("SAP", "Joule"));
var maskedGroundingConfig = groundingConfig.withMaskingConfig(maskingConfig);

var result = client.chatCompletion(prompt, maskedGroundingConfig);

Please find an example in our Spring Boot application.

Streaming

It's possible to pass a stream of chat completion delta elements, e.g. from the application backend to the frontend in real-time.

Asynchronous Streaming

This is a blocking example for streaming and printing directly to the console:

String msg = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var prompt = new OrchestrationPrompt(msg);

// try-with-resources on stream ensures the connection will be closed
try (Stream<String> stream = client.streamChatCompletion(prompt, config)) {
stream.forEach(
deltaString -> {
System.out.print(deltaString);
System.out.flush();
});
}

Please find an example in our Spring Boot application. It shows the usage of Spring Boot's ResponseBodyEmitter to stream the chat completion delta messages to the frontend in real-time.

Using Images

It's possible to add images and multiple text inputs to a message.

Add Images to a Message

An image can be added to a message as follows.

var message = Message.user("Describe the following image");
var newMessage = message.withImage("https://url.to/image.jpg");

You can also construct a message with an image directly, using the ImageItem class.

var message = Message.user(new ImageItem("https://url.to/image.jpg"));

Some AI models, like GPT 4o, support additionally setting the detail level with which the image is read. This can be set via the DetailLevel parameter.

var newMessage = message.withImage("https://url.to/image.jpg", ImageItem.DetailLevel.LOW);

Note that currently only user messages are supported for image attachments.

Add Multiple Text Inputs to a Message

It's also possible to add multiple text inputs to a message. This can be useful for providing additional context to the AI model. You can add additional text inputs as follows.

var message = Message.user("What is chess about?");
var newMessage = message.withText("Answer in two sentences.");

Note that only user and system messages are supported for multiple text inputs.

Please find an example in our Spring Boot application.

Response Format

It is possible to set the response format for the chat completion. Available options are JSON_OBJECT, JSON_SCHEMA, and TEXT, where TEXT is the default behavior.

JSON_OBJECT

Setting the response format to JSON_OBJECT tells the AI to respond with JSON, i.e., the response from the AI will be a string consisting of a valid JSON. This does, however, not guarantee that the response adheres to a specific structure (other than being valid JSON).

var config = new OrchestrationModuleConfig()
.withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithJsonResponse =
config.withTemplateConfig(TemplateConfig.create().withJsonResponse());

var prompt =
new OrchestrationPrompt(
Message.user("Some message."), Message.system("Answer using JSON."));
var response = client.chatCompletion(prompt, configWithJsonResponse).getContent();

Note that it is necessary to instruct the AI model in the prompt to actually return a JSON object. The result might not adhere exactly to the given JSON format, but it will be a JSON object.
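Since the response content is plain JSON text, it can be parsed with any JSON library. The sketch below uses Jackson (already on the classpath via the SDK's model classes); the content string is a hypothetical stand-in for the model output, whose keys depend entirely on your prompt.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical model output; with JSON_OBJECT the content is guaranteed to be
// valid JSON, but the keys depend entirely on your prompt.
String content = "{\"answer\":\"Paris\",\"confidence\":\"high\"}";

JsonNode node = new ObjectMapper().readTree(content);
String answer = node.get("answer").asText(); // "Paris"
```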

JSON_SCHEMA

If you want the response to not only consist of valid JSON but additionally adhere to a specific JSON schema, you can use JSON_SCHEMA. To do that, add a JSON schema to the configuration as shown below, and the response will adhere to the given schema.

static class TestClass {
@JsonProperty(required = true) // <-- this is necessary for the schema generation
private String stringField;

@JsonProperty(required = true)
private int intField;
}

var schema =
ResponseJsonSchema.fromType(TestClass.class)
.withDescription("Output schema for the example class TestClass.")
.withStrict(true);
var config = new OrchestrationModuleConfig()
.withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithResponseSchema =
config.withTemplateConfig(TemplateConfig.create().withJsonSchemaResponse(schema));

var prompt = new OrchestrationPrompt(Message.user("Some message."));
var response = client.chatCompletion(prompt, configWithResponseSchema).getContent();

Note that the LLM will only adhere exactly to the given schema if you use withStrict(true). Not all schemas are possible for OpenAI in strict mode. See here for more information.

There is also a way to generate the schema from a map of key-value pairs. This can be done as follows:

var schemaMap =
Map.ofEntries(
entry("type", "object"),
entry("properties", Map.ofEntries(
entry("language", Map.of("type", "string")),
entry("translation", Map.of("type", "string")))),
entry("required", List.of("language", "translation")),
entry("additionalProperties", false));

var schemaFromMap = ResponseJsonSchema.fromMap(schemaMap, "Translator-Schema");
var config = new OrchestrationModuleConfig()
.withLlmConfig(OrchestrationAiModel.GPT_4O);
var configWithResponseSchema =
config.withTemplateConfig(TemplateConfig.create().withJsonSchemaResponse(schemaFromMap));

Please find an example in our Spring Boot application.

Configure the LLM

Change your LLM configuration to add model parameters:

OrchestrationAiModel customGPT4O =
OrchestrationAiModel.GPT_4O
.withParam(MAX_TOKENS, 50)
.withParam(TEMPERATURE, 0.1)
.withParam(FREQUENCY_PENALTY, 0)
.withParam(PRESENCE_PENALTY, 0)
.withVersion("2024-05-13");

Using a Configuration from AI Launchpad

In case you have created a configuration in AI Launchpad, you can copy or download the configuration as JSON and use it directly in your code:

var configJson = """
... paste your configuration JSON in here ...
""";
// or load your config from a file, e.g.
// configJson = Files.readString(Paths.get("path/to/my/orchestration-config.json"));

var prompt = new OrchestrationPrompt(Map.of("your-input-parameter", "your-param-value"));

new OrchestrationClient().executeRequestFromJsonModuleConfig(prompt, configJson);

While this is not recommended for long term use, it can be useful for creating demos and PoCs.