Chat Completion
The `@sap-ai-sdk/orchestration` package provides a client for the orchestration service of SAP AI Core.
The orchestration service harmonizes the API of various generative AI models, enabling seamless integration and interaction with different models through a unified interface.
Additionally, it provides features like templating, content filtering, grounding and more to enhance the interaction with generative AI models.
Find more details about the orchestration workflow here.
Installation
$ npm install @sap-ai-sdk/orchestration
Quick Start
Initialize a client with the proper orchestration configuration for each module.
Below is an example of configuring the mandatory `llm` module.
In addition, you can find more sample code here.
import { OrchestrationClient } from '@sap-ai-sdk/orchestration';
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o'
}
});
Here, the model name is specified in the client configuration; a user message is passed as the prompt in the request below.
To send a chat completion request, use the `chatCompletion()` method.
Use the following convenience methods for handling the chat completion response:
- `getContent()` parses the response and returns the model's output as a string.
- `getFinishReason()` retrieves the `finish_reason` explaining why the chat completion request stopped.
- `getTokenUsage()` provides token usage details, including `total_tokens`, `prompt_tokens`, and `completion_tokens`.
- `getAllMessages()` parses the response and returns a list of all messages.
- `getAssistantMessage()` parses the response and returns the assistant message.
- `getToolCalls()` parses the response and returns a list of tool calls generated by the model.
- `getRefusal()` parses the response and returns the refusal message from the model.
const response = await orchestrationClient.chatCompletion({
messages: [
{ role: 'user', content: 'Hello World! Why is this phrase so famous?' }
]
});
console.log(response.getContent());
console.log(response.getFinishReason());
console.log(JSON.stringify(response.getTokenUsage()));
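The remaining convenience methods can be used in the same way. A short sketch, assuming the same `response` object as above:
console.log(response.getAllMessages()); // all messages, including the assistant message
console.log(response.getAssistantMessage()); // only the assistant message
console.log(response.getToolCalls()); // tool calls, if the model generated any
console.log(response.getRefusal()); // refusal message, if the model refused to answer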
The following sections provide detailed configuration for each module.
LLM Configuration
Choose the LLM by setting the `model_name` property in the `llm` module configuration.
Optionally, define `model_version` (default: `latest`) and `model_params` for custom settings.
const llm: LlmModuleConfig = {
model_name: 'gpt-4o',
model_version: '2024-08-06', // optional
model_params: {
// optional
max_tokens: 50,
temperature: 0.1
}
};
Harmonized API
The Harmonized API lets you use different foundation models without the need to change the client code. You can easily switch from one LLM to another, as shown in the example below.
// Original config for using GPT 4o
/*
const llm: LlmModuleConfig = {
model_name: 'gpt-4o'
};
*/
// Switch to Claude 3.5 Sonnet
const llm: LlmModuleConfig = {
model_name: 'anthropic--claude-3.5-sonnet'
};
Thanks to the harmonized API, all available LLMs on the SAP Generative AI Hub can be accessed through orchestration, including:
- OpenAI GPT 4o
- OpenAI o1
- OpenAI o3 mini
- AWS Anthropic Claude
- AWS Amazon Nova
- GCP VertexAI Gemini
- Mistral AI
Check the SAP Notes for all available LLMs on SAP Generative AI Hub.
Templating
Use the orchestration client with the `templating` configuration to define a static prompt.
This prompt can include placeholders, which are replaced with values from `inputParams` during a `chatCompletion()` method call.
This setup is useful when the base structure of your prompt remains the same across requests.
const orchestrationClient = new OrchestrationClient({
llm,
templating: {
template: [
{ role: 'user', content: 'What is the capital of {{?country}}?' }
]
}
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
The `template` messages defined in the client configuration are static: they are always included in every `chatCompletion()` request.
For more dynamic prompts, i.e., when you want to vary the full message list per request without re-initializing the client, use the `messages` property instead.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'What is the capital of {{?country}}?' }],
inputParams: { country: 'France' }
});
Prompt Registry
You can define template content within the templating configuration, or refer to a template defined in the Prompt Registry.
const orchestrationClient = new OrchestrationClient({
llm,
templating: {
template_ref: {
name: 'my-get-capital-template',
scenario: 'my-scenario',
version: '0.0.1'
}
}
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
A prompt template can be referenced either by ID, or by a combination of name, scenario and version. For details on storing a template in the Prompt Registry, refer to this guide.
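For illustration, referencing a template by ID might look like the following sketch, assuming the reference takes an `id` field in place of `name`, `scenario`, and `version`:
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: {
    template_ref: {
      id: 'TEMPLATE_ID' // placeholder ID of the stored prompt template
    }
  }
});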
Local Prompt Template
You can define the template in a YAML file and pass its content as a string to the `templating` property.
This is useful for testing a prompt template locally before storing it in the Prompt Registry.
import { readFileSync } from 'fs';
import { OrchestrationClient } from '@sap-ai-sdk/orchestration';
// Read the YAML file containing the prompt template
const yamlTemplate = readFileSync('./path/to/prompt-template.yaml', 'utf-8');
const orchestrationClient = new OrchestrationClient({
llm,
templating: yamlTemplate
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
The YAML string is parsed and validated against the expected schema. If the YAML is invalid or does not conform to the schema, an error will be thrown.
This section shows the format in which a prompt template can be defined.
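For a quick local test, the YAML content can also be defined inline as a string. The following is a minimal sketch; the field names assume the prompt template format referenced above:
const yamlTemplate = `
name: my-get-capital-template
version: 0.0.1
scenario: my-scenario
spec:
  template:
    - role: user
      content: What is the capital of {{?country}}?
`;
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: yamlTemplate
});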
Function Calling
Define and pass tool definitions to enable the model to call specific functions. Here's an example of temperature conversion using tool calls:
First, define the tool with the `name`, `description`, and `parameters` properties:
const convertTemperatureTool: ChatCompletionTool = {
type: 'function',
function: {
name: 'convert_temperature_to_fahrenheit',
description: 'Converts temperature from Celsius to Fahrenheit',
parameters: {
type: 'object',
properties: {
temperature: {
type: 'number',
description: 'The temperature value in Celsius to convert.'
}
},
required: ['temperature']
}
}
};
Set `strict` to `true` to ensure function calls adhere to the function schema.
For more information, refer to Orchestration Tool Calling.
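For example, a stricter copy of the tool above might look like this sketch, assuming `strict` sits inside the `function` definition next to `parameters`:
const strictConvertTemperatureTool: ChatCompletionTool = {
  type: 'function',
  function: {
    ...convertTemperatureTool.function,
    strict: true // assumption: rejects calls that do not match the parameter schema
  }
};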
Initialize the client and send the initial request with the tool definition:
const client = new OrchestrationClient({
llm: {
model_name: 'gpt-4o'
},
templating: {
tools: [convertTemperatureTool]
}
});
const response = await client.chatCompletion({
messages: [
{ role: 'user', content: 'Convert 20 degrees Celsius to Fahrenheit.' }
]
});
When the model decides to use a tool, it returns the function name and input arguments in the response. Use the model response to execute the function.
const initialResponse = response.getAssistantMessage();
let toolMessage: ToolChatMessage;
if (initialResponse && initialResponse.tool_calls) {
const toolCall = initialResponse.tool_calls[0];
const name = toolCall.function.name;
const args = JSON.parse(toolCall.function.arguments);
// Execute the function with the provided arguments
const toolResult = callFunction(name, args);
toolMessage = {
role: 'tool',
content: toolResult,
tool_call_id: toolCall.id
};
}
The `callFunction` function routes the calls to the actual function implementations.
function callFunction(name: string, args: any): string {
switch (name) {
case 'convert_temperature_to_fahrenheit':
return convertTemperatureToFahrenheit(args.temperature);
default:
throw new Error(`Function: ${name} not found!`);
}
}
function convertTemperatureToFahrenheit(temperature: number): string {
return `The temperature in Fahrenheit is ${(temperature * 9) / 5 + 32}°F.`;
}
Send the function result back to the model to get its final response:
const finalResponse = await client.chatCompletion({
messages: [toolMessage],
messagesHistory: response.getAllMessages()
});
console.log(finalResponse.getContent());
Response Format
For general response formatting, use the `response_format` parameter.
It is useful when the model is not calling a tool but should still return a structured response.
The example below defines a JSON schema with `strict` set to `true` so that the response adheres to the schema definition.
const templating: TemplatingModuleConfig = {
response_format: {
type: 'json_schema',
json_schema: {
name: 'capital_response',
strict: true,
schema: {
type: 'object',
properties: {
country_name: {
type: 'string',
description: 'The name of the country provided by the user.'
},
capital: {
type: 'string',
description: 'The capital city of the country.'
}
},
required: ['country_name', 'capital']
}
}
}
};
You can also define the JSON schema using a Zod schema, as shown below:
import * as z from 'zod/v4';
import { toJsonSchema } from '@langchain/core/utils/json_schema';
import { ResponseFormatJsonSchema } from '@sap-ai-sdk/orchestration';
const countryCapitalSchema = z
.object({
country_name: z.string(),
capital: z.string()
})
.strict();
const response_format: ResponseFormatJsonSchema = {
type: 'json_schema',
json_schema: {
name: 'capital_response',
strict: true,
schema: toJsonSchema(countryCapitalSchema)
}
};
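The resulting `response_format` can then be passed to the templating configuration, and the same Zod schema can validate the model output. A sketch:
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: {
    template: [
      { role: 'user', content: 'What is the capital of {{?country}}?' }
    ],
    response_format
  }
});
const response = await orchestrationClient.chatCompletion({
  inputParams: { country: 'France' }
});
// getContent() returns the raw JSON string; parse it and validate the result.
const capitalInfo = countryCapitalSchema.parse(
  JSON.parse(response.getContent()!)
);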
Message History
The orchestration service supports chat completion with message history. This allows the model to remember previous messages in the conversation, enabling a more coherent and context-aware interaction.
Provide the message history in the `messagesHistory` parameter when calling the `chatCompletion()` method.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'What is my name?' }],
messagesHistory: [
{
role: 'system',
content:
'You are a helpful assistant who remembers all details the user shares with you.'
},
{
role: 'user',
content: "Hi! I'm Bob"
},
{
role: 'assistant',
content:
"Hi Bob, nice to meet you! I'm an AI assistant. I'll remember that your name is Bob as we continue our conversation."
}
]
});
Image Recognition
Many models in the orchestration service have image recognition capabilities, meaning the models can take images as input and answer questions about them.
The `image_url` content type can only be used in messages with `role: 'user'`.
Attempting to use `image_url` in non-user messages will result in an error.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user', // only user role supports image_url
content: [
{
type: 'text',
text: 'What is the content of the image?'
},
{
type: 'image_url',
image_url: {
url: '{{?imageUrl}}'
}
}
]
}
],
inputParams: {
imageUrl: 'IMAGE_URL'
}
});
`IMAGE_URL` can either be a public URL or a base64-encoded image, e.g., `data:image/jpeg;base64,...`.
You can send a single chat completion request with multiple images by defining multiple content blocks with the `image_url` type in the template.
The model will process each image and use the information from all of them to respond.
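For example, a request with two images might look like the following sketch:
const response = await orchestrationClient.chatCompletion({
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Compare the content of the two images.' },
        { type: 'image_url', image_url: { url: '{{?firstImageUrl}}' } },
        { type: 'image_url', image_url: { url: '{{?secondImageUrl}}' } }
      ]
    }
  ],
  inputParams: {
    firstImageUrl: 'IMAGE_URL',
    secondImageUrl: 'IMAGE_URL'
  }
});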
Content Filtering
Configure content filtering to restrict content that is passed to and received from a generative AI model.
This feature allows filtering both input and output of a model based on content safety criteria.
The following example demonstrates how to use content filtering with the orchestration client. See the sections below for details on the available content filters and how to build them.
const filter: FilterConfig = ... // Use a build function to create a content filter
const orchestrationClient = new OrchestrationClient({
llm,
filtering: {
input: {
filters: [filter] // Multiple filters can be applied
},
output: {
filters: [filter]
}
}
});
try {
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'I hate you!' }]
});
console.log(response.getContent());
} catch (error: any) {
console.error(error.message);
console.error(error.cause?.response?.data);
}
Multiple filters can be applied at the same time for both input and output filtering.
The `chatCompletion()` method can throw an error with HTTP status code `400` if a content filter is triggered.
In case of a `200` HTTP response, the `getContent()` method can throw an error if an output filter is triggered.
See the Error Handling page for more details.
Azure Content Filter
Use the `buildAzureContentSafetyFilter()` function to build an Azure content filter.
Each category of the filter can be assigned a specific severity level, which corresponds to an Azure threshold value.
| Severity Level           | Azure Threshold Value |
| ------------------------ | --------------------- |
| `ALLOW_SAFE`             | 0                     |
| `ALLOW_SAFE_LOW`         | 2                     |
| `ALLOW_SAFE_LOW_MEDIUM`  | 4                     |
| `ALLOW_ALL`              | 6                     |
import { buildAzureContentSafetyFilter } from '@sap-ai-sdk/orchestration';
const filter = buildAzureContentSafetyFilter({
Hate: 'ALLOW_SAFE_LOW',
Violence: 'ALLOW_SAFE_LOW_MEDIUM'
});
Llama Guard Filter
Use the `buildLlamaGuardFilter()` function to build a Llama Guard content filter.
Available categories can be found via autocompletion. Pass the categories as arguments to the function to enable them.
import { buildLlamaGuardFilter } from '@sap-ai-sdk/orchestration';
const filter = buildLlamaGuardFilter('hate', 'violent_crimes');
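Since multiple filters can be applied at the same time, the two filter types can also be combined. A sketch for input filtering, reusing the build functions shown above:
import {
  buildAzureContentSafetyFilter,
  buildLlamaGuardFilter
} from '@sap-ai-sdk/orchestration';
const azureFilter = buildAzureContentSafetyFilter({ Hate: 'ALLOW_SAFE' });
const llamaGuardFilter = buildLlamaGuardFilter('hate');
const orchestrationClient = new OrchestrationClient({
  llm,
  filtering: {
    input: {
      filters: [azureFilter, llamaGuardFilter] // both filters are applied to the input
    }
  }
});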
Data Masking
Use the orchestration client with the masking module to mask sensitive information in the prompt while preserving necessary context for the generative AI model.
The following example demonstrates how to use data masking with the orchestration client. See the sections below for details on the available masking providers and how to build them.
const maskingProvider: MaskingProviderConfig = ... // Use a build function to create a masking provider
const orchestrationClient = new OrchestrationClient({
llm,
masking: {
masking_providers: [maskingProvider] // Multiple masking providers can be applied
}
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content:
'Please write an email to {{?user}} ({{?email}}) about the amazing capabilities of SAP AI Core!'
}
],
inputParams: { user: 'Jane Doe', email: 'jane.doe@example.com' }
});
SAP Data Privacy Integration
The orchestration service offers the masking provider SAP Data Privacy Integration (DPI) to anonymize or pseudonymize sensitive information.
Use the `buildDpiMaskingProvider()` function to build a DPI masking provider with standard or custom entities.
Provide a regular expression to match a custom entity.
Configure the `replacement_strategy` to control how entities are masked.
Set the method to `constant` to replace an entity with the specified value followed by an incrementing number.
Set the method to `fabricated_data` to replace an entity with a randomly generated value appropriate to its type.
const maskingProvider = buildDpiMaskingProvider({
method: 'anonymization',
entities: [
'profile-person',
{
type: 'profile-email',
replacement_strategy: {
method: 'fabricated_data'
}
},
{
type: 'custom',
regex: '\\b[0-9]{4}-[0-9]{4}-[0-9]{3,5}\\b',
replacement_strategy: {
method: 'constant',
value: 'REDACTED_ID'
}
}
],
// mask_grounding_input: false, // optional
allowlist: ['SAP'] // optional
});
The `allowlist` property specifies terms that will be kept unmasked.
Set `mask_grounding_input` to `true` to mask the grounding input as well.
Grounding
Grounding enables integrating external, contextually relevant, domain-specific, or real-time data into AI processes.
Use the `buildDocumentGroundingConfig()` function to build the configuration for the document grounding service.
const orchestrationClient = new OrchestrationClient({
llm,
grounding: buildDocumentGroundingConfig({
input_params: ['groundingRequest'],
output_param: 'groundingOutput',
// metadata_params: ['PARAM_NAME'], // optional
filters: [
{
id: 'FILTER_ID',
// data_repository_type: 'vector', // optional, defaults to 'vector'
data_repositories: ['REPOSITORY_ID']
}
]
})
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content:
'UserQuestion: {{?groundingRequest}} Context: {{?groundingOutput}}'
}
],
inputParams: {
groundingRequest: 'Give me a short introduction of SAP AI Core.'
}
});
By default, the optional filter property `data_repository_type` is set to `vector`.
Set it to `help.sap.com` to retrieve context from the SAP Help Portal.
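For instance, a grounding configuration targeting the SAP Help Portal might look like this sketch:
const orchestrationClient = new OrchestrationClient({
  llm,
  grounding: buildDocumentGroundingConfig({
    input_params: ['groundingRequest'],
    output_param: 'groundingOutput',
    filters: [
      {
        id: 'FILTER_ID',
        data_repository_type: 'help.sap.com' // retrieve context from the SAP Help Portal
      }
    ]
  })
});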
Set the `metadata_params` property to an array of parameter names to include metadata in the grounding result, which can then be referenced in the prompt.
If the value is set to `'*'`, all metadata will be included.
Set the `data_repositories` property to an array of repository IDs to search only specific data repositories.
Skip this property to search all available data repositories.
Translation
The translation module translates content sent to and received from a generative AI model into a chosen target language.
The module supports SAP's Document Translation service and allows translating both the input and output of a model. The target language is mandatory, while the source language is auto-detected if not provided.
For available translation options, consult the list of supported languages to identify compatible language pairs.
const translationConfig = buildTranslationConfig({
sourceLanguage: 'en-US',
targetLanguage: 'de-DE'
});
const orchestrationClient = new OrchestrationClient({
llm,
inputTranslation: translationConfig,
outputTranslation: translationConfig
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content: 'Write an abstract for a thriller playing at SAP headquarters.'
}
]
});
Use JSON Configuration from AI Launchpad
If you already have an orchestration workflow created in your SAP AI Launchpad instance, you can either download the configuration as a JSON file or copy the JSON string into your code to configure the orchestration client.
import fs from 'fs';
const jsonConfig = await fs.promises.readFile(
'path/to/orchestration-config.json',
'utf-8'
);
// Alternatively, you can provide the JSON string in code directly.
// const jsonConfig = 'YOUR_JSON_CONFIG'
const orchestrationClient = new OrchestrationClient(jsonConfig);
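If you provide the JSON string directly, a minimal configuration might look like the following sketch. The exact JSON schema is defined by the orchestration service, so treat the field names below as an assumption:
const jsonConfig = JSON.stringify({
  // assumption: module configurations as used by the orchestration service
  module_configurations: {
    llm_module_config: {
      model_name: 'gpt-4o',
      model_params: {}
    },
    templating_module_config: {
      template: [{ role: 'user', content: 'Hello World!' }]
    }
  }
});
const orchestrationClient = new OrchestrationClient(jsonConfig);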
Streaming
The `OrchestrationClient` supports streaming responses for chat completion requests, based on the Server-Sent Events standard.
Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information.
const response = await orchestrationClient.stream({
inputParams: { country: 'France' }
});
for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
const finishReason = response.getFinishReason();
const tokenUsage = response.getTokenUsage();
console.log(`Finish reason: ${finishReason}\n`);
console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);
Streaming the Delta Content
Use the `toContentStream()` method to create a stream generating delta content strings.
for await (const chunk of response.stream.toContentStream()) {
console.log(chunk); // will log the delta content
}
Streaming with Tool Calls
Use the `getToolCalls()` method to get the tool calls at the end of a stream.
While the `getDeltaToolCalls()` method can be called on individual chunks, partial tool calls are typically not useful.
Therefore, it is recommended to call the `getToolCalls()` method on the full response instead.
for await (const _ of response.stream) {
console.log('Waiting for the stream to end ...');
}
const toolCalls = response.getToolCalls();
Streaming with Abort Controller
A streaming request can be aborted using the `AbortController` API.
In case of an error, SAP Cloud SDK for AI will automatically close the stream.
The stream can also be aborted manually if an `AbortController` object was provided when calling the `stream()` method.
const orchestrationClient = new OrchestrationClient({ llm });
const controller = new AbortController();
const response = await orchestrationClient.stream(
{
messages: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
],
inputParams: { country: 'France' }
},
controller
);
// Abort the streaming request after one second
setTimeout(() => {
controller.abort();
}, 1000);
for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
In this example, the streaming request is aborted after one second. An abort controller can be useful, e.g., when the end user wants to stop the stream or refreshes the page.
Streaming Options
The orchestration service offers multiple streaming options, which you can configure in addition to the LLM streaming options.
These include options like defining the maximum number of characters per chunk or modifying the output filter behavior.
There are two ways to add specific streaming options to your client: either at initialization of the orchestration client, or when calling the `stream()` method.
You can check the list of available stream options here.
Below is an example of setting the streaming options when calling the `stream()` method.
const response = await orchestrationClient.stream(
{
messages: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
],
inputParams: { country: 'France' }
},
controller,
{
llm: { include_usage: false },
global: { chunk_size: 10 },
outputFiltering: { overlap: 200 }
}
);
Usage metrics are returned by default; set the `include_usage` property to `false` if they should not be returned.
Set the `llm` property to `null` to not send any LLM streaming options.
When initializing a client with JSON module config, it is not possible to provide streaming options.
Custom Resource Group
By default, an orchestration deployment is expected to exist in the `default` resource group.
If you have deployed the orchestration service in a different resource group, specify the resource group name in the `resourceGroup` property when creating the client.
Refer to Create a Deployment for Orchestration for more details on how to create a deployment for orchestration.
const orchestrationClient = new OrchestrationClient(orchestrationConfig, {
resourceGroup: 'my-custom-resource-group'
});
Custom Request Configuration
Set a custom request configuration in the `requestConfig` parameter when calling the `chatCompletion()` method.
const response = await orchestrationClient.chatCompletion(
{
...
},
{
headers: {
'x-custom-header': 'custom-value'
// Add more headers here
},
params: {
// Add more parameters here
}
// Add more request configuration here
}
);
Custom Destination
When initializing the client, it is possible to provide a custom destination.
For example, when targeting a destination with the name `my-destination`, the following code can be used:
const orchestrationClient = new OrchestrationClient(
orchestrationConfig,
deploymentConfig,
{
destinationName: 'my-destination'
}
);
By default, the fetched destination is cached.
To disable caching, set the `useCache` parameter to `false` together with the `destinationName` parameter.
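For example:
const orchestrationClient = new OrchestrationClient(
  orchestrationConfig,
  deploymentConfig,
  {
    destinationName: 'my-destination',
    useCache: false // fetch the destination on every request
  }
);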
For more information about configuring a destination, refer to the Using a Destination section.