Chat Completion
The `@sap-ai-sdk/orchestration` package provides a client for the orchestration service of SAP AI Core.
The orchestration service harmonizes the API of various generative AI models, enabling seamless integration and interaction with different models through a unified interface.
Additionally, it provides features like templating, content filtering, grounding and more to enhance the interaction with generative AI models.
Find more details about the orchestration workflow here.
Installation
$ npm install @sap-ai-sdk/orchestration
Quick Start
Initialize a client with the proper orchestration configuration for each module.
Below is an example of configuring the mandatory `llm` module.
In addition, you can find more sample code here.
import { OrchestrationClient } from '@sap-ai-sdk/orchestration';
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o'
}
});
Here, the model name is specified in the client configuration; a user message is passed as the prompt in the request below.
To send a chat completion request, use the `chatCompletion()` method.
Use the following convenience methods for handling the chat completion response:
- `getContent()` parses the response and returns the model's output as a string.
- `getFinishReason()` retrieves the `finish_reason` explaining why the chat completion request stopped.
- `getTokenUsage()` provides token usage details, including `total_tokens`, `prompt_tokens`, and `completion_tokens`.
- `getAllMessages()` parses the response and returns a list of all messages.
- `getAssistantMessage()` parses the response and returns the assistant message.
- `getToolCalls()` parses the response and returns a list of tool calls generated by the model.
- `getRefusal()` parses the response and returns the refusal message from the model.
const response = await orchestrationClient.chatCompletion({
messages: [
{ role: 'user', content: 'Hello World! Why is this phrase so famous?' }
]
});
console.log(response.getContent());
console.log(response.getFinishReason());
console.log(JSON.stringify(response.getTokenUsage()));
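The remaining convenience methods can be used in the same way. A short sketch, assuming the same `response` object as above:
console.log(response.getAllMessages()); // all messages, including the assistant message
console.log(response.getAssistantMessage()); // only the assistant message
console.log(response.getToolCalls()); // tool calls, if the model generated any
console.log(response.getRefusal()); // refusal message, if the model refused to answer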
The following sections provide detailed configuration for each module.
LLM Configuration
Choose the LLM by setting the `model_name` property in the `llm` module configuration.
Optionally, define `model_version` (default: `latest`) and `model_params` for custom settings.
const llm: LlmModuleConfig = {
model_name: 'gpt-4o',
model_version: '2024-08-06', // optional
model_params: {
// optional
max_tokens: 50,
temperature: 0.1
}
};
Harmonized API
The Harmonized API lets you use different foundation models without the need to change the client code. You can easily switch from one LLM to another, as shown in the example below.
// Original config for using GPT 4o
/*
const llm: LlmModuleConfig = {
model_name: 'gpt-4o'
};
*/
// Switch to Claude 3.5 Sonnet
const llm: LlmModuleConfig = {
model_name: 'anthropic--claude-3.5-sonnet'
};
Thanks to the harmonized API, all available LLMs on the SAP Generative AI Hub can be accessed through orchestration, including:
- OpenAI GPT 4o
- OpenAI o1
- OpenAI o3 mini
- AWS Anthropic Claude
- AWS Amazon Nova
- GCP VertexAI Gemini
- Mistral AI
Check the SAP Notes for all available LLMs on SAP Generative AI Hub.
Templating
Use the orchestration client with the `templating` configuration to define a static prompt.
This prompt can include placeholders, which are replaced with values from `inputParams` during a `chatCompletion()` method call.
This setup is useful when the base structure of your prompt remains the same across requests.
const orchestrationClient = new OrchestrationClient({
llm,
templating: {
template: [
{ role: 'user', content: 'What is the capital of {{?country}}?' }
]
}
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
The `template` messages defined in the client configuration are static: they are always included in every `chatCompletion()` request.
For more dynamic prompts, i.e., when you want to vary the full message list per request without re-initializing the client, use the `messages` property instead.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'What is the capital of {{?country}}?' }],
inputParams: { country: 'France' }
});
Prompt Registry
You can define template content within the templating configuration, or refer to a template defined in the Prompt Registry.
const orchestrationClient = new OrchestrationClient({
llm,
templating: {
template_ref: {
name: 'my-get-capital-template',
scenario: 'my-scenario',
version: '0.0.1'
}
}
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
A prompt template can be referenced either by ID, or by a combination of name, scenario and version. For details on storing a template in the Prompt Registry, refer to this guide.
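For illustration, referencing a template by ID might look like the following sketch, assuming the reference takes an `id` field in place of `name`, `scenario`, and `version`:
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: {
    template_ref: {
      id: 'TEMPLATE_ID' // placeholder ID of the stored prompt template
    }
  }
});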
Local Prompt Template
You can define the template in a YAML file and pass its content as a string to the `templating` property.
This is useful for testing a prompt template locally before storing it in the Prompt Registry.
import { readFileSync } from 'fs';
import { OrchestrationClient } from '@sap-ai-sdk/orchestration';
// Read the YAML file containing the prompt template
const yamlTemplate = readFileSync('./path/to/prompt-template.yaml', 'utf-8');
const orchestrationClient = new OrchestrationClient({
llm,
templating: yamlTemplate
});
const response = await orchestrationClient.chatCompletion({
inputParams: { country: 'France' }
});
The YAML string is parsed and validated against the expected schema. If the YAML is invalid or does not conform to the schema, an error will be thrown.
This section shows the format in which a prompt template can be defined.
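For a quick local test, the YAML content can also be defined inline as a string. The following is a minimal sketch; the field names assume the prompt template format referenced above:
const yamlTemplate = `
name: my-get-capital-template
version: 0.0.1
scenario: my-scenario
spec:
  template:
    - role: user
      content: What is the capital of {{?country}}?
`;
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: yamlTemplate
});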
Function Calling
Define and pass tool definitions to enable the model to call specific functions. Here's an example of temperature conversion using tool calls:
First, define the tool with the `name`, `description`, and `parameters` properties:
const convertTemperatureTool: ChatCompletionTool = {
type: 'function',
function: {
name: 'convert_temperature_to_fahrenheit',
description: 'Converts temperature from Celsius to Fahrenheit',
parameters: {
type: 'object',
properties: {
temperature: {
type: 'number',
description: 'The temperature value in Celsius to convert.'
}
},
required: ['temperature']
}
}
};
Set `strict` to `true` to ensure function calls adhere to the function schema.
For more information, refer to Orchestration Tool Calling.
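For example, a stricter copy of the tool above might look like this sketch, assuming `strict` sits inside the `function` definition next to `parameters`:
const strictConvertTemperatureTool: ChatCompletionTool = {
  type: 'function',
  function: {
    ...convertTemperatureTool.function,
    strict: true // assumption: rejects calls that do not match the parameter schema
  }
};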
Initialize the client and send the initial request with the tool definition:
const client = new OrchestrationClient({
llm: {
model_name: 'gpt-4o'
},
templating: {
tools: [convertTemperatureTool]
}
});
const response = await client.chatCompletion({
messages: [
{ role: 'user', content: 'Convert 20 degrees Celsius to Fahrenheit.' }
]
});
When the model decides to use a tool, it returns the function name and input arguments in the response. Use the model response to execute the function.
const initialResponse = response.getAssistantMessage();
let toolMessage: ToolChatMessage;
if (initialResponse && initialResponse.tool_calls) {
const toolCall = initialResponse.tool_calls[0];
const name = toolCall.function.name;
const args = JSON.parse(toolCall.function.arguments);
// Execute the function with the provided arguments
const toolResult = callFunction(name, args);
toolMessage = {
role: 'tool',
content: toolResult,
tool_call_id: toolCall.id
};
}
The `callFunction` function routes the calls to the actual function implementations.
function callFunction(name: string, args: any): string {
switch (name) {
case 'convert_temperature_to_fahrenheit':
return convertTemperatureToFahrenheit(args.temperature);
default:
throw new Error(`Function: ${name} not found!`);
}
}
function convertTemperatureToFahrenheit(temperature: number): string {
return `The temperature in Fahrenheit is ${(temperature * 9) / 5 + 32}°F.`;
}
Send the function result back to the model to get its final response:
const finalResponse = await client.chatCompletion({
messages: [toolMessage],
messagesHistory: response.getAllMessages()
});
console.log(finalResponse.getContent());
Response Format
For general response formatting, use the `response_format` parameter.
It is useful when the model is not calling a tool but should still return a structured response.
The example below defines a JSON schema with `strict` set to `true` so that the response adheres to the schema definition.
const templating: TemplatingModuleConfig = {
response_format: {
type: 'json_schema',
json_schema: {
name: 'capital_response',
strict: true,
schema: {
type: 'object',
properties: {
country_name: {
type: 'string',
description: 'The name of the country provided by the user.'
},
capital: {
type: 'string',
description: 'The capital city of the country.'
}
},
required: ['country_name', 'capital']
}
}
}
};
You can also define the JSON schema using a Zod schema, as shown below:
import * as z from 'zod/v4';
import { toJsonSchema } from '@langchain/core/utils/json_schema';
import { ResponseFormatJsonSchema } from '@sap-ai-sdk/orchestration';
const countryCapitalSchema = z
.object({
country_name: z.string(),
capital: z.string()
})
.strict();
const response_format: ResponseFormatJsonSchema = {
type: 'json_schema',
json_schema: {
name: 'capital_response',
strict: true,
schema: toJsonSchema(countryCapitalSchema)
}
};
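The resulting `response_format` can then be passed to the templating configuration, and the same Zod schema can validate the model output. A sketch:
const orchestrationClient = new OrchestrationClient({
  llm,
  templating: {
    template: [
      { role: 'user', content: 'What is the capital of {{?country}}?' }
    ],
    response_format
  }
});
const response = await orchestrationClient.chatCompletion({
  inputParams: { country: 'France' }
});
// getContent() returns the raw JSON string; parse it and validate the result.
const capitalInfo = countryCapitalSchema.parse(
  JSON.parse(response.getContent()!)
);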
Message History
The orchestration service supports chat completion with message history. This allows the model to remember previous messages in the conversation, enabling a more coherent and context-aware interaction.
Provide the message history in the `messagesHistory` parameter when calling the `chatCompletion()` method.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'What is my name?' }],
messagesHistory: [
{
role: 'system',
content:
'You are a helpful assistant who remembers all details the user shares with you.'
},
{
role: 'user',
content: "Hi! I'm Bob"
},
{
role: 'assistant',
content:
"Hi Bob, nice to meet you! I'm an AI assistant. I'll remember that your name is Bob as we continue our conversation."
}
]
});
Image Recognition
Many models in the orchestration service have image recognition capabilities, meaning the models can take images as input and answer questions about them.
The `image_url` content type can only be used in messages with `role: 'user'`.
Attempting to use `image_url` in non-user messages will result in an error.
const orchestrationClient = new OrchestrationClient({ llm });
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user', // only user role supports image_url
content: [
{
type: 'text',
text: 'What is the content of the image?'
},
{
type: 'image_url',
image_url: {
url: '{{?imageUrl}}'
}
}
]
}
],
inputParams: {
imageUrl: 'IMAGE_URL'
}
});
`IMAGE_URL` can either be a public URL or a base64-encoded image, e.g., `data:image/jpeg;base64,...`.
You can send a single chat completion request with multiple images by defining multiple content blocks with the `image_url` type in the template.
The model will process each image and use the information from all of them to respond.
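For example, a request with two images might look like the following sketch:
const response = await orchestrationClient.chatCompletion({
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Compare the content of the two images.' },
        { type: 'image_url', image_url: { url: '{{?firstImageUrl}}' } },
        { type: 'image_url', image_url: { url: '{{?secondImageUrl}}' } }
      ]
    }
  ],
  inputParams: {
    firstImageUrl: 'IMAGE_URL',
    secondImageUrl: 'IMAGE_URL'
  }
});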
Content Filtering
Configure content filtering to restrict content that is passed to and received from a generative AI model.
This feature allows filtering both input and output of a model based on content safety criteria.
The following example demonstrates how to use content filtering with the orchestration client. See the sections below for details on the available content filters and how to build them.
const filter: FilterConfig = ... // Use a build function to create a content filter
const orchestrationClient = new OrchestrationClient({
llm,
filtering: {
input: {
filters: [filter] // Multiple filters can be applied
},
output: {
filters: [filter]
}
}
});
try {
const response = await orchestrationClient.chatCompletion({
messages: [{ role: 'user', content: 'I hate you!' }]
});
console.log(response.getContent());
} catch (error: any) {
console.error(error.message);
console.error(error.cause?.response?.data);
}
Multiple filters can be applied at the same time for both input and output filtering.
The `chatCompletion()` method can throw an error with HTTP status code `400` if a content filter is triggered.
In case of a `200` HTTP response, the `getContent()` method can throw an error if an output filter is triggered.
See the Error Handling page for more details.
Azure Content Filter
Use the `buildAzureContentSafetyFilter()` function to build an Azure content filter.
Each category of the filter can be assigned a specific severity level, which corresponds to an Azure threshold value.
| Severity Level           | Azure Threshold Value |
| ------------------------ | --------------------- |
| `ALLOW_SAFE`             | 0                     |
| `ALLOW_SAFE_LOW`         | 2                     |
| `ALLOW_SAFE_LOW_MEDIUM`  | 4                     |
| `ALLOW_ALL`              | 6                     |
import { buildAzureContentSafetyFilter } from '@sap-ai-sdk/orchestration';
const filter = buildAzureContentSafetyFilter({
Hate: 'ALLOW_SAFE_LOW',
Violence: 'ALLOW_SAFE_LOW_MEDIUM'
});
Llama Guard Filter
Use the `buildLlamaGuardFilter()` function to build a Llama Guard content filter.
Available categories can be found via autocompletion. Pass the categories as arguments to the function to enable them.
import { buildLlamaGuardFilter } from '@sap-ai-sdk/orchestration';
const filter = buildLlamaGuardFilter('hate', 'violent_crimes');
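Since multiple filters can be applied at the same time, the two filter types can also be combined. A sketch for input filtering, reusing the build functions shown above:
import {
  buildAzureContentSafetyFilter,
  buildLlamaGuardFilter
} from '@sap-ai-sdk/orchestration';
const azureFilter = buildAzureContentSafetyFilter({ Hate: 'ALLOW_SAFE' });
const llamaGuardFilter = buildLlamaGuardFilter('hate');
const orchestrationClient = new OrchestrationClient({
  llm,
  filtering: {
    input: {
      filters: [azureFilter, llamaGuardFilter] // both filters are applied to the input
    }
  }
});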
Data Masking
Use the orchestration client with the masking module to mask sensitive information in the prompt while preserving necessary context for the generative AI model.
The following example demonstrates how to use data masking with the orchestration client. See the sections below for details on the available masking providers and how to build them.
const maskingProvider: MaskingProviderConfig = ... // Use a build function to create a masking provider
const orchestrationClient = new OrchestrationClient({
llm,
masking: {
masking_providers: [maskingProvider] // Multiple masking providers can be applied
}
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content:
'Please write an email to {{?user}} ({{?email}}) about the amazing capabilities of SAP AI Core!'
}
],
inputParams: { user: 'Jane Doe', email: 'jane.doe@example.com' }
});
SAP Data Privacy Integration
The orchestration service offers the masking provider SAP Data Privacy Integration (DPI) to anonymize or pseudonymize sensitive information.
Use the `buildDpiMaskingProvider()` function to build a DPI masking provider with standard or custom entities.
Provide a regular expression to match a custom entity.
Configure the `replacement_strategy` to control how entities are masked.
Set the method to `constant` to replace an entity with the specified value followed by an incrementing number.
Set the method to `fabricated_data` to replace an entity with a randomly generated value appropriate to its type.
const maskingProvider = buildDpiMaskingProvider({
method: 'anonymization',
entities: [
'profile-person',
{
type: 'profile-email',
replacement_strategy: {
method: 'fabricated_data'
}
},
{
type: 'custom',
regex: '\\b[0-9]{4}-[0-9]{4}-[0-9]{3,5}\\b',
replacement_strategy: {
method: 'constant',
value: 'REDACTED_ID'
}
}
],
// mask_grounding_input: false, // optional
allowlist: ['SAP'] // optional
});
The `allowlist` property specifies terms that will be kept unmasked.
Set `mask_grounding_input` to `true` to mask the grounding input as well.
Grounding
Grounding enables integrating external, contextually relevant, domain-specific, or real-time data into AI processes.
Use the `buildDocumentGroundingConfig()` function to build the configuration for the document grounding service.
const orchestrationClient = new OrchestrationClient({
llm,
grounding: buildDocumentGroundingConfig({
input_params: ['groundingRequest'],
output_param: 'groundingOutput',
// metadata_params: ['PARAM_NAME'], // optional
filters: [
{
id: 'FILTER_ID',
// data_repository_type: 'vector', // optional, defaults to 'vector'
data_repositories: ['REPOSITORY_ID']
}
]
})
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content:
'UserQuestion: {{?groundingRequest}} Context: {{?groundingOutput}}'
}
],
inputParams: {
groundingRequest: 'Give me a short introduction of SAP AI Core.'
}
});
By default, the optional filter property `data_repository_type` is set to `vector`.
Set it to `help.sap.com` to retrieve context from the SAP Help Portal.
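For instance, a grounding configuration targeting the SAP Help Portal might look like this sketch:
const orchestrationClient = new OrchestrationClient({
  llm,
  grounding: buildDocumentGroundingConfig({
    input_params: ['groundingRequest'],
    output_param: 'groundingOutput',
    filters: [
      {
        id: 'FILTER_ID',
        data_repository_type: 'help.sap.com' // retrieve context from the SAP Help Portal
      }
    ]
  })
});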
Set the `metadata_params` property to an array of parameter names to include metadata in the grounding result, which can then be referenced in the prompt.
If the value is set to `'*'`, all metadata will be included.
Set the `data_repositories` property to an array of repository IDs to search only specific data repositories.
Skip this property to search all available data repositories.
Translation
The translation module translates content sent to and received from a generative AI model into a chosen target language.
The module supports SAP's Document Translation service and allows translating both the input and output of a model. The target language is mandatory, while the source language is auto-detected if not provided.
For available translation options, consult the list of supported languages to identify compatible language pairs.
const translationConfig = buildTranslationConfig({
sourceLanguage: 'en-US',
targetLanguage: 'de-DE'
});
const orchestrationClient = new OrchestrationClient({
llm,
inputTranslation: translationConfig,
outputTranslation: translationConfig
});
const response = await orchestrationClient.chatCompletion({
messages: [
{
role: 'user',
content: 'Write an abstract for a thriller playing at SAP headquarters.'
}
]
});
Use JSON Configuration from AI Launchpad
If you already have an orchestration workflow created in your SAP AI Launchpad instance, you can either download the configuration as a JSON file or copy the JSON string into your code to configure the orchestration client.
import fs from 'fs';
const jsonConfig = await fs.promises.readFile(
'path/to/orchestration-config.json',
'utf-8'
);
// Alternatively, you can provide the JSON string in code directly.
// const jsonConfig = 'YOUR_JSON_CONFIG'
const orchestrationClient = new OrchestrationClient(jsonConfig);
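If you provide the JSON string directly, a minimal configuration might look like the following sketch. The exact JSON schema is defined by the orchestration service, so treat the field names below as an assumption:
const jsonConfig = JSON.stringify({
  // assumption: module configurations as used by the orchestration service
  module_configurations: {
    llm_module_config: {
      model_name: 'gpt-4o',
      model_params: {}
    },
    templating_module_config: {
      template: [{ role: 'user', content: 'Hello World!' }]
    }
  }
});
const orchestrationClient = new OrchestrationClient(jsonConfig);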
Streaming
The `OrchestrationClient` supports streaming responses for chat completion requests, based on the Server-Sent Events standard.
Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information.
const response = await orchestrationClient.stream({
inputParams: { country: 'France' }
});
for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
const finishReason = response.getFinishReason();
const tokenUsage = response.getTokenUsage();
console.log(`Finish reason: ${finishReason}\n`);
console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);
Streaming the Delta Content
Use the `toContentStream()` method to create a stream generating delta content strings.
for await (const chunk of response.stream.toContentStream()) {
console.log(chunk); // will log the delta content
}
Streaming with Tool Calls
Use the `getToolCalls()` method to get the tool calls at the end of a stream.
While the `getDeltaToolCalls()` method can be called on individual chunks, partial tool calls are typically not useful.
Therefore, it is recommended to call the `getToolCalls()` method on the full response instead.
for await (const _ of response.stream) {
console.log('Waiting for the stream to end ...');
}
const toolCalls = response.getToolCalls();
Streaming with Abort Controller
A streaming request can be aborted using the `AbortController` API.
In case of an error, SAP Cloud SDK for AI will automatically close the stream.
The stream can also be aborted manually if an `AbortController` object was provided when calling the `stream()` method.
const orchestrationClient = new OrchestrationClient({ llm });
const controller = new AbortController();
const response = await orchestrationClient.stream(
{
messages: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
],
inputParams: { country: 'France' }
},
controller
);
// Abort the streaming request after one second
setTimeout(() => {
controller.abort();
}, 1000);
for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
In this example, the streaming request is aborted after one second. An abort controller can be useful, e.g., when the end user wants to stop the stream or refreshes the page.
Streaming Options
The orchestration service offers multiple streaming options, which you can configure in addition to the LLM streaming options.
These include options like defining the maximum number of characters per chunk or modifying the output filter behavior.
There are two ways to add specific streaming options to your client: either at initialization of the orchestration client, or when calling the `stream()` method.
You can check the list of available stream options here.
Below is an example of setting the streaming options when calling the `stream()` method.
const response = await orchestrationClient.stream(
{
messages: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
],
inputParams: { country: 'France' }
},
controller,
{
llm: { include_usage: false },
global: { chunk_size: 10 },
outputFiltering: { overlap: 200 }
}
);
Usage metrics are returned by default; set the `include_usage` property to `false` if they should not be returned.
Set the `llm` property to `null` to not send any LLM streaming options.
When initializing a client with JSON module config, it is not possible to provide streaming options.
Custom Resource Group
By default, an orchestration deployment is expected to exist in the `default` resource group.
If you have deployed the orchestration service in a different resource group, specify the resource group name in the `resourceGroup` property when creating the client.
Refer to Create a Deployment for Orchestration for more details on how to create a deployment for orchestration.
const orchestrationClient = new OrchestrationClient(orchestrationConfig, {
resourceGroup: 'my-custom-resource-group'
});
Custom Request Configuration
Set a custom request configuration in the `requestConfig` parameter when calling the `chatCompletion()` method.
const response = await orchestrationClient.chatCompletion(
{
...
},
{
headers: {
'x-custom-header': 'custom-value'
// Add more headers here
},
params: {
// Add more parameters here
}
// Add more request configuration here
}
);
Custom Destination
When initializing the client, it is possible to provide a custom destination.
For example, when targeting a destination with the name `my-destination`, the following code can be used:
const orchestrationClient = new OrchestrationClient(
orchestrationConfig,
deploymentConfig,
{
destinationName: 'my-destination'
}
);
By default, the fetched destination is cached.
To disable caching, set the `useCache` parameter to `false` together with the `destinationName` parameter.
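For example:
const orchestrationClient = new OrchestrationClient(
  orchestrationConfig,
  deploymentConfig,
  {
    destinationName: 'my-destination',
    useCache: false // fetch the destination on every request
  }
);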
For more information about configuring a destination, refer to the Using a Destination section.