Introduction

In this article, you will learn about the different resilience patterns and how to add them to your application using the SAP Cloud SDK. Version 3.0 of the SAP Cloud SDK introduced the concept of generic middlewares, which are not specific to resilience. The SAP Cloud SDK provides default implementations for the standard resilience patterns: timeout, retry, and circuit breaker. These implementations should work in most cases, but you can also create your own.

The samples repository provides examples that illustrate the usage of middlewares for resilience. Note that middlewares, and therefore resilience, apply to requests against the target system, and nothing is added by default. For the internal SAP BTP service calls made under the hood, the SAP Cloud SDK applies resilience automatically, so you do not have to configure it for those calls.

Resilience Middleware

The SAP Cloud SDK offers a resilience() function to build a resilience middleware. If you do not specify any options, the function returns a circuit breaker and a timeout middleware:

import { resilience } from '@sap-cloud-sdk/resilience';

const [circuitBreaker, timeout] = resilience();

You can pass very basic options to adjust the creation of these middlewares:

type ResilienceOptions = {
  timeout?: boolean | number; // default is true with 10 seconds timeout. A number indicates the timeout in milliseconds.
  circuitBreaker?: boolean; // default is true.
  retry?: boolean | number; // default is false. A number indicates the number of retries.
};

const [retry, circuitBreaker, timeout] = resilience({ retry: true });
const [retry5times, timeout5Seconds] = resilience({
  timeout: 5000,
  circuitBreaker: false,
  retry: 5
});
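
To make the option semantics concrete, the following sketch maps the options to a list of middleware names. It is a hypothetical stand-in and not the SDK implementation: the function `describeResilience` and its string output are assumptions for illustration, and the order only mirrors the destructuring shown in the examples above.

```typescript
// Hypothetical stand-in, not the SDK implementation. It mirrors the documented
// defaults and the order seen in the destructuring examples.
type ResilienceOptions = {
  timeout?: boolean | number;
  circuitBreaker?: boolean;
  retry?: boolean | number;
};

function describeResilience(options: ResilienceOptions = {}): string[] {
  const { timeout = true, circuitBreaker = true, retry = false } = options;
  const middlewares: string[] = [];

  if (retry !== false) {
    // `true` falls back to the default of 3 retries.
    middlewares.push(`retry(${retry === true ? 3 : retry})`);
  }
  if (circuitBreaker) {
    middlewares.push('circuitBreaker()');
  }
  if (timeout !== false) {
    // `true` falls back to the default of 10 seconds.
    middlewares.push(`timeout(${timeout === true ? 10000 : timeout})`);
  }
  return middlewares;
}

console.log(describeResilience());
console.log(describeResilience({ retry: true }));
console.log(describeResilience({ timeout: 5000, circuitBreaker: false, retry: 5 }));
```

Leaving an option out keeps its default, so only the options you want to change need to be passed.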

You can find details on each option in the dedicated sections below. You apply the resilience by adding it to your request as discussed in the middleware documentation:

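import { executeHttpRequest } from '@sap-cloud-sdk/http-client';
import { resilience } from '@sap-cloud-sdk/resilience';
import { myService } from './generated-client';

// myDestination is assumed to be a destination defined elsewhere.
executeHttpRequest(myDestination, { method: 'get', middleware: resilience() });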

const { myApi } = myService();
myApi.getAll().middleware(resilience()).execute(myDestination);

Note: Using the resilience() function is the recommended way to build resilience-related middleware because it ensures the correct order of middlewares.

Timeout

The timeout middleware is based on the Promise.race() method and does not use an additional library. You can create a timeout middleware with the timeout() function. The default timeout is 10 seconds, but can be changed on demand.

import { timeout } from '@sap-cloud-sdk/resilience';

// Three seconds timeout
myApi.getAll().middleware(timeout(3000)).execute(myDestination);

Note: The timeout should be the first entry in your list of middlewares.
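
The Promise.race() approach can be illustrated with a minimal sketch. The helper `withTimeout` and its error message are hypothetical and not SDK exports; the actual middleware may differ in detail.

```typescript
// Minimal sketch of a Promise.race()-based timeout. `withTimeout` is a
// hypothetical helper, not part of the SDK.
function withTimeout<T>(fn: () => Promise<T>, timeoutMs = 10000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // This promise never resolves; it only rejects once the time is up.
  const expiry = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Request timed out after ${timeoutMs}ms.`)),
      timeoutMs
    );
  });

  const clear = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  };

  // Whichever promise settles first decides the outcome. The timer is cleared
  // in both cases so it does not keep the process alive.
  return Promise.race([fn(), expiry]).then(
    value => {
      clear();
      return value;
    },
    error => {
      clear();
      throw error;
    }
  );
}

// Usage: a call that takes 200 ms is rejected after 50 ms.
const slowCall = (): Promise<string> =>
  new Promise(resolve => setTimeout(() => resolve('done'), 200));

withTimeout(slowCall, 50).catch((err: Error) => console.log(err.message));
```

Note that Promise.race() only stops waiting for the slower promise. In this sketch the underlying call is not cancelled and still runs to completion in the background.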

Circuit Breaker

The implementation uses the opossum circuit breaker. You can create a circuit breaker middleware in the following way:

import { circuitBreaker } from '@sap-cloud-sdk/resilience';

myApi.getAll().middleware(circuitBreaker()).execute(myDestination);

Note: The circuit breaker middleware should be after the timeout in the list of middlewares.

The circuit breaker uses meaningful default values suggested by opossum:

  • Rolling count timeout: 10 seconds
  • Volume threshold: 10
  • Error threshold percentage: 50%
  • Reset timeout: 30 seconds

These numbers mean the following:

  • The circuit breaker records all requests for a duration of 10 seconds.
  • If in this time window, 10 or more requests are recorded, the error threshold is calculated.
  • If 50% or more of the requests in this time window fail, the circuit breaker opens. From this point in time, all requests to the target system are blocked directly and not executed anymore.
  • After a reset time of 30 seconds, the circuit breaker will go to the half-open state. This means the next request will go through as a test.
    • If it is successful, the circuit breaker will close again.
    • If it fails, the circuit breaker will remain open.

Note: Requests with HTTP status 4XX do not count as failed, because they do not indicate an unhealthy system.
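
The state transitions described above can be sketched as a small state machine. This is a simplified illustration and not opossum's implementation: `createBreakerSketch`, its option names, and the injected clock are assumptions made for this example.

```typescript
// Simplified sketch of the thresholds described above. Not opossum's
// implementation; all names are hypothetical.
type BreakerState = 'closed' | 'open' | 'half-open';

interface BreakerOptions {
  rollingCountTimeout: number; // window length in ms
  volumeThreshold: number; // minimum number of requests in the window
  errorThresholdPercentage: number;
  resetTimeout: number; // ms until a test request is allowed
}

const breakerDefaults: BreakerOptions = {
  rollingCountTimeout: 10000,
  volumeThreshold: 10,
  errorThresholdPercentage: 50,
  resetTimeout: 30000
};

function createBreakerSketch(
  now: () => number = () => Date.now(),
  opts: BreakerOptions = breakerDefaults
) {
  let state: BreakerState = 'closed';
  let openedAt = 0;
  let records: { time: number; failed: boolean }[] = [];

  const open = (): void => {
    state = 'open';
    openedAt = now();
    records = [];
  };

  return {
    state: (): BreakerState => state,

    // Returns true if a request may be sent right now.
    allowRequest(): boolean {
      if (state === 'closed') {
        return true;
      }
      if (state === 'open' && now() - openedAt >= opts.resetTimeout) {
        state = 'half-open'; // exactly one test request goes through
        return true;
      }
      return false;
    },

    // Records the outcome of a request. `status` is undefined if no response arrived.
    record(status?: number): void {
      // 4XX responses do not indicate an unhealthy target system.
      const failed = status === undefined || status >= 500;
      if (state === 'half-open') {
        if (failed) {
          open();
        } else {
          state = 'closed';
          records = [];
        }
        return;
      }
      const time = now();
      records.push({ time, failed });
      records = records.filter(r => time - r.time < opts.rollingCountTimeout);
      if (records.length < opts.volumeThreshold) {
        return; // too few requests to judge the error rate
      }
      const failures = records.filter(r => r.failed).length;
      if ((failures / records.length) * 100 >= opts.errorThresholdPercentage) {
        open();
      }
    }
  };
}
```

Passing the clock as a function makes the sketch testable without waiting for real time to pass.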

A circuit breaker needs to record all requests across users for a certain time window. This state is stored in central circuit breaker instances. Each target system and tenant is represented by a separate circuit breaker instance.
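
One way to picture the separation per target system and tenant is a registry keyed by both values. The sketch below is an assumption for illustration only: the key format, `breakerStateFor`, and the stored state object are hypothetical and do not reflect how the SDK derives its keys.

```typescript
// Hypothetical registry: one shared state object per target system and tenant.
interface BreakerKey {
  targetUrl: string;
  tenantId: string;
}

interface SharedBreakerState {
  failures: number;
}

const breakerRegistry = new Map<string, SharedBreakerState>();

function breakerStateFor({ targetUrl, tenantId }: BreakerKey): SharedBreakerState {
  // Assumed key format; any unambiguous combination of both values would do.
  const key = `${targetUrl}::${tenantId}`;
  let entry = breakerRegistry.get(key);
  if (!entry) {
    entry = { failures: 0 };
    breakerRegistry.set(key, entry);
  }
  return entry;
}
```

With such a registry, failures recorded for one tenant do not open the circuit breaker for another tenant that calls the same target system.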

Retries

The retry middleware should be used with caution, because it often mitigates a problem that should be solved properly. Also, if something fails consistently, it does not help to press the same button multiple times. You should consider some rules for adding retries:

  • The error should be the exception, not the default.
  • The error should happen randomly so a second call has a high likelihood of returning something.
  • The source of the error is out of your domain to fix.

You can create a retry middleware using the retry() function, which adds 3 retries by default. You can change the number in the following way:

import { retry } from '@sap-cloud-sdk/resilience';

// Four retries
myApi.getAll().middleware(retry(4)).execute(myDestination);

The retry middleware is based on the async-retry library and uses meaningful default values:

  • Number of retries: 3.
  • Exponential growth for waiting time using a base of 2.
  • Initial waiting time of 1 second.
  • Random factor on waiting time between 1 and 2.

This means the maximum waiting times between the three retries are 2, 4, and 8 seconds.
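
The arithmetic behind these numbers can be reproduced with a short calculation. The function `retryWaitBounds` is a hypothetical helper that only applies the defaults listed above; it does not call the async-retry library.

```typescript
// Arithmetic sketch of the backoff defaults listed above.
interface WaitBounds {
  minMs: number;
  maxMs: number;
}

function retryWaitBounds(retries = 3, initialMs = 1000, base = 2): WaitBounds[] {
  const bounds: WaitBounds[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Waiting time before applying the random factor: 1 s, 2 s, 4 s, ...
    const unscaled = initialMs * Math.pow(base, attempt);
    // The random factor lies between 1 and 2.
    bounds.push({ minMs: unscaled * 1, maxMs: unscaled * 2 });
  }
  return bounds;
}

console.log(retryWaitBounds().map(b => b.maxMs / 1000)); // prints [ 2, 4, 8 ]
```

Summing the upper bounds shows that three retries with the defaults wait at most 14 seconds in total.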

Note: The retry middleware should be the last entry in the list of resilience middlewares. This ensures the retry is executed in all failure cases.

Remarks on Custom Implementations

A central benefit of providing resilience through middlewares is that you can implement your own logic if the provided implementations do not fit your needs. Note that resilience is not trivial, so you should develop and test your implementation thoroughly:

  • You should keep the order of middlewares in mind. A retry should be at the end and a timeout or cache implementation at the very beginning.
  • You should filter certain errors from a retry and circuit breaker. For example, a 401 or 403 response will not magically work if you do a retry.
  • If the middleware adds shared state between requests, e.g. a circuit breaker, you must consider isolation between customers in a multi-tenant situation.
  • Align the parameters of the middlewares if necessary. Assume the system needs protection from a circuit breaker, and you want to use a retry to execute the request after system recovery. Then the circuit breaker reset time and retry waiting times should be aligned accordingly.
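
Two of these points, filtering errors and aligning parameters, can be sketched with small helper functions. Both helpers are hypothetical and not part of the SDK. The status filter is a deliberate simplification that treats every 4XX response as not worth retrying.

```typescript
// Hypothetical helpers for a custom retry or circuit breaker implementation.

// 4XX responses such as 401 or 403 point to a problem with the request itself,
// so repeating the same request is not expected to help. A missing status
// (no response at all) and 5XX responses are treated as retriable.
function isWorthRetrying(status: number | undefined): boolean {
  return status === undefined || status >= 500;
}

// Checks whether the retries can outlast the circuit breaker reset time at all.
// It sums the upper bounds of the waiting times: if even this sum is shorter
// than the reset time, no retry can reach the recovered system.
function retriesCanOutlastReset(maxWaitsMs: number[], resetTimeoutMs: number): boolean {
  const total = maxWaitsMs.reduce((sum, wait) => sum + wait, 0);
  return total >= resetTimeoutMs;
}

console.log(isWorthRetrying(403)); // prints false
console.log(retriesCanOutlastReset([2000, 4000, 8000], 30000)); // prints false
```

With the default values from the sections above, the retries wait at most 14 seconds in total, which is shorter than the 30 seconds reset time. If a retry should reach the system after the circuit breaker recovers, one of the two settings needs to change.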