The SAP Cloud SDK for Java provides abstractions for some frequently used resilience patterns like timeout, retry or circuit breaker. Applying such patterns helps make an application more resilient against failures it might encounter.
The following article describes which resilience features the SDK offers and how to apply them. If you are looking for a quick start with resilience, also check out our dedicated tutorial on the topic!
The SDK lets you run any code in the context of one or more resilience patterns. There are two essential building blocks for achieving this:
- ResilienceConfiguration, which determines which patterns should be applied.
- ResilienceDecorator, which is capable of applying the configuration to an operation.
The fluent Resilience Configuration API provides builders that help with assembling different resilience patterns and their associated parameters. Which patterns are available and how to use them is explained in the dedicated section below.
The Resilience Decorator is capable of applying such a configuration to a given operation.
Consider the following code:
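A minimal sketch of what such a call can look like. It uses plain JDK types only: SketchConfiguration and SketchDecorator are hypothetical stand-ins that mimic the call shape and are not SDK classes, and the only pattern modelled is a simple retry.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a resilience configuration: an identifier plus a retry budget.
final class SketchConfiguration {
    final String identifier;
    final int maxAttempts;

    SketchConfiguration(String identifier, int maxAttempts) {
        this.identifier = identifier;
        this.maxAttempts = maxAttempts;
    }
}

// Hypothetical stand-in for a decorator: runs the operation under the configured pattern.
final class SketchDecorator {
    static <T> T executeSupplier(Supplier<T> operation, SketchConfiguration configuration) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= configuration.maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("All attempts failed for " + configuration.identifier, lastFailure);
    }
}

final class BasicUsageSketch {
    private static int calls = 0;

    // Illustrative operation that fails twice before succeeding.
    static String operation() {
        calls++;
        if (calls < 3) {
            throw new IllegalStateException("transient failure " + calls);
        }
        return "succeeded on call " + calls;
    }

    static String run() {
        SketchConfiguration configuration = new SketchConfiguration("identifier", 3);
        return SketchDecorator.executeSupplier(() -> operation(), configuration);
    }
}
```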
This code executes operation() in a resilient manner according to a resilience configuration. The decorator applies all patterns configured in the configuration object, together with the logic needed to combine them.
Some resilience patterns are applied over multiple executions of the same operation. For example, the circuit breaker prevents further executions if a significant portion of previous attempts failed.
To understand how the SDK applies this concept consider the following snippet:
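A minimal sketch of the idea, assuming a simple failure counter as a stand-in for circuit breaker state. SharedStateSketch, its failure limit of 2 and the identifiers are hypothetical. The sketch uses plain JDK types and is not the SDK's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SharedStateSketch {
    private static final int FAILURE_LIMIT = 2;

    // One failure counter per configuration identifier; it stands in for circuit breaker state.
    private static final Map<String, AtomicInteger> FAILURES = new ConcurrentHashMap<>();

    static <T> T execute(Supplier<T> operation, String identifier) {
        AtomicInteger failures = FAILURES.computeIfAbsent(identifier, id -> new AtomicInteger());
        if (failures.get() >= FAILURE_LIMIT) {
            throw new IllegalStateException("rejected: too many failures for " + identifier);
        }
        try {
            return operation.get();
        } catch (RuntimeException e) {
            failures.incrementAndGet(); // visible to every later call with the same identifier
            throw e;
        }
    }

    // Runs the operation and reports the outcome as text, so the demo stays short.
    static String outcome(Supplier<String> operation, String identifier) {
        try {
            return execute(operation, identifier);
        } catch (IllegalStateException e) {
            return "rejected";
        } catch (RuntimeException e) {
            return "failed";
        }
    }

    static String demo() {
        Supplier<String> failing = () -> { throw new UnsupportedOperationException("boom"); };
        String one = outcome(failing, "config-id");       // first failure recorded
        String two = outcome(failing, "config-id");       // second failure recorded
        String three = outcome(() -> "ok", "config-id");  // rejected by the shared state
        String other = outcome(() -> "ok", "other-id");   // different identifier, separate state
        return one + "," + two + "," + three + "," + other;
    }
}
```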
Here executions one, two and three will all share the same "resilience state". This means that they will share the same instance of a circuit breaker or bulkhead. So the state is shared via the identifier of the associated configuration.
The decorator operates with two kinds of operations:
| Operation | Exceptions |
|---|---|
| Callable | May throw checked or unchecked exceptions |
| Supplier | May only throw unchecked exceptions |
Note the difference in signatures: Callable.call() declares a checked exception, while Supplier.get() does not. Choose whichever fits your use case best.
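The difference can be seen with the JDK interfaces alone. In the sketch below, loadGreeting is a hypothetical operation that declares a checked exception. It fits a Callable directly, whereas a Supplier has to handle the exception inside the lambda.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.function.Supplier;

final class OperationKindsSketch {
    // Hypothetical operation that declares a checked exception.
    static String loadGreeting() throws IOException {
        return "hello";
    }

    // Callable.call() is declared with "throws Exception", so the method reference fits directly.
    static final Callable<String> AS_CALLABLE = OperationKindsSketch::loadGreeting;

    // Supplier.get() declares no checked exceptions, so the checked exception is handled inside.
    static final Supplier<String> AS_SUPPLIER = () -> {
        try {
            return loadGreeting();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };
}
```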
The decorator allows for three different ways of applying a configuration:
| Mode | Behavior |
|---|---|
| Execute | Immediately runs the operation |
| Decorate | Returns a new operation to be run later |
| Queue | Immediately runs the operation asynchronously |
In case your operation should run asynchronously we highly recommend you leverage the queue functionality. The decorator will ensure the Thread Context with Tenant and Principal information is propagated correctly to new Threads.
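The three ways of applying a configuration differ mainly in when and where the operation runs. The following sketch models that difference with plain JDK types. The class and method names are hypothetical, and the guarded helper is only a placeholder for the actual resilience logic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class InvocationStylesSketch {
    static final AtomicInteger RUNS = new AtomicInteger();

    static String operation() {
        RUNS.incrementAndGet();
        return "done";
    }

    // "Execute": runs the operation immediately on the calling thread.
    static <T> T execute(Supplier<T> operation) {
        return guarded(operation).get();
    }

    // "Decorate": returns a wrapped operation; nothing runs until get() is called on it.
    static <T> Supplier<T> decorate(Supplier<T> operation) {
        return guarded(operation);
    }

    // "Queue": starts the operation right away on another thread and returns a future.
    static <T> CompletableFuture<T> queue(Supplier<T> operation) {
        return CompletableFuture.supplyAsync(guarded(operation));
    }

    // Placeholder for the resilience logic; here it only wraps failures.
    private static <T> Supplier<T> guarded(Supplier<T> operation) {
        return () -> {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                throw new IllegalStateException("operation failed", e);
            }
        };
    }
}
```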
Note that the Resilience Decorator will try to propagate the current Thread Context at the time the decorator is invoked. This is important when you are decorating a Callable or Supplier and running it later. The Thread Context must be available whenever decorateSupplier is evaluated. So if the call to ResilienceDecorator should take place asynchronously you should follow these steps to ensure the Thread Context is available.
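Why the moment of decoration matters can be illustrated with a plain ThreadLocal. In the sketch below, TENANT is a hypothetical stand-in for the Thread Context and decorate is an illustrative helper, not the SDK method. The value is captured on the thread that calls decorate and restored on whichever thread later evaluates the supplier.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class ContextCaptureSketch {
    // Stands in for the Thread Context; only a tenant id is tracked here.
    static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    // Captures the tenant of the thread that calls decorate() and restores it around the operation.
    static <T> Supplier<T> decorate(Supplier<T> operation) {
        final String captured = TENANT.get();
        return () -> {
            final String previous = TENANT.get();
            TENANT.set(captured);
            try {
                return operation.get();
            } finally {
                TENANT.set(previous);
            }
        };
    }

    // Evaluates the supplier on a fresh worker thread that has no tenant of its own.
    static <T> T runOnWorker(Supplier<T> supplier) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(supplier, worker).join();
        } finally {
            worker.shutdown();
        }
    }

    static String demo() {
        TENANT.set("tenant-a");
        try {
            Supplier<String> readTenant = () -> String.valueOf(TENANT.get());
            String decorated = runOnWorker(decorate(readTenant)); // tenant captured on this thread
            String undecorated = runOnWorker(readTenant);         // worker thread sees no tenant
            return decorated + "," + undecorated;
        } finally {
            TENANT.remove();
        }
    }
}
```

If decorate were itself called on the worker thread, nothing would be captured, which is the situation the paragraph above warns about.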
An operation might fail for two reasons:
- The operation itself encounters a failure and throws an error or exception
- A resilience pattern causes the operation to fail (e.g. the circuit breaker prevents further invocations)
The SDK wraps all kinds of checked and unchecked exceptions in a ResilienceRuntimeException and throws it.
To deal with failures one can either catch the ResilienceRuntimeException or provide a fallback function:
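Both options can be sketched with plain JDK types. SketchResilienceException is a hypothetical stand-in for the wrapper exception, and the two execute overloads are illustrative rather than SDK signatures.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Function;

final class FailureHandlingSketch {
    // Hypothetical unchecked wrapper, playing the role the text assigns to ResilienceRuntimeException.
    static final class SketchResilienceException extends RuntimeException {
        SketchResilienceException(Throwable cause) {
            super(cause);
        }
    }

    // Without a fallback: any failure surfaces as the unchecked wrapper.
    static <T> T execute(Callable<T> operation) {
        try {
            return operation.call();
        } catch (Exception e) {
            throw new SketchResilienceException(e);
        }
    }

    // With a fallback: the failure is handed to the function and its result is returned instead.
    static <T> T execute(Callable<T> operation, Function<? super Throwable, T> fallback) {
        try {
            return execute(operation);
        } catch (SketchResilienceException e) {
            return fallback.apply(e);
        }
    }

    static String demo() {
        Callable<String> failing = () -> { throw new IOException("backend unreachable"); };
        String viaCatch;
        try {
            viaCatch = execute(failing);
        } catch (SketchResilienceException e) {
            viaCatch = "caught " + e.getCause().getClass().getSimpleName();
        }
        String viaFallback = execute(failing, failure -> "fallback value");
        return viaCatch + " / " + viaFallback;
    }
}
```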
In the case of Callable this relieves you of the need to catch the exception at the outer level.
A ResilienceConfiguration with default values is created by providing an identifier for it:
The identifier can be either a string or a class. In the latter case, the fully qualified class name is used as the identifier. The identifier is used to apply resilience patterns across multiple invocations of an operation.
Check the JavaDoc for which patterns and parameters will be applied by default. You can also create a configuration with all patterns disabled:
Individual resilience patterns are configured via dedicated builder classes like TimeLimiterConfiguration and are added to the configuration via dedicated setters, e.g.:
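As a rough illustration of this fluent style, the sketch below builds a configuration from a class-based identifier and attaches a timeout through a setter. SketchResilienceConfig and SketchTimeLimiter are hypothetical stand-ins written with plain JDK types. Their method names and the default value are assumptions for illustration and should be checked against the SDK's JavaDoc.

```java
import java.time.Duration;

// Hypothetical stand-in for a per-pattern builder such as a time limiter configuration.
final class SketchTimeLimiter {
    private Duration timeout = Duration.ofSeconds(30); // arbitrary default chosen for this sketch

    static SketchTimeLimiter of() {
        return new SketchTimeLimiter();
    }

    SketchTimeLimiter timeoutDuration(Duration timeout) {
        this.timeout = timeout;
        return this;
    }

    Duration timeout() {
        return timeout;
    }
}

// Hypothetical stand-in for the overall configuration.
final class SketchResilienceConfig {
    private final String identifier;
    private SketchTimeLimiter timeLimiter = SketchTimeLimiter.of();

    private SketchResilienceConfig(String identifier) {
        this.identifier = identifier;
    }

    static SketchResilienceConfig of(String identifier) {
        return new SketchResilienceConfig(identifier);
    }

    // A class is turned into an identifier via its fully qualified name.
    static SketchResilienceConfig of(Class<?> owner) {
        return new SketchResilienceConfig(owner.getName());
    }

    // Fluent setter: replaces the time limiter settings and returns the configuration.
    SketchResilienceConfig timeLimiter(SketchTimeLimiter timeLimiter) {
        this.timeLimiter = timeLimiter;
        return this;
    }

    String identifier() {
        return identifier;
    }

    Duration timeout() {
        return timeLimiter.timeout();
    }

    static SketchResilienceConfig example() {
        return SketchResilienceConfig.of(String.class)
                .timeLimiter(SketchTimeLimiter.of().timeoutDuration(Duration.ofSeconds(2)));
    }
}
```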
For details see the list of Resilience Capabilities below.
The SDK is capable of applying the different resilience patterns in a tenant and principal aware manner. Consider for example the Bulkhead pattern, which limits the number of parallel executions. If the operation is tenant specific, you would probably want to avoid one tenant blocking all others.
For this reason the SDK by default isolates resilience patterns based on tenant and principal, if they are available. This strategy can be configured. For example, to run without any isolation use:
Other than no isolation there are essentially two modes for tenant and/or principal isolation:
| Mode | Behavior |
|---|---|
| Required | Always isolates on tenant and/or principal level, will throw an exception if no tenant/principal is available |
| Optional | Only isolates if tenant and/or principal information is available |
Details can be found on the API reference of ResilienceIsolationMode.
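One way to picture isolation is as a change to the key under which resilience state is stored. The sketch below assumes that model and covers tenants only, with principals omitted for brevity. The Mode enum and the stateKey method are hypothetical and do not reproduce ResilienceIsolationMode.

```java
import java.util.Optional;

final class IsolationSketch {
    // Hypothetical modes mirroring the three behaviours described above.
    enum Mode { NO_ISOLATION, TENANT_OPTIONAL, TENANT_REQUIRED }

    // Derives the key under which shared state (e.g. a bulkhead) would be looked up.
    static String stateKey(String identifier, Mode mode, Optional<String> tenant) {
        switch (mode) {
            case NO_ISOLATION:
                return identifier; // all tenants share one state
            case TENANT_OPTIONAL:
                // isolate only when a tenant is known
                return tenant.map(t -> identifier + "|tenant=" + t).orElse(identifier);
            case TENANT_REQUIRED:
                // fail when no tenant is available
                return identifier + "|tenant="
                        + tenant.orElseThrow(() -> new IllegalStateException("no tenant available"));
            default:
                throw new AssertionError(mode);
        }
    }
}
```

With separate keys per tenant, one tenant exhausting its bulkhead leaves the state of the other tenants untouched.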
The following resilience patterns are available and can be configured in a Resilience Configuration:
| Pattern | Configuration class | Description |
|---|---|---|
| Timeout | TimeLimiterConfiguration | Limit how long an operation may run before it should be interrupted |
| Retry | RetryConfiguration | Retry a failed operation a limited number of times before failing |
| Circuit Breaker | CircuitBreakerConfiguration | Reject attempts if too many failures occurred in the past |
| Bulkhead (also known as Shed Load or Load Shedding) | BulkheadConfiguration | Limit how many instances of this operation may run in parallel |
You can find good explanations of how the individual patterns behave in the documentation of resilience4j, which the SDK uses under the hood to perform resilient operations.
Be aware that the patterns interact with each other. They are applied in the following order:
- Circuit Breaker
This means that every individual attempt triggered by a retry is limited by the timeout, and every failed attempt is counted by the circuit breaker. The fallback function is considered only if all retries have failed.
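The interaction described above can be sketched with plain JDK types: a retry loop in which each attempt has its own time limit, each failed attempt is counted, and the fallback is consulted last. PatternInteractionSketch is hypothetical. The counter only stands in for circuit breaker bookkeeping (it never rejects calls), and attempts that time out are not cancelled here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

final class PatternInteractionSketch {
    // Stands in for the circuit breaker's record of failed attempts.
    final AtomicInteger recordedFailures = new AtomicInteger();

    <T> T execute(Supplier<T> operation, int maxAttempts, long timeoutMillis,
                  Function<? super Throwable, T> fallback) {
        Throwable lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt gets its own time limit.
                return CompletableFuture.supplyAsync(operation).get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (ExecutionException | TimeoutException e) {
                recordedFailures.incrementAndGet(); // every failed attempt is counted
                lastFailure = e;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                lastFailure = e;
                break;
            }
        }
        // The fallback is consulted only after all attempts have failed.
        return fallback.apply(lastFailure);
    }
}
```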