With the current trend toward building distributed systems, it is increasingly important to build fault-tolerant services. Fault tolerance is about using different strategies to handle failures in a distributed system. Moreover, services should be resilient: if an external service fails, they should keep operating rather than cascade the failure and bring the whole system down. There is a set of common patterns to achieve fault tolerance within your system, and these patterns are all available within the MicroProfile Fault Tolerance specification.
Learn more about the MicroProfile Fault Tolerance specification, its annotations, and how to use it in this blog post. This post covers all available interceptor bindings as defined in the specification:
- Fallback
- Timeout
- Retry
- CircuitBreaker
- Asynchronous
- Bulkhead
Specification profile: MicroProfile Fault Tolerance
- Current version: 2.1
- GitHub repository
- Latest specification document
- Basic use case: Provide a set of strategies to build resilient and fault-tolerant services
Provide a fallback method
First, let's cover the @Fallback interceptor binding of the MicroProfile Fault Tolerance specification. With this annotation, you can provide fallback behavior for your method in case of an exception. Assume your service fetches data from other microservices and the call might fail due to network issues or downtime of the target. If your service can recover from the failure and you can provide meaningful fallback behavior for your domain, the @Fallback annotation saves you.
A good example might be the checkout process of your webshop where you rely on a third-party service for handling e.g. credit card payments. If this service fails, you might fall back to a default payment provider and recover gracefully from the failure.
For a simple example, I'll demonstrate it with a JAX-RS client request to a placeholder REST API and provide a fallback method:
```java
@Fallback(fallbackMethod = "getDefaultPost")
public JsonObject getPostById(Long id) {
    return this.webTarget
        .path(String.valueOf(id))
        .request()
        .accept(MediaType.APPLICATION_JSON)
        .get(JsonObject.class);
}

public JsonObject getDefaultPost(Long id) {
    return Json.createObjectBuilder()
        .add("comment", "Lorem ipsum")
        .add("postId", id)
        .build();
}
```
With the @Fallback annotation you can specify the name of the fallback method, which must have the same return type and parameter list as the annotated method.
In addition, you can also specify a dedicated class to handle the fallback. This class is required to implement the FallbackHandler<T> interface where T is the return type of the targeted method:
```java
@Fallback(PlaceHolderApiFallback.class)
public JsonObject getPostById(Long id) {
    return this.webTarget
        .path(String.valueOf(id))
        .request()
        .accept(MediaType.APPLICATION_JSON)
        .get(JsonObject.class);
}
```

```java
public class PlaceHolderApiFallback implements FallbackHandler<JsonObject> {

    @Override
    public JsonObject handle(ExecutionContext context) {
        return Json.createObjectBuilder()
            .add("comment", "Lorem ipsum")
            .add("postId", Long.valueOf(context.getParameters()[0].toString()))
            .build();
    }
}
```
As you'll see in the upcoming sections, the @Fallback annotation can be used in combination with other MicroProfile Fault Tolerance interceptor bindings.
Furthermore, you can instruct the fallback annotation to apply only when a specific exception is thrown. You can also exclude specific exceptions so that they do not trigger the fallback behaviour:
```java
@Fallback(value = PlaceHolderApiFallback.class,
    applyOn = {MyCustomException.class, MySevereException.class},
    skipOn = NumberFormatException.class)
```
By default, the fallback occurs on every exception extending Throwable and does not skip any exception.
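To make the interplay of applyOn and skipOn more concrete, here is a minimal plain-Java sketch of the decision. It is not the MicroProfile API; the class FallbackDecision and the method shouldFallback are hypothetical names chosen for illustration. It assumes that skipOn is checked first and therefore wins when an exception matches both lists.

```java
import java.util.List;

// Plain-Java sketch (not the MicroProfile API): decides whether a thrown
// exception should trigger the fallback, given applyOn and skipOn lists.
final class FallbackDecision {

    static boolean shouldFallback(Throwable thrown,
                                  List<Class<? extends Throwable>> applyOn,
                                  List<Class<? extends Throwable>> skipOn) {
        // skipOn is checked first, so it wins when both lists match.
        for (Class<? extends Throwable> type : skipOn) {
            if (type.isInstance(thrown)) {
                return false;
            }
        }
        // Otherwise the fallback applies only to exceptions listed in applyOn.
        for (Class<? extends Throwable> type : applyOn) {
            if (type.isInstance(thrown)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> applyOn = List.of(RuntimeException.class);
        List<Class<? extends Throwable>> skipOn = List.of(NumberFormatException.class);
        System.out.println(shouldFallback(new IllegalStateException(), applyOn, skipOn));
        System.out.println(shouldFallback(new NumberFormatException(), applyOn, skipOn));
    }
}
```

Note that NumberFormatException is itself a RuntimeException, so in this sketch it matches both lists and is still skipped.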
Add timeouts to limit the duration of a method execution
For some operations in your system, you might have a strict response time target. If you make use of the JAX-RS client or the client of MicroProfile Rest Client you can specify read and connect timeouts to avoid long-running requests. But what about use cases where you can't declare timeouts easily? The MicroProfile Fault Tolerance specification defines the @Timeout annotation for such problems.
With this interceptor binding, you can specify the maximum duration of a method. If the computation time within the method exceeds the limit, a TimeoutException is thrown.
```java
@Timeout(4000)
@Fallback(fallbackMethod = "getFallbackData")
public String getDataFromLongRunningTask() throws InterruptedException {
    // Sleeps longer than the 4000 ms limit, so the timeout fires
    Thread.sleep(4500);
    return "duke";
}
```
The default unit is milliseconds, but you can configure a different ChronoUnit:
```java
@Timeout(value = 4, unit = ChronoUnit.SECONDS)
@Fallback(fallbackMethod = "getFallbackData")
public String getDataFromLongRunningTask() throws InterruptedException {
    Thread.sleep(4500);
    return "duke";
}
```
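For comparison, this is roughly what limiting the duration of a call looks like with the plain JDK, without any MicroProfile annotation. It is only a sketch: TimeoutSketch and callWithTimeout are hypothetical names, and the TimeoutException caught here is java.util.concurrent.TimeoutException, not the exception type defined by the MicroProfile Fault Tolerance specification.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Plain-JDK sketch: run a task with a time limit and fall back on timeout.
final class TimeoutSketch {

    static String callWithTimeout(Callable<String> task, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(task);
        try {
            // Wait for the result, but no longer than the given limit.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-running task
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String result = callWithTimeout(() -> {
            Thread.sleep(500); // longer than the 100 ms limit below
            return "duke";
        }, 100, "fallback duke");
        System.out.println(result);
    }
}
```

The annotation-based version above expresses the same intent declaratively, which keeps the thread handling out of the business method.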
Define retry policies for method calls
A valid fallback behavior for an external system call might be just to retry it. With the @Retry annotation, we can achieve such behavior. Retrying the request immediately might not always be the best solution. Instead, you may want to add a delay before the next retry and maybe some randomness. We can configure such a requirement with the @Retry annotation:
```java
@Retry(maxDuration = 5000,
    maxRetries = 3,
    delay = 500,
    jitter = 200)
@Fallback(fallbackMethod = "getFallbackData")
public String accessFlakyService() {
    System.out.println("Trying to access flaky service at " + LocalTime.now());
    // Succeeds in roughly 5% of the calls to simulate a flaky service
    if (ThreadLocalRandom.current().nextLong(1000) < 50) {
        return "flaky duke";
    } else {
        throw new RuntimeException("Flaky service not accessible");
    }
}
```
In this example, the method is retried up to three times after the initial attempt (four executions in total), with a delay of 500 milliseconds and up to 200 milliseconds of randomness (called jitter). The effective delay falls in the range [delay - jitter, delay + jitter], in our example 300 to 700 milliseconds.
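Furthermore, endless retrying might also be counter-productive. That's why we can specify maxDuration, which caps the total time spent on the call including all retries and is quite similar to the @Timeout annotation above. Once the retries have taken more than 5 seconds in total, no further attempt is made and the call fails (or the fallback runs, if one is defined as in the example).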
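The retry behavior described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the way any MicroProfile implementation works internally: RetrySketch, effectiveDelay, and retry are hypothetical names, only RuntimeExceptions are retried, and maxRetries and jitter are assumed to be non-negative.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Plain-Java sketch of retry with delay, jitter, and a maximum total duration.
final class RetrySketch {

    // Picks a delay uniformly from [delay - jitter, delay + jitter], never below zero.
    static long effectiveDelay(long delayMillis, long jitterMillis) {
        long offset = ThreadLocalRandom.current().nextLong(-jitterMillis, jitterMillis + 1);
        return Math.max(0, delayMillis + offset);
    }

    static <T> T retry(Supplier<T> action, int maxRetries, long delayMillis,
                       long jitterMillis, long maxDurationMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative in this sketch");
        }
        long start = System.nanoTime();
        RuntimeException lastFailure = null;
        // One initial attempt plus up to maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                try {
                    Thread.sleep(effectiveDelay(delayMillis, jitterMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying when interrupted
                }
            }
            try {
                return action.get();
            } catch (RuntimeException e) {
                lastFailure = e;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis >= maxDurationMillis) {
                break; // total time budget used up, stop retrying
            }
        }
        throw lastFailure;
    }
}
```

Calling RetrySketch.retry(action, 3, 500, 200, 5000) mirrors the annotation values from the example: up to four executions, 300 to 700 milliseconds between them, and no new attempt once five seconds have passed.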
Similar to the @Fallback annotation, we can specify which exception types trigger a retry and which abort it:
```java
@Retry(maxRetries = 3,
    retryOn = {RuntimeException.class},
    abortOn = {NumberFormatException.class})
```
Add a Circuit Breaker around a method invocation to fail fast
Once an external system you call is down or returning 503 as it is currently unavailable to process further requests, you might not want to access it for a given timeframe again. This might help the other system to recover and your methods can fail fast as you already know the expected response from requests in the past. For this scenario, the Circuit Breaker pattern comes into place.
The Circuit Breaker offers a way to fail fast by directly failing the method execution to prevent further overloading of the target system and indefinite wait or timeouts. With MicroProfile Fault Tolerance we have an annotation to achieve this with ease: @CircuitBreaker.
There are three different states a Circuit Breaker can have: closed, open, and half-open.
In the closed state, the operation is executed as expected. If a failure occurs while e.g. calling an external service, the Circuit Breaker records such an event. If a particular threshold of failures is met, it will switch to the open state.
Once the Circuit Breaker enters the open state, further calls will fail immediately. After a given delay the circuit enters the half-open state. Within the half-open state, trial executions will happen. Once such a trial execution fails, the circuit transitions to the open state again. When a predefined number of these trial executions succeed, the circuit enters the original closed state.
Let's have a look at the following example:
```java
@CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 5, failureRatio = 0.5, delay = 500)
@Fallback(fallbackMethod = "getFallbackData")
public String getRandomData() {
    if (ThreadLocalRandom.current().nextLong(1000) < 300) {
        return "random duke";
    } else {
        throw new RuntimeException("Random data not available");
    }
}
```
In the example above I define a Circuit Breaker which enters the open state once at least 50% (failureRatio=0.5) of the last five executions (requestVolumeThreshold=5) fail, i.e. three or more failures within that rolling window. After a delay of 500 milliseconds in the open state, the circuit transitions to half-open. Once ten consecutive trial executions (successThreshold=10) in the half-open state succeed, the circuit will be back in the closed state.
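The state transitions can be followed more easily in a small, simplified state machine. The sketch below is single-threaded plain Java and not how a MicroProfile implementation is built: CircuitBreakerSketch and its methods are hypothetical names, the clock is injected so the delay can be simulated, and record is assumed to be called only for calls that allowsCall permitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Simplified, single-threaded sketch of the three circuit breaker states.
final class CircuitBreakerSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final long delayMillis;
    private final int successThreshold;
    private final LongSupplier clock; // supplies the current time in milliseconds

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenSuccesses;

    CircuitBreakerSketch(int requestVolumeThreshold, double failureRatio,
                         long delayMillis, int successThreshold, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
        this.delayMillis = delayMillis;
        this.successThreshold = successThreshold;
        this.clock = clock;
    }

    // Returns the current state, moving from OPEN to HALF_OPEN once the delay has elapsed.
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= delayMillis) {
            state = State.HALF_OPEN;
            halfOpenSuccesses = 0;
        }
        return state;
    }

    // True if a call may be attempted right now.
    boolean allowsCall() {
        return state() != State.OPEN;
    }

    // Records the outcome of a call that was allowed.
    void record(boolean failed) {
        if (state() == State.HALF_OPEN) {
            if (failed) {
                open(); // a failed trial execution reopens the circuit
            } else if (++halfOpenSuccesses >= successThreshold) {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        // Closed state: keep only the most recent requestVolumeThreshold outcomes.
        window.addLast(failed);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst();
        }
        long failures = window.stream().filter(f -> f).count();
        if (window.size() == requestVolumeThreshold
                && failures >= failureRatio * requestVolumeThreshold) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

With the values from the example (a window of five and a ratio of 0.5), the third failure within five recorded calls opens the circuit in this sketch.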
This annotation also allows defining the exception types to skip and to fail on:
```java
@CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 5, delay = 500,
    skipOn = {NumberFormatException.class},
    failOn = {RuntimeException.class})
```
Execute a method asynchronously with MicroProfile Fault Tolerance
Some use cases of your system might not require synchronous and in-order execution of different tasks. For instance, you can fetch data for a customer (purchased orders, contact information, invoices) from different services in parallel. The MicroProfile Fault Tolerance specification offers a convenient way for achieving such asynchronous method executions with the @Asynchronous annotation:
```java
@Asynchronous
public Future<String> getConcurrentServiceData(String name) {
    System.out.println(name + " is accessing the concurrent service");
    return CompletableFuture.completedFuture("concurrent duke");
}
```
With this annotation, the execution happens on a separate thread and the method has to return either a Future or a CompletionStage.
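To illustrate the customer data scenario without the annotation, here is a plain-JDK sketch that starts three independent lookups in parallel and combines their results. ParallelFetchSketch and the three fetch methods are hypothetical stand-ins for calls to separate services. The sketch shows the general idea of asynchronous execution, not what @Asynchronous does internally.

```java
import java.util.concurrent.CompletableFuture;

// Plain-JDK sketch: fetch three independent pieces of customer data in parallel.
final class ParallelFetchSketch {

    // Hypothetical stand-ins for calls to three separate services.
    static String fetchOrders(String customer) { return customer + ":orders"; }
    static String fetchContact(String customer) { return customer + ":contact"; }
    static String fetchInvoices(String customer) { return customer + ":invoices"; }

    static String loadCustomerData(String customer) {
        // Each supplyAsync call starts its task on a pool thread right away.
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(() -> fetchOrders(customer));
        CompletableFuture<String> contact = CompletableFuture.supplyAsync(() -> fetchContact(customer));
        CompletableFuture<String> invoices = CompletableFuture.supplyAsync(() -> fetchInvoices(customer));
        // join() blocks until each result is available; the three calls overlap in time.
        return String.join(", ", orders.join(), contact.join(), invoices.join());
    }

    public static void main(String[] args) {
        System.out.println(loadCustomerData("duke"));
    }
}
```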
Apply Bulkheads to limit the number of concurrent calls
The Bulkhead pattern is a way of isolating failures in your system while the rest can still function. It's named after the sectioned parts (bulkheads) of a ship. If one bulkhead of a ship is damaged and filled with water, the other bulkheads aren't affected, which prevents the ship from sinking.
Imagine a scenario where all your threads are occupied for a request to a (slow-responding) external system and your application can't process other tasks. To prevent such a scenario, we can apply the @Bulkhead annotation and limit concurrent calls:
```java
@Bulkhead(5)
@Asynchronous
public Future<String> getConcurrentServiceData(String name) throws InterruptedException {
    Thread.sleep(1000);
    System.out.println(name + " is accessing the concurrent service");
    return CompletableFuture.completedFuture("concurrent duke");
}
```
In this example, at most five calls can execute this method concurrently. If this annotation is used together with @Asynchronous, as in the example above, it means thread pool isolation: additional calls wait in a queue, and we can specify the length of this waiting queue with the attribute waitingTaskQueue (only for asynchronous methods). For non-async methods, the specification defines the use of semaphores for isolation, and calls beyond the limit are rejected with a BulkheadException.
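The semaphore style of isolation can be sketched with the JDK's Semaphore. This is a minimal illustration, not the specification's implementation: SemaphoreBulkheadSketch and call are hypothetical names, and an IllegalStateException stands in for the BulkheadException that MicroProfile Fault Tolerance would throw.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Plain-JDK sketch of a semaphore-style bulkhead that rejects excess calls.
final class SemaphoreBulkheadSketch {

    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> action) {
        // tryAcquire() does not wait: if no permit is free, the call is rejected at once.
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("Bulkhead is full");
        }
        try {
            return action.get();
        } finally {
            permits.release(); // free the slot for the next caller
        }
    }
}
```

A bulkhead created with new SemaphoreBulkheadSketch(5) lets five callers into the action at the same time and fails the sixth immediately, which keeps a slow dependency from tying up every thread.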
MicroProfile Fault Tolerance integration with MicroProfile Config
On top of that, the MicroProfile Fault Tolerance specification provides tight integration with the MicroProfile Config specification. You can configure every attribute of the different interceptor bindings with an external config source like the microprofile-config.properties file.
The pattern for external configuration is the following: <classname>/<methodname>/<annotation>/<parameter>:
```properties
de.rieckpil.blog.RandomDataProvider/accessFlakyService/Retry/maxRetries=10
de.rieckpil.blog.RandomDataProvider/accessFlakyService/Retry/delay=300
de.rieckpil.blog.RandomDataProvider/accessFlakyService/Retry/maxDuration=5000
```
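As a small illustration of how such a key is composed, the following plain-Java sketch builds the method-level key from its four parts and looks it up in properties content. It uses java.util.Properties only to keep the example self-contained; FaultToleranceConfigKeys, methodLevelKey, and parse are hypothetical names, and a real application would let the MicroProfile Config runtime read the file instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: build the method-level config key and look it up in properties content.
final class FaultToleranceConfigKeys {

    // Joins the parts as <classname>/<methodname>/<annotation>/<parameter>.
    static String methodLevelKey(String className, String methodName,
                                 String annotation, String parameter) {
        return String.join("/", className, methodName, annotation, parameter);
    }

    // Parses properties-file content held in a string (stand-in for the real file).
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = parse(
            "de.rieckpil.blog.RandomDataProvider/accessFlakyService/Retry/maxRetries=10\n");
        String key = methodLevelKey(
            "de.rieckpil.blog.RandomDataProvider", "accessFlakyService", "Retry", "maxRetries");
        System.out.println(key + " -> " + properties.getProperty(key));
    }
}
```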
YouTube video for using MicroProfile Fault Tolerance
Watch the following YouTube video of my Getting started with MicroProfile series to see MicroProfile Fault Tolerance in action:
You can find the source code with further instructions to run this example on GitHub.
Have fun using MicroProfile Fault Tolerance,
Phil