Adding resiliency with MicroProfile Fault Tolerance

Last Updated:  October 11, 2020 | Published: September 11, 2019

With the current trend towards distributed systems, it is increasingly important to build fault-tolerant services. Fault tolerance is about using different strategies to handle failures in a distributed system. Services should be resilient: if an external service fails, they should keep operating instead of cascading the failure and bringing the whole system down. There is a set of common patterns to achieve fault tolerance within your system, and all of them are available within the MicroProfile Fault Tolerance specification.

Learn more about the MicroProfile Fault Tolerance specification, its annotations, and how to use it in this blog post. This post covers all available interceptor bindings as defined in the specification:

  • Fallback
  • Timeout
  • Retry
  • CircuitBreaker
  • Asynchronous
  • Bulkhead

Specification profile: MicroProfile Fault Tolerance

  • Current version: 2.1
  • GitHub repository
  • Latest specification document
  • Basic use case: Provide a set of strategies to build resilient and fault-tolerant services

Provide a fallback method

First, let's cover the @Fallback interceptor binding of the MicroProfile Fault Tolerance specification. With this annotation, you can provide fallback behavior for your method in case of an exception. Assume your service fetches data from other microservices and the call might fail due to network issues or downtime of the target. If your service can recover from the failure and you can provide meaningful fallback behavior for your domain, the @Fallback annotation saves you.

A good example might be the checkout process of your webshop where you rely on a third-party service for handling e.g. credit card payments. If this service fails, you might fall back to a default payment provider and recover gracefully from the failure.

For a simple example, I'll demonstrate it with a JAX-RS client request to a placeholder REST API and provide a fallback method:

With the @Fallback annotation you can specify the name of the fallback method, which must have the same return type and method parameters as the annotated method.
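To illustrate the mechanics without any runtime, here is a minimal plain-Java sketch of what a fallback roughly amounts to. It does not use the MicroProfile API. FallbackSketch and its method names are made up for illustration, and the annotation in the comment is the assumed declarative equivalent.

```java
import java.util.function.Supplier;

// Hand-rolled illustration of fallback behaviour (not the MicroProfile implementation).
// Assumed annotation-based equivalent, which requires the MicroProfile Fault Tolerance API:
//   @Fallback(fallbackMethod = "getDefaultPost")
//   public String getPostFromPlaceholderApi() { ... }
class FallbackSketch {

    // Hypothetical remote call; it always fails here to simulate a network issue.
    static String getPostFromPlaceholderApi() {
        throw new IllegalStateException("placeholder API not reachable");
    }

    // Fallback with the same return type and parameter list as the primary method.
    static String getDefaultPost() {
        return "{\"title\":\"default post\"}";
    }

    // Runs the primary call and switches to the fallback on any Throwable.
    static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (Throwable failure) {
            return fallback.get();
        }
    }

    static String getPost() {
        return withFallback(FallbackSketch::getPostFromPlaceholderApi,
                            FallbackSketch::getDefaultPost);
    }
}
```

Calling FallbackSketch.getPost() returns the default post, because the simulated primary call always fails.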

In addition, you can specify a dedicated class to handle the fallback. This class is required to implement the FallbackHandler<T> interface, where T is the return type of the targeted method:

As you'll see in the upcoming sections, the @Fallback annotation can be used in combination with other MicroProfile Fault Tolerance interceptor bindings.

Furthermore, you can instruct the fallback annotation to apply only when a specific exception is thrown. You can also exclude specific exceptions from triggering the fallback behaviour:

By default the fallback occurs on every exception extending Throwable and does not skip on any exception.
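The filtering rule can be sketched in plain Java as follows. This assumes the attribute names applyOn and skipOn from version 2.1 of the specification and that skipOn takes precedence over applyOn. The helper class is hypothetical and only mirrors the decision, not the interceptor itself.

```java
import java.util.List;

// Illustration of the exception filter (not the MicroProfile implementation).
// Assumed annotation-based equivalent:
//   @Fallback(fallbackMethod = "getDefaultPost",
//             applyOn = {RuntimeException.class},
//             skipOn = {IllegalArgumentException.class})
class FallbackFilterSketch {

    // Returns true if the given failure should trigger the fallback.
    static boolean triggersFallback(Throwable failure,
                                    List<Class<? extends Throwable>> applyOn,
                                    List<Class<? extends Throwable>> skipOn) {
        // Exceptions listed in skipOn never trigger the fallback.
        for (Class<? extends Throwable> type : skipOn) {
            if (type.isInstance(failure)) {
                return false;
            }
        }
        // Otherwise the failure must match one of the applyOn types.
        for (Class<? extends Throwable> type : applyOn) {
            if (type.isInstance(failure)) {
                return true;
            }
        }
        return false;
    }
}
```

With applyOn set to RuntimeException and skipOn set to IllegalArgumentException, an IllegalStateException triggers the fallback, while an IllegalArgumentException or an IOException does not.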

Add timeouts to limit the duration of a method execution

For some operations in your system, you might have a strict response time target. If you make use of the JAX-RS client or the client of MicroProfile Rest Client you can specify read and connect timeouts to avoid long-running requests. But what about use cases where you can't declare timeouts easily? The MicroProfile Fault Tolerance specification defines the @Timeout annotation for such problems.

With this interceptor binding, you can specify the maximum duration of a method. If the computation time within the method exceeds the limit, a TimeoutException is thrown.

The default unit is milliseconds, but you can configure a different ChronoUnit:
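What such a limit amounts to can be sketched with plain java.util.concurrent types. This is only an illustration under the assumption that the work runs on a helper thread. An actual implementation may work differently and reports its own TimeoutException type rather than java.util.concurrent.TimeoutException. TimeoutSketch is a hypothetical name, and the commented annotation lines show the assumed declarative form.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustration of a bounded method execution (not the MicroProfile implementation).
// Assumed annotation-based equivalents:
//   @Timeout(4000)                                   // milliseconds by default
//   @Timeout(value = 4, unit = ChronoUnit.SECONDS)   // explicit unit
class TimeoutSketch {

    // Runs the task on a helper thread and waits at most 'limit' for the result.
    // Throws java.util.concurrent.TimeoutException if the limit is exceeded.
    static <T> T callWithTimeout(Callable<T> task, Duration limit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts the task if it is still running.
            executor.shutdownNow();
        }
    }
}
```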

Define retry policies for method calls

A valid fallback behavior for an external system call might be just to retry it. With the @Retry annotation, we can achieve such behavior. Retrying the request immediately might not always be the best solution, though. Usually you want to add a delay before the next retry and maybe some randomness. We can configure such a requirement with the @Retry annotation:

In this example, we would retry the method up to three times with a delay of 500 milliseconds and 200 milliseconds of randomness (called jitter). The effective delay lies in the range [delay - jitter, delay + jitter], which is 300 to 700 milliseconds in our example.
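The delay calculation and the retry loop can be sketched in plain Java. This is an illustration under the assumption that the jitter is drawn uniformly from the range and that three retries mean at most four executions in total. RetrySketch is a hypothetical helper, not part of the specification.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Illustration of retrying with delay and jitter (not the MicroProfile implementation).
// Assumed annotation-based equivalent:
//   @Retry(maxRetries = 3, delay = 500, jitter = 200)
class RetrySketch {

    // Picks a delay from [delay - jitter, delay + jitter], never below zero.
    // Assumes jitter >= 0.
    static long effectiveDelayMillis(long delay, long jitter) {
        long offset = ThreadLocalRandom.current().nextLong(-jitter, jitter + 1);
        return Math.max(0, delay + offset);
    }

    // Executes the task once and retries up to maxRetries times on failure.
    // Assumes maxRetries >= 0. Rethrows the last failure if all attempts fail.
    static <T> T retry(Callable<T> task, int maxRetries, long delay, long jitter)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception failure) {
                last = failure;
                if (attempt < maxRetries) {
                    Thread.sleep(effectiveDelayMillis(delay, jitter));
                }
            }
        }
        throw last;
    }
}
```

For delay = 500 and jitter = 200, effectiveDelayMillis always returns a value between 300 and 700.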

Furthermore, endless retrying might also be counter-productive. That's why we can specify the maxDuration, which is quite similar to the @Timeout annotation above. If all retries together take more than 5 seconds, it will fail with a TimeoutException.

Similar to the @Fallback annotation, we can specify the type of exceptions to trigger and not trigger a retry:

Add a Circuit Breaker around a method invocation to fail fast

Once an external system you call is down or returns 503 because it is currently unable to process further requests, you might not want to access it again for a given timeframe. This might help the other system to recover, and your methods can fail fast as you already know the expected response from past requests. For this scenario, the Circuit Breaker pattern comes into play.

The Circuit Breaker offers a way to fail fast by directly failing the method execution to prevent further overloading of the target system and indefinite waits or timeouts. With MicroProfile Fault Tolerance we have an annotation to achieve this with ease: @CircuitBreaker.

There are three different states a Circuit Breaker can have: closed, open, and half-open.

In the closed state, the operation is executed as expected. If a failure occurs while e.g. calling an external service, the Circuit Breaker records such an event. If a particular threshold of failures is met, it will switch to the open state.

Once the Circuit Breaker enters the open state, further calls will fail immediately.  After a given delay the circuit enters the half-open state. Within the half-open state, trial executions will happen. Once such a trial execution fails, the circuit transitions to the open state again. When a predefined number of these trial executions succeed, the circuit enters the original closed state.

Let's have a look at the following example:

In the example above I define a Circuit Breaker which enters the open state once 50% (failureRatio=0.5) of five consecutive executions (requestVolumeThreshold=5) fail. After a delay of 500 milliseconds in the open state, the circuit transitions to half-open. Once ten trial executions (successThreshold=10) in the half-open state succeed, the circuit will be back in the closed state.
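The state transitions can be sketched as a small plain-Java state machine. This is a simplified illustration, not the specification's algorithm: CircuitBreakerSketch is a hypothetical class, time is passed in explicitly to keep it deterministic, and the sketch does not limit the number of trial calls in the half-open state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration of the closed / open / half-open transitions (not the MicroProfile implementation).
// Assumed annotation-based equivalent:
//   @CircuitBreaker(requestVolumeThreshold = 5, failureRatio = 0.5,
//                   delay = 500, successThreshold = 10)
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final long delayMillis;
    private final int successThreshold;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true marks a failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int trialSuccesses;

    CircuitBreakerSketch(int requestVolumeThreshold, double failureRatio,
                         long delayMillis, int successThreshold) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
        this.delayMillis = delayMillis;
        this.successThreshold = successThreshold;
    }

    synchronized State state() {
        return state;
    }

    // Returns whether a call may proceed; moves OPEN to HALF_OPEN once the delay has passed.
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= delayMillis) {
            state = State.HALF_OPEN;
            trialSuccesses = 0;
        }
        return state != State.OPEN;
    }

    // Records the outcome of a call that was allowed to proceed.
    synchronized void record(boolean success, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (!success) {
                open(nowMillis);                 // a failed trial reopens the circuit
            } else if (++trialSuccesses >= successThreshold) {
                state = State.CLOSED;            // enough successful trials
                window.clear();
            }
            return;
        }
        if (state == State.OPEN) {
            return;                              // calls are rejected while open
        }
        window.addLast(!success);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst();                // keep only the most recent calls
        }
        if (window.size() == requestVolumeThreshold) {
            long failures = window.stream().filter(failed -> failed).count();
            if ((double) failures / requestVolumeThreshold >= failureRatio) {
                open(nowMillis);
            }
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
        window.clear();
    }
}
```

With the values from the text, three failures within the last five recorded calls open the circuit, allowRequest returns false until 500 milliseconds have passed, and ten successful trial calls close it again.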

This annotation also allows defining the exception types to skip and to fail on:

Execute a method asynchronously with MicroProfile Fault Tolerance

Some use cases of your system might not require synchronous and in-order execution of different tasks. For instance, you can fetch data for a customer (purchased orders, contact information, invoices) from different services in parallel. The MicroProfile Fault Tolerance specification offers a convenient way to achieve such asynchronous method executions with the @Asynchronous annotation:

With this annotation, the execution happens on a separate thread and the method has to return either a Future or a CompletionStage.
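The effect of fetching customer data in parallel can be sketched with CompletableFuture from the JDK. This is only an illustration of the idea: AsyncSketch and its lookup methods are hypothetical. With the annotation, the method body would typically just wrap its result in a completed future and leave the thread handling to the runtime.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Illustration of parallel lookups (not the MicroProfile implementation).
// Assumed annotation-based equivalent for one lookup:
//   @Asynchronous
//   public CompletionStage<String> fetchOrders(long customerId) {
//       return CompletableFuture.completedFuture(loadOrders(customerId));
//   }
class AsyncSketch {

    // Hypothetical lookup that runs on a pool thread.
    static CompletionStage<String> fetchOrders(long customerId) {
        return CompletableFuture.supplyAsync(() -> "orders-of-" + customerId);
    }

    // Hypothetical lookup that runs on a pool thread.
    static CompletionStage<String> fetchInvoices(long customerId) {
        return CompletableFuture.supplyAsync(() -> "invoices-of-" + customerId);
    }

    // Starts both lookups, then combines the results once both have completed.
    static String customerOverview(long customerId) {
        return fetchOrders(customerId)
                .thenCombine(fetchInvoices(customerId),
                        (orders, invoices) -> orders + "," + invoices)
                .toCompletableFuture()
                .join();
    }
}
```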

Apply Bulkheads to limit the number of concurrent calls

The Bulkhead pattern is a way of isolating failures in your system while the rest can still function. It's named after the sectioned parts (bulkheads) of a ship. If one bulkhead of a ship is damaged and filled with water, the other bulkheads aren't affected, which prevents the ship from sinking.

Imagine a scenario where all your threads are occupied for a request to a (slow-responding) external system and your application can't process other tasks. To prevent such a scenario, we can apply the @Bulkhead annotation and limit concurrent calls:

In this example, only five concurrent calls can enter this method and further calls have to wait. If this annotation is used together with @Asynchronous, as in the example above, it means thread isolation. In addition, and only for asynchronous methods, we can specify the length of the waiting queue with the attribute waitingTaskQueue. For non-async methods, the specification prescribes semaphores for isolation.
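For the non-asynchronous case, the semaphore-style isolation can be sketched in plain Java. This is an illustration only: BulkheadSketch is a hypothetical class, and rejecting surplus callers with an IllegalStateException is a stand-in for whatever exception a real implementation raises.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Illustration of semaphore-based isolation (not the MicroProfile implementation).
// Assumed annotation-based equivalents:
//   @Bulkhead(5)                                             // semaphore style
//   @Asynchronous @Bulkhead(value = 5, waitingTaskQueue = 10) // thread-pool style
class BulkheadSketch {
    private final Semaphore permits;

    BulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the task only if a permit is free; otherwise rejects the call immediately.
    <T> T call(Callable<T> task) throws Exception {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return task.call();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A BulkheadSketch created with new BulkheadSketch(5) lets at most five calls run at the same time and rejects a sixth concurrent call right away.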

MicroProfile Fault Tolerance integration with MicroProfile Config

On top of that, the MicroProfile Fault Tolerance specification provides tight integration with the config spec. You can configure every attribute of the different interceptor bindings with an external config source like the microprofile-config.properties file.

The pattern for external configuration is the following: <classname>/<methodname>/<annotation>/<parameter>:
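As a sketch of this pattern, a microprofile-config.properties file could contain entries like the following. The class name com.example.PostService and the method name getPost are hypothetical, and the values are arbitrary examples.

```properties
# Override maxRetries of @Retry on one method (hypothetical class and method)
com.example.PostService/getPost/Retry/maxRetries=5

# Override the value of @Timeout on the same method
com.example.PostService/getPost/Timeout/value=2000
```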

YouTube video for using MicroProfile Fault Tolerance

Watch the following YouTube video of my Getting started with MicroProfile series to see MicroProfile Fault Tolerance in action:

You can find the source code with further instructions to run this example on GitHub.

Have fun using MicroProfile Fault Tolerance,

Phil
