Harden Performance of REST calls using Spring WebFlux

12.10.2021 by Manuel Gerding - 7 min read

Do you have sequential REST calls in your code? If so, how do they behave when the network is slow? Don't know yet? Let’s find out together and see how easily we can improve our code with Spring WebFlux when fetching data from multiple independent REST endpoints.

Our Example System

To make this hands-on, we use a small demo online shop (check out the GitHub repository) consisting of a Gateway microservice that provides a REST endpoint (/products) to deliver various products to a shop frontend. Since the Gateway is stateless, it fetches all products directly from other microservices (Hot-Deals, Fashion and Toys) synchronously (see code below).

@RestController
public class ProductsController {
   @Value("${rest.endpoint.hotdeals}")
   private String urlHotDeals;
   ...
   @GetMapping("/products")
   public Products getProducts() {
       Products products = new Products();
       products.setFashion(getProduct(urlFashion));
       products.setToys(getProduct(urlToys));
       products.setHotDeals(getProduct(urlHotDeals));
       return products;
   }

   private List<Product> getProduct(String url) {
       return restTemplate.exchange(url,
                                    HttpMethod.GET,
                                    null,
                                    productListTypeReference
                            ).getBody();
   }

}

While this code works fine and delivers all the products to the frontend in a reasonable amount of time (~200ms), there is still some room for improvement under turbulent conditions.

Demo Onlineshop in good conditions - everything works fine

What to improve?

So, what happens in turbulent conditions, for example when the network is delayed or one of the three microservices is slower than the others? Is the response still delivered in a reasonable amount of time? Usually, delaying the network is not as easy as it sounds. Especially in a non-local environment, it is hard to set up tooling for that, and doing so can take more time than the actual fix afterwards. Luckily, we have steadybit in place and can make use of it!

Steadybit to the Rescue

Opening up steadybit, it has already discovered the entire system and is ready to run an experiment to simulate a turbulent condition. We’ll start with delaying the traffic for all three product-microservices (Hot-Deals, Fashion and Toys) making use of the new experiment features introduced some weeks ago. Simply follow the next steps to create and execute the experiment. In case you haven’t installed and configured steadybit yet, let’s discuss how you can get it.

steadybit has discovered our system

Step 1: Create Experiment

The first thing to do is to create a new experiment, which is pretty straightforward:

  1. We go to Experiments, choose to create a new Experiment and choose the guided approach using our wizard.
  2. We give the experiment a useful name (e.g. “Shop is performant when having slow network”) and choose the Global area for now (giving access to the entire environment).
  3. We choose to attack containers and specify them using attributes. We use a query like the one below to target all three product microservices (hot-deals, fashion-bestseller and toys-bestseller).
Create Experiment, step 2 - specify targets via query
  4. Since we want to have a maximum effect, we choose an impact of 100% at the following wizard step.
  5. Apply the attack from the category “Network”, “Delay Traffic” and start with default settings (having 500ms delay). We also disable the Jitter option in the “Additional Settings”, which would otherwise add some randomness to represent the real world. Complete the wizard via “Save Experiment”.

Step 2: Extending Experiment

Although the experiment is ready to be executed, we tune it to measure the HTTP response time while experimenting. To do so, we add a 10-second “wait” step in front of the “Delay Traffic” attack (this gives us a reference response time) as well as an HTTP call to the /products endpoint over a time span of 90 seconds in the second lane.

Extending experiment with useful HTTP check
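To get a feeling for what such an HTTP check measures, here is a minimal JDK-only sketch of a response-time probe. It is not how steadybit implements its check. The class name ResponseTimeProbe, the local stub server and the delay values are assumptions made for illustration only, and the stub's sleep merely imitates a slow endpoint.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical probe, not part of the demo shop or of steadybit.
final class ResponseTimeProbe {

    // Sends one GET request and returns the elapsed wall-clock time in milliseconds.
    static long measureMillis(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        long start = System.nanoTime();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Starts a throwaway local stub that answers /products after delayMillis,
    // probes it once, and returns the measured response time.
    static long probeLocalStub(long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/products", exchange -> {
                try {
                    Thread.sleep(delayMillis); // simulated slowness (assumption)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                exchange.sendResponseHeaders(200, -1); // 200 OK without a body
                exchange.close();
            });
            server.start();
            try {
                HttpClient client = HttpClient.newBuilder()
                        .version(HttpClient.Version.HTTP_1_1)
                        .build();
                URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/products");
                return measureMillis(client, uri);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no delay:     " + probeLocalStub(0) + " ms");
        System.out.println("200 ms delay: " + probeLocalStub(200) + " ms");
    }
}
```

Pointing measureMillis at the real Gateway URL instead of the stub would give a rough manual counterpart to the check configured in the experiment.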

Step 3: Run Experiment

Now it’s time to take action and run the experiment. Thanks to the new experiment features we are able to see all relevant parts directly in steadybit’s execution window. We can check the response time using our HTTP responses. Furthermore, by looking at the Deployment Replica Count and Kubernetes events, we can be sure that pods are not restarted by Kubernetes due to failed liveness probes.

Running experiment with current solution - slow network has multiplied effect on system's performance

Uff, what an effect. Although the network is only delayed by 500ms, the delays add up to a total response time of up to 2000 ms. Compared with the values before and after the delay network attack (~30ms), the response time has increased more than 60-fold.

Step 4: Integrate WebFlux

The cause of this behavior is that the Gateway fetches all three product endpoints sequentially, one after another. It doesn’t start requesting, for example, the toys products before fashion has responded (see code above, method getProducts()).

To solve that, we can parallelize all three requests as they have no dependencies on each other. The easiest way to do that is to integrate Spring WebFlux. Spring WebFlux supports reactive programming in web applications and builds on Project Reactor’s Flux and Mono types.
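A quick back-of-the-envelope model shows why parallelizing helps: sequential calls add their latencies up, while parallel calls are bounded by the slowest one. The sketch below uses only the injected 500 ms delay per call as input. The class and method names are made up for illustration, and the model ignores everything else (connection setup, serialization, normal processing time), so it reproduces the trend rather than the exact measured values.

```java
import java.util.List;

// Simplified latency model; the per-call values are illustrative assumptions.
final class LatencyModel {

    // Sequential calls: each call waits for the previous one, so latencies add up.
    static long sequentialMillis(List<Long> callLatencies) {
        return callLatencies.stream().mapToLong(Long::longValue).sum();
    }

    // Parallel calls: all calls start together, so the slowest one dominates.
    static long parallelMillis(List<Long> callLatencies) {
        return callLatencies.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    public static void main(String[] args) {
        // Three downstream calls, each slowed down by the injected 500 ms delay.
        List<Long> injectedDelays = List.of(500L, 500L, 500L);
        System.out.println("sequential delay: " + sequentialMillis(injectedDelays) + " ms");
        System.out.println("parallel delay:   " + parallelMillis(injectedDelays) + " ms");
    }
}
```

The measured response time of up to 2000 ms is higher than the modelled sum, so additional overhead per call is evidently involved. The additive behavior is what matters here.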

So, first, we add the dependency to the Gateway’s pom.xml.

...
  <dependencies>
    ...
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
  </dependencies>
...

After that, we can refactor the Gateway’s REST endpoint in ProductsController to use the Mono-based WebClient instead of Spring’s RestTemplate. WebClient is a non-blocking, asynchronous HTTP client, so all three requests are triggered in parallel, independent of the response time of any single microservice.

@RestController
public class ProductsController {

   private final WebClient webClient;
   ...
   @GetMapping("/products")
   public Mono<Products> getProducts() {
       Mono<List<Product>> hotdeals = getProduct("/products/hotdeals");
       Mono<List<Product>> fashion = getProduct("/products/fashion");
       Mono<List<Product>> toys = getProduct("/products/toys");

       // Mono.zip subscribes to all three sources and emits once each has produced its list
       return Mono.zip(hotdeals, fashion, toys)
               .map(tuple -> new Products(tuple.getT1(), tuple.getT2(), tuple.getT3()));
   }

   private Mono<List<Product>> getProduct(String uri) {
       return webClient.get().uri(uri)
               .retrieve()
               .bodyToFlux(productTypeReference)
               .collectList()
               .doOnError(throwable -> log.error("Error occurred", throwable));
   }
}
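If you want to see the fan-out effect without a Spring context, the following JDK-only sketch reproduces the idea with CompletableFuture. It is an analogy, not the Gateway's implementation: the fetch method is a hypothetical stand-in for a remote call that simply sleeps, and unlike WebClient this version still blocks one thread per call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// JDK-only analogy of the sequential vs. parallel Gateway; all names are illustrative.
final class FanOutDemo {

    // Hypothetical stand-in for one remote call: blocks for delayMillis, then returns one item.
    static List<String> fetch(String name, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.of(name);
    }

    // One call after another, like the RestTemplate version: the delays add up.
    static long sequentialMillis(long delayMillis) {
        long start = System.nanoTime();
        fetch("hotdeals", delayMillis);
        fetch("fashion", delayMillis);
        fetch("toys", delayMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // All three calls start at once and are joined afterwards, similar in spirit to Mono.zip.
    static long parallelMillis(long delayMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            long start = System.nanoTime();
            CompletableFuture<List<String>> hotDeals =
                    CompletableFuture.supplyAsync(() -> fetch("hotdeals", delayMillis), pool);
            CompletableFuture<List<String>> fashion =
                    CompletableFuture.supplyAsync(() -> fetch("fashion", delayMillis), pool);
            CompletableFuture<List<String>> toys =
                    CompletableFuture.supplyAsync(() -> fetch("toys", delayMillis), pool);
            CompletableFuture.allOf(hotDeals, fashion, toys).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialMillis(200) + " ms");
        System.out.println("parallel:   " + parallelMillis(200) + " ms");
    }
}
```

With three equally slow calls, the parallel variant should take roughly a third of the sequential time, which matches the direction of the improvement we expect from the WebFlux refactoring.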

Step 5: Validate Change

Let’s validate our new implementation in steadybit by re-running the same experiment and checking the response time. Now the response time is roughly 550ms - 600ms, which is reasonable as it is approximately the induced delay of 500ms plus the response time in good conditions (~50ms - 100ms). So, a pretty good performance tweak!

Running experiment with WebFlux solution - slow network has limited effect on system's performance

Conclusion

In this blog post we have learned how we can speed up the implementation of an orchestration endpoint by using Spring WebFlux Mono. Thanks to the new experiment features of steadybit we could also easily validate the difference of the two implementations.

However, don’t forget to check the new WebFlux-based implementation for newly introduced weak spots. Start for instance by validating the exception handling, followed by checking how the WebFlux client reacts to the unavailability of one of the REST endpoints (e.g. via a simulated AWS zone outage). Of course, these need to be re-validated whenever relevant code changes happen.
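One possible answer to an unavailable endpoint is to degrade gracefully, for example by returning an empty product list for the failing category instead of failing the whole /products response. Whether that is the right behavior for your shop is a product decision, and the demo repository may handle it differently. The JDK-only sketch below illustrates the idea with CompletableFuture. In Reactor, operators such as Mono.timeout and Mono.onErrorResume play a comparable role. The class name, method name and timeout value are assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative fallback strategy; not taken from the demo shop.
final class FallbackDemo {

    // Runs the (hypothetical) remote call asynchronously. If it fails or takes
    // longer than timeoutMillis, the result degrades to an empty list.
    static CompletableFuture<List<String>> fetchWithFallback(Supplier<List<String>> call,
                                                             long timeoutMillis) {
        return CompletableFuture.supplyAsync(call)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> List.of());
    }

    public static void main(String[] args) {
        // Healthy endpoint: the products are passed through.
        System.out.println(fetchWithFallback(() -> List.of("teddy bear"), 1000).join());

        // Unavailable endpoint: the exception is replaced by an empty list.
        System.out.println(fetchWithFallback(() -> {
            throw new IllegalStateException("endpoint unavailable");
        }, 1000).join());
    }
}
```

Re-running the experiment with one product microservice shut down would show whether the chosen fallback behaves as intended.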

Written by

Manuel Gerding, Product Manager

Manuel is the youngster in the steadybit family and is constantly hungry for knowledge and new perspectives to broaden his horizons. After working almost a decade as a consultant and software engineer he focusses his perspective on the needs and demands of the user. His mission is to build a great product that really makes customers’ services more stable and valuable to their customers. To regain energy, Manuel loves to read a good book, take a trip with his bike or do a short meditation session.