Java Performance Services
Training, Seminars, Benchmarking, Tuning


Upside down benchmarking

posted Sunday, 20 April 2008

One of the more difficult puzzlers that I presented in Vegas involves benchmarks, or configurations, whose response times moved in a completely different direction than expected. For example, one application that I was working with performed better with the default garbage collector than with the concurrent collector. When these unexpected things happen, I like to call them upside down results, or an upside down benchmark.

In the case of garbage collection, conventional wisdom tells us that the concurrent collector should give us better, more consistent response times because it doesn’t have as severe a “stop-the-world” phase as the default collectors do. I can list a number of other upside down benchmarks. For example, there are benchmarks showing that using Hibernate is faster than using straight JDBC (with Hibernate's caching and other fancy speed-up features turned off). At face value this doesn’t make sense. What could be faster than using JDBC directly? Adding Hibernate only lengthens the execution path, so without the caching and other fancy features, using Hibernate should make the whole process slower. So what is at work here?

If we define response time as the sum of the individual response times of each component that a system is composed of, we then need to understand that these individual response times are determined by a more complex mix: the rate at which each component consumes computing resources, moderated by the availability of those resources. Simply put, if several threads are trying to share the CPU, while one is occupying the CPU, the others must wait in a queue. We all know that queuing adds latency, which inflates response times. How much it inflates them depends on several variables, such as how busy the resource already is.
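To put a rough number on "how much", here is a minimal Java sketch that assumes the resource behaves like an M/M/1 queue (random arrivals, random service times, a single server). That model is an assumption made for illustration, not something the benchmarks above were measured against. Under it, mean response time is R = S / (1 - U), where S is the service time and U the utilization, so with a hypothetical 10 ms service time the response time climbs from about 11 ms at 10% utilization to a full second at 99%. That steep climb is the knee of the response time curve.

```java
// Sketch: how queuing inflates response time as a resource gets busier.
// Assumes an M/M/1 queue (Poisson arrivals, exponential service, one server),
// chosen for illustration only; real resources rarely fit it exactly.
public final class QueueKnee {

    /** Mean response time (queue wait + service) of an M/M/1 queue: R = S / (1 - U). */
    static double responseTime(double serviceTime, double utilization) {
        if (utilization < 0 || utilization >= 1) {
            throw new IllegalArgumentException("utilization must be in [0, 1)");
        }
        return serviceTime / (1.0 - utilization);
    }

    public static void main(String[] args) {
        double serviceTime = 0.010; // 10 ms per request (hypothetical value)
        for (double u : new double[] {0.10, 0.50, 0.80, 0.90, 0.95, 0.99}) {
            // Print response time in milliseconds at each utilization level
            System.out.printf("U=%.2f  R=%.1f ms%n", u, responseTime(serviceTime, u) * 1000);
        }
    }
}
```

Note that halving the remaining idle capacity roughly doubles the response time, which is why small changes in load near saturation have outsized effects.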

If the latency introduced by queuing for a resource makes our response times longer than can be tolerated, then the resource in contention is considered to be a bottleneck. Typically we would look at the code path, reason out how we are utilizing the resource, and look for ways to reduce our dependency on it. But what happens if we’ve misidentified the bottleneck? Or worse, let’s say that we’ve identified the bottleneck and achieved a nice reduction, only to find that the system performs worse.

In figure 1 we have two response time curves. Let the short line connecting each curve to the axis represent the response time at the given load. For the purposes of this discussion we will ignore the trivial case where no queuing takes place.

If we trace a single request through the system, we will see that it arrives at S1 and is immediately queued. So our request's response time at S1 is the sum of the service times of all of the requests waiting in front of us plus our own service time. After clearing S1, the request is queued for S2, where the same calculation applies. Total response time for the system is the sum of the individual response times.
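The calculation just described can be sketched in Java for two stations in series. This is an illustration under assumed conditions: each station is modeled as an M/M/1 queue fed at the same hypothetical arrival rate, and the service times are made-up values. Per-station response times are computed with R = S / (1 - U) and simply added.

```java
// Sketch: total response time of two queues in series (S1 then S2).
// Assumes each station behaves as an M/M/1 queue fed at the same arrival rate;
// with Poisson arrivals this tandem model holds exactly, otherwise it is an approximation.
public final class TandemQueues {

    /** Mean response time of one M/M/1 station receiving lambda requests per second. */
    static double stationResponse(double lambda, double serviceTime) {
        double u = lambda * serviceTime; // utilization of this station
        if (u >= 1.0) {
            throw new IllegalArgumentException("station saturated, U=" + u);
        }
        return serviceTime / (1.0 - u);
    }

    /** Total response time is the sum over every station the request visits. */
    static double totalResponse(double lambda, double... serviceTimes) {
        double total = 0.0;
        for (double s : serviceTimes) {
            total += stationResponse(lambda, s);
        }
        return total;
    }

    public static void main(String[] args) {
        double lambda = 1.0;       // requests per second (hypothetical)
        double s1 = 0.5, s2 = 0.8; // service times in seconds (hypothetical)
        System.out.printf("R1=%.2f s, R2=%.2f s, total=%.2f s%n",
                stationResponse(lambda, s1), stationResponse(lambda, s2),
                totalResponse(lambda, s1, s2));
    }
}
```

With one request per second and service times of 0.5 s and 0.8 s, S1 contributes 1.0 s and S2 contributes 4.0 s, for 5.0 s in total; the busier station dominates the sum.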

 

Figure 1. Total response time for a double queued system

 

Let’s say we tune S1, which results in a nice increase in its performance. In response, we should expect the overall response time to decrease by the amount of the improvement we gained in tuning S1. Following our performance best practices, we re-baseline the system with the identical load, only to find (much to our surprise) that response time has risen! In other words, the results are upside down.


As previously stated, the request will arrive at the first service and immediately be queued. How much queuing it sees depends on the utilization of S1, which is the average service time multiplied by the arrival rate. For example, if 1 request arrives every second and the service time is 0.5 seconds, the utilization is 0.5, so roughly half of the time there will be a request ahead of ours in S1. Our expected average response time for S1 is therefore longer than the 0.5 second service time by the average time we spend waiting. More importantly, S1 is acting as a throttle on the flow of requests to downstream services.
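If we additionally assume (hypothetically) that S1 behaves as an M/M/1 queue, the example can be worked through exactly. The sketch below derives utilization, mean queue length, mean wait (via Little's Law) and mean response time. For one arrival per second and a 0.5 s service time it gives U = 0.5, an average of 0.5 requests waiting, a 0.5 s average wait and a 1.0 s mean response time; other arrival and service patterns will give different numbers.

```java
// Worked example for S1, assuming an M/M/1 queue (an illustrative model choice).
public final class S1Example {

    /** Returns {utilization, mean queue length, mean wait, mean response time}. */
    static double[] metrics(double arrivalsPerSecond, double serviceTime) {
        double u = arrivalsPerSecond * serviceTime;     // fraction of time S1 is busy
        if (u >= 1.0) {
            throw new IllegalArgumentException("S1 is saturated");
        }
        double queueLength = u * u / (1.0 - u);         // mean number waiting, excluding the one in service
        double wait = queueLength / arrivalsPerSecond;  // Little's Law: Wq = Lq / lambda
        double response = wait + serviceTime;           // time queued plus time being served
        return new double[] {u, queueLength, wait, response};
    }

    public static void main(String[] args) {
        double[] m = metrics(1.0, 0.5);                 // one request per second, 0.5 s service
        System.out.printf("U=%.2f Lq=%.2f Wq=%.2f s R=%.2f s%n", m[0], m[1], m[2], m[3]);
    }
}
```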

If S1 is acting as a throttle for S2, improving S1's performance is akin to letting the motor on your car race faster: it increases the load on S2. Since S2’s performance characteristics have not changed, our requests suffer an increased average response time there. The net increase depends on S2's service time and on where the higher load moves us along its response time curve. If the increase in load pushes us beyond the knee of that curve, we will experience a much larger than expected increase in response time. The sum of the two response times then becomes longer instead of shorter. This effect is diagrammed in figure 2.
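The throttling effect can be reproduced with a deliberately simple toy model. Everything in this Java sketch is a hypothetical assumption chosen for illustration: four requests arrive together, pass FIFO through S1 and then S2 with fixed service times, and S2 suffers interference, so every request already waiting when a service starts adds contentionFactor * s2 to that service (think lock or collision overhead). Cutting S1 from 1.0 s to 0.4 s drops the mean time to clear S1 from 2.5 s to 1.0 s, yet the mean total response time rises from 3.5 s to about 4.15 s because requests now pile up at S2. With the interference switched off, the same tuning lowers it to 2.9 s, which is the result we expected in the first place.

```java
import java.util.Arrays;

// Toy model of an "upside down" result: tuning S1 makes total response time worse.
// Hypothetical assumptions, for illustration only:
//  * n requests arrive together at time 0 and pass FIFO through S1, then S2;
//  * both stations have fixed (deterministic) service times;
//  * S2 suffers interference: each request already waiting when a service starts
//    adds contentionFactor * s2 to that service.
public final class UpsideDown {

    /** Returns each request's completion time, which is also its total response time. */
    static double[] completionTimes(int n, double s1, double s2, double contentionFactor) {
        double[] arrivalAtS2 = new double[n];
        for (int i = 0; i < n; i++) {
            arrivalAtS2[i] = (i + 1) * s1;   // S1 releases one request every s1 seconds
        }
        double[] done = new double[n];
        double s2FreeAt = 0.0;
        for (int i = 0; i < n; i++) {
            double start = Math.max(arrivalAtS2[i], s2FreeAt);
            int waiting = 0;                 // requests already queued behind this one at S2
            for (int j = i + 1; j < n; j++) {
                if (arrivalAtS2[j] <= start) {
                    waiting++;
                }
            }
            double service = s2 * (1.0 + contentionFactor * waiting);
            done[i] = start + service;
            s2FreeAt = done[i];
        }
        return done;
    }

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    public static void main(String[] args) {
        int n = 4;
        double s2 = 1.0, contention = 1.0;
        double before = mean(completionTimes(n, 1.0, s2, contention));  // S1 before tuning
        double after = mean(completionTimes(n, 0.4, s2, contention));   // S1 after tuning
        double noInterference = mean(completionTimes(n, 0.4, s2, 0.0)); // tuned S1, no interference at S2
        System.out.printf("before=%.2f s, after=%.2f s, after without interference=%.2f s%n",
                before, after, noInterference);
    }
}
```

The model is far too small to predict anything about a real system; its only purpose is to show that the direction of the change depends on how S2 behaves under the extra load that a faster S1 lets through.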

Figure 2. Total response time for a double queued system

The reality is that real systems are much more complex than this trivial example. They contain many queues, many of them hidden from us. Many of these queues are bounded by physical constraints that result in asymptotic response time curves. For example, TCP and even IP are bounded by a physical resource that isn’t sharable: you can only load the network with a finite amount of data. When it becomes saturated, interference results in even longer response times.

Why should you care? Well, there are a couple of reasons. First, the Hibernate benchmark was subject to a bottleneck that existed outside of the system under study. From the description of the problem it wasn’t clear to me whether the problem was with the network, the database, or something in between. What was clear is that adding Hibernate to the benchmark took enough pressure off some downstream queue/service that the overall system response time improved. This isn’t the first time I’ve seen this.

A number of years ago, while I was tuning a Java based system, a co-worker was just about to launch a system written in Lisp that was designed to do some data mining. The process took about 4 hours running on a single box, and the time budget was 1 hour. No problem: we add 3 more boxes, and since each individual calculation had no dependencies, it should be done in 1 hour. Needless to say, the first run on 4 machines took more than 12 hours to complete. The database looked normal, as did the network. However, an issue in the network stack on the database machine created a point of serialization for the entire system. This kept the load off the database, allowing it to respond normally. It also created a lot of confusion, because once the database was ruled out, there was no apparent reason why the system should be running slower.

In the case of the concurrent GC vs. the default collector, the mechanics were slightly different, but the story is about the same. By letting its collector threads run at the same time as the application, the concurrent collector actually allowed the application threads to progress faster. Reaching the bottleneck sooner put more pressure on it, and the effect of that increased pressure was decreased overall performance. In this case (and a few others that I’ve run into), the “stop-the-world” property of the default collector was acting to throttle application threads speeding along to the bottleneck.

Just recently I ran into a situation where the response to high levels of contention was a decision to cluster the application. My comment was: make sure that you know where the real bottleneck is. If it is within the main body of the application running inside the JVM, you will most likely be OK. If it is external to the main body of the application, or it is a resource that will be shared by the nodes in the cluster, you may be setting yourself up for a world of hurt. Just make sure that you have a way to roll back if you are just guessing.