One of the more difficult puzzlers that I presented in Vegas has to do with benchmarks, or configurations, that produced response times that went in a completely different direction than expected. For example, one application that I was working with responded better with the default garbage collector than with the concurrent collector. When these unexpected things happen, I like to call them upside-down results or an upside-down benchmark.
In the case of garbage collection, conventional wisdom tells us that the concurrent collector should give us better, more consistent response times because it doesn’t have as severe a “stop-the-world” phase as the default collectors do. I can list a number of other upside-down benchmarks. For example, there are benchmarks that show that using Hibernate is faster than using straight JDBC (without the caching and other fancy speed-me-up features turned on). At face value this doesn’t make sense. What could be faster than direct use of JDBC? Adding Hibernate will only lengthen the execution path. Without the caching and other fancy features, using Hibernate should only make the whole process slower. So what is at work here?
If we define response time as the sum of the individual response times of each of the components that make up a system, we then need to understand that these individual response times are determined by a more complex mix: the rates of consumption of computing resources, moderated by their availability. Simply put, if we have several threads trying to share the CPU, then while one is occupying the CPU, the others must wait in a queue. We all know that queuing adds latency, which works to inflate response times. By how much is a function of several variables.
If the latency introduced by queuing for a resource results in our response times becoming longer than what can be tolerated, then the resource in contention will be considered to be a bottleneck. Typically we would look at the code path, reason out how we are utilizing the resource, and look for ways to reduce our dependency on it. But what happens if we’ve misidentified the bottleneck? Or worse, let’s say that we’ve identified the bottleneck and we get a nice reduction in its latency, only to find that the system performs worse.
In figure 1 we have two response time curves. Let the little line connecting each curve to the graph represent response time at the given load. For the purposes of this discussion we will ignore the trivial case where no queuing takes place.
If we trace a single request through the system, we will see that it arrives at S1 and is immediately queued. So our request’s response time at S1 is the sum of the service times of all the requests waiting in front of us plus our own service time. After clearing S1, the request is queued for S2, where the same calculation applies. Total response time for the system is the sum of the individual response times.

Figure 1. Total response time for a double queued system
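The trace through S1 and S2 can be written out as simple arithmetic. The following is only a sketch: the function names and the numbers are made up for illustration, and it assumes every request ahead of ours still needs its full service time.

```python
def stage_response_time(requests_ahead, service_time):
    """Time spent at one stage: wait for everyone ahead, then get served."""
    # ASSUMPTION: every request ahead of us still needs its full service time.
    return requests_ahead * service_time + service_time


def total_response_time(stages):
    """Sum the per-stage response times for a list of (requests_ahead, service_time)."""
    return sum(stage_response_time(ahead, service) for ahead, service in stages)


# Hypothetical numbers: 2 requests ahead at S1 (0.5 s each), 1 ahead at S2 (0.25 s each).
s1 = (2, 0.5)
s2 = (1, 0.25)
print(total_response_time([s1, s2]))  # 1.5 s at S1 + 0.5 s at S2 = 2.0
```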
Let’s say we tune S1, which results in a nice increase in its performance. In response, we should expect the overall response time to decrease by the amount of the improvement we gained in tuning S1. Following our performance best practices, we re-baseline the system with the identical load only to find (much to our surprise) that response time has risen! In other words, the results are upside down.
As previously stated, the request will arrive at the first service and immediately be queued. In this simplified model, the average length of the queue is the product of the average time to service an individual request and the request arrival rate. For example, if 1 request arrives every second and our service time is 0.5 seconds, the average queue length will be 0.5. So half of the time there will be one request in the queue ahead of ours, and our expected average response time for service 1 will be 0.75 seconds, or 1.5 times the service time. More importantly, S1 is acting as a throttle on the flow of requests to downstream services.
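To make the arithmetic concrete: with a request ahead of us half of the time, the expected wait is 0.5 x 0.5 = 0.25 seconds, and adding our own 0.5 seconds of service gives 0.75 seconds, which is 1.5 service times. The sketch below reproduces that estimate and, purely for comparison, prints the textbook M/M/1 formula next to it. The M/M/1 assumptions (Poisson arrivals, exponential service times) are mine and are only there to show how sharply a response time curve can bend as load approaches capacity. The function names are hypothetical.

```python
def simplified_response_time(arrival_rate, service_time):
    """Back-of-the-envelope estimate used in the text."""
    # Average number of requests found ahead of us (arrival rate x service time).
    queue_length = arrival_rate * service_time
    # Wait for the requests ahead, then our own service.
    return queue_length * service_time + service_time


def mm1_response_time(arrival_rate, service_time):
    """Textbook M/M/1 mean response time, S / (1 - utilization).

    ASSUMPTION: Poisson arrivals and exponential service times, which the
    text does not claim; included only to show the shape of the curve.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("the queue grows without bound at this load")
    return service_time / (1 - utilization)


# Response time for a 0.5 s service as the arrival rate climbs toward 2 requests/s.
for rate in (1.0, 1.5, 1.8, 1.9):
    print(rate, simplified_response_time(rate, 0.5), mm1_response_time(rate, 0.5))
```

Under the M/M/1 assumption the last few steps in load cost far more than the first ones, which is the knee referred to in the discussion of S2.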
If S1 is acting as a throttle for S2, improving S1’s performance is akin to letting the motor on your car race faster. This acts to increase the load on S2. Since its performance characteristics have not changed, our requests suffer from an increased average response time at S2. The net increase depends on S2’s service time and on where we move to on its response time curve. If the increase in load pushes us beyond the knee of that curve, we will experience a much larger than expected increase in response time. This will result in the sum of the two response times being longer instead of shorter. This effect is diagrammed in figure 2.

Figure 2. Total response time for a double queued system
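The throttling effect can be reproduced with a small simulation. This is a sketch under assumptions of my own choosing, not a model of any particular system: a fixed number of users each send a request through S1 and then S2 and immediately send another, both services handle one request at a time in arrival order, and S2 slows down as requests pile up on it (think lock contention or memory pressure). The function name, the `penalty` parameter, and all of the numbers are hypothetical.

```python
import heapq
from collections import deque


def mean_response_time(s1, s2_base, penalty, users=10, requests=2000):
    """Mean S1 + S2 response time in a closed loop with a fixed number of users."""
    queues = (deque(), deque())   # requests at S1 and S2; the head is in service
    events = []                   # min-heap of (finish_time, stage)
    started = {}                  # user -> time the current request was issued
    response_times = []

    def service_time(stage):
        if stage == 0:
            return s1
        # ASSUMPTION: S2 degrades as more requests are queued on it.
        return s2_base * (1 + penalty * (len(queues[1]) - 1))

    def join(stage, user, now):
        queues[stage].append(user)
        if len(queues[stage]) == 1:   # server was idle, so start immediately
            heapq.heappush(events, (now + service_time(stage), stage))

    for user in range(users):
        started[user] = 0.0
        join(0, user, 0.0)

    while len(response_times) < requests:
        now, stage = heapq.heappop(events)
        user = queues[stage].popleft()
        if queues[stage]:             # begin serving the next waiting request
            heapq.heappush(events, (now + service_time(stage), stage))
        if stage == 0:
            join(1, user, now)        # S1 done, move on to S2
        else:
            response_times.append(now - started[user])
            started[user] = now       # the user issues a new request at once
            join(0, user, now)
    return sum(response_times) / len(response_times)


baseline = mean_response_time(s1=0.5, s2_base=0.2, penalty=0.3)
tuned = mean_response_time(s1=0.1, s2_base=0.2, penalty=0.3)
print("before tuning S1:", baseline)
print("after tuning S1: ", tuned)
```

With these made-up numbers the tuned run comes out slower than the baseline: once S1 stops holding requests back, nearly all of them sit on S2, and S2's degraded service time outweighs what was saved at S1. Setting `penalty` to 0 removes the degradation, and tuning S1 then helps as expected, so the upside-down result in this sketch depends entirely on how S2 behaves under the extra load.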