Measuring Serialization
posted Wednesday, 9 April 2008
Last night I spoke at the Paris JUG. The location was quite interesting: a smallish theater in the back of a store that sold art, lamp oils and other things. At first I thought I was in the wrong place. Even many of the attendees had to be coaxed into the store, not realizing they'd found the right address.
As is typical, the most interesting part of the evening was getting to talk about technologies with the people who showed up. And since my concurrency talk is more about the bigger picture, where we are going in computing and what we (as developers) need to do to adapt, I find that it excites people into thinking and coming up with solutions. Last night was no exception. The after-talk conversations had me considering the possibility of a new metric: lines of code contained in a synchronized block.
I like software metrics because they quickly allow me to see potential problems in code that I’ve no familiarity with. I say potential problems with a strong emphasis on the word potential. A metric is by no means a binary litmus test that says yes, this code is good, or no, this code needs work. However, software metrics do suggest areas of your code that you should take a peek at. This is exactly what the lines of code in a synchronized block would give you: a hint that this part of your code may be aggressively synchronized.
At first glance it looks like it should be an easy metric to whip up. However, there are a number of things that need to be considered. For example, how do you count loops? How does one count method calls? Are there any other operations that we may need to consider, such as array copies or calls to external systems? For example, I could have a call such as creditService.authorize(purchase) that makes a remote call to a bank. That call is most likely going to be a lot more expensive than any large chunk of code that remains local. And since it is synchronized, it represents a point of serialized execution in your application. In other words, every thread that needs that lock runs one at a time, so that part of your code behaves as if it were single threaded and will only use 1 CPU no matter how many you have.
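To make the creditService.authorize(purchase) case concrete, here is a minimal sketch. The Checkout class, the CreditService interface and the method names are all hypothetical, made up for illustration, and the lambda syntax assumes a Java 8 or later compiler. The first method holds the lock for the whole remote call; the second only holds it while touching shared state. Whether the narrower version is safe depends on the invariants of the real code, so treat it as an illustration of what the metric would point you at, not as a fix.

```java
import java.util.ArrayList;
import java.util.List;

public class Checkout {
    // Hypothetical stand-in for a remote call to a bank; not a real API.
    public interface CreditService {
        boolean authorize(String purchase);
    }

    private final CreditService creditService;
    private final List<String> approved = new ArrayList<>();

    public Checkout(CreditService creditService) {
        this.creditService = creditService;
    }

    // The lock is held for the whole method, including the slow remote call,
    // so concurrent callers queue up behind each other.
    public synchronized boolean buySerialized(String purchase) {
        boolean ok = creditService.authorize(purchase);
        if (ok) {
            approved.add(purchase);
        }
        return ok;
    }

    // The remote call runs outside the lock; only the update of the
    // shared list is synchronized.
    public boolean buyNarrow(String purchase) {
        boolean ok = creditService.authorize(purchase);
        if (ok) {
            synchronized (this) {
                approved.add(purchase);
            }
        }
        return ok;
    }

    public synchronized int approvedCount() {
        return approved.size();
    }
}
```

Both methods have a synchronized region of one or two lines, yet the first one serializes a remote call. That is exactly why counting lines alone is not enough and method calls need some thought.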
It may be that you needed to synchronize a large block of code. It may be that it doesn’t matter that there is a lot of code in the synchronized block. These are things you can determine once you’ve had a look at it. The bigger question is: how do you find these things in the first place?
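As a rough idea of what a first cut at finding them might look like, here is a deliberately naive sketch. The class name SyncBlockMetric and its method are my own invention. It scans source text for synchronized (...) blocks, matches braces, and counts the non-blank lines in each body. It assumes braces never appear inside strings or comments, it ignores synchronized methods, it counts a loop body once regardless of iterations, and a nested synchronized block is reported on its own as well as inside its parent. A real tool would work on a parsed syntax tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyncBlockMetric {
    // Matches the start of a synchronized statement, e.g. "synchronized (lock)".
    private static final Pattern SYNC = Pattern.compile("\\bsynchronized\\s*\\(");

    /** Returns, for each synchronized block found, the number of non-blank lines in its body. */
    public static List<Integer> linesPerBlock(String source) {
        List<Integer> counts = new ArrayList<>();
        Matcher m = SYNC.matcher(source);
        while (m.find()) {
            int open = source.indexOf('{', m.end());
            if (open < 0) {
                break;
            }
            int close = matchingBrace(source, open);
            if (close < 0) {
                break;
            }
            int lines = 0;
            for (String line : source.substring(open + 1, close).split("\n")) {
                if (!line.trim().isEmpty()) {
                    lines++;
                }
            }
            counts.add(lines);
        }
        return counts;
    }

    // Returns the index of the brace closing the one at 'open', or -1 if unbalanced.
    private static int matchingBrace(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Feeding it the text of a source file gives one number per block, and the large ones are the places to take a peek at first.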
One drawback with this “performance” metric is that it is based on a static analysis of the code. If you’ve seen or read about “the box”, my abstraction of a computer system, you know that this metric takes the measurement in a single layer (application) in isolation from the others. To make matters worse, the application layer is devoid of all the dynamic information that would result in a more meaningful measure. In other words, measuring for hot locks in a running application is the truth, whereas measuring lines of code or potential activities in a synchronized block is a guess. Given that this is a micro performance measurement, it may or may not be important. It may be that higher order architectural decisions will result in much larger problems. However, while in development, it may be the only measurement that you’re going to get, and in that case it may be useful to highlight a potential problem. This is generally the case with any metric. It gives us a hint of a problem, and quite often that is good enough.
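For contrast, here is a small sketch of the dynamic side, using the standard java.lang.management API to ask the JVM how often a thread actually blocked on a monitor. The ContentionProbe class and the two-thread scenario are invented for the demonstration, and the lambdas assume Java 8 or later. This is only one of several ways to look for hot locks, and a real measurement would be taken against the running application under load.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class ContentionProbe {
    /** Returns how many times the calling thread has blocked trying to enter a monitor. */
    static long blockedCountOfCurrentThread() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo info = threads.getThreadInfo(Thread.currentThread().getId());
        return info == null ? 0 : info.getBlockedCount();
    }

    /** Runs a holder and a waiter on one lock; returns how often the waiter blocked. */
    public static long demo() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final long[] result = new long[1];

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown(); // signal that the lock is now held
                try {
                    Thread.sleep(200); // simulate slow work inside the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        Thread waiter = new Thread(() -> {
            try {
                held.await(); // wait until the holder owns the lock
            } catch (InterruptedException e) {
                return;
            }
            long before = blockedCountOfCurrentThread();
            synchronized (lock) {
                // nothing to do; entering the monitor is the point
            }
            result[0] = blockedCountOfCurrentThread() - before;
        });

        holder.start();
        waiter.start();
        try {
            holder.join();
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter blocked " + demo() + " time(s)");
    }
}
```

The static line count for the holder's block is tiny, yet the waiter demonstrably blocks on it. That gap between the guess and the measurement is the limitation described above.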