A few years back Jack released a newsletter on April 1st that proclaimed Java as the fasted language ever. In that news letter to pointed out "stunning and irrefutable" evidence to support his claim. The email we received in response to that newsletter was simply amazing. We were called all sorts of things by people who some how missed the joke. So it was pretty funny when I clicked through to a blog entry on how to measure Java execution time to see the identical code being used. We'd already demonstrated that this code would run in 0 as in zero milliseconds. But here it is again only this time taking a whopping 3782 milliseconds to complete.
class timecalc {
public static void main(String a[]) {
long time = System.currentTimeMillis();
System.out.println(time);
for(int i=0;i<5000000;i++) {
for(int j=0;j<1000;j++);
}
long end = System.currentTimeMillis();
System.out.println(end-time);
}
}
To be fair Jack's code was slightly different in that the loop counts where in the trillions or big enough that any traditional application environment would run forever. What the benchmark really spoke to was the power of dynamic run time profiling directed optimizations. In other words, Jacks benchmark was (intentionally) horribly broken. Lets explore.
The first thing to notice is that the statement "for(int j=0;j<1000;j++);"is a bodiless for loop. This dead code can be safely eliminated from the application with other than the timing information, has no ill effect on the output produced by this application. Eliminating it leaves us with this.
class timecalc {
public static void main(String a[]) {
long time = System.currentTimeMillis();
System.out.println(time);
for(int i=0;i<5000000;i++) {}
long end = System.currentTimeMillis();
System.out.println(end-time);
}
}
We can now apply the same argument to the remaining for loop leaving us with this.
class timecalc {
public static void main(String a[]) {
long time = System.currentTimeMillis();
System.out.println(time);
long end = System.currentTimeMillis();
System.out.println(end-time);
}
}
That leaves us with a measure of the time to make a single call to System.currentTimeMillis(), a call to System.out.println(time), and the time it took to detect the dead code and eliminate it. Generally all of that will take place in less than 1ms or at least the resolution of the clock which in some systems leaves you measuring random noise in at least the least significant digit. As a sidebar if you google about for more information you will be a number of post that confuse accuracy with precision. A hand waving definition would be; I can make an accurate measurement that suffers from a low degree of precision. A hand waving example would be; adding two integers takes 0ms on a modern processor is accurate but not very precise.
There was another comment in the posting that I find points to a problem that is common in many benchmarks no matter how big or small they are. The comment suggested that you should run the bench many times because some times the value of 0 was returned. In other words, the benchmark sometimes displayed the right answer but that value was thrown away because it didn't make sense even though running dead code for 3782ms did! Now while it may seem like I'm picking on this particular posting (in fact I am), this commonly happens when people start looking at benchmarking results. They get an answer they don't understand and then just casually throw it away as an anomaly.
Computers are very deterministic machines. The execution time for the same sequence of instructions run over and over again with the same data will identical. Ok, there are a few exceptions to this but I'd argue that those exceptions are few and far between and truly are exceptions. That said, my assertion doesn't hold in light of every day observations. That is because a real CPU is sometimes doing things other than executing our sequence of instructions. While some of these diversions maybe necessary and/or acceptable, we do need to minimize these diversions or the results of our benchmark will become suspect. We can visualize these diversions by looking at the variance in our measurements. Variance is a measure of precision. In this case I bet the variance would have been greater than the value being measured which would cause us to question the measurements accuracy. A proper treatment of the data with the assumption that we are witnessing a deterministic event would have lead us to the conclusion that there are some interfering factors in this benchmark. Of course we know that one of those possible interfering factors was dead code elimination. The next question I would have asked, how should we treat this interference? Do we need to eliminate it or some how account of it or is it part of our answer (embrace and accept)? The answer to that question comes down the purpose of the benchmark.
Flame away :p .....
A good example of precision v. accuracy that most people can relate to:
nitpicker.. completely deterministic but in practical terms you win! I've
had people say complete indeterministic which is completely inaccurate.
Kirk>> A proper treatment of the data with the assumption that we are
witnessing a deterministic event would have lead us to the conclusion that
there are some interfering factors in this benchmark. Of course we know
that one of those possible interfering factors was dead code elimination.
The next question I would have asked, how should we treat this
interference? Do we need to eliminate it or some how account of it or is it
part of our answer (embrace and accept)?