Today I had the pleasure of setting up a benchmark on a dual 8-core Nehalem machine. Funny thing was, the owner of the box thought there were two quad-core chips in the machine, but a little investigation demonstrated otherwise.
I started by running the application on the default CentOS JRE (1.6.0_09). I ran the bench for 5 minutes as a test and, as is customary, watched the server's GC activity during the run. When the run stopped I figured GC would stop too, but it didn't; it kept on running and running. So I let it go just to see how long it would run for. After 20 minutes I returned to the machine to find that GC had just stopped.
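For anyone who wants to watch GC activity the same way, here is a minimal sketch of one way to sample it from inside the JVM using the standard java.lang.management API. It is not the tooling I used for this run; the class name GcWatch and its methods are made up for illustration. It also prints the JVM version and the logical processor count, which is handy on a box where nobody is quite sure how many cores there are (note that availableProcessors() reports logical processors, not sockets or physical cores).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only; not part of the benchmark described here.
final class GcWatch {

    // Total number of collections across all collectors in this JVM.
    // A collector may report -1 when the count is undefined, so those are skipped.
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One-line summary: JVM version, logical CPU count, and per-collector totals.
    static String summary() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version"));
        sb.append(" cpus=").append(Runtime.getRuntime().availableProcessors());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(" [").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print five samples, one second apart. If the counts keep climbing
        // after the workload has finished, GC is still busy.
        for (int i = 0; i < 5; i++) {
            System.out.println(summary());
            Thread.sleep(1000);
        }
    }
}
```

This only reports on the JVM it runs in, so the sampling loop would have to live inside the application under test (say, on a background thread) to be of any use during a bench.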
I then upgraded to 1.6.0_16 and re-ran the bench. The difference was huge! I've not analyzed the logs yet, but I can say that there were far fewer full GC pauses. Not only that, GC stopped almost immediately after the benchmark completed. Good news, because what I figured to be a problem with the application turned out to be a problem with the JVM.
Just exactly what the problem was, I'm not sure. I do know that somewhere after _09 and before _14, the throughput collector became NUMA aware. It's made me wonder whether older JVM versions that are not Nehalem (NUMA) aware may be a bit buggy when running on that class of machine. I don't really know. What I do know is that the difference in GC behavior is stunning! Long story short, if you are using Nehalem, make sure you are using an up-to-date VM.
We also deploy our Java applications on CentOS (currently CentOS4, Java5
and Intel Xeon). For the "next version" of our product, the combination
will most probably be CentOS5, Java6 and Intel Nehalem... In other words,
interesting stuff, because I think currently the CentOS guys here at work
installed a JDK 1.6.0_06.