Java Performance Services
Training, Seminars, Benchmarking, Tuning

My advice on JVM heap tuning, keep your fingers off the knobs!

posted Friday, 15 February 2008

Just in the last few months I've been seeing an increase in the number of people seeking advice in the forums on how to tune garbage collection and/or size the Java heap. While it is encouraging to see that more people are becoming aware that JVM configuration is important to good performance, I got the sense that people are still struggling to sort out how all of this automated memory management stuff actually works.

They get the idea that things are kept in heap and that heap is subdivided. They get the idea that the garbage collector works to free up memory that is no longer being "used". They are out looking at lists that go on and on... and on... and on, with every possible switch and configuration setting, and then are all of a sudden overwhelmed by the array of choice. Having no time to build a complete understanding, which is an understandable situation, they go off to a forum and ask a question, which of course draws dozens of responses, most of them different, thus crystallizing the only thing we know about GC and heap sizing: your guess is as good as mine. And so, we guess.

When it comes to performance and tuning, it has been my experience that guessing is not a very good way of ensuring desired results in a consistent manner. This knowledge led Jack and me (I forget who said it first so I'll say Jack) to coin the mantra: Measure, don't guess. The question is, how do we take the guessing out of heap sizing and collector selection? In other words, what do we need to measure?

It is my expressed opinion that the primary goal of all performance tuning exercises should be to maximize the end user experience given the resource constraints we are required to work under. A less formal expression of this would be: minimize response times to the end user (though low latency may be a more important target). So it follows from this that we need to be measuring user response times. If the user response times are within tolerance then we are done! There is no need to touch anything. However, if you are asking the question, is memory management a bottleneck in my system, then most likely it is because user response times are not within tolerance and we need to start asking deeper questions by first looking at GC efficiency (also known as GC throughput).

GC efficiency is defined as the percentage of the application's run time during which the collector is running to the exclusion of your application. You may find it useful to limit this definition from the amount of time the JVM has been running to the amount of time your application has been active. No matter the definition, the best way to calculate this value is to collect the logs produced by the -verbose:gc switch (-Xloggc:logfile.name preferred for Sun JVM) and feed them through a tool such as HPJTune (free download). If the value produced by the tool is greater than 10%, you have a case for proceeding with the tuning process. If it is less than 10% but greater than 5%, tuning may help but it might not give you the boost you were hoping for. Anything less than 5% and you're most likely wasting your time. Again, that is with latency concerns aside, which is why the hedge words "most likely" are included in the preceding statement.
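To make the arithmetic concrete, here is a minimal sketch in Java of how such a percentage could be computed by hand. It is not how HPJTune works. It assumes each log line looks like `12.345: [GC 325407K->83000K(776768K), 0.2300771 secs]`, with a timestamp in seconds since JVM start and a trailing pause time; real logs vary by JVM version and flags. The class name, method names and verdict strings are made up for illustration, and how to treat values of exactly 5% or 10% is a guess, since the text leaves those boundaries open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcOverhead {
    // Assumed line shape (timestamp, event, pause); other formats are skipped:
    //   12.345: [GC 325407K->83000K(776768K), 0.2300771 secs]
    private static final Pattern EVENT =
        Pattern.compile("^(\\d+\\.\\d+): \\[.*, (\\d+\\.\\d+) secs\\]\\s*$");

    /** Percentage of elapsed JVM time spent in logged GC pauses (0 if nothing parsed). */
    static double overheadPercent(List<String> lines) {
        double pauseSecs = 0.0;
        double elapsedSecs = 0.0;
        for (String line : lines) {
            Matcher m = EVENT.matcher(line);
            if (!m.matches()) {
                continue; // not a line in the assumed format
            }
            double startedAt = Double.parseDouble(m.group(1));
            double pause = Double.parseDouble(m.group(2));
            pauseSecs += pause;
            // Elapsed time runs from JVM start to the end of the last logged pause.
            elapsedSecs = Math.max(elapsedSecs, startedAt + pause);
        }
        return elapsedSecs == 0.0 ? 0.0 : 100.0 * pauseSecs / elapsedSecs;
    }

    /** Applies the 10% and 5% rules of thumb from the text. */
    static String verdict(double percent) {
        if (percent > 10.0) {
            return "worth tuning";
        }
        if (percent >= 5.0) {
            return "tuning may help a little";
        }
        return "most likely not worth tuning";
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "1.000: [GC 1000K->500K(2000K), 0.1000000 secs]",
            "5.000: [Full GC 1500K->400K(2000K), 0.9000000 secs]",
            "9.500: [GC 1200K->600K(2000K), 0.5000000 secs]");
        double percent = overheadPercent(log);
        System.out.println(percent + "% -> " + verdict(percent));
    }
}
```

The sample log in `main` has 1.5 seconds of pauses over 10 seconds of run time, which lands above the 10% line.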

Ok, you take the measurement and you see that the value is way above 10%. Now what? The answer depends on which version of the JVM you are using, as over time things have gotten better and more options have opened up. But let's focus on the "things have gotten better" part.

Garbage collection ergonomics are to memory management as the "Just-in-time" or JIT compiler is to execution speed. Information is collected via dynamic profiling (think HotSpot) and that information is used to control a number of aspects of the Java heap and the behavior of the collectors and, in some cases, the choice of collector. The simple example is what happens when a 1.6 version of the JVM starts up.

On startup, the 1.6 JVM does a survey of the environment in which it is executing and uses that information to determine whether it should behave as a server or a client JVM. This choice affects the number of GC helper threads, the sizes of the various spaces within the heap and a whole raft of other configuration values. But configuration doesn't stop here. Instead, dynamic profiling directed by GC ergonomics is used to further refine how much heap is allocated, how it is to be proportioned and even whether we need to change from the less efficient throughput focused collector to the more efficient implementations. Now here is a takeaway: the more switches you set, the more parameters you fix, the fewer options ergonomics has to dynamically adjust to the situation on the ground (or in the JVM as it may be). Short story, the less you fiddle with, the better things will be for you in the long run.

So with that in mind, you need to define the goals of the tuning exercise. Most likely they are going to be: improve GC throughput and decrease GC pause times. Most likely the answer to both of these problems is going to be: configure max memory using the -Xmx flag. If you do this carefully, you should see (and there will be notable exceptions to this) GC efficiency improving. Hopefully you will also see GC pause times falling along with the improvement. So, if a little is good, a lot must be really good... no? No. If you give the system too much memory, GC frequency will fall and GC efficiency will improve, but you will start to experience long GC pause times as the system tries to maintain the much too large heap space. In other words, GC pause times will bottom out at some optimal heap size. If you move away from that point, GC pause times will start to increase.

The trouble with the above explanation is that it is too simple. For one, it assumes that the optimal heap size is a constant for your application. In fact it isn't. This is why we don't want to get hung up on finding the optimal size; there simply isn't one. It all depends on time-local rates of object creation and object churn and other things. Fortunately GC ergonomics will adjust things to best cope with the flux. That is, unless you pin it down to specific values using command line settings.

One of the advantages (of which there are many) of using generational spaces is that you can play tricks to reduce GC pause time. Here is how it works. Objects are created in young space or more specifically in Eden under most circumstances. When Eden is full a collection is triggered during which live objects will either be concentrated into one of the two survivor spaces or tenured into old space. Now there are two important points to be considered.

Point 1, paradoxically, the cost of collecting young is determined by the number of objects that survive, not the number that die. This is because to clear Eden all you have to do is copy the live objects to a survivor space. Now all of the memory in Eden can be returned to the free list in one shot. The expensive operation, aside from finding the live objects, is to copy them. Note, we are not finding dead objects, we are finding live ones. Ditto for the other survivor space, which will be combined into the newly activated one.
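Point 1 can be illustrated with a toy model in Java. This is only a sketch of the idea, not how HotSpot is implemented: objects are plain nodes in a graph, "copying" is modelled as adding a node to a survivor list, and the names `YoungCopySketch`, `Obj` and `evacuate` are invented for the example. The walk starts from the roots and only ever touches reachable objects, so garbage costs nothing to find.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class YoungCopySketch {
    /** A toy heap object: only its outgoing references matter here. */
    static final class Obj {
        final List<Obj> refs = new ArrayList<Obj>();
    }

    /**
     * Walks the object graph from the roots and "copies" each reachable
     * object exactly once. The work done is proportional to the number of
     * survivors; unreachable objects are never visited.
     */
    static List<Obj> evacuate(List<Obj> roots) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> work = new ArrayDeque<Obj>(roots);
        List<Obj> survivor = new ArrayList<Obj>();
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!seen.add(o)) {
                continue; // already copied via another reference
            }
            survivor.add(o);      // stands in for the copy, the expensive step
            work.addAll(o.refs);  // follow references to find more live objects
        }
        // In a real collector, Eden would now be reclaimed in one shot.
        return survivor;
    }

    public static void main(String[] args) {
        Obj a = new Obj();
        Obj b = new Obj();
        a.refs.add(b);
        for (int i = 0; i < 100000; i++) {
            new Obj(); // garbage: allocated but never reachable from a root
        }
        List<Obj> roots = new ArrayList<Obj>();
        roots.add(a);
        System.out.println("objects copied: " + evacuate(roots).size());
    }
}
```

However many unreachable objects `main` allocates, `evacuate` only does work for the two objects reachable from the root.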

Point 2, objects that reach a certain age will be tenured into old space. Objects that won't fit into either a survivor space or Eden will be created or copied directly into old space. In the latter case this is known as a premature promotion. The reason why we don't want objects to be promoted prematurely is that they are very, very likely to die very quickly, and removing them from old requires a mark, sweep, and compaction (read: in-place copy, just like disk defragmentation). In fact we don't want old to fill up at all because that will result in more full GCs, and full GCs are always very expensive in regards to pause time. Moreover, as old fills, it becomes more difficult for it to meet its young generation guarantee. If your head is spinning from the details don't despair, things do get easier from here on in.

What the young generation guarantee states is that there should always be enough space in old to accommodate all survivors from young. Executive summary: if I don't believe that there is enough free space in old, I must trigger a full GC. Hint: don't let short lived objects "leak" into old space. The stressor is: don't keep long lived objects in young space. The dilemma is that whatever keeps short lived objects in young will also keep long lived objects out of old. Did I say this was going to get easier?
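The executive summary can be written down as a one-line check. The sketch below assumes the most pessimistic reading of the guarantee, namely that everything currently in young might survive; the class and method names are hypothetical and the byte counts are made up.

```java
final class YoungGenGuarantee {
    /**
     * Pessimistic check (an assumption of this sketch): if every byte now in
     * young were to survive, would it still fit into the free part of old?
     * If not, a full GC has to run before young can be collected safely.
     */
    static boolean mustRunFullGc(long youngUsedBytes, long oldFreeBytes) {
        return oldFreeBytes < youngUsedBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Plenty of room in old: a young collection can go ahead.
        System.out.println(mustRunFullGc(64 * mb, 512 * mb));
        // Old has filled up with leaked short lived objects: full GC.
        System.out.println(mustRunFullGc(64 * mb, 32 * mb));
    }
}
```

This is why short lived objects leaking into old hurt twice: they shrink the free space in old, which makes the check fail sooner and brings on the full GC.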

The parameters that you want to be looking at that will affect how long an object can stay in young are survivor ratio and tenuring threshold.

Survivor ratio is simply a value that tells the JVM how to partition young into Eden and survivor spaces. If the survivor spaces are too small, objects will be prematurely promoted into old to make way for newer objects. If they are too big, young generation will have to be GC'ed more often, resulting in objects aging faster than normal, which results in them being prematurely tenured. Even worse, object creation may be forced into old space where it is much more expensive. Again, balance is the key.
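As a rough illustration of the partitioning, HotSpot's -XX:SurvivorRatio=N makes Eden N times the size of one survivor space, and young holds Eden plus two survivor spaces. The Java sketch below just does that arithmetic. The class and method names are invented, and a real JVM rounds these sizes to its own alignment, so treat the numbers as approximate.

```java
final class SurvivorRatioMath {
    /** One survivor space: young = eden + 2 * survivor and eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is whatever is left of young after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 80 * mb;
        for (int ratio : new int[] {2, 6, 8}) {
            System.out.println("ratio " + ratio
                + ": eden " + edenBytes(young, ratio) / mb + " MB"
                + ", each survivor " + survivorBytes(young, ratio) / mb + " MB");
        }
    }
}
```

A smaller ratio gives bigger survivor spaces at the expense of Eden, which is the balance the paragraph above describes.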

So, if we are to make an informed decision we must have a measurement. In this case the measurement to take is the age distribution of your objects in the survivor spaces. The default threshold for Sun is 31. If you have objects leaking into old and the age distribution doesn't include old objects, the conclusion can only be that you are experiencing premature promotion. If you are experiencing many full GCs yet the age distribution looks normal, then it may be a case where you need to increase the tenuring threshold in an attempt to capture these objects in young space.
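One way to get at the age distribution on a HotSpot JVM is the -XX:+PrintTenuringDistribution flag. The Java sketch below reads such output and applies the first rule from the paragraph above. It assumes lines of the form `- age   1:     600000 bytes,     600000 total`; the exact format differs between JVM versions, and the class, the method names and the `oldIsGrowing` input are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TenuringAges {
    // Assumed line shape: "- age   1:     600000 bytes,     600000 total"
    private static final Pattern AGE =
        Pattern.compile("^- age\\s+(\\d+):\\s+(\\d+) bytes,\\s+(\\d+) total");

    /** Maps each observed age to the bytes held by objects of that age. */
    static TreeMap<Integer, Long> bytesByAge(List<String> lines) {
        TreeMap<Integer, Long> ages = new TreeMap<Integer, Long>();
        for (String line : lines) {
            Matcher m = AGE.matcher(line.trim());
            if (m.find()) {
                ages.put(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)));
            }
        }
        return ages;
    }

    /**
     * Old is growing, yet no object in the survivor spaces has come close to
     * the tenuring threshold: objects are being promoted before their time.
     */
    static boolean suggestsPrematurePromotion(
            TreeMap<Integer, Long> ages, int threshold, boolean oldIsGrowing) {
        int oldestSeen = ages.isEmpty() ? 0 : ages.lastKey();
        return oldIsGrowing && oldestSeen < threshold;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "Desired survivor size 1048576 bytes, new threshold 2 (max 15)",
            "- age   1:     600000 bytes,     600000 total",
            "- age   2:     300000 bytes,     900000 total");
        TreeMap<Integer, Long> ages = bytesByAge(log);
        System.out.println(ages + " premature: "
            + suggestsPrematurePromotion(ages, 15, true));
    }
}
```

Whether old is actually growing has to come from another measurement, such as the GC log, which is why it is passed in rather than computed here.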

Whatever GC problems I've faced, I've found that I've rarely needed to set anything other than max memory and the survivor ratio. I've also learned that some problems disguise themselves as GC problems. These problems won't go away by tuning the GC. Though you may be able to mask or hide them for some period of time, they will manifest themselves (often viciously after having been suppressed for so long) sooner or later unless you address the underlying fault. One example of this could be ineffective use of thread pooling that allows the system to run away, leading to rates of object creation that are beyond the collector's ability to keep up.

The best lesson I've learned is that you only have to be a rocket scientist to create this stuff, you don't have to be one to use it. What I'm trying to say is that if you don't know what is going on and you're unsure of what to do, instead of guessing, try poking around for a bit to get a better measure because I've seen where a proper measure will explain all to even the most technically inept. I also know that a bad guess can cripple GC ergonomics.

Last point: don't forget to measure for effect, and remember that the user experience rules all, even when the means defy what any "expert" tells you.





1. Eric Jain left...
Friday, 15 February 2008 7:50 pm :: http://eric.jain.name/

Good point! Last time I tried to tune and benchmark several JVMs, pretty much every setting I changed ended up decreasing the performance :-) That said, I do like to set a maximum heap size (especially if the machine isn't dedicated to the application), and sometimes you need to override the maximum perm gen size (e.g. for Tomcat if you need to redeploy a lot without restarting). Also I think it makes sense to tell the JVM if you want to optimize for overall throughput or minimize delays, a fundamental trade-off (JRockit: -Xgcprio:throughput vs pausetime).


2. Kirk Pepperdine left...
Friday, 15 February 2008 8:46 pm

Thanks Eric, there is actually a lot more to say but I ran out of time. Witness the poor proofing. Good catch on Tomcat and the classloading/perm space problem. That one is a constant source of confusion for those that face it. I know it tripped me up the first time I ran into it.


3. Gil Tene left...
Saturday, 16 February 2008 7:14 pm

Good overview of how collectors work and the tradeoffs they have to face. I think the "measure, don't guess" mantra is a very good one, with one caveat - make sure you really measure the behavior you target for the real world, and not just cover up the behavior long enough to pass a relatively short test. I know that you, Kirk, actually spend the time to watch and verify the longevity and stability qualities of the systems you tune, but most people don't have the tools.

The key problem I've seen is with the way people tune GC using iterative attempts that involve tests and measurements over less-than-full-day runs. They usually avoid, for practical reasons, verifying longevity (in full/real load 24 hour tests, for example) or testing for stability (by verifying that oldgen doesn't grow at all and that no promotion is being done at all, so you know that a full GC pause will never hit you, for example, or by verifying that the heap is not being fragmented at all by CMS to the point where a full compaction will be needed). You may be ok with limited longevity or stability (e.g. ok to reboot every night or every week), but then you need to verify and measure data that convinces you that you *will* achieve your targets under real loads.

In "normal" JVMs, virtually all the GC tuning exercises end up being about making the bad GC situation rare, not making it go away. They are rarely about verifying it's actually gone away. The best evidence of this is that verifying this is hard.

When it's a normal STW GC system, the bad thing people try to make rare is a full GC (which compacts every time in most STW collectors). When it's a "mostly concurrent" GC system (such as CMS), the rare bad thing is compaction, or the concurrent marker not being able to keep up with the mutation rate under load and causing a full pause. The more sophisticated the collector modes, the easier it is to miss the inevitable big pause.

Bottom line - don't trust a test unless you've seen it do 10 *compacting* full GCs during the test, or have proven to yourself that for the duration of time you need the system to run you will *never* see a compacting full GC.

This point was a major driver for the design of the "Pauseless" garbage collector in Azul's JVM. Instead of having GC modes and code that are rarely run, and flags that are mostly useful for making them happen even more rarely - our collector goes through all major GC operations frequently. When Azul's Generational "Pauseless" collector executes, there are no "rare but bad" things that won't happen under load on a frequent and regular basis, and no attempts to tune such behaviors away (into tomorrow). Instead, we compact often, and will keep up with 10s of GB/sec of allocation, all of which is dealt with by concurrent compaction at a matching rate.

This fundamental behavior - being able to reliably compact the heap without pausing the application, means that we use generational behavior for throughput and efficiency control, but not as a way to avoid full GCs. In our systems, Full GCs are good, and we are glad to have them happen often - you'll see several in any test, and be able to trust your system's behavior as a result.

-- Gil.

Gil Tene, CTO & Co-Founder Azul Systems


4. Kirk Pepperdine left...
Saturday, 16 February 2008 8:20 pm

Gil, thanks for the great comments and the great insight gained from your experience building Azul hardware. I always recommend that people test for as long as they plan to run their application. If the application is 24/7/365, then of course you will want to make an exception. However, with shorter tests you won't find slow leaks, slow leaks into perm, or other patterns that can destabilize your application.

I should add that recent testing with the 1.6 (data loads) suggested about 10 minutes from startup for GC to adjust. What I saw was 10 minutes of very erratic behavior after which the system just dropped into a steady state. Again, this was a data load, which is most likely atypical, but it did suggest a reaction time.

Kirk


5. neil from Dallas left...
Tuesday, 19 February 2008 8:22 pm

Neil from Dallas says hi, and I am coming to Hungary to visit you this summer. You have been forewarned.


6. Jack left...
Saturday, 23 February 2008 6:19 pm

It was you, Kirk, that came up with the "Measure, don't guess" mantra. I've never been that pithy.


7. Jürgen left...
Friday, 10 October 2008 3:10 pm

Hi Kirk,

nice post!

I wonder why you didn't mention something about the tuning parameter -XX:NewRatio. Imagine the following case: We have a huge heap with a maximum size of 9gb. The JVM by default partitions the heap into 1.2gb eden, 1.2gb survivor and 6.6gb old. The server application is responsible for executing short but memory intensive tasks (e.g. svg rendering). Thus there aren't many long lived objects which will get promoted into old gen.

The problem we have is that the young gen gets full very quickly with a lot of short lived objects which aren't dead yet. A lot of them get prematurely promoted into old gen. Sometimes young generation collections are very expensive because there are too many live objects (young gen max: 22sec vs. avg: 0.5 sec).

Isn't it appropriate in this case to increase the young generation via the newratio parameter to avoid premature promotion and move available space from the oversized old gen to the new gen? The goal is of course to reduce the amount of live objects when a young generation collection occurs. This seems for me to be a better choice than to tune the survivor ratio which won't help us moving some space from old gen to new gen.

What do you think in this case?

BTW... I also found some "typos"!

You wrote: "Survivor ratio is simply a value _the_ tells the JVM how to partition young into _old_ and survivor spaces" I think you wanted to write: "Survivor ratio is simply a value _that_ tells the JVM how to partition young into _eden_ and survivor spaces"

Jürgen


8. Kirk Pepperdine left...
Friday, 10 October 2008 3:53 pm

Hi Jürgen,

There are clearly times when you must get in and dabble with collector settings, and certainly the situation you are describing is one case in point. Also you've hit on another point: if you are going to dabble, at least use the ratio and hint parameters rather than those that fix values. Although newratio does fix a value, it still allows the size of young to be adjusted by ergonomics as it adjusts the overall size of the heap. Fixing the size of young gen would limit the ability of ergonomics to resize or, if it did, it could have unintended consequences.

What your young collection times reflect is a couple of things. 1) GC is "run to failure". Since it is the collector's job to collect, failure in this case means it wasn't able to collect. 2) The collector must evacuate (i.e. copy) all objects out of eden into either a survivor space (if there is enough room) or old space (should survivor not be able to handle the volume). The more objects that are found alive in young, the more likely you are to spill into old space. Bottom line, letting objects live is bad. Message: push down to narrow the scoping of all variables so they fall out of scope naturally. If you find yourself setting variables to null or overwriting old values, the place holder (variable) may be too broadly scoped. The tension is, you can only create objects and throw them away up to a certain rate, after which that will start to adversely impact performance. Message here is: you may need to switch to a more memory efficient algorithm.

I could over analyze this and have lots of fun doing it. But now I've got to go schnitzelize some poor pig parts!

regards, Kirk