Java Performance Services
Training, Seminars, Benchmarking, Tuning

Java Performance Tuning Course


Chania Crete
May 22rd-25th 2012

Sydney Australia
Feb 6-9, 2012

Montreal Canada
Feb 27-March 2, 2012

San Francisco CA
March 20-23 2012

Paris France
March 27-March 2, 2012

Request an in-house seminar

Anti-if

I have joined Anti-IF Campaign

Calendar

««May 2012»»
SMTWTFS
   12345
6789101112
13141516171819
20212223242526
2728293031

Performance Anti-Patterns

My Top Tags

                                       

Mailing List

My RSS Feeds








Why do I have this long GC pause?

posted Saturday, 26 February 2011

Just recently I ran into a question; why did this collection take 9 seconds to complete. It's a question that comes up reasonably often and how to answer it is a topic that I give considerable attention to in my performance tuning workshop. Fortunately the question was accompanied with a garbage collection log fragment and so I though it would be fun to work through the analysis here.

Here are some details before we take a look at the gc log. The application is running in a 6.0 32 bit VM. Max heap configured to 1536mb which is about as big as your going to get in a 32 bit JVM. The Concurrent Mark Sweep collector was explicitly set. MaxPermSize was set to a healthy 256m. Now, lets look at the log record to see what it is telling us.

70314.666: [GC {Heap before gc invocations=1105:

 par new generation   total 72512K, used 72448K [0x85800000, 0x89ee0000, 0x8b800000)

  eden space 72448K, 100% used [0x85800000, 0x89ec0000, 0x89ec0000)

  from space 64K,   0% used [0x89ec0000, 0x89ec0000, 0x89ed0000)

  to   space 64K,   0% used [0x89ed0000, 0x89ed0000, 0x89ee0000)

 concurrent mark-sweep generation total 1085184K, used 740804K [0x8b800000, 0xcdbc0000, 0xe5800000)

 concurrent-mark-sweep perm gen total 262144K, used 138639K [0xe5800000, 0xf5800000, 0xf5800000)

70314.668: [ParNew: 72448K->0K(72512K), 0.0537868 secs] 813252K->756978K(1174720K)Heap after gc invocations=1106:

 par new generation   total 73664K, used 0K [0x85800000, 0x8a000000, 0x8b800000)

  eden space 73600K,   0% used [0x85800000, 0x85800000, 0x89fe0000)

  from space 64K,   0% used [0x89fe0000, 0x89fe0000, 0x89ff0000)

  to   space 64K,   0% used [0x89ff0000, 0x89ff0000, 0x8a000000)

 concurrent mark-sweep generation total 1102208K, used 756978K [0x8b800000, 0xcec60000, 0xe5800000)

 concurrent-mark-sweep perm gen total 262144K, used 138639K [0xe5800000, 0xf5800000, 0xf5800000)

}

, 0.0570178 secs]

70340.365: [Full GC {Heap before gc invocations=1106:

 par new generation   total 73664K, used 72414K [0x85800000, 0x8a000000, 0x8b800000)

  eden space 73600K,  98% used [0x85800000, 0x89eb7a60, 0x89fe0000)

  from space 64K,   0% used [0x89fe0000, 0x89fe0000, 0x89ff0000)

  to   space 64K,   0% used [0x89ff0000, 0x89ff0000, 0x8a000000)

 concurrent mark-sweep generation total 1102208K, used 756978K [0x8b800000, 0xcec60000, 0xe5800000)

 concurrent-mark-sweep perm gen total 262144K, used 139266K [0xe5800000, 0xf5800000, 0xf5800000)

70340.366: [CMS[Unloading class com.ibm.ws.webmsg.channel.component.WebMsgChannelInit]

[Unloading class sun.reflect.GeneratedMethodAccessor311]

[many other classes unloaded that I've removed] 

: 756978K->586452K(1102208K), 9.0385238 secs] 829392K->586452K(1175872K), [CMS Perm : 139266K->137180K(262144K)]Heap after gc invocations=1107:

 par new generation   total 73600K, used 0K [0x85800000, 0x89ff0000, 0x8b800000)

  eden space 73536K,   0% used [0x85800000, 0x85800000, 0x89fd0000)

  from space 64K,   0% used [0x89fd0000, 0x89fd0000, 0x89fe0000)

  to   space 64K,   0% used [0x89fe0000, 0x89fe0000, 0x89ff0000)

 concurrent mark-sweep generation total 1102208K, used 586452K [0x8b800000, 0xcec60000, 0xe5800000)

 concurrent-mark-sweep perm gen total 262144K, used 137180K [0xe5800000, 0xf5800000, 0xf5800000)

}

, 9.0417245 secs]  

The first entry indicates that a young generation GC was triggered at 70314.666 seconds after VM startup. It was triggered because eden was almost full (consuming 72448K of the 72512K bytes). Tenured space consumed 740804K of the 1085184K (~68%). Before the collection starts, the VM must estimate memory consumption of tenured at the end of GC. If the estimated values exceeds 70% then it is obliged to collect tenured space. Normally this calls for a full GC collection but the CMS collector only operates in tenured so it's technically not a full collection. There is one thing to note in this log entry and that is, both survivor spaces are empty. More on that later.

The next record describes the results of the young generational parallel collection. As expected, the collection empties eden. However it unexpectedly leaves the survivor spaces empty. Since the young generation collector can't remove objects from tenured space, the reported occupancy of 756978K represents what was there before GC started + everything that was promoted. Having empty survivor spaces at the end of the collection implied everything got promoted. If we subtract the before and after values we see that 16174K has survived the young gen collection. The log record reports survivor spaces to be 64K in size which is obviously a problem.

Normally promotion to tenured happens when an object has survived a prerequisite number of collections (default is either 4 or 15). If young gen doesn't have enough space to hold all survivors, it will make room by promoting the oldest objects  and keep promoting younger and younger objects until there is enough room to hold the rest. In this case the survivor space is way too small forcing promotion of everything directly from eden. This condition is known as a pre-mature promotion. 

Some premature promotion is generally ok. Frequent premature promotion is an indication of memory starvation. Though I'd need the entire GC log to confirm this, I suspect that just about every collection ends with a premature promotion. Since the premature promotion is most likely tied into the long GC pause, lets defer this discussion until after we've inspected the logs.

The logs tell us that tenured space is configured to 1102208K and is just shy of 69% occupancy (756978K). As previously mentioned, the default occupancy needed to trigger a collection of tenured is 70% which explains why CMS wasn't triggered. At 70340.365, the log indicates eden is 98% utilized and that triggered a collection to start. With tenured sitting at 69% occupied we would expect a CMS as expected survivors would certainly push above the 70% threshold. Instead of the CMS, the logs report a full parallel collection.

When objects are removed from tenured space they leave holes in the heap. Over time it is possible that heap will become so fragmented that it will be necessary to perform a compaction. Compaction is expensive and requires that the heap be locked. CMS only briefly locks the heap during phase 1 and phase 4 of the collection. It defers the task of compaction to the parallel collector. Before the parallel collector gets busy performing a mark, sweep and compaction, it must park all the application threads at a safe point. Application threads are only released when the collection is complete. This activity is responsible for the (9 second?) "stop-the-world" pause that we are now battling.

There is one more point to consider before offering a diagnosis. The name "garbage collection" suggests that more garbage is more costly to collect. In fact, the opposite it true. To see this, consider what the garbage collector is doing. The collector has to visit and analyze and quite possibly move and swizzle pointers to every live objects. As you can see, there isn't much mention of dead objects in that statement as they are pretty much ignored. While the number of dead objects in tenured space does have some effect, the overall cost of a collection is dominated by the number of live objects. What this GC log is telling us is that about half of tenured objects continue to survive collections. Having to dealing with all those survivors has some how kept the collector busy for 9 seconds.

Heap tuning is about creating the conditions where most objects die in a young generational space. This log is telling us that making survivor spaces larger could improve the situation. The problem is; the space needed to enlarge survivor spaces will come from eden. A smaller eden will result more frequent collections. This in turn will put more pressure on survivor spaces which could lead to continued frequent premature promotions. We need to find a way to increase the size of survivor spaces without affecting the size of eden.

One option is to make young gen larger. A larger young gen will allocate more memory to both survivor spaces and eden. However, this cannot happen without taking space away from tenured. Remember, tenured is already at 50%. Making it smaller will push that number closer to the 70% trigger threshold and we certainly want to avoid that. What we'd really like to do is make overall heap larger but we're stuck with the 32 bit limit so that's out of the question. There is one other observation to consider. I've seen applications run for weeks at a time without ever suffering from a full GC by configuring a lower (50%) initiating occupancy threshold. Having a lower threshold gives the allocators and collector more space in which to work through fragmentation. Again, having a resident set size of 50% pretty much precludes this option. 

What the last paragraph is suggesting is that this application simply requires more memory. But unfortunately that avenue is closed unless it can be moved to a 64 bit JVM. If moving to a 64 bit JVM is not an option, the only alternative is to reduce the number of objects in tenured space. This requires one to inventory what is consuming tenuring space and rework the application to keep less of it. Another path is to split the work between JVMs by clustering the application. Of course this only works if the data can be partitioned between the JVMs in the cluster. This application happens to be clustered so it might be possible to spin up more JVMs to distribute the data load.

Finally we can conclude that there isn't going to be a quick win for this application. It might be that the app is bound to the 32 bit JVM (which I suspect is the case). Inventorying tenured is time consuming and reducing object counts requires that the application be reworked. Each of these remedies will require you to convince others that what you're suggesting is the right direction to take. This is where the GC logs combined with in-depth analysis become very essential part of any performance analysis. It provides you with the evidence you need to say, here is the problem and here is a path we can take to solve it. And once developers understand what has gone wrong, it rarely takes long for them to fix it. My favorite GC flags are -Xloggc:gc.log, -XX:+PrintGCDetails, and -XX:+PrintTenuringDistribution. In this case the flag -XX:+PrintHeapAtGC was also set. One final flag to consider setting -XX:+PrintGCApplictionStoppedTime but this flag is typically only useful when debugging the collector it's self.

PS, I'm always interested in looking at more GC logs so if you got a few hanging about waiting to be analyzed, zip'em them up and send them to me. 

 

 

 

tags:                    




1. Paris Apostolopoulos left...
Sunday, 27 February 2011 2:39 pm :: http://javapapo.blogspot.com

Nice entry, Kirk many thanks!


2. Ron left...
Tuesday, 1 March 2011 2:44 am

Since you have so much headroom in permgen, I might steal a few 10's of megs from there and give it to old gen. Has -Xmx been titrated up as far as it will go?


3. Kirk Pepperdine left...
Tuesday, 1 March 2011 7:29 am

We really need the entire log to say for sure but IMHO, this app needs 100s of megs extra. One goal is to reduce tenured occupancy to about 30-40% so that we can lower the occupancy initiating threshold. We also need to make survivor spaces much much larger. But, it's a start.


4. dimarra left...
Thursday, 31 March 2011 1:39 am

Great article on memory sizing. I have a JVM with gc.log that I would like you to give feedback on. The behavior that we observer is that once the major GC happens in continues to happen at 18 per minute taking 16secs average. The oldspace has 400mb available of the 800mb allocated. let me know if you are interested in this. cheers.


5. Kirk left...
Thursday, 31 March 2011 1:46 am

Hi Dimarra,

I'm interested in the GC log. Send me an email and I'll be happy to talk to you about how to collect. If you visit www.kodewerk.com there is a contact form you can use


6. Ann left...
Wednesday, 15 February 2012 12:37 am

hi, Thank you for your post. We are recently having a strange GC issue in our application which is causing a 5 sec delay. Can you verify the logs and see if you can find something for us 56845.011: 1444142K->1427147K(9213888K), 0.0033540 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 56848.401: 1444171K->1427160K(9213888K), 0.0023169 secs] [Times: user=0.02 sys=0.01, real=0.00 secs] 56853.155: 1444184K->1427474K(9213888K), 0.0031038 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 56857.561: 1444498K->1427818K(9213888K), 0.0033794 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 56858.219: 1444842K->1427670K(9213888K), 0.0040157 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 56860.511: 1435686K->672047K(9213888K), [CMS Perm : 24500K->24376K(24704K)], 6.0298213 secs] 56945.727: 978735K->673217K(9539776K), 0.0078044 secs] [Times: user=0.10 sys=0.02, real=0.01 secs] 57021.302: 979905K->673974K(9539776K), 0.0090295 secs] [Times: user=0.13 sys=0.00, real=0.01 secs] 57119.477: 980662K->674796K(9539776K), 0.0095918 secs] [Times: user=0.13 sys=0.00, real=0.01 secs] 57297.653: 981484K->675588K(9539776K), 0.0117952 secs] [Times: user=0.17 sys=0.00, real=0.01 secs]


7. Kirk Pepperdine left...
Wednesday, 15 February 2012 4:57 am

Hi Ann,

It's very difficult to tell from this small log fragment that also seems to be very malformed. One issue I see is Perm space is almost full even after the CMS collection. The rest is pure speculation with the logs in this adulterated condition but here it goes. I cannot see a good reason why GC is being called in the first place. The pace of GC seems to suggest that the application isn't very busy and heap is no where full and attempts to collect are generally unsuccessful. It all suggests that the VM hasn't made the decision to call for a GC but I don't see indications that RMI or a developer has made that call either. So, I'm going to stick with my original guess that perm space is too small and that is what is causing the difficulties. Make it bigger and let us know how it works out.