I had a number of titles for this blog entry. My first thought for a title was "Has your application swallowed an Elephant?". As much as I liked it, it obscured the point of the blog: how the concurrent collector may fail to deliver a clean heap, or fail to let your application progress while it is running.
CMS Overview
To understand how CMS fails we should first take a look at how it is supposed to work. It isn't necessary to understand the exact details of how the collector functions in order to understand how it is failing. These details are very complex, so I like to offer a simpler explanation that abstracts away the complexity. We are all familiar with how we can affect object references in code by using "=". String s = new String(); gives us a string object and binds that object into either a local or instance context. It is this reference that keeps the object live in memory. Once the context in which it is bound goes out of scope, or we execute s = anotherObject, the object is said to be unreachable and the collector is free to reclaim its memory.
The activity in which memory is reclaimed is known as mark and sweep. In order for mark and sweep to work, all objects must be separately registered in another structure. Since this structure will be mutated by potentially many threads, it must be implemented in a thread-safe manner. And when the garbage collector runs, it must have exclusive access to this structure. Because of this, the collector will pause our mutating application threads while it works with and mutates this structure. This is commonly known as "stop-the-world" behavior or a GC pause. Since many applications are sensitive to pausing, the question was: can we somehow find a way to let the application threads run alongside the collector's threads? One answer to that question is the Concurrent Mark and Sweep (CMS) collector.
The whole premise behind the CMS collector is that it minimizes this stop-the-world behavior. It does this by making a copy of the special structure that all objects are registered in. OK, it is actually more clever than this, but the point is that the collector threads have a stable view of memory that they can work against and mutate at will without interfering with the running of the application. We only need a very short pause at the beginning of the collection to get this view. CMS threads are regulated so that they will only consume 50% of the CPU (they yield a lot). That value is configurable, though I wouldn't recommend adjusting it unless you have a very, very good reason to do so. At the end of the collection, the collector will re-pause application threads and reconcile the differences between what the collector threads were doing and what the application was doing. You may see this in the logs as a "dirty card rescan".
As you can imagine, this process takes a wee bit more CPU. However, the benefit of reduced pause times can be well worth that cost. That said, there are a number of situations where either CMS fails, or CMS can rub sore points in your application the wrong way. This is where the fun starts.
Hurry up and wait
One thing I rant about is that the user experience is king. That translates to this: if taking some action makes the user experience worse by increasing response times to their activities, ignore the advice you've been given, even if you found it in the WebLogic or WebSphere documentation or it's coming from an "expert's" blog site (mine included), and roll back the change. CMS has long been known as the collector of choice, and it really does work in that it allows your application threads to progress much faster (wall clock faster). However, this can lead to a situation I call "hurry up and wait".
In hurry up and wait your threads are progressing much quicker than they normally would towards the next bottleneck in your application. Let's consider an idea I call "contribution to response time". Behind this idea is queuing theory, and since people find maths so exciting I'll just handwave the explanation. Just about every high-level request will be broken down into a series of smaller requests. With each of these smaller requests we have the potential to run into a queue (most often an implicit one). The response time that a user experiences will be the sum of the response times for all the queues that the request passed through. To be clear, the response time components of any queue will most likely be non-linear, and if a sub-request is hitting up against a real resource boundary, response times can go vertical very quickly.
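To put rough numbers on the "contribution to response time" idea, here is a minimal sketch. It assumes each sub-request behaves like a simple single-server (M/M/1) queue, where response time is service time divided by (1 - utilization). That model, the class name, and the figures in main are illustrative assumptions, not measurements from any real system.

```java
public class ResponseTimeSketch {

    /** Response time of one queue under the assumed M/M/1 model: R = S / (1 - U). */
    public static double responseTime(double serviceTimeMs, double utilization) {
        if (utilization < 0.0 || utilization >= 1.0) {
            throw new IllegalArgumentException("utilization must be in [0, 1)");
        }
        return serviceTimeMs / (1.0 - utilization);
    }

    /** What the user feels: the sum of the response times of every queue visited. */
    public static double totalResponseTime(double[] serviceTimesMs, double[] utilizations) {
        double total = 0.0;
        for (int i = 0; i < serviceTimesMs.length; i++) {
            total += responseTime(serviceTimesMs[i], utilizations[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical request that passes through three sub-systems.
        double[] service = {2.0, 5.0, 10.0};
        double[] relaxed = {0.50, 0.50, 0.50};
        // Same request, but the last sub-system is now close to saturation.
        double[] stressed = {0.50, 0.50, 0.95};
        System.out.println("relaxed:  " + totalResponseTime(service, relaxed) + " ms");
        System.out.println("stressed: " + totalResponseTime(service, stressed) + " ms");
    }
}
```

With these made-up figures the relaxed case totals 34 ms while the stressed case comes out at about 214 ms, almost all of it contributed by the one sub-system running at 95% utilization. That is the non-linearity the paragraph above is waving at.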
Yet another consequence of queues, one we experience just about every day in our lives, is lines. If we walk into our bank or local shop, or onto a plane or a train, we are introducing latency into our lives. And we all know that the longer the line, the more latency we are going to feel. Computers are no different. How does this fit with CMS failure? Quite simply put, using a CMS collector allows our threads to make forward progress much faster, and that can lead to situations where threads spend more time in these sub-system queues than they normally would. If these sub-systems are already stressed, the extra threads coming at them faster than they would had the collector halted them for a full stop-the-world collection can cause their response times to go vertical. If this causes their contribution to response time to become larger than the full stop-the-world collection's contribution to response time, using the CMS collector will cause your users' experience to go south (and I don't mean that in a good way where you get to spend winter on the beach).
Before you decide to use the CMS collector, you need to understand where the bottlenecks are in your system. You also need to understand whether CMS, in allowing your threads to progress, will let them put enough pressure on a bottleneck that using CMS becomes a detriment. Quite often the only real way to understand this is to bench your application and hope the bench results hold up when you apply the conclusions in production.
Holes in Memory
CMS is an old space collector, and while young generation collectors compact as a side effect (by evacuating live objects to another space), CMS must use another strategy. Copying objects is expensive, so CMS avoids compaction. However, an object needs to be allocated in a contiguous space, and if your heap is highly fragmented it is possible that there won't be a free space big enough. In this case CMS will be forced to compact. Currently that compaction can only happen when all the application threads have been stopped. CMS compaction times can be significant. Long compaction times are most likely when old space is heavily littered with live objects: the collector has to move all of these objects as well as swizzle pointers so that all of the references are correct at the end of compaction. This activity can be devastating for any application that is sensitive to GC pause times.
To demonstrate the effects of this particular failure, all you need to do is write an application that creates both large and small objects and then holds onto these objects long enough that they "leak" into old space. If you bias your release pattern toward the small objects, heap fragmentation will force CMS to compact. If your real application follows this general heap usage pattern (think long-lived large collections with short-lived small objects), you may want to look at how you might move to a different usage pattern.
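The allocation pattern described above can be sketched roughly as follows. This is only an illustration of the pattern, not a guaranteed reproduction: the class name, method, and all sizes are hypothetical, and whether the small objects actually reach old space before they are released depends on your heap sizing and tenuring settings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FragmentationSketch {

    /**
     * Allocates many small arrays and an occasional large one. Large arrays are
     * kept for the whole run; each small array is kept only until `window`
     * newer small arrays have been allocated, then released.
     */
    public static List<byte[]> churn(int iterations, int window, int largeEvery,
                                     int largeBytes, int smallBytes) {
        List<byte[]> longLived = new ArrayList<>();     // large objects, never released
        Deque<byte[]> recentSmall = new ArrayDeque<>(); // small objects, released oldest first
        for (int i = 0; i < iterations; i++) {
            recentSmall.addLast(new byte[smallBytes]);
            if (i % largeEvery == 0) {
                longLived.add(new byte[largeBytes]);
            }
            // Release is biased toward small objects: once the window is full,
            // the oldest small object is dropped and its slot becomes a hole.
            if (recentSmall.size() > window) {
                recentSmall.removeFirst();
            }
        }
        return longLived;
    }

    public static void main(String[] args) {
        // Illustrative sizes only; tune them to the heap you are testing.
        List<byte[]> survivors = churn(2_000_000, 200_000, 2_000, 32 * 1024, 64);
        System.out.println("large objects still held: " + survivors.size());
    }
}
```

Run it on a JVM that offers CMS with the collector and GC logging switched on, for example java -XX:+UseConcMarkSweepGC -Xloggc:gc.log FragmentationSketch, and treat the window and object sizes as knobs to turn until the small objects survive long enough to be promoted.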
Leaks from Perm Space
As if there weren't enough ways to create memory leaks, with CMS we now really have to worry about how we use perm space. Perm space is that mysterious little box of memory that is often seen in diagrams of the Java heap, but no one really knows what's in there. One of the things that ends up in perm space is classloaders. The relationships, and hence the references, that exist between classloaders, classes, and instances of a class are somewhat complex. Suffice it to say, if your application is creating and using specialized classloaders, you need to be aware that even if you cut all ties to them, CMS may not be able to recover them.
To be honest, I don't know the exact reasons behind this behavior. What I do know is that it can not only leave perm space short of memory (which can cause an OOME to be thrown), it will also leave old space looking like it's swallowed an elephant. If you are using default settings, CMS will be triggered at roughly 70% occupancy. If your application loads and then releases a lot of classes (think on-the-fly JSP compilations or synthetic method construction (I'll confirm this with the JRuby guys)), the results of this activity will exist as long as the classloader is in memory. You may have released the classloader, but it is also being referenced by other things. The upshot here is: unless you have class unloading enabled, all of the classloader activity will fill up old space and you'll eventually start thrashing on GC. Prior to 1.6, you must also specify that you want CMS to collect perm space.
Note that this does not happen with the regular collector. So the big clue that you are suffering from a CMS failure in this case is this: same application, same usage patterns, same JRE and so on, so that the only difference is the collector, and CMS leaves the heap full whereas the regular pause collector doesn't. As a side note, I often look for this kind of comparison to help me eliminate potential causes of performance problems.
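One way to watch for the classloader problem from inside the application is the standard java.lang.management API, which reports how many classes are currently loaded and how many have been unloaded. The following is a minimal sketch; the class name and output format are my own invention. If the loaded count keeps climbing under CMS while the unloaded count stays flat, and the same run under the regular collector shows classes being unloaded, that is a strong hint.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadingProbe {

    /** One-line snapshot of the JVM's class loading counters. */
    public static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + bean.getLoadedClassCount()
                + " totalLoaded=" + bean.getTotalLoadedClassCount()
                + " unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples; in a real application you would log this periodically.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

On the Sun JVMs of this era the switches for class unloading under CMS are, to the best of my knowledge, -XX:+CMSClassUnloadingEnabled and, before 1.6, -XX:+CMSPermGenSweepingEnabled as well. Check them against the documentation for your exact JVM version before relying on them.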
Send me your GC logs
I'm currently interested in studying GC logs produced by the Sun JVM. Since these logs contain no business-relevant information, sharing them should ease concerns about protecting proprietary information. All I ask is that with the log you mention the OS, complete version information for the JRE, and any heap/GC-related command line switches that you have set. I'd also like to know if you are running Grails/Groovy, JRuby, Scala or something other than or alongside Java. The best setting is -Xloggc:<somefilename>. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return.
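If it helps to gather the details requested above, a small sketch like the following prints the OS, the JRE version, and the switches the JVM was started with, using only standard system properties and the RuntimeMXBean. The class name and output layout are hypothetical; run it with the same command line switches as the application that produced the log.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcLogEnvironment {

    /** Collects OS, JRE version, and JVM start-up switches into a short report. */
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("os: ").append(System.getProperty("os.name")).append(' ')
          .append(System.getProperty("os.version")).append(' ')
          .append(System.getProperty("os.arch")).append('\n');
        sb.append("jre: ").append(System.getProperty("java.vendor")).append(' ')
          .append(System.getProperty("java.version")).append(" (")
          .append(System.getProperty("java.vm.name")).append(' ')
          .append(System.getProperty("java.vm.version")).append(")\n");
        // Switches passed to the JVM itself, e.g. heap sizes and -XX options.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("jvm args: ").append(jvmArgs).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```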
>the exact reasons behind this behavior...
Kirk, I know you are a fan of HP's JTune, but regarding GC logs, if you
specify -XX:+PrintGCTimeStamps and -XX:+PrintGCDetails on the command line
when starting the JVM, the resulting log will contain data that can be
interpreted by GChisto (https://gchisto.dev.java.net/), a nice tool from
Tony Printezis, the very smart guy behind the new G1 collector.