Java Performance Services
Training, Seminars, Benchmarking, Tuning

Java Performance Tuning Course


Chania Crete
May 22rd-25th 2012

Sydney Australia
Feb 6-9, 2012

Montreal Canada
Feb 27-March 2, 2012

San Francisco CA
March 20-23 2012

Paris France
March 27-March 2, 2012

Request an in-house seminar

Anti-if

I have joined Anti-IF Campaign

Calendar

««May 2012»»
SMTWTFS
   12345
6789101112
13141516171819
20212223242526
2728293031

Performance Anti-Patterns

My Top Tags

                                       

Mailing List

My RSS Feeds








Using Wordle as a Profiler

posted Monday, 9 January 2012

I mentioned the idea to use Wordle as an execution profiler while presenting the profiling section of my performance tuning course in Paris last December. The idea was seeded by presentation that Neal Ford did a few years ago in which he used Wordle to expose the vocabulary of a Java application. Neal took a pile of source code and fed it into Wordle. The result was a word cloud that visualized the dominate words in the source code. What I was interested in was exploring how I could use Wordle to visualize an applications dominate activity (i.e., the bottleneck).

 In profiling we setup a mechanism to repeatedly ask, “What are you doing now? What are you doing now, What are you doing now?” . Profilers aggregate and count the answers to this question. It is the treatment of the answers that allows us to identify hotspots (dominate activities) in our application. Wordle aggregates and counts words. It uses those counts to build a word cloud. So it seems as though it should work as long as we feed is the right set of words. In this case, the words should come from the answers to the repeated question, “What are you doing now?”. To answer that question, we can use thread dumps.

 At this point you should be thinking, “Thread dumps to profile? Are you daft? The impact on performance will be horrendous!”. I completely agree, done haphazardly, continuously taking thread dumps from a running application would inflict a huge performance penalty on that application. But what if we didn’t need all that data to figure out what was going on? Instead, what if we switched to “slow profiling” where we took a thread dump every 5 minutes minimizing it’s impact on performance? I recently used this idea to create a Java agent calledConditionalThreadDumper which I then launched into a production system. The impact on performance was horrendously unnoticeable.

 The ConditionalThreadDumper polled the system on a regular basis looking to see if it was in a particular (bad) condition. For example, we might look at CPU utilization and another performance metric to see if they have exceeded some level of utilization. If so, it would turn on a mechanism to created a thread dump every x minutes. If the system worked it’s way out of the condition, the thread dumper would be turned off. Launching in this agent into production resulted in several thousand data points over a period of a couple of days. Not a lot when you consider an execution profiler can collect that many samples in a few minutes. However, this was running in production and it was only running when the system was in trouble. So while there wasn’t a lot of data, it was significantly more relevant to the problem at hand, as in aggregate, these thread dumps tell a story of the dominate activity was in the badly behaving application.

 Blasting thousands of raw thread dumps into Wordle simply won’t work so I hacked together a scanner that harvested stack traces of interest from the thread dumps. What I was looking for were application threads (in this case Tomcat threads, which are easily identifiable). After collecting and cleaning all of the stack trace entries for the Tomcat threads I ended up with more than 350,000 lines of data. The user interface for Wordle made inserting all of this data impractical so I modified the code to aggregate and count identical entries. This reduced that dataset to just under 90,000 lines.

I fed the top 100 or so items into Wordle and it came back with a clear but useless answer. The dominate activity was java.lang.Object.wait. Since I already knew that the thread pool was oversized it wasn’t a surprise to see that the dominate activity were threads blocked on a lock (see Figure 1).

  • java.lang.Object.wait:91295
  • java.lang.Thread.run:46578
  • org.apache.tomcat.util.net.AprEndpoint$Worker.run:45174
  • org.apache.tomcat.util.net.AprEndpoint$Worker.await:44589
  • waiting on (a org.apache.tomcat.util.net.AprEndpoint$Worker):44586
  • javax.servlet.http.HttpServlet.service:1137

 Figure 1. Dominating Activities

 

I filtred java.lang.Object.wait and five other related and uninteresting entries from the data and then fed the top 100 or so back into Wordle. Not surprisingly, the output from Wordle demonstrated what we knew to be true from initial work with "real" performance tooling. The dominate activity was interactions with the JDO product, Kodo (kodo.kernel.BrokerImpl.find) r. Not far behind, was a proxy using reflection (java.lang.reflect.Method.invoke). This is quite visible in Figure 2. 

thread dump word cloud 

Figure 2. Wordle output showing dominate activities

 

 

So there you have it, how to use Wordle as a wall clock time, method exclusive execution profiler.

 

 If you are interested in learning more about performance tuning using more traditional tooling and techniques, please join me in one of my open enrollment performance tuning seminars. I have a few scheduled for both Europe and North America. I'm also very excited to announce that I'm working with Exceed in Australia. I'll be there for two weeks. My first will be presenint the workshop at an inhouse offering. The second week, Feb 6th is for my first full open enrollment performance tuning workshop down under. This will also be a rare opportunity for me to meetup with people in Oz so if you want to meet up, drop me a line.

tags:                




1. Yeroc left...
Monday, 9 January 2012 8:17 pm

Profiling by grabbing periodic stack traces reminds me of "poor man's profiler" (http://poormansprofiler.org/) and a related blog post for Java (http://blog.tsunanet.net/2010/08/jpmp-javas-poor-mans-profiler.html). Automatically capture the stacktraces when the system is experiencing cpu/memory issues is a nice touch.


2. Daniel Trebbien left...
Wednesday, 11 January 2012 12:59 pm

Are you planning on making your ConditionalThreadDumper open source? It seems like an interesting tool.


3. Kirk Pepperdine left...
Wednesday, 11 January 2012 2:54 pm

I hadn't thought of open sourcing the conditional thread dumper. I've open sourced two projects over Christmas of which jPCM is going to take a wee bit of effort to get an alpha release out. Right now the stumbling block is having the C++ code work on Mac as well as Linux and Windows. I've got two really smart guys adjusting the C++ code and once that's done things should get very interesting