Tom Deneau, Sr. Member of Technical Staff 5/6/2010
In this article, we will look at the effect of enabling the attachment of a debugger on Java application performance. Specifically, we will look at a piece of the Java Platform Debugger Architecture called the JDWP agent. We will see how this agent may cause unexpected negative effects on performance on some Java Virtual Machines (JVMs), even when we are not actually debugging. We will take a deep dive to explain this performance slowdown and see how some changes to the JVM could alleviate this problem. We will also show how we applied these changes to improve the performance of the Hotspot JVM in the OpenJDK when it is started with debugging enabled but the debugger has not yet attached.
The specific investigations in this article used the Hotspot JVM, but some of these issues may apply to other JVM implementations.
At AMD, our Java performance team was looking at some Java web-based workloads. In general, we like to try to get the CPU utilization as high as possible, since we want to concentrate on CPU performance issues rather than database or network bottlenecks. For one workload in particular, which ran the JIRA issue tracking server, we were having trouble getting good CPU utilization. When we looked more closely at the JVM options, we saw we were using
which, as we’ll see below, was used in case we wanted to attach a debugger to the JVM at some later time. Note that we weren’t actually attaching a debugger when we were measuring performance, so we had assumed this agentlib option would be performance-neutral in this usage scenario. When we removed this option, both the CPU utilization and the performance on this workload (measured in requests handled per second) improved dramatically. We measured the effect of running with and without the –agentlib:jdwp option on a few other workloads, generating these results:
|Application||Relative performance with the JDWP agent compared to without|
|Internal XPlanner Read Workload||65%|
|Internal JIRA Browse workload||17%|
Note: these benchmarks were measured on recent AMD processor-based systems and use a recent Hotspot JVM (JDK6 Update 14P). In this table, we’re not concerned with absolute scores, just the relative impact of using the JDWP agent.
Clearly this –agentlib:jdwp option does not have a zero effect on performance, at least for some workloads. Let’s try to understand what’s going on here.
The Java Platform Debugger Architecture
The Java Platform Debugger Architecture (JPDA) is a multi-tiered debugging architecture that allows tools developers to create portable debugger applications. The JPDA is used by all Java debugging tools, from command-line debuggers like jdb to GUI-based debuggers like Eclipse. A good description of the JPDA can be found at http://java.sun.com/j2se/1.5.0/docs/guide/jpda/architecture.html. The following diagrams are borrowed from there.
How a debugger like jdb fits into JPDA
Let’s look at the jdb command-line debugger shipped with the JDK to see how it fits into this JPDA (GUI-based debuggers fit into JPDA in a similar way). From the jdb documentation, we see that jdb can either
- Launch a new JVM from the jdb command line with a specified main class of the application to be debugged, and stop the VM before executing that class’s first instruction, or
- Attach to a JVM that is already running.
Using jdb to launch a new JVM
The “launch a new JVM” mode is perhaps the simplest, let’s try it first. We invoke
then look at whether any new processes got spawned besides the jdb process. On Linux, for example, we might use ps auxww so we can also see all the command-line options for each process. Anyway, at this point we don’t see any new processes because the JVM has not really started yet. Let’s set a breakpoint and execute up to it:
jdb should respond that we are now at the breakpoint. If we look at the processes, we see one with a command line something like the following:
java -Xdebug -Xrunjdwp:transport=dt_socket,address=myHost:50121,suspend=y MyClass
It makes sense that a Java process got launched with MyClass as the entry point, but those other flags must be things to help the debugger talk to the JVM. Looking up the meaning of –Xrun, we see that it is the old syntax for loading a Java VM Tools Interface (JVMTI) agent to a JVM, equivalent to the –agentlib option that started with JDK5. The syntax for –Xrun is -Xrunagentname:options (the name of the agent comes immediately after -Xrun) so this command is invoking a JVMTI agent called JDWP. A JVMTI agent is just a shared library with some required entry points, so this would be a .so file in the Linux OS or a .dll file in the Windows® OS. The options specify how the JDWP agent will listen for debugger connections, in our case using a specified socket.
Using jdb to attach to a running JVM
What about the second way of using jdb, that is, attaching to a running JVM?Does the JDWP agent get used in that case? Yes. If you look at the jdb documentation you will see that, in the attach case, you are required to start the JVM itself with the -agentlib:jdwp option followed by the proper JDWP options to allow you to connect to it. For example, the following command will run the MyClass application, and allow jdb to connect to it at a later time.
% java -agentlib:jdwp=transport=dt_socket,address=8000,
(Note that the syntax is slightly different from the older –Xrun option, but the result is the same). You can then attach jdb to the JVM with the following command:
% jdb –attach 8000
You can also confirm that if you do not start the JVM with the -agentlib:jdwp command, you won’t be able to connect jdb to it.
So we see that, in both cases, the JVM got started with the JDWP agent. And we also see from the “attach to a running JVM” case that, if we want to start a Java application with the possibility of attaching a debugger later, we must start the JVM with the JDWP agent. In those cases when we need to start a JVM with the JDWP agent and possibly run for days before attaching a debugger, it could be important that the JDWP agent not negatively impact the overall performance.
So the JDWP agent fills the role of the “back-end” in the JDPA diagram. It’s basically a reference implementation for the JDWP protocol. It listens for JDWP commands from debuggers across some communications channel, and communicates with the JVM using the JVMTI interface. Since we’re concerned with any effects the JDWP agent may have on JVM performance, we’ll be looking at the JDWP agent’s use of the JVMTI interface.
A brief look at JVMTI
From our earlier diagram, we saw that the JDWP agent uses the JVMTI to talk to the JVM. This interface was introduced in JDK5 and replaces the JVMPI and JVMDI interfaces from earlier Java versions. A good reference for JVMTI can be found at http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html
In the JVMTI spec, you can see that one major way agents find out what’s happening in the JVM is by the two-step process of enabling Events using SetEventNotificationMode and enabling Callback Functions using SetEventCallbacks. SetEventNotificationMode can be used either to enable or disable specific events; enabling or disabling can be either global or specific to a single thread.
Another important JVMTI concept is that of capabilities, used by the API functions AddCapabilities, GetPotentialCapabilities, RelenquishCapabilities, etc. The relation between capabilities and event notification is summarized in the following paragraph from the JVMTI spec:
The “Capabilities” section of each function and event describe which capabilities, if any, they are associated with. “Required Functionality” means it is available for use and no capabilities must be added to use it. “Optional Functionality” means the agent must possess the capability before it can be used. To possess a capability, the agent must add the capability.
So we see that, at least for certain events, before we can enable that event notification, we may need to add a relevant capability. Note that while event notification can be thread-specific or global, capabilities are always global.
Some capabilities also have restrictions on when they can be added. For example, some may only be added in the OnLoad (initialization) phase. One reason capabilities might be restricted to OnLoad phase is they might impact the way bytecodes are compiled into JIT code. For example, adding the capability can_generate_method_entry_events may require generating some additional code at the beginning of a compiled method to check whether the method entry event notification is currently enabled. In general, the spec does not impose which capabilities a JVM must support or what restrictions the JVM may make as to when they can be added.
Here’s another relevant note from the JVMTI spec regarding performance overhead:
Note that the overhead of using a capability is completely different than the overhead of possessing a capability. Take single stepping as an example. When single stepping is on (that is, when the event is enabled and thus actively sending events)[,] the overhead of sending and processing an event on each instruction is huge in any implementation. However, the overhead of simply possessing the capability may be small or large, depending on the implementation.
The JVMTI spec makes no requirements as to the overhead of either using or possessing any capability.
The fact that a particular JVM implementation is allowed to restrict some capabilities to the OnLoad phase is nice and flexible for the JVM, but has implications on the agent. If an agent needs a capability, it may be required to add it at OnLoad time, because the JVM may not allow it to be entered later. This is true even if the actual events associated with the capability may not be enabled until later. We’ll see an example of this with the JDWP agent.
How the JDWP agent uses JVMTI
Since we’ve seen that running the JVM with the JDWP agent can degrade the performance of some workloads, it would be good to see what kind of calls are going across the JVMTI interface between the JDWP agent and the JVM.
One way to look at the actual commands going across the JVMTI interface is to use Hotspot’s TraceJVMTI command. There’s not much documentation about this option; you’ll have to look at the Hotspot sources. But one useful way of using it can be to trace all SetEventNotificationMode function calls, which can be done by
which details input and output and errors.
You can even substitute the word “all” for SetEventNotificationMode to see all functions, but you might get overwhelmed with data. It would also be nice to see what capabilities are being added. Unfortunately, in the current Trace JVMTI implementation, tracing capabilities functions does not show the individual capability’s bits. So, for this, we’ll look at the JDWP agent sources, which can be found in the directory jdk/src/share/back (“back” here refers to debugger back end in the preceding JPDA diagram).
As we expected, the JDWP agent has some required capabilities and, since it might not be able to get them later (as already mentioned), it asks for them at OnLoad time. This is all in the file debugInit.c. The required capabilities it asks for are:
There are also some “nice to have” capabilities that get added only if the JVM supports them (the agent finds out by calling GetPotentialCapabilities), but — for now — we’ll concern ourselves with the required capabilities.
If we get past the capabilities setting, we will then see some calls to enable a few event notifications. The events VM_INIT and VM_DEATH are always enabled. The EXCEPTION event is conditionally enabled. You might ask why we would ever enable the EXCEPTION event at this point, since the debugger has not even attached yet. In the JDWP agent options, you’ll notice there are the following two options:
stop VM (and wait for debugger attach) on throw of specified exception
stop VM (and wait for debugger attach) on any uncaught exception
So the agent enables EXCEPTION events at OnLoad time only if one of these two options has been set. In the normal case, when we want the JVM to run freely until we attach a debugger, we would not set these options. This is the case for which we will examine the performance.
If we ever do attach a debugger, the JDWP agent will enable different event notifications. If you’re interested, I recommend you try the
flag to see what events do get asked for when the debugger is attached. Here, though, we’re concerned with the performance impact when the JDWP agent is attached but the debugger is not attached, so we won’t get into those details.
Java6 and Attach on Demand
Before we look at the performance of the JVM with JDWP attached, let’s explore a feature that was added with JDK6 called “attach on demand” (see API here) and see if that would help us. This API allows an agent to be attached to a running JVM, which means the agent doesn’t have to be on the JVM’s command line at startup. If starting the JVM with the agent causes a performance problem, why don’t we just use “attach on demand” and attach the agent later — for instance, just before we want to attach the debugger?
It’s pretty easy to write a little utility to try this (for some reason the JDK doesn’t ship with such a utility). When we run this utility to attach the JDWP agent to an already running JVM, we see we get the following error:
AgentInitializationException returned Failed to load agent library
Reading the “attach on demand” documentation more closely, we see that — if we want to attach a JVMTI agent using this API — the agent must have an entry point called Agent_OnAttach (JVMTI agents that get specified on the JVM command line get called at their Agent_OnLoad entry point). Looking at the JDWP source, we see it does not even have an Agent_OnAttach entry point.
This seems like a big flaw. Why doesn’t the JDWP agent support this new functionality? If we were industrious enough to modify the JDWP agent to have an Agent_OnAttach entry point (and make it do the same things that Agent_OnLoad does), we would then get an error when we try to add our required capabilities at Agent_OnAttach time. The reason is, for many of these capabilities, the JVM only allows them to be set at JVM Initialization time. So, while there may be some useful agents that can be attached on demand, it’s clear that the JDWP agent is not one of them.
Note that we would hit the same capabilities obstacle if we just continued to attach the JDWP agent at JVM startup as we do today, but modified the JDWP agent so it would not actually add its capabilities until later — say, when the debugger attaches. Again, we are limited by the fact that these capabilities must be added at JVM initialization time.
Sidebar Note: An example of a useful JVMTI agent that could be attached later would be an agent that tracks Compiled Method Load events. The capability for this is not restricted to be added at init time. In addition, such a late-attaching agent could use the interesting JVMTI API called GenerateEvents (which, by the way, only works for CompiledMethodLoad events). This API allows an agent to “catch up” for all the events it may have missed up to that point. As the JVMTI documentation states, “This function is useful when events may have been missed due to the agent attaching after program execution begins.This function generates the missed events.”
Profiling with and without the JDWP agent
We collected CodeAnalyst profiles running the same JIRA workload, with and without the JDWP agent. CodeAnalyst is AMD’s time- and event-based profiler, available free at http://developer.amd.com/cpu/CodeAnalyst. Here are some key findings from the profiles (collected on a Windows Server 2008 OS)
|Without JDWP||With JDWP|
|Java.exe process, % of overall system||60 %||38%|
Java.exe process, % Time spent in JITted code
Java.exe process, % Time spent in jvm.dll
Within the JITted code, nothing stood out as different. Within the jvm.dll code, in the non-JDWP run, the top function was java_lang_Throwable::fill_in_stack_trace, with almost 3% of the overall samples (more about this later). On the JDWP run, this function was less than 1%. The highest jvm.dll function with the JDWP agent was ObjectSynchronizer::deflate_idle_monitors.
Nothing becomes too clear from this data, except that we are spending more time in non-JITted code and there must be some lock contention problem because the system is idle more often. This agrees with the lower CPU utilization we mentioned at the beginning of this article.
Information from thread stack traces
Most people are aware of the JDK’s jstack utility, which can dump the stack of all the threads running in a JVM. (On some platforms, similar information can be gotten using the Ctrl-Break key or the SIGQUIT signal). Such Java-based stack traces didn’t give us enough information in this situation. That’s because a thread’s real stack can have both Java methods and JVM functions (the JVM is written in C++). Jstack will only show the Java methods.
Instead, we used the Linux’s GNU debugger (gdb) utility to get some stack trace snapshots by stopping the JVM during heavy use, then using something like
thread apply all info stack
From the CodeAnalyst data, we could see, in the cases when the JDWP agent was connected, many of the threads were in a stack of JVM functions called from their Java stack methods. A typical stack frame was
Monitor::IWait () Monitor::wait () VMThread::execute () OptoRuntime::deoptimize_caller_frame () OptoRuntime::handle_exception_C_helper () OptoRuntime::handle_exception_C () . . . java methods after this
The frequency of deoptimize_caller_frame pointed to lots of switching from compiled frames to interpreted frames. The monitor calls indicated there might even be some shared lock access through which all of these threads were going. This would explain the reduced CPU utilization we saw on the JDWP run. In addition, the OptoRuntime functions shown above — while they appeared in many thread stack traces — did not show up much in the timer samples from CodeAnalyst.
Other things we knew about the JIRA workload
We already knew from other analysis that our JIRA workload made pretty heavy use of exceptions. For example, many exceptions were thrown from java.util.ResourceBundle and caught at various levels of JIRA or the packages it uses like Apache Velocity. In a normal run of JIRA (without the JDWP agent), we would easily see 3% of the total system samples ending up in java_lang_Throwable::fill_in_stack_trace.
We also knew that Xplanner used some exceptions but not as much as JIRA, and that SPECjjb2005 did not really use exceptions. Perhaps code with exceptions is particularly sensitive to the JDWP agent. To test this theory, we wrote a micro-benchmark that did nothing but throw and catch exceptions. Sure enough — with the JDWP agent attached, this micro-benchmark suffered even more than JIRA. The ratio of “without JDWP” to “with JDWP” was about 10-1. As an extreme case, if one coded this micro-benchmark to keep throwing a static exception rather than creating a new exception on each iteration, the ratio of “without JDWP” to “with JDWP” was about 3000-1.
How JVMTI capabilities can affect code generation in general
If we look back at the startup of the JDWP agent, we see that one of the capabilities requested at OnLoad time was can_generate_exception_events. And we see in the JVMTI spec that this capability is required if we ever want to get exception events. But we also saw that the actual exception event notification was disabled in the normal way of running the JDWP agent, at least until the debugger attached.
So the capability can_generate_exception_events seems to be enough to degrade the performance of exception-throwing code. We can confirm this by writing our own simple JVMTI agent that sets only this single capability and does not enable any event notification. Such an agent results in a similar slowdown for our microbenchmark.
Let’s explore some of the ways a JVM’s code generation could be affected by a capability like can_generate_exception_events.
In our discussion of JVMTI events and capabilities, we used the capability can_generate_method_entry_events as an example and noted that possessing this capability might have some effect on JITted code generation because the capability at least enables the possibility that the method entry event will need to be generated. Similarly, the capability can_generate_exception_events at least enables the possibility that the exception event will need to be generated.
Let’s look at some of the different strategies by which the can_generate_exception_events capability could affect a JVM’s code generation. We’ll assume that if, at runtime, the exception event notification really is enabled for the current thread, then sending the event would be complicated enough that it would not be done in the compiled code. Instead, the JVM would fall back to an “interpreted” state.
- When the capability is set, the JVM could refuse to compile any method that could possibly throw an exception. When the method gets handled in the interpreter, if it takes it a code path along which it actually throws an exception, the interpreter could check whether the exception event notification is enabled for the current thread, and — if so — could do the event notification. Using this strategy, methods that might throw exceptions would always run interpreted rather than compiled.
- When the capability is set, a new method that might throw an exception could still be compiled, but any code path of the method that throws an exception could have a special trap compiled that would always fall back to the interpreter. The interpreter could then do the actual exception throw and also check whether the exception event notification is enabled for the current thread; if so, it could do the event notification. The advantage over the previous strategy would be that methods with a lot of non-exception paths, or that didn’t take the exception paths very often, would still execute as compiled code.
- When the capability is set, a new method that might throw an exception could still be compiled, but any code path of the method that throws an exception could have a special runtime call compiled to check if the current thread needs exception event notification. (We assume that checking whether exception event notification is enabled for this thread is simple enough that it can be done in a compiler helper call without falling back to the interpreter.) If event notification is not enabled for this thread (or not enabled globally), exception handling could proceed in the compiled code. If exception handling is enabled for this thread, we would fall back to the interpreter, which would do the actual exception throw and also do the event notification. In this strategy, we are avoiding the fallback to the interpreter unless the event really needs to be sent.
- When the capability is set, and it is time to compile a new method, we could check whether exception event notification is enabled. If it is not enabled at all (for example, the JDWP agent is attached but a debugger is not attached), we could compile the method as if the capability were not set (i.e., we would handle the exception throw in the compiled code without even checking any of the exception event notification state). Later, if the exception event notification gets enabled anywhere, the JVM would have to scan for all compiled events that could possibly throw an exception, invalidate them, and either recompile them using one of the previous three strategies or interpret them (Strategy 1). This strategy incurs no overhead until an exception event is enabled, but at that point a lot of methods get recompiled. Another advantage of this strategy is that the can_generate_exception_events capability doesn’t really need to be added at OnLoad time. In this strategy the capability by itself doesn’t affect code generation, so it could be added just before the exception event notification is enabled.
How Hotspot handles can_generate_exception_events
When we examined the Hotspot sources, we could see it used Strategy 2. This means that, when the can_generate_exception_events capability is added, each exception throw from a compiled stack frame could incur the overhead of de-optimizing to an interpreted frame that would handle the exception throw and then not have to actually do the exception event notification. In workloads in which the exceptions are not that “exceptional,” this de-optimizing burden could be quite noticeable.
Our Java performance team worked with the OpenJDK to change the way Hotspot dealt with can_generate_exception_events to a strategy more like Strategy 3. Preliminary code changes and measurements show that Strategy 3 can erase almost all the JDWP agent deficit we saw on the JIRA and XPlanner workloads. This chart shows some before/after ratios:
|Application||Relative performance withJDWP agent compared to without|
|Internal JIRA browse workload||
|Internal XPlanner read workload||
We’ve shown how the JDWP agent fits into the JPDA, how it communicates with the JVM using JVMTI, and how — in the particular case of JVMTI’s can_generate_exception_events capability — that interaction with the JVM can cause a performance degradation. We’ve also explored ways to change the JVM so the JDWP agent does not cause performance degradation, and showed how we applied those techniques to the Hotspot JVM in the OpenJDK.
We’d be interested in hearing of actual cases where people were affected by this performance issue. Please comment via our blog about this article.