Java applications are not always the easiest things to optimize. You have several
factors influencing the performance of the app, from the code itself to how
long the app has been running, which often affects the efficiency of just-in-time
compilation (more about that later). Fortunately, you do have a few techniques
at your disposal for optimizing Java apps. And with AMD CodeAnalyst,
you have a few options for analyzing the performance of your apps, as well.
This article will walk through a sample test harness that
demonstrates how CodeAnalyst can profile your Java app. You'll
learn how to set up a CodeAnalyst project for the app and get
an idea of what to watch for in your own apps.
A Sample Java Application
To demonstrate how CodeAnalyst profiles Java apps, it'll help to
have a test harness with clear hotspots that show up in the
profile. The following code sample contains some common coding
techniques that should give you a clear idea of what to look
for and how to begin setting up a CodeAnalyst project for your
own app.
Listing 1 has several quirks worth noting. The main()
method calls the loop_wrapper() method, which in
turn calls the critical_section() method. Notice
that loop_wrapper() sets up the most important
part of the demonstration. Without the
println statements in main(), however, the
dead-code eliminator might optimize
loop_wrapper() out of the picture, since its
result isn't otherwise used.
Most importantly, the three-tiered approach gives the VM a
chance to find the most commonly executed code. On most VMs,
just-in-time compilation recompiles frequently executed methods
at progressively higher optimization levels. In general, though,
a method can't be recompiled while it is still running. By
repeatedly entering and exiting a section of critical code (in
this case critical_section(), which consumes almost all of
the processing time because of its division operation), the VM
learns which code is most important. It can then
improve the recompilation of that section of code.
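The three-tier structure described above might look roughly like the following sketch. This is not the article's Listing 1: the class name HarnessSketch, the camelCase method names (standing in for loop_wrapper() and critical_section()), the arithmetic inside the loop, and the default iteration counts are all assumptions made for illustration.

```java
public class HarnessSketch {

    // Innermost tier: a division-heavy loop that should dominate the profile.
    static long criticalSection(long seed) {
        long acc = seed;
        for (int i = 1; i <= 1000; i++) {
            acc = acc / i + seed % i + i;
        }
        return acc;
    }

    // Middle tier: enters and exits criticalSection() many times, so the VM
    // gets repeated chances to switch to a recompiled version of it.
    static long loopWrapper(int iterations) {
        long result = 0;
        for (int i = 0; i < iterations; i++) {
            result += criticalSection(i + 1);
        }
        return result;
    }

    // Outer tier: one training pass, one measured pass. Printing the result
    // keeps the work from being eliminated as dead code.
    public static void main(String[] args) {
        int training = args.length > 0 ? Integer.parseInt(args[0]) : 4000;
        int measured = args.length > 1 ? Integer.parseInt(args[1]) : 40000;

        long start = System.currentTimeMillis();
        long result = loopWrapper(training);
        System.out.println((System.currentTimeMillis() - start) + "ms Finished training");

        start = System.currentTimeMillis();
        result += loopWrapper(measured);
        System.out.println((System.currentTimeMillis() - start) + "ms Finished final run");

        System.out.println("Result was " + result);
    }
}
```

The point of the shape, rather than the exact arithmetic, is that the hot method is short, called many times, and feeds a value that is eventually printed.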
As the VM finishes training, it reaches steady state, at
which point compilation has been optimized and you can measure
true performance of the app using CodeAnalyst. If you try to
profile the app too soon, the profile could be off for
either of the following reasons:
- The profiler will be measuring one-time initialization
code.
- As the VM works its way through different optimization
levels, the profiler could register duplicates of the
method, which are actually the same method optimized at
lower levels of efficiency.
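One rough way to see the warm-up effect for yourself, independent of any profiler, is to time successive batches of the same work and watch the per-batch time settle. The sketch below is only an illustration: the class WarmupTimingSketch, the work() method, and the batch sizes are hypothetical, and the timings you get will depend entirely on your VM and machine.

```java
public class WarmupTimingSketch {

    // Written to from the timing loop so the VM can't discard the work.
    static volatile long sink;

    // A small division-heavy method for the VM to optimize.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (acc + n) / i;
        }
        return acc;
    }

    // Runs the same workload in several batches and records how long each
    // batch took, in nanoseconds.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] elapsedNanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            long total = 0;
            for (int c = 0; c < callsPerBatch; c++) {
                total += work(1000);
            }
            elapsedNanos[b] = System.nanoTime() - start;
            sink = total;
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        long[] times = timeBatches(10, 2000);
        for (int b = 0; b < times.length; b++) {
            System.out.println("Batch " + b + ": " + (times[b] / 1000) + " us");
        }
    }
}
```

If the early batches take noticeably longer than the later ones, that gap is the training period you want the profiler to skip.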
The test harness lets you specify two parameters.
The first sets the number of iterations to be used for
training the VM. The second sets the number of iterations to
be used for measuring it, so you can view its performance in
CodeAnalyst. As you run the sample, take note of the amount of
time it takes to run both sets of iterations. When you set up
your profile, you'll be able to specify how long to wait
before CodeAnalyst starts watching. This way, you can allow
the VM to reach steady state without any interference, and
then get a more accurate assessment of your code's
performance.
Use something like the following to run the code:
java DevX_CA_Example 4000 40000
Your results will look similar to the following, but probably with
wildly different numbers:
2891ms Finished training
30594ms Finished final run
Result was 0
Setting Up CodeAnalyst
Before setting up the CodeAnalyst project, you'll need to create a
batch file that it can use to run the app. This one-line batch
file uses the same command that you just used to test the app, but
with an additional option that depends on your JDK version and
VM:
- If your JDK is 1.4.x or higher, use the PI agent and the
following switch:
-XrunCAJVMPIAxx
- If your JDK is 1.5.x or higher and you're using the BEA
or Sun VM, use the TI agent and the following switch:
-XrunCAJVMTIAxx
- If your JDK is 1.5.x or higher and you're using the IBM
VM, use the TI agent and the following switch:
-agentlib:CAJVMTIAxx
- For each of the above, the trailing "xx" should be either 32
or 64, depending on whether you're using a 32-bit or 64-bit
VM.
As an example, if you're using Sun's 1.5.0 JDK on a 64-bit
VM, your batch file would contain the following command:
java -XrunCAJVMTIA64 DevX_CA_Example 4000 40000
If you have trouble with this later, try exchanging "Xrun"
with "agentlib:" or vice versa, as some VM vendors are moving
toward the "agentlib" syntax. Save this command to a
Run_Example.bat file. This will be referenced in your project.
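The selection rules listed above can be condensed into a small helper, sketched here in Java. The class and method names are hypothetical, and the sketch assumes you use the PI agent only on a 1.4.x JDK and prefer the TI agent on 1.5.x or later. The switch strings themselves are the ones given in the list above.

```java
public class AgentSwitchSketch {

    // jdkMinor: 4 for JDK 1.4.x, 5 for JDK 1.5.x, and so on.
    // vmVendor: "Sun", "BEA", or "IBM".
    // bits:     32 or 64, matching the VM.
    static String agentSwitch(int jdkMinor, String vmVendor, int bits) {
        if (bits != 32 && bits != 64) {
            throw new IllegalArgumentException("bits must be 32 or 64");
        }
        if (jdkMinor < 4) {
            throw new IllegalArgumentException("no agent listed for JDKs before 1.4");
        }
        if (jdkMinor == 4) {
            return "-XrunCAJVMPIA" + bits;          // PI agent
        }
        if ("IBM".equalsIgnoreCase(vmVendor)) {
            return "-agentlib:CAJVMTIA" + bits;     // TI agent, IBM VM
        }
        return "-XrunCAJVMTIA" + bits;              // TI agent, BEA or Sun VM
    }

    public static void main(String[] args) {
        // Prints: java -XrunCAJVMTIA64 DevX_CA_Example 4000 40000
        System.out.println("java " + agentSwitch(5, "Sun", 64)
                + " DevX_CA_Example 4000 40000");
    }
}
```

The printed line is what you would paste into the batch file for that combination of JDK, vendor, and bitness.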
If you haven't already, install
CodeAnalyst and poke around a bit. When you create a project, you'll see several
profiling options, which are described in more detail in "Optimizing for Multi-Core with AMD CodeAnalyst." For this test
app, you'll set up a timer-based profile.
- Create a New Project
- Set Session Name to "DevX Java CodeAnalyst Example"
- For Project Directory, select the directory where you
want the CodeAnalyst profiling results to be saved.
- Working Directory should be the directory where your
byte code can be found.
- Launch App: navigate to the batch file you created,
Run_Example.bat. By default, the full path should be
included and it should be contained in quotes.
- Check "Stop on app exit?"
- Check "Duration of app?"
- When you ran a test of the app in the previous section,
you should have seen an execution time for the "training"
set of iterations for your particular machine. The app reports
that time in milliseconds, so convert it to seconds and enter
it as your "Start delay." This will ensure you're
profiling steady-state performance rather than the warm-up.
- Check "Include Java information?" This is the key
distinction for testing Java profiles in CodeAnalyst.
- Check either PI or TI, depending on the agent you're
using as determined above.
- Click OK. CodeAnalyst might ask if the app is 64-bit.
Obviously, if you're running on a 64-bit VM, click
Yes.
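Because the test run reports its training time in milliseconds while the "Start delay" field takes seconds, a small conversion is needed. The sketch below shows one way to do it, assuming the output line has the form shown earlier ("2891ms Finished training"). The class and method names are hypothetical, and rounding up to the next whole second is an assumption chosen so that profiling doesn't begin before training has finished.

```java
public class StartDelaySketch {

    // Parses a line such as "2891ms Finished training" and returns the
    // training time rounded up to whole seconds.
    static long startDelaySeconds(String trainingLine) {
        String trimmed = trainingLine.trim();
        int msIndex = trimmed.indexOf("ms");
        if (msIndex <= 0) {
            throw new IllegalArgumentException("expected a line like '2891ms Finished training'");
        }
        long millis = Long.parseLong(trimmed.substring(0, msIndex));
        return (millis + 999) / 1000;   // ceiling division
    }

    public static void main(String[] args) {
        // Prints: 3
        System.out.println(startDelaySeconds("2891ms Finished training"));
    }
}
```

Keep in mind that the app may run somewhat slower with the profiling agent attached, so you may want to pad the delay slightly.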
The profile is ready to run. At this point, just hit the
Start button on the main menu or choose Sampling -> Start.
Analyzing the Results
If all goes correctly, you should see results like Figures 1 and 2.
The first thing you should see is a module view in the
System Data tab (Figure 1).
This lists all the binaries that were encountered while
profiling. At the top, you should see "DevX_CA_Example 4000
40000," which in this example accounts for 28437 total samples.
If you double-click the process, you'll see a list of
methods encountered by the profiler for that binary (Figure 2).
In this case, it only encountered
critical_section(). This shows you where the bulk
of the processing, or in this case, all of it, took place.
Double-click again and you'll get more information than you
probably want or need at this point. If asked for the Java
source, cancel. The ability to profile ticks by source code
will be beefed up in coming versions of the tool. But for now,
you can at least take a look at the assembler for it.
Where to Go from Here
This gives you a simple demonstration of a timer-based profile.
Event-based profiling, also extremely helpful, will be covered
in future walkthroughs.
Using AMD CodeAnalyst, you have several options for gauging
the performance of your Java app. For additional information
on CodeAnalyst in general, check out the following resources:
If you want to learn more about using CodeAnalyst to
track the performance of multi-threaded apps, here are some
additional resources: