Java Performance in 64bit land
If you were buying a new car and your primary goal was performance, or more specifically raw power – given the choice between a 4 cylinder and a 8 cylinder engine, the choice is obvious. Bigger is better. Generally when we look at computers the same applies, or at least that is how the products are marketed. Thus a 64bit system should out-perform a 32bit system, in the same way that a quad core system should be faster than a dual core.
Of course, what a lot of the world is only starting to understand is that more isn’t always better when it comes to computers. When dealing with multiple CPUs, you’ve got to find something useful for those extra processing units to do. Sometimes your workload is fundamentally single-threaded and you have to let all those other cores sit idle.
The 32bit vs. 64bit distinction is a bit more subtle. The x86-64 architecture adds not only bigger registers to the x86 architecture, but more registers. Generally this translates to better performance in benchmarks (as having more registers allows the compilers to create better machine code). Unfortunately until recently, moving from a 32bit java to a 64bit java mean taking a performance hit.
When we go looking at java performance, there are really 2 areas of the runtime that matter: the JIT and the GC. The job of the JIT is to make the code that is running execute as fast as possible. The GC is designed to take as little time away from the executing of code as possible (while still managing memory). Thus java performance is all about making the JIT generate more optimal code (more registers helps), and reducing the time the GC has to use to mange memory (bigger pointers makes this harder).
J9 was originally designed for 32bit systems and this influenced some of the early decisions we made in the code base. Years earlier I had spent some time with a PowerPC system that ran in 64bit mode trying to get our Smalltalk VM running on it, and had reached the conclusion that the most straight forward solution was simply to make all of the data structures (objects) twice as big to handle the 64bit pointers. With J9 development (circa 2001), one of the first 64bit systems we got our hands on was a Dec Alpha so we applied the straight forward ‘fattening’ solution, allowing a common code base to support both 32bits and 64bits.
A 64bit CPU will have a wide data bus, but recall that this same 64bit CPU can run 32bit code as well and it still has the big wide data bus to move things around with. When we look at our 64bit solution of allowing the data to be twice as big, we’re actually at a disadvantage relative to 32bits on the same hardware. This isn’t a problem unique to J9, or even Java – all 64bit programs need to address this data expansion. It turns out that the dynamics of the java language just tend to make this a more acute problem as java programs tend to be all about creating, and manipulating objects (aka data structures).
The solution to this performance issue is to be smarter about the data structures. This is exactly what we did in the IBM Java6 JDK with the compressed references feature. We can play tricks (and not get caught) because the user (java programmer) doesn’t know the internal representation of the java objects.
The trade off is that by storing less information in the object, we limit the total amount of memory that can be used by the JVM. This is currently an acceptable solution, as computer memory sizes are nowhere near the full 64bit address range. We only use 32bits to store pointers, and take advantage of 8 byte aligned objects to get a few free bits [ pointer << 3 ]. Thus the IBM Java6 JDK using compressed references (-Xcompressedrefs) can address up to 32Gb of heap.
We’re not the only ones doing this trick, Oracle/BEA have the -XXcompressedRefs option and Sun has the -XX:+UseCompressedOops option. Of course, each of the vendors implementations are slightly different with different limitations and levels of support. Primarily you see these flags used in benchmarking, but as some of our customers are starting to run into heap size limitations on 32bit operating systems they are looking to move to 64bit systems (but would like to avoid giving up any performance).
There is a post on the websphere community blog that talks about the IBM JDK compressed references and has some pretty graphs showing the benefits. And Billy Newport gives a nice summary of why this feature is exciting.