FeatureUSENIX

 

java performance

mccluskey_glen

by Glen McCluskey

Glen McCluskey is a consultant with 15 years of experience and has focused on programming languages since 1988. He specializes in Java and C++ performance, testing, and technical documentation areas.

<glenm@glenmccl.com>


Since Java first came out a few years ago, debate and discussion about performance have continued, especially in terms of comparison to other languages. Java has the philosophy of "write once run anywhere," that is, the idea of compiled Java programs being portable. This portability is achieved by compiling Java programs into bytecodes, which are then interpreted at runtime. More recently, there has been an emphasis on use of just-in-time compilers, by which bytecodes are compiled at runtime into a more efficient native form.

It is interesting to look at the current state of Java performance. One way is to devise some simple benchmark programs. Such programs can provide useful information and also have some limitations.

Sorting

The first benchmark program to consider is one that sorts a vector of numbers, using an O(N**2) algorithm. This program doesn't have any dependencies like I/O or use of library functions, and is valuable in measuring the actual execution speed of Java bytecodes. A similar program can be written using C++, as a comparison. The Java program looks like this:

public class sort {
 public static void main(String args[]) {
  final int N = 25000;
  short vec[] = new short[N];
  for (int i = 0; i < N; i++)
   vec[i] = (short)(N - i);
  for (int i = 0; i < N - 1; i++) {
   for (int j = i + 1; j < N; j++) {
    if (vec[i] > vec[j]) {
     short t = vec[i];
     vec[i] = vec[j];
     vec[j] = t;
    }
   }
  }
 }
}

and its C++ counterpart:

const int N = 25000;
short vec[N];
int main()
{
 for (int i = 0; i < N; i++)
  vec[i] = N - i;
 for (int i = 0; i < N - 1; i++) {
  for (int j = i + 1; j < N; j++) {
   if (vec[i] > vec[j]) {
    short t = vec[i];
    vec[i] = vec[j];
    vec[j] = t;
   }
  }
 }
 return 0;
}

We will use three different compilers for this analysis:

  •  JDK 1.2 (Java 2) interpreter
  •  JDK 1.2 just-in-time compiler
  •  C++Builder 4 (Borland C++)

enabling compiler optimization switches in each case.

The running times of the above program are as follows:
interpreter326 seconds
just-in-time8
C++7

The just-in-time compiler represents a huge advantage over interpretation and is nearly as fast as C++.

But this example also illustrates one of the pitfalls of comparing the performance of two different languages. Suppose that I change the line in the Java sort that reads:

for (int i = 0; i < N; i++)

to:

for (int i = 0; i <= N; i++)

This off-by-one error will be caught when the Java program runs, but an equivalent error in the C++ program probably will not be. In other words, Java checks array subscripts at runtime, but C++ does not. The Java approach does more, but at some cost. Choosing which one is "right" depends a lot on your philosophy.

Copying Files

For the second example, consider copying files. The Java version is:

import java.io.*;
public class copy {
 public static void main(String args[])
 {
  if (args.length != 2) {
   System.err.println("error");
   System.exit(1);
  }
  try {
   FileInputStream fis =
    new FileInputStream(args[0]);
   FileOutputStream fos =
    new FileOutputStream(args[1]);
   byte buf[] = new byte[1024];
   int n;
   while ((n = fis.read(buf)) > 0)
    fos.write(buf, 0, n);
   fis.close();
   fos.close();
  }
  catch (IOException e) {
   System.err.println(e);
   System.exit(1);
  }
 }
}

and the C++ version:

#include <stdio.h>
int main(int argc, char* argv[])
{
 FILE *fpin, *fpout;
 if (argc != 3 || (fpin = fopen(argv[1], "rb")) == NULL ||
  (fpout = fopen(argv[2], "wb")) == NULL) {
   fprintf(stderr, "error\n");
   return 1;
 }
 int n;
 char buf[1024];
 while ((n = fread(buf, sizeof(char), sizeof(buf), fpin)) > 0)
  fwrite(buf, sizeof(char), n, fpout);
 fclose(fpin);
 fclose(fpout);
 return 0;
}

We could have used system calls (read/write) in the C++ version, but it seems fairer to use fread/fwrite, in that the Java I/O library, even at its most fundamental level, is at least one layer removed from actual system calls, and so the C++ version should be as well. Running times for copying large (125 MB) uncached files are:

interpreter38
just-in-time40
C++37

In other words, there's not really much of a difference between the approaches. We might expect this similarity, in that FileInputStream.read() and FileOutputStream.write() are native methods, meaning that they are implemented in terms of underlying functionality written in some other language, for example, C. In other words, low-level Java I/O immediately translates into calls to underlying libraries, somewhat like fread/fwrite calls being implemented in terms of read/write system calls.

Tabulating Word Frequencies

The final example is one that uses higher-level library routines to tabulate word frequencies. Input is a file containing a list of words, one per line, and the output is a list of words and their frequencies. In Java, one way of implementing this is:

import java.io.*;
import java.util.*;
public class count {
 public static void main(String args[])
 {
  if (args.length != 1) {
   System.err.println("error");
   System.exit(1);
  }
  HashMap hm = new HashMap();
  class Rec {int cnt;}
  try {
   FileReader fr = new FileReader(args[0]);
   BufferedReader br = new BufferedReader(fr);
   String s = null;
   while ((s = br.readLine()) != null) {
    Rec r = (Rec)hm.get(s);
    if (r == null) {
     r = new Rec();
     hm.put(s, r);
    }
    r.cnt++;
   }
   br.close();
  }
  catch (Throwable e) {
   System.err.println(e);
   System.exit(1);
  }
  Iterator iter = hm.entrySet().iterator();
  while (iter.hasNext()) {
   Map.Entry e = (Map.Entry)iter.next();
   System.out.println(((Rec)e.getValue()).cnt +
    " " + e.getKey());
  }
 }
}

and in C++, we would say:

#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(int argc, char* argv[])
{
 if (argc != 2) {
  cerr << "error" << endl;
  return 1;
 }
 ifstream infile(argv[1]);
 if (!infile) {
  cerr << "error" << endl;
  return 1;
 }
 typedef map<string,int> MAP;
 MAP hm;
 string s;
 while (getline(infile, s))
  hm[s]++;
 for (MAP::const_iterator iter = hm.begin();
  iter != hm.end(); iter++)
   cout << iter->second << " " << iter->first << endl;
 return 0;
}

In both cases, we use hash tables to accumulate the words and their frequencies. For a list of around 900K words, running times are:

interpreter86
just-in-time27
C++25

When we compare languages with a benchmark such as this one, a more complex benchmark that uses library features, it becomes harder to analyze and draw conclusions. For example, each program uses a hash-table data structure provided in the standard libraries. There are tradeoffs to be made in designing such structures, such as whether you're trying to optimize for speed or for space. If you're optimizing for speed, you might be more aggressive in growing the table size when the table starts to get full.

One way of interpreting the relative times in this case is to look at this application as one that (1) uses low-level I/O, with similar performance across the three compilers, (2) uses high-level I/O (reading lines), which is affected by the issue of interpreter versus just-in-time versus native-code costs, and (3) uses hash-table data structures, with tradeoffs in representation and speed versus space.

Other Performance Areas

There are other performance areas we've not really looked at. For example, an important one in some contexts is invocation time. Using the above compilers, it takes about 15 milliseconds to invoke a null C++ program, and 325 for a null Java one. Another area that might be looked at is GUI performance.

Conclusion

The whole area of Java performance is changing rapidly. For example, Sun recently introduced its HotSpot compiler, and several vendors are working on native-code compilation for Java. Whether Java will ever be as fast or faster than languages like C or C++ is hard to say; it might be better to ask "Faster at what?" or ask whether taking a slight performance hit in return for added functionality (such as array subscript checking) is a worthwhile tradeoff to make. We already make such tradeoffs in other areas today, for example in using high-level languages in preference to assembly language.

 

?Need help? Use our Contacts page.
Last changed: 18 Nov. 1999 mc
Issue index
;login: index
USENIX home