NUMACROS: Data Parallel Programming on NUMA Multiprocessors

Hui Li and Kenneth C. Sevcik
Computer Systems Research Institute
University of Toronto
Toronto, CANADA


Data parallel programming has been widely used to develop scientific applications on various types of parallel machines: SIMD machines, MIMD distributed memory machines, and UMA shared memory machines. On NUMA shared memory machines, data locality is the key to good performance of parallel applications. In this paper, we propose a set of macros (NUMACROS) for data parallel programming on NUMA machines. NUMACROS aims to achieve both ease of programming and data locality for performance. Programs written using NUMACROS are nearly as short and as readable as their sequential versions. To obtain data locality, data are distributed and loops are partitioned among the processors in a coordinated fashion. Although a global address space facilitates data distribution on NUMA systems, a naive implementation of an application will suffer from high costs. To reduce these costs, a number of approaches are proposed and evaluated, including index precomputing, index checking, and loop transformation, among others. Our experimental results on the Hector multiprocessor show that these approaches are effective. While such facilities will be provided by compilers in the long run, NUMACROS is a helpful interim step.
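
As an illustration only, the coordinated partitioning of data and loops described above might be sketched in C as follows. The names block_t, block_range, and scale_owned are hypothetical and are not taken from NUMACROS; the sketch assumes a simple block distribution in which each processor owns one contiguous slice of an array and iterates only over that slice, with the slice bounds computed once before the loop rather than tested on every iteration.

```c
#include <stddef.h>

/* Hypothetical type (not part of NUMACROS): a half-open index range [lo, hi). */
typedef struct {
    size_t lo;
    size_t hi;
} block_t;

/* Hypothetical helper: the block of an n-element index space owned by
   processor `id` out of `nprocs` (nprocs must be greater than zero).
   The first n % nprocs processors receive one extra element. */
static block_t block_range(size_t n, size_t nprocs, size_t id)
{
    size_t base = n / nprocs;   /* minimum block size */
    size_t extra = n % nprocs;  /* number of blocks that get one more element */
    block_t b;

    b.lo = id * base + (id < extra ? id : extra);
    b.hi = b.lo + base + (id < extra ? 1 : 0);
    return b;
}

/* Hypothetical loop body: processor `id` scales only the elements it owns.
   The bounds are computed once, outside the loop, so the loop itself
   contains no per-iteration ownership test. */
static void scale_owned(double *a, size_t n, size_t nprocs, size_t id,
                        double factor)
{
    block_t b = block_range(n, nprocs, id);

    for (size_t i = b.lo; i < b.hi; ++i)
        a[i] *= factor;
}
```

If each processor calls scale_owned with its own id, every element is updated exactly once, and, provided the array is placed in memory according to the same block ranges, each processor touches only data that is local to it.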

