
Application Performance

We ran the seven applications on our cluster of eight nodes. On each node, the application consists of two threads: a communication thread that handles incoming messages and an application thread that performs the computation. We first present the performance results for the problem sizes listed in Table 1 and then analyze the performance in detail.
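The two-thread structure on each node can be pictured with the following minimal sketch. It is only an illustration under stated assumptions, not the implementation used in this study: the function `run_node`, the message names, and the stand-in computation (squaring numbers) are all hypothetical, and a Python queue stands in for the network.

```python
import queue
import threading


def run_node(incoming_messages, work_items):
    """Simulate one node: a communication thread plus an application thread.

    All names here are hypothetical; a queue stands in for the network.
    """
    inbox = queue.Queue()
    handled = []   # messages processed by the communication thread
    results = []   # values produced by the application thread

    def communication_thread():
        # Block on the inbox and handle each message until the sentinel arrives.
        while True:
            msg = inbox.get()
            if msg is None:
                break
            handled.append(msg)

    def application_thread():
        # Stand-in computation: square each work item.
        for item in work_items:
            results.append(item * item)

    comm = threading.Thread(target=communication_thread)
    app = threading.Thread(target=application_thread)
    comm.start()
    app.start()

    # Deliver the "incoming" messages, then a sentinel to stop the handler.
    for msg in incoming_messages:
        inbox.put(msg)
    inbox.put(None)

    app.join()
    comm.join()
    return handled, results
```

Calling `run_node(["page_request", "diff"], [1, 2, 3])` handles the two messages in arrival order while the computation proceeds independently in the other thread.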

Table 5 shows the speedups for the seven SPLASH-2 applications we used. LU and Ocean achieved speedups of 7.4 and 7.7, respectively, followed by Water-Spatial (6.7), Barnes (6.3) and Water-Nsquared (6.2). FFT comes next at 5.8, and Radix has the lowest speedup of the seven at 4.3.


Table 5: Speedups on 8 nodes

Application       Speedup (8 nodes)
Barnes            6.3
FFT               5.8
LU                7.4
Ocean             7.7
Radix             4.3
Water-Nsquared    6.2
Water-Spatial     6.7
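To put the speedups in Table 5 in perspective, they can be converted to parallel efficiency, taken here as speedup divided by the number of nodes. The sketch below is an illustrative calculation rather than part of the original evaluation; the names `SPEEDUPS`, `parallel_efficiency` and `ranked_by_speedup` are hypothetical.

```python
NODES = 8

# Speedups copied from Table 5.
SPEEDUPS = {
    "Barnes": 6.3,
    "FFT": 5.8,
    "LU": 7.4,
    "Ocean": 7.7,
    "Radix": 4.3,
    "Water-Nsquared": 6.2,
    "Water-Spatial": 6.7,
}


def parallel_efficiency(speedup, nodes=NODES):
    """Fraction of ideal (linear) speedup that was achieved."""
    return speedup / nodes


def ranked_by_speedup(speedups):
    """Application names ordered from highest to lowest speedup."""
    return sorted(speedups, key=speedups.get, reverse=True)


if __name__ == "__main__":
    for app in ranked_by_speedup(SPEEDUPS):
        eff = parallel_efficiency(SPEEDUPS[app])
        print(f"{app:15s} speedup={SPEEDUPS[app]:.1f} efficiency={eff:.0%}")
```

By this measure Ocean runs at roughly 96% efficiency on eight nodes, while Radix reaches roughly 54%.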


Figure 2: Normalized execution time breakdown on 8 nodes (image: fig2.eps)

For the purpose of this study, we classify the applications according to their data access patterns and synchronization behavior. An application is single-writer or multiple-writer, depending on the number of concurrent writers to the same coherence unit (a page). The communication-to-computation ratio is determined by the granularity of data access. Fine-grain access can introduce fragmentation and/or false sharing, which increases the communication-to-computation ratio. Since all coherence events in the LRC protocols happen at synchronization points, the frequency of synchronization plays an important role in performance. The average computation time between two consecutive synchronization events is a good measure of the frequency of synchronization.
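The two classification criteria above can be sketched in a few lines. This is a hypothetical illustration, not the instrumentation used in the study: the trace format (pairs of node id and byte address), the 4096-byte page size, and the function names are all assumptions, and the gap between synchronization timestamps is used only as a proxy for the computation time between them.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes; not stated in the text


def writers_per_page(writes, page_size=PAGE_SIZE):
    """Map each page number to the set of nodes that wrote to it.

    `writes` is a hypothetical trace of (node_id, byte_address) pairs
    collected within one synchronization interval.
    """
    pages = defaultdict(set)
    for node, address in writes:
        pages[address // page_size].add(node)
    return dict(pages)


def classify(writes, page_size=PAGE_SIZE):
    """Return 'multiple-writer' if any page has more than one concurrent writer."""
    pages = writers_per_page(writes, page_size)
    if any(len(nodes) > 1 for nodes in pages.values()):
        return "multiple-writer"
    return "single-writer"


def mean_time_between_syncs(sync_times):
    """Average gap between consecutive synchronization timestamps."""
    if len(sync_times) < 2:
        raise ValueError("need at least two synchronization events")
    gaps = [later - earlier for earlier, later in zip(sync_times, sync_times[1:])]
    return sum(gaps) / len(gaps)
```

Two nodes writing to addresses 0 and 64 touch different data but the same page, so `classify([(0, 0), (1, 64)])` reports a multiple-writer pattern; this is the false-sharing case that fine-grain access can produce.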

LU and Ocean are single-writer applications with coarse-grain access. These applications exhibit good spatial locality with only one writer per shared page and hence achieve good speedups. FFT is a single-writer application with fine-grain access. The mismatch between its access granularity and the communication granularity prevents it from achieving a better speedup. Applications like Barnes-Spatial and Water-Spatial are multiple-writer with fine-grain access and coarse-grain synchronization. The long average time between synchronization events helps these applications perform well, as do the relaxed consistency model and the multiple-writer support of HLRC. Water-Nsquared and Radix are multiple-writer applications with coarse-grain access. In Water-Nsquared, each process successively updates a large number of contiguous molecules, so the access pattern is preserved at the page level; the resulting coarse-grain access pattern is well suited to page-based coherence. Radix, however, does not achieve a good speedup because it spends a large amount of time waiting at the barrier, which is caused by an imbalance.


Murali Rangarajan 2000-08-09