Jason Cong, Peng Wei, and Cody Hao Yu, University of California, Los Angeles
FPGAs (field-programmable gate arrays) can be flexibly reconfigured to accelerate many computation kernels with orders-of-magnitude improvement in performance per watt, making FPGA-based heterogeneous systems a promising approach to driving continuous performance and energy improvement in today's datacenters. However, the significant gains on computation kernels are often offset by the extra data transfer overhead, resulting in considerably reduced system-wide speedup, or even slowdown. In this paper, we propose a fully pipelined data transfer stack that achieves efficient JVM-FPGA communication through extensive pipelining. We also introduce a programming framework that automatically generates most of the pipeline code, freeing users from the details of FPGA management. Furthermore, we address multi-stage pipeline throughput optimization by formulating it as an integer linear programming problem and using its solution to generate the optimal pipeline implementation. Experiments show that the proposed pipeline stack achieves a 4.9x speedup for various computation kernels.
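To illustrate why pipelining the transfer path matters, the following minimal sketch compares the total time for moving a stream of batches through a multi-stage path with and without stage overlap. It is an idealized model, not the implementation from the paper: the function names, the number of stages, and the per-stage times are all hypothetical and chosen only for illustration. The model assumes every batch takes the same time in a given stage and that each stage handles one batch at a time.

```python
def sequential_time(stage_times, num_batches):
    """Total time when each batch completes every stage before the next batch starts."""
    return num_batches * sum(stage_times)


def pipelined_time(stage_times, num_batches):
    """Total time when stages overlap across batches (idealized pipeline).

    The first batch pays the full latency of all stages; every later batch
    completes one bottleneck-stage interval after the previous one.
    """
    if num_batches == 0:
        return 0
    return sum(stage_times) + (num_batches - 1) * max(stage_times)


# Hypothetical per-batch stage times in milliseconds (not measured values).
stage_times = [2, 3, 4, 3]
num_batches = 100

seq = sequential_time(stage_times, num_batches)   # 100 * 12 = 1200
pipe = pipelined_time(stage_times, num_batches)   # 12 + 99 * 4 = 408
print(f"sequential: {seq} ms, pipelined: {pipe} ms")
```

In this model the steady-state throughput is set by the slowest stage alone, which is why balancing stage times is the natural target of a throughput optimization such as the integer linear programming formulation mentioned above.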