Skip to main content
Back to USENIX
  • Conferences
  • Students
Sign in
  • Home
  • Attend
    • Registration Information
    • Registration Discounts
    • Venue, Hotel, and Travel
    • Students and Grants
    • Co-Located Workshops
  • Program
  • Participate
    • Call for Papers
    • Instructions for Participants
  • Sponsorship
  • About
    • Workshop Organizers
    • Services
    • Questions
    • Help Promote
    • Past Workshops
  • Home
  • Attend
  • Program
  • Activities
  • Participate
  • Sponsorship
  • About

sponsors

Gold Sponsor
Gold Sponsor
Gold Sponsor
Gold Sponsor
Silver Sponsor
Bronze Sponsor
Bronze Sponsor
Bronze Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Industry Partner
Industry Partner

help promote

USENIX ATC '15 button

Get more
Help Promote graphics!

USENIX Conference Policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

Shoal: Smart Allocation and Replication of Memory For Parallel Programs

Stefan Kaestle, Reto Achermann, and Timothy Roscoe, ETH Zürich; Tim Harris, Oracle Labs, Cambridge

Modern NUMA multi-core machines exhibit complex latency and throughput characteristics, making it hard to allocate memory optimally for a given program’s access patterns. However, sub-optimal allocation can significantly impact performance of parallel programs.

We present an array abstraction that allows data placement to be automatically inferred from program analysis, and implement the abstraction in Shoal, a runtime library for parallel programs on NUMA machines. In Shoal, arrays can be automatically replicated, distributed, or partitioned across NUMA domains based on annotating memory allocation statements to indicate access patterns. We further show how such annotations can be automatically provided by compilers for high-level domain-specific languages (for example, the Green-Marl graph language). Finally, we show how Shoal can exploit additional hardware such as programmable DMA copy engines to further improve parallel program performance.

We demonstrate significant performance benefits from automatically selecting a good array implementation based on memory access patterns and machine characteristics. We present two case-studies: (i) Green-Marl, a graph analytics workload using automatically annotated code based on information extracted from the high-level program and (ii) a manually-annotated version of the PARSEC Streamcluster benchmark.

Stefan Kaestle, ETH Zürich

Reto Achermann, ETH Zürich

Timothy Roscoe, ETH Zürich

Tim Harris, Oracle Labs, Cambridge

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {190507,
author = {Stefan Kaestle and Reto Achermann and Timothy Roscoe and Tim Harris},
title = {Shoal: Smart Allocation and Replication of Memory For Parallel Programs},
booktitle = {2015 USENIX Annual Technical Conference (USENIX ATC 15)},
year = {2015},
isbn = {978-1-931971-225},
address = {Santa Clara, CA},
pages = {263--276},
url = {https://www.usenix.org/conference/atc15/technical-session/presentation/kaestle},
publisher = {USENIX Association},
month = jul
}
Download
Kaestle PDF
View the slides

Presentation Video 

Presentation Audio

MP3 Download

Download Audio

  • Log in or register to post comments

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Media Sponsors & Industry Partners

Open Access Publishing Partners

© USENIX
EIN 13-3055038

  • Privacy Policy
  • Contact Us