Skip to main content
USENIX
  • Conferences
  • Students
Sign in
  • Home
  • Attend
    • Registration Information
    • Registration Discounts
    • Venue, Hotel, and Travel
    • Students and Grants
    • Co-located Events
      • HotCloud '15
      • HotStorage '15
  • Program
    • At a Glance
    • Technical Sessions
  • Activities
    • Birds-of-a-Feather Sessions
    • Poster Session
  • Participate
    • Call for Papers
    • Call for Practitioner Talks
    • Instructions for Participants
  • Sponsorship
  • About
    • Conference Organizers
    • Questions
    • Services
    • Help Promote
    • Past Conferences
  • Home
  • Attend
  • Program
  • Activities
  • Participate
  • Sponsorship
  • About

sponsors

Gold Sponsor
Gold Sponsor
Gold Sponsor
Gold Sponsor
Silver Sponsor
Bronze Sponsor
Bronze Sponsor
Bronze Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Industry Partner
Industry Partner

help promote

USENIX ATC '15 button

Get more
Help Promote graphics!

connect with us


  •  Twitter
  •  Facebook
  •  LinkedIn
  •  Google+
  •  YouTube

twitter

Tweets by @usenix

usenix conference policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

You are here

Home » Callisto-RTS: Fine-Grain Parallel Loops
Tweet

connect with us

Callisto-RTS: Fine-Grain Parallel Loops

Authors: 

Tim Harris, Oracle Labs; Stefan Kaestle, ETH Zürich

Abstract: 

We introduce Callisto-RTS, a parallel runtime system designed for multi-socket shared-memory machines. It supports very fine-grained scheduling of parallel loops—down to batches of work of around 1K cycles. Fine-grained scheduling helps avoid load imbalance while reducing the need for tuning workloads to particular machines or inputs. We use per-core iteration counts to distribute work initially, and a new asynchronous request combining technique for when threads require more work. We present results using graph analytics algorithms on a 2-socket Intel 64 machine (32 h/w contexts), and on an 8-socket SPARC machine (1024 h/w contexts). In addition to reducing the need for tuning, on the SPARC machines we improve absolute performance by up to 39% (compared with OpenMP). On both architectures Callisto-RTS provides improved scaling and performance compared with a state-of-the-art parallel runtime system (Galois).

Tim Harris, Oracle Labs, Cambridge

Stefan Kaestle, ETH Zürich

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {191562,
author = {Tim Harris and Stefan Kaestle},
title = {{Callisto-RTS}: {Fine-Grain} Parallel Loops},
booktitle = {2015 USENIX Annual Technical Conference (USENIX ATC 15)},
year = {2015},
isbn = {978-1-931971-225},
address = {Santa Clara, CA},
pages = {45--56},
url = {https://www.usenix.org/conference/atc15/technical-session/presentation/harris},
publisher = {USENIX Association},
month = jul,
}
Download
Harris PDF
View the slides

Presentation Video

Presentation Audio

MP3 Download OGG Download

Download Audio

  • Log in or    Register to post comments

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Media Sponsors & Industry Partners

Open Access Publishing Partner

© USENIX

  • Privacy Policy
  • Contact Us