NSF's PRObE CFP: 1000 Nodes for Systems Research Experiments

In our June issue of USENIX ;login: magazine, Garth Gibson, Gary Grider, Andree Jacobson, and Wyatt Lloyd contributed an article called PRObE: A Thousand-Node Experimental Cluster for Computer Systems Research. Recently we heard back from Garth Gibson with the following Call for Proposals:

NSF's PRObE operates four clusters to support systems research at scale. The largest is Kodiak, a 1000-node cluster (two-core x86, 8GB DRAM, two 1TB disks, 1GE, and 8Gbps IB per node) donated by Los Alamos National Laboratory.

Kodiak currently hosts researchers from Georgia Tech, Carnegie Mellon, and Los Alamos. Princeton researchers have published results from Kodiak at USENIX NSDI '13 (watch the open access conference video: Stronger Semantics for Low-Latency Geo-Replicated Storage). Researchers from U Central Florida, UT Austin, Georgia Tech, and Carnegie Mellon are working on the PRObE staging clusters.

PRObE resources are intended for (infrastructure) systems researchers committed to public release of their research results, typically publishing in distributed systems (e.g., USENIX OSDI or SOSP), cloud computing (e.g., SOCC), supercomputing (e.g., SC or HPDC), storage (e.g., USENIX FAST), or networking (e.g., USENIX NSDI).

PRObE resources are managed by Emulab, a cluster manager for allocating physical nodes that has been in use for systems research for more than a decade (Brian White et al., An Experimental Environment for Distributed Systems and Networks, OSDI '02). Users start by porting and demonstrating their code on a 100-node staging cluster such as Denali, which is built from the same Los Alamos equipment donation. With demonstrated success on a staging cluster and a compelling research goal, users can request Kodiak and have it allocated, possibly exclusively, for hours or days.

To start using PRObE resources:

  • Visit nmc-probe.org to learn about the resources
  • Visit portal.nmc-probe.org to request a PRObE-specific Emulab account
  • Have a research leader or faculty member get an account and define a project on portal.nmc-probe.org
  • Use Portal to get onto Denali and allocate a single-node experiment, log into that node to customize and resave the OS image for your project, then launch a multi-node experiment to demonstrate your system at a scale of fewer than 100 nodes
  • Use https://www.nmc-probe.org/request/ to request a large allocation on Kodiak (this is a HotCRP paper review web site, where your paper is a short justification for your research, your preparedness for using Kodiak, and your credentials and appropriateness for using NSF resources)
  • PRObE managers will then review, approve, and schedule your use of large allocations of Kodiak time

In a matter of weeks another style of large PRObE resource will come online. Susitna has 34 nodes, each with 64 x86 cores, for a total of more than 2000 x86 cores. Susitna also has NVIDIA-donated K20 GPU coprocessors with 2496 CUDA cores each, for a total of 84,864 CUDA cores. With 128GB DRAM, a hard disk, and an SSD each, Susitna nodes are interconnected by 40Gbps Ethernet, 40Gbps InfiniBand, and 1Gbps Ethernet.
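
The Susitna totals above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the constant and function names are made up for this example, and the one-GPU-per-node figure is an assumption inferred from the quoted totals (84,864 / 2496 = 34), not something the announcement states directly.

```python
# Per-node figures quoted in the announcement.
NODES = 34
X86_CORES_PER_NODE = 64
CUDA_CORES_PER_GPU = 2496
DRAM_GB_PER_NODE = 128

# Assumption: one K20 per node, inferred from 84,864 / 2496 = 34.
GPUS_PER_NODE = 1


def susitna_totals(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Return cluster-wide totals derived from the per-node figures."""
    return {
        "x86_cores": nodes * X86_CORES_PER_NODE,
        "cuda_cores": nodes * gpus_per_node * CUDA_CORES_PER_GPU,
        "dram_gb": nodes * DRAM_GB_PER_NODE,
    }


totals = susitna_totals()
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the x86 core count comes to 2,176, consistent with "more than 2000", and the CUDA core count matches the 84,864 figure given above.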

NSF PRObE resources will be available for at least the next two years.

All users of PRObE resources are obligated to publish their results, either in conferences or on their web sites, and to acknowledge the NSF PRObE resources used in these publications.