
Servers in industry are commonly NUMA multicore machines. On such machines, the conventional wisdom is to place threads close to the memory they access and to collocate threads that share data on the same CPU node. However, this is often not optimal: modern NUMA machines have asymmetric interconnect links between CPU nodes, and this asymmetry can strongly affect performance, with the best placement of threads on nodes outperforming the worst by a factor of almost two. We present the AsymSched algorithm, which uses CPU performance counters to measure performance and dynamically migrates threads and memory to achieve the best placement.
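To make the effect of link asymmetry concrete, here is a toy sketch in Python. It is not the AsymSched algorithm: it ranks candidate node subsets by the sum of a static, made-up link bandwidth table, whereas AsymSched relies on CPU performance counters measured at run time. The matrix `BW` and the helper names `total_bandwidth` and `rank_placements` are assumptions introduced only for illustration.

```python
from itertools import combinations

def total_bandwidth(nodes, bw):
    """Sum the link bandwidth over every unordered pair of nodes in the subset."""
    return sum(bw[a][b] for a, b in combinations(nodes, 2))

def rank_placements(bw, k):
    """Return all k-node subsets, highest aggregate bandwidth first."""
    subsets = combinations(range(len(bw)), k)
    return sorted(subsets, key=lambda s: total_bandwidth(s, bw), reverse=True)

# Hypothetical symmetric bandwidth table for a 4-node machine (arbitrary units).
# Entry BW[i][j] is the assumed bandwidth of the link between nodes i and j.
BW = [
    [0, 4, 2, 1],
    [4, 0, 1, 2],
    [2, 1, 0, 4],
    [1, 2, 4, 0],
]

ranked = rank_placements(BW, 2)
best, worst = ranked[0], ranked[-1]
print("best pair:", best, "bandwidth:", total_bandwidth(best, BW))
print("worst pair:", worst, "bandwidth:", total_bandwidth(worst, BW))
```

Even in this simplified model, two threads that share data see very different interconnect capacity depending on which pair of nodes hosts them, which is why "collocate on any nearby nodes" is not enough on an asymmetric machine. Exhaustive enumeration is only practical here because the machine is tiny.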
Article Section:
SYSTEMS
;login: issue: