DLion: Decentralized Distributed Deep Learning in Micro-Clouds


Rankyung Hong and Abhishek Chandra, University of Minnesota


Deep learning is a popular technique for building inference models and classifiers from large quantities of input data for applications in many domains. With the proliferation of edge devices such as sensor and mobile devices, large volumes of data are generated at rapid pace all over the world. Migrating large amounts of data into centralized data center(s) over WAN environments is often infeasible due to cost, performance or privacy reasons. Moreover, there is an increasing need for incremental or online deep learning over newly generated data in real-time. These trends require rethinking of the traditional training approach to deep learning. To handle the computation on distributed input data, micro-clouds, small-scale clouds deployed near edge devices in many different locations, provide an attractive alternative for data locality reasons. However, existing distributed deep learning systems do not support training in micro-clouds, due to the unique characteristics and challenges in this environment. In this paper, we examine the key challenges of deep learning in micro-clouds: computation and network resource heterogeneity at inter- and intra micro-cloud levels and their scale. We present DLion, a decentralized distributed deep learning system for such environments. It employs techniques specifically designed to address the above challenges to reduce training time, enhance model accuracy, and provide system scalability. We have implemented a prototype of DLion in TensorFlow and our preliminary experiments show promising results towards achieving accurate and efficient distributed deep learning in micro-clouds.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {234825,
author = {Rankyung Hong and Abhishek Chandra},
title = {DLion: Decentralized Distributed Deep Learning in Micro-Clouds},
booktitle = {11th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 19)},
year = {2019},
address = {Renton, WA},
url = {https://www.usenix.org/conference/hotcloud19/presentation/hong},
publisher = {{USENIX} Association},
month = jul,