Shadow Puppets: Cloud-level Accurate AI Inference at the Speed and Economy of Edge


Srikumar Venugopal, Michele Gazzetti, Yiannis Gkoufas, and Kostas Katrinis, IBM Research


Extracting value from insights on unstructured data on the Internet of Things and Humans is a major trend in capitalizing on digitization. To date, the design space for AI inference on the edge has been largely binary: either consuming cloud-based inference services through edge APIs or running full-fledged deep models on edge devices. In this paper, we break this duality by proposing the Semantic Cache, an approach that blends best-of-breed features of the two extremes of the current design space. Early evaluation results from a first prototype implementation of our semantic cache service on object classification tasks show a tremendous reduction in inference latency compared to cloud-only inference, and high potential for achieving adequate accuracy for a plurality of AI use cases.
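
The abstract does not spell out how the Semantic Cache works internally. As a rough illustration of the general idea only, the sketch below assumes one plausible reading: the edge keeps feature vectors of previously classified inputs together with the labels returned by the cloud, answers locally when a new input is similar enough to a cached one, and falls back to the cloud otherwise. Every name here (SemanticCacheSketch, cloud_classify, threshold) and the choice of cosine similarity are assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCacheSketch:
    """Hypothetical edge-side cache in front of a cloud classifier.

    Assumption: inputs arrive as feature vectors already extracted on the edge.
    """

    def __init__(self, cloud_classify: Callable[[Vector], str],
                 threshold: float = 0.9) -> None:
        self.cloud_classify = cloud_classify  # stand-in for a cloud inference call
        self.threshold = threshold            # minimum similarity for a cache hit
        self.entries: List[Tuple[Tuple[float, ...], str]] = []
        self.hits = 0
        self.misses = 0

    def classify(self, features: Vector) -> str:
        # Linear scan for the most similar cached feature vector.
        best_label: Optional[str] = None
        best_score = -1.0
        for cached_features, cached_label in self.entries:
            score = cosine_similarity(features, cached_features)
            if score > best_score:
                best_score, best_label = score, cached_label

        # Similar enough: answer from the edge without contacting the cloud.
        if best_label is not None and best_score >= self.threshold:
            self.hits += 1
            return best_label

        # Otherwise ask the cloud and remember its answer for later lookups.
        self.misses += 1
        label = self.cloud_classify(features)
        self.entries.append((tuple(features), label))
        return label
```

In this sketch the threshold governs the trade-off the abstract alludes to: a lower value serves more requests from the edge at the risk of returning a label that the cloud model would not have produced, while a higher value sends more requests to the cloud.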

@inproceedings {216787,
author = {Srikumar Venugopal and Michele Gazzetti and Yiannis Gkoufas and Kostas Katrinis},
title = {Shadow Puppets: Cloud-level Accurate {AI} Inference at the Speed and Economy of Edge},
booktitle = {{USENIX} Workshop on Hot Topics in Edge Computing (HotEdge 18)},
year = {2018},
address = {Boston, MA},
url = {},
publisher = {{USENIX} Association},