OpML '19 Conference Program

Papers and Proceedings

The full Proceedings published by USENIX for the conference are available for download below. Individual papers can also be downloaded from the presentation page. Copyright to the individual works is retained by the author[s].

Proceedings Front Matter
Proceedings Cover | Title Page and List of Organizers | Message from the Program Co-Chairs | Table of Contents

Full Proceedings PDFs
 OpML '19 Full Proceedings (PDF)
 OpML '19 Proceedings Interior (PDF, best for mobile devices)

Downloads for Registered Attendees

(Sign in to your USENIX account to download these files.)

Attendee Files 
OpML '19 Attendee List (PDF)
OpML '19 Web Archive

May 20, 2019

8:00 am–9:00 am

Continental Breakfast and Badge Pickup

Sponsored by IBM

9:00 am–9:15 am

Opening Remarks

Program Co-Chairs: Bharath Ramsundar, Computable, and Nisha Talagala, Pyxeda AI

9:15 am–10:00 am

Keynote Address

Santa Clara Ballroom

Ray: A Distributed Framework for Emerging AI Applications

Michael I. Jordan, University of California, Berkeley

The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this talk, we consider these requirements and present Ray—a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system’s control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications.

Michael I. Jordan, University of California, Berkeley

Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his master's degree in mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research interests bridge the computational, statistical, cognitive, and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines, and applications to problems in distributed computing systems, natural language processing, signal processing, and statistical genetics. He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015, and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAS (American Association for the Advancement of Science), AAAI (Association for the Advancement of Artificial Intelligence), ACM (Association for Computing Machinery), ASA (American Statistical Association), IEEE (Institute of Electrical and Electronics Engineers), IMS (Institute of Mathematical Statistics), ISBA (International Society for Bayesian Analysis), and SIAM (Society for Industrial and Applied Mathematics). His work has been cited over 125,000 times by scientists around the world, and in 2016 he was identified as the “most influential computer scientist” based on an analysis of the published literature by the Semantic Scholar project.

10:00 am–10:30 am

Break with Refreshments


10:30 am–12:30 pm

Presentations Track 1

Production Experiences and Learnings

Session Chair: Sean Grullon, GSK Healthcare

Winchester Room

Opportunities and Challenges Of Machine Learning Accelerators In Production

Rajagopal Ananthanarayanan, Peter Brandt, Manasi Joshi, and Maheswaran Sathiamoorthy, Google, Inc.

The rise of deep learning has resulted in tremendous demand for compute power, with the FLOPS required for leading machine learning (ML) research doubling roughly every 3.5 months since 2012. This increase in demand for compute has coincided with the end of Moore’s Law.

As a result, major industry players such as NVIDIA, Intel, and Google have invested in ML accelerators that are purpose-built for deep learning workloads.

ML accelerators present many opportunities and challenges in production environments. This paper discusses high-level observations from internal experience at Google.

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft

Minjia Zhang, Samyam Rajbandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, and Yuxiong He, Microsoft AI and Research

Deep learning models have significantly improved many Microsoft services and products. In this paper, we describe our experience and methodology in developing and applying the DeepCPU library for serving deep learning models in production at large scale, with remarkable latency improvements and infrastructure cost reductions. We describe two ways to use the library, through customized optimization or through framework integration, each targeting different scenarios.

A Distributed Machine Learning For Giant Hogweed Eradication

Naoto Umemori and Masaru Dobashi, NTT DATA

Giant Hogweed is a highly toxic plant originating in the Western Caucasus. It has spread across Central and Western Europe and there are sightings of Giant Hogweed reported from North America, too. Landowners are obliged to eradicate it, due to its toxicity and invasive nature in Europe. However, it is difficult for landowners to find and remove Giant Hogweed across large areas of land since it is a very cumbersome manual process.

Using drones and machine-learning-based image recognition/detection to automate the detection of Giant Hogweed is an effective way to address this problem. However, we had to solve the following issues:

  • How to estimate the habitat or geographical information of the Giant Hogweed from 4K aerial photographs.
  • A data utilization and image learning/inference infrastructure is necessary, since the volume of aerial photographs to handle reaches the terabyte class.
  • On the other hand, building a dedicated cluster for each process makes operations complicated.

In this session, we will show an architecture that integrates a drone, Apache Hadoop, Apache Spark, and TensorFlow to solve these issues, and walk through its processing method.

Naoto Umemori, NTT DATA

Naoto is a Senior Infrastructure Engineer and Deputy Manager at NTT DATA Corporation, working in the technology and innovation area. He has spent around 10 years in the platform and infrastructure field, focusing mainly on the open source software technology stack. Masaru is a senior IT infrastructure engineer/IT architect and manager at NTT DATA Corporation. He is responsible for the research and development of the data processing and analytics platform.

AIOps: Challenges and Experiences in Azure

Ze Li and Yingnong Dang, Microsoft Azure

While AI and machine learning are transforming the whole industry, AIOps is transforming the way cloud services are built and operated. The value of AIOps is broad and includes at least service quality assurance, customer experience at scale, continuous COGS reduction, and improved engineer productivity. Building AIOps solutions poses unique challenges compared to applying AI and ML in other domains. In this talk, we will share the challenges we encountered in building AIOps solutions in Azure and our experience in solving them. We will also share a few case studies, including (1) a disk failure prediction service that predicts disk health and proactively live-migrates workloads to healthy disks; (2) an end-to-end analytics service for safe deployment in large-scale system infrastructure, based on ensemble ranking and spatial/temporal algorithms using a lambda architecture; and (3) an anomaly detection and auto-diagnosis service that monitors essential telemetry in the cloud system.

Ze Li, Microsoft Azure

Dr. Ze Li is a data scientist in Microsoft Azure. He currently focuses on using data-driven and AI technologies to build and operate cloud services efficiently and effectively, for example through safe deployment in large-scale systems, intelligent anomaly detection, and pattern mining in cloud services. Previously, he worked as a data scientist/engineer at Capital One and MicroStrategy, where he provided data-driven solutions to improve efficiency in financial services and business intelligence services. He has published more than 40 peer-reviewed papers in the fields of data mining, distributed networks/systems, and mobile computing. He holds a Ph.D. in computer engineering from Clemson University.

Yingnong Dang, Microsoft Azure

Yingnong Dang is a Principal Data Scientist Manager in Microsoft Azure. Yingnong's focus is on building analytics and ML solutions for improving Azure infrastructure availability and capacity, boosting engineering productivity, and increasing customer satisfaction. Yingnong and his team have a close partnership with Microsoft Research and academia. Before joining Azure in December 2013, Yingnong was a researcher in the Microsoft Research Asia lab. His research areas include software analytics, data visualization, data mining, and human-computer interaction. As a researcher, he transferred various technologies to Microsoft product teams, including code clone analysis, crash dump analysis, and performance trace analysis. He holds 45+ U.S. patents and has published papers in top conferences including ICSE, FSE, VLDB, USENIX ATC, and NSDI.

Quasar: A High-Performance Scoring and Ranking Library

Andris Birkmanis and Fei Chen, LinkedIn

Quasar is a part of LinkedIn's machine learning platform, focusing on two aspects: feature transformation (including scoring) and ranking. It serves many LinkedIn products, including feed, job recommendation, notification recommendation, people-you-may-know, and search. It is one of the leading Java-based, in-production, Internet-scale machine learning ranking libraries in industry. In this talk, we will provide an overview of Quasar, highlight the technical challenges we faced and the solutions we adopted while building it, and present how we think about ranking.

Andris Birkmanis, LinkedIn

Andris Birkmanis is passionate about technology and knowledge, from biology to economics, from 3D graphics to machine learning. His journey started with hacking 8-bit computers, continued with becoming a professional software developer, and led him to building tools and platforms for machine learning.

Fei Chen, LinkedIn

Fei Chen is a Senior Engineering Manager in the LinkedIn Data Organization. She leads a machine learning engineering team that standardizes job- and location-related LinkedIn data. Before that, she led the ranking infrastructure team that built Quasar, the ubiquitous scoring and ranking library at LinkedIn. Prior to LinkedIn, she was a researcher at HP Labs, working on machine learning and data management. Her work has been published at top data management conferences, including CIDR, ICDE, SIGMOD, and VLDB.

AI from Labs to Production - Challenges and Learnings

Ramprakash Ramamoorthy, P Satya Madhuri, and Malini Christina Raj, Zoho Corporation

This talk tells the story of taking AI from research to deployment in the enterprise software world. It walks you through the personnel, product management, and technical challenges and learnings involved in retrofitting AI into commercial enterprise products. The talk also highlights why AI for the enterprise needs a different strategy than typical B2C AI, and touches upon the changes we made to convince the average enterprise user to come back and adopt the AI feature in their everyday workflow, thereby driving our ROI.

Ramprakash Ramamoorthy, Zoho Corporation

Ramprakash Ramamoorthy leads the AI efforts at Zoho Corporation. Ramprakash has been instrumental in setting up Zoho's AI platform from scratch and brings more than 8 years of experience in building AI for the enterprise at Zoho.

Malini Christina Raj, Zoho Corporation

Malini Christina Raj owns the AI stack for web security and log management at Zoho Corporation. She manages AI in Zoho's on-premises SIEM and endpoint management products.

Presentations Track 2

Production Experiences and Learnings

Session Chair: Boris Tvaroska, Lenovo

Stevens Creek Room

MLOp Lifecycle Scheme for Vision-based Inspection Process in Manufacturing

Junsung Lim, Hoejoo Lee, Youngmin Won, and Hunje Yeon, Samsung Research

Recent advances in machine learning and the proliferation of edge computing have enabled the manufacturing industry to integrate machine learning into its operations to boost productivity. In addition to building high-performing machine learning models, the stakeholders and infrastructure within the industry should be taken into account when building an operational lifecycle. In this paper, we propose a practical machine learning operation scheme for building the vision inspection process, motivated mainly by field experience in applying the system in large-scale corporate manufacturing plants. We evaluate our scheme on four defect inspection lines in production. The results show that deep neural network models outperform existing algorithms and that the scheme is easily extensible to other manufacturing processes.

Deep Learning Vector Search Service

Jeffrey Zhu and Mingqin Li, Microsoft

Over the last couple of years, search has evolved beyond simple keyword-based information retrieval to more complex scenarios, such as natural language queries, Question-and-Answer, and multimedia search. Deep learning models are used to encode user intent and context into vector representations, which are then searched against billions of other vectors to find the most relevant results.

Deep Learning Vector Search Service (DLVS) is a low latency, large scale, and highly efficient vector search system at Microsoft, primarily used within the Bing search engine. This talk will discuss the key innovations in approximate nearest neighbor (ANN) algorithm and distributed vector index serving platform necessary to achieve this scale and performance.
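To make the search problem concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search by cosine similarity, the baseline that ANN algorithms approximate. It is not DLVS or SPTAG; the function names and the toy three-dimensional vectors are hypothetical, and real systems use learned, high-dimensional embeddings.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float],
                index: Dict[str, Sequence[float]],
                k: int) -> List[Tuple[str, float]]:
    """Score every indexed vector against the query and return the k most similar.

    This costs O(N * d) per query; ANN algorithms trade a little recall
    to avoid scoring every one of the N vectors.
    """
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy "embeddings" standing in for encoded documents.
    toy_index = {
        "doc_a": [1.0, 0.0, 0.0],
        "doc_b": [0.9, 0.1, 0.0],
        "doc_c": [0.0, 1.0, 0.0],
    }
    print(exact_top_k([1.0, 0.05, 0.0], toy_index, k=2))
```

At billions of vectors, scoring every candidate per query is what makes exact search impractical, which is why a service at this scale relies on approximate methods.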

Jeffrey Zhu, Microsoft

Jeffrey Zhu is a program manager at Microsoft who drives the development of Bing's deep learning platform. This platform powers some of Bing's most innovative features, such as machine reading comprehension and visual search. It serves millions of deep learning model inferences per second and supports vector search over billions of vectors at low latency and high efficiency.

Mingqin Li, Microsoft

Mingqin Li is a software engineering manager at Microsoft who leads Bing's deep learning platform. The platform includes a low-latency, large-scale, and highly efficient deep learning vector search service developed for scenarios such as web search, similar-image search, and question answering. She is also one of the key contributors to the open source project SPTAG, which publishes the approximate nearest neighbor (ANN) algorithm used in vector search.

Signal Fabric—An AI-assisted Platform for Knowledge Discovery in Dynamic System

Souren Aghajanyan, Roman Batoukov, and Jian Zhang, Microsoft

Cloud service system management can be challenging and costly because of the volume of components to monitor, dynamic traffic patterns, and the velocity required to support mission-critical applications. In this talk, we'll introduce Signal Fabric, an AI-assisted system intelligence platform for knowledge discovery in fast-changing environments. It can build a dynamic ontology out of federated data sources. Signal Fabric uses a composable micro-agent framework to continuously observe and learn topological relationships from signals/metrics in the system. It then leverages the captured knowledge to identify the right subset of signals to facilitate decision making.

Signal Fabric has been used to monitor billions of Windows devices and is now part of the Azure observability platform.

Souren Aghajanyan, Microsoft

Souren Aghajanyan is a Group Engineering Manager in the Azure PIE (Production Infrastructure Engineering) team, working on the next-generation observability platform for Microsoft. Prior to that, he was a founding member of Asimov and Xpert, a telemetry system that monitors the Windows ecosystem from devices to services. He has worked on products ranging from Windows to Xbox, as well as on several interesting incubation efforts.

Roman Batoukov, Microsoft

Roman Batoukov is an AI/ML Engineering Manager in the Azure PIE (Production Infrastructure Engineering) team, working on the next generation of the Azure observability platform for Microsoft. His previous experience includes telemetry pipelines (Asimov – Windows and Xbox), internet-scale services, and AI-assisted monitoring solutions.

Jian Zhang, Microsoft

Jian Zhang is a program manager at Microsoft. She received her Ph.D. in Electrical & Computer Engineering and M.S. in Management and Finance from the University of Florida. She is experienced in turning advanced technologies from research into products and services such as Azure, Office Online, and Microsoft Store Online. She holds a Data Mining and Application Graduate Certificate from Stanford University and an IoT Business and Application Certificate from MIT, and is a certified PMP (Project Management Professional).

Relevance Debugging and Explaining at LinkedIn

Daniel Qiu and Yucheng Qian, LinkedIn

At LinkedIn, we provide value to our members by serving the information most relevant to them. Because of the complexity of the distributed system, it is hard for our AI engineers to understand how a machine learning model makes its predictions in the online system.

In this talk, we will cover how we provide the infrastructure to instrument our online relevance serving system to help AI engineers better understand their machine learning models and debug issues, and introduce two debugging tools that we provide for search and feed to visualize the relevance information.

Daniel Qiu, LinkedIn

Daniel Qiu is a software engineer at LinkedIn. He has been working on infrastructure and tools for debugging relevance machine learning models, to improve the productivity of AI engineers and deliver LinkedIn's value to our members. Before that, he was a computer science student at UCLA.

Yucheng Qian, LinkedIn

Yucheng Qian is a senior software engineer at LinkedIn working on improving productivity of AI engineers by providing effective debugging, explaining, and monitoring solutions. Previously, Yucheng has had years of experience developing both consumer-facing and professional-facing applications and led the effort to create the first machine learning platform at Jobcase Inc.

Shooting the moving target: machine learning in cybersecurity

Ankit Arun and Ignacio Arnaldo, PatternEx

We introduce a platform used to productionize machine learning models for detecting cyberthreats. To keep up with a diverse and ever-evolving threat landscape, it is of paramount importance to seamlessly iterate over the two pillars of machine learning: data and models. To satisfy this requirement, the platform is modular and extensible, and it automates the continuous improvement of the detection models. The platform has supported more than 1,000 successful model deployments across more than 30 production environments.

Deep Learning Inference Service at Microsoft

Jonathan Soifer, Jason Li, Mingqin Li, Jeffrey Zhu, Yingnan Li, Yuxiong He, Elton Zheng, Adi Oltean, Maya Mosyak, Chris Barnes, Thomas Liu, and Junhua Wang, Microsoft

This paper introduces the Deep Learning Inference Service, an online production service at Microsoft for ultra-low-latency deep neural network model inference. We present the system architecture and take a deep dive into core concepts such as intelligent model placement, heterogeneous resource management, resource isolation, and efficient routing. We also present production scale and performance numbers.

Tutorials Track

Lawrence/San Tomas/Lafayette Rooms

Using Amazon SageMaker to Operationalize Machine Learning

Kumar Venkateswar, Amazon

In this tutorial, we'll talk about how Amazon Web Services customers use Amazon SageMaker to accelerate time-to-production for their ML models. Amazon SageMaker is a fully managed service that covers the entire machine learning workflow: label and prepare your data, choose an algorithm, train it, tune and optimize it for deployment, make predictions, and take action.

There are multiple aspects to operationalizing ML, including security and standardized environments for data exploration and ingestion, scaling training from small datasets to large datasets, readying models for deployment, and monitoring/managing production deployments. We'll talk about how SageMaker helps with each of these stages.

Prerequisites: Understanding of the full ML workflow; otherwise none.

Kumar Venkateswar, Amazon

Kumar Venkateswar currently leads the Amazon SageMaker product management team. He came to Amazon to improve the machine learning platform by launching Amazon SageMaker, the Amazon Deep Learning AMI, and several features in Amazon Machine Learning that provided a better user experience for data scientists in AWS. He has over a decade of experience in product management for cloud services, including machine learning, search, storage, and high availability features in petabyte-scale environments. He is an alumnus of the University of Chicago Booth School of Business (General Management Program) and the University of Illinois at Urbana-Champaign (MS, BS Electrical Engineering).

Nvidia “NGC” Deep Learning Containers

Chetan Tekur and Fraser Gardiner, Nvidia

This tutorial will cover Nvidia’s “NGC” containers for deep learning, including: which deep learning frameworks and utilities are provided in Nvidia NGC containers; how to access and use these containers; which GPUs and cloud services can run Nvidia NGC containers; the sample and example code included in Nvidia NGC containers that implements deep learning models; and the latest features that simplify achieving optimum performance and support multi-node training.

Chetan Tekur, Nvidia

Chetan Tekur is a field applications engineer and solutions architect at Nvidia, where he focuses on CSP and networking customers. Chetan has 11+ years of experience in the hardware industry, focusing on pre-sales/post-sales support, technology evangelism, and customer management. He has a Master’s degree in Electrical Engineering from NC State University.

Fraser Gardiner, Nvidia

Fraser Gardiner is the Solutions Architecture Director supporting Nvidia’s major Cloud Service Provider partners. He has over 25 years of experience with Unix/Linux systems, supporting both mission-critical enterprise applications and industrial/hyperscale use cases. He has held senior technical or technical leadership roles at Apple, HP, and Oracle/Sun Microsystems.

12:30 pm–1:30 pm

Conference Luncheon

Magnolia Room
Sponsored by Parallel Machines

1:30 pm–2:30 pm

Presentations Track 1


Session Chair: Nisha Talagala, Pyxeda AI

Lawrence/San Tomas/Lafayette Rooms

How the Experts Do It: Production ML at Scale

Lead and moderator: Joel Young, LinkedIn

Panelists: Sandhya Ramu, Director, AI SRE, LinkedIn; Andrew Hoh, Product Manager, ML Infra and Applied ML, Airbnb; Aditya Kalro, Engineering Manager, AI Infra Services and Platform, Facebook; Faisal Siddiqi, Engineering Manager, Personalization Infrastructure, Netflix; Pranav Khaitan, Engineering Manager, Personalization and Dialog ML Infra, Google

Machine learning is driving virtually every major online service we use. In this panel, top experts from across the industry will discuss how they have learned to scale machine learning and use it to solve real-world problems. Come and learn strategies for managing the fast evolution of technologies; get insights into how deep learning is changing the serving game, from productionizing large models to using GPUs; and learn how these companies keep their incredibly complicated serving stacks operable 24x7. The panelists will also discuss the dimensions of scale they worry about, ranging from the raw queries per second flowing through their systems and the growing size and complexity of the models to the number of engineers building and fielding them. We’ll conclude with a discussion of how these experts measure success.

Presentations Track 2

Handling Heterogeneity, Distribution, and Scale

Session Chair: Swaminathan Sundararaman

Stevens Creek Room

Predictive Caching@Scale

Vaishnav Janardhan and Adit Bhardwaj, Akamai Technologies

The increasing amount of content and services available on the Internet has led to substantial growth in network traffic. A large distributed caching platform facilitates low-latency, high-throughput delivery of web and video content over the public Internet. However, edge infrastructure cannot grow at the rate of traffic while maintaining quality of service at low cost. With growing long-tail content footprints and performance-sensitive users, content-agnostic caching schemes fail to adapt to changing traffic popularity profiles, which leads to poor caching decisions. At Akamai, we built a high-performance, cost-sensitive, content-aware caching system that uses machine learning and runs on our distributed delivery platform. The ML-based caching algorithm we developed, Prediction Error-Correcting Caching (PeCC), is cost-competitive with a classical algorithm like LRU even when deployed on commodity hardware, while achieving cache-hit ratios close to theoretically optimal caching schemes. We will talk about the main challenges and details of deploying PeCC: first, an introduction to web traffic and how to build caching based on deep neural networks that scales cost-effectively; second, deploying very compute-intensive DNN models alongside a real-time web proxy with very tight performance guarantees. We will also discuss some key takeaways from deploying ML for system scalability.
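For reference, the classical LRU baseline that PeCC is compared against can be sketched in a few lines. This is a minimal illustration of plain LRU replay over a request trace, not Akamai's implementation; the class name, the `replay` helper, and the toy trace are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key: Hashable) -> bool:
        """Record a request for key; return True on a cache hit."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._entries[key] = None  # LRU admits every miss, regardless of content
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used key
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(trace: Iterable[Hashable], capacity: int) -> float:
    """Replay a request trace through an LRU cache and return its hit ratio."""
    cache = LRUCache(capacity)
    for key in trace:
        cache.access(key)
    return cache.hit_ratio()
```

For example, replaying the trace a, b, a, c, b, a with capacity 2 yields a hit ratio of 1/6, while capacity 3 yields 1/2. LRU decides purely on recency; a content-aware scheme such as PeCC would instead use learned predictions to inform which objects to keep, which is the gap the abstract describes.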

Vaishnav Janardhan, Akamai Technologies

Vaishnav Janardhan is a Principal Architect at Akamai and leads the efforts to use ML techniques to solve performance and scalability challenges on large distributed systems. Vaishnav previously worked on transitioning Akamai’s traditional web delivery platform into a video delivery platform to support the massive growth of video over the Internet. He has publications and patents on domain-specific file systems, micro-architectural CPU optimization, TCP congestion control to reduce the tail latency of web traffic, and hierarchical caching. Most recently, he worked on rewriting monolithic web servers to run on hyper-parallel CPU architectures and to support multi-tenant, diverse workloads on the Akamai platform.

Adit Bhardwaj, Akamai Technologies

Adit Bhardwaj is a Senior Software Engineer at Akamai. He is interested in leveraging data to optimize and solve engineering problems through machine learning. He received his B.Tech degree in Electrical Engineering and Computer Science from IIT Gandhinagar, India, in 2014 and a Master’s degree in Electrical Engineering from UC San Diego in 2017. Most recently, he has been developing machine learning systems for content-aware caching that can scale a large distributed platform. Adit previously worked on the constrained convex optimization problem of low-rank matrix recovery, using Lagrangian techniques for image composition problems such as HDR imaging.

Towards Taming the Resource and Data Heterogeneity in Federated Learning

Zheng Chai, George Mason University; Hannan Fayyaz, York University; Zeshan Fayyaz, Ryerson University; Ali Anwar, Yi Zhou, Nathalie Baracaldo, and Heiko Ludwig, IBM Research–Almaden; Yue Cheng, George Mason University

Machine learning model training often requires data from multiple parties. However, in some cases, data owners cannot or are not willing to share their data due to legal or privacy constraints, but would still like to benefit from training a model jointly with multiple parties. To this end, federated learning (FL) has emerged as an alternative way to do collaborative model training without sharing the training data. Such collaboration leads to more accurate and performant models than any party owning a partial set of all the data sources could hope to learn in isolation.

In this paper, we study the impact of resource (e.g., CPU, memory, and network resources) and data (e.g., training dataset sizes) heterogeneity on the training time of FL. Then, we discuss the research problems and their challenges involved in taming such resource and data heterogeneity in FL systems.
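To show where dataset-size heterogeneity enters FL, here is a minimal sketch of the aggregation step of standard FedAvg, in which the server averages client parameters weighted by each client's training set size. This is the textbook scheme rather than the method studied in the paper; the function name and the flat list-of-floats parameter representation are simplifying assumptions.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameters, weighting each client by its dataset size.

    Each client trains locally; only its parameters (never its raw data)
    are sent to the server for this aggregation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = size / total  # larger local datasets get proportionally more influence
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged
```

For example, two clients with parameters [1.0, 2.0] and [3.0, 4.0] and dataset sizes 100 and 300 average to [2.5, 3.5]. In a synchronous round the server cannot aggregate until every selected client reports back, so a client with slower CPU, less memory, a weaker network link, or a larger local dataset can stretch the whole round, which is one way resource and data heterogeneity show up in training time.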

A Comprehensive View of a Large-Scale Deep Learning Offline Platform

Lukas Majercak, Shansi Ren, Jack Zhang, and Junhua Wang, Microsoft

In the past several years, we have built a scalable and efficient ML/DL inference and training platform called MagnetTar (MT), based on Apache open source technologies. Our MT platform runs on tens of thousands of Windows servers and is widely used across Bing, Bing Ads, and News. MT has empowered many critical large-scale offline batch and streaming scenarios.

Lukas Majercak, Microsoft

Lukas is a developer in the Bing MT team. Lukas has been a key contributor in many areas of MT, such as HDFS, Spark, and deep learning offline processing.

Shansi Ren, Microsoft

Shansi is an engineering manager in the Bing MT team. Prior to MT, he worked on Index Update and Serving technologies in Bing for many years.

Tutorials Track

Winchester Room

Deep Learning Lifecycle Management with Kubernetes, REST, and Python

Boris Tvaroska, Lenovo

In this tutorial, we will learn the basics of using a trained deep learning model in applications through a REST API. We will deploy the model both with pure Python and with TensorFlow Serving. Each application should be as simple as possible. The most straightforward approach to building a REST service around a Keras model is simply to put the model into a Python web framework. Flask is a minimalist framework and a good choice for simple applications or an MVP. To support multiple models, or multiple versions of one model, at scale, we will use TF Serving to build a scalable API able to serve hundreds to thousands of requests per second.

Prerequisites: Python 3, experience with Docker, and basic knowledge of REST web services.

2:30 pm–3:30 pm

Presentations Track 1


Session Chair: Bharath Ramsundar, Computable

Lawrence/San Tomas/Lafayette Rooms

Impact of Data Regulations and Bias on Operational ML

Lead and moderator: Sandeep Uttamchandani, Intuit

Panelists: Kapil Surlaker, LinkedIn; Sean Grullon, GSK; Sendil Thangavelu, Mosaic—Solar FinTech Company; Arthur Roberts, HealthIQ

Data is the oil that fuels ML models across Finance, Healthcare, Retail, and every vertical industry today. The data collected from customers either directly or indirectly (using behavioral analytics) is being increasingly regulated. EU’s General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), are among several upcoming regulations for data compliance. Additionally, checks-and-bounds are required to detect bias and improve trust in ML-based services.

This panel brings together top experts from across the industry to discuss how they are dealing with regulations and bias, and the impact of both on real-world ML deployments. In particular, we unpack the key roadblocks to implementing regulatory compliance, the tools and frameworks required, and the impact on the ML model lifecycle. We wrap up with panelists sharing their emerging insights on detecting bias to avoid issues in ML-based services.

Presentations Track 2

Measuring and Diagnosing Production ML

Session Chair: Marius Seritan, LinkedIn

Stevens Creek Room

The Power of Metrics—How to Monitor and Improve ML Efficiency

Yan Yan and Zhilan Zweiger, Facebook

Available Media

This talk is about an ML operational tool born out of the rapid growth of ML training workloads, the need for visibility into performance issues, and the search for best practices in ML training. The tool helps make the most of limited computing resources and ensures that production models are efficient, reliable, and scalable. You will learn our motivation for developing this tool, the challenges we faced, its main features and use cases, and how diverse users have leveraged it to improve their productivity.

Yan Yan, Facebook

Yan Yan is a production engineer at Facebook. She belongs to the Ads Ranking PE team, which improves efficiency, reliability, and scalability for machine learning at Facebook. Her mission is to share her knowledge to help society by anticipating ML problems and solving existing ones. She received an M.S. degree in computer science from UCLA.

Zhilan Zweiger, Facebook

Zhilan Zweiger is a staff engineer and tech lead in the Production Engineering team at Facebook. She is primarily responsible for reliability, efficiency, and scalability of the Ads Machine Learning infrastructure stack. Before that, she worked at Twitter in the Data Platform SRE team, where she focused on the reliability of the big data, batch compute, and streaming compute environments. Zhilan holds a master's degree in Computer Science.

MPP: Model Performance Predictor

Sindhu Ghanta, Sriram Subramanian, Lior Khermosh, Harshil Shah, Yakov Goldberg, Swaminathan Sundararaman, and Drew Roselli, ParallelM; Nisha Talagala, Pyxeda AI

Available Media

Operations is a key challenge in machine learning pipeline deployments, involving the monitoring and management of real-time prediction quality. Typically, metrics such as accuracy and RMSE are used to track the performance of deployed models. However, these metrics cannot be calculated in production because labels are absent. We propose using an ML algorithm, the Model Performance Predictor (MPP), to track the performance of deployed models. We argue that an ensemble of such metrics can be used to create a score representing prediction quality in production. This in turn facilitates the formulation and customization of ML alerts that an operations team can escalate to the data science team. Such a score automates monitoring and enables ML deployments at scale.
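One way to picture a label-free quality score: train a secondary model, on held-out labeled data, to predict whether the primary model is correct, then average its predictions over unlabeled production inputs. The minimal sketch below assumes a k-nearest-neighbor correctness predictor and an alert threshold of 0.8; both choices, and all function names, are hypothetical stand-ins rather than the MPP algorithm described in the paper.

```python
from math import dist


def correctness_examples(features, primary_preds, labels):
    """Label each held-out sample 1 if the primary model was right, else 0."""
    return [(x, int(p == y)) for x, p, y in zip(features, primary_preds, labels)]


def predicted_correctness(examples, x, k=3):
    """k-NN estimate of the chance that the primary model is correct on input x."""
    nearest = sorted(examples, key=lambda item: dist(item[0], x))[:k]
    return sum(correct for _, correct in nearest) / len(nearest)


def production_score(examples, production_features, k=3):
    """Mean predicted correctness over unlabeled production inputs (no labels needed)."""
    scores = [predicted_correctness(examples, x, k) for x in production_features]
    return sum(scores) / len(scores)


def should_alert(score, threshold=0.8):
    """Escalate to the data science team when the estimated quality drops (hypothetical threshold)."""
    return score < threshold
```

If production traffic drifts toward regions where the primary model was wrong on held-out data, the score falls and `should_alert` fires, even though no production labels exist yet.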

Manifold: A Model-Agnostic Visual Debugging Tool for Machine Learning at Uber

Lezhi Li, Yunfeng Bai, and Yang Wang, Uber Inc.

Available Media

Interpretation and diagnosis of machine learning models have gained renewed interest in recent years, driven by breakthroughs in new approaches. We present Manifold, Uber’s in-house, model-agnostic visualization tool for ML performance diagnosis and model debugging. Manifold uses visual analysis techniques to support the interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. We demonstrate current applications of Manifold to classification and regression tasks at Uber and discuss other potential machine learning scenarios where Manifold can be applied.
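"Model-agnostic" means the analysis needs only a model's outputs and the ground truth, not its internals. The sketch below shows one debugging step in that spirit: split instances into high-loss and low-loss segments and compare feature means between them. The two-way loss threshold, the feature name `trip_km`, and the helper names are assumptions for illustration; Manifold's actual segmentation and visual encodings may differ.

```python
from math import log
from statistics import mean


def log_loss(prob, label, eps=1e-12):
    """Per-instance binary cross-entropy; uses only the model's output, not its internals."""
    prob = min(max(prob, eps), 1 - eps)
    return -(label * log(prob) + (1 - label) * log(1 - prob))


def segment_by_loss(rows, probs, labels, threshold):
    """Split instances (dicts of features) into a high-loss and a low-loss segment."""
    high, low = [], []
    for row, prob, label in zip(rows, probs, labels):
        (high if log_loss(prob, label) > threshold else low).append(row)
    return high, low


def feature_mean_gap(high, low):
    """Per-feature difference in means between the segments (both must be non-empty)."""
    return {name: mean(r[name] for r in high) - mean(r[name] for r in low) for name in high[0]}
```

A large gap for a feature hints that the model struggles on a particular slice of the input space, which is the kind of pattern a visual tool can then surface interactively.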

Yang Wang, Uber Inc.

Yang Wang is a Sr. Research Engineer leading the Machine Learning Visualization team at Uber. His research interests lie in Human-Computer Interaction and High-Performance Computing, specifically, methodologies and systems to model the Interpretability and Actionability of AI-aided decision-making processes. At Uber, Yang and his team build ML infrastructure, publish and tech-transfer research papers, and work across business units to help data scientists, engineers, and City-Ops accelerate their model iteration process. Beyond his industry role, Yang also provides academic service to multiple HCI and Machine Learning venues.

Tutorials Track

Winchester Room

ModelOps on AWS

John Calhoun, AWS

In this talk/tutorial we will discuss the principles of DevOps and how they apply to Machine Learning (aka ModelOps) followed by a tutorial on how to apply those principles on AWS using Amazon SageMaker.

Prerequisites: Some ML experience (this is not a Machine Learning 101 talk)

John Calhoun, AWS

John Calhoun is a machine learning specialist for AWS Public Sector. He works with AWS customers and partners to provide leadership on machine learning, helping them shorten their time to value when using AWS.

3:30 pm–4:00 pm

Break with Refreshments

Sponsored by LinkedIn

4:00 pm–6:00 pm

Presentations Track 1

Optimizing and Tuning

Session Chair: Fei Chen, LinkedIn

Lawrence/San Tomas/Lafayette Rooms

Low-latency Job Scheduling with Preemption for the Development of Deep Learning

Hidehito Yabuuchi, The University of Tokyo; Daisuke Taniwaki and Shingo Omura, Preferred Networks, Inc.

Available Media

Efficient job scheduling of trial-and-error (TE) jobs is a challenging problem in deep learning projects. Unfortunately, existing job schedulers do not provide well-balanced scheduling for a mixture of TE and best-effort (BE) jobs, or they can handle such a mixture only in limited situations. To fill this niche, we present an algorithm that efficiently schedules both TE and BE jobs by selectively preempting the BE jobs that can be resumed without much delay when the time comes. In our simulation study with synthetic workloads, we reduced the 95th percentile of the TE jobs' slowdown rates by 96.6% relative to the standard FIFO strategy, while increasing the median BE slowdown rate by only 18.0% and its 95th percentile by only 23.9%.
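To make the preemption idea concrete, here is a minimal greedy sketch: when a TE job cannot start, preempt BE jobs in order of their estimated resume cost until enough GPUs are free. The `resume_cost` field, the greedy ordering, and the slowdown definition ((waiting time + execution time) / execution time, a common convention) are assumptions; this is not the algorithm from the paper, and the greedy pass may preempt more jobs than strictly necessary.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    kind: str           # "TE" (trial-and-error) or "BE" (best-effort)
    gpus: int
    resume_cost: float  # hypothetical estimate of the delay to resume after preemption (s)


def pick_victims(running, needed_gpus, free_gpus):
    """Return BE jobs to preempt so a TE job can start, [] if none are needed,
    or None if preempting every BE job would still not free enough GPUs."""
    if free_gpus >= needed_gpus:
        return []
    victims = []
    # Greedy: preempt the BE jobs that are cheapest to resume first.
    for job in sorted((j for j in running if j.kind == "BE"), key=lambda j: j.resume_cost):
        victims.append(job)
        free_gpus += job.gpus
        if free_gpus >= needed_gpus:
            return victims
    return None


def slowdown(wait_time, run_time):
    """Slowdown rate: total time in the system divided by pure execution time."""
    return (wait_time + run_time) / run_time
```

TE jobs are never chosen as victims, which is what keeps their tail slowdown low; the cost is borne by BE jobs, ideally those that can checkpoint and resume quickly.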

tensorflow-tracing: A Performance Tuning Framework for Production

Sayed Hadi Hashemi, University of Illinois at Urbana-Champaign and National Center for Supercomputing Applications; Paul Rausch; Benjamin Rabe, University of Illinois at Urbana-Champaign and National Center for Supercomputing Applications; Kuan-Yen Chou, University of Illinois at Urbana-Champaign; Simeng Liu, University of Illinois at Urbana-Champaign and National Center for Supercomputing Applications; Volodymyr Kindratenko, National Center for Supercomputing Applications; Roy H Campbell, University of Illinois at Urbana-Champaign

Available Media

The growing popularity of Deep Neural Networks (DNNs) within the mainstream [gartnerhype] has had a rapid transformative effect on clusters and data centers.

DNN training jobs are becoming one of the largest tenants within clusters and often take hours to weeks to complete, so even a slight performance improvement can save substantial runtime costs. Despite this, DNN-specific performance tuning tools have yet to keep up with the changing needs of production environments.

On one hand, the existing application-agnostic resource-level tools such as top, Nvidia Nsight (for GPU utilization), IPM (for MPI network monitoring) are too limited to predict or explain the behavior and performance of a job accurately. In DNN applications, there exists a complex relationship among resources. Even though measuring coarse metrics such as bandwidth, latency, and GPU/CPU utilization can draw an overall picture of cluster performance, these metrics are not easily translatable to application-level metrics and do not provide actionable insights on how to handle performance bottlenecks.

On the other hand, the short list of application-aware tools, such as MLModelScope [dakkak2018mlmodelscope], TensorBoard [tensorboard], and tf.RunOptions [tensorflow-trace], while able to provide actionable insights, are mainly designed for the needs of application developers and are not intended for production use. Such tools require substantial modification to applications, as well as early planning of what data should be collected, and when and how.

In this article, we introduce tensorflow-tracing to fill the gap between these two classes of performance tuning tools. To achieve this goal, tensorflow-tracing addresses the following technical challenges:

- Collecting application-level runtime metrics, such as the timing of each operation or the iteration time, normally requires explicit instrumentation in the training job's source code. To make it possible to trace ML jobs without any application modification, tensorflow-tracing monkeypatches the tensorflow library at the system level.
- Collecting some metrics is expensive and has significant runtime overhead. tensorflow-tracing treats metrics differently: it collects low-overhead metrics automatically, while expensive ones are collected on demand through an admin interface.
- There is no easy way to exchange runtime metrics among users and admins; tensorflow-tracing facilitates this through a portable file format and supporting tools to explore these metrics offline.
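The monkeypatching approach in the first challenge can be illustrated generically: replace a library method at runtime with a wrapper that records timings, so user code calls the same method and gets traced without modification. The abstract does not say which TensorFlow functions tensorflow-tracing patches, so the sketch below uses a hypothetical stand-in `Session` class and a hypothetical `trace_method` helper rather than TensorFlow itself.

```python
import functools
import time


class Session:
    """Stand-in for a framework session class (hypothetical; not TensorFlow's)."""

    def run(self, fetches):
        return fetches * 2


TRACE = []  # (method name, elapsed seconds) records collected without touching user code


def trace_method(cls, name):
    """Monkeypatch cls.name so every call is timed; returns the original for undoing."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Record the timing even if the wrapped call raises.
            TRACE.append((name, time.perf_counter() - start))

    setattr(cls, name, wrapper)
    return original
```

After `original = trace_method(Session, "run")`, an unchanged call such as `Session().run(21)` still returns its normal result while a timing record lands in `TRACE`; `setattr(Session, "run", original)` removes the patch. Patching at the class level is what lets a tracer attach to jobs whose source it never sees.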

tensorflow-tracing is publicly available under the Apache-2.0 license (https://github.com/xldrx/tensorflow-tracer). It supports native TensorFlow [tensorflow], Horovod [horovod], and IBM PowerAI [powerai] applications.

Fast, Reliable, Yet Catastrophically Failing!?! Safely Avoiding Incidents When Putting Machine Learning into Production

Ramin Keene, Fuzzbox

Safely releasing machine learning based services into production presents a host of challenges that even the most experienced SRE may not expect. We'll outline some severe outages seen in the wild and their causes, and detail how emerging, cutting-edge techniques from the DevOps and SRE world, such as "testing in prod", progressive delivery, and deterministic simulation, are the PERFECT solution for increasing safety, resilience, and confidence for SREs operating and managing ML-based services at scale.
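Progressive delivery for a model can be sketched as a canary: hash each request ID into a stable bucket, send a small fraction of traffic to the new model, and widen or roll back based on the canary's error rate. The rollout steps, the 1-point error tolerance, and the function names below are hypothetical choices for illustration, not tooling from the talk.

```python
import hashlib

ROLLOUT_STEPS = (0.01, 0.05, 0.25, 0.5, 1.0)  # hypothetical traffic fractions


def in_canary(request_id: str, fraction: float) -> bool:
    """Deterministically route a stable slice of traffic to the new model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform in [0, 1)
    return bucket < fraction


def next_fraction(current, canary_errors, canary_total, baseline_error_rate, tolerance=0.01):
    """Widen the rollout one step if the canary is healthy; roll back to 0 otherwise."""
    if canary_total == 0:
        return current  # no evidence yet; hold
    if canary_errors / canary_total > baseline_error_rate + tolerance:
        return 0.0
    larger = [step for step in ROLLOUT_STEPS if step > current]
    return larger[0] if larger else current
```

Hashing rather than random sampling keeps each request ID on the same side of the split, which makes canary behavior reproducible when investigating an incident.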

Ramin Keene, Fuzzbox

Ramin has spent the last 5 years working with data teams and large enterprises to put machine learning, A/B testing, and data science products into production. He’s made ALL the mistakes and then some, helping companies lose thousands, if not millions, of dollars along the way. He is currently based in Seattle and works on adversarial experimentation tools that target infrastructure and release artifacts, helping teams inspect and learn about their software AFTER it has been baked and released.

Disdat: Bundle Data Management for Machine Learning Pipelines

Ken Yocum, Sean Rowan, and Jonathan Lunt, Intuit, Inc.; Theodore M. Wong, 23andMe, Inc.

Available Media

Modern machine learning pipelines can produce hundreds of data artifacts (such as features, models, and predictions) throughout their lifecycle. During that time, data scientists need to reproduce errors, update features, re-train on specific data, validate and inspect outputs, and share models and predictions. Doing so requires the ability to publish, discover, and version those artifacts.

This work introduces Disdat, a system to simplify ML pipelines by addressing these data management challenges. Disdat is built on two core data abstractions: bundles and contexts. A bundle is a versioned, typed, immutable collection of data. A context is a sharable set of bundles that can exist on local and cloud storage environments. Disdat provides a bundle management API that we use to extend an existing workflow system to produce and consume bundles. This bundle-based approach to data management has simplified both authoring and deployment of our ML pipelines.
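The two abstractions can be pictured with a small in-memory model: a bundle is an immutable, typed, versioned record, and a context is a sharable collection of bundle histories. The class and method names below (`Bundle`, `Context`, `commit`, `latest`, `get`) are hypothetical and are not Disdat's API; real contexts also sync with local and cloud storage, which this sketch omits. It assumes payloads are JSON-serializable so they can be content-hashed.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical bundle: named, versioned, typed, immutable (not Disdat's API)."""
    name: str
    version: int
    kind: str    # e.g. "features", "model", "predictions"
    data: tuple  # immutable payload
    digest: str  # content hash for quick comparison


class Context:
    """A sharable set of bundles; here just an in-memory dict of version lists."""

    def __init__(self):
        self._bundles = {}

    def commit(self, name, kind, data):
        payload = tuple(data)
        digest = hashlib.sha256(json.dumps([kind, list(payload)]).encode("utf-8")).hexdigest()[:12]
        history = self._bundles.setdefault(name, [])
        bundle = Bundle(name, len(history) + 1, kind, payload, digest)
        history.append(bundle)  # earlier versions are never overwritten
        return bundle

    def latest(self, name):
        return self._bundles[name][-1]

    def get(self, name, version):
        return self._bundles[name][version - 1]
```

Because committed bundles are frozen and old versions are kept, re-training on "the features from version 1" stays reproducible after newer features are published.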

TonY: An Orchestrator for Distributed Machine Learning Jobs

Anthony Hsu, Keqiu Hu, Jonathan Hung, Arun Suresh, and Zhe Zhang, LinkedIn

Available Media

Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper, we describe TonY, an open-source orchestrator for distributed ML jobs built at LinkedIn to address these challenges.
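One concrete piece of "distributed configuration" is telling each TensorFlow process about the cluster. Distributed TensorFlow reads a `TF_CONFIG` environment variable holding a JSON object with a `cluster` map and a `task` entry (`type`, `index`). The sketch below generates one such value per task; we assume an orchestrator like TonY injects something equivalent into each container, but this is not TonY's code, and the host names are hypothetical.

```python
import json


def tf_config(cluster, task_type, task_index):
    """Build the TF_CONFIG JSON that distributed TensorFlow reads for one task."""
    if not 0 <= task_index < len(cluster.get(task_type, [])):
        raise ValueError(f"no {task_type} task with index {task_index}")
    return json.dumps({"cluster": cluster, "task": {"type": task_type, "index": task_index}})


def all_task_configs(cluster):
    """One TF_CONFIG per container the orchestrator launches, keyed by (type, index)."""
    return {
        (task_type, index): tf_config(cluster, task_type, index)
        for task_type, hosts in cluster.items()
        for index in range(len(hosts))
    }
```

Generating these values centrally, after containers have been allocated and their host:port pairs are known, is what spares users from hand-writing cluster specs for every job.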

Transfer Learning for Performance Modeling of Deep Neural Network Systems

Md Shahriar Iqbal, University of South Carolina; Lars Kotthoff, University of Wyoming; Pooyan Jamshidi, University of South Carolina

Available Media

Modern deep neural network (DNN) systems are highly configurable, with a large number of options that significantly affect their non-functional behavior, for example inference time and energy consumption. Performance models help us understand and predict the effects of such configuration options on system behavior, but they are costly to build because of large configuration spaces. Performance models from one environment cannot be transferred directly to another; usually models are rebuilt from scratch for different environments, for example different hardware. Recently, transfer learning methods have been applied to reuse knowledge from performance models trained in one environment in another. In this paper, we perform an empirical study to understand the effectiveness of different transfer learning strategies for building performance models of DNN systems. Our results show that transferring information about the most influential configuration options and their interactions is an effective way to reduce the cost of building performance models in new environments.
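A minimal sketch of "transferring the most influential options": estimate each binary option's main effect in the source environment, keep the top k, and measure only configurations that vary those options in the target environment. Using main effects (mean performance with an option on minus off) and ignoring interactions are simplifying assumptions; this is not one of the specific strategies evaluated in the paper.

```python
from itertools import product
from statistics import mean


def main_effects(configs, perf):
    """Main effect of each binary option: mean perf with it on minus mean with it off."""
    effects = []
    for i in range(len(configs[0])):
        on = [p for c, p in zip(configs, perf) if c[i]]
        off = [p for c, p in zip(configs, perf) if not c[i]]
        effects.append(mean(on) - mean(off))
    return effects


def influential_options(effects, k):
    """Indices of the k options with the largest absolute effect in the source environment."""
    return sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)[:k]


def target_sample_plan(n_options, influential, default=0):
    """Target-environment configs that vary only the influential options."""
    plan = []
    for values in product([0, 1], repeat=len(influential)):
        config = [default] * n_options
        for index, value in zip(influential, values):
            config[index] = value
        plan.append(tuple(config))
    return plan
```

With 3 options and k = 2, the target plan needs 4 measurements instead of all 8 configurations; the savings grow quickly as the number of options increases, which is where the cost reduction comes from.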

Presentations Track 2

Solutions and Platforms

Session Chair: Suresh Raman, Intuit

Stevens Creek Room

KnowledgeNet: Disaggregated and Distributed Training and Serving of Deep Neural Networks

Saman Biookaghazadeh, Yitao Chen, Kaiqi Zhao, and Ming Zhao, Arizona State University

Available Media

Deep Neural Networks (DNNs) have a significant impact on numerous applications, such as reinforcement learning, object detection, video processing, and virtual/augmented reality. The ever-changing environment forces DNN models to evolve accordingly. Also, the transition from the cloud-only to the edge-cloud paradigm has made the deployment and training of these models challenging. Addressing these challenges requires new methods and systems for the continuous training and distribution of these models in a heterogeneous environment. In this paper, we propose KnowledgeNet (KN), a new architectural technique for simple disaggregation and distribution of neural networks for both training and serving. Using KN, DNNs can be partitioned into multiple small blocks and deployed on a distributed set of computational nodes. KN also uses knowledge transfer to provide small-scale models with high accuracy in edge scenarios with limited resources. Preliminary results show that our new method can maintain state-of-the-art accuracy for a DNN model while it is disaggregated among multiple workers. Also, by using knowledge transfer, we can compress the model by 62% for deployment while maintaining the same accuracy.
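Knowledge transfer from a large model to a small one is commonly done with teacher-student distillation: the student is trained on a mix of the teacher's temperature-softened outputs and the hard labels. The sketch below computes that standard loss for a single example; we assume, without confirmation from the abstract, that KN's knowledge transfer resembles this, and the temperature and weighting defaults are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperatures soften the distribution."""
    top = max(logits)
    exps = [math.exp((z - top) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) on softened outputs + (1 - alpha) * hard-label CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature**2 * kl + (1 - alpha) * cross_entropy
```

The T² factor keeps the gradient scale of the soft term comparable across temperatures; when the student matches the teacher exactly, only the hard-label term remains.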

Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform

Denis Baylor, Kevin Haas, Konstantinos Katsiapis, Sammy Leong, Rose Liu, Clemens Menwald, Hui Miao, Neoklis Polyzotis, Mitchell Trott, and Martin Zinkevich, Google Research

Available Media

Large organizations increasingly rely on continuous ML pipelines to keep machine-learned models up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX that support this type of pipeline in production and the lessons learned from deploying the platform internally at Google.
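The interplay between validation and staleness can be shown with a toy replay of pipeline runs: each successful run produces a candidate, a validation gate pushes it only if it is not clearly worse than the serving model, and failed runs leave the serving model aging. The AUC metric, the 0.005 regression tolerance, and hour-indexed data spans are hypothetical; this is not TFX's implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    data_span_end: int  # hypothetical: hour index of the newest data the model saw
    auc: float          # evaluation metric on fresh held-out data


def should_push(candidate, serving, max_regression=0.005):
    """Validation gate: never replace the serving model with a clearly worse one."""
    return serving is None or candidate.auc >= serving.auc - max_regression


def continuous_loop(candidates, now, max_regression=0.005):
    """Replay pipeline runs in order; None marks a failed (disrupted) run.
    Returns the serving model and its staleness in hours."""
    serving = None
    for candidate in candidates:
        if candidate is None:
            continue  # nothing pushed; the serving model keeps aging
        if should_push(candidate, serving, max_regression):
            serving = candidate
    staleness = now - serving.data_span_end if serving is not None else None
    return serving, staleness
```

Both a failed run and a rejected candidate have the same visible effect on users: the model in production falls further behind the data, which is why pipeline disruptions translate directly into staleness.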

Reinforcement Learning Based Incremental Web Crawling

Vatsal Agarwal, Innoplexus AG

Current crawling engines face a challenge in keeping their data up to date. They need to keep checking every webpage, forum thread, social media handle, blog, and news source for updates. Some webpages update every few minutes, and some do not change for months. We present an evolutionary learning framework for identifying incremental changes on crawled webpages in prioritized order, doing away with the "tabula rasa" view of learning. Our model learns heuristics based on webpage features and update frequency. It generalizes from past data and creates a prioritization threshold.
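A simple, non-learned baseline helps show what "prioritizing by update frequency with a threshold" means. Assuming each page changes according to a Poisson process whose rate is estimated from past observations, the probability that a page has changed since its last crawl is 1 - exp(-rate × elapsed time); pages are recrawled in descending order of that probability if it clears a threshold. This is a swapped-in heuristic for illustration, not the reinforcement/evolutionary learning method from the talk, and the page names are hypothetical.

```python
import math


def change_probability(changes_seen, observed_hours, hours_since_crawl):
    """P(page changed since last crawl), assuming changes follow a Poisson process."""
    rate = changes_seen / observed_hours  # estimated changes per hour
    return 1 - math.exp(-rate * hours_since_crawl)


def recrawl_queue(pages, threshold):
    """Order pages by change probability and keep those at or above the threshold.
    `pages` maps url -> (changes_seen, observed_hours, hours_since_crawl)."""
    scored = [(change_probability(*stats), url) for url, stats in pages.items()]
    return [url for prob, url in sorted(scored, reverse=True) if prob >= threshold]
```

A learned model can replace `change_probability` with a prediction that also uses page features, while the queue-plus-threshold structure stays the same.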

Vatsal Agarwal, Innoplexus AG

Vatsal leads artificial intelligence at Innoplexus AG, building cutting-edge technology for the pharmaceutical and life sciences industries. He works on the life sciences language-processing engine and the domain-wide ontology used in a variety of Innoplexus products & solutions.

Vatsal has more than a decade of experience in data science, software development, and bioinformatics. His primary focus is to bring advancements in artificial intelligence and big data to life sciences and help patients get faster, more efficient treatments. Vatsal has filed over 40 patent applications and written several peer-reviewed publications on machine learning and bioinformatics.

Katib: A Distributed General AutoML Platform on Kubernetes

Jinan Zhou, Andrey Velichkevich, Kirill Prosvirov, and Anubhav Garg, Cisco Systems; Yuji Oshima, NTT Software Innovation Center; Debo Dutta, Cisco Systems

Available Media

Automatic Machine Learning (AutoML) is a powerful mechanism to design and tune models. We present Katib, a scalable Kubernetes-native general AutoML platform that can support a range of AutoML algorithms including both hyper-parameter tuning and neural architecture search. The system is divided into separate components, encapsulated as micro-services. Each micro-service operates within a Kubernetes pod and communicates with others via well-defined APIs, thus allowing flexible management and scalable deployment at a minimal cost. Together with a powerful user interface, Katib provides a universal platform for researchers as well as enterprises to try, compare and deploy their AutoML algorithms, on any Kubernetes platform.
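To illustrate what a hyper-parameter tuning algorithm plugged into such a platform does, here is random search, one of the simplest: sample trials from a declared search space, evaluate each, and keep the best. The search-space encoding (`("double", low, high)` and `("categorical", choices)`) and the function names are hypothetical and do not reproduce Katib's API or its suggestion service; in Katib, each trial would run as its own Kubernetes workload rather than as a local function call.

```python
import random


def suggest(space, rng):
    """Draw one trial from a space of ("double", low, high) or ("categorical", choices)."""
    trial = {}
    for name, spec in space.items():
        if spec[0] == "double":
            trial[name] = rng.uniform(spec[1], spec[2])
        elif spec[0] == "categorical":
            trial[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unsupported parameter type {spec[0]!r}")
    return trial


def random_search(space, objective, n_trials, seed=0):
    """Run n_trials independent trials and return (best score, best parameters)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = suggest(space, rng)
        score = objective(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

Separating `suggest` (propose parameters) from evaluation mirrors the micro-service split described above: the algorithm only proposes trials, and something else runs them.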

Machine Learning Models as a Service

Tobias Wenzel and Vigith Maurice, Intuit

Available Media

"We spent more time bringing the model to production than developing and training it." -Data Scientist, Intuit

The mission of our team at Intuit is to enable data scientists to deploy machine learning models with the push of a button and make them available in production at scale. We will share challenges and solutions in orchestrating, monitoring, and diagnosing machine learning models that serve production traffic for TurboTax, QuickBooks, and Mint.

Tobias Wenzel, Intuit

Tobias Wenzel is a Staff Software Engineer for the Intuit Machine Learning Platform in Mountain View, California. He has been working on the platform since its inception in 2016 and has helped design and build it from the ground up. He recently moved to the US from Germany to continue his work here in the Bay Area. Today he focuses on the operational excellence of the platform and on bringing it successfully through Intuit's seasonal business.

Vigith Maurice, Intuit

Vigith is a Principal Site Reliability Engineer for the Intuit Data Platform team in Mountain View, California. For the past 4 years, he has been a key driver of Intuit's journey to Big Data, first in Intuit data centers and more recently in the Cloud. One of Vigith's current focus areas is the difficult challenge of building scalable monitoring solutions for both batch and high-throughput systems. Previously, he led various engineering initiatives at Yahoo.

Stratum: A Serverless Framework for the Lifecycle Management of Machine Learning-based Data Analytics Tasks

Anirban Bhattacharjee, Yogesh Barve, Shweta Khare, Shunxing Bao, and Aniruddha Gokhale, Vanderbilt University; Thomas Damiano, Lockheed Martin Advanced Technology Labs

Available Media

With the proliferation of machine learning (ML) libraries and frameworks, and of the programming languages they use, along with the operations of data loading, transformation, preparation, and mining, ML model development is becoming a daunting task. Furthermore, with a plethora of cloud-based ML model development platforms, heterogeneous hardware, an increased focus on exploiting edge computing resources for low-latency prediction serving, and often an incomplete understanding of the resources required to execute ML workflows efficiently, ML model deployment demands expertise in managing the lifecycle of ML workflows efficiently and at minimal cost. To address these challenges, we propose Stratum, an end-to-end, serverless data analytics platform. Stratum can deploy, schedule, and dynamically manage data ingestion tools, live streaming apps, batch analytics tools, ML-as-a-service (for inference jobs), and visualization tools across the cloud-fog-edge spectrum. This paper describes the Stratum architecture, highlighting the problems it resolves.
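Deciding where along the cloud-fog-edge spectrum to run an inference task is a placement problem. The sketch below picks the cheapest tier that meets a latency target and has enough free cores. The tier attributes, prices, and the cost-minimizing rule are hypothetical simplifications for illustration; the abstract does not describe Stratum's actual placement logic.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    latency_ms: float     # hypothetical round-trip latency to the data source
    free_cores: int
    cost_per_core: float  # hypothetical hourly price per core


def place(task_cores, latency_slo_ms, tiers):
    """Pick the cheapest tier that meets the latency SLO and has capacity, else None."""
    feasible = [t for t in tiers if t.latency_ms <= latency_slo_ms and t.free_cores >= task_cores]
    if not feasible:
        return None
    return min(feasible, key=lambda t: t.cost_per_core * task_cores).name
```

Tight latency targets push work toward the edge even when it is more expensive, while relaxed targets let cheaper cloud capacity win; a `None` result signals that the task cannot be placed as requested.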

Tutorials Track

Winchester Room

Accelerating the Machine Learning Lifecycle with MLflow

Andrew Chen, DataBricks

ML development brings many new complexities beyond traditional software development. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.

In this session, we introduce MLflow, an open source ML platform started by Databricks in 2018 that is designed to integrate easily with arbitrary ML libraries, deployment tools, and workflows. MLflow introduces simple abstractions to package reproducible pipelines, track results, and encapsulate models that streamline sharing and productionizing ML. The project has a fast-growing open source community, with 80 contributors from over 40 companies, and integrations with Python, Java, R, and dozens of ML libraries and services. We show how to set up MLflow and execute various workflows in it based on best practices from current users.

Consistent Multi-Cloud AI Lifecycle Management with Kubeflow

Debo Dutta and Xinyuan Huang, Cisco

The AI/ML lifecycle consists of several steps, ranging from accessing the data to training models and then deploying them. This process is an involved one and is the subject of rapid engineering (especially in open source) and research (e.g., OpML). In this tutorial, we articulate the technical challenges faced during AI/ML lifecycle management by a variety of personas, ranging from the ML scientist to the ML DevOps engineer. We introduce Kubeflow, a consistent platform across multiple clouds, to help solve the challenges faced in multi-cloud AI/ML lifecycle management.

Debo Dutta, Cisco

Debo is a distinguished engineer at Cisco, where he incubated and now leads an AI/ML systems team. His team’s efforts include major contributions to Kubeflow and neural architecture search (AutoML).

Xinyuan Huang, Cisco

Xinyuan Huang is a software engineer at Cisco, where he works on research and development of AI/ML systems. He is an active member in the Kubeflow community and owner of the Kubebench project.

6:00 pm–7:00 pm

Poster Session and Happy Hour