Automated Troubleshooting of Live Site Issues

Tuesday, May 23, 2017 - 3:00pm-3:25pm

Sriram Srinivasan, PayPal India Private Ltd.


Troubleshooting live site issues can be challenging, especially when the production stack is made up of over 2000 applications and services. PayPal’s SRE team is involved in troubleshooting and driving resolution of the live site issues reported by the customer and merchant support teams. Today, to troubleshoot a live site issue, we go to multiple places depending on the issue at hand. Predominantly we look into the Centralized Application Logs, and then we check the various data sources and the in-house alerts. There is a great deal of information to look through, and a lot of effort goes into gathering data about the failed attempt or transaction from various sources internal to PayPal. We therefore needed an automated way to troubleshoot issues, so we developed an Auto Troubleshooting Platform that aggregates the data from all the underlying data sources, troubleshoots the issue, and records the results. The platform is built so that anyone can post any type of ticket and have it troubleshot automatically. Auto troubleshooting results are available within minutes and can be viewed through a portal. In this talk, I will highlight the journey we have undertaken to make this happen.

Sriram Srinivasan, PayPal India Private Ltd.

Sriram Srinivasan is a technologist with over 14 years of experience in software development. He has worked in multiple teams at PayPal India Private Ltd. across various aspects of the software development lifecycle, including conception, design, development, and product support. In his current role as Architect on PayPal's SRE team, he had the opportunity to design and develop an Auto Troubleshooting Platform.


@conference {202767,
author = {Sriram Srinivasan},
title = {Automated Troubleshooting of Live Site Issues},
year = {2017},
publisher = {USENIX Association},
month = may
}
