Fast and Interactive Analytics over Hadoop Data with Spark
MATEI ZAHARIA, MOSHARAF CHOWDHURY, TATHAGATA DAS, ANKUR DAVE, JUSTIN MA, MURPHY MCCAULEY, MICHAEL J. FRANKLIN, SCOTT SHENKER, AND ION STOICA
The past few years have seen tremendous interest in large-scale data analysis, as data volumes in both industry and research continue to outgrow the processing speed of individual machines. Google’s MapReduce model and its open-source implementation, Hadoop, kicked off an ecosystem of parallel data analysis tools for large clusters, such as Apache’s Hive and Pig engines for SQL processing. However, these tools have so far been optimized for one-pass batch processing of on-disk data, which makes them slow for interactive data exploration and for the more complex multi-pass analytics algorithms that are becoming common.