The past few years have seen tremendous interest in large-scale data analysis, as data volumes in both industry and research continue to outgrow the processing speed of individual machines. Google’s MapReduce model and its open source implementation, Hadoop, kicked off an ecosystem of parallel data analysis tools for large clusters, such as Apache’s Hive and Pig engines for SQL processing; however, these tools have so far been optimized for one-pass batch processing of on-disk data, which makes them slow for interactive data exploration and for the more complex multi-pass analytics algorithms that are becoming common.
Download Article:
Article Section:
NETWORKED SYSTEMS
;login: issue:
- Log in to post comments