FAST '14
Introduction to Apache Hadoop and Its Ecosystem
Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it offers scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. Additionally, you'll learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.
This session is intended for those who are new to Hadoop and want to understand what it is, how organizations are using it, and how it compares to and integrates with other systems. It assumes no prior knowledge of Hadoop; technical topics such as MapReduce and HDFS replication are explained clearly and concisely, making the session appropriate for anyone attending the conference.
- What Hadoop is and how organizations are using it
- How the HDFS filesystem provides reliability and high throughput
- How MapReduce enables parallel processing on large data sets
- Explanations of some popular open source tools that integrate with Hadoop
- Typical architecture of a Hadoop cluster
- Considerations for hosting a Hadoop cluster
- Emerging trends in the design and implementation of Hadoop
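To give a flavor of the MapReduce model covered above, here is a small word-count sketch. This is a plain-Python simulation of the map, shuffle, and reduce phases for illustration only; it is not Hadoop's actual Java API, and in a real cluster the map and reduce tasks would run in parallel across many nodes.

```python
# Word count, the canonical MapReduce example, simulated locally.
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Each "document" stands in for an input split; on a cluster, each
# would be mapped on a different node in parallel.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
all_pairs = (pair for doc in documents for pair in map_phase(doc))
counts = reduce_phase(shuffle(all_pairs))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Because the map output for each split is independent, the framework can scale the map phase simply by adding nodes; the shuffle is the only step that requires moving data between them.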