Architecting Applications on Hadoop
Grand Ballroom C
During the first half of the tutorial, we will provide an introduction to Apache Hadoop and its ecosystem. In the second half, we will use an end-to-end clickstream analytics application to show how users can:
- Model data in Hadoop and select optimal storage formats for it
- Move data between Hadoop and external systems such as relational databases and logs
- Access and process data in Hadoop
- Orchestrate and schedule workflows on Hadoop
Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.
Students should bring laptops with a copy of the Cloudera Quickstart VM (or access to a working alternate VM or Hadoop cluster). The VM can be downloaded from here.
These are 64-bit VMs. They require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
To use a VMware VM, you must use a product compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can create a new VM from the same virtual disk (VMDK file), but some VMware Tools features won't be available.
| CDH and Cloudera Manager Version | RAM Required by VM | File Size |
|---|---|---|
| CDH 5 and Cloudera Manager 5 | 4 GB | 3 GB |
| CDH 4, Cloudera Impala, Cloudera Search, and Cloudera Manager 4 | 4 GB | 2 GB |
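As a rough pre-flight check before the tutorial, the 64-bit, RAM, and file-size requirements listed here can be tested with a short Python sketch. This is not a Cloudera tool. The helper names `is_64bit_x86` and `check_host`, the `cdh5`/`cdh4` labels, and the rule that host RAM must exceed the VM's allocation (because the host OS needs memory too) are assumptions made for illustration. `platform.machine()` and `shutil.disk_usage()` are standard library. Host RAM is passed in by hand because the standard library has no portable way to read it. The sketch does not check whether your virtualization product or CPU supports 64-bit guests.

```python
import platform
import shutil

# Figures from the requirements table (GB). The "cdh5"/"cdh4" keys are
# shorthand labels invented for this sketch.
VM_REQUIREMENTS_GB = {
    "cdh5": {"ram": 4, "download": 3},
    "cdh4": {"ram": 4, "download": 2},
}

# Strings that 64-bit x86 hosts commonly report via platform.machine()
# ("x86_64" on Linux/macOS, "AMD64" on Windows). Assumption: the VM image is
# x86-64, so ARM hosts are treated as unsupported.
X86_64_NAMES = {"x86_64", "amd64"}


def is_64bit_x86(machine: str) -> bool:
    """Return True if a platform.machine() string names a 64-bit x86 CPU."""
    return machine.strip().lower() in X86_64_NAMES


def check_host(machine: str, host_ram_gb: float, free_disk_gb: float,
               vm: str = "cdh5") -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    req = VM_REQUIREMENTS_GB[vm]
    problems = []
    if not is_64bit_x86(machine):
        problems.append(f"host architecture {machine!r} is not 64-bit x86")
    # Assumption: the host must have strictly more RAM than the VM is given.
    if host_ram_gb <= req["ram"]:
        problems.append(f"{host_ram_gb} GB host RAM leaves nothing beyond the VM's {req['ram']} GB")
    # The download size is only a lower bound; the unpacked disk is larger.
    if free_disk_gb < req["download"]:
        problems.append(f"need at least {req['download']} GB free for the download")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 2**30  # free space in the current directory
    # Replace 8 with your machine's installed RAM in GB.
    print(check_host(platform.machine(), host_ram_gb=8, free_disk_gb=free_gb))
```

An empty list means the checks passed. For example, `check_host("i386", 4, 1)` reports all three problems.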