Apache Kafka from Scratch
Thurgood Marshall East
Apache Kafka is a publish/subscribe messaging system used at companies including LinkedIn, Twitter, Netflix, and many others. It is used to build Extract, Transform, and Load (ETL) pipelines, collect metrics and logs, and queue data between applications, and it often serves as the main backbone for moving data within big data infrastructures. This tutorial focuses on getting started with Kafka, including working with ZooKeeper, on which it depends. We will cover installation, configuring retention and replication, and creating simple applications that produce and consume messages.
This tutorial is designed for engineers, in both operations and development, who are new to Apache Kafka and publish/subscribe messaging. The only prerequisites are the ability to install software and execute basic shell commands. The ability to write basic Python programs is helpful but not required. Full working versions of all scripts used in the tutorial will be provided.
Participants will leave the tutorial understanding how to set up Apache ZooKeeper and Apache Kafka and how to create message producers and consumers, having completed this work on their own systems. This will allow them to set up a publish/subscribe messaging infrastructure that can be used for many applications, including monitoring, logging, queuing, and tracking user-generated events.
ZooKeeper
- What is ZooKeeper?
- What is it NOT?
- Standalone Setup
Apache Kafka
- Publish/Subscribe Messaging
- Kafka Architecture
- Installing Kafka
Producing Messages
- Message Schema
- Using the Console Producer
- Producing Inside Applications
Consuming Messages
- Using the Console Consumer
- Consuming Inside Applications
- Limitations of Non-Java Consumers
Kafka Clusters
- Adding a Second Broker (partner work)
- Replicating Partitions
- Creating Multiple Partitions
Message Retention
- Retention by Size
- Retention by Key (log compaction)
Use Cases
- Monitoring
- Log Collection
- User-generated Events
- Queuing
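To make the two retention topics in the outline above more concrete, here is a minimal Python sketch of the ideas behind them. It is a toy in-memory model, not Kafka's implementation: the function names `compact` and `retain_by_size`, the sample records, and the byte limit are all hypothetical, and a real broker applies these policies to log segments on disk rather than to individual records in a list.

```python
def compact(log):
    """Toy retention by key: keep only the newest record for each key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        # A later record with the same key replaces the earlier one.
        latest[key] = (offset, value)
    # Return the surviving records in their original offset order.
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


def retain_by_size(log, max_bytes):
    """Toy retention by size: keep the newest records that fit in max_bytes."""
    kept = []
    total = 0
    # Walk from newest to oldest and stop once the size budget is used up.
    for key, value in reversed(log):
        size = len(value.encode("utf-8"))
        if total + size > max_bytes:
            break
        kept.append((key, value))
        total += size
    kept.reverse()  # restore oldest-to-newest order
    return kept


# Hypothetical sample log of (key, value) records, oldest first.
log = [
    ("user1", "login"),
    ("user2", "login"),
    ("user1", "logout"),
    ("user3", "login"),
    ("user2", "purchase"),
]

compacted = compact(log)
trimmed = retain_by_size(log, 14)

print(compacted)  # [(2, 'user1', 'logout'), (3, 'user3', 'login'), (4, 'user2', 'purchase')]
print(trimmed)    # [('user3', 'login'), ('user2', 'purchase')]
```

The contrast is the point of the sketch: retention by size discards the oldest data regardless of key, while retention by key keeps the most recent value for every key no matter how old it is.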
Exact requirements are to be determined. Each attendee must bring a laptop, preferably running macOS or Linux, on which they can install the required software, including a Java Development Kit (if not already installed), Apache ZooKeeper, and Apache Kafka. More detailed instructions will be provided before the conference.