Mastering Apache Kafka: A Hands-On Beginner's Guide
Introduction to Hands-On Learning with Kafka
“Practical experience outweighs theoretical knowledge” — Vishnudevananda Saraswati.
Hands-on practice is one of the most effective ways to grasp a new technology: by actively experimenting, you bridge the gap between theory and practice. Many accomplished engineers favor this experiential approach, and we will adopt the same methodology while exploring Apache Kafka.
Prior discussions have covered the essential fundamentals of Apache Kafka necessary for practical application. If you're unfamiliar with key terms like Producers, Consumers, Events, and Topics, it would be beneficial to review introductory material before proceeding.
Before we start our hands-on journey, there’s one crucial concept we need to cover: ZooKeeper.
Understanding Apache ZooKeeper
ZooKeeper plays a vital role in the Kafka ecosystem. It maintains essential metadata, including cluster configuration and consumer details, tracks and manages brokers and partitions, and notifies the Kafka server about events such as broker failures or new topics. Note that in the version used here, a Kafka server cannot operate without a ZooKeeper instance (newer Kafka releases introduce KRaft mode, which removes this dependency).
With this understanding, we can now set up our own Kafka cluster on local systems. Let's begin by downloading the necessary software.
Setting Up Apache Kafka
First, create a new directory for our Kafka installation. You can choose any name; I will use "kafka."
$ mkdir ~/kafka
$ cd ~/kafka
Once inside the directory, we'll download Apache Kafka. We'll use wget, as shown below, but you can also download the archive manually from the Apache website; if you do, remember to move the file into the ~/kafka directory.
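A minimal download command, assuming the 2.13-3.2.0 release used throughout this guide (the Apache archive URL below should serve it, though your preferred mirror may differ):
$ wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.13-3.2.0.tgz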
Next, we need to extract the downloaded file to access its contents:
$ tar -xvzf kafka_2.13-3.2.0.tgz
After running this command, a new directory should appear. You can verify this with the following command:
$ ls
kafka_2.13-3.2.0 kafka_2.13-3.2.0.tgz
Now that we've completed the download, change into the newly created folder and prepare for the next steps.
$ cd kafka_2.13-3.2.0
Starting the Kafka ZooKeeper
The initial step in launching our Kafka cluster is to start ZooKeeper. Apache Kafka includes a bash script for this purpose, located at bin/zookeeper-server-start.sh. Running it without arguments prints a usage message:
$ bin/zookeeper-server-start.sh
USAGE: bin/zookeeper-server-start.sh [-daemon] zookeeper.properties
As the usage message indicates, the script requires a properties file. Kafka provides a default configuration file for ZooKeeper at config/zookeeper.properties, which includes properties such as the port ZooKeeper listens on and the directory where it stores its data.
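For reference, the key defaults look roughly like this (abridged; the exact contents may vary slightly between releases):
$ cat config/zookeeper.properties
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0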
For now, we can use this default configuration without modification. Run the following command to start ZooKeeper:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
Keep this shell session open as we will need it throughout this tutorial.
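If you'd rather not dedicate a terminal to ZooKeeper, the usage message above shows a -daemon flag that runs it in the background instead:
$ bin/zookeeper-server-start.sh -daemon config/zookeeper.properties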
Starting the Kafka Server
Next, we will bring up the Kafka server. Again, a shell script is available for this task, at bin/kafka-server-start.sh. Running it without arguments prints its usage:
$ bin/kafka-server-start.sh
USAGE: bin/kafka-server-start.sh [-daemon] server.properties [--override property=value]*
As before, a properties file is required; Kafka supplies a default at config/server.properties. Let's pass it to the script:
$ bin/kafka-server-start.sh config/server.properties
At this point, the Kafka server will begin listening on port 9092.
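That port comes from config/server.properties. A few defaults worth knowing (abridged; exact contents vary by release) are the broker's ID, its log directory, and the ZooKeeper instance it connects to:
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181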
Now that our setup is complete, we can create a new topic, publish messages to it, and consume them.
Creating Your First Topic
In this section, we will establish our very first topic. Kafka provides a script to manage topics, found at bin/kafka-topics.sh. To create a topic named "my-first-topic," run the following command:
$ bin/kafka-topics.sh --create --topic my-first-topic --bootstrap-server localhost:9092
The --create flag indicates our intent to create a new topic, while the --bootstrap-server parameter specifies where the Kafka server is running.
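Since we passed neither --partitions nor --replication-factor, the topic is created with the broker's defaults; both flags can be supplied at creation time if you want more control. You can inspect the result with the same script's --describe flag:
$ bin/kafka-topics.sh --describe --topic my-first-topic --bootstrap-server localhost:9092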
Producing Messages to the New Topic
Now it's time to send some messages to our newly created topic. We will use another script, located at bin/kafka-console-producer.sh. Here's how to start publishing messages:
$ bin/kafka-console-producer.sh --topic my-first-topic --bootstrap-server localhost:9092
After executing this command, a prompt (>) appears in your terminal. Type a message and press Enter to publish it:
> First Event in my topic
> Second Event in my topic
You can continue publishing messages or exit by pressing Ctrl + C.
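As an optional experiment, the console producer can also attach a key to each message via its parse.key and key.separator properties; messages that share a key always land in the same partition. A minimal sketch (each input line must then contain the separator, here a colon):
$ bin/kafka-console-producer.sh --topic my-first-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
> user1:First keyed event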
Consuming Messages from the New Topic
Having published messages, the next step is to consume them. Kafka provides a consumer script located at bin/kafka-console-consumer.sh. Run it with the following command:
$ bin/kafka-console-consumer.sh --topic my-first-topic --from-beginning --bootstrap-server localhost:9092
Upon executing this command, the consumer will read and display messages from the topic:
First Event in my topic
Second Event in my topic
Feel free to publish more messages, and they will appear in the consumer output.
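To see how Kafka distributes work, you can also start two consumers that share a consumer group via the --group flag (my-first-group is just an illustrative name); each partition's messages are then delivered to exactly one member of the group:
$ bin/kafka-console-consumer.sh --topic my-first-topic --group my-first-group --bootstrap-server localhost:9092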
Congratulations! You have now worked through a complete, end-to-end example of running Kafka: starting ZooKeeper and the Kafka server, creating a topic, and publishing and consuming messages. Many additional configurations can be explored, such as multiple producers and consumers interacting with the same topic. Readers are encouraged to experiment further.
In conclusion, I hope you found this guide helpful. To continue your learning journey, check out additional articles, and don’t forget to follow me for updates. You can also connect with me on Twitter.
Beginner's Guide to Apache Kafka with Practical Projects
This video offers a hands-on introduction to Apache Kafka, guiding you through essential concepts and practical implementations.
Exploring Kafka in 10 Minutes: Your First Application
This quick tutorial helps you build your first Kafka application in under 10 minutes, ensuring a smooth entry into the world of Kafka.