
This is our first tutorial about Apache Kafka. We will start by learning what Apache Kafka is and why we need it in our Enterprise Integration layer.

Apache Kafka is an open-source, real-time publish-subscribe messaging system. It is distributed, partitioned, and replicated, and it is based on a commit log.

Every message in a Kafka topic is simply an array of bytes. Producers are the applications that publish information to Kafka: they send messages to Kafka topics, which can store any type of message. Every topic is further divided into partitions, and each partition stores messages in the order in which they arrive. Producers and consumers perform two main operations in Kafka: producers append messages to the end of a partition's write-ahead log files, and consumers fetch messages from the log files of a given topic partition. Physically, each topic is spread over several Kafka brokers, and each broker can host one or more partitions of the topic.
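To make the partition and offset vocabulary concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API: the `Topic` class and its `append`/`fetch` methods are hypothetical names chosen for illustration. Each partition is modeled as an append-only list of byte arrays, so a message's offset is simply its index in that list.

```python
class Topic:
    """Hypothetical in-memory model of a Kafka topic (not the real client API)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # Each partition is an append-only list of messages (raw bytes).
        self.partitions: list[list[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: bytes) -> int:
        """Producer side: append to the end of the partition log and return its offset."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def fetch(self, partition: int, offset: int, max_messages: int = 10) -> list[bytes]:
        """Consumer side: read up to max_messages, starting at offset, in arrival order."""
        return self.partitions[partition][offset:offset + max_messages]


topic = Topic("orders", num_partitions=2)
print(topic.append(0, b"order-1"))  # 0
print(topic.append(0, b"order-2"))  # 1
print(topic.append(1, b"order-3"))  # 0: offsets are counted per partition
print(topic.fetch(0, offset=1))     # [b'order-2']
```

Note that ordering is only defined inside a partition: `order-3` has offset 0 in partition 1 even though it arrived after the two messages in partition 0.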

Kafka use cases

In which layer of the architecture should you put Kafka? Here are some real enterprise use cases:
Commit logs: What happens when your system does not have a log system? In these cases, you can use Kafka. Systems often lack logs simply because they cannot handle the data volume. Stories of application servers failing because they could not write their logs at the verbosity the business needs are more common than it seems. Kafka can also help failed servers recover: when they restart, they can replay the log to restore their data.
Log aggregation: Contrary to common belief, much of the work of an onsite support team is log analysis. Kafka provides a log management system and can also aggregate several heterogeneous logs. Kafka physically collects the logs and abstracts away cumbersome details such as file location or format, presenting the data as a stream of messages. In addition, it offers low latency, supports multiple data sources, and allows distributed consumption.
Messaging: Enterprise systems are often heterogeneous, and instead of rewriting them, you have to translate between them. A vendor's adapters are often not an option for a company; in such cases Kafka is a good fit because it is open source and can handle more volume than many traditional commercial brokers.
Stream processing: In some business cases, collecting information goes through several stages. An obvious example is when a broker is used not only to gather information but also to transform it. This is the real meaning, and the reason for the success, of Enterprise Service Bus (ESB) architectures. With Kafka, the information can be collected and further enriched; this (very well paid) enrichment process is known as stream processing.
Record user activity: Many marketing and advertising companies are interested in recording all customer activity on a web page. This may seem a luxury, but until recently it was very difficult to keep track of every click a user makes on a site. For tasks like these, where the data volume is huge, you can use Kafka for real-time processing and monitoring.

All of this seems good, but who is using Kafka today? Here are some examples:

• LinkedIn: Used for activity streams and operational metrics.
• Uber: Relied on Kafka data feeds to bulk-load log data into Amazon S3 and to stream change-data logs from its local data centers.
• Twitter: Kafka is part of the stream processing infrastructure that handles five billion sessions a day in real time.
• Netflix: Kafka is the backbone of Netflix’s data pipeline for real-time monitoring and event processing.
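The commit-log use case described earlier relies on a simple idea: if every state change is written to an append-only log, a server that fails can rebuild its state by replaying the log, either from the beginning or from the offset where its last snapshot stopped. The Python sketch below models this with a plain list standing in for one Kafka partition. The `replay` function, the `(key, value)` record layout, and the convention that a `None` value means "delete this key" are illustrative assumptions, not part of any Kafka API.

```python
from typing import Optional

# A plain list stands in for one Kafka partition used as a commit log.
# Each record is a (key, value) pair; value None marks a deletion (assumption of this sketch).
Record = tuple[str, Optional[str]]


def replay(log: list[Record], from_offset: int = 0,
           state: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Rebuild key-value state by applying records in offset order, starting at from_offset."""
    state = {} if state is None else dict(state)  # copy so the caller's snapshot is untouched
    for key, value in log[from_offset:]:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


commit_log: list[Record] = [
    ("user:1", "active"),
    ("user:2", "active"),
    ("user:1", "suspended"),
    ("user:2", None),  # user:2 is deleted
]

# A server that restarts with no local state replays the whole log.
print(replay(commit_log))  # {'user:1': 'suspended'}

# A server that saved a snapshot covering offsets 0 and 1 only replays from offset 2 onward.
snapshot = replay(commit_log[:2])
print(replay(commit_log, from_offset=2, state=snapshot))  # {'user:1': 'suspended'}
```

Because the log is ordered and immutable, both recovery paths end in the same state; the snapshot simply shortens the replay.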

Kafka architecture

In Kafka, there are three types of clusters:

  • Single node–single broker
  • Single node–multiple brokers
  • Multiple nodes–multiple brokers
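In a single node–multiple broker setup, every broker process on the machine needs its own identity, network port, and data directory. The Python sketch below generates one `server.properties`-style configuration per broker. The property names `broker.id`, `listeners`, `log.dirs`, and `zookeeper.connect` are standard Kafka broker settings; the base port 9092, the `/tmp/kafka-logs-N` paths, and the ZooKeeper address `localhost:2181` are assumptions chosen for illustration.

```python
def broker_configs(num_brokers: int, base_port: int = 9092,
                   zookeeper: str = "localhost:2181") -> list[dict[str, str]]:
    """Build one configuration per broker for a single node running several brokers."""
    configs = []
    for broker_id in range(num_brokers):
        configs.append({
            "broker.id": str(broker_id),                                    # unique per broker
            "listeners": f"PLAINTEXT://localhost:{base_port + broker_id}",  # unique port
            "log.dirs": f"/tmp/kafka-logs-{broker_id}",                     # unique data directory
            "zookeeper.connect": zookeeper,                                 # shared coordinator
        })
    return configs


def to_properties(config: dict[str, str]) -> str:
    """Render a config dict in Java .properties syntax (one key=value pair per line)."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


for cfg in broker_configs(2):
    print(to_properties(cfg))
    print()
```

In a multiple node–multiple broker cluster the same settings apply, except that each broker runs on its own host, so the port and directory layout can be reused as long as every `broker.id` stays unique and `listeners` uses the host's own name.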

A Kafka cluster has five main components:
Topic: A category or feed name to which producers publish messages. Topics are partitioned, and each partition is an ordered, immutable sequence of messages. The cluster keeps a partitioned log for each topic. Each message in a partition has a unique sequential id called an offset.
Broker: A Kafka cluster consists of one or more physical servers, each of which may run one or more server processes. Each server process is called a broker. Topics live in the broker processes.
Producer: Publishes data to topics by choosing the appropriate partition within the topic. For load balancing, messages can be allocated to the topic's partitions in a round-robin fashion or by a custom function.
Consumer: An application or process that subscribes to topics and processes the feed of published messages.
ZooKeeper: ZooKeeper is the coordinator between the brokers and the consumers. It coordinates the distributed processes through a shared hierarchical namespace of data registers called znodes.
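The producer's choice of partition can be illustrated with the two strategies mentioned above: a round-robin partitioner that spreads messages evenly, and a custom function that maps a message key to a partition so that messages with the same key always land in the same partition and therefore keep their relative order. This is a hypothetical Python sketch, not the Kafka producer API. In particular, it hashes keys with `zlib.crc32`, whereas the Java client's default partitioner uses murmur2, so the partition numbers will not match a real cluster.

```python
import itertools
import zlib
from typing import Callable


def round_robin_partitioner(num_partitions: int) -> Callable[[bytes], int]:
    """Return a partitioner that cycles through partitions 0, 1, ..., n-1, 0, ..."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key: next(counter)  # the key is ignored


def key_hash_partitioner(num_partitions: int) -> Callable[[bytes], int]:
    """Return a custom partitioner: same key -> same partition (crc32, not Kafka's murmur2)."""
    return lambda key: zlib.crc32(key) % num_partitions


rr = round_robin_partitioner(3)
print([rr(b"ignored") for _ in range(5)])  # [0, 1, 2, 0, 1]

by_key = key_hash_partitioner(3)
print(by_key(b"customer-42") == by_key(b"customer-42"))  # True
```

Round-robin balances load but scatters the messages of one key across partitions, so their relative order is lost; keyed partitioning preserves per-key order at the cost of possible skew when some keys are much busier than others.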

Continue learning Apache Kafka in the next tutorial: Kafka tutorial #2: Getting started with Kafka
