Apache Kafka vs RabbitMQ


RabbitMQ is an open-source message-broker software that originally implemented the Advanced Message Queuing Protocol (AMQP) and has since been extended with a plug-in architecture to support Streaming Text Oriented Messaging Protocol (STOMP), MQ Telemetry Transport (MQTT), and other protocols.

Written in Erlang, the RabbitMQ server is built on the Open Telecom Platform framework for clustering and failover. Client libraries to interface with the broker are available for all major programming languages. The source code is released under the Mozilla Public License.

Messaging

In RabbitMQ, messages are stored until a receiving application connects and receives a message off the queue. The client can either ack (acknowledge) the message when it receives it or when the client has completely processed the message. In either situation, once the message is acked, it’s removed from the queue.

unlike most messaging systems, the message queue in Kafka is persistent. The data sent is stored until a specified retention period has passed, either a period of time or a size limit. The message stays in the queue until the retention period/size limit is exceeded, meaning the message is not removed once it’s consumed. Instead, it can be replayed or consumed multiple times, which is a setting that can be adjusted.

Protocol

RabbitMQ supports several standardized protocols such as AMQP, MQTT, STOMP, etc, where it natively implements AMQP 0.9.1. The use of a standardized message protocol allows you to replace your RabbitMQ broker with any AMQP based broker.

Kafka uses a custom protocol, on top of TCP/IP for communication between applications and the cluster. Kafka can’t simply be removed and replaced, since its the only software implementing this protocol.

The ability of RabbitMQ to support different protocols means that it can be used in many different scenarios. The newest version of AMQP differs drastically from the officially supported release, 0.9.1. It is unlikely that RabbitMQ will deviate from AMQP 0.9.1. Version 1.0 of the protocol released on October 30, 2011 but has not gained widespread support from developers. AMQP 1.0 is available via plugin.

Pull vs Push approach

RabbitMQ is push-based, while Kafka is pull-based. With push-based systems, messages are immediately pushed to any subscribed consumer. In pull-based systems, the brokers waits for the consumer to ask for data. If a consumer is late, it can catch up later.

Routing

RabbitMQ’s benefits is the ability to flexibly route messages. Direct or regular expression-based routing allows messages to reach specific queues without additional code. RabbitMQ has four different routing options: direct, topic, fanout, and header exchanges. Direct exchanges route messages to all queues with an exact match for something called a routing key. The fanout exchange can broadcast a message to every queue that is bound to the exchange. The topics method is similar to direct as it uses a routing key but allows for wildcard matching along with exact matching.

Kafka does not support routing; Kafka topics are divided into partitions which contain messages in an unchangeable sequence. You can make use of consumer groups and persistent topics as a substitute for the routing in RabbitMQ, where you send all messages to one topic, but let your consumer groups subscribe from different offsets.

Message Priority

RabbitMQ supports priority queues, a queue can be set to have a range of priorities. The priority of each message can be set when it is published. Depending on the priority of the message it is placed in the appropriate priority queue. Here follows a simple example: We are running database backups every day, for our hosted database service. Thousands of backup events are added to RabbitMQ without order. A customer can also trigger a backup on demand, and if that happens, a new backup event is added to the queue, but with a higher priority.

A message cannot be sent with a priority level, nor be delivered in priority order, in Kafka. All messages in Kafka are stored and delivered in the order in which they are received regardless of how busy the consumer side is.

License

RabbitMQ was originally created by Rabbit Technologies Ltd. The project became part of Pivotal Software in May 2013. The source code for RabbitMQ is released under the Mozilla Public License. The license has never changed (as of Nov. 2019).

Kafka was originally created at LinkedIn. It was given open-source status and passed to the Apache Foundation in 2011. Apache Kafka is covered by the Apache 2.0 license. 

Maturity

RabbitMQ has been in the market for a longer time than Kafka – 2007 & 2011 respectively. Both RabbitMQ and Kafka are “mature”, they both are considered to be reliable and scalable messaging systems.

Ideal use case

Kafka is ideal for big data use cases that require the best throughput, while RabbitMQ is ideal for low latency message delivery, guarantees on a per-message basis, and complex routing.

Summary

ToolApache KafkaRabbitMQ
Message orderingMessages are sent to topics by message key.
Provides message ordering due to its partitioning.
Not supported.
Message lifetimeKafka persists messages and is a log, this is managed by specifying a retention policyRabbitMQ is a queue, so messages are removed once they are consumed, and acknowledgment is provided.
Delivery GuaranteesRetains order only inside a partition. In a partition, Kafka guarantees that the whole batch of messages either fails or passes.Atomicity is not guaranteed
Message prioritiesNot supportedIn RabbitMQ, priorities can be specified for consuming messages on basis of high and low priorities

References

Apache Kafka – Java Producer & Consumer


Apache Kakfa is an opensource distributed event streaming platform which works based on publish/subscribe messaging system. That means, there would be a producer who publishes messages to Kafka and a consumer who reads messages from Kafka. In between, Kafka acts like a filesystem or database commit log.

In this article we will discuss about writing a Kafka producer and consumer using Java with customized serialization and deserializations.

Kakfa Producer Application:

Producer is the one who publish the messages to Kafka topic. Topic is a partitioner in Kafka environment, it is very similar to a folder in a file system. In the below example program, messages are getting published to Kafka topic ‘kafka-message-count-topic‘.

package com.malliktalksjava.kafka.producer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import com.malliktalksjava.kafka.constants.KafkaConstants;
import com.malliktalksjava.kafka.util.CustomPartitioner;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KafkaSampleProducer {

    static Logger log = LoggerFactory.getLogger(KafkaSampleProducer.class);

    public static void main(String[] args) {
        runProducer();
    }

    static void runProducer() {
        Producer<Long, String> producer = createProducer();

        for (int index = 0; index < KafkaConstants.MESSAGE_COUNT; index++) {
            ProducerRecord<Long, String> record = new ProducerRecord<Long, String>(KafkaConstants.TOPIC_NAME,
                    "This is record " + index);
            try {
                RecordMetadata metadata = producer.send(record).get();
                //log.info("Record sent with key " + index + " to partition " + metadata.partition() +
                 //       " with offset " + metadata.offset());
                System.out.println("Record sent with key " + index + " to partition " + metadata.partition() +
                        " with offset " + metadata.offset());
            } catch (ExecutionException e) {
                log.error("Error in sending record", e);
                e.printStackTrace();
            } catch (InterruptedException e) {
                log.error("Error in sending record", e);
                e.printStackTrace();
            }
        }
    }

    public static Producer<Long, String> createProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.KAFKA_BROKERS);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, KafkaConstants.CLIENT_ID);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class.getName());
        return new KafkaProducer<>(props);
    }
}

Kakfa Consumer Program:

Consumer is the one who subscribe to Kafka topic to read the messages. There are different ways to read the messages from Kafka, below example polls the topic for every thousend milli seconds to fetch the messages from Kafka.

package com.malliktalksjava.kafka.consumer;

import java.util.Collections;
import java.util.Properties;

import com.malliktalksjava.kafka.constants.KafkaConstants;
import com.malliktalksjava.kafka.producer.KafkaSampleProducer;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KafkaSampleConsumer {
    static Logger log = LoggerFactory.getLogger(KafkaSampleProducer.class);

    public static void main(String[] args) {
        runConsumer();
    }

    static void runConsumer() {
        Consumer<Long, String> consumer = createConsumer();
        int noMessageFound = 0;
        while (true) {
            ConsumerRecords<Long, String> consumerRecords = consumer.poll(1000);
            // 1000 is the time in milliseconds consumer will wait if no record is found at broker.
            if (consumerRecords.count() == 0) {
                noMessageFound++;
                if (noMessageFound > KafkaConstants.MAX_NO_MESSAGE_FOUND_COUNT)
                    // If no message found count is reached to threshold exit loop.
                    break;
                else
                    continue;
            }

            //print each record.
            consumerRecords.forEach(record -> {
                System.out.println("Record Key " + record.key());
                System.out.println("Record value " + record.value());
                System.out.println("Record partition " + record.partition());
                System.out.println("Record offset " + record.offset());
            });

            // commits the offset of record to broker.
            consumer.commitAsync();
        }
        consumer.close();
    }

        public static Consumer<Long, String> createConsumer() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaConstants.KAFKA_BROKERS);
            props.put(ConsumerConfig.GROUP_ID_CONFIG, KafkaConstants.GROUP_ID_CONFIG);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, KafkaConstants.MAX_POLL_RECORDS);
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, KafkaConstants.OFFSET_RESET_EARLIER);

            Consumer<Long, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList(KafkaConstants.TOPIC_NAME));
            return consumer;
        }
}

Messages will be published to a Kafka partition called Topic. A Kafka topic is sub-divided into units called partitions for fault tolerance and scalability.

Every Record in Kafka has key value pairs, while publishing messages key is optional. If you don’t pass the key, Kafka will assign its own key for each message. In Above example, ProducerRecord<Integer, String> is the message that published to Kafka has Integer type as key and String as value.

Message Model Class: Below model class is used to publish the object. Refer to below descriptions on how this class being used in the application.

package com.malliktalksjava.kafka.model;

import java.io.Serializable;

public class Message implements Serializable{

    private static final long serialVersionUID = 1L;

    private String id;
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

Constants class: All the constants related to this application have been placed into below class.

package com.malliktalksjava.kafka.constants;

public class KafkaConstants {

    public static String KAFKA_BROKERS = "localhost:9092";

    public static Integer MESSAGE_COUNT=100;

    public static String CLIENT_ID="client1";

    public static String TOPIC_NAME="kafka-message-count-topic";

    public static String GROUP_ID_CONFIG="consumerGroup1";

    public static String GROUP_ID_CONFIG_2 ="consumerGroup2";

    public static Integer MAX_NO_MESSAGE_FOUND_COUNT=100;

    public static String OFFSET_RESET_LATEST="latest";

    public static String OFFSET_RESET_EARLIER="earliest";

    public static Integer MAX_POLL_RECORDS=1;
}

Custom Serializer: Serializer is the class which converts java objects to write into disk. Below custom serializer is converting the Message object to JSON String. Serialized message will be placed into Kafka Topic, this message can’t be read until it is deserialized by the consumer.

package com.malliktalksjava.kafka.util;

import java.util.Map;
import com.malliktalksjava.kafka.model.Message;
import org.apache.kafka.common.serialization.Serializer;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CustomSerializer implements Serializer<Message> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {

    }

    @Override
    public byte[] serialize(String topic, Message data) {
        byte[] retVal = null;
        ObjectMapper objectMapper = new ObjectMapper();
        try {
            retVal = objectMapper.writeValueAsString(data).getBytes();
        } catch (Exception exception) {
            System.out.println("Error in serializing object"+ data);
        }
        return retVal;
    }
    @Override
    public void close() {

    }
}

Custom Deserializer: Below custom deserializer, converts the serealized object coming from Kafka into Java object.

package com.malliktalksjava.kafka.util;

import java.util.Map;

import com.malliktalksjava.kafka.model.Message;
import org.apache.kafka.common.serialization.Deserializer;

import com.fasterxml.jackson.databind.ObjectMapper;

public class CustomObjectDeserializer implements Deserializer<Message> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    @Override
    public Message deserialize(String topic, byte[] data) {
        ObjectMapper mapper = new ObjectMapper();
        Message object = null;
        try {
            object = mapper.readValue(data, Message.class);
        } catch (Exception exception) {
            System.out.println("Error in deserializing bytes "+ exception);
        }
        return object;
    }

    @Override
    public void close() {
    }
}

Custom Partitioner: If you would like to do any custom settings for Kafka, you can do that using the java code. Below is the sample custom partitioner created as part of this applicaiton.

package com.malliktalksjava.kafka.util;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import java.util.Map;

public class CustomPartitioner implements Partitioner{

    private static final int PARTITION_COUNT=50;

    @Override
    public void configure(Map<String, ?> configs) {

    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        Integer keyInt=Integer.parseInt(key.toString());
        return keyInt % PARTITION_COUNT;
    }

    @Override
    public void close() {
    }
}

Here is the GIT Hub link for the program: https://github.com/mallikarjungunda/kafka-producer-consumer

Hope you liked the details, please share your feedback in comments.

Apache Kafka – Environment Setup


Apache Kakfa is an opensource distributed event streaming platform which works based on publish/subscribe messaging system. That means, there would be a producer who publishes messages to Kafka and a consumer who reads messages from Kafka. In between, Kafka acts like a filesystem or database commit log.

In this post we will setup kafka local environment, create topic, publish and consume messages using console clients.

Step 1: Download latest version of Apache Kafka from Apache Kafka website: https://kafka.apache.org/downloads.

Extract the folder into your local and navigate to the folder in Terminal session (if Mac) or command line (if windows):

$ tar -xzf kafka_2.13-3.1.0.tgz 
$ cd kafka_2.13-3.1.0

Step 2: Run Kafka in your local:

Run zookeeper using the below command terminal/command line window 1:

# Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Run Kafka using the below command in another terminal or command line:

# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

Note: You must have Java8 or above in your machine to run Kafka.

Once above two services are run successfully in local, you are set with running Kafka in your local machine.

Step 3: Create topic in Kafka to produce/consume the message in another terminal or command like. In below example, topic name is ‘order-details’ and kafka broker is running in my localhost 9092 port.

$ bin/kafka-topics.sh --create --topic order-details --bootstrap-server localhost:9092

If needed, use describe topic to understand more details about topic created above:

$ bin/kafka-topics.sh --describe --topic order-details --bootstrap-server localhost:9092

Output looks like below:
Topic: order-details	PartitionCount: 1	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: order-details	Partition: 0	Leader: 0	Replicas: 0	Isr: 0

Step 4: Write events to topic

Run the console producer client to write a few events into your topic. By default, each line you enter will result in a separate event being written to the topic.

$ bin/kafka-console-producer.sh --topic order-details --bootstrap-server localhost:9092
Order 1 details
Order 2 details

Step 5: Read events from Kafka

Open another terminal session/command line and run the console consumer client to read the events you just created:

$ bin/kafka-console-consumer.sh --topic order-details --from-beginning --bootstrap-server localhost:9092
Order 1 details
Order 2 details

Conclusion:

By completing all the above steps, you learned about setting up kafka environment, creating topics, producing the messages using console producer and consming the message using console consumer.