Apache Pulsar is a publish-subscribe messaging system made up of brokers, Apache BookKeeper, producers, consumers, and other components.
Apache Pulsar separates computing from storage: the logic for publishing and subscribing to messages runs in the brokers, while data is stored on the bookie nodes of an Apache BookKeeper cluster.
A topic is a named category to which messages are published and in which they are stored. Producers write messages to topics, and consumers read messages from those topics.
Pulsar topics are either partitioned or non-partitioned, with the latter referring to a topic with a single partition. A partitioned topic is in fact a virtual concept in Pulsar: a 3-partition topic is backed by three internal topics, one per partition, and messages sent to the 3-partition topic are distributed among those three.
For example, if a producer sends messages to a 3-partition topic named my-topic, each message is actually sent to one of the three partitions my-topic-partition-0 through my-topic-partition-2, either evenly or according to a rule (if a key is specified).
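The routing behavior described above can be sketched roughly as follows. This is a minimal illustration, not Pulsar's client code: the PartitionRouter class and the partition_names helper are hypothetical names, and CRC32 stands in for whatever hash function the real client is configured to use. Only the partition naming pattern (topic name plus -partition- plus a zero-based index) follows Pulsar's convention.

```python
import itertools
import zlib


def partition_names(topic, num_partitions):
    """Return the internal topic name behind each partition of `topic`."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


class PartitionRouter:
    """Hypothetical router: round-robin without a key, hash-based with a key."""

    def __init__(self, topic, num_partitions):
        self.partitions = partition_names(topic, num_partitions)
        # Endless 0, 1, 2, 0, 1, 2, ... sequence for unkeyed messages.
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            index = next(self._round_robin)
        else:
            # Key given: the same key always maps to the same partition.
            # CRC32 is an assumption used only for illustration.
            index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]


router = PartitionRouter("my-topic", 3)
for _ in range(3):
    print(router.choose())           # cycles through the three partitions
print(router.choose(key="order-42"))  # stable for a given key
```

Because a keyed message always lands on the same partition, ordering can be preserved per key even though the topic as a whole is spread over several partitions.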
When the data of a partitioned topic is persisted, the partition is only a logical concept; the actual storage unit is the segment.
For example, data in the Topic1-Part2 partition consists of N segments. Each segment has 3 replicas, and the segments are evenly distributed across multiple bookie nodes in the Apache BookKeeper cluster.
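To make the segment layout concrete, here is a simplified model of how the segments of one partition might be spread over bookies with 3 replicas each. It is only a sketch under stated assumptions: place_segments is a hypothetical helper, the bookie names are invented, and the round-robin rule is a stand-in for illustration, not BookKeeper's actual placement policy.

```python
def place_segments(num_segments, bookies, replicas=3):
    """Assign each segment to `replicas` distinct bookies, rotating the start.

    Simplified round-robin model (an assumption), not BookKeeper's real policy.
    """
    if replicas > len(bookies):
        raise ValueError("need at least as many bookies as replicas")
    placement = {}
    for segment in range(num_segments):
        # Start at a different bookie for each segment so load spreads evenly.
        placement[segment] = [
            bookies[(segment + offset) % len(bookies)]
            for offset in range(replicas)
        ]
    return placement


# Hypothetical 5-node cluster holding 4 segments of one partition.
bookies = [f"bookie-{i}" for i in range(1, 6)]
for segment, nodes in place_segments(4, bookies).items():
    print(f"segment {segment}: {nodes}")
```

In this model no single bookie holds the whole partition, so losing one node affects only the segments that keep a replica on it, and each of those still has two other copies.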
Logical partitions and physical partitions are compared as follows:
Physical partition: computing and storage are coupled. Fault tolerance requires copying entire physical partitions, and capacity expansion requires migrating physical partitions to rebalance the load.
Logical partition: the physical storage unit is the segment, and the computing layer is isolated from the storage layer.
With this structure, Apache Pulsar has the following strengths: