How Do You Troubleshoot Production or Consumption Issues with the Sarama Go Cluster Client?
The sarama cluster client is no longer maintained and has several known defects. In the underlying consumer code, PartitionConsumer sends a FetchRequest and decodes the returned FetchResponse. The sarama cluster implementation does not follow the standard protocol when decoding, so even when the server returns complete metadata, the client may parse only part of it. This can cause production or consumption to fail with the characteristic error: response did not contain all the expected topic/partition blocks.
Example scenario:
When the specification of a customer's instance is adjusted, adding nodes triggers a metadata update on the client. Suppose topic test has three partitions and topic a has two. The broker returns complete metadata, but the entries may arrive interleaved rather than grouped by topic, for example in the order test-0, a-0, test-1, a-1, test-2. When parsing such a response, the sarama cluster client may end up with incomplete metadata (for example, only a-1 and test-2). This triggers the error above, in which the response did not contain all the expected partition metadata, and ultimately causes production or consumption to fail.
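The effect described in this scenario can be reproduced with a small standalone sketch. It is not the sarama cluster source code: the block type and both parser functions are hypothetical names introduced only for illustration. The first parser assumes that all partitions of a topic arrive contiguously and allocates a fresh map whenever the topic name changes; the second merges entries regardless of order.

```go
package main

import "fmt"

// block is a hypothetical stand-in for one topic/partition entry in a response.
type block struct {
	Topic     string
	Partition int32
}

// parseOverwriting assumes partitions of one topic are contiguous. Each time
// the topic name changes it allocates a new map for that topic, which
// discards any partitions of that topic recorded earlier.
func parseOverwriting(blocks []block) map[string]map[int32]bool {
	out := make(map[string]map[int32]bool)
	prev := ""
	for i, b := range blocks {
		if i == 0 || b.Topic != prev {
			out[b.Topic] = make(map[int32]bool) // earlier entries are lost here
			prev = b.Topic
		}
		out[b.Topic][b.Partition] = true
	}
	return out
}

// parseMerging tolerates any ordering: it creates a topic's map only once
// and adds every partition to it.
func parseMerging(blocks []block) map[string]map[int32]bool {
	out := make(map[string]map[int32]bool)
	for _, b := range blocks {
		if out[b.Topic] == nil {
			out[b.Topic] = make(map[int32]bool)
		}
		out[b.Topic][b.Partition] = true
	}
	return out
}

// count returns the total number of topic/partition entries retained.
func count(m map[string]map[int32]bool) int {
	n := 0
	for _, parts := range m {
		n += len(parts)
	}
	return n
}

func main() {
	// The interleaved order from the scenario: test-0, a-0, test-1, a-1, test-2.
	resp := []block{{"test", 0}, {"a", 0}, {"test", 1}, {"a", 1}, {"test", 2}}
	fmt.Println("overwriting parser kept:", count(parseOverwriting(resp)), "of", len(resp))
	fmt.Println("merging parser kept:", count(parseMerging(resp)), "of", len(resp))
}
```

With the interleaved input, the overwriting parser retains only a-1 and test-2, while the merging parser retains all five entries.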
In this scenario, clients that implement the standard protocol (such as Java and confluent go) are not affected. Because specification adjustments frequently trigger metadata updates, customers should check whether they are using the sarama cluster client before performing one. If they are, it is strongly recommended to migrate to the confluent go client before the specification adjustment. If migration is not possible, restarting the client immediately after the issue occurs may restore normal operation.
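When migration is not possible, the restart fallback can be automated along the following lines. This is a minimal sketch using only the Go standard library. The names isIncompleteResponse, runWithRebuild, and flakySession are hypothetical; the session callback stands in for whatever code creates the client, runs it until it fails, and closes it. The check matches the error text quoted in this article rather than any specific library error value.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// incompleteMarker is the error text quoted in this article.
const incompleteMarker = "response did not contain all the expected topic/partition blocks"

// isIncompleteResponse reports whether err carries the characteristic text.
func isIncompleteResponse(err error) bool {
	return err != nil && strings.Contains(err.Error(), incompleteMarker)
}

// runWithRebuild runs session repeatedly. Each call to session is assumed to
// build a fresh client, run it until it stops, and close it. The client is
// rebuilt only when the characteristic error is seen, at most maxRebuilds
// times; any other result (including nil) is returned as-is.
func runWithRebuild(maxRebuilds int, session func() error) (int, error) {
	rebuilds := 0
	for {
		err := session()
		if !isIncompleteResponse(err) || rebuilds >= maxRebuilds {
			return rebuilds, err
		}
		rebuilds++
	}
}

// flakySession simulates a client session that fails with the characteristic
// error for the first `failures` calls and then finishes cleanly.
func flakySession(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("kafka: " + incompleteMarker)
		}
		return nil
	}
}

func main() {
	rebuilds, err := runWithRebuild(5, flakySession(2))
	fmt.Println("rebuilds:", rebuilds, "final error:", err)
}
```

Capping the number of rebuilds keeps a persistent fault from turning into an endless restart loop; once the limit is reached, the error is returned so that it can be alerted on.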