
Compress Data
Last updated: 2025-09-19 15:54:39

Scenarios

Data compression reduces network I/O traffic and disk usage. This document describes the message formats that support compression and how to configure data compression based on your requirements.

Message Format

Currently, CKafka supports two types of message formats: V1 and V2 (introduced in 0.11.0.0). CKafka is compatible with message formats from versions 0.9, 0.10, 1.1, 2.4, 2.8, and 3.2.
Different versions correspond to different configurations. The details are as follows:
Message format conversion mainly exists for compatibility with earlier versions of consumer programs. A CKafka cluster usually stores multiple message format versions (V1/V2) at the same time.
The broker converts new-format messages to the old format, which involves decompressing and recompressing the messages.
Message format conversion significantly impacts performance: besides adding extra decompression operations, it also causes CKafka to lose its zero-copy optimization. Therefore, keep message formats unified.
Zero-copy: avoids expensive copies of data through user space when transmitting data between disk and network, thereby achieving rapid data transfer.
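As a concrete illustration of the zero-copy API mentioned above, the sketch below uses Java's FileChannel.transferTo, which asks the kernel to move file bytes directly to a target channel. The class name, payload, and in-memory sink are purely illustrative (with a real socket channel as the target, the bytes would never enter user space).

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Write the payload to a temp file, then move it to the sink via
    // FileChannel.transferTo. With a socket as the target, the kernel can
    // perform the copy without routing the bytes through user space.
    static String transfer(String payload) throws Exception {
        Path src = Files.createTempFile("segment", ".log");
        Files.write(src, payload.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
            in.transferTo(0, in.size(), Channels.newChannel(sink));
        } finally {
            Files.delete(src);
        }
        return sink.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("hello ckafka")); // prints "hello ckafka"
    }
}
```

This is why format conversion hurts: once the broker must decompress and recompress a batch, it can no longer hand the stored bytes straight to transferTo.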

Compression Algorithm Comparison

Considering the performance impact on CPU and stability, the official recommendation is to use the Snappy algorithm. The analysis is as follows:
To evaluate a compression algorithm, there are two key metrics: compression ratio and compression/decompression throughput. CKafka versions prior to 2.1.0 support three compression algorithms: GZIP, Snappy, and LZ4. In actual usage of CKafka, the performance metrics of these three algorithms compare as follows:
Compression ratio: LZ4 > GZIP > Snappy
Throughput: LZ4 > Snappy > GZIP
Physical resource usage is as follows:
Bandwidth: Snappy occupies the most network bandwidth as it has the lowest compression ratio.
CPU: Snappy uses more CPU during compression, while GZIP uses more CPU during decompression.
Therefore, the recommended order of the three compression algorithms is normally LZ4 > GZIP > Snappy.
After long-term production environment tests, the above conclusion holds true in most cases. However, in specific extreme cases, the LZ4 compression algorithm can cause CPU load to increase.
Analysis shows that business source data content varies, so the compression algorithms perform differently on it. Therefore, we recommend that users sensitive to CPU metrics adopt the more stable Snappy compression algorithm.
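To make the compression-ratio metric concrete, here is a minimal JDK-only sketch that measures GZIP's compressed size on a repetitive payload (Snappy and LZ4 require third-party libraries, so only GZIP is shown). The class name and payload are illustrative; as noted above, real ratios depend heavily on your source data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class RatioDemo {
    // Compress the data with GZIP and return the compressed size in bytes.
    static int gzipSize(byte[] data) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.size();
    }

    public static void main(String[] args) throws Exception {
        // Highly repetitive payload compresses well; random-like data would not.
        byte[] payload = "log-line ".repeat(10_000).getBytes(StandardCharsets.UTF_8);
        double ratio = (double) payload.length / gzipSize(payload);
        System.out.printf("original=%d compressed=%d ratio=%.1f%n",
                payload.length, gzipSize(payload), ratio);
    }
}
```

Running the same measurement over a sample of your actual messages is a quick way to decide whether a higher-ratio codec is worth its CPU cost.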
Note:
CKafka does not recommend using the GZIP compression algorithm. Enabling GZIP compression consumes additional CPU on the Kafka server. Based on performance test data, if GZIP compression is enabled, it is advisable to reserve about 75% bandwidth headroom (the reservation ratio is for reference only; in actual use, judge based on specific monitoring data).
For example: For an instance with 40MB/s bandwidth, after enabling GZIP compression, we recommend increasing the bandwidth to 40/(1-75%) = 160MB/s.
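The headroom arithmetic above can be captured in a small helper. This is only a sketch of the formula in the example; the 75% reserve ratio comes from the guideline above and should be tuned with real monitoring data.

```java
public class BandwidthHeadroom {
    // Recommended bandwidth after enabling GZIP, given the current bandwidth
    // and a reserve ratio (e.g. 0.75 per the guideline above).
    static double recommended(double currentMBps, double reserveRatio) {
        return currentMBps / (1 - reserveRatio);
    }

    public static void main(String[] args) {
        // 40 MB/s with a 75% reserve: 40 / (1 - 0.75)
        System.out.println(recommended(40, 0.75)); // prints 160.0
    }
}
```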

Configuring Data Compression

Producers can configure data compression using the following method:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Once compression is enabled, every message batch the producer sends is
// compressed, saving network bandwidth and disk space on the Kafka broker side.
// Note that different versions correspond to different configurations:
// compression is not allowed in version 0.9 and below, and GZIP compression
// is not supported by default in version 1.1 and below.
props.put("compression.type", "snappy");
Producer<String, String> producer = new KafkaProducer<>(props);

In most cases, after receiving a message from the producer, the broker stores it as is without any modification.

Notes

When data is sent to CKafka, compression.codec cannot be set.
GZIP compression format is not supported by default in version 1.1 and below.
GZIP compression consumes relatively high CPU, and using it may cause messages to become invalid. CKafka does not recommend using GZIP compression.
Enabling GZIP increases CPU usage and can become a bottleneck. If you enable it, we recommend increasing the linger.ms and batch.size settings on the producer side.
If a program cannot run normally when using the LZ4 compression method, a possible cause is an incorrect message format. Check the CKafka version and confirm that the message format is correct.
Different CKafka clients have different SDK settings. You can consult the open-source community documentation (for example, the C/C++ client instructions) to set the message format version.
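Applying the linger.ms / batch.size advice from the notes above, here is a sketch of producer properties tuned for GZIP. The bootstrap address is a placeholder, and the batching values are illustrative starting points, not recommended constants.

```java
import java.util.Properties;

public class GzipProducerConfig {
    // Producer properties for GZIP, with batching tuned to amortize its CPU
    // cost, following the note above about raising linger.ms and batch.size.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("compression.type", "gzip");
        props.put("linger.ms", "50");      // wait up to 50 ms to fill a batch
        props.put("batch.size", "65536");  // 64 KB batches amortize GZIP CPU cost
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Larger, fuller batches mean the producer compresses fewer, bigger payloads, which is where GZIP's high per-call CPU cost is easiest to absorb.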
