If big key or hot key issues are not handled in time, your Redis instance may suffer performance degradation and access timeouts during daily business operations, which leads to a poor user experience and can even cause large-scale instance failures. This document describes the causes of big keys and hot keys, along with the relevant troubleshooting and optimization solutions.
Definition
Big key
A big key is a key whose value is large and occupies a lot of memory. In essence, it is a big value problem. Common examples for Redis data types are as follows.
For a string value, the value exceeds 10 MB (the value is too large).
For a set value, the number of members reaches 10,000 (too many members).
For a list value, the number of members reaches 10,000 (too many members).
For a hash value, there are only 1,000 members, but their total size is 1,000 MB (the total size of the members is too large).
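The examples above can be turned into a simple check. The sketch below is a minimal illustration, assuming you have already measured each key, for example with the Redis commands MEMORY USAGE, STRLEN, LLEN, SCARD, HLEN, or ZCARD. The names KeyStats and big_key_reasons and all thresholds are hypothetical; the 100 MB limit on a collection's total size in particular is an assumption, because the text gives no exact figure.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024

# Hypothetical thresholds modeled on the examples above (not limits enforced by Redis).
STRING_LIMIT_BYTES = 10 * MB       # string value larger than 10 MB
MEMBER_LIMIT = 10_000              # list/set/hash/zset with 10,000 or more members
COLLECTION_LIMIT_BYTES = 100 * MB  # assumed cap on the total size of a collection


@dataclass
class KeyStats:
    name: str
    kind: str          # "string", "list", "set", "hash" or "zset"
    size_bytes: int    # e.g. measured with MEMORY USAGE <key>
    members: int = 0   # e.g. from LLEN, SCARD, HLEN or ZCARD


def big_key_reasons(stats: KeyStats) -> List[str]:
    """Return why a key counts as big; an empty list means it does not."""
    reasons: List[str] = []
    if stats.kind == "string":
        if stats.size_bytes > STRING_LIMIT_BYTES:
            reasons.append("value too large")
    else:
        if stats.members >= MEMBER_LIMIT:
            reasons.append("too many members")
        if stats.size_bytes > COLLECTION_LIMIT_BYTES:
            reasons.append("total member size too large")
    return reasons


samples = [
    KeyStats("video:blob", "string", 11 * MB),
    KeyStats("online:users", "set", 2 * MB, members=10_000),
    KeyStats("user:profile", "hash", 1_000 * MB, members=1_000),
    KeyStats("session:42", "string", 512),
]
for s in samples:
    print(s.name, big_key_reasons(s) or "ok")
```

Keeping the measurement separate from the check means the same function works whether the numbers come from a scan script or from a monitoring export.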
Hot key
A hot key is a key that receives far more requests than other keys over a period of time and accounts for a large share of the QPS of a Redis instance. A key whose access consumes a large share of CPU or bandwidth is also considered a hot key. Common examples are shown below.
When the total QPS (commands executed per second) of a Redis instance is 10,000 and a single key receives 7,000 of those requests per second, that key could be a hot key.
When a hash key with 2,000 fields receives a large number of HGETALL requests per second, that key could be a hot key.
When a sorted set key with 10,000 members receives a large number of ZRANGE requests per second, that key could be a hot key.
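The first example can be detected by comparing each key's share of requests with the instance total. This is a minimal sketch, assuming you already have a sample of the key names accessed during a time window (for example, collected in your client). The function name find_hot_keys and the 20% share threshold are hypothetical; the sample reproduces the 7,000-of-10,000 case above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_hot_keys(
    accessed_keys: Iterable[str],
    window_seconds: float,
    share_threshold: float = 0.2,  # hypothetical: flag keys with at least 20% of requests
) -> List[Tuple[str, float, float]]:
    """Return (key, qps, share) for keys taking at least `share_threshold` of all requests."""
    counts = Counter(accessed_keys)
    total = sum(counts.values())
    hot: List[Tuple[str, float, float]] = []
    if total == 0:
        return hot
    # most_common() yields keys from most to least accessed, so we can stop early.
    for key, hits in counts.most_common():
        share = hits / total
        if share < share_threshold:
            break
        hot.append((key, hits / window_seconds, share))
    return hot


# One second of sampled traffic: 10,000 requests, 7,000 of them to one key.
sample = ["product:1001"] * 7_000 + [f"user:{i}" for i in range(3_000)]
print(find_hot_keys(sample, window_seconds=1.0))  # [('product:1001', 7000.0, 0.7)]
```

Request count alone misses the second and third examples, where a moderate number of expensive HGETALL or ZRANGE calls makes a key hot; weighting each access by its cost would be a natural extension.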
Symptoms and Impacts
Big key
Memory usage is uneven
In the Redis cluster architecture, the memory utilization of the data shard that holds a big key is far higher than that of other shards, and memory cannot be balanced across them. In addition, Redis memory may reach the upper limit set by the maxmemory parameter, causing important keys to be evicted or even memory to overflow.
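The imbalance follows from how a cluster places data. In open-source Redis Cluster, every key maps to exactly one of 16,384 hash slots, computed as CRC16(key) mod 16384, so a big key's entire value sits on whichever shard owns that slot. The sketch below reproduces that calculation, assuming Distributed Cache's cluster architecture uses the same slot scheme as open-source Redis Cluster (not confirmed here); crc16_xmodem and key_slot are illustrative names.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tags} as Redis Cluster does."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# However large the value is, it lives entirely in this one slot.
print(key_slot("123456789"))  # 12739 (0x31C3), the reference value from the cluster spec
```

Because placement depends only on the key name, the only way to spread a big value across shards is to store it under several different keys.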
Response time rises and requests block or time out
Because Redis executes commands on a single thread, an operation on a big key takes a long time and blocks the requests queued behind it.
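The effect can be illustrated with a toy queue model. It is a simplification with hypothetical execution times: it ignores networking and assumes commands are executed strictly one at a time, as described above, so a cheap GET queued behind a slow HGETALL on a big key waits for the whole HGETALL to finish. The function name completion_times_ms is illustrative.

```python
from typing import List, Tuple


def completion_times_ms(queue: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Toy model of one command-executing thread: commands run strictly one after another.

    Each entry is (command, execution time in ms); the result gives when each command
    finishes, measured from the moment the first one starts.
    """
    clock = 0.0
    finished: List[Tuple[str, float]] = []
    for command, cost_ms in queue:
        clock += cost_ms  # a command cannot start until the previous one has finished
        finished.append((command, clock))
    return finished


# Hypothetical costs: one expensive call on a big key, followed by two cheap reads.
queue = [("HGETALL big:hash", 500.0), ("GET user:1", 0.1), ("GET user:2", 0.1)]
for command, done_ms in completion_times_ms(queue):
    print(f"{command:<18} finished after {done_ms:.1f} ms")
```

Both GET calls would normally finish in a fraction of a millisecond, but in this model they complete only after about 500 ms, which is how a single big-key operation turns into client-side timeouts.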
Data sync is interrupted or a master-replica switchover occurs
When memory is insufficient, evicting or renaming a big key blocks the master node for a long time, which may interrupt data sync or trigger a master-replica switchover.
Network is congested
If a big key occupies 1 MB, 1,000 accesses per second generate 1,000 MB of traffic per second, which may saturate the bandwidth of the instance or even the LAN. This slows down the service that owns the key and also affects other services.
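Network bandwidth is usually quoted in bits per second, so it helps to convert the figure, using the approximation 1 MB ≈ 8 Mbit:

```latex
\[
1\,\text{MB} \times 1000\,\text{requests/s} = 1000\,\text{MB/s}
  \approx 1000 \times 8\,\text{Mbit/s} = 8\,\text{Gbit/s}
\]
```

A single key accessed this way would need roughly eight times the capacity of a 1 Gbit/s link, before any other traffic is counted.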
Hot key
The CPU utilization of the instance stays high, degrading overall service performance.
Requests are distributed unevenly under the cluster architecture, and the node holding a hot key comes under extra access pressure, so that data shard may run out of connections or even go down. Scaling out does little to help in this case: the hot key still resides on a single shard, so most of the added resources are wasted.
When highly concentrated hotspot traffic exceeds what Redis can handle, requests fall through to the database (cache breakdown) and can overload it, triggering an avalanche across the system.
Cause Analysis
Big key
Key-value pairs are designed improperly, for example, a string key is used to store a large binary file, which produces an extremely large value.
In lists, sets, and similar structures, invalid data is not cleaned up in time, so the number of members in the key keeps growing.
Business analysis before launch is inaccurate and members are not reasonably split across keys, so a single key ends up with too many members.
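Reasonable splitting usually means spreading the members of one logical collection across several smaller keys. A minimal sketch, assuming a large hash is split into a fixed number of sub-hashes and each field is assigned with a stable CRC32 hash (Python's built-in hash() is randomized per process, so different clients would disagree on the bucket). The key naming scheme base:N and the names bucket_key and plan_split are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict


def bucket_key(base_key: str, field: str, buckets: int) -> str:
    """Pick the sub-key for a field with a stable hash, so every client agrees."""
    return f"{base_key}:{zlib.crc32(field.encode('utf-8')) % buckets}"


def plan_split(base_key: str, fields: Dict[str, str], buckets: int) -> Dict[str, Dict[str, str]]:
    """Group the fields of one large hash into `buckets` smaller hashes."""
    plan: Dict[str, Dict[str, str]] = defaultdict(dict)
    for field, value in fields.items():
        plan[bucket_key(base_key, field, buckets)][field] = value
    return dict(plan)


# 10,000 fields that would otherwise sit in one hash, spread over 20 sub-hashes.
fields = {f"user:{i}": "1" for i in range(10_000)}
plan = plan_split("followers", fields, buckets=20)
print(len(plan), max(len(sub) for sub in plan.values()))
```

Writes then use HSET on bucket_key(...) and reads use HGET on the same sub-key. Choose the bucket count up front, because changing it later moves most fields to different sub-keys.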
Hot key
Hot keys are usually caused by a sudden surge of unexpected traffic, such as a blockbuster product, a trending news story with soaring visits, a flood of likes during a streamer's live event, or a guild battle involving many players in a game zone.
Troubleshooting
Tencent Cloud Distributed Cache is integrated with the performance optimization feature of DBbrain, which helps you quickly find big keys and hot keys in the database.
Solutions
Big key
1. Clear invalid data
Lists and sets keep growing during use, while data stored earlier often becomes invalid, so clean them up regularly.
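Removing a large amount of invalid data in a single command can itself block Redis for a long time. A minimal sketch, assuming you already know which set members are invalid: it builds SREM commands (SREM accepts multiple members) that remove them a bounded number at a time. The batch size of 500 and the function name srem_batches are assumptions.

```python
from typing import Iterable, List


def srem_batches(key: str, invalid_members: Iterable[str], batch_size: int = 500) -> List[List[str]]:
    """Build SREM commands that remove invalid members a bounded number at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    commands: List[List[str]] = []
    batch: List[str] = []
    for member in invalid_members:
        batch.append(member)
        if len(batch) == batch_size:
            commands.append(["SREM", key, *batch])
            batch = []
    if batch:  # leftover members that did not fill a whole batch
        commands.append(["SREM", key, *batch])
    return commands


expired = [f"session:{i}" for i in range(1_201)]
cmds = srem_batches("online:sessions", expired)
print(len(cmds), [len(c) - 2 for c in cmds])  # 3 [500, 500, 201]
```

With a client such as redis-py, each command could be sent with execute_command(*cmd), pausing between batches if needed. For lists, LTRIM plays a similar role by keeping only a given index range.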
2. Compress the value of the corresponding big key
Reduce the size of the value with a more compact serialization format or a compression algorithm. If the value is still very large after compression, split the key instead.
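A minimal sketch of compressing on the client before writing, assuming JSON serialization and zlib compression from the Python standard library; other formats and codecs follow the same pattern. encode_value and decode_value are hypothetical helper names.

```python
import json
import zlib
from typing import Any


def encode_value(obj: Any, level: int = 6) -> bytes:
    """Serialize to compact JSON, then compress with zlib before writing to Redis."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def decode_value(blob: bytes) -> Any:
    """Reverse encode_value after reading the bytes back from Redis."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


catalog = {"items": [{"sku": i, "status": "in_stock"} for i in range(10_000)]}
blob = encode_value(catalog)
raw_size = len(json.dumps(catalog, separators=(",", ":")))
print(f"{raw_size} bytes -> {len(blob)} bytes")
assert decode_value(blob) == catalog
```

Compression saves memory and bandwidth at the cost of client CPU on every read and write, and it helps most with repetitive data like this example. Every client that reads the key must apply the same decoding.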
3. Split big key
Split the big key into multiple smaller key-value pairs so that the value size and number of members of each one are reasonable, and then store them. You can use GET or MGET to read the stored key-value pairs in batches.
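A minimal sketch of splitting one big string value into numbered chunks that can be written with SET or MSET and read back with GET and MGET. The naming scheme base:N plus base:count, the 512 KB chunk size, and the function names are assumptions; a plain dict stands in for Redis so the example runs on its own.

```python
from typing import Dict, List, Optional

CHUNK_SIZE = 512 * 1024  # hypothetical: 512 KB per small key


def split_value(base_key: str, value: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Split one big value into numbered small keys, plus a key that records the chunk count."""
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)] or [b""]
    pairs = {f"{base_key}:{n}": chunk for n, chunk in enumerate(chunks)}
    pairs[f"{base_key}:count"] = str(len(chunks)).encode()
    return pairs


def chunk_keys(base_key: str, count: int) -> List[str]:
    """Key names to pass to a single MGET call when reading the value back."""
    return [f"{base_key}:{n}" for n in range(count)]


def join_chunks(parts: List[Optional[bytes]]) -> bytes:
    """Reassemble MGET results (None stands for a missing key)."""
    if any(part is None for part in parts):
        raise ValueError("a chunk is missing, so the value is incomplete")
    return b"".join(part for part in parts if part is not None)


# A dict plays the role of Redis here: MSET the pairs, then GET the count and MGET the chunks.
store = split_value("report:2024", b"abcdefghij", chunk_size=4)
count = int(store["report:2024:count"])
parts = [store.get(k) for k in chunk_keys("report:2024", count)]
print(count, join_chunks(parts))
```

In open-source Redis Cluster, MGET only works when all keys hash to the same slot. A hash tag such as {report:2024}:0 keeps the chunks together, but also keeps them on one shard, so it trades memory balance for single-round-trip reads.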
4. Monitor Distributed Cache memory, bandwidth, and key growth trends in real time
Use the monitoring system to track memory usage and network bandwidth usage in Distributed Cache, as well as the growth rate of memory usage over a fixed period of time. When a configured threshold is exceeded, an alarm notification is triggered so that you can troubleshoot. For details on monitoring metrics, see Monitoring at Five-Second Granularity. For directions on setting alarm thresholds, see Configuring Alarms.
Hot key
Use the read/write separation architecture. If hot keys are generated mainly by read requests, read/write separation is a good solution: read requests are served by replica nodes, and you can keep adding replica nodes to reduce the read pressure on each node of the Distributed Cache instance. For more information, see Enabling/Disabling Read/Write Separation.
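The idea behind read/write separation can be sketched as a router: writes go to the master, and reads rotate across replicas, so each added replica lowers the read load carried by every node. This is a minimal sketch with hypothetical node addresses, a hypothetical READ_COMMANDS set, and a hypothetical class name ReadWriteRouter. Once the feature is enabled on the instance, this routing may be handled for you; the sketch only illustrates the principle.

```python
from itertools import cycle
from typing import List

# Hypothetical subset of read-only commands; extend it for your workload.
READ_COMMANDS = {"GET", "MGET", "HGET", "HGETALL", "ZRANGE", "SMEMBERS", "LRANGE"}


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas round-robin."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Without replicas, every command falls back to the master.
        self._replicas = cycle(replicas) if replicas else None

    def route(self, command: str) -> str:
        if command.upper() in READ_COMMANDS and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("10.0.0.1:6379", ["10.0.0.2:6379", "10.0.0.3:6379"])
print(router.route("SET"))                     # 10.0.0.1:6379
print([router.route("GET") for _ in range(4)])
```

With two replicas, each one serves half of the reads on a hot key; a third replica would cut that to a third. Redis replication is asynchronous, so a read served by a replica may briefly lag behind the latest write on the master.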