Columnar Storage Secondary Index
Last updated:2025-05-09 11:51:24

Index Introduction

Indexes are a key mechanism for accelerating database queries. To meet the query needs of different users and improve overall database performance, the read-only analysis engine supports secondary indexes on columnar storage starting with version 2.2410.1.0 (inclusive).
Generally, for highly selective predicates (those that match only a small fraction of rows) on high-cardinality columns, an index can significantly reduce the amount of data scanned and therefore speed up the query.
Currently, the read-only analysis engine supports three types of indexes: Zonemap index, Bloom filter index, and Bitmap index.
Note:
Currently, users cannot create indexes themselves. To try the index feature, submit a ticket.

Zonemap Index

The Zonemap index is a built-in index that requires no action from users. It automatically maintains statistics for each column, recording the maximum value, the minimum value, and whether any NULL is present for each data block.
For equality queries, range queries, and IS NULL predicates, the engine uses the maximum value, minimum value, and NULL information to determine whether a data file or data block can contain rows that satisfy the condition. If it cannot, the corresponding file or block is skipped. This avoids unnecessary I/O operations and speeds up the query.
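The skipping logic described above can be sketched in a few lines of Python. This is a minimal illustration only, not the engine's implementation: the names `ZoneMap`, `build_zone_map`, `may_contain_range`, and `blocks_to_read` are hypothetical, and the data blocks are plain in-memory lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ZoneMap:
    """Hypothetical per-block statistics: min, max, and a NULL flag."""
    min_value: Optional[int]
    max_value: Optional[int]
    has_null: bool


def build_zone_map(block: List[Optional[int]]) -> ZoneMap:
    """Compute the statistics for one data block (None stands for NULL)."""
    non_null = [v for v in block if v is not None]
    return ZoneMap(
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
        has_null=len(non_null) < len(block),
    )


def may_contain_range(zm: ZoneMap, low: int, high: int) -> bool:
    """Return True if the block might hold a value in [low, high]."""
    if zm.min_value is None:  # the block holds only NULLs
        return False
    return zm.max_value >= low and zm.min_value <= high


def blocks_to_read(zone_maps: List[ZoneMap], low: int, high: int) -> List[int]:
    """Indices of blocks that cannot be skipped for the predicate."""
    return [i for i, zm in enumerate(zone_maps) if may_contain_range(zm, low, high)]


# Four illustrative data blocks of one column.
blocks = [[1, 5, 9], [10, None, 19], [20, 25, 30], [None, None]]
zone_maps = [build_zone_map(b) for b in blocks]

equality_blocks = blocks_to_read(zone_maps, 12, 12)  # WHERE col = 12
range_blocks = blocks_to_read(zone_maps, 8, 21)      # WHERE col BETWEEN 8 AND 21
null_blocks = [i for i, zm in enumerate(zone_maps) if zm.has_null]  # WHERE col IS NULL
```

An equality predicate is treated here as a range whose lower and upper bounds are the same value, so one check covers both query types. Note that a block passing the check may still contain no matching row; the statistics only rule blocks out.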

Bloom Filter Index

A Bloom filter index is a skip index built on Bloom filters. For equality queries, it skips data blocks that cannot contain the requested value, which reduces I/O operations and accelerates the query.
A Bloom filter, proposed by Burton Bloom in 1970, is a lookup structure that maps each element to positions in a bit array using multiple hash functions. It suits scenarios where you need to quickly determine whether an element belongs to a collection and 100% accuracy is not strictly required. A Bloom filter has the following features:
It is a space-efficient probabilistic data structure used to check whether an element is in a collection.
When asked whether an element exists, it returns one of two answers: the element may exist, or the element definitely does not exist.
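The two possible answers can be illustrated with a small Bloom filter written in Python. This is a sketch under stated assumptions, not the engine's implementation: the class `BloomFilter`, the helper `candidate_blocks`, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests as hash functions are all choices made for this example.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Illustrative Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value: str) -> Iterator[int]:
        # Derive one bit position per hash function by salting SHA-256
        # with the hash function's index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "may exist".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def candidate_blocks(filters: List[BloomFilter], value: str) -> List[int]:
    """Indices of blocks that must still be read for WHERE col = value."""
    return [i for i, f in enumerate(filters) if f.might_contain(value)]


# One filter per data block, as a skip index would keep.
blocks = [["user_1", "user_2"], ["user_42", "user_43"], ["user_90"]]
filters = []
for block in blocks:
    bloom = BloomFilter()
    for v in block:
        bloom.add(v)
    filters.append(bloom)

to_read = candidate_blocks(filters, "user_42")
```

The block that actually holds `user_42` is always among the candidates, because a Bloom filter never produces a false negative. Other blocks may occasionally appear as false positives; they cost an extra read but never a wrong result.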

Applicable Scenarios

The Bloom filter index can accelerate equality queries (including = and IN), and it works well for high-cardinality fields.

Limitations

The Bloom filter index has no effect on queries other than = and IN, such as !=, NOT IN, >, and <.
The Bloom filter index supports only the following field types: INT with a maximum length of 256, String, Decimal with a maximum length of 256, Time, Date, and DateTime.
Indexes on expressions are not supported, nor are multi-column composite indexes.
A Bloom filter index cannot be created on a single-column primary key or on the first column of a multi-column primary key.

Indexing

When a SQL statement is executed, if a Bloom filter index exists on a field used in an equality or IN predicate in the WHERE clause, the index is applied automatically to accelerate the query.

Bitmap Index

A Bitmap index stores one bitmap for each distinct key value of a column. Compared with other indexes, it occupies very little storage and is fast to create and use. Its disadvantage is that modification operations lock at a coarse granularity, so it is not suitable for scenarios with frequent updates.

Applicable Scenarios

Columns with a high degree of value repetition. A cardinality (number of distinct values) of 100 to 100,000 is recommended, as in Occupation and Prefecture-Level City columns. If repetition is too high, a Bitmap index has no obvious advantage over other index types; if it is too low, space efficiency and performance drop sharply.
Queries whose logic reduces to bit operations, such as count, or, and and. For example, a query that combines multiple conditions: select count(*) from table where city = 'Nanjing' and job = 'doctor' and Type = 'iphone' and gender = 'male'. If a Bitmap index exists on each column used in the conditions, the database can locate the required rows with efficient bit operations and reduce disk I/O. The smaller the filtered result set, the more pronounced the advantage of the Bitmap index.
Analysis scenarios such as ad-hoc queries and multi-dimensional analysis. For example, if a table has 100 columns and users filter on any combination of 20 of them, creating a Bitmap index on each of those 20 columns allows all such queries to use the indexes.
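The multi-condition count described above can be sketched in Python as follows. This is an illustration only, not the engine's storage format: the functions `build_bitmap_index` and `count_matches` are hypothetical, the sample rows are made up, and each bitmap is a plain Python int in which bit i is set when row i holds the value.

```python
from collections import defaultdict
from typing import Dict, List


def build_bitmap_index(column: List[str]) -> Dict[str, int]:
    """Build one bitmap per distinct value; bit i is set if row i has that value."""
    index: Dict[str, int] = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def count_matches(*bitmaps: int) -> int:
    """AND the bitmaps together and count the set bits (the matching rows)."""
    result = bitmaps[0]
    for bitmap in bitmaps[1:]:
        result &= bitmap
    return bin(result).count("1")


# Five illustrative rows, stored column by column.
city = ["Nanjing", "Beijing", "Nanjing", "Nanjing", "Shanghai"]
job = ["doctor", "doctor", "teacher", "doctor", "doctor"]
gender = ["male", "male", "male", "female", "male"]

city_idx = build_bitmap_index(city)
job_idx = build_bitmap_index(job)
gender_idx = build_bitmap_index(gender)

# select count(*) ... where city = 'Nanjing' and job = 'doctor' and gender = 'male'
matching = count_matches(
    city_idx.get("Nanjing", 0),   # a value absent from the column yields an empty bitmap
    job_idx.get("doctor", 0),
    gender_idx.get("male", 0),
)
```

Only row 0 satisfies all three conditions, so `matching` is 1. No row data is read to compute the count; the answer comes entirely from bitwise AND operations on the three bitmaps.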

Scenarios Not Applicable

Columns with low repetition, for example, Identity Number and Mobile Number columns.
Columns with extremely high repetition, for example, a Gender column. You can create a Bitmap index on such a column, but it is recommended to use it for filtering together with other conditions rather than as the sole query condition.
Columns that often need to be updated or modified.

Limitations

The Bitmap index supports the predicates =, !=, >, <, >=, <=, IN, IS NULL, and IS NOT NULL, but multiple predicates can be combined only with AND.
The Bitmap index supports only the following field types: INT with a maximum length of 256, String, Decimal with a maximum length of 256, Time, Date, and DateTime.
Indexes on expressions are not supported, nor are multi-column composite indexes.
A Bitmap index cannot be created on a single-column primary key or on the first column of a multi-column primary key.