tencent cloud

Tencent Cloud TCHouse-D

Product Introduction
Overview
Concepts
Cluster Architecture
Strengths
Scenarios
Purchase Guide
Billing Overview
Renewal Instructions
Overdue Policy
Refund Instructions
Configuration Adjustment Billing Instructions
Getting Started
Using Tencent Cloud TCHouse-D Through the Console
Using Tencent Cloud TCHouse-D Through a Client
Operation Guide
Cluster Operation
Monitoring and Alarm Configuration
Account Privilege Management
Data Management
Query Management
Modify Configurations
Node Management
Log Analysis
SQL Studio
Enabling Resource Isolation
Development Guide
Design of Data Table
Importing Data
Exporting Data
Basic Feature
Query Optimization
Ecological Expansion Feature
API Documentation
History
Introduction
API Category
Making API Requests
Cluster Operation APIs
Database and Table APIs
Cluster Information Viewing APIs
Hot-Cold Data Layering APIs
Database and Operation Audit APIs
User and Permission APIs
Resource Group Management APIs
Data Types
Error Codes
Cloud Ecosystem
Granting CAM Policies to Sub-accounts
Query Acceleration for Tencent Cloud DLC
Practical Tutorial
Basic Feature Usage
Advanced Features Usage
Resource Specification Selection and Optimization Suggestions
Naming Specifications and Limits to the Database and Data Table
Table Design and Data Import
Query Optimization
Suggested Usage to Avoid
Accessing TCHouse-D via JDBC over the Public Network
Performance Testing
TPC-H Performance Testing
SSB Performance Testing
TPC-DS Performance Testing
FAQs
Common Operational Issues
Common Errors
Contact Us
Glossary
Product Policy
Service Level Agreement
Privacy Policy
Data Processing And Security Agreement

Caching Table or Partition to Memory

PDF
Modo Foco
Tamanho da Fonte
Última atualização: 2024-06-27 11:11:11
Doris supports caching all data of a table or a specified partition to memory, which can improve query performance. However, because memory capacity is limited, it is best to use tables with small volume of data.

Usage

Caching an Entire Table

For a new table, you can add the configuration "in_memory"="true" in PROPERTIES during table creation, such as:
CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`date` DATE NOT NULL COMMENT "data import date",
`city` VARCHAR(20) COMMENT "user's city",
`age` SMALLINT COMMENT "user's age",
`sex` TINYINT COMMENT "user's gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
`cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum stay time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum stay time"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory"="true"
);
For existing tables, modify the table configuration and set "in_memory"="true":
ALTER TABLE example_db.my_table set ("in_memory" = "true");

Caching a Specified Partition

Both official and temporary partitions can be cached. For a new partition, add setting "in_memory"="true" when creating the partition:
ALTER TABLE example_db.my_table ADD [TEMPORARY] PARTITION p1 VALUES LESS THAN ("2020-02-01") ("in_memory" = "true", "replication_num" = "1")
For existing partitions, modify the configuration and set "in_memory"="true":
ALTER TABLE example_db.my_table MODIFY PARTITION (p1, p2, p4) SET("in_memory"="true");

Removing Data from Cache

When there is no need to cache data for speeding up computation, it is necessary to release the cached data from memory in a timely manner to reduce unnecessary data release. To remove cache, you only need to modify the configuration of the corresponding table or partition to "in_memory"="false".
ALTER TABLE example_db.my_table set ("in_memory" = "false");
ALTER TABLE example_db.my_table MODIFY PARTITION (p1, p2, p4) SET("in_memory"="false");

Notes

in_memory attribute

If the table is created with the "in_memory" = "true" attribute, Doris will try to cache the data blocks of the table in the storage engine's PageCache to reduce disk IO. However, this attribute does not guarantee that the data blocks will always reside in memory, and it is merely a best-effort indicator.

Related Configuration Parameters

The caching method of memory table is using Doris page cache, which needs to ensure the page cache feature is enabled and appropriate memory is allocated. Related parameters are adjusted by be.conf.
disable_storage_page_cache
Type: bool
Description: Whether to use the page cache for index caching, this configuration is only effective in BETA storage format.
Default value: false
chunk_reserved_bytes_limit
Description: The reserved bytes limit for the Chunk Allocator is typically set as a percentage of mem_limit. The default unit is byte, values must be multiples of 2 and greater than 0. If the value is larger than physical memory, it will be set to the size of physical memory. Increasing this variable can enhance performance but will there will be more idle memory that other modules can't use.
Default value: 10%

Best Practice

Try comparing the performance difference before and after enabling memory tables.

Environment preparations

Test version: 1.2.6
FE node configuration: 4 cores, 16GB, 1 node, 400GB disk
BE node configuration: 4 cores, 16GB, 3 nodes, 300GB disk
configuration :chunk_reserved_bytes_limit=30%

Start the test

Using the supplier table from the TPC-H test set, where the supplier table is a memory table, and supplier1 table is a non-memory table.
enter image description here

Both are 250M in size with 1 million records.
enter image description here

Using mysqlslap for stress testing, the test SQL is a full transverse of the supplier.
[root@9 data]# tail test.sql
select * from supplier where s_suppkey=1;
...
select * from supplier where s_suppkey=999991;
select * from supplier where s_suppkey=999992;
select * from supplier where s_suppkey=999993;
select * from supplier where s_suppkey=999994;
select * from supplier where s_suppkey=999995;
select * from supplier where s_suppkey=999996;
select * from supplier where s_suppkey=999997;
select * from supplier where s_suppkey=999998;
select * from supplier where s_suppkey=999999;
[root@9 data]# tail test2.sql
select * from supplier1 where s_suppkey=1;
...
select * from supplier1 where s_suppkey=999991;
select * from supplier1 where s_suppkey=999992;
select * from supplier1 where s_suppkey=999993;
select * from supplier1 where s_suppkey=999994;
select * from supplier1 where s_suppkey=999995;
select * from supplier1 where s_suppkey=999996;
select * from supplier1 where s_suppkey=999997;
select * from supplier1 where s_suppkey=999998;
select * from supplier1 where s_suppkey=999999;
Stress test SQL:
mysqlslap -h127.0.0.1 -uadmin -P9030 -pxxxxx --iterations=1 --concurrency=500 --number-of-queries=1000000 --create-schema=tpch_100_d --query="/data/test1.sql" --delimiter=";";

Test Results

Average time for memory table is 160 seconds

enter image description here


Average time for non-memory table is 260 seconds

enter image description here

Memory table improves by 38% compared with non-memory table.

 Usage Recommendations

The data volume in the configuration memory table should not be too large, and it is recommended to be less than 300M.
The main consideration is that the page cache is a globally shared Cache, and a single table data being too large will occupy the space of other sql page caches. At the same time, the memory table does not guarantee that all pages of the table are loaded into memory. It is a best-effort indicator to the system. It is easy to mutually occupy other table data page caches, reducing the memory residence time of the page cache, and therefore lowering the hit rate of the page cache.

Ajuda e Suporte

Esta página foi útil?

comentários