
Resource Specification Selection and Optimization Suggestions
Last updated: 2024-08-02 12:44:13
This document describes how to choose instance specifications for Tencent Cloud TCHouse-D and provides optimization suggestions for cases where resources are insufficient.
Note:
For different types of business, it is recommended to configure resource isolation policies or split clusters. For example, use one cluster for real-time report business and another for real-time risk control business.
When a business serves multiple ToB tenants simultaneously, it is recommended to isolate resources or split clusters based on the actual situation to reduce mutual interference. For example, if SaaS services are provided for 200 tenants simultaneously, split them into 4 clusters, each supporting 50 tenants.

Resource Specifications and Applicable Scenarios

When purchasing a Tencent Cloud TCHouse-D cluster, you need to select the computing and storage resource specifications of the FE and BE nodes and choose whether to enable high availability.

Resource Specifications and Recommended Scenarios

| Model Type | Compute Node Specifications | Recommended Storage Types | Recommended Scenarios |
| --- | --- | --- | --- |
| Standard | 4-core 16 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Limited to POC feature testing or personal learning, mainly used to experience and test product capabilities |
| Standard | 8-core 32 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for test environments, supporting medium data scale and fairly complex data analysis |
| Standard | 16-core 64 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting larger-scale and more complex data analysis as well as high-concurrency scenarios |
| Standard | 32-core and above | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting large-scale, highly complex data analysis, high concurrency, and other scenarios |

High Availability and Node Quantity Suggestions

| Scenario | High Availability Selection | Recommended Minimum Number of FE Nodes | Recommended Minimum Number of BE Nodes |
| --- | --- | --- | --- |
| POC feature testing | Non-high availability | 1 | 3 |
| Production scenario (query high availability) | Read high availability | At least 3 | At least 3; scale out on demand |
| Production scenario (query-write high availability) | Read-write high availability | At least 5 | At least 3; scale out on demand |
| Cross-AZ high availability scenario | Read-write high availability + three-AZ deployment | At least 5 | At least 3; scale out in increments of 3 |

Examples of Resource Specification Selection

Note:
The following content is for reference only. Actual performance may vary greatly across business scenarios.
1. Scenario 1: Product feature verification and simple data analysis
FE: High availability not enabled, single node, 4-core 16 GB
BE: 3 nodes, 4-core 16 GB per node
2. Scenario 2: Simple queries on small- to medium-sized data, such as hundreds of GB of data and fewer than 1,000 QPS
FE: High availability not enabled, single node, 8-core 32 GB
BE: 3 nodes, 8-core 32 GB per node
3. Scenario 3: Production scenario with TB-level data volume, involving complex queries such as multi-table joins and GROUP BY
FE: High availability enabled, 3 nodes, 16-core 64 GB per node
BE: 3 nodes, 16-core 64 GB per node
4. Scenario 4: Production business with TB-level data volume, complex queries, and a large number of high-concurrency point queries
FE: High availability enabled, 3 nodes, 16-core 64 GB per node
BE: 6 nodes, 16-core 64 GB per node

Resource Monitoring and Optimization Suggestions

Operations such as large-scale data import, data query, concurrent query, and multi-table join consume a large amount of CPU and memory. If CPU/memory utilization continuously exceeds 85%, the cluster will become unstable. It is recommended to optimize the business or change the configuration.

Resource Usage Monitoring

You can go to Cluster Management > Cluster Monitoring to check the CPU and memory usage of each BE and FE node, as shown in the following figures.
Figure: Cluster Monitoring > BE metrics
Figure: Cluster Monitoring > FE metrics
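Besides the console, if you connect to the cluster with a MySQL-compatible client, node status can also be checked with the standard statements of Apache Doris, which TCHouse-D is based on. The following is only a quick complement to the monitoring page, sketched under the assumption that your account has sufficient privileges.
```sql
-- Quick node status check from a SQL client (a sketch; assumes sufficient privileges).
-- SHOW BACKENDS lists each BE node with its alive status and memory/disk usage;
-- SHOW FRONTENDS lists each FE node with its role and alive status.
SHOW BACKENDS;
SHOW FRONTENDS;
```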




Resource Scale-out Suggestions

When the CPU and memory usage of FE and BE continuously exceeds 85%, you need to consider upgrading or scaling out resources.
Note:
The main reasons for the high CPU and memory usage of FE and BE are as follows:
High FE CPU usage: many concurrent queries and a large number of complex queries.
High FE memory usage: too much metadata (for example, caused by unreasonable partitioning) and frequent table deletion.
High BE CPU usage: large amounts of imported data and many complex queries (such as aggregate queries).
High BE memory usage: large amounts of imported data and many complex queries (such as aggregate queries).
| Common Scenario | Resource Consumption | Optimization Suggestions When Usage Continuously Exceeds 85% |
| --- | --- | --- |
| Continuous import of large amounts of data | CPU and memory usage of both FE and BE is high | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent point queries / high concurrency | CPU usage of both FE and BE is high | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent metadata changes and deletions | Memory usage of FE is high | Upgrade FE vertically to increase memory. |
| Many multi-table join / aggregation queries | CPU and memory usage of BE is high | Scale out BE horizontally; vertical upgrade is also an option. |
| Highly concurrent data writes | CPU and memory usage of BE is high | Scale out BE horizontally; vertical upgrade is also an option. |

Cluster Scaling Must-Knows

Scale-out
- During horizontal scale-out, the system can still read and write, but there may be some jitter. The operation takes about 5 to 15 minutes; perform it during off-peak business hours.
- When both data storage volume and query volume grow, horizontal scale-out is the preferred option.
Scale-in
- Only one type of node can be scaled in at a time, for example, FE only or BE only.
- FE scale-in: multiple FE nodes can be scaled in at one time.
- BE scale-in: scaling in multiple BE nodes at one time may cause data loss or take a long time. It is recommended to scale in nodes one by one.
- During scale-in, the system can still read and write, but there may be some jitter.
Vertical upgrade/downgrade
- During a vertical upgrade or downgrade, the system cannot be read or written.
- Computing specifications can be upgraded or downgraded; storage specifications can only be upgraded.
- The specification adjustment takes effect for all nodes in the cluster.

Business Optimization Suggestions

Usage recommendations
- If you often perform point queries on a high-cardinality column, it is recommended to create a Bloom filter index on that column (see the first sketch below).
- If you often run fixed-pattern aggregate queries on a table, it is recommended to create a materialized view on that table (see the first sketch below).
- It is recommended to design partitions and buckets reasonably for the business scenario, to avoid excessive FE memory usage caused by too many partitions and buckets (see the second sketch below).
- For general data-exploration SQL queries that do not need all of the data, it is recommended to add a LIMIT clause to cap the number of returned records, which also speeds up the query.
- It is recommended to use the CSV format rather than JSON for data import.
Try-to-avoid
- Avoid SELECT * queries.
- Avoid enabling profiles globally (this increases resource consumption; enable profiles only for the specific SQL statements that need analysis).
- When creating a table: avoid enabling merge_on_write (this feature is not yet mature).
- When creating a table: avoid enabling auto bucket (this feature is not yet mature).
- When creating a table: avoid enabling dynamic schema tables (this feature is not yet mature).
- Avoid joining multiple large tables. When multiple large tables must be joined, join every two large tables through Colocation Join (see the second sketch below), or use pre-aggregated tables and indexes to speed up queries.
Parameter optimization
- When SQL statements involve many concurrent operations, it is recommended to increase the parallel_fragment_exec_instance_num parameter. The default value is 200; it can be increased in multiples (such as 400 and 800) and should be kept within 2,000 (see the third sketch below).
- It is recommended to control the compaction speed. If the monitoring metric base_compaction_score exceeds 200 and keeps rising (for details, see the Cluster Monitoring > BE Metrics page), you can increase the compaction_task_num_per_disk configuration (the system default is 2, which can be increased to 4 or greater).
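First sketch: as a reference for the Bloom filter index and materialized view recommendations above, the statements below use the Doris SQL dialect that TCHouse-D is based on; the table and column names (orders, user_id, order_date, amount) are hypothetical.
```sql
-- Hypothetical table "orders": user_id is a high-cardinality column frequently
-- used in point queries, so a Bloom filter index is added to it.
ALTER TABLE orders SET ("bloom_filter_columns" = "user_id");

-- Fixed-pattern aggregate queries (e.g., total amount per day) can be answered
-- from a single-table materialized view instead of re-scanning the base table.
CREATE MATERIALIZED VIEW orders_daily_amount AS
SELECT order_date, SUM(amount)
FROM orders
GROUP BY order_date;
```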
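Second sketch: for the partitioning/bucketing advice and the Colocation Join suggestion, the hypothetical fact table below uses monthly range partitions, a fixed bucket count, and a colocation group. Any table intended to join it via Colocation Join would need the same bucketing column type, bucket count, and "colocate_with" group.
```sql
-- Hypothetical fact table: monthly range partitions keep the number of partitions
-- (and therefore FE metadata) bounded; 8 hash buckets spread data across BE nodes.
-- The colocate_with property places the table in a colocation group so that joins
-- with other tables in the same group can run locally on each BE.
CREATE TABLE orders (
    order_date  DATE NOT NULL,
    customer_id BIGINT NOT NULL,
    amount      DECIMAL(18, 2)
)
DUPLICATE KEY (order_date, customer_id)
PARTITION BY RANGE (order_date) (
    PARTITION p202407 VALUES LESS THAN ("2024-08-01"),
    PARTITION p202408 VALUES LESS THAN ("2024-09-01")
)
DISTRIBUTED BY HASH (customer_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "colocate_with" = "orders_group"
);
```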
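Third sketch: for the parameter optimization items, the statements below show how a session variable such as parallel_fragment_exec_instance_num can be inspected and adjusted over SQL; the value 400 simply follows the multiples suggested above. Note that compaction_task_num_per_disk is a BE-level configuration item rather than a session variable and is adjusted through the cluster's configuration management.
```sql
-- Check the current value of the parallelism parameter.
SHOW VARIABLES LIKE 'parallel_fragment_exec_instance_num';

-- Raise it for the current session only (400/800 follow the multiples suggested
-- above; keep the value within the recommended ceiling of 2,000).
SET parallel_fragment_exec_instance_num = 400;

-- Or make it the default for all new sessions.
SET GLOBAL parallel_fragment_exec_instance_num = 400;
```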

