
Resource Specification Selection and Optimization Suggestions
Last updated: 2024-08-02 12:44:13
This document describes how to choose instance specifications for Tencent Cloud TCHouse-D and provides optimization suggestions for cases where resources are insufficient.
Note:
For different types of business, it is recommended to configure resource isolation policies or split clusters. For example, use one cluster for real-time report business and another for real-time risk control business.
When a business serves multiple ToB tenants simultaneously, it is recommended to isolate resources or split clusters based on the actual situation to reduce mutual interference. For example, if SaaS services are provided for 200 tenants simultaneously, split them into 4 clusters, each supporting 50 tenants.

Resource Specifications and Applicable Scenarios

When purchasing a Tencent Cloud TCHouse-D cluster, you need to select the computing and storage resource specifications of the FE and BE nodes and choose whether to enable high availability.

Resource Specifications and Recommended Scenarios

| Model Type | Compute Node Specifications | Recommended Storage Types | Recommended Scenarios |
| --- | --- | --- | --- |
| Standard | 4-core 16 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Limited to POC feature testing or personal learning, mainly used to experience and test product capabilities |
| Standard | 8-core 32 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for test environments, supporting medium data scale and fairly complex data analysis |
| Standard | 16-core 64 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting larger-scale and more complex data analysis as well as high-concurrency scenarios |
| Standard | 32-core and above | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting large-scale, highly complex data analysis, high concurrency, and other scenarios |

High Availability and Node Quantity Suggestions

| Scenario | High Availability Selection | Recommended Minimum Number of FE Nodes | Recommended Minimum Number of BE Nodes |
| --- | --- | --- | --- |
| POC feature testing | Non-high availability | 1 | 3 |
| Production scenario (query high availability) | Read high availability | At least 3 | At least 3; scale out on demand |
| Production scenario (query-write high availability) | Read-write high availability | At least 5 | At least 3; scale out on demand |
| Cross-AZ high availability scenario | Read-write high availability + three-AZ deployment | At least 5 | At least 3; scale out in increments of 3 |

Examples of Resource Specification Selection

Note:
The following content is for reference only. Actual performance may vary greatly across business scenarios.
1. Scenario 1: Product feature verification and simple data analysis
FE: High availability not enabled, single node, 4-core 16 GB
BE: 3 nodes, 4-core 16 GB per node
2. Scenario 2: Simple queries on small- to medium-sized data, such as hundreds of GB of data and fewer than 1,000 QPS
FE: High availability not enabled, single node, 8-core 32 GB
BE: 3 nodes, 8-core 32 GB per node
3. Scenario 3: Production scenario with TB-level data volume, involving complex queries such as multi-table joins and GROUP BY
FE: High availability enabled, 3 nodes, 16-core 64 GB per node
BE: 3 nodes, 16-core 64 GB per node
4. Scenario 4: Production business with TB-level data volume, complex queries, and a large number of high-concurrency point queries
FE: High availability enabled, 3 nodes, 16-core 64 GB per node
BE: 6 nodes, 16-core 64 GB per node

Resource Monitoring and Optimization Suggestions

Operations such as large-scale data import, data query, concurrent query, and multi-table join consume a large amount of CPU and memory. If CPU/memory utilization continuously exceeds 85%, the cluster will become unstable. It is recommended to optimize the business or change the configuration.

Resource Usage Monitoring

You can go to Cluster Management > Cluster Monitoring to check the CPU and memory usage of each BE and FE node, as shown in the following figures.
Figure: Cluster Monitoring > BE metrics
Figure: Cluster Monitoring > FE metrics
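Besides the console, if you connect to the cluster with a MySQL-compatible client, node status can also be checked with the standard statements of Apache Doris, which TCHouse-D is based on. The following is only a quick complement to the monitoring page, sketched under the assumption that your account has sufficient privileges.
```sql
-- Quick node status check from a SQL client (a sketch; assumes sufficient privileges).
-- SHOW BACKENDS lists each BE node with its alive status and memory/disk usage;
-- SHOW FRONTENDS lists each FE node with its role and alive status.
SHOW BACKENDS;
SHOW FRONTENDS;
```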




Resource Scale-out Suggestions

When the CPU and memory usage of FE and BE continuously exceeds 85%, you need to consider upgrading or scaling out resources.
Note:
The main reasons for the high CPU and memory usage of FE and BE are as follows:
High FE CPU usage: many concurrent queries and a large number of complex queries.
High FE memory usage: too much metadata (for example, caused by unreasonable partitioning) and frequent table deletion.
High BE CPU usage: large amounts of imported data and many complex queries (such as aggregate queries).
High BE memory usage: large amounts of imported data and many complex queries (such as aggregate queries).
| Common Scenario | Resource Consumption | Optimization Suggestions When Usage Continuously Exceeds 85% |
| --- | --- | --- |
| Continuous import of large amounts of data | CPU and memory usage of both FE and BE is high | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent point queries / high concurrency | CPU usage of both FE and BE is high | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent metadata changes and deletions | Memory usage of FE is high | Upgrade FE vertically to increase memory. |
| Many multi-table join / aggregation queries | CPU and memory usage of BE is high | Scale out BE horizontally; vertical upgrade is also an option. |
| Highly concurrent data writes | CPU and memory usage of BE is high | Scale out BE horizontally; vertical upgrade is also an option. |

Cluster Scaling Must-Knows

Scale-out
- During horizontal scale-out, the system can still read and write, but there may be some jitter. The operation takes about 5 to 15 minutes; perform it during off-peak business hours.
- When both data storage volume and query volume grow, horizontal scale-out is the preferred option.
Scale-in
- Only one type of node can be scaled in at a time, for example, FE only or BE only.
- FE scale-in: multiple FE nodes can be scaled in at one time.
- BE scale-in: scaling in multiple BE nodes at one time may cause data loss or take a long time. It is recommended to scale in nodes one by one.
- During scale-in, the system can still read and write, but there may be some jitter.
Vertical upgrade/downgrade
- During a vertical upgrade or downgrade, the system cannot be read or written.
- Computing specifications can be upgraded or downgraded; storage specifications can only be upgraded.
- The specification adjustment takes effect for all nodes in the cluster.

Business Optimization Suggestions

Usage recommendations
- If you often perform point queries on a high-cardinality column, it is recommended to create a Bloom filter index on that column (see the first sketch below).
- If you often run fixed-pattern aggregate queries on a table, it is recommended to create a materialized view on that table (see the first sketch below).
- It is recommended to design partitions and buckets reasonably for the business scenario, to avoid excessive FE memory usage caused by too many partitions and buckets (see the second sketch below).
- For general data-exploration SQL queries that do not need all of the data, it is recommended to add a LIMIT clause to cap the number of returned records, which also speeds up the query.
- It is recommended to use the CSV format rather than JSON for data import.
Try-to-avoid
- Avoid SELECT * queries.
- Avoid enabling profiles globally (this increases resource consumption; enable profiles only for the specific SQL statements that need analysis).
- When creating a table: avoid enabling merge_on_write (this feature is not yet mature).
- When creating a table: avoid enabling auto bucket (this feature is not yet mature).
- When creating a table: avoid enabling dynamic schema tables (this feature is not yet mature).
- Avoid joining multiple large tables. When multiple large tables must be joined, join every two large tables through Colocation Join (see the second sketch below), or use pre-aggregated tables and indexes to speed up queries.
Parameter optimization
- When SQL statements involve many concurrent operations, it is recommended to increase the parallel_fragment_exec_instance_num parameter. The default value is 200; it can be increased in multiples (such as 400 and 800) and should be kept within 2,000 (see the third sketch below).
- It is recommended to control the compaction speed. If the monitoring metric base_compaction_score exceeds 200 and keeps rising (for details, see the Cluster Monitoring > BE Metrics page), you can increase the compaction_task_num_per_disk configuration (the system default is 2, which can be increased to 4 or greater).
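First sketch: as a reference for the Bloom filter index and materialized view recommendations above, the statements below use the Doris SQL dialect that TCHouse-D is based on; the table and column names (orders, user_id, order_date, amount) are hypothetical.
```sql
-- Hypothetical table "orders": user_id is a high-cardinality column frequently
-- used in point queries, so a Bloom filter index is added to it.
ALTER TABLE orders SET ("bloom_filter_columns" = "user_id");

-- Fixed-pattern aggregate queries (e.g., total amount per day) can be answered
-- from a single-table materialized view instead of re-scanning the base table.
CREATE MATERIALIZED VIEW orders_daily_amount AS
SELECT order_date, SUM(amount)
FROM orders
GROUP BY order_date;
```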
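Second sketch: for the partitioning/bucketing advice and the Colocation Join suggestion, the hypothetical fact table below uses monthly range partitions, a fixed bucket count, and a colocation group. Any table intended to join it via Colocation Join would need the same bucketing column type, bucket count, and "colocate_with" group.
```sql
-- Hypothetical fact table: monthly range partitions keep the number of partitions
-- (and therefore FE metadata) bounded; 8 hash buckets spread data across BE nodes.
-- The colocate_with property places the table in a colocation group so that joins
-- with other tables in the same group can run locally on each BE.
CREATE TABLE orders (
    order_date  DATE NOT NULL,
    customer_id BIGINT NOT NULL,
    amount      DECIMAL(18, 2)
)
DUPLICATE KEY (order_date, customer_id)
PARTITION BY RANGE (order_date) (
    PARTITION p202407 VALUES LESS THAN ("2024-08-01"),
    PARTITION p202408 VALUES LESS THAN ("2024-09-01")
)
DISTRIBUTED BY HASH (customer_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "colocate_with" = "orders_group"
);
```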
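Third sketch: for the parameter optimization items, the statements below show how a session variable such as parallel_fragment_exec_instance_num can be inspected and adjusted over SQL; the value 400 simply follows the multiples suggested above. Note that compaction_task_num_per_disk is a BE-level configuration item rather than a session variable and is adjusted through the cluster's configuration management.
```sql
-- Check the current value of the parallelism parameter.
SHOW VARIABLES LIKE 'parallel_fragment_exec_instance_num';

-- Raise it for the current session only (400/800 follow the multiples suggested
-- above; keep the value within the recommended ceiling of 2,000).
SET parallel_fragment_exec_instance_num = 400;

-- Or make it the default for all new sessions.
SET GLOBAL parallel_fragment_exec_instance_num = 400;
```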

