Compared to the traditional MySQL InnoDB engine, TDSQL Boundless provides a compression ratio of up to 3.81. This article offers evaluation of disk capacity for users planning database migration or model selection.
Interpretation of Core Advantages: Why Can TDSQL Boundless Achieve Such a High Compression Ratio?
TDSQL Boundless adopts an LSM structure for data storage. Due to the append-only nature of LSM, it avoids intra-page fragmentation caused by frequent random writes from in-place updates in B+ trees. Simultaneously, the characteristic of background compaction in LSM eliminates the overhead of compressing data with every write, significantly reducing the performance impact of compression. Thus, compared to MySQL InnoDB, TDSQL Boundless achieves a data compression ratio of up to 3.81 while delivering comparable performance.
After examining the kernel, we present empirical data to quantitatively compare the storage footprint of TDSQL Boundless versus MySQL when handling identical datasets. This validates the high compression ratio's effectiveness across different data models (OLTP, OLAP), providing scientific guidance for capacity planning of disk during MySQL migration.
Test Overview
Test Environment
|
Cloud platform. | Tencent Cloud |
Instance Specifications | 16-Core CPU/32 GB of Memory/Enhanced SSD 300 GB |
Comparison Database
Control group: MySQL 8.0 (using the InnoDB storage engine, default configuration)
Experimental group: TDSQL Boundless (with built-in high-efficiency engine for data compression)
Key Configuration
MySQL 8.0 default configuration of the InnoDB engine:
innodb_file_per_table=ON
innodb_page_size=16K
innodb_page_compression=OFF
Test Data Set
Sysbench benchmark
benchmark test for TPC-C
TPC-H benchmark test
Testing Plan
Preparing Test Data
Sysbench benchmark: Initialize 32 tables with 10 million records per table.
TPC-C benchmark test: Initialize 1000 Warehouses.
TPC-H benchmark test: Initialize 8 tables (with indexes) and generate a 100GB test dataset.
Test Results and Data Analysis
Comparison of Storage Space Usage (Unit: GB)
|
Sysbench | 72.6GB | 36.77GB |
TPC-C (1000 Warehouses) | 77.8GB | 43.5GB |
TPC-H (100GB) | 128.54GB | 33.78GB |
TPC-H (100GB) (with indexes) | 166.42GB | 45.29GB |
Comparison of Data Compression Ratio (MySQL as Baseline 100%)
|
Sysbench | 100% | ~50.64% |
TPC-C (1000 Warehouses) | 100% | ~55.91% |
TPC-H (100GB) | 100% | ~26.27% |
TPC-H (100GB) (with indexes) | 100% | ~27.21% |
Migration Guide: Recommendations for Disk Capacity Planning
If you are migrating from MySQL to TDSQL Boundless and wish to evaluate the disk requirements of the target database, please follow these simple guidelines:
Step 1: Evaluate Disk Usage of MySQL
Log in to your MySQL instance and run the following command to view the total data size:
SELECT table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS 'Size(GB)'
FROM information_schema.TABLES
GROUP BY table_schema;
Step 2: Select Compression Ratio Based on Business Scenario
|
General OLTP services (orders, users, and so on) | Plan based on 50% of MySQL space. |
Logs, monitoring, time-series data | Plan for 30% of MySQL space. |
Data warehouse, reporting and analytics (AP) | Plan for 30% of MySQL space. |
Step 3: Calculate Target Capacity of the Disk
Required disk space is approximately equal to the source MySQL capacity multiplied by the planning coefficient.