Tencent Cloud Elastic MapReduce (EMR) ClickHouse provides cloud-hosted services for ClickHouse, a popular open-source column-oriented database management system (DBMS). With features such as convenient ClickHouse cluster deployment, configuration modification, and monitoring and alarming, it is a secure and stable OLAP solution for both individual and corporate users. Moreover, it has excellent query performance and is therefore ideal for online data query and analysis scenarios.
ClickHouse is available in single-node, multi-node, and multi-replica architectures for your choice according to your business needs.
Out-of--the-box services such as monitoring, log search, and parameter modification are provided in the console.
EMR ClickHouse has complete data import and export capabilities. You can easily import data from data sources such as COS, HDFS, Kafka, and MySQL to the ClickHouse cluster or vice versa.
By giving full play to multi-core concurrency (with the aid of high-efficiency SIMD instruction set and vectorized execution engine) and leveraging distributed technologies and accelerated computing, ClickHouse provides real-time analysis capabilities. As an open-source solution, ClickHouse's benchmark test shows that its processing is 100–1,000 times faster than that of traditional methods, with a 50–200 MB/s high throughput during real-time import.
Based on the well-designed column-oriented storage and efficient data compression algorithm, ClickHouse is an optimal scheme for building large-scale data warehouses, as it provides a compression ratio of up to 1,000%, greatly improving the data storage and computing capabilities of a single server and reducing the use costs.
The table engine (i.e., table type) determines the following:
MergeTree engines are the most universal and powerful table engines for high-load tasks. A common feature among them is quick data insertion with subsequent backend data processing. They support data replication (with Replicated* versions of engines), partitioning, and some features not supported by other engines.
Engines in this family:
MergeTreein that it removes duplicate entries with the same primary key value.
MergeTree. The difference is that when merging data parts for
SummingMergeTreetables, ClickHouse replaces all the rows with the same primary key with one row which contains summarized values for the columns with the numeric data type. If the primary key is composed in a way that a single key value corresponds to a large number of rows, this significantly reduces storage volume and speeds up data query.
MergeTreewhile changing the logic for data parts merging. ClickHouse replaces all rows with the same primary key (within one data part) with a single row that stores a combination of states of aggregate functions.
MergeTreeand adds the logic of rows collapsing to data parts merge algorithm.
CollapsingMergeTree, it uses a different collapsing algorithm that allows inserting data in any order with multiple threads.
Log engines are lightweight engines with minimum functionality. They are most effective when you need to quickly write many small tables (up to around 1 million rows) and read them later as a whole.
Engines in this family:
TinyLogin that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. For concurrent data access, the read operations can be performed at the same time, while write operations block reads and one another. It does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. It is appropriate for temporary data, write-once tables, and testing or demonstration purposes.
Integration engines are for integration with other data storage and processing systems.
SELECTqueries on data stored on remote MySQL servers.
MergeTree) does not store data itself but allows reading from any number of other tables at the same time.
Nulltable is written to, data is ignored. When a
Nulltable is read from, the response is empty. However, you can create a materialized view on a
Nulltable, so the data written to the table will be forwarded to the view.
Jointable data is permanently retained in the memory.
CREATE VIEWquery). It does not store data but only stores the specified
SELECTquery. When reading from a table, it runs this query (and deletes all unnecessary columns from the query).
Virtual column is an integral table engine attribute that is defined in the engine source code.
You should not specify virtual columns in the
CREATE TABLE query and you cannot see them in
SHOW CREATE TABLE and
DESCRIBE TABLE query results. Virtual columns are also read-only, so you can't insert data into virtual columns.
To select data from a virtual column, you must specify its name in the
SELECT * does not return values from virtual columns.
If you create a table with a column that has the same name as one of the table virtual columns, the virtual column becomes inaccessible. To help avoid conflicts, virtual column names are usually prefixed with an underscore.