HBase Model
HBase Data Model
HBase's data model is a multidimensional mapping that can be represented as:
(row_key, column family, column, version) → value
row_key: the unique identifier for a row, sorted lexicographically.
column family: a group of columns and the basic unit of physical storage. A column family can hold any number of columns, and new columns can be added dynamically. A single column can hold multiple versions of its data, with each version usually identified by a timestamp.
column: a specific field within a column family, consisting of the column family name and the column qualifier (Qualifier).
version: the version number of data, usually represented by a timestamp.
value: the actual stored data.
cell: the smallest unit of data storage, uniquely identified by (row_key, column family, column, version).
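The mapping above can be sketched as a nested dictionary. This is only a toy in-memory illustration, not how HBase stores data; the class name ToyTable and its methods are hypothetical, and reading the newest version when no version is given is an assumption made for the example.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of (row_key, column family, column, version) -> value."""

    def __init__(self):
        # row_key -> column family -> qualifier -> {version: value}
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row_key, family, qualifier, version, value):
        # A cell is addressed by all four coordinates.
        self._rows[row_key][family][qualifier][version] = value

    def get(self, row_key, family, qualifier, version=None):
        """Return one cell value; the newest version when version is None."""
        versions = self._rows.get(row_key, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        if version is None:
            version = max(versions)
        return versions.get(version)

    def scan(self):
        """Return row keys in lexicographic (byte) order."""
        return sorted(self._rows)


table = ToyTable()
table.put(b"row1", "cf1", b"b", 100, b"v2")
table.put(b"row1", "cf1", b"b", 110, b"v3")
table.put(b"row10", "cf1", b"a", 100, b"x")
table.put(b"row2", "cf1", b"d", 120, b"v5")

print(table.get(b"row1", "cf1", b"b"))       # b'v3' (newest version)
print(table.get(b"row1", "cf1", b"b", 100))  # b'v2'
print(table.scan())                          # [b'row1', b'row10', b'row2']
```

Note that b"row10" sorts before b"row2": row keys are compared byte by byte, not numerically.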
HBase Storage Structure
Namespace: a logical container for tables, used to organize and manage tables. Tables belong to a specific Namespace.
Table: the basic unit of data storage in HBase. A table is split into multiple Regions by row key range, and each Region stores one portion of the data.
Region: a contiguous row key range of a table. Each Region contains multiple Stores, one per column family.
Store: the basic unit of data storage within a Region, corresponding one-to-one with a column family. Each Store maintains an independent LSM structure made up of a MemStore and StoreFiles. Because every column family adds a Store to each Region, the number of column families is usually kept small; no more than 3 to 5 is recommended.
MemStore: an in-memory write cache that provides high write performance.
StoreFile: an on-disk storage file in the HFile format that supports efficient reads and compression.
Block: the basic storage unit within a StoreFile, which supports efficient random reads.
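Splitting a table into Regions by row key range means that a row key can be routed to its Region with a binary search over the Region start keys. The sketch below illustrates the idea only; the split points in REGION_START_KEYS and the function find_region are hypothetical, and it assumes each Region covers the half-open range from its start key up to the next start key.

```python
import bisect

# Hypothetical split points. Region i covers
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]); the last Region is open-ended.
REGION_START_KEYS = [b"", b"row3", b"row6"]


def find_region(row_key: bytes) -> int:
    """Return the index of the Region whose key range contains row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


print(find_region(b"row1"))  # 0
print(find_region(b"row3"))  # 1 (a start key belongs to its own Region)
print(find_region(b"row9"))  # 2
```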
HBase's storage structure can be represented by the following hierarchical relationships:
Table
├── Region (divided by row key range)
│   ├── Store (each column family corresponds to one Store)
│   │   ├── MemStore (in-memory write cache)
│   │   └── StoreFile (on-disk storage file)
│   │       └── Block (data block in the file)
│   └── ...
└── ...
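The MemStore and StoreFile relationship within a Store follows the LSM pattern: writes go to memory, full memory is flushed to an immutable sorted file, and reads check the newest data first. The following is a minimal sketch of that pattern under simplifying assumptions; ToyStore and flush_threshold are hypothetical names, StoreFiles are plain Python lists instead of HFiles with Blocks, and compaction is omitted.

```python
class ToyStore:
    """Toy Store for one column family: a MemStore plus immutable StoreFiles."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}     # in-memory write cache
        self.storefiles = []   # oldest first; each is a sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Writes only touch memory, which is what makes them cheap.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted, immutable StoreFile."""
        if self.memstore:
            self.storefiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then StoreFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for storefile in reversed(self.storefiles):
            for k, v in storefile:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)      # reaches the threshold and triggers a flush
store.put("a", 3)      # newer value stays in the MemStore
print(store.get("a"))  # 3
print(store.get("b"))  # 2 (served from the flushed StoreFile)
```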
TDSQL Boundless Data Model
Role Mapping
HBase Master is analogous to the MC (Metadata Cluster) in TDSQL Boundless.
HBase Region Server is analogous to the TDStore node in TDSQL Boundless.
Table Mapping Rules
One-to-many mapping: One HBase Table corresponds to multiple tables in TDSQL Boundless.
Column family mapping: Each Column Family corresponds to one TDSQL Boundless table, named in the format <HBase table name>_<column family name>.
Column mapping: Each version of a column in HBase (that is, each cell) corresponds to one row of data in the TDSQL Boundless table.
Primary key design: The primary key of the TDSQL Boundless table is HBase Row Key + Column Qualifier + Version.
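The mapping rules above can be expressed as a small function that turns one HBase cell into a target table name plus a row. This is an illustrative sketch of the rules as stated here, not TDSQL Boundless code; MappedRow and map_cell are hypothetical names.

```python
from typing import NamedTuple


class MappedRow(NamedTuple):
    table: str  # "<HBase table name>_<column family name>"
    k: bytes    # HBase Row Key
    q: bytes    # Column Qualifier
    t: int      # Version (timestamp)
    v: bytes    # cell value


def map_cell(hbase_table, row_key, family, qualifier, version, value):
    """Map one HBase cell to a row in the per-column-family table."""
    return MappedRow(f"{hbase_table}_{family}", row_key, qualifier, version, value)


row = map_cell("ht1", b"row1", "cf1", b"b", 110, b"v3")
print(row.table)             # ht1_cf1
print((row.k, row.q, row.t)) # (b'row1', b'b', 110), the primary key
```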
Mapping Example
Assume the HBase table ht1 contains two column families, cf1 and cf2. TDSQL Boundless then creates two tables internally: ht1_cf1 and ht1_cf2.
HBase data example
| Row Key | Column Family | Column Qualifier | Version | Value |
| row1 | cf1 | a | 100 | v1 |
| row1 | cf1 | b | 100 | v2 |
| row1 | cf1 | b | 110 | v3 |
| row1 | cf2 | c | 120 | v4 |
| row2 | cf1 | d | 120 | v5 |
| row2 | cf2 | d | 130 | v6 |
TDSQL Boundless table data
-- K: HBase Row Key, Q: Column Qualifier, T: Version (timestamp), V: cell value
create table ht1_cf1 (
    K varbinary(1024),
    Q varbinary(256),
    T bigint,
    V MediumBlob NOT NULL,
    primary key(K, Q, T)
) HBase;

create table ht1_cf2 (
    K varbinary(1024),
    Q varbinary(256),
    T bigint,
    V MediumBlob NOT NULL,
    primary key(K, Q, T)
) HBase;
Table ht1_cf1
| Primary key (K + Q + T) | V |
| row1 + a + 100 | v1 |
| row1 + b + 100 | v2 |
| row1 + b + 110 | v3 |
| row2 + d + 120 | v5 |
Table ht1_cf2
| Primary key (K + Q + T) | V |
| row1 + c + 120 | v4 |
| row2 + d + 130 | v6 |
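To see how the (K, Q, T) primary key supports an HBase-style read of the newest version of each column, the sketch below loads the ht1_cf1 rows into an in-memory SQLite database and queries them. SQLite is only a stand-in chosen so the example runs anywhere: its BLOB and INTEGER types replace varbinary, MediumBlob and bigint, the trailing HBase clause is dropped, and the helper get_latest is hypothetical. Whether TDSQL Boundless serves such reads through SQL of this shape is not stated in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite stand-in for ht1_cf1 (assumption: types adapted, HBase clause omitted).
conn.execute(
    "CREATE TABLE ht1_cf1 (K BLOB, Q BLOB, T INTEGER, V BLOB NOT NULL, "
    "PRIMARY KEY (K, Q, T))"
)
conn.executemany(
    "INSERT INTO ht1_cf1 VALUES (?, ?, ?, ?)",
    [
        (b"row1", b"a", 100, b"v1"),
        (b"row1", b"b", 100, b"v2"),
        (b"row1", b"b", 110, b"v3"),
        (b"row2", b"d", 120, b"v5"),
    ],
)


def get_latest(conn, row_key):
    """Return {qualifier: value} with the newest version of each column."""
    rows = conn.execute(
        "SELECT Q, V FROM ht1_cf1 AS o WHERE K = ? "
        "AND T = (SELECT MAX(T) FROM ht1_cf1 AS i "
        "         WHERE i.K = o.K AND i.Q = o.Q) "
        "ORDER BY Q",
        (row_key,),
    ).fetchall()
    return dict(rows)


print(get_latest(conn, b"row1"))  # {b'a': b'v1', b'b': b'v3'}
```

Column b of row1 has two versions (100 and 110); only the value at version 110 is returned.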