产品概述
基本概念
集群架构
产品优势
应用场景
__DORIS_DELETE_SIGN__实现,因为我们只是在 unique 模型上做批量删除,因此只需要增加一个类型为 bool 聚合函数为 replace 的隐藏列即可。在 BE 各种聚合写入流程都和正常列一样,读取方案有两个:
在 FE 遇到 select * 等扩展时去掉__DORIS_DELETE_SIGN__列,并且默认加上 __DORIS_DELETE_SIGN__ != true 的条件, BE 读取时都会加上一列进行判断,通过条件确定是否删除。DELETE ON 表达式的值,其他的聚合行为和 replace 的聚合列相同。__DORIS_DELETE_SIGN__ != true 的条件,BE 不感知这一过程,正常执行。enable_batch_delete_by_default=true 重启 FE 后新建表的都支持批量删除,此选项默认为 false。ALTER TABLE tablename ENABLE FEATURE "BATCH_DELETE" 来启用批量删除。本操作本质上是一个 schema change 操作,操作立即返回,可以通过show alter table column 来确认操作是否完成。SET show_hidden_columns=true ,之后使用desc tablename,如果输出中有__DORIS_DELETE_SIGN__ 列则支持,如果没有则不支持。Stream Load 的写法在 header 中的 columns 字段增加一个设置删除标记列的字段, 示例 -H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"。Broker Load 的写法在 PROPERTIES 处设置删除标记列的字段,语法如下:LOAD LABEL db1.label1([MERGE|APPEND|DELETE] DATA INFILE("hdfs://abc.com:8888/user/palo/test/ml/file1")INTO TABLE tbl1COLUMNS TERMINATED BY ","(tmp_c1,tmp_c2, label_c3)SET(id=tmp_c2,name=tmp_c1,)[DELETE ON label_c3=true])WITH BROKER 'broker'("username"="user","password"="pass")PROPERTIES("timeout" = "3600");
Routine Load的写法在 columns字段增加映射,映射方式同上,语法如下:CREATE ROUTINE LOAD example_db.test1 ON example_tbl[WITH MERGE|APPEND|DELETE]COLUMNS(k1, k2, k3, v1, v2, label),WHERE k1 > 100 and k2 like "%doris%"[DELETE ON label=true]PROPERTIES("desired_concurrent_number"="3","max_batch_interval" = "20","max_batch_rows" = "300000","max_batch_size" = "209715200","strict_mode" = "false")FROM KAFKA("kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092","kafka_topic" = "my_topic","kafka_partitions" = "0,1,2,3","kafka_offsets" = "101,0,0,200");
Stream Load 外的导入操作在 Doris 内部有可能乱序执行,因此在使用MERGE 方式导入时如果不是Stream Load,需要与 load sequence 一起使用,具体的语法可以参照 Sequence 列 相关的文档。DELETE ON 条件只能与 MERGE 一起使用。mysql> SET show_hidden_columns=true;Query OK, 0 rows affected (0.00 sec)mysql> DESC test;+-----------------------+--------------+------+-------+---------+---------+| Field | Type | Null | Key | Default | Extra |+-----------------------+--------------+------+-------+---------+---------+| name | VARCHAR(100) | No | true | NULL | || gender | VARCHAR(10) | Yes | false | NULL | REPLACE || age | INT | Yes | false | NULL | REPLACE || __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |+-----------------------+--------------+------+-------+---------+---------+4 rows in set (0.00 sec)
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: APPEND" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: DELETE" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 3 | 2 | tom | 2 || 4 | 3 | bush | 3 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
3,2,tom,0
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
site_id=1 的行的 key 列相同的行。curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: MERGE" -H "delete: siteid=1" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 5 | 3 | helen | 3 || 1 | 1 | jim | 2 |+--------+----------+----------+------+
2,1,grace,23,2,tom,21,1,jim,2
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 2 | 1 | grace | 2 || 3 | 2 | tom | 2 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
Query OK表示成功。mysql> delete from test_tbl PARTITION p1 where k1 = 1;Query OK, 0 rows affected (0.04 sec){'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'VISIBLE', 'txnId':'4005'}
mysql> delete from test_tbl PARTITION p1 where k1 = 1;Query OK, 0 rows affected (0.04 sec){'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'COMMITTED', 'txnId':'4005', 'err':'delete job is committed but may be taking effect later' }
affected rows:表示此次删除影响的行,由于 Doris 的删除目前是逻辑删除,因此对于这个值是恒为0。
label:自动生成的 label,是该导入作业的标识。每个导入作业,都有一个在单 database 内部唯一的 Label。
status:表示数据删除是否可见,如果可见则显示VISIBLE,如果不可见则显示COMMITTED。
txnId:这个 Delete job 对应的事务 ID。
err:字段会显示一些本次删除的详细信息。mysql> delete from test_tbl partition p1 where k1 > 80;ERROR 1064 (HY000): errCode = 2, detailMessage = {错误原因}
(tablet=replica) 。mysql> delete from test_tbl partition p1 where k1 > 80;ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas from job: 4005, Unfinished replicas:10000=60000, 10001=60000, 10002=60000
ERROR 1064 (HY000) ,则表示删除失败。Query OK,则表示删除执行成功。status 为 COMMITTED,表示数据仍不可见,用户可以稍等一段时间再用 show delete 命令查看结果。status 为 VISIBLE,表示数据删除成功。tablet_delete_timeout_second
delete 自身的超时时间是可受指定分区下 tablet 的数量弹性改变的,此项配置为平均一个 tablet 所贡献的 timeout 时间,默认值为2。
假设此次删除所指定分区下有5个 tablet,那么可提供给 delete 的 timeout 时间为10秒,由于低于最低超时时间30秒,因此最终超时时间为30秒。load_straggler_wait_second
如果用户预估的数据量确实比较大,使得5分钟的上限不足时,用户可以通过此项调整 timeout 上限,默认值为300。TIMEOUT = MIN(load_straggler_wait_second, MAX(30, tablet_delete_timeout_second * tablet_num))
query_timeout
因为 delete 本身是一个 SQL 命令,因此删除语句也会受 session 限制,timeout 还受 Session 中的 query_timeout 值影响,可以通过 SET query_timeout = xxx 来增加超时时间,单位是秒。max_allowed_in_element_num_of_delete
如果用户在使用 in 谓词时需要占用的元素比较多,用户可以通过此项调整允许携带的元素上限,默认值为1024。SHOW DELETE [FROM db_name]
mysql> show delete from test_db;+-----------+---------------+---------------------+-----------------+----------+| TableName | PartitionName | CreateTime | DeleteCondition | State |+-----------+---------------+---------------------+-----------------+----------+| empty_tbl | p3 | 2020-04-15 23:09:35 | k1 EQ "1" | FINISHED || test_tbl | p4 | 2020-04-15 23:09:53 | k1 GT "80" | FINISHED |+-----------+---------------+---------------------+-----------------+----------+2 rows in set (0.00 sec)
文档反馈