The SELECT INTO OUTFILE statement exports query results to files. Exports to remote storage such as HDFS, S3, BOS, and COS (Tencent Cloud) are currently supported via a Broker process, via the S3 protocol, or directly via the HDFS protocol. The syntax is as follows:

query_stmt
INTO OUTFILE "file_path"
[format_as]
[properties]
file_path

file_path points to the file storage path and the file name prefix, e.g. hdfs://path/to/my_file_.
The final file names are composed of my_file_, a file sequence number, and the file format suffix. The sequence number starts at 0, and the number of files equals the number of pieces the result was split into. For example:

my_file_abcdefg_0.csv
my_file_abcdefg_1.csv
my_file_abcdefg_2.csv
[format_as]

FORMAT AS CSV

Specifies the export format. Defaults to CSV.
[properties]
Specifies related properties. Exporting via a Broker process, via the HDFS protocol, or via the S3 protocol is currently supported. Broker-related properties must be prefixed with broker., HDFS-related properties with hdfs.; S3 protocol properties are specified directly, without a prefix.

("broker.prop_key" = "broker.prop_val", ...)
or
("hdfs.fs.defaultFS" = "xxx", "hdfs.hdfs_user" = "xxx")
or
("AWS_ENDPOINT" = "xxx", ...)
("key1" = "val1", "key2" = "val2", ...)
column_separator: column separator, applicable only to the CSV format. Defaults to \t.
line_delimiter: line delimiter, applicable only to the CSV format. Defaults to \n.
max_file_size: maximum size of a single file. Defaults to 1GB, with a valid range of 5MB to 2GB. A file exceeding this size is split.
schema: schema information of the PARQUET file, applicable only to the PARQUET format. When the export format is PARQUET, schema must be specified.

Concurrent export requires three conditions to hold at the same time: the session variable is enabled with

set enable_parallel_outfile = true;

the export goes through the S3 or HDFS protocol rather than a Broker, and the query itself allows concurrent export (for example, it has no top-level single-point node such as a sort). When concurrent export is triggered, the degree of parallelism is

be_instance_num * parallel_fragment_exec_instance_num

To verify whether the current query can be exported concurrently, prepend explain to the export statement:

explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv properties ("AWS_ENDPOINT" = "xxx", ...);
If RESULT FILE SINK appears in PLAN FRAGMENT 1, concurrent export has been enabled successfully.
If RESULT FILE SINK appears in PLAN FRAGMENT 0, the current query cannot be exported concurrently (it does not satisfy all three concurrent-export conditions at the same time). An example plan for a concurrent export:

+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:<slot 2> | <slot 3> | <slot 4> | <slot 5>                     |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   1:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:`k1` + `k2`                                                   |
|   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1`   |
|                                                                             |
|   RESULT FILE SINK                                                          |
|   FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_                        |
|   STORAGE TYPE: S3                                                          |
|                                                                             |
|   0:OlapScanNode                                                            |
|     TABLE: multi_tablet                                                     |
+-----------------------------------------------------------------------------+
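As a concrete illustration, a statement of the following shape would produce a plan like the one above. This is only a sketch: the table name and expression follow the plan above, while the bucket, endpoint, and credentials are placeholders rather than values from this page.

-- Enable concurrent export for the current session.
set enable_parallel_outfile = true;

-- Prepend explain and check the plan: RESULT FILE SINK under PLAN FRAGMENT 1
-- means the export runs concurrently; under PLAN FRAGMENT 0 it does not.
-- Bucket, endpoint, and credentials below are placeholders.
explain select k1 + k2 from multi_tablet
into outfile "s3://my_bucket/outfile_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://s3.example.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxxx",
    "AWS_REGION" = "example-region"
);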
Export the query result to the file hdfs://path/to/result.txt. Specify CSV as the export format. Use the broker Broker_Doris and set kerberos authentication information. Specify the column separator as , and the line delimiter as \n.

SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "100MB"
);
If the result is no larger than 100MB, the generated file is result_0.csv; if it is larger than 100MB, it may be split into result_0.csv, result_1.csv, and so on.

Export the query result to the file hdfs://path/to/result.parquet. Specify PARQUET as the export format. Use the broker Broker_Doris and set kerberos authentication information.

SELECT c1, c2, c3 FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS PARQUET
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c3"
);
Exporting a query result to a PARQUET file requires that the schema be specified explicitly.

Export the result of a CTE statement to the file hdfs://path/to/result.txt. The export format defaults to CSV. Use the broker Broker_Doris and set the HDFS high-availability information. Use the default column and line separators.

WITH
x1 AS
(SELECT k1, k2 FROM tbl1),
x2 AS
(SELECT k3 FROM tbl2)
SELECT k1 FROM x1 UNION SELECT k3 FROM x2
INTO OUTFILE "hdfs://path/to/result_"
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.username"="user",
    "broker.password"="passwd",
    "broker.dfs.nameservices" = "my_ha",
    "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
    "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
If the result is no larger than 1GB, the generated file is result_0.csv; if it is larger than 1GB, it may be split into result_0.csv, result_1.csv, and so on.

Export the result of the UNION statement to the file bos://bucket/result.txt. Specify PARQUET as the export format. Use the broker Broker_Doris and set the BOS access information. No column separator is needed for the PARQUET format. A marker file is generated after the export completes.

SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
INTO OUTFILE "bos://bucket/result_"
FORMAT AS PARQUET
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
    "schema"="required,int32,k1;required,byte_array,k2"
);
Export the query result to the file cos://${bucket_name}/path/result.txt. Specify csv as the export format.
A marker file is generated after the export completes.

select k1,k2,v1 from tbl1 limit 100000
into outfile "cosn://my_bucket/export/my_file_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.fs.cosn.userinfo.secretId" = "xx",
    "broker.fs.cosn.userinfo.secretKey" = "xx",
    "broker.fs.cosn.bucket.endpoint_suffix" = "cos.<REGION>.myqcloud.com",
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "1024MB",
    "success_file_name" = "SUCCESS"
)
If the result is no larger than 1GB, the generated file is my_file_0.csv; if it is larger than 1GB, it may be split into my_file_0.csv, my_file_1.csv, and so on.
Verify the generated files on COS. Export to COS over the S3 protocol with concurrent export enabled:

set enable_parallel_outfile = true;
select k1 from tb1 limit 1000
into outfile "s3://my_bucket/export/my_file_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
)
The generated file prefix is my_file_{fragment_instance_id}_.

Export over the S3 protocol with the concurrent-export session variable enabled. Note that because this query contains a top-level sort node (order by), it will not be exported concurrently even though the session variable is enabled:

set enable_parallel_outfile = true;
select k1 from tb1 order by k1 limit 1000
into outfile "s3://my_bucket/export/my_file_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
)
Export the query result to the file hdfs://path/to/result.txt directly over the HDFS protocol. Specify CSV as the export format and set kerberos authentication information.

SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "hdfs.fs.defaultFS" = "hdfs://namenode:port",
    "hdfs.hadoop.security.authentication" = "kerberos",
    "hdfs.kerberos_principal" = "doris@YOUR.COM",
    "hdfs.kerberos_keytab" = "/home/doris/my.keytab"
);
If the statement executes and exports the result successfully, a row describing the export is returned:

mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
+------------+-----------+----------+---------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                 |
+------------+-----------+----------+---------------------------------------------------------------------+
| 1          | 2         | 8        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+---------------------------------------------------------------------+
1 row in set (0.05 sec)
If the result was exported concurrently, multiple rows are returned, one per writing instance:

+------------+-----------+----------+---------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                 |
+------------+-----------+----------+---------------------------------------------------------------------+
| 1          | 3         | 7        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
| 1          | 2         | 4        | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+---------------------------------------------------------------------+
2 rows in set (2.218 sec)
If the statement fails, an error message is returned, for example:

mysql> SELECT * FROM tbl INTO OUTFILE ...
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
The timeout of the export command is the same as the query timeout and can be adjusted with SET query_timeout=xxx. Files larger than max_file_size are split, so the size of an individual generated file may not exactly equal max_file_size. Function output that is not printable text is exported as \N, i.e. NULL. Geo functions such as ST_Point declare VARCHAR as their output type, but the actual output value is an encoded binary string, so these functions currently produce garbled output; for geo functions, use ST_AsText for output.
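A minimal sketch of how these notes apply in practice. The table geo_table, its columns, the HDFS path, and the timeout value are hypothetical placeholders, not values from this page.

-- The export shares the query timeout; raise it before a large export.
SET query_timeout = 3600;

-- Export geo values through ST_AsText so they arrive as readable text
-- instead of encoded binary. Table, column, and path names are hypothetical.
SELECT id, ST_AsText(location)
FROM geo_table
INTO OUTFILE "hdfs://path/to/geo_result_"
FORMAT AS CSV
PROPERTIES
(
    "hdfs.fs.defaultFS" = "hdfs://namenode:port",
    "hdfs.hdfs_user" = "doris"
);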