The `SELECT INTO OUTFILE` statement exports query results to a file. Currently, it supports exporting to remote storage such as HDFS, S3, BOS, and COS (Tencent Cloud) through the Broker process, through the S3 protocol, or directly through the HDFS protocol. The syntax is as follows:

```
query_stmt
INTO OUTFILE "file_path"
[format_as]
[properties]
```
`file_path`: points to the path and file name prefix on the target storage, such as `hdfs://path/to/my_file_`.
The final filename is composed of `my_file_`, a file number, and the file format suffix. The file number starts from 0, and the count equals the number of times the file was split, for example:

```
my_file_abcdefg_0.csv
my_file_abcdefg_1.csv
my_file_abcdefg_2.csv
```
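As an illustrative sketch (not Doris code), the naming scheme described above can be modeled like this, given a file prefix, a split count, and a format suffix:

```python
# Illustrative sketch of outfile naming (not actual Doris code):
# <prefix> + <file number> + <format suffix>, file numbers starting at 0.
def outfile_names(prefix: str, num_splits: int, suffix: str = "csv") -> list[str]:
    """Return the filenames an export split into `num_splits` files would use."""
    return [f"{prefix}{i}.{suffix}" for i in range(num_splits)]

print(outfile_names("my_file_abcdefg_", 3))
# → ['my_file_abcdefg_0.csv', 'my_file_abcdefg_1.csv', 'my_file_abcdefg_2.csv']
```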
`[format_as]`: specifies the export format, for example:

```
FORMAT AS CSV
```
`[properties]`: specifies related properties. Currently, exporting through the Broker process, through the HDFS protocol, or through the S3 protocol is supported. Properties for the Broker process are prefixed with `broker.`, and properties for the HDFS protocol are prefixed with `hdfs.`:

```
("broker.prop_key" = "broker.prop_val", ...)
or
("hdfs.fs.defaultFS" = "xxx", "hdfs.hdfs_user" = "xxx")
or
("AWS_ENDPOINT" = "xxx", ...)
```

Other properties are specified as:

```
("key1" = "val1", "key2" = "val2", ...)
```
Currently, the following properties are supported:

- `column_separator`: column separator, only applicable to the CSV format. The default is `\t`.
- `line_delimiter`: line delimiter, only applicable to the CSV format. The default is `\n`.
- `max_file_size`: maximum size of a single file. The default is 1GB, and the valid range is 5MB to 2GB. Files larger than this size will be split.
- `schema`: PARQUET file schema information, only applicable to the PARQUET format. When exporting files in the PARQUET format, `schema` must be specified.

Concurrent export is enabled through a session variable:

```
set enable_parallel_outfile = true;
```

The degree of parallelism of a concurrent export is `be_instance_num * parallel_fragment_exec_instance_num`. To verify whether a query can be exported concurrently, prefix it with `EXPLAIN`:

```
explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv properties ("AWS_ENDPOINT" = "xxx", ...);
```
If `RESULT FILE SINK` appears in `PLAN FRAGMENT 1`, the query can be exported concurrently. If `RESULT FILE SINK` appears in `PLAN FRAGMENT 0`, the current query cannot be exported concurrently (the query does not simultaneously satisfy the three conditions of concurrent export).

Concurrent export planning example:

```
+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:<slot 2> | <slot 3> | <slot 4> | <slot 5>                     |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   1:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:`k1` + `k2`                                                   |
|   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1`   |
|                                                                             |
|   RESULT FILE SINK                                                          |
|   FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_                        |
|   STORAGE TYPE: S3                                                          |
|                                                                             |
|   0:OlapScanNode                                                            |
|      TABLE: multi_tablet                                                    |
+-----------------------------------------------------------------------------+
```
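As a quick illustration (not Doris code), the concurrency formula `be_instance_num * parallel_fragment_exec_instance_num` can be computed directly; the numbers below are made-up examples, not defaults:

```python
# Estimated concurrency of a parallel OUTFILE export, per the documented
# formula: be_instance_num * parallel_fragment_exec_instance_num.
def export_parallelism(be_instance_num: int, parallel_fragment_exec_instance_num: int) -> int:
    return be_instance_num * parallel_fragment_exec_instance_num

# e.g. 3 BE nodes with 2 execution instances per fragment
# → up to 6 files written concurrently
print(export_parallelism(3, 2))  # → 6
```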
Export the query results to the file `hdfs://path/to/result.txt` through the Broker process. Specify the export format as CSV, use the broker `Broker_Doris`, and set the Kerberos authentication information. Specify the column separator as `,` and the line delimiter as `\n`.

```sql
SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "column_separator" = ",",
    "line_delimiter" = "\\n",
    "max_file_size" = "100MB"
);
```
If the result is less than 100MB, the file will be `result_0.csv`. If it is larger than 100MB, it may be `result_0.csv, result_1.csv, ...`.

Export the query results to the file `hdfs://path/to/result.parquet`. Specify the export format as PARQUET, use the broker `Broker_Doris`, and set the Kerberos authentication information.

```sql
SELECT c1, c2, c3 FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS PARQUET
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "schema" = "required,int32,c1;required,byte_array,c2;required,byte_array,c3"
);
```
Exporting in the PARQUET format requires `schema` to be specified.

Export the results of a CTE query to the file `hdfs://path/to/result.txt`, using the default CSV export format. Use the broker `Broker_Doris`, set the HDFS high-availability information, and use the default column and line separators.

```sql
WITH
x1 AS
(SELECT k1, k2 FROM tbl1),
x2 AS
(SELECT k3 FROM tbl2)
SELECT k1 FROM x1 UNION SELECT k3 FROM x2
INTO OUTFILE "hdfs://path/to/result_"
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.username" = "user",
    "broker.password" = "passwd",
    "broker.dfs.nameservices" = "my_ha",
    "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
    "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
```
If the result is less than 1GB, the file will be `result_0.csv`. If it is larger than 1GB, it may be `result_0.csv, result_1.csv, ...`.

Export the results of a UNION query to the file `bos://bucket/result.parquet`. Specify the export format as PARQUET, use the broker `Broker_Doris`, and set the BOS authentication information. The PARQUET format does not need a column separator. After the export is finished, an identification file will be generated.

```sql
SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
INTO OUTFILE "bos://bucket/result_"
FORMAT AS PARQUET
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
    "schema" = "required,int32,k1;required,byte_array,k2"
);
```
Export the query results to the file `cos://${bucket_name}/path/result.txt` on Tencent Cloud COS. Specify the export format as CSV. After the export is completed, an identification file will be generated.

```sql
select k1, k2, v1 from tbl1 limit 100000
into outfile "cosn://my_bucket/export/my_file_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "Broker_Doris",
    "broker.fs.cosn.userinfo.secretId" = "xx",
    "broker.fs.cosn.userinfo.secretKey" = "xx",
    "broker.fs.cosn.bucket.endpoint_suffix" = "cos.<REGION>.myqcloud.com",
    "column_separator" = ",",
    "line_delimiter" = "\\n",
    "max_file_size" = "1024MB",
    "success_file_name" = "SUCCESS"
)
```
If the result is less than 1GB, the file will be `my_file_0.csv`. If it is larger than 1GB, it may be `my_file_0.csv, my_file_1.csv, ...`.
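The splitting behavior above can be sketched with a rough estimate (illustrative only, not how Doris sizes files internally) of how many output files a result of a given size produces under a given `max_file_size`:

```python
import math

# Rough illustrative estimate of how many files an export produces when the
# result exceeds max_file_size and is split (not actual Doris logic).
def num_output_files(total_bytes: int, max_file_size_bytes: int) -> int:
    return max(1, math.ceil(total_bytes / max_file_size_bytes))

GB = 1024 ** 3
print(num_output_files(int(2.5 * GB), 1 * GB))  # → 3 (2.5GB split at 1GB per file)
print(num_output_files(100, 1 * GB))            # → 1 (small result: a single file)
```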
Verify the export results on COS.

Use the S3 protocol to export with concurrent export enabled:

```sql
set enable_parallel_outfile = true;
select k1 from tb1 limit 1000
into outfile "s3://my_bucket/export/my_file_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
)
```
The final generated file prefix is `my_file_{fragment_instance_id}_`.

Use the S3 protocol to export with concurrent export enabled. Because the query contains an `ORDER BY` clause, it cannot be exported concurrently even though the session variable is enabled:

```sql
set enable_parallel_outfile = true;
select k1 from tb1 order by k1 limit 1000
into outfile "s3://my_bucket/export/my_file_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
)
```
Export the query results to the file `hdfs://path/to/result.txt` directly through the HDFS protocol. Specify the export format as CSV and set the Kerberos authentication information.

```sql
SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "hdfs.fs.defaultFS" = "hdfs://namenode:port",
    "hdfs.hadoop.security.authentication" = "kerberos",
    "hdfs.kerberos_principal" = "doris@YOUR.COM",
    "hdfs.kerberos_keytab" = "/home/doris/my.keytab"
);
```
A successful export returns the number of files, total rows, file size, and file URL:

```
mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
|          1 |         2 |        8 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
1 row in set (0.05 sec)
```
If a concurrent export is performed, multiple rows are returned, one per writing instance:

```
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
|          1 |         3 |        7 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
|          1 |         2 |        4 | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
2 rows in set (2.218 sec)
```
If the execution fails, an error message is returned, for example:

```
mysql> SELECT * FROM tbl INTO OUTFILE ...
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
```
Notes:

- For long-running exports, extend the query timeout with `SET query_timeout = xxx`.
- Files larger than `max_file_size` will be split into multiple files.
- Null values are exported as `\N`, which represents NULL.
- Some geographic functions, such as `ST_Point`, have an output type of VARCHAR but actually output encoded binary characters, which may lead to garbled output. For geographic functions, please use `ST_AsText` for output.
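When reading an exported CSV file back, the `\N` marker described above must be mapped back to NULL. A minimal sketch, assuming the default `\t` column separator:

```python
import csv
import io

# Parse an exported CSV fragment, mapping the "\N" marker back to None (NULL).
# Assumes the default "\t" column separator; adjust `delimiter` if the export
# used a custom column_separator.
def read_export(text: str) -> list[list]:
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        rows.append([None if field == r"\N" else field for field in record])
    return rows

sample = "1\t\\N\thello\n2\tworld\t\\N\n"
print(read_export(sample))
# → [['1', None, 'hello'], ['2', 'world', None]]
```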