tencent cloud

Feedback

COS Data Import

Last updated: 2022-03-02 12:13:32

    This document describes how to import data from COS to Cloud Data Warehouse.

    Prerequisites

    1. The COS data source and the Cloud Data Warehouse cluster must be in the same VPC.
    2. The access-key-id and access-key-secret entered in the table function must have permission to read the corresponding oss-file-path.
    3. The oss-file-path parameter needs to meet the OSS path format specification, which is oss://<bucket-name <path-to-file=""> typically.

    Directions

    In the following example, data is imported from the S3 system (with COS as an example in this document) to Cloud Data Warehouse.

    1. Log in to Cloud Data Warehouse and create an S3 table.
      CREATE TABLE cos_engine_table
      (
      `int_id` UInt32
      )
      ENGINE = S3('http://clickhouse-xxx.myqcloud.com/clickhouse-xxx/cos/data.csv.gz', 'CSV', 'gzip')
      
    • Sample S3 engine parameters:
      The S3 table engine provides integration with the Amazon S3 ecosystem. Its parameter follows the format of S3(path, [aws_access_key_id, aws_secret_access_key,] format, [compression]).
      • path: bucket URL with file path. In read-only mode, it supports *, ?, {abc,def}, and {N..M} wildcards, where N and M are digits and 'abc' and 'def' are strings.
      • format: file format.
      • aws_access_key_id and aws_secret_access_key: permanent credentials of COS account. You can use them to authenticate your request. These parameters are optional. If no credentials are specified, credentials will be read from the configuration file. For more information, see Using S3 to Store Data.
      • compression: compression type. Supported values are none, gzip/gz, brotli/br, xz/LZMA, and zstd/zst. This parameter is optional. The compression type is automatically detected based on the file extension by default.
    1. Create a target table.

      • If your cluster has one replica:

        CREATE TABLE test.test on cluster default_cluster
        (
        `int_id` UInt32
        )
        engine = MergeTree()
        order by int_id;
        
      • If your cluster has two replicas:

        create table test.test on cluster default_cluster 
        (
        `int_id` UInt32
        )
        engine = ReplicatedMergeTree('/clickhouse/tables/test/test/{shard}', '{replica}')
        order by int_id;
        
      • Create a distributed table:

        create table test.test_dis on cluster default_cluster
        AS test.test
        engine = Distributed('default_cluster', 'test', 'test', rand());
        
    1. Write data to the target table.
      INSERT INTO test.test SELECT * FROM cos_engine_table;
      
    1. Query the data.
      select * from test.test
      
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support