tencent cloud


HDFS Data Import

Last updated: 2022-03-02 12:12:55

    This document describes how to import data from HDFS to Cloud Data Warehouse.

    Prerequisites

    1. Read permission on HDFS is required to access HDFS data. See Access Control Overview for how to set permissions.
    2. The HDFS instance and Cloud Data Warehouse cluster must be in the same VPC.

    Directions

    1. Log in to Cloud Data Warehouse and create an HDFS table.
      CREATE TABLE hdfs_engine_table
      (
      `int_id` UInt32
      )
      ENGINE = HDFS('hdfs://hdfs1:9000/other_storage', 'TSV');
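The HDFS table above reads its file in the TSV (TabSeparated) format: one row per line, with columns separated by tabs. As a minimal sketch, assuming you generate the source file yourself before placing it at `hdfs://hdfs1:9000/other_storage`, the following Python builds TSV text for the single `int_id UInt32` column and rejects values that do not fit in UInt32. The helper name `to_tsv` is hypothetical and not part of any Tencent Cloud or ClickHouse tooling.

```python
# Hypothetical helper: build TabSeparated (TSV) text for a table whose only
# column is `int_id UInt32`, matching hdfs_engine_table above.
UINT32_MAX = 2**32 - 1


def to_tsv(int_ids):
    """Return TSV text with one int_id value per line."""
    lines = []
    for value in int_ids:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{value!r} is not an integer")
        # UInt32 holds integers in the range [0, 2**32 - 1].
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{value} is out of UInt32 range")
        lines.append(str(value))
    # With a single column there are no tab separators; every row ends in "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(to_tsv([3, 1, 2]), end="")
```

You could save this output to a local file and upload it to the HDFS path used in the table definition, for example with `hdfs dfs -put`.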
      
    Reference

    ENGINE = HDFS(URI, format)
    URI is the full URI of the file in HDFS, and format specifies one of the supported file formats (see Formats for Input and Output Data). The path part of the URI may contain glob wildcards; in that case, the table is read-only.
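To show which files a wildcard URI would cover, here is an illustrative Python approximation of the glob forms described in the ClickHouse HDFS engine documentation: `*` (any characters except `/`, including none), `?` (any single character), `{a,b}` (one of the listed strings) and `{N..M}` (any number from N to M, both ends included). It is a sketch for reasoning about patterns, not ClickHouse's matcher; leading-zero ranges and other edge cases are not modeled, and the function names and the `/data/part_…` paths are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a path glob into a compiled regex (illustrative approximation)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append("[^/]*")  # any characters except "/", possibly none
        elif ch == "?":
            parts.append(".")  # any single character
        elif ch == "{":
            end = pattern.find("}", i)
            if end == -1:
                raise ValueError(f"unclosed '{{' in {pattern!r}")
            body = pattern[i + 1:end]
            numeric = re.fullmatch(r"(\d+)\.\.(\d+)", body)
            if numeric:
                lo, hi = int(numeric.group(1)), int(numeric.group(2))
                if lo > hi:
                    raise ValueError(f"empty range {{{body}}}")
                # {N..M}: every number from N to M, both ends included.
                options = [str(n) for n in range(lo, hi + 1)]
            else:
                options = body.split(",")  # {a,b}: one of the listed strings
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def glob_matches(pattern, path):
    """True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    uri = "hdfs://hdfs1:9000/data/part_{1..3}.tsv"
    print(glob_matches(uri, "hdfs://hdfs1:9000/data/part_2.tsv"))  # True
    print(glob_matches(uri, "hdfs://hdfs1:9000/data/part_4.tsv"))  # False
```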

    2. Create a ClickHouse target table.

      • If your cluster has one replica:

        CREATE TABLE test.test on cluster default_cluster
        (
        `int_id` UInt32
        )
        engine = MergeTree()
        order by int_id;
        
      • If your cluster has two replicas:

        create table test.test on cluster default_cluster
        (
        `int_id` UInt32
        )
        engine = ReplicatedMergeTree('/clickhouse/tables/test/test/{shard}', '{replica}')
        order by int_id;
        
      • Then create a Distributed table on top of the local table:

        create table test.test_dis on cluster default_cluster
        AS test.test
        engine = Distributed('default_cluster', 'test', 'test', rand());
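The Distributed table above uses `rand()` as its sharding key. The ClickHouse Distributed engine documentation describes shard selection as taking the sharding expression's value modulo the total weight of all shards and sending the row to the shard whose half-open weight interval contains the remainder. The sketch below models that rule in Python under the assumption of explicit per-shard weights (equal weights for the usage example); `pick_shard` and `rand_uint32` are hypothetical names, and `rand_uint32` merely stands in for ClickHouse's `rand()`. The sharding key only takes effect for rows inserted through the Distributed table (`test.test_dis`).

```python
import random


def pick_shard(sharding_value, weights):
    """Return the 0-based shard index for a row, given per-shard weights."""
    total = sum(weights)
    remainder = sharding_value % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        # The row goes to the shard whose interval [prev_weights, prev_weights + weight)
        # contains the remainder.
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always < total weight")


def rand_uint32(rng):
    """Stand-in for ClickHouse rand(): a pseudo-random UInt32."""
    return rng.getrandbits(32)


if __name__ == "__main__":
    rng = random.Random(42)
    counts = [0, 0]
    for _ in range(10_000):
        counts[pick_shard(rand_uint32(rng), [1, 1])] += 1
    print(counts)  # roughly even split between the two shards
```

With weights `[9, 10]`, for example, remainders 0 to 8 map to the first shard and 9 to 18 to the second, which is why a uniformly random key such as `rand()` spreads rows in proportion to the weights.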
        
    3. Write data to the target table.

      INSERT INTO test.test SELECT * FROM hdfs_engine_table;
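The `INSERT ... SELECT` statement reads every row of the HDFS file through `hdfs_engine_table` and writes it into `test.test`. A MergeTree table stores the rows of each data part written by an insert sorted by its `ORDER BY` key, here `int_id`. The following is a simplified Python model of that flow, not ClickHouse's implementation: it parses single-column TSV text into UInt32 values and returns them in sorting-key order. The names `parse_tsv_int_ids` and `build_part` are hypothetical.

```python
UINT32_MAX = 2**32 - 1


def parse_tsv_int_ids(text):
    """Parse single-column TSV text into UInt32 values, one per line."""
    values = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        value = int(line)  # raises ValueError on non-integer input
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"line {line_no}: {value} is out of UInt32 range")
        values.append(value)
    return values


def build_part(int_ids):
    """Model of one MergeTree insert: rows are stored sorted by the ORDER BY key."""
    return sorted(int_ids)


if __name__ == "__main__":
    print(build_part(parse_tsv_int_ids("3\n1\n2\n")))  # [1, 2, 3]
```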
      
    4. Query the data.

      select * from test.test;
      