Connecting to Hive Through Python

Last updated: 2020-10-10 15:01:07

    Apache Hive integrates with the Thrift service. Thrift, created by Facebook, is an interface definition language and binary communication protocol used to define and build services that work across many programming languages. HiveServer2 is built on Thrift, which allows languages such as Java and Python to call Hive's APIs.

    This section describes how to connect to HiveServer2 through Python.

    1. Preparations for Development

    • Confirm that you have activated Tencent Cloud and created an EMR cluster. When creating the EMR cluster, select the Hive component on the software configuration page.
    • Hive and its dependencies are installed in the EMR cluster directory /usr/local/service/.

    2. Viewing Parameters

    First, log in to any node (preferably a master node) in the EMR cluster. For more information about how to log in, see Logging in to a Linux Instance. You can use WebShell here: click Login to the right of the desired CVM instance to open the login page. The default username is root, and the password is the one you set when you created the EMR cluster. Once your credentials have been validated, you can access the command-line interface.

    Run the following commands in the EMR command-line interface to switch to the Hadoop user and go to the Hive installation folder:

    [root@172 ~]# su hadoop
    [hadoop@172 root]$ cd /usr/local/service/hive/
    [hadoop@172 hive]$

    View the parameters that are required for the program:

    [hadoop@172 hive]$ vim conf/hive-site.xml

    In this file, note the host address and port number of your HiveServer2. The rest of this document refers to them as $hs2host and $hs2port.
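
    The lookup can also be scripted. The following is a minimal sketch, not part of the original procedure, that reads the two values with Python's standard library. It assumes they are stored under the standard Hive property names hive.server2.thrift.bind.host and hive.server2.thrift.port. The function name read_hs2_params and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Hive property names for the HiveServer2 Thrift endpoint
# (assumed to be the ones set in your hive-site.xml).
HOST_KEY = "hive.server2.thrift.bind.host"
PORT_KEY = "hive.server2.thrift.port"


def read_hs2_params(root):
    """Return (host, port) from a parsed hive-site.xml root element.

    A value that is missing from the file is returned as None.
    """
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    host = props.get(HOST_KEY) or None
    port = props.get(PORT_KEY)
    return host, (int(port) if port else None)


# Hypothetical sample content, used here so the sketch runs anywhere.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>10.0.0.1</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>"""

host, port = read_hs2_params(ET.fromstring(SAMPLE))
```

    On the cluster, the root element could instead be obtained with ET.parse("conf/hive-site.xml").getroot() from the Hive installation folder.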

    3. Using Python to Manipulate Hive

    To manipulate Hive with a Python program, switch to the root user and use pip to install the pyhs2 module:

    [hadoop@172 hive]$ su
    [root@172 hive]# pip install pyhs2

    Switch back to the Hadoop user after the installation is completed. Create a Python file named hivetest.py in the /usr/local/service/hive/ directory and add the following code:

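    #!/usr/bin/env python
    # Written for Python 2, which pyhs2 targets (note the print statements).
    import pyhs2

    # Replace $hs2host and $hs2port with the values from conf/hive-site.xml.
    # authMechanism, user, password, and database are assumed values;
    # adjust them to match your cluster.
    conn = pyhs2.connect(host='$hs2host',
                         port=$hs2port,
                         authMechanism='PLAIN',
                         user='hadoop',
                         password='',
                         database='default')
    tablename = 'HiveByPython'
    cur = conn.cursor()

    # List all databases.
    print 'show the databases: '
    print cur.getDatabases()
    print "\n"

    # List the tables in the current ("default") database.
    print 'show the tables in default: '
    cur.execute('show tables')
    for i in cur.fetch():
        print i

    # Recreate the test table from scratch.
    cur.execute('drop table if exists ' + tablename)
    cur.execute('create table ' + tablename + ' (key int,value string)')
    print "\n"
    print 'show the new table: '
    cur.execute('show tables ' + "'" + tablename + "'")
    for i in cur.fetch():
        print i
    print "\n"

    # Insert two rows and read them back.
    print "contents from " + tablename + ":"
    cur.execute('insert into ' + tablename + ' values (42,"hello"),(48,"world")')
    cur.execute('select * from ' + tablename)
    for i in cur.fetch():
        print i

    # Release the cursor and the session.
    cur.close()
    conn.close()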


    Replace the $hs2host and $hs2port placeholders in the program with the host address and port number of HiveServer2 that you found in hive-site.xml.
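
    If the program fails to connect, a quick reachability check on the two values can help separate a wrong address or port from a problem in the program itself. The following is a minimal sketch using only the standard library. The function name can_reach and the default timeout are hypothetical, and a successful TCP connection shows only that something is listening on the port, not that it is HiveServer2.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    conn.close()
    return True
```

    For example, can_reach(host, port) could be called with the HiveServer2 host address and port number before running hivetest.py.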

    After connecting to HiveServer2, the program first prints all databases and then lists the tables in the "default" database. It then creates a table named HiveByPython (Hive stores table names in lowercase, so it appears as "hivebypython"), inserts two rows into it, and prints them. Run the program:

    [hadoop@172 hive]$ python hivetest.py
    show the databases: 
    [['default', ''], ['hue_test', ''], ['test', '']]
    show the tables in default: 
    show the new table: 
    contents from HiveByPython:
    [42, 'hello']
    [48, 'world']
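
    The program builds each HiveQL statement by concatenating the table name into a string. If the table name ever comes from user input, it is worth validating it first. The following is a minimal sketch of that idea: it checks the name against a conservative pattern (letters, digits, and underscores, which is an assumption and stricter than Hive itself requires) and returns the statements in the order the program runs them. The name build_statements is hypothetical.

```python
import re

# Conservative identifier check: a letter or underscore, then letters,
# digits, or underscores. This is an assumed policy, not a Hive rule.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_statements(tablename):
    """Return the HiveQL statements from hivetest.py for a validated table name."""
    if not _IDENTIFIER.match(tablename):
        raise ValueError("unsafe table name: %r" % (tablename,))
    return [
        "drop table if exists " + tablename,
        "create table " + tablename + " (key int,value string)",
        "show tables '" + tablename + "'",
        "insert into " + tablename + ' values (42,"hello"),(48,"world")',
        "select * from " + tablename,
    ]


statements = build_statements("HiveByPython")
```

    Each returned string could then be passed to cur.execute() in turn, as the program above does.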