
Custom Functions (UDF)

Last updated: 2024-10-30 11:37:25
This document introduces custom functions (UDFs) and describes how to develop and use them.

UDF Classification

UDF (User Defined Scalar Function): A custom scalar function, commonly referred to simply as UDF. Input and output have a one-to-one relationship: the function reads one row of data and writes out a single output value.

UDTF (User Defined Table-valued Function): A custom table-valued function, used in scenarios where a single function call outputs multiple rows of data. It is also the only type of custom function that can return multiple fields.

UDAF (User Defined Aggregation Function): A custom aggregation function where the relationship between input and output is many-to-one: it aggregates multiple input records into a single output value. It is typically used together with the GROUP BY clause in SQL.

For more details, see the community documentation: UDF, UDAF, and UDTF.
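
The distinction is easiest to see with Hive's built-in functions: upper is a scalar UDF, explode is a UDTF, and count is a UDAF. A few illustrative queries (the employees table here is hypothetical):
-- UDF: one row in, one value out
select upper(name) from employees;
-- UDTF: one call emits multiple rows
select explode(array(1, 2, 3));
-- UDAF: many rows aggregated into one value per group
select dept, count(*) from employees group by dept;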

Developing UDF

Use an IDE to create a Maven project. The basic project information is as follows; you can customize the groupId and artifactId:
<groupId>org.example</groupId>
<artifactId>hive-udf</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
Add the pom dependency:
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.pentaho</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Create a class with a name you can customize. This document takes nvl as an example:
Method 1: Extend UDF and override the evaluate method:
package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

public class nvl extends UDF {
    public String evaluate(final String s) {
        if (s == null) { return null; }
        return s + ":HelloWorld";
    }
}
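As a side note, the UDF base class resolves evaluate methods through reflection, so one class can expose several overloads for different argument types. A minimal sketch, extending the class above with a hypothetical Integer variant (not part of the original example):
package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

public class nvl extends UDF {
    // String variant: nvl('abc') returns 'abc:HelloWorld'
    public String evaluate(final String s) {
        if (s == null) { return null; }
        return s + ":HelloWorld";
    }

    // Hypothetical Integer variant: Hive selects this overload for integer arguments
    public Integer evaluate(final Integer i) {
        if (i == null) { return null; }
        return i + 1;
    }
}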
Method 2 (recommended for scenarios with complex parameters): Extend GenericUDF and override the initialize, evaluate, and getDisplayString methods:
package org.example;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

@Description(name = "nvl",
    value = "nvl(value, default_value) - Returns default value if value is null else returns value",
    extended = "Example: SELECT nvl(null, default_value);")
public class MyUDF extends GenericUDF {

    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
    private ObjectInspector[] argumentOIs;

    /**
     * Determine the return type based on the parameter types of the function.
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        argumentOIs = arguments;
        if (arguments.length != 2) {
            throw new UDFArgumentException("The operator 'NVL' accepts 2 arguments.");
        }
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1]))) {
            throw new UDFArgumentTypeException(2, "The 1st and 2nd args of function NVL should have the same type, "
                + "but they are different: \"" + arguments[0].getTypeName() + "\" and \"" + arguments[1].getTypeName() + "\"");
        }
        return returnOIResolver.get();
    }

    /**
     * Calculate the result. The final result's data type is determined by the return type specified in the initialize method.
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
        if (retVal == null) {
            retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
        }
        return retVal;
    }

    /**
     * Get the string to display in EXPLAIN output.
     */
    @Override
    public String getDisplayString(String[] children) {
        StringBuilder builder = new StringBuilder();
        builder.append("if ");
        builder.append(children[0]);
        builder.append(" is null ");
        builder.append("returns ");
        builder.append(children[1]);
        return builder.toString();
    }
}
Package the custom code into a JAR file. Execute the following command in the directory containing pom.xml to create the JAR:
mvn clean package -DskipTests
The target directory will contain the hive-udf-1.0-SNAPSHOT.jar file, indicating that the UDF development work is complete.
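Optionally, before uploading, you can confirm that the compiled classes were packaged by listing the JAR contents with the standard jar tool; the output should include entries such as org/example/MyUDF.class:
jar tf target/hive-udf-1.0-SNAPSHOT.jar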

Using UDF

Upload the generated JAR to the EMR cluster Master node:
scp ./target/hive-udf-1.0-SNAPSHOT.jar root@${master_public_ip}:/usr/local/service/hive
Switch to the hadoop user and execute the following commands to upload the JAR to HDFS (run them from the directory the JAR was copied to, /usr/local/service/hive):
su hadoop
cd /usr/local/service/hive
hadoop fs -put ./hive-udf-1.0-SNAPSHOT.jar /
View the JAR uploaded to HDFS:
hadoop fs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /data
drwxrwx--- - hadoop supergroup 0 2023-08-22 09:20 /emr
-rw-r--r-- 2 hadoop supergroup 3235 2023-08-22 15:39 /hive-udf-1.0-SNAPSHOT.jar
drwx-wx-wx - hadoop supergroup 0 2023-08-22 09:20 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /user
Connect to Hive:
hive
Execute the following command to create a function using the generated JAR package:
hive> create function nvl as "org.example.MyUDF" using jar "hdfs:///hive-udf-1.0-SNAPSHOT.jar";
Note:
1. nvl is the name of the UDF function.
2. org.example.MyUDF is the fully qualified name of the class created in the project.
3. hdfs:///hive-udf-1.0-SNAPSHOT.jar is the path to the JAR package uploaded to HDFS.
If the following information appears, it indicates that the creation was successful:
Added [/data/emr/hive/tmp/1b0f12a6-3406-4700-8227-37dec721297b_resources/hive-udf-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///hive-udf-1.0-SNAPSHOT.jar]
OK
Time taken: 1.549 seconds
You can also verify whether the function was successfully created by executing the command SHOW FUNCTIONS LIKE 'nvl'.
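If the function is only needed in the current session, for example while testing, Hive can also register it as a temporary function from the same JAR. Temporary functions are not recorded in the metastore and disappear when the session ends; the name nvl_tmp below is illustrative:
hive> create temporary function nvl_tmp as "org.example.MyUDF" using jar "hdfs:///hive-udf-1.0-SNAPSHOT.jar";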
Execute the following command to use the UDF. The function can be called in the same way as a built-in function, directly by its name:
hive> select nvl("tur", "def");
OK
tur
Time taken: 0.344 seconds, Fetched: 1 row(s)
hive> select nvl(null, "def");
OK
def
Time taken: 0.471 seconds, Fetched: 1 row(s)
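
When the function is no longer needed, drop it. Because permanent functions are stored in the metastore, other running HiveServer2 sessions can pick up such changes with a reload; both statements are standard Hive:
hive> drop function nvl;
hive> reload functions;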

