How to Use Hive UDFs with Druid
Metatron distributed Druid supports SQL, just like Apache Druid. It also improves usability by supporting Hive UDFs and UDAFs. In this article, we will see how to register and use UDFs written for Hive with Metatron distributed Druid.
Note: To use the UDF plugin, you must download the latest hotfix release of Metatron distributed Druid; earlier versions contain bugs that affect it.
First, prepare the UDF jar file used with Hive. In this example, we will create a simple UDF by referring to the following URL.
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-java-udf
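For reference, the UDF built in that guide is simply a Java class that extends Hive's UDF base class and defines an evaluate method. The following is a minimal sketch of such a lowercase function, not a verbatim copy of the guide's code:

package com.microsoft.examples;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF that lowercases its string argument.
public class ExampleUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toLowerCase());
    }
}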
If you create the UDF by following the URL above and complete the build, the ExampleUDF-1.0-SNAPSHOT.jar file is generated. Copy this file to $DRUID_HOME/extensions/druid-hive-udf-extensions.
> cp ExampleUDF-1.0-SNAPSHOT.jar $DRUID_HOME/extensions/druid-hive-udf-extensions
Then create hive.function.properties in the $DRUID_HOME/extensions/druid-hive-udf-extensions directory and add the following line to register the UDF in Druid. Each entry maps the function name to be used in SQL to the fully qualified class name of the UDF.
tolower=com.microsoft.examples.ExampleUDF
Finally, modify the Druid configuration. Add the Hive UDF extension to druid.extensions.loadList in the $DRUID_HOME/conf/druid/_common/common.runtime.properties file as follows:
druid.extensions.loadList=["druid-jdbc-firehose", "druid-histogram", "druid-datasketches", "druid-hive-extensions", "mysql-metadata-storage", "druid-hdfs-storage", "druid-range", "druid-area", "druid-stats", "druid-orc-extensions", "druid-kafka-indexing-service", "druid-lucene-extensions", "druid-geotools-extensions", "druid-hive-udf-extensions"]
Now restart the server.
stop-single.sh
start-single.sh
When Druid starts up successfully, the following message will be displayed in the log.
...
2019-11-20T05:09:12,626 INFO [main] io.druid.hive.HiveUDFDruidModule - > 'tolower' is registered with class com.microsoft.examples.ExampleUDF
...
Now check whether the UDF works properly using the node shell.
bin/node.sh tools shell
>sql
sql>select tolower(City) from sales_geo limit 10;
[EXPR$0]
--------
[houston]
[naperville]
[naperville]
[naperville]
[philadelphia]
[henderson]
[athens]
[henderson]
[henderson]
[henderson]
Hive UDAFs can be registered and used in the same way (a minimal UDAF sketch follows the function list below). When the Hive UDF plugin is activated, many previously developed UDFs are registered automatically. Check the list below and use the functions you need. If you are running a distributed environment, repeat these steps on all Druid nodes.
sql> select * from sys.functions;
[name, type, external]
----------------------
[$assign, UDF, false]
[$assignfirst, UDF, false]
[$avg, UDF, false]
[$delta, UDF, false]
[$dense_rank, UDF, false]
[$first, UDF, false]
[$histogram, UDF, false]
[$irr, UDF, false]
[$lag, UDF, false]
[$last, UDF, false]
[$lead, UDF, false]
[$max, UDF, false]
[$mean, UDF, false]
[$min, UDF, false]
[$mirr, UDF, false]
[$next, UDF, false]
[$npv, UDF, false]
[$nth, UDF, false]
[$percentile, UDF, false]
[$prev, UDF, false]
[$rank, UDF, false]
[$row_num, UDF, false]
…
[tolower, UDF, true]
[toradians, UDF, false]
[trim, UDF, false]
[trunc, UDF, false]
[truncate, UDF, false]
[truncatedrecent, UDF, true]
[ulp, UDF, false]
[unix_timestamp, UDF, false]
[upper, UDF, false]
[variance, UDAF, true]
[weekofweekyear, UDF, false]
[weekyear, UDF, false]
[year, UDF, false]
[||, UDF, false]
> Retrieved 483 rows in 22 msec
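If you want to write your own UDAF, one option is Hive's classic UDAF API: a class extending UDAF with a nested evaluator implementing UDAFEvaluator. The sketch below is a hypothetical example (the package, class, and function names are made up for illustration and are not shipped with the plugin):

package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

// Hypothetical UDAF that computes the maximum of long values
// using Hive's classic UDAF/UDAFEvaluator API.
public final class ExampleMaxUDAF extends UDAF {

    public static class MaxLongEvaluator implements UDAFEvaluator {
        private Long max;

        // Reset the aggregation state.
        public void init() {
            max = null;
        }

        // Process one input value.
        public boolean iterate(Long value) {
            if (value != null && (max == null || value > max)) {
                max = value;
            }
            return true;
        }

        // Return the partial aggregation state.
        public Long terminatePartial() {
            return max;
        }

        // Merge a partial state computed on another node.
        public boolean merge(Long other) {
            return iterate(other);
        }

        // Return the final result.
        public Long terminate() {
            return max;
        }
    }
}

Assuming the jar is copied to the same extension directory, registration should work like the UDF above, for example with a line such as examplemax=com.example.hive.ExampleMaxUDAF in hive.function.properties (hypothetical names).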
If you have any questions about this post, leave a comment or ask in our forum. We hope this post helps you!