Exporting data using Druid SQL
Metatron distribution Druid provides a function to export JSON query results to files in various formats such as ORC, Parquet, JSON, CSV, Excel, etc. The syntax for this is specified in the context of the Druid JSON query and is as follows.
{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "covid_worldwide_test_data"
},
...
"outputColumns": [
"Province_State",
"SUM(positive)"
],
"context": {
"forwardContext": {
"cleanup": "true",
"columns": "Province_State,SUM(positive)",
"format": "excel",
"maxRowsPerSheet": 1000000
},
"forwardURL": "file:///var/folders/8l/fmdg5m9d5_3bx2zlyw3vc30w0000gn/T//MFD-709c2e76-f81a-4b34-aa29-66e54dca6349",
...
}
}
The description and values for each setting are as follows.
Required | Description | Value | |
cleanup | O | Set whether to overwrite the data stored in forwardURL | true | false |
format | O | File format to export | excel | csv | json | parquet | orc |
forwardURL | O | Directory to save exported files | |
columns | O | Column list to be exported It is specified among the names in ouputColumns. | |
maxRowsPerSheet | Number of records to be exported per sheet when format is excel | ||
wrapAsList | Whether to export the result in json list format when format is json | true | false | |
withHeader | Whether to include column name as header when format is csv | true | false |
The same results can be achieved using Druid SQL. Below is the syntax.
insert overwrite into directory '/Users/workspace/test' as 'ORC|PARQUET|CSV|JSON|EXCEL' with ('withHeader' => 'true'|'wrapAsList' => 'true'|maxRowsPerSheet => ‘50000’)
The following is an example of exporting an SQL query to EXCEL.
insert overwrite into directory '/Users/i1befree/workspace/test' as 'EXCEL' with ('maxRowsPerSheet' => '5000')
SELECT * FROM "druid"."covid_worldwide_test_data"