Kepler.gl on Druid Spatial Query

polaris Druid August 24, 2020 | 1

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 764

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 766

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 769

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 764

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 766

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 769

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 764

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 766

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 769

Notice: Undefined offset: 1 in /data/httpd/www/html/wp-includes/media.php on line 70

Metatron distributed Druid provides GIS indexing function for WKT format files, and supports various GIS functions. It also provides SQL query interface for ease of use, and a function to export the result to GeoJson format.

This example briefly explains how to analyze location data using Spatial Query provided by Metatron distributed Druid and Kepler.gl.

Get GIS data

1. Shape data
This example uses NYC Taxi Zones data. Data can be obtained from NYC Open Data. We use the CSV format for download format.

Import the downloaded file using Metatron Discovery’s Data Prep and create a snapshot. If the snapshot is created, ingest the snapshot into Druid. The reason for using Data Prep is for the convenience of ingestion such as automatic type detection, and it does not matter if you ingest on Druid in other ways. Refer to the figure below for the type of each field during ingestion.

Note that in this example, the_geom column has shape information of Multi Polygon type, but it is not indexed as polygon type and is saved as a string dimension. As an improvement of Druid compared to the previous version, some spatial operations are possible without indexing the polygon type. Indexing time can be greatly reduced by not indexing polygon types.

2. Fact data
Use Motor Vehicle Collisions-Crashes as Fact data. The total number of records is more than 1 million, and 8 GB or more of memory is required when using Data Prep. If the value of the_geom column does not exist, the record is deleted because it is meaningless when analyzing location data.

When creating a datasource, set CRASH TIME as the timestamp column and LOCATION column as the POINT type.

Spatial Query Using Druid SQL

1. Verifying Druid SQL Query Using Workbench
A spatial join is performed using Druid SQL between polygon data representing New York State Taxi Zone and point data representing accident location.

WITH NYC AS (
   SELECT 
		t1.Borough, t1.LocationID, t1.Zone, t1.service_zone, "NUMBER OF CYCLIST INJURED"
	FROM 
		"druid"."motor_vehicle_collisions_full" t2 JOIN "druid"."nyc_boundaries" t1
	ON 
		ST_Contains( t1.Zone, t2.LOCATION )
)
SELECT 
	NYC.LocationID, NYC.Borough, NYC.service_zone, t3.Zone as geom, count(*) as number, sum(NYC."NUMBER OF CYCLIST INJURED") as "cyclist injured"
FROM 
	NYC join "druid"."nyc_boundaries" t3 on (NYC.LocationID = t3.LocationID)
GROUP BY 
	NYC.LocationID, NYC.Borough, NYC.service_zone, t3.Zone;

You can check the result as follows.

2. Exporting data in GeoJSON format
Metatron distributed Druid has a function to output Druid SQL query results in Geo JSON format. If you request the same Druid SQL to the URL below, you can get the result in Geo JSON format.

curl -X POST -H "Content-Type: text/plain" -d @$QUERY_FILE "http://@$BROKER_HOST/druid/v2/sql/geojson"

One thing to be aware of when getting query results with Geo JSON foramt is that the type of location column of the query results must be changed according to the Geo JSON format, and the coordinate system must be changed according to the visualization library to be used. Kepler.gl supports epgs:4326, so convert the coordinate system accordingly.

WITH NYC AS (
   SELECT
                t1.Borough, t1.LocationID, t1.Zone, t1.service_zone, "NUMBER OF CYCLIST INJURED"
        FROM
                "druid"."motor_vehicle_collisions_sample3" t2 JOIN "druid"."nyc_boundaries" t1
        ON
                ST_Contains( t1.Zone, t2.LOCATION )
)
SELECT
        NYC.LocationID, NYC.Borough, NYC.service_zone, geom_transform(geom_fromWKT(t3.Zone), 4326, 4326) as geom, count(*) as number, sum(NYC."NUMBER OF CYCLIST INJURED") as "cyclist injured"
FROM
        NYC join "druid"."nyc_boundaries" t3 on (NYC.LocationID = t3.LocationID)
GROUP BY
        NYC.LocationID, NYC.Borough, NYC.service_zone, t3.Zone

After saving the query result as a file, import it into Kepler to complete the visualization.

Combining Kepler.gl and Druid

Metatron distributed Druid can query SQL through http interface, so it is possible to dynamically issue Druid SQL through javascript and import the result into Kepler.gl. Below is a simple example of this.

const xhr = new XMLHttpRequest();
xhr.addEventListener('load', () => {
  const geojson = JSON.parse(xhr.responseText);

  if(geojson.type == 'FeatureCollection'){
    let keplerData = {
      info : {
        label : 'datasource1',
        id : 'query-1'
      },
      data : {
        fields : [],
        rows : []
      }
    };

	//convert geojson to kepler.gl data format
    if(geojson.features.length > 0){
      const fields = Object.keys(geojson.features[0].properties);

      fields.forEach((field) => {
        keplerData.data['fields'].push({'name':field});
      });

      geojson.features.forEach((feature) => {
        let row = [];

        fields.forEach((field) => {
          row.push(feature.properties[field]);
        })

        row.push(feature.geometry);
        keplerData.data['rows'].push(row);
      });
    }

    keplerData.data['fields'].push({'name':'geometry'});

    this.props.dispatch(
      wrapTo(
        'map1',
        addDataToMap({
          datasets: keplerData
        })
      )
    );
  }

  this.props.dispatch(wrapTo('map1', hideAndShowSidePanel()));
});

xhr.addEventListener('onerror', () => {
  console.log(xhr.responseText);
});

xhr.open('POST', "http://localhost:8082/druid/v2/sql/geojson");
xhr.setRequestHeader('Content-type', 'text/plain');
xhr.send(this.state.sql);

druid sql spatial gis

One Response

Using Druid as a linked data source in Metatron Discovery | Metatron Discovery says:

September 4, 2020 at 3:30 am

[…] Kepler.gl on Druid Spatial Query […]

Reply

Kepler.gl on Druid Spatial Query