Category: Data preparation


Data Preparation and Continuous Integration

After creating a dataflow, you’ll probably want to rerun it against another file or query result, either automatically (for example, on a daily basis) or manually. Metatron Discovery’s data swapping feature was developed for exactly this kind of work. Although Metatron Discovery itself has no scheduling capability, you can use preptool, a CLI tool for data preparation…

Using Apache Spark in Data Preparation

Metatron Discovery recently introduced Discovery Spark Engine, an external ETL engine based on Spark SQL. The embedded engine comes under heavy garbage collection pressure once the record count exceeds about 1 million; in that case, Discovery Spark Engine is the recommended solution. To avoid complex dependency problems, Discovery Spark Engine is…

How to pivot your data

This tutorial demonstrates how to use Data preparation to turn rows into columns in your dataset. Let’s look at the sample data: it consists of field names and values. The field names are productId, userId, profileName, helpfulness, score, time, summary, and text. How do you represent these fields in a single…
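The idea behind this pivot, turning (field name, value) rows into one row per record, can be sketched in plain Python. This is not how Data preparation implements pivoting. It is a minimal sketch that assumes each record begins with a productId row, and the function name `pivot_rows` and the sample values are hypothetical.

```python
from typing import Iterable

# Field names from the tutorial excerpt; their order defines the output columns.
FIELDS = ["productId", "userId", "profileName", "helpfulness",
          "score", "time", "summary", "text"]


def pivot_rows(rows: Iterable[tuple[str, str]],
               start_field: str = "productId") -> list[dict[str, str]]:
    """Pivot (field, value) rows into one dict per record.

    Hypothetical rule: a new record starts whenever `start_field` appears.
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for field, value in rows:
        # Seeing the start field again closes the previous record.
        if field == start_field and current:
            records.append(current)
            current = {}
        current[field] = value
    if current:
        records.append(current)
    # Give every record every column, filling missing fields with "".
    return [{f: rec.get(f, "") for f in FIELDS} for rec in records]


if __name__ == "__main__":
    # Hypothetical sample rows in field/value (long) form.
    sample = [
        ("productId", "P1"), ("userId", "U1"), ("score", "5"),
        ("productId", "P2"), ("userId", "U2"), ("summary", "Good"),
    ]
    for row in pivot_rows(sample):
        print(row)
```

Each output dict is one row of the pivoted table, with the eight field names as columns.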