Flattening JSON data
Nested JSON data must be "flattened" to be used in Datasmells. You can do this one of two ways: using JSONata or Handlebars.
Imagine a datasource - e.g. NoSQL database, API or file - returned JSON data like this:
{
"CustomerName": "Big Co",
"Contacts": [
{
"FirstName": "John",
"LastName": "Smith"
},
{
"FirstName": "Amy",
"LastName": "Jones"
}
]
}
and that you'd like to create a flattened array structure like this:
[
{
"CustomerName": "Big Co",
"FirstName": "John",
"LastName": "Smith"
},
{
"CustomerName": "Big Co",
"FirstName": "Amy",
"LastName": "Jones"
}
]
Here's how you can accomplish this using both JSONata and Handlebars:
JSONata is a lightweight query and transformation language for JSON data.
The full documentation will give you the best guidance on how to use language and there's also a playground where you can test your transformations within your browser.
To create our flattened data structure, the following expression can be used:
($)@$cust.Contacts.($)@$.{
"CustomerName": $cust.CustomerName,
"FirstName": FirstName,
"LastName": LastName
}
This creates a variable named $cust
to store a back-reference to the top-level customer record and iterates on the Contacts
array.
Should I use JSONata or Handlebars?
There is no "right" answer for everybody. Influencing factors include your comfort with JSONata's transformation language versus Handlebars' templating language and the degree of responsibility you want to take for creating valid JSON output.
JSONata is designed for data and will guarantee that you produce valid JSON output. It will often result in shorter templates and allows you to apply filters to the JSON data itself, for example to only include contacts in a certain country in the output. It will allow you to sort, group and aggregate data.
Handlebars is designed for text templating rather than managing data. Its syntax means that the template is much more readable to the human eye. You can also make use of any of its extensive list of helpers for manipulating numbers, strings, dates and more. Since we use Handlebars elsewhere in Datasmells - e.g. for generating queries based on previous result sets and for case templates - it also has the advantage of being familiar. However, since Handlebars allows you to create any kind of text output, the onus is on you to ensure that you create syntactically valid JSON.
How we write arrays of data to Parquet files
These aren't areas of functionality you can configure, though it's important you're aware how they work.
Merging arrays
Imagine we pull two records back from a NoSQL database or API sequentially. The first record is as follows:
{
"CustomerName": "Big Co",
"Contacts": [
{
"FirstName": "John",
"LastName": "Smith"
},
{
"FirstName": "Amy",
"LastName": "Jones"
}
]
}
and the second returns this:
{
"CustomerName": "Small Co",
"Contacts": [
{
"FirstName": "Tiny",
"LastName": "Timothy"
}
]
}
This will be flattened as two separate arrays:
[
{
"CustomerName": "Big Co",
"FirstName": "John",
"LastName": "Smith"
},
{
"CustomerName": "Big Co",
"FirstName": "Amy",
"LastName": "Jones"
}
]
[
{
"CustomerName": "Small Co",
"FirstName": "Tiny",
"LastName": "Timothy"
}
]
We will automatically merge these arrays together when the records are written to the Parquet dataset, giving something that is equivalent to this:
[
{
"CustomerName": "Big Co",
"FirstName": "John",
"LastName": "Smith"
},
{
"CustomerName": "Big Co",
"FirstName": "Amy",
"LastName": "Jones"
},
{
"CustomerName": "Small Co",
"FirstName": "Tiny",
"LastName": "Timothy"
}
]
Iteration level
Additionally, we always deep-dive into arrays to object-level in the final flattened / transformed JSON, so this:
[
[
[
{
"fruit": "apple"
}
]
],
{
"fruit": "orange"
}
]
would be transformed into:
[
{
"fruit": "apple"
},
{
"fruit": "orange"
}
]
Last updated
Was this helpful?