Flattening JSON data

Nested JSON data must be "flattened" to be used in Datasmells. You can do this one of two ways: using JSONata or Handlebars.

Imagine a datasource - e.g. NoSQL database, API or file - returned JSON data like this:

{
  "CustomerName": "Big Co",
  "Contacts": [
    {
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
     "FirstName": "Amy",
     "LastName": "Jones"
    }
   ]
}

and that you'd like to create a flattened array structure like this:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    }
]

Here's how you can accomplish this using both JSONata and Handlebars:

JSONata is a lightweight query and transformation language for JSON data.

The full documentation will give you the best guidance on how to use language and there's also a playground where you can test your transformations within your browser.

To create our flattened data structure, the following expression can be used:

($)@$cust.Contacts.($)@$.{
    "CustomerName": $cust.CustomerName,
    "FirstName": FirstName, 
    "LastName": LastName
    }

This creates a variable named $cust to store a back-reference to the top-level customer record and iterates on the Contacts array.

Handlebars is a generic text templating tool that's mainly used for creating HTML from data, however we also use it to create JSON data.

The full documentation will give you the best guidance on how to use language and there's also a playground where you can test your template within your browser.

You must ensure that your Handlebars template creates valid JSON.

To create our flattened data structure, the following expression can be used:

[
{{#each Contacts}}
	{
    "CustomerName": {{#jval}}"{{../CustomerName}}"{{/jval}},
    "FirstName": {{#jval}}"{{FirstName}}"{{/jval}},
    "LastName": {{#jval}}"{{LastName}}"{{/jval}}
    }{{#continue}},{{/continue}}
{{/each}}
]

This iterates over the Contacts array and prints out a JSON data structure.

The CustomerName value is retrieved by navigating to the parent object using the annotation {{../CustomerName}}.

Property values are wrapped in {{#jval}} helpers. This helper will render the specified content when the value isn't null, but render the word null when it is. So if a record's FirstName value is missing or null, you would see the following output:

    "FirstName": null,

Finally, the {{#continue}} helper is used to append a comma after each object in the array apart from the last.

Note that any values included in double moustaches - e.g. {{FirstName}} - will be automatically escaped. For example, double quotes will be changed to \" and any new lines will be changed to \n. If you don't want the value to be escaped, use triple moustaches: {{{FirstName}}}.

The Handlebars documentation refers to HTML escaping throughout its guidance. This is because the main way in which Handlebars is used is in generating web pages.

However, in our implementation of Handlebars, HTML escaping has been replaced with JSON escaping as described above.

Should I use JSONata or Handlebars?

There is no "right" answer for everybody. Influencing factors include your comfort with JSONata's transformation language versus Handlebars' templating language and the degree of responsibility you want to take for creating valid JSON output.

JSONata is designed for data and will guarantee that you produce valid JSON output. It will often result in shorter templates and allows you to apply filters to the JSON data itself, for example to only include contacts in a certain country in the output. It will allow you to sort, group and aggregate data.

Handlebars is designed for text templating rather than managing data. Its syntax means that the template is much more readable to the human eye. You can also make use of any of its extensive list of helpers for manipulating numbers, strings, dates and more. Since we use Handlebars elsewhere in Datasmells - e.g. for generating queries based on previous result sets and for case templates - it also has the advantage of being familiar. However, since Handlebars allows you to create any kind of text output, the onus is on you to ensure that you create syntactically valid JSON.

How we write arrays of data to Parquet files

These aren't areas of functionality you can configure, though it's important you're aware how they work.

Merging arrays

Imagine we pull two records back from a NoSQL database or API sequentially. The first record is as follows:

{
  "CustomerName": "Big Co",
  "Contacts": [
    {
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
     "FirstName": "Amy",
     "LastName": "Jones"
    }
   ]
}

and the second returns this:

{
  "CustomerName": "Small Co",
  "Contacts": [
    {
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
   ]
}

This will be flattened as two separate arrays:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    }
]

[
    {
      "CustomerName": "Small Co",
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
]

We will automatically merge these arrays together when the records are written to the Parquet dataset, giving something that is equivalent to this:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    },
    {
      "CustomerName": "Small Co",
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
]

Iteration level

Additionally, we always deep-dive into arrays to object-level in the final flattened / transformed JSON, so this:

[
    [
        [
            {
              "fruit": "apple"
            }
        ]
    ],
    {
      "fruit": "orange"
    }
]

would be transformed into:

[
    {
      "fruit": "apple"
    },
    {
      "fruit": "orange"
    }
]

PreviousDefining schemas for JSON NextPlatforms

Last updated 3 years ago

Was this helpful?