DataBug for Techies
  • Home
  • Getting Started
    • The DataBug process
    • Agents
      • Deploying your own Agent
      • IP Address Ranges
    • Connections
      • Relational Databases
        • Azure SQL
        • Azure Synapse Analytics
        • Firebird
        • IBM DB2
        • Microsoft SQL Server
        • MySQL
        • Oracle
        • PostgreSQL
      • Non-Relational Databases
        • Apache Cassandra
        • Apache CouchDB
        • Apache HBase
        • Azure Cosmos DB
        • Couchbase
        • Elasticsearch
        • MongoDB
        • Neo4j
        • RavenDB
      • Key-Value Stores
        • Amazon DynamoDB
      • File-Based Databases
        • Microsoft Access
        • H2
        • SQLite
      • Flat Files
        • Amazon S3
        • Azure Blob Storage
        • Azure File Storage
        • Local Folder
      • Web-Based Data
        • HTTP URL
    • Problem Definitions
    • Key Skills
  • 1. Gathering data
    • Working with query templates
    • Relational databases
      • Platforms
        • Microsoft SQL Server
    • No SQL databases
      • Defining schemas for JSON
      • Flattening JSON data
      • Platforms
        • Apache HBase
        • Neo4J
    • APIs
      • Platforms
        • ECOES API
    • Logs
      • Ingestion of log data
  • 2. Analyzing Data
    • Overview
  • 3. Managing Cases
    • Creating Cases
Powered by GitBook

Links

  • Guidance for Users
  • DataBug.com

Copyright © 2023 Red Bear Software Limited

On this page
  • Should I use JSONata or Handlebars?
  • How we write arrays of data to Parquet files

Was this helpful?

  1. 1. Gathering data
  2. No SQL databases

Flattening JSON data

Nested JSON data must be "flattened" to be used in Datasmells. You can do this one of two ways: using JSONata or Handlebars.

Imagine a datasource - e.g. NoSQL database, API or file - returned JSON data like this:

{
  "CustomerName": "Big Co",
  "Contacts": [
    {
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
     "FirstName": "Amy",
     "LastName": "Jones"
    }
   ]
}

and that you'd like to create a flattened array structure like this:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    }
]

JSONata is a lightweight query and transformation language for JSON data.

To create our flattened data structure, the following expression can be used:

($)@$cust.Contacts.($)@$.{
    "CustomerName": $cust.CustomerName,
    "FirstName": FirstName, 
    "LastName": LastName
    }

This creates a variable named $cust to store a back-reference to the top-level customer record and iterates on the Contacts array.

Handlebars is a generic text templating tool that's mainly used for creating HTML from data, however we also use it to create JSON data.

You must ensure that your Handlebars template creates valid JSON.

To create our flattened data structure, the following expression can be used:

[
{{#each Contacts}}
	{
    "CustomerName": {{#jval}}"{{../CustomerName}}"{{/jval}},
    "FirstName": {{#jval}}"{{FirstName}}"{{/jval}},
    "LastName": {{#jval}}"{{LastName}}"{{/jval}}
    }{{#continue}},{{/continue}}
{{/each}}
]

This iterates over the Contacts array and prints out a JSON data structure.

The CustomerName value is retrieved by navigating to the parent object using the annotation {{../CustomerName}}.

Property values are wrapped in {{#jval}} helpers. This helper will render the specified content when the value isn't null, but render the word null when it is. So if a record's FirstName value is missing or null, you would see the following output:

    "FirstName": null,

Finally, the {{#continue}} helper is used to append a comma after each object in the array apart from the last.

Note that any values included in double moustaches - e.g. {{FirstName}} - will be automatically escaped. For example, double quotes will be changed to \" and any new lines will be changed to \n. If you don't want the value to be escaped, use triple moustaches: {{{FirstName}}}.

The Handlebars documentation refers to HTML escaping throughout its guidance. This is because the main way in which Handlebars is used is in generating web pages.

However, in our implementation of Handlebars, HTML escaping has been replaced with JSON escaping as described above.

Should I use JSONata or Handlebars?

There is no "right" answer for everybody. Influencing factors include your comfort with JSONata's transformation language versus Handlebars' templating language and the degree of responsibility you want to take for creating valid JSON output.

JSONata is designed for data and will guarantee that you produce valid JSON output. It will often result in shorter templates and allows you to apply filters to the JSON data itself, for example to only include contacts in a certain country in the output. It will allow you to sort, group and aggregate data.

How we write arrays of data to Parquet files

These aren't areas of functionality you can configure, though it's important you're aware how they work.

Merging arrays

Imagine we pull two records back from a NoSQL database or API sequentially. The first record is as follows:

{
  "CustomerName": "Big Co",
  "Contacts": [
    {
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
     "FirstName": "Amy",
     "LastName": "Jones"
    }
   ]
}

and the second returns this:

{
  "CustomerName": "Small Co",
  "Contacts": [
    {
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
   ]
}

This will be flattened as two separate arrays:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    }
]
[
    {
      "CustomerName": "Small Co",
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
]

We will automatically merge these arrays together when the records are written to the Parquet dataset, giving something that is equivalent to this:

[
    {
      "CustomerName": "Big Co",
      "FirstName": "John",
      "LastName": "Smith"
    },
    {
      "CustomerName": "Big Co",
      "FirstName": "Amy",
      "LastName": "Jones"
    },
    {
      "CustomerName": "Small Co",
      "FirstName": "Tiny",
      "LastName": "Timothy"
    }
]

Iteration level

Additionally, we always deep-dive into arrays to object-level in the final flattened / transformed JSON, so this:

[
    [
        [
            {
              "fruit": "apple"
            }
        ]
    ],
    {
      "fruit": "orange"
    }
]

would be transformed into:

[
    {
      "fruit": "apple"
    },
    {
      "fruit": "orange"
    }
]
PreviousDefining schemas for JSONNextPlatforms

Last updated 3 years ago

Was this helpful?

Here's how you can accomplish this using both and :

The will give you the best guidance on how to use language and there's also a where you can test your transformations within your browser.

The will give you the best guidance on how to use language and there's also a where you can test your template within your browser.

Handlebars is designed for text templating rather than managing data. Its syntax means that the template is much more readable to the human eye. You can also make use of any of its extensive list of for manipulating numbers, strings, dates and more. Since we use Handlebars elsewhere in Datasmells - e.g. for generating queries based on previous result sets and for case templates - it also has the advantage of being familiar. However, since Handlebars allows you to create any kind of text output, the onus is on you to ensure that you create syntactically valid JSON.

JSONata
Handlebars
full documentation
playground
full documentation
playground
helpers