Key Skills

These are the skills you'll need to put together a Problem Definition.

General

YAML

At present, Problem Definitions are defined using YAML. We'll be introducing a graphical way of configuring these in a future update.

Querying

Gather Phase

When performing queries against a data source - e.g. SQL Server, MongoDB, Elasticsearch - you'll use the query language that's native to that platform.

For example, with SQL Server you'd use T-SQL. With MySQL, you'd use MySQL's own SQL dialect, which differs from T-SQL in places. With MongoDB, it'd be MQL.

Analyze Phase

The Analyze phase is optional in Problem Definitions.

Analyze phase queries are performed using DuckDB. This allows you to query Parquet files using SQL as if they were tables.

DuckDB is Copyright 2018-2023 Stichting DuckDB Foundation.

Each Parquet file used as an input to the Analyze phase is exposed as a kind of virtual table. For example, if you have a file named Foo.parquet, then you would have a table named Foo. These virtual tables can then be queried using the DuckDB flavor of SQL.
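As a rough illustration of the naming rule described above, the following Python sketch derives a table name from a Parquet filename and builds a query string against it. The helper names `table_name_for` and `build_count_query` are hypothetical, and the sketch assumes the table name is simply the filename without its extension. It only builds the SQL text; it does not run DuckDB.

```python
from pathlib import Path


def table_name_for(parquet_path: str) -> str:
    """Return the assumed virtual table name: the filename minus its extension."""
    return Path(parquet_path).stem


def build_count_query(parquet_path: str) -> str:
    """Build a simple SQL query against the virtual table for a Parquet file."""
    table = table_name_for(parquet_path)
    # Double quotes delimit identifiers in standard SQL, which DuckDB accepts.
    return f'SELECT COUNT(*) AS row_count FROM "{table}"'


print(table_name_for("Foo.parquet"))     # Foo
print(build_count_query("Foo.parquet"))  # SELECT COUNT(*) AS row_count FROM "Foo"
```

The resulting string is the kind of query you would place in an Analyze phase step; how it is wired into a Problem Definition is not shown here.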

Restructuring JSON data

To create cases, DataBug ultimately needs a Parquet file containing "flat" data, as opposed to hierarchical data like JSON.
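To make the difference between "flat" and nested data concrete, here is a minimal Python sketch that turns one nested JSON record into flat rows, one per child item. The record shape (`name`, `orders`, `id`, `total`) and the function `flatten_orders` are made up for illustration. This is not how DataBug performs the transformation; it only shows the target shape.

```python
import json


def flatten_orders(customer: dict) -> list:
    """Produce one flat row per order, repeating the parent's fields on each row."""
    rows = []
    for order in customer.get("orders", []):
        rows.append(
            {
                "CustomerName": customer["name"],  # parent-level field
                "OrderId": order["id"],            # child-level fields
                "Total": order["total"],
            }
        )
    return rows


# A hypothetical nested record, as a JSON API or file might return it.
raw = json.loads(
    '{"name": "Acme", "orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 20.0}]}'
)

rows = flatten_orders(raw)
for row in rows:
    print(row)
```

Each resulting row contains only scalar values, which is the shape a flat, tabular file expects.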

When querying JSON-based data sources, you may be able to flatten your data using the query itself. For example, Cosmos DB's SQL syntax allows you to JOIN between hierarchical layers of a record and SELECT fields from multiple levels.

However, in some cases - especially JSON files or APIs - you can't control the structure of the data you get back from the data source. In these situations, we offer two ways of transforming the data you receive prior to a flat Parquet file being written.

JSONata

JSONata is a lightweight querying and transformation language for JSON data. It was inspired by the location path semantics of XPath.

JSONata is the approach we recommend for most JSON restructuring situations.

JSONata can transform any given JSON structure into another shape. The JSONata Exerciser gives a good example of what can be achieved, and you can use it as a playground for experimentation.

JSONata has a rich language for querying JSON.

Handlebars

Handlebars is a {{ mustache-based }} templating language. Using Handlebars to transform data gives you complete control over every character of the output JSON you want to create.

Handlebars is Copyright (C) 2011-2019 by Yehuda Katz

The Handlebars Expressions documentation details how to put placeholders from a JSON object into output text. The Playground also lets you experiment with templates.

Creating Cases

Cases are created using Markdown and Handlebars.

Markdown

Markdown is a simple and easy-to-use markup language you can use to format virtually any document.

Its syntax allows you to create multiple levels of headings, emphasize text with bold or italics, build tables and checklists, and more.

Use Markdown to present the narrative of a case to your end users. A good structure to follow is:

  • What's happened;

  • Why it's important / What the impact is;

  • How to resolve it, step-by-step.

However, you're free to write whatever works best for your team!

Markdown allows you to control the format and overall wording of your case, but to include data from Parquet files, you'll need to use Handlebars.

Handlebars

Handlebars is a {{ mustache-based }} templating language.

Handlebars is Copyright (C) 2011-2019 by Yehuda Katz

The Handlebars Expressions documentation details how to put placeholders from a JSON object into your Markdown content. The Playground also lets you experiment with templates.

Each row from a Parquet file is passed to your template as a JSON object. Handlebars can then be used to "mail merge" data into your narrative, for example:

Customer {{ CustomerName }}'s contract is close to expiry.
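The "mail merge" idea can be sketched in a few lines of Python. This is only an approximation of simple `{{ Name }}` placeholders using a regular expression. It is not the Handlebars engine, it supports no helpers or block expressions, and it does not HTML-escape values the way Handlebars does. The `render` function and the sample row are hypothetical.

```python
import re

# Matches simple placeholders such as {{ CustomerName }} or {{CustomerName}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, row: dict) -> str:
    """Replace each placeholder with the matching field from the row."""

    def substitute(match):
        key = match.group(1)
        # Missing fields become an empty string, as Handlebars does by default.
        return str(row.get(key, ""))

    return PLACEHOLDER.sub(substitute, template)


template = "Customer {{ CustomerName }}'s contract is close to expiry."
row = {"CustomerName": "Acme Ltd"}  # one row, presented as a JSON-like object

print(render(template, row))  # Customer Acme Ltd's contract is close to expiry.
```

Rendering the same template once per row yields one narrative per row.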

On top of the "out-of-the-box" Handlebars expressions, we have also introduced several of our own to make writing cases easier:

  • One

  • Two

  • Three
