DataBug for Techies

Problem Definitions

Problem Definitions are recipes that describe the queries an Agent should run and how the content of a case should be formulated.

Problem Definitions are written using YAML.

We will introduce a graphical interface for configuring Problem Definitions in future updates.

A Problem Definition has two or three phases:

  1. Gather

  2. Analyze (optional)

  3. Manage
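To make the shape concrete, here is a minimal sketch of how a Problem Definition might be laid out. The key names (`gather`, `analyze`, `manage`, `steps`) are illustrative assumptions, not the documented DataBug schema:

```yaml
# Illustrative skeleton only — key names are assumed, not the documented schema.
gather:
  steps: []     # one or more query steps, each writing a Parquet file
analyze:        # optional — omit when a single query maps straight to cases
  steps: []
manage:
  case: {}      # a template describing how each case is formulated
```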

To understand these phases, consider two common scenarios:

  1. A single query whose results are turned into cases.

  2. Multiple queries against multiple data sources, whose results are analyzed together before cases are created from that analysis.

Single Query to Cases

For this example, we'll assume the data comes from a SQL Server database.

The Problem Definition will consist of two phases:

  1. Gather

  2. Manage

In the Gather phase, a single SQL Server step will be used:

<- YAML ->
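For illustration, such a step might be written along the following lines. The key names, connection name, and query here are assumptions rather than the documented schema:

```yaml
# Illustrative sketch — key names and values are assumed.
gather:
  steps:
    - name: overdue-invoices
      type: sql-server
      connection: finance-db              # a Connection configured for the Agent
      query: |
        SELECT InvoiceId, CustomerName, AmountDue, DueDate
        FROM dbo.Invoices
        WHERE Paid = 0 AND DueDate < GETDATE()
      columns: [InvoiceId, CustomerName, AmountDue, DueDate]
      output: overdue-invoices.parquet    # the Gather phase writes a Parquet file
```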

The column names are provided explicitly, which makes writing the case template easier later. Notice that the output of the step is a Parquet file.

Parquet files store typed, tabular data in a way that's highly efficient for reading and querying.

In the Manage phase, you can see that we're using Markdown to format the columns from the Parquet file. One case will be created for each row of the Parquet file that the Gather phase produced.

<- YAML ->
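A corresponding Manage phase might look like the sketch below, with a Markdown template whose placeholders name the Parquet columns (the placeholder syntax is an assumption):

```yaml
# Illustrative sketch — placeholder syntax and key names are assumed.
manage:
  input: overdue-invoices.parquet   # the Parquet file from the Gather phase
  case:
    title: "Overdue invoice {{ InvoiceId }}"
    body: |
      **Customer:** {{ CustomerName }}
      **Amount due:** {{ AmountDue }}
      **Due date:** {{ DueDate }}
```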

Multiple Queries to Cases

For this example, let's assume that you have data in a SQL Server relational database and in a Cosmos DB non-relational database.

The Problem Definition will consist of three phases:

  1. Gather

  2. Analyze

  3. Manage

The Gather phase will have two steps:

  • One to run a query against the SQL Server database and store the results in a Parquet file;

  • Another to run a query against the Cosmos DB database and store the results in a second Parquet file.

<- YAML ->
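A sketch of the two steps, again under assumed key names, might read:

```yaml
# Illustrative sketch — key names, connections, and queries are assumed.
gather:
  steps:
    - name: customers-sql
      type: sql-server
      connection: crm-db
      query: SELECT CustomerId, CustomerName FROM dbo.Customers
      output: customers.parquet
    - name: orders-cosmos
      type: cosmos-db
      connection: orders-store
      query: SELECT c.customerId, c.orderId, c.status FROM c
      output: orders.parquet
```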

The Analyze phase then takes the two Parquet files from the Gather phase and joins them with a SQL LEFT JOIN:

<- YAML ->
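Sketched under the same assumed schema (the SQL itself is genuine DuckDB syntax, which can query Parquet files directly by path; the surrounding key names are assumptions):

```yaml
# Illustrative sketch — the SQL is real DuckDB syntax; the key names are assumed.
analyze:
  steps:
    - name: join-customers-orders
      query: |
        SELECT cu.CustomerId, cu.CustomerName, o.orderId, o.status
        FROM 'customers.parquet' AS cu
        LEFT JOIN 'orders.parquet' AS o
          ON o.customerId = cu.CustomerId
      output: customers-orders.parquet
```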

The flavour of SQL used in the Analyze phase is DuckDB's. The output of the Analyze phase is another Parquet file.

Finally, in the Manage phase, we again use Markdown to format the columns, this time from the Parquet file produced by the Analyze phase. One case will be created for each row in that file.

<- YAML ->
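And a final Manage sketch, with the same caveat that the placeholder syntax and key names are assumed:

```yaml
# Illustrative sketch — placeholder syntax and key names are assumed.
manage:
  input: customers-orders.parquet   # the Parquet file from the Analyze phase
  case:
    title: "Review order {{ orderId }} for {{ CustomerName }}"
    body: |
      **Customer:** {{ CustomerName }} ({{ CustomerId }})
      **Order:** {{ orderId }}
      **Status:** {{ status }}
```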
