Defining schemas for JSON
JSON is typically schemaless, so each JSON object in a collection can differ in structure and data types. This poses a challenge for Parquet files, which require a fixed schema.
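To make the problem concrete, here is a minimal Python sketch that scans a small JSON collection and reports which fields change type or are missing from some objects. The records and the helper name `observed_types` are invented for illustration and are not taken from any particular tool.

```python
import json

# Hypothetical records: each object has a different shape or value type.
raw = '''[
  {"FirstName": "Ada", "Age": 36},
  {"FirstName": "Alan", "Age": "41", "Salary": 52000.5},
  {"FirstName": "Grace", "Children": true}
]'''

def observed_types(records):
    """Map each field name to the set of Python type names seen for it."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return seen

records = json.loads(raw)
types_by_field = observed_types(records)

# Fields that appear with more than one type across the collection.
conflicts = sorted(k for k, v in types_by_field.items() if len(v) > 1)
# Fields that are absent from at least one object.
optional = sorted(k for k in types_by_field if any(k not in r for r in records))

print("conflicting types:", conflicts)
print("not always present:", optional)
```

A fixed schema has to settle both kinds of variation up front: one type per column, and a defined treatment for values that are absent.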
An example
SELECT c.CustomerReference,
       SUM(InvoiceTotal) AS SalesTotal
FROM 'customers.parquet' AS c
JOIN 'sales.parquet' AS s ON c.CustomerReference = s.CustomerReference
GROUP BY c.CustomerReference
Defining a schema in YAML
columns:
  - name: FirstName
    type: string
  - name: LastName
    type: string
  - name: Age
    type: int64
  - name: Salary
    type: double
  - name: Children
    type: bool
  - name: BirthDate
    type: timestamp[s]
Creating a schema automatically
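This page does not describe how a schema is created automatically, so the following is only a sketch of one possible approach, not the tool's actual behaviour. It infers a column list from sample JSON records and prints it in the same YAML layout as the hand-written schema. The type mapping, the widening rules, and the helper names `infer_columns` and `to_yaml` are all assumptions made for illustration.

```python
import json

# Assumed mapping from JSON value types to the type names used in the YAML schema.
TYPE_NAMES = {bool: "bool", int: "int64", float: "double", str: "string"}

def infer_columns(records):
    """Return [(name, type_name)] in first-seen order, widening on conflict."""
    columns = {}
    for record in records:
        for name, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            found = TYPE_NAMES.get(type(value), "string")
            previous = columns.get(name)
            if previous is None or previous == found:
                columns[name] = found
            elif {previous, found} == {"int64", "double"}:
                columns[name] = "double"  # widen integers to floating point
            else:
                columns[name] = "string"  # fall back to the most general type
    return list(columns.items())

def to_yaml(columns):
    """Render the inferred columns in the YAML layout of the manual schema."""
    lines = ["columns:"]
    for name, type_name in columns:
        lines.append(f"  - name: {name}")
        lines.append(f"    type: {type_name}")
    return "\n".join(lines)

# Hypothetical sample data; Salary appears as both an integer and a float.
sample = json.loads(
    '[{"FirstName": "Ada", "Age": 36, "Salary": 50000},'
    ' {"FirstName": "Alan", "Salary": 52000.5, "Children": true}]'
)
columns = infer_columns(sample)
print(to_yaml(columns))
```

An inferred schema is only as good as the sample it was built from, so it is worth reviewing the result by hand, especially for date and timestamp fields, which arrive in JSON as plain strings.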
All data types