Technology · Data Stack

Columnar data movement without enterprise bloat.

Our data stack is built around open formats, local execution, and strict separation between raw data, transformed facts, and exported outputs.

Data infrastructure should be fast, inspectable, and portable.

Polars, DuckDB, Arrow, and Parquet form a practical local-first analytics layer. They let us move serious event data without pretending every customer needs a heavy SaaS platform.


DATA STACK RESPONSIBILITIES

Polars

Columnar transformation.

Polars carries the transformation layer: lazy execution, typed tables, joins, filters, aggregations, and predictable data preparation.
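
A minimal sketch of that layer in Polars; the file, columns, and metrics are illustrative, not a fixed schema:

import polars as pl

summary = (
    pl.scan_parquet("events.parquet")              # lazy: nothing is read yet
    .filter(pl.col("status") == "closed")          # predicate pushed down into the scan
    .with_columns(
        (pl.col("closed_at") - pl.col("opened_at")).alias("cycle_time")
    )
    .group_by("owner")
    .agg(
        pl.len().alias("cases"),
        pl.col("cycle_time").median().alias("median_cycle_time"),
    )
    .sort("cases", descending=True)
    .collect()                                     # the whole plan runs in one pass
)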

DuckDB

Local analytical SQL.

DuckDB gives us in-process SQL: ad hoc exploration, joins over larger datasets, and query workflows without a server dependency.
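
A sketch of the in-process workflow, assuming a Parquet file of event facts on disk; the path and columns are illustrative:

import duckdb

# DuckDB queries the Parquet file in place; no server, nothing loaded up front.
monthly = duckdb.sql("""
    SELECT date_trunc('month', opened_at) AS month,
           count(*)                       AS cases
    FROM read_parquet('facts/events.parquet')
    GROUP BY 1
    ORDER BY 1
""").pl()                                          # result handed back as a Polars frame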

Arrow

Memory interchange.

Used as the common columnar interchange layer between engines, exports, and consumers, so data moves between them without copy-heavy conversion.
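
A sketch of that hand-off, assuming a Polars frame passed to DuckDB through Arrow; paths and columns are illustrative:

import duckdb
import polars as pl

events = pl.read_parquet("facts/events.parquet")
events_arrow = events.to_arrow()                   # Polars -> Arrow, same columnar layout

con = duckdb.connect()
# DuckDB scans the Arrow table already sitting in memory; no copy into the database.
open_cases = con.execute(
    "SELECT count(*) FROM events_arrow WHERE status = 'open'"
).fetchone()[0]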

Parquet / CSV

Open outputs.

Parquet keeps analysis portable; CSV keeps it universally readable. Customers should be able to inspect, archive, and load outputs into their own BI tools.
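
A minimal example of the export step; the paths are placeholders:

import polars as pl

facts = pl.read_parquet("facts/events.parquet")

facts.write_parquet("export/events.parquet")       # compact, typed, loads into most BI tools
facts.write_csv("export/events.csv")               # plain-text fallback anyone can open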

Event Logs

Process evidence.

Used when source systems provide timestamps, cases, actions, owners, statuses, and workflow traces that can become process facts.
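
A hedged sketch of the minimum shape we look for; the column names are assumptions, not a fixed interface:

import polars as pl

# The columns a workflow export needs before it can become process facts.
EVENT_LOG_COLUMNS = {
    "case_id",    # the ticket, order, or claim being worked
    "action",     # what happened
    "owner",      # who did it
    "status",     # state after the action
    "timestamp",  # when it happened
}

log = pl.read_csv("source/export.csv", try_parse_dates=True)
missing = EVENT_LOG_COLUMNS - set(log.columns)
if missing:
    raise ValueError(f"export cannot become process facts; missing: {sorted(missing)}")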

Contracts

Schema discipline.

Contracts control column mapping, semantic gates, missing-data handling, source policies, and exported table definitions.
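
A minimal sketch of a contract check; the rules and names here are illustrative, not the shipped contract format:

import polars as pl

# Illustrative contract: map source column names and gate on missing data.
CONTRACT = {
    "rename": {"ticket": "case_id", "performed_at": "timestamp"},  # column mapping
    "max_null_fraction": 0.05,                                     # missing-data gate
}

def apply_contract(df: pl.DataFrame) -> pl.DataFrame:
    df = df.rename({k: v for k, v in CONTRACT["rename"].items() if k in df.columns})
    for col in ("case_id", "timestamp"):
        if df[col].null_count() / max(df.height, 1) > CONTRACT["max_null_fraction"]:
            raise ValueError(f"{col} fails the missing-data gate")
    return df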

Boundary

Technology explains the data layer. Observatory is the product built on top.

This page describes the engines and data movement. Detailed process mining, business diagnostics, licensing, and adoption belong under Observatory.

  • Polars
  • DuckDB
  • Arrow
  • Parquet
  • CSV
  • Event logs

# Local-first data path
source export
  -> schema contract
  -> Polars transform
  -> DuckDB query layer
  -> Parquet / CSV / BI output
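
The same path as a small, self-contained sketch; every file name and column is illustrative:

import duckdb
import polars as pl

REQUIRED = {"case_id", "action", "timestamp"}          # contract: the minimum columns

raw = pl.read_csv("source/export.csv")                 # source export
missing = REQUIRED - set(raw.columns)
if missing:                                            # schema contract
    raise ValueError(f"source export is missing: {sorted(missing)}")

facts = (                                              # Polars transform
    raw.lazy()
    .filter(pl.col("timestamp").is_not_null())
    .sort("case_id", "timestamp")
    .collect()
)

summary = duckdb.sql(                                  # DuckDB query layer
    "SELECT action, count(*) AS n FROM facts GROUP BY action ORDER BY n DESC"
).pl()

facts.write_parquet("export/facts.parquet")            # open outputs
summary.write_csv("export/summary.csv")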