Data collection

A Lightup metric tracks a series of aggregate measurements of your data. The aggregate measurement is collected by periodically querying rows in your data. The values are displayed on a time series chart with time on the X axis and the aggregate value on the Y axis.

The data collection process has a large number of configuration parameters that affect the timing of data collection and which rows in the dataset are collected during each data collection. Understanding the process of data collection is not necessary to work with Lightup, but it will enable you to better configure your metrics to be collected at the time most appropriate for your ETL pipeline.

All scheduled metrics follow the same basic data collection process: every collection interval, Lightup collects rows that fall within a collection window. Trigger metrics collect data only when they are triggered, and they collect rows from all collection windows since the last time they were triggered.

A metric's Data Collection settings determine the extents of the collection window, how much data to aggregate to produce a metric value, and the frequency of data collection.

  • Query Scope determines whether collection includes the whole table or is limited to rows with timestamps in the collection window (via a WHERE clause). This choice also determines which of the other Data Collection settings are available.
  • Data Collection Schedule determines how often collection occurs, and for metrics with Incremental query scope, also the length of the collection window.
  • Aggregation Interval determines the time unit used for aggregating collected data into a datapoint (i.e., to GROUP BY).
  • Evaluation Delay determines how old timestamps must be before they are considered stable for data collection. In effect, this delay shifts the collection window towards an earlier aggregation interval.
  • Polling Delay adds a delay to the start of data collection. Once the delay period has passed, the metric is scheduled to run. Depending on system load, the actual wait may be longer than the configured delay, but it is never shorter.
  • Polling Interval determines how often data collection occurs if Data Collection Schedule is set to Scheduled.

Query scope

Lightup features two types of data collection, indicated by the Query Scope setting: Incremental and Full Table. The configuration options for these two query scopes differ significantly, but the essential difference is this: incremental collection uses timestamps to limit the range of collected data, and full table collection does not.
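The difference between the two query scopes can be sketched as the presence or absence of a timestamp filter in the metric query. The Python below is an illustration only, not the query Lightup actually generates: the function name build_metric_query, the COUNT(*) aggregate, the table and column names, and the half-open window (start inclusive, end exclusive) are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def build_metric_query(table: str, timestamp_column: str,
                       window: Optional[Window] = None) -> str:
    """Return an illustrative metric query (hypothetical helper).

    window=None models Full Table scope: no timestamp filter.
    A (start, end) window models Incremental scope: a WHERE clause
    limits rows to timestamps inside the collection window.
    Values are interpolated into the string for readability only;
    real code should use bound parameters.
    """
    query = f"SELECT COUNT(*) AS metric_value FROM {table}"
    if window is None:
        return query
    start, end = window
    # Assumed half-open window: start <= timestamp < end.
    return (
        f"{query} WHERE {timestamp_column} >= '{start.isoformat()}'"
        f" AND {timestamp_column} < '{end.isoformat()}'"
    )


full_table_query = build_metric_query("orders", "created_at")
incremental_query = build_metric_query(
    "orders",
    "created_at",
    (datetime(2024, 1, 1, 9, tzinfo=timezone.utc),
     datetime(2024, 1, 1, 10, tzinfo=timezone.utc)),
)
```

Here full_table_query scans every row, while incremental_query touches only rows whose created_at value falls in the one-hour window.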

Incremental

When Query Scope is set to Incremental, Lightup performs data collection by periodically querying rows whose timestamp falls within a collection window. The Data Collection Schedule determines the length of the collection window. For metrics whose Data Collection Schedule is Scheduled (the default), each collection window is one Aggregation Interval long, is defined in the time zone specified by Aggregation Timezone, and is offset by the Evaluation Delay.

Each datapoint in an incremental metric's chart corresponds to an aggregation interval. You can hover on a datapoint to see the specific interval and the metric's value at that time.

The value of the datapoint is aggregated over the aggregation interval when the metric query runs (the evaluation time).

The Data Collection Window setting determines which collection window to collect: the window that immediately precedes the evaluation time, or the window that includes the evaluation time. Typically the collection window that precedes the evaluation time is the most recent collection window for which all data is available.
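One way to picture how the evaluation time, Aggregation Interval, Evaluation Delay, and Data Collection Window setting interact is the sketch below. It is a simplified model under stated assumptions, not Lightup's implementation: windows are aligned to UTC interval boundaries (Aggregation Timezone is ignored), the delay is applied by shifting the evaluation time backwards before choosing a window, and the function name collection_window and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def collection_window(evaluation_time: datetime,
                      aggregation_interval: timedelta,
                      evaluation_delay: timedelta = timedelta(0),
                      use_containing_window: bool = False
                      ) -> Tuple[datetime, datetime]:
    """Return (start, end) of the window to collect (hypothetical model).

    Assumes fixed-length intervals aligned to the UTC epoch, such as
    hourly or daily intervals.
    """
    # Assumption: the delay shifts the reference time towards the past.
    reference = evaluation_time - evaluation_delay
    # Floor the reference time to the start of its aggregation interval.
    intervals_since_epoch = (reference - EPOCH) // aggregation_interval
    containing_start = EPOCH + intervals_since_epoch * aggregation_interval
    if use_containing_window:
        start = containing_start  # window that includes the reference time
    else:
        start = containing_start - aggregation_interval  # preceding window
    return start, start + aggregation_interval


evaluated_at = datetime(2024, 1, 1, 10, 20, tzinfo=timezone.utc)
preceding = collection_window(evaluated_at, timedelta(hours=1))
containing = collection_window(evaluated_at, timedelta(hours=1),
                               use_containing_window=True)
delayed = collection_window(evaluated_at, timedelta(hours=1),
                            evaluation_delay=timedelta(minutes=30))
```

With an hourly interval and an evaluation time of 10:20 UTC, preceding covers 09:00 to 10:00, containing covers 10:00 to 11:00, and the 30-minute delay in delayed moves the preceding window back to 08:00 to 09:00.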

When performing collection, Lightup uses the metric configuration settings to identify the Target Collection Window. If the metric has not yet been backfilled, Lightup collects every window up to and including the Target Collection Window, so that all backfill windows are collected.
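The backfill behavior can be illustrated by enumerating every window from the start of the backfill period through the Target Collection Window. This is a minimal sketch: the function name backfill_windows, the idea of a single backfill start time, and the fixed-length windows are assumptions for the example, and the order in which Lightup actually processes backfill windows is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def backfill_windows(backfill_start: datetime,
                     target_window_start: datetime,
                     aggregation_interval: timedelta
                     ) -> List[Tuple[datetime, datetime]]:
    """List (start, end) windows from backfill_start up to and
    including the target window (hypothetical helper)."""
    windows = []
    start = backfill_start
    while start <= target_window_start:
        windows.append((start, start + aggregation_interval))
        start += aggregation_interval
    return windows


windows = backfill_windows(
    datetime(2024, 1, 1, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 3, tzinfo=timezone.utc),
    timedelta(hours=1),
)
```

In this example, windows holds four hourly windows, the last of which is the target window from 03:00 to 04:00 UTC.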

Full Table

When Query Scope is set to Full Table, Lightup performs data collection by querying the entire table at intervals specified by the Polling Interval, in the time zone specified by Polling Timezone, offset by the Polling Delay.

Each datapoint in a full table metric's chart corresponds to a collection window, and each successive collection window produces a new datapoint on the chart.

The value of the datapoint is aggregated over the whole table when the metric query runs (the evaluation time).