Table configuration

When you activate a table, its Manage Table Settings modal opens so you can configure the table's settings, which are inherited by metrics based on the table. You can override these on a per-metric basis if a particular metric should have different settings. Note that these settings are inherited by Lightup's autometrics and low-code custom metrics, but not by SQL metrics. With a SQL metric, you configure all settings on the metric itself via a combination of configuration options and SQL.

The settings you select differ based on your choice of Query Scope. If Query Scope is set to Incremental, your form will have the fields shown below:

If Query Scope is set to Full Table, your form will instead have the fields shown below.

The various settings in this form are described below.

Query Scope

Query Scope controls how many rows of data are collected during metric collection (that is, how much of the table is read). There are two options, Incremental and Full Table, and it should be the first setting you specify: when you choose an option, some of the remaining inheritable settings change accordingly, because they depend on Query Scope.

Full Table query scope is designed to support DQIs on dimension tables (those with no time-based fields), while Incremental query scope is designed to support DQIs on tables where you want to progressively query consecutive time periods.

  • While most settings can be overridden on a per-metric basis, this setting cannot.
  • All metrics other than SQL metrics will have the same query scope as the table on which they are based.
  • Full Table metrics generate full-table queries, which can be compute-intensive and expensive on large tables.

The table below gives a summary of the difference between Incremental and Full Table query scope.

| Query Scope | What's collected | Dependent settings | Recommended use case |
| --- | --- | --- | --- |
| Incremental | Only those rows with a timestamp inside the most recent aggregation interval. | Agg. Interval, Agg. Time Zone, Eval. Delay, Timestamp, Time Zone | Fact tables (tables with a timestamp column) |
| Full Table | All rows, every time the metric query runs. | Polling Interval, Polling Time Zone, Polling Delay | Dimension tables (tables with no timestamp column) |

Data Collection Schedule

Lightup metrics can be scheduled by Lightup or triggered by your own ETL flow. If you want to control exactly when your metric is evaluated (for example, you want it to be evaluated after some other condition has occurred in your ETL flow), set Data Collection Schedule to Triggered. Set it to Scheduled if you want Lightup to schedule evaluation on an appropriate repeating schedule.

Scheduled data collection

  • Metric collection occurs on a repeating schedule for each aggregation interval. Data within the Evaluation Delay period is not collected.

Triggered data collection

  • Use an API call to trigger collection for specific metrics or for a whole table (see the sketch after this list).
  • Data collection is triggered for live metrics only. Paused metrics ignore triggers.
  • When you trigger collection, the triggered metrics run at the start of the next collection cycle. Data within the Evaluation Delay period is not collected.
  • API use requires authentication: to use most of the API calls in our API Reference, including Trigger metric collection, you need to get an auth token from Lightup. To prepare, follow the steps on Get access token.
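
If it helps, here is a minimal sketch of what a trigger call might look like from an ETL job. The endpoint path and payload shape are hypothetical placeholders, not Lightup's actual API; consult Trigger metric collection in the API Reference for the real call, and Get access token for authentication.

      # Hypothetical sketch: the endpoint path and payload shape are placeholders;
      # see the API Reference for the actual "Trigger metric collection" call.
      import requests

      LIGHTUP_URL = "https://your-lightup-instance.example.com"  # placeholder
      TOKEN = "..."  # auth token obtained by following "Get access token"

      response = requests.post(
          f"{LIGHTUP_URL}/api/...",  # hypothetical trigger endpoint
          headers={"Authorization": f"Bearer {TOKEN}"},
          json={"metric_uuids": ["..."]},  # hypothetical payload: metrics to run
      )
      response.raise_for_status()  # triggered metrics run at the next collection cycle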

Aggregation Interval

[Only for Query Scope = Incremental]

Specify an Aggregation Interval to set the timeframe over which the metric's value is aggregated. For Scheduled metrics, this (combined with Evaluation Delay) also determines when data collection occurs: daily metric values are calculated over the 24-hour period starting at 12AM; hourly metric values are calculated at the top of each hour; and weekly metrics are calculated starting at 12AM on Monday. Triggered metrics aggregate over the specified aggregation interval as well, but collection will not occur until an API call triggers the collection.
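
As a concrete illustration of those boundaries, this small sketch floors a timestamp to the start of its daily, hourly, or weekly aggregation interval (Monday-based weeks). It is a plain-Python restatement of the schedule described above, not Lightup code.

      from datetime import datetime, timedelta

      def interval_start(ts, interval):
          """Floor ts to the start of its aggregation interval."""
          if interval == "hourly":
              return ts.replace(minute=0, second=0, microsecond=0)
          midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
          if interval == "daily":
              return midnight
          if interval == "weekly":  # weeks start at 12AM on Monday
              return midnight - timedelta(days=ts.weekday())
          raise ValueError(interval)

      ts = datetime(2023, 3, 8, 14, 30)    # a Wednesday afternoon
      print(interval_start(ts, "hourly"))  # 2023-03-08 14:00:00
      print(interval_start(ts, "daily"))   # 2023-03-08 00:00:00
      print(interval_start(ts, "weekly"))  # 2023-03-06 00:00:00 (Monday)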

If you override table settings when you create or edit a metric, several longer periods are available for Aggregation Interval: Monthly, Quarterly, and Yearly. These options are not available as table settings; you can only choose them on a metric when you override inherited table settings.

Aggregation Time Zone

[Only for Query Scope = Incremental]

Aggregation Time Zone specifies the time zone that is used for metric aggregation. For example, if Aggregation Interval is daily, the data will be aggregated for the 24-hour period starting at 12AM in the Aggregation Time Zone.

Evaluation Delay

[Only for Query Scope = Incremental]

Set Evaluation Delay to a value that represents the time period required for your data to be guaranteed stable. Evaluation Delay is a blackout period during which data is considered unstable and not ready for data quality checks. For example, if a metric has an Evaluation Delay of two hours, data with a timestamp in the past two hours is ignored. So, if the Aggregation Interval is daily and Evaluation Delay is 2 hours, each metric value aggregates 24 hours' worth of data, starting at 2AM in the Aggregation Time Zone.
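
To make the arithmetic concrete, here is a minimal sketch for the daily case, using the "data newer than now minus the delay is ignored" rule above to decide which window is ready for collection at a given moment. The function and its windowing details are illustrative, not Lightup internals.

      from datetime import datetime, timedelta, timezone

      def latest_stable_daily_window(now, eval_delay):
          """Most recent complete 24-hour window whose data is already stable,
          i.e. whose end falls at or before now - eval_delay."""
          cutoff = now - eval_delay  # data newer than this is ignored
          day_start = cutoff.replace(hour=0, minute=0, second=0, microsecond=0)
          return day_start - timedelta(days=1), day_start  # [start, end)

      # At 1AM the 2-hour delay has not yet elapsed for yesterday's window:
      now = datetime(2023, 3, 8, 1, 0, tzinfo=timezone.utc)
      print(latest_stable_daily_window(now, timedelta(hours=2)))
      # -> March 6 00:00 .. March 7 00:00; the March 7 window becomes ready at 2AM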

Timestamp

[Only for Query Scope = Incremental]

Timestamp specifies the column used as your timestamp. The aggregation period for a metric is based on the value in the timestamp column, translated into the time zone specified by Aggregation Time Zone. As described above, only data with timestamps prior to [now] - Evaluation Delay is considered.

  • For query efficiency, when selecting a timestamp or creating a virtual timestamp based on another column, we recommend you reference an indexed/sort-keyed column.
  • Some databases support enforcement of a sort-key column. If your database supports that, you will have a toggle in Query Governance called Enforce Sort Keyed Timestamps. If this is toggled on, only sort-keyed timestamps show in your timestamp drop-down, both here in table settings and when creating a SQL metric.

Create a virtual timestamp

Note that only date or datetime columns can be selected as a timestamp; other columns are not displayed. If you have no timestamp selection available but you know how to compute your timestamp, use this procedure to create a virtual timestamp.

  1. Select a column with values you can turn into a timestamp using functions.
  2. Enter a function for converting the column's values into UTC-format timestamp values. For example, if you store timestamps as the epoch with millisecond precision, your input might be TO_TIMESTAMP({column_value}/1000).
  3. Enter a function for converting UTC-format timestamp values into valid column values, then select Save. For example, your input might be EXTRACT(epoch FROM {timestamp_value}) *1000.
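
To sanity-check a pair of conversion functions, it can help to see the same millisecond-epoch round trip in plain Python; the two SQL functions you enter should be inverses of each other in exactly this way (the column value here is illustrative):

      from datetime import datetime, timezone

      column_value = 1678212000000  # epoch milliseconds: 2023-03-07 18:00:00 UTC

      # columnToTimestamp: epoch millis -> UTC timestamp
      timestamp_value = datetime.fromtimestamp(column_value / 1000, tz=timezone.utc)

      # timestampToColumn: UTC timestamp -> epoch millis
      round_trip = int(timestamp_value.timestamp() * 1000)
      assert round_trip == column_value  # the conversions must round-trip exactly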

Virtual timestamp function examples

The following table lists the functions you would use to transform a column value to and from a timestamp value, if the column values were stored as the epoch with millisecond precision. You can use these as a starting point, adjusting the transformation as needed to accommodate your column values.

| Database | columnToTimestamp | timestampToColumn |
| --- | --- | --- |
| Athena | FROM_UNIXTIME({column_value} / 1000) | TO_UNIXTIME({timestamp_value}) * 1000 |
| BigQuery | TIMESTAMP_MILLIS({column_value}) | UNIX_SECONDS({timestamp_value}) * 1000 |
| Databricks | CAST(({column_value} / 1000) AS TIMESTAMP) | UNIX_TIMESTAMP({timestamp_value})::BIGINT * 1000 |
| Greenplum | TO_TIMESTAMP({column_value} / 1000) | EXTRACT(epoch FROM {timestamp_value}) * 1000 |
| Hive | FROM_UNIXTIME(CAST({column_value} / 1000 AS BIGINT)) | UNIX_TIMESTAMP({timestamp_value}) * 1000 |
| Microsoft SQL | DATEADD(s, {column_value} / 1000, '19700101') | CAST(DATEDIFF(s, '19700101', {timestamp_value}) AS BIGINT) * 1000 |
| Oracle | (timestamp '1970-01-01 00:00:00') + numtodsinterval({column_value} / 1000, 'second') | (CAST(SYS_EXTRACT_UTC({timestamp_value}) AS DATE) - DATE '1970-01-01') * 86400 * 1000 |
| Postgres | TO_TIMESTAMP({column_value} / 1000) | EXTRACT(epoch FROM {timestamp_value}) * 1000 |
| Redshift | TIMESTAMP 'epoch' + {column_value} / 1000 * interval '1 second' | EXTRACT(epoch FROM {timestamp_value})::bigint * 1000 |
| Snowflake | TO_TIMESTAMP({column_value} / 1000) | EXTRACT(epoch FROM {timestamp_value}) * 1000 |
| Teradata | TO_TIMESTAMP({column_value} / 1000) | CAST(((CAST(CAST({timestamp_value} AS TIMESTAMP) AS DATE AT 'GMT') - DATE '1970-01-01') * 86400 + (EXTRACT(HOUR FROM CAST({timestamp_value} AS TIMESTAMP) AT 'GMT') * 3600) + (EXTRACT(MINUTE FROM CAST({timestamp_value} AS TIMESTAMP) AT 'GMT') * 60) + (EXTRACT(SECOND FROM CAST({timestamp_value} AS TIMESTAMP)))) * 1000 AS BIGINT) |

Time Zone

[Only for Query Scope = Incremental]

Some timestamp columns don't include time zone information, so you might need to specify the Time Zone for the timestamp. If the time zone is part of the timestamp, this setting says Derived and can't be changed.

Polling Interval

[Only for Query Scope = Full Table]

When Query Scope is set to Full Table, Polling Interval controls how often the metric query is scheduled to run. As with Aggregation Interval, if you override table settings when you create or edit a metric, several longer periods are available: Monthly, Quarterly, and Yearly.

Polling Time Zone

[Only for Query Scope = Full Table]

Set Polling Time Zone to the time zone associated with the Polling Interval.

Polling Delay

[Only for Query Scope = Full Table]

Use Polling Delay to introduce a delay at the beginning of the Polling Interval. Once the delay period has passed, the metric is scheduled to run. This delay may be exceeded depending on system load, but will always at least be met.

Partitions

If your table has date-based partitions, you'll see an additional Partitions section in your form. Use this section to specify the partition columns and the format of each partition. Specifying partitions enables Lightup to generate efficient queries that scan only the partitions where the data is expected to reside.

Partitions are complex; a good way to understand them is to override your table settings and experiment with the partition settings on a simple metric. Run previews, then look at the query history. Some examples are included below.

Here's an example of a table with a single partition:

Here's an example of a table with three partitions:

To add a partition

  • Click Add + to add a partition column and select the column from the drop-down. Partition columns are denoted with (Partition) in grey lettering to the right of the column name.
  • Specify the partition's format using Python format codes. When Lightup generates a query for a time interval, it converts the start and end of that interval to the partition date format and generates a WHERE clause that queries only the partitions where the data is expected to reside. Examples are the best way to explain this (see also the sketch at the end of this section).
    • If you have a partition column called month and it contains a two-digit month number, you can enter %m for the format. If Lightup is querying the date range 03/07/2023 through 03/13/2023, the following WHERE clause will be included in the query:
      WHERE `month` in (03)
      
    • If you have three partition columns called year (a four-digit year), month (a two-digit month), and day (a two-digit day), you can enter %Y for year, %m for month, and %d for day. If Lightup is querying the date range 03/07/2023 through 03/13/2023, the following WHERE clause will be included in the query:
      WHERE
          `year` in (2023)
          AND `month` in (03)
          AND `day` in (07, 08, 09, 10, 11, 12, 13)
      
    • If you have a partition column called datehour that contains the date and hour, in a format such as 2023/03/07/18, you can enter %Y/%m/%d/%H for the format. If Lightup is querying the time range 03/07/2023 18:00 through 03/08/2023 18:00, the following WHERE clause will be included in the query:
      WHERE
          CAST(`datehour` AS STRING) in (
              '2023/03/07/18',
              '2023/03/07/19',
              ...
              '2023/03/07/23',
              '2023/03/08/00',
              '2023/03/08/01',
              ...
              '2023/03/08/18'
          )
      
  • Click Advanced to configure the Time Zone, Include Past, and Include Future settings for partitions. Include Past and Include Future tell Lightup to also query partitions associated with timestamps in the past or future.
    • In the second example above, where you have the year, month, and day partitions, if Include Past is set to 2 days, the WHERE clause is modified to include two additional days in the past (note the inclusion of 05 and 06):
      WHERE
          `year` in (2023)
          AND `month` in (03)
          AND `day` in (05, 06, 07, 08, 09, 10, 11, 12, 13)
      
    • In the third example above, where you have the datehour partition column, if Include Past is set to 2 days, the WHERE clause is modified to include, hour by hour, the full two-day period prior to the query period, resulting in:
      WHERE
          CAST(`datehour` AS STRING) in (
              '2023/03/05/18',
              ...
              '2023/03/05/23',
              '2023/03/06/00',
              ...
              '2023/03/06/23',
              '2023/03/07/00',
              ...
              '2023/03/07/18',
              '2023/03/07/19',
              ...
              '2023/03/07/23',
              '2023/03/08/00',
              '2023/03/08/01',
              ...
              '2023/03/08/18'
          )
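
As a way to experiment with format codes, the following sketch mimics the pruning logic described above: it walks a query range and collects the distinct partition values produced by a Python format string. This is an illustration of how format codes map a time range to partition values, not Lightup's actual implementation.

      from datetime import datetime, timedelta

      def partition_values(start, end, fmt, step=timedelta(days=1)):
          """Distinct partition values covering [start, end], in order."""
          values, current = [], start
          while current <= end:
              value = current.strftime(fmt)
              if value not in values:
                  values.append(value)
              current += step
          return values

      # First example: a two-digit month partition over 03/07/2023-03/13/2023
      print(partition_values(datetime(2023, 3, 7), datetime(2023, 3, 13), "%m"))
      # -> ['03']

      # Second example: the day partition (year and month work the same way)
      print(partition_values(datetime(2023, 3, 7), datetime(2023, 3, 13), "%d"))
      # -> ['07', '08', '09', '10', '11', '12', '13']

      # Third example: hourly datehour partitions; Include Past = 2 days would
      # simply move the start of the range back by two days.
      print(partition_values(datetime(2023, 3, 7, 18), datetime(2023, 3, 8, 18),
                             "%Y/%m/%d/%H", step=timedelta(hours=1)))
      # -> ['2023/03/07/18', '2023/03/07/19', ..., '2023/03/08/18']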