User Guide
Prepare tables for data quality metrics
Activate a table, set values for its inheritable settings, and manage any partitions or timestamps

Set a table Active

Before you can collect metrics on a data asset, it must be Active. You can only set a data asset Active if its parent asset is already Active:
1. To make a schema Active, select its datasource in the Explorer tree, then choose Manage Metrics from the Actions menu in the top right panel. In the dialog that opens, slide the schema's Active toggle to the right. Activation takes a few minutes, during which the schema's data is unavailable.
2. To make a table Active, select its schema in the Explorer tree, then choose Manage Metrics from the Actions menu in the top right panel. In the dialog that opens, slide the table's Active toggle to the right. The Manage Table Settings dialog opens immediately so you can provide values for inheritable settings.
3. To make a column Active, choose Manage Metrics from the Actions menu for its parent table, then slide the Active toggle to the right for the column you want to activate. You can then enable various column autometrics.

Manage Table Settings

When you make a table Active, its Manage Table Settings dialog opens so you can configure the table's inheritable settings.
[Screenshot: Manage Table Settings dialog for a table with partitions that can be used in metrics' queries]
1. Specify an Aggregation Interval to set how often metric values are aggregated. For daily, metric values are calculated over the 24-hour period starting at 12 AM. For hourly, metric values are calculated at the top of each hour. For weekly, metric values are calculated starting at 12 AM on Monday.
2. Select the Aggregation Time Zone, the time zone used for metric aggregation. For example, if the Aggregation Interval is daily, data is aggregated over the 24-hour period starting at 12 AM in the Aggregation Time Zone.
3. Set Evaluation Delay to the time period required for your data to be guaranteed stable. Evaluation Delay is a blackout period during which data is considered not yet stable and not ready for data quality checks. For example, if a metric has an Evaluation Delay of one day, data with a timestamp in the past 24 hours is ignored.
4. Choose the timestamp column for your table. If no suitable column exists, you can create a virtual timestamp. The aggregation period for a metric is based on the value in the timestamp column, translated into the Aggregation Time Zone. As described above, only data with timestamps earlier than [now] - Evaluation Delay is considered.
5. Some timestamp columns don't include time zone information, so you might need to specify the Time Zone where the data was written. If the time zone is part of the timestamp, this setting reads Derived and can't be changed.
6. Under Partitions, if your table has time-based partitions, specify the partition column and its format so that partitions can be used to improve metric query performance. Specify the format using Python format codes. If your table has no partitions, the Partitions section doesn't appear in the Manage Table Settings dialog.
7. Click Confirm and Save to save your settings. Now that the table is Active, you can create metrics for it.
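To see how these settings interact, here is a minimal sketch (not Lightup's actual implementation) of how a daily aggregation window could be derived from an Aggregation Time Zone, an Evaluation Delay, and a partition format expressed in Python format codes. All setting values shown are hypothetical examples.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Hypothetical values for the dialog fields described above
agg_tz = ZoneInfo("America/New_York")  # Aggregation Time Zone
evaluation_delay = timedelta(days=1)   # Evaluation Delay
partition_format = "%Y-%m-%d"          # Partition format (Python format codes)

# Fixed "current" moment for illustration
now = datetime(2024, 6, 15, 3, 30, tzinfo=ZoneInfo("UTC"))

# Only data with timestamps earlier than now - Evaluation Delay is considered.
stable_cutoff = now - evaluation_delay

# Daily interval: the 24-hour period starting at 12 AM in the Aggregation Time Zone.
local = stable_cutoff.astimezone(agg_tz)
window_start = local.replace(hour=0, minute=0, second=0, microsecond=0)
window_end = window_start + timedelta(days=1)

# A time-based partition column lets the query prune partitions for this window.
partition_value = window_start.strftime(partition_format)
print(window_start.isoformat(), partition_value)
```

With these example values, the stable cutoff of June 14 03:30 UTC falls on the evening of June 13 in New York, so the most recent complete daily window starts at midnight Eastern time on June 13, and the matching partition value is formatted from that date.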

Create a virtual timestamp column

Most datasources support virtual timestamps. If you're working with a datasource that doesn't, you won't see the option to create one.
If you don't have a column that can be used as a timestamp, you can create one as a virtual timestamp. You create a virtual timestamp for a table in the Manage Table Settings dialog.
1. Select the Timestamp field, then select + Create a virtual column.
2. In the flyout that appears, select a column whose values you can turn into a timestamp using functions.
3. Enter a function that converts the column's values into UTC-format timestamp values. For example, if you store timestamps as the epoch with millisecond precision, your input might be TO_TIMESTAMP({column_value} / 1000).
4. Enter a function that converts UTC-format timestamp values back into valid column values, then select Save. For example, your input might be EXTRACT(epoch FROM {timestamp_value}) * 1000.
If you already have stored procedures for these transformations, you can use the procedures as inputs if Lightup is able to access them.

Virtual timestamp function examples

The following table lists, for each database, the functions that transform a column value to and from a timestamp value when column values are stored as the epoch with millisecond precision. Use them as a starting point, adjusting the transformations as needed to accommodate your column values.
| Database | columnToTimestamp Function | timestampToColumn Function |
| --- | --- | --- |
| Athena | FROM_UNIXTIME({column_value} / 1000) | TO_UNIXTIME({timestamp_value}) * 1000 |
| BigQuery | TIMESTAMP_MILLIS({column_value}) | UNIX_SECONDS({timestamp_value}) * 1000 |
| Databricks | CAST(({column_value} / 1000) AS TIMESTAMP) | UNIX_TIMESTAMP({timestamp_value})::BIGINT * 1000 |
| Microsoft SQL | DATEADD(s, {column_value} / 1000, '19700101') | CAST(DATEDIFF(s, '19700101', {timestamp_value}) AS BIGINT) * 1000 |
| Oracle | (timestamp '1970-01-01 00:00:00') + numtodsinterval({column_value} / 1000, 'second') | (CAST(SYS_EXTRACT_UTC({timestamp_value}) AS DATE) - DATE '1970-01-01') * 86400 * 1000 |
| Postgres | TO_TIMESTAMP({column_value} / 1000) | EXTRACT(epoch FROM {timestamp_value}) * 1000 |
| Redshift | TIMESTAMP 'epoch' + {column_value} / 1000 * interval '1 second' | EXTRACT(epoch FROM {timestamp_value})::bigint * 1000 |
| Snowflake | TO_TIMESTAMP({column_value} / 1000) | EXTRACT(epoch FROM {timestamp_value}) * 1000 |
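The two functions in each row should be inverses of one another: converting a column value to a timestamp and back must return the original value. You can sanity-check a pair off-database before entering it in the dialog; this minimal Python sketch mirrors the epoch-millisecond pair shown for Postgres and Snowflake (the function names here are illustrative, not part of any product API).

```python
from datetime import datetime, timezone

def column_to_timestamp(column_value: int) -> datetime:
    # Mirrors TO_TIMESTAMP({column_value} / 1000): epoch millis -> UTC timestamp
    return datetime.fromtimestamp(column_value / 1000, tz=timezone.utc)

def timestamp_to_column(ts: datetime) -> int:
    # Mirrors EXTRACT(epoch FROM {timestamp_value}) * 1000: UTC timestamp -> epoch millis
    # round() avoids floating-point truncation errors in the last millisecond
    return round(ts.timestamp() * 1000)

value = 1_700_000_000_123  # an epoch-milliseconds column value
ts = column_to_timestamp(value)
assert timestamp_to_column(ts) == value  # the pair round-trips
print(ts.isoformat())
```

If the round trip fails for representative values from your column, the transformation pair needs adjusting before you use it for a virtual timestamp.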