Basic Configuration

Configuration guide for common use-cases

Common use-cases

Here are some common data quality use-cases that our users bring up:

  1. Check if data is available for a set of key tables. This comprises two main questions:

    1. Were the tables updated on time?

    2. If updates happen on time, did the right volume of data show up in the tables?

  2. For some important columns, check if data is valid. Here are the most common questions:

    1. Did the column start registering too many null values?

    2. For a categorical column, did a new category show up or did a category disappear?
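The column-level questions above can be illustrated with a small Python sketch. It is not Lightup code: the helper names null_fraction and category_drift and the sample payment-method values are hypothetical, and it only shows what "too many nulls" and "a category appeared or disappeared" mean on an in-memory column.

```python
def null_fraction(values):
    """Fraction of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def category_drift(baseline, current):
    """Return (new, missing) categories between two snapshots of a column."""
    base, cur = set(baseline), set(current)
    return sorted(cur - base), sorted(base - cur)


# Hypothetical snapshots of a categorical column on two consecutive days.
yesterday = ["card", "cash", "wire", "card"]
today = ["card", None, "crypto", "card", None]

# Nulls are measured separately, so exclude them from the category comparison.
today_non_null = [v for v in today if v is not None]
new, missing = category_drift(yesterday, today_non_null)

print(null_fraction(today))  # share of null values in today's snapshot
print(new, missing)          # categories that appeared / disappeared
```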

Configuring Lightup

Step 1: Connect data warehouse(s)

Click on Datasources under Settings, and add a new connector by entering the credentials. Lightup supports a wide range of pre-built connectors; a complete list can be found here.

Step 2: Enable profiling to get instant visibility

Lightup comes with out-of-the-box definitions for key data quality measures that we call data quality indicators (DQIs). You can choose which tables/columns to profile in order to start recording DQIs for those data assets.

Once a data warehouse is connected, you can click on it to access all the tables and columns. Select the tables you want to start profiling data availability for, and bulk enable profiling.

Lightup pre-selects the timestamp column to use for time-aggregated queries against the data source for each table. Adjust this as needed before enabling profiling.

If you wish to profile specific columns for certain tables, you can click into a table to access the columns and bulk enable distribution and null checks for columns of interest.

Now click on Dashboard to get a real-time and historical view of DQIs, such as data delay and data volume, for each data asset that has profiling enabled. These charts can be used to answer questions such as:

  • Is data delay higher than in the recent past, indicating that the table was not updated on time?

  • Is the data volume too low or too high compared to the recent past?
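As a rough illustration of the two DQIs mentioned above, the Python sketch below computes them from a table's timestamp column. It assumes that data delay means the time elapsed since the most recent timestamp and that data volume means the row count per day; these definitions, the function names, and the sample timestamps are assumptions made for the example and may differ from how Lightup computes its metrics.

```python
from datetime import datetime, timedelta


def data_delay(event_times, now):
    """Time elapsed between `now` and the most recent event timestamp."""
    return now - max(event_times)


def daily_volume(event_times):
    """Row count per calendar day, keyed by date."""
    counts = {}
    for ts in event_times:
        counts[ts.date()] = counts.get(ts.date(), 0) + 1
    return counts


# Hypothetical values from a table's timestamp column.
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 17, 0),
    datetime(2024, 1, 2, 8, 0),
]
now = datetime(2024, 1, 3, 12, 0)

print(data_delay(events, now))  # how stale the table is
print(daily_volume(events))     # rows that arrived on each day
```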

Step 3: Set up monitoring (optional)

You can turn on monitoring and alerting for DQIs of interest with a few clicks. Turning on monitoring will detect incidents and generate alerts when a data quality metric deviates from expectations. Clicking on Monitor this metric lets you select the type of monitoring and assign alerting channel(s) for generated alerts (email, Slack, PagerDuty, or other alerting channels). You can add new alerting channels by clicking on Notifications under Settings. You can choose to set manual thresholds for monitoring or use ML-based anomaly detection.

Using thresholds, you can set up checks such as:

  • Alert me when data delay goes above 12 hours for a given table (select absolute threshold).

  • Alert me when data volume changes by more than 30% day over day or week over week (select percentage change threshold).
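The logic behind the two threshold types can be sketched in a few lines of Python. This is only an illustration of the comparisons involved, not Lightup's implementation; the function names and the handling of a zero previous value are assumptions.

```python
def exceeds_absolute(value, limit):
    """Absolute threshold: alert when the metric goes above a fixed limit."""
    return value > limit


def exceeds_percent_change(current, previous, max_pct):
    """Percentage change threshold: alert when the metric moves by more than
    max_pct percent, in either direction, relative to the previous period."""
    if previous == 0:
        # Percentage change is undefined here; treat any non-zero value as a change.
        return current != 0
    change_pct = abs(current - previous) / abs(previous) * 100
    return change_pct > max_pct


# Data delay of 14 hours against a 12-hour absolute threshold.
print(exceeds_absolute(14, 12))                # True

# Volume dropped from 1000 to 650 rows day over day (35% change, limit 30%).
print(exceeds_percent_change(650, 1000, 30))   # True

# Volume dropped from 1000 to 800 rows (20% change, limit 30%).
print(exceeds_percent_change(800, 1000, 30))   # False
```

Comparing against the previous day or the same day of the previous week only changes which value is passed as previous.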

For more complex scenarios, you can use Lightup’s ML-powered anomaly detection features. In this case, the system selects the right algorithm to use, trains it on past data, and starts alerting when the metric deviates from learned expectations.
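To give a feel for what "deviates from learned expectations" can mean, here is a deliberately simple Python sketch that learns a mean and standard deviation from past values and flags points far outside that range. It is a generic z-score check chosen for illustration, not the algorithm Lightup uses; the function name, the default limit of 3.0, and the sample volumes are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, value, z_limit=3.0):
    """Flag `value` if it lies more than z_limit standard deviations
    from the mean of the past observations in `history`."""
    if len(history) < 2:
        return False  # not enough past data to learn an expectation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # history is constant; any change is a deviation
    return abs(value - mu) / sigma > z_limit


# Hypothetical daily row counts for a table.
history = [1000, 1020, 980, 1010, 990]

print(is_anomalous(history, 1005))  # False: within the usual range
print(is_anomalous(history, 400))   # True: far below the learned range
```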