Monitor a metric

You monitor metrics to stay aware of data quality issues the metrics uncover.

❗️

The procedures on this page require the Workspace Editor role, unless otherwise noted at the top of the procedure.

  • A metric periodically calculates values from a data asset and aggregates those results into a single value.
  • A monitor generates incidents when a metric value deviates from desired bounds.
  • A monitor is associated with a single metric and defines which values of that metric are out of bounds.
  • Once a monitor is attached to a metric, it generates an incident whenever the metric value is found to be out of bounds.
  • A monitor may be either a manual threshold monitor, in which case the bounds are specified explicitly by the data analyst, or an anomaly detection monitor, in which case the bounds are generated by looking at the historical norms for the metric.
  • Anomaly detection monitors must be trained on past data. The training data can be a single period or multiple non-contiguous periods. These monitors get better at detecting incidents when you reject incidents (which helps the monitor learn what constitutes a false positive) and when you adjust or add training periods.
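
The difference between the two monitor kinds can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "mean plus or minus a few standard deviations" rule is a simple stand-in for learning bounds from history, not a description of Lightup's actual anomaly detection.

```python
from statistics import mean, stdev

def learned_bounds(history, width=3.0):
    """Derive (lower, upper) bounds from historical metric values.

    Assumption for illustration: bounds are mean +/- width * standard deviation.
    The real product may use a very different model.
    """
    center = mean(history)
    spread = stdev(history)
    return (center - width * spread, center + width * spread)

def out_of_bounds(value, bounds):
    """Return True when the metric value falls outside the (lower, upper) bounds."""
    lower, upper = bounds
    return value < lower or value > upper

# Manual threshold monitor: the analyst states the bounds explicitly.
manual = (90.0, 110.0)

# Anomaly-detection-style monitor: the bounds come from past values.
history = [100, 102, 98, 101, 99]
learned = learned_bounds(history)

for value in (101, 125):
    print(value, out_of_bounds(value, manual), out_of_bounds(value, learned))
```

In both cases the monitor's job is the same comparison; what differs is where the bounds come from.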

Create a monitor

  1. With a metric's chart displayed in Explorer, on the Monitors menu select + Add.

📘

An Activity metric can only have one monitor, and when you add the monitor, the modal will only display notification options.

  2. Based on the metric's type, Lightup opens the Add monitor modal and displays one of two tabs: Threshold or Anomaly Detection.
  • Threshold tab settings
    • Type - Choose Value if the monitor should check whether metric values are within expectations, or Percentage Change if the monitor should check whether the metric's trend is changing. Use Percentage Change only when the metric has a known trend and its values are consistently non-zero.
    • Select the direction for your thresholds and specify values - Select an arrow button to turn detection on or off for that direction.
    • Upper and/or Lower threshold value - Enter threshold values. You can only enter a threshold value for a direction whose arrow is selected.
  • Anomaly Detection tab settings
    • Select manually create a monitor to open the monitor configuration details and finish creating the monitor there. If you don't choose this option, Lightup uses automatic anomaly detection.
  3. If you have an alerting channel ready and want incidents to generate alerts there, enter it under Select the channels to send notifications. You can also manage alerting channels later, after the monitor is working.
  4. Optionally, move the Mute toggle to the right to prevent the monitor from generating alerts.
  5. Click OK to save the new monitor and begin any training.
    • After you finish setting up, the monitor detects when its metric's value is out of bounds and logs variations as incidents.
    • A monitor can bundle consecutive variations as one incident or record them as separate incidents.
    • You can adjust training for anomaly detection monitors, for example by adding training periods to improve detection. For more information, see Train a monitor.
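
To make the two threshold types concrete, here is a minimal Python sketch. All names are hypothetical and the logic is an assumption about how such a check could work, not Lightup's implementation. It also shows why Percentage Change needs non-zero values: the change relative to a previous value of zero is undefined.

```python
def percentage_change(current, previous):
    """Percent change from previous to current, or None when previous is zero."""
    if previous == 0:
        return None  # the ratio is undefined, so there is nothing to compare
    return (current - previous) / abs(previous) * 100.0

def breaches(observed, lower=None, upper=None):
    """True if observed is outside an enabled threshold.

    A threshold of None models a direction whose arrow is switched off.
    """
    if observed is None:
        return False
    if lower is not None and observed < lower:
        return True
    if upper is not None and observed > upper:
        return True
    return False

def check(monitor_type, current, previous, lower=None, upper=None):
    """Evaluate one data point for a 'value' or 'percentage_change' monitor."""
    if monitor_type == "value":
        observed = current
    elif monitor_type == "percentage_change":
        observed = percentage_change(current, previous)
    else:
        raise ValueError(f"unknown monitor type: {monitor_type}")
    return breaches(observed, lower, upper)

print(check("value", 120, 100, upper=110))              # value above the upper threshold
print(check("percentage_change", 120, 100, upper=10))   # a 20% rise against a 10% limit
print(percentage_change(5, 0))                          # undefined for a zero baseline
```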

Pause/Resume a monitor

In Explorer

  • On the metric's chart in Explorer, on the Monitors menu, do one of the following:
    • Select name-of-the-monitor → Pause.
    • Select name-of-the-monitor → Resume.

In the Monitors list

  1. On the Monitors tab, use the controls to find the monitor in the Monitors list.
  2. In the Status column, select the current status of the monitor, and then choose Pause or Resume (depending on the current status).

Choose a monitor's alerting channels

  1. On the metric's chart in Explorer, on the Monitors menu select name-of-the-monitor → Manage alerts.
  2. In the Notification Channels modal, click the box to select one or more existing alerting channels. As needed, adjust the Mute toggle. Note that a muted monitor stops sending alerts to all channels; to stop receiving alerts for a specific channel, remove it from the selected channels in the Notification Channels modal.

Use schedules to control notifications for a monitor

🚧

You can't create a schedule while you're editing a monitor. Instead, you select a schedule that has already been created. For steps, see Create a schedule.

  1. In the Monitors list, select the monitor's name and then select Edit.
  2. Near the top-right, select the ellipsis beside the Muted button.
  3. In the menu that opens, under Select notification muting schedule, click the box and then select a schedule from the drop-down list that opens.
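
As a rough mental model of how the Mute toggle, the selected channels, and a muting schedule interact, consider the following Python sketch. The function and parameter names are hypothetical, and treating a schedule as a list of (start, end) windows is an assumption made for illustration only.

```python
from datetime import datetime

def channels_to_notify(at, channels, muted=False, mute_windows=()):
    """Return the channels that would receive an alert raised at time `at`.

    Assumptions: a muted monitor alerts no channel at all, and a schedule is a
    collection of (start, end) windows during which notifications are held back.
    """
    if muted:
        return []
    for start, end in mute_windows:
        if start <= at < end:
            return []
    return list(channels)

# Hypothetical weekend maintenance window.
weekend = (datetime(2024, 1, 6, 0, 0), datetime(2024, 1, 8, 0, 0))

print(channels_to_notify(datetime(2024, 1, 6, 12, 0), ["email", "slack"], mute_windows=[weekend]))
print(channels_to_notify(datetime(2024, 1, 9, 12, 0), ["email", "slack"], mute_windows=[weekend]))
```

To silence only one channel, you would remove it from the channel list rather than mute the monitor.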

Manually create a monitor

For some metric types, you can manually create a monitor. Manually creating a monitor gives you more control over training periods and detection settings. Although you can let a new monitor train itself, it will perform better if you give it some "known good" time periods during which metric values are all normal (i.e., no cause for an incident/alert). Likewise, if you know the regular patterns of your data, you can adjust the monitor's Detection settings to fine-tune the detection of incidents.

If the metric type supports manual monitor creation, the Add monitor modal displays a manually create a monitor link.

  1. In the Add monitor modal, select manually create a monitor. The new monitor's configuration screen opens and displays the Define tab.
  2. Adjust any settings as follows:
    • Monitor name can be changed as needed.
    • Metrics is set to the metric you selected in Explorer in order to add this monitor, so you probably don't want to change it.
    • Symptom to detect determines what behavior the monitor looks for in the metric. Keep in mind that some symptoms don't handle metric values of zero, per the following note.

❗️

Symptom configurations that expect a trend in the metric values may fail training if metric values are frequently zero. This training failure can happen in specific slices if the metric is sliced.

  • Two symptoms, Sharp change and Slow burn trend change, use the ratio of current metric values to older metric values, and are only suitable for metrics with a clear trend and non-zero values.
  • The Value outside expectations symptom has a configuration option to Account for trend that uses the same ratio. If you select this symptom, make sure your metric has a trend and non-zero values.
  • The Manual threshold symptom has two types: Value (suitable for any metric) and Percentage change (suitable for metrics with a trend and non-zero values).
  3. Once everything looks right, select Next at the bottom left to proceed to the Train tab, where you can add training periods and adjust the monitor's detection settings.

Train a monitor

The Train tab lets you use your domain knowledge to give the monitor a head start:

  1. Add training periods of "known good" data.
  2. Adjust detection settings that affect when a monitor determines an incident has occurred and has ended.
  3. Specify whether the monitor should learn from the rejection of incidents.

Add training periods

🚧

Before you begin

  • Because the modal for adding training periods doesn't provide a view of the metric's history, review the metric's history beforehand. Make note of one or more date ranges in which all values are "normal" and which, taken together, capture any expected variations. You'll use these date ranges to train the monitor.
  1. On the monitor configuration screen, on the Train tab, under Training select +.
  2. In the modal that opens, specify a start and end for the training period, then select OK.
  3. Repeat to add as many ranges as you need to make sure the monitor trains using the whole range of normal behavior.
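
Conceptually, adding several training periods means the monitor learns only from data points that fall inside any of the chosen ranges. The Python sketch below illustrates that idea; the names are hypothetical and the inclusive end date is an assumption, so check how the product treats period boundaries.

```python
from datetime import date

def in_training_periods(day, periods):
    """True if `day` falls inside any (start, end) period, both ends inclusive."""
    return any(start <= day <= end for start, end in periods)

def training_values(history, periods):
    """Keep only the metric values whose date lies in a training period.

    `history` is a list of (date, value) pairs.
    """
    return [value for day, value in history if in_training_periods(day, periods)]

# Two non-contiguous "known good" ranges that skip an outage on March 3.
periods = [
    (date(2024, 3, 1), date(2024, 3, 2)),
    (date(2024, 3, 4), date(2024, 3, 5)),
]
history = [
    (date(2024, 3, 1), 100),
    (date(2024, 3, 2), 103),
    (date(2024, 3, 3), 0),    # outage: excluded from training
    (date(2024, 3, 4), 98),
    (date(2024, 3, 5), 101),
]

print(training_values(history, periods))
```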

Adjust Detection settings

  1. While manually creating a new monitor, on the Train tab, under Detection settings, select the pencil icon. The Detection Setting modal opens.
  2. Set any values as follows:
    • Drift Duration - The amount of time that a detected anomaly must last before a monitor will log an incident. Specify an amount and a time unit, e.g., 10 seconds.
    • Recovery Duration - After an incident starts, the amount of time to wait after a metric stops exhibiting a symptom before ending the incident. If the metric begins exhibiting the symptom again before this period has passed, the behavior is logged as part of the same incident.
    • Aggressiveness - A value from 1 to 10 that determines how sensitive the monitor is.
    • Set direction for drift detection - Select either arrow to indicate which direction to watch for drift.
  3. Select OK when you're done.
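
The interplay of Drift Duration and Recovery Duration can be pictured as a small state machine. The following Python sketch is one plausible reading of the descriptions above, with hypothetical names and times expressed in plain seconds. It is not Lightup's implementation.

```python
def find_incidents(samples, drift_duration, recovery_duration):
    """Group anomalous samples into incidents.

    samples: list of (time_in_seconds, is_anomalous) in time order.
    An incident opens once an unbroken anomalous run has lasted at least
    drift_duration. It closes once the metric has been normal for at least
    recovery_duration; shorter normal gaps stay inside the same incident.
    Returns (start, end) pairs; end is None for an incident still open.
    """
    incidents = []
    run_start = None        # start of the current unbroken anomalous run
    incident_start = None   # start of the open incident, if any
    last_anomalous = None   # most recent anomalous time in the open incident

    for t, anomalous in samples:
        if incident_start is None:
            if not anomalous:
                run_start = None
                continue
            if run_start is None:
                run_start = t
            if t - run_start >= drift_duration:
                incident_start, last_anomalous = run_start, t
        elif anomalous:
            last_anomalous = t
        elif t - last_anomalous >= recovery_duration:
            incidents.append((incident_start, last_anomalous))
            incident_start = run_start = None

    if incident_start is not None:
        incidents.append((incident_start, None))
    return incidents

samples = [(0, False), (10, True), (20, True), (30, True), (40, False),
           (50, True), (60, False), (70, False), (80, False), (90, False),
           (100, True), (110, False)]

print(find_incidents(samples, drift_duration=20, recovery_duration=30))
print(find_incidents(samples, drift_duration=0, recovery_duration=10))
```

With a 20-second drift and 30-second recovery, the brief return to normal at 40 is absorbed into one incident and the isolated blip at 100 never becomes an incident. With no drift requirement and a 10-second recovery, the same data yields three separate incidents.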

Save and train

After you add training periods and adjust detection settings, make sure the Incident learning toggle is set the way you want, then at the bottom left select Save and train. Your monitor will take a little while to train, and will update the metric chart in Explorer accordingly.

The Monitors list

With a workspace selected, select Monitors on the top bar to open the Monitors list: an easy way to work with all the monitors in the workspace.

  • Assess monitors at a glance: tiles across the top display summary information about monitors in the workspace:
    • Total (count of monitors)
    • Live (monitors currently active)
    • Paused (monitors currently paused for any reason)
    • Training (monitors currently training)
    • Error (monitors with training failures)
  • Just above the list items, controls let you work with the list:
    • Search for a monitor.
    • Add Monitor +.
    • Show x sets how many rows to display per page (in general, fewer is faster).
    • Select the gear icon to choose which columns to display.
    • Navigation arrows and page numbers let you jump around in the list.
  • In the list header, you have more options for working with list items:
    • Select the check box to select all items on the page.
    • Select a column heading to sort the list using the column's values in ascending order. Select it again to reverse the sort order.
  • In the Name column, select a value to open the context menu for the monitor.
  • In the Metric column, select a value to open that metric in Explorer.
  • In the Status column, hover over a value to see a popup with details, or select a value to see a menu of possible actions.
  • In the Alerting column, select a value to open a menu of alerting options.
  • In the Incidents (7D) column, select a value to view the Incidents Report.