Analyze an incident
You can analyze an incident to understand the data quality issue behind it. Your analysis can (and often will) include other incidents that are related in some way, for example through time correlation or a shared metric. Lightup identifies incidents that are time-correlated, and you can include these and any other incidents of interest in your analysis.
Add metrics of interest
The collapsible left panel, which is closed when you first open the incident, lists two groups of metrics:
- Metrics of interest: Metrics you have chosen to compare with the incident. Their charts appear below the incident chart. You can select a listed metric of interest and open it in Explorer, hide its chart in the incident details, or remove it from Metrics of interest.
- Time-correlated: Metrics that have incidents that occurred around the same time as the one you're analyzing. You can select a time-correlated metric and then open it in Explorer or add it to Metrics of interest.
At the top of the panel, use the search box to find items already listed in the panel, or select Actions > Find a metric to search for metrics that might be related in ways other than overlapping in time.
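This article doesn't describe exactly how Lightup decides that incidents are time-correlated. As a mental model only, the following Python sketch assumes that two incidents are time-correlated when their time windows overlap once the target incident's window is widened by a tolerance. The `Incident` class, the `time_correlated` function, the metric names, and the 30-minute tolerance are all hypothetical and are not part of Lightup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical fields for illustration; not Lightup's data model.
    metric: str
    start: datetime
    end: datetime


def time_correlated(target: Incident, others: list[Incident],
                    tolerance: timedelta = timedelta(minutes=30)) -> list[str]:
    """Return metrics whose incidents overlap the target's window, widened by tolerance."""
    window_start = target.start - tolerance
    window_end = target.end + tolerance
    return [
        other.metric
        for other in others
        # Skip incidents on the same metric as the target.
        if other.metric != target.metric
        # Two intervals overlap when each one starts before the other ends.
        and other.start <= window_end
        and other.end >= window_start
    ]


target = Incident("orders.row_count", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 11, 0))
others = [
    Incident("orders.null_pct", datetime(2024, 5, 1, 10, 15), datetime(2024, 5, 1, 10, 45)),
    Incident("payments.delay", datetime(2024, 5, 1, 11, 20), datetime(2024, 5, 1, 12, 0)),
    Incident("users.freshness", datetime(2024, 5, 1, 14, 0), datetime(2024, 5, 1, 15, 0)),
]
print(time_correlated(target, others))  # ['orders.null_pct', 'payments.delay']
```

Under these assumptions, `payments.delay` counts as correlated only because of the tolerance: it starts 20 minutes after the target incident ends.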
Find a metric
- At the top of the left panel, select Actions > Find a metric.
- In the Find any metric pane, click Search metric name and then enter a search term. Matching items appear as you type.
- When you find a metric you want to include, click Add. The chart for the selected metric appears on the Chart tab, below the incident chart.
Information about the incident is on four tabs:
- The Chart tab shows the incident chart at the top, followed by the charts for any metrics of interest you add.
- The Summary tab is a repository of information about the incident, with links to related components.
- The Failing Records tab shows the result of the failing records query associated with the metric. If no failing records query is present, the tab does not appear.
- The Activity tab lists the actions taken on the incident.
On the Chart tab, the incident chart appears first, followed by the charts for any metrics of interest you add. You can set a timeframe to include more data points in the displayed charts, and within a chart you can turn on threshold data to help you visualize how the metric deviated.
Set a timeframe
On the right just above the charts, click the dates to change them.
- If you change only one date, you still need to confirm the other date (that is, click OK twice).
Turn on threshold data
You can view threshold data for the incident chart and for the metrics of interest charts you add. When you add the chart for a metric of interest that has multiple monitors, you must choose a monitor to turn on threshold data.
- One monitor: In the chart where you want to see thresholds, on the Actions menu, select Turn On Threshold Data.
- Multiple monitors: In the chart, on the Monitors menu, select a monitor, and then select View This Monitor. Threshold data then displays on the chart.
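To make "how the metric deviated" concrete, here is a minimal Python sketch that assumes a monitor's threshold data is a lower and an upper bound for each data point. The `Point` class, the `deviations` function, and the sample values are hypothetical. They only illustrate how to read a chart with thresholds turned on and are not a Lightup API.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical shape of one chart point with its threshold band.
    timestamp: str
    value: float
    lower: float
    upper: float


def deviations(points: list[Point]) -> list[tuple[str, float]]:
    """Return (timestamp, amount) for each point outside its threshold band.

    Positive amounts are above the upper bound; negative amounts are below the lower bound.
    """
    result = []
    for p in points:
        if p.value > p.upper:
            result.append((p.timestamp, p.value - p.upper))
        elif p.value < p.lower:
            result.append((p.timestamp, p.value - p.lower))
    return result


chart = [
    Point("2024-05-01T00:00", 102.0, 90.0, 110.0),
    Point("2024-05-01T01:00", 131.5, 90.0, 110.0),
    Point("2024-05-01T02:00", 84.0, 90.0, 110.0),
]
print(deviations(chart))  # [('2024-05-01T01:00', 21.5), ('2024-05-01T02:00', -6.0)]
```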
Hide or remove a Metric of Interest
If you no longer want to see a metric of interest's chart on the Chart tab, you can hide the chart if you plan to display it again, or remove the metric from the Metrics of interest list if you no longer want its chart available on the Chart tab.
- Hide: On the chart's Actions menu, select Hide.
- Remove: In the left panel, select the metric. Then on the panel's Actions menu select Remove From Metrics Of Interest.
The Summary tab is organized into four columns: Incident Info, Metric Info, Monitor Info, and Data Asset Info. Select the Metric Name to open the metric, or the Monitor Name to open the monitor.
A failing records query is a query that selects the rows of a data asset that were part of the detected data quality failure. If the incident's metric has a failing records query, you can review it on the Failing Records tab. Null percent and Conformity metrics have default failing records queries, and you can create your own for any metric.
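To illustrate the idea, the following Python sketch uses an in-memory SQLite database to compute a null percent metric on one column and then runs a query that selects the rows behind that percentage. The `orders` table, the `customer_id` column, and both SQL statements are hypothetical. Lightup generates its own default queries, which may be written differently for your data source.

```python
import sqlite3

# Hypothetical table and column names; Lightup generates its own default query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "c-100"), (2, None), (3, "c-101"), (4, None)],
)

# Null percent metric: the share of rows where customer_id is NULL.
null_pct = conn.execute(
    "SELECT 100.0 * SUM(customer_id IS NULL) / COUNT(*) FROM orders"
).fetchone()[0]

# Failing records query: the rows that contributed to that percentage.
failing_rows = conn.execute(
    "SELECT order_id, customer_id FROM orders WHERE customer_id IS NULL ORDER BY order_id"
).fetchall()

print(null_pct)      # 50.0
print(failing_rows)  # [(2, None), (4, None)]
```

The metric reports how much of the data failed; the failing records query shows which rows failed, which is what you inspect on the Failing Records tab.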
Review the failing records query
- On the incident tabs, select Failing Records.
- At the top of the Failing Records tab, the SQL for the query appears in a text box. To download the results, click Download at the upper right.
You can also see the rows returned, just below the SQL box.
Any activity taken on the incident is listed on the Activity tab. This helps you coordinate your analysis.
The following activities are listed:
- Someone adds a comment
- Someone submits a ticket (e.g., creates a bug in Jira to track the incident)
Validate an incident fix
After you review an incident and take actions to correct the data quality issue, you can validate that your fix addresses the incident, that is, that the data quality issue is resolved.
Validation is not supported for real-time metrics, which are metrics where the x-axis of the metric chart shows when the value was calculated. Currently, this includes Activity metrics for schemas and tables, and Data delay metrics.
- On the incident chart, select Run validation. Lightup begins checking whether the data quality incident is fixed, and the button text changes to Running validation...
- To cancel, select ... > Cancel validation.
After validation finishes running, the button text changes again to reflect the outcome:
- Resolved if the issue is fixed
- Unresolved if the issue is still present
If validation indicates the incident is resolved, consider changing the incident status to Closed.