Databricks

Prepare in Databricks

To create a Databricks connection, you'll need a personal access token, plus the Server Hostname and HTTP Path of the SQL warehouse or compute cluster you want to connect to.

Step 1: Get a Databricks personal access token

  1. Generate a personal access token in Databricks.
  2. Copy the generated token and store it in a secure location.

Step 2: Get your Server Hostname and HTTP Path

Do one of the following:

  • If you're using a SQL warehouse (formerly called a SQL endpoint):

    1. Click SQL Warehouses in the left nav.
    2. Choose a warehouse to connect to.
    3. Navigate to the Connection Details tab.
    4. Copy the Server Hostname and the HTTP Path.
  • If you're using a compute cluster:

    1. Click Compute in the left nav.
    2. Choose a cluster to connect to.
    3. Navigate to Advanced Options.
    4. Click the JDBC/ODBC tab.
    5. Copy the Server Hostname and the HTTP Path.
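Before pasting the values into Lightup, it can help to sanity-check what you copied. The sketch below assumes the path shapes Databricks typically displays: `/sql/1.0/warehouses/<id>` (older workspaces: `/sql/1.0/endpoints/<id>`) for a SQL warehouse, and `sql/protocolv1/o/<workspace-id>/<cluster-id>` for a compute cluster. It also assumes the Server Hostname should be a bare hostname with no `https://` prefix. The helper names `classify_http_path` and `normalize_hostname` are hypothetical and not part of Lightup or Databricks.

```python
import re
from urllib.parse import urlparse

# Assumed HTTP Path shapes (hypothetical check, not an official Databricks rule):
#   SQL warehouse:   /sql/1.0/warehouses/<warehouse-id>  (or /sql/1.0/endpoints/<id>)
#   compute cluster: sql/protocolv1/o/<workspace-id>/<cluster-id>
WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/(warehouses|endpoints)/\w+$")
CLUSTER_PATH = re.compile(r"^/?sql/protocolv1/o/\d+/[\w-]+$")


def classify_http_path(http_path: str) -> str:
    """Return 'warehouse', 'cluster', or 'unknown' for a copied HTTP Path."""
    path = http_path.strip()
    if WAREHOUSE_PATH.match(path):
        return "warehouse"
    if CLUSTER_PATH.match(path):
        return "cluster"
    return "unknown"


def normalize_hostname(value: str) -> str:
    """Strip an accidentally copied scheme and trailing slash from a Server Hostname."""
    value = value.strip()
    if "://" in value:
        # urlparse keeps only the host portion in .netloc
        value = urlparse(value).netloc
    return value.rstrip("/")
```

For example, with made-up values, `classify_http_path("/sql/1.0/warehouses/abc123def456")` returns `"warehouse"`, and `normalize_hostname("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/")` returns the bare hostname. An `"unknown"` result usually means only part of the path was copied.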

Connect to a Databricks datasource

  1. In the left pane, open a workspace menu and select Datasources.
  2. On the main page, select Create Datasource +.
  3. Enter a Datasource Name, then for Connector Type select Databricks.
  4. Under Configure connector, provide the following inputs:
    • Workspace URL - The Server Hostname of the SQL warehouse or compute cluster
    • HTTP Path - The HTTP Path of the SQL warehouse or compute cluster
    • Token - The Databricks personal access token
    • Catalog (Optional) - If you want to specify a Databricks catalog, enter the catalog name here. If you leave this blank, Lightup connects to the default Databricks catalog, hive_metastore. Each datasource connection can include only one catalog, and you must know the catalog's name to add it.
  5. After entering the required settings and any optional settings that apply, select Test Connection below the Configure connector section.
  6. After a successful connection test, select Save.
  7. Your new datasource appears in the list of available datasources. By default, datasources are listed in alphabetical order, so you might have to scroll or change the sort order to see your new datasource.
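If Test Connection fails, one way to isolate the problem is to try the same values outside Lightup. The sketch below assumes the open-source `databricks-sql-connector` package (`pip install databricks-sql-connector`), whose `sql.connect` accepts `server_hostname`, `http_path`, `access_token`, and an optional `catalog`. The helper names `build_connect_kwargs` and `check_connection` are hypothetical. So is the `DATABRICKS_CATALOG` environment variable; the other three variable names follow the connector's own examples. Defaulting a blank catalog to `hive_metastore` mirrors the Lightup behavior described above.

```python
import os


def build_connect_kwargs(hostname, http_path, token, catalog=None):
    """Map the Lightup form fields onto databricks-sql-connector arguments (hypothetical helper)."""
    return {
        "server_hostname": hostname,  # Workspace URL field (bare hostname)
        "http_path": http_path,       # HTTP Path field
        "access_token": token,        # Token field
        # A blank Catalog falls back to hive_metastore, as Lightup does.
        "catalog": catalog or "hive_metastore",
    }


def check_connection(**kwargs):
    """Open a connection and run a trivial query. Requires databricks-sql-connector."""
    from databricks import sql  # imported lazily so the helpers above work without it

    with sql.connect(**kwargs) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1


if __name__ == "__main__":
    if "DATABRICKS_TOKEN" not in os.environ:
        print("Set DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN first.")
    else:
        params = build_connect_kwargs(
            os.environ["DATABRICKS_SERVER_HOSTNAME"],
            os.environ["DATABRICKS_HTTP_PATH"],
            os.environ["DATABRICKS_TOKEN"],
            os.environ.get("DATABRICKS_CATALOG"),  # hypothetical; leave unset for hive_metastore
        )
        print("connected" if check_connection(**params) else "unexpected result")
```

Reading the token from an environment variable keeps it out of source files, in line with storing it securely. If this script connects but Lightup does not, the issue is more likely in how the values were entered in the Lightup form than in the Databricks credentials themselves.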

Advanced

  • Schema scan frequency - Set how often Lightup scans the datasource's schemas: Hourly, Daily, or Weekly.

Query Governance

Databricks datasources support the Query timeout, Query date range limit, Scheduling, Enable data storage, Maximum backfill duration, and Maximum distinct values settings. For steps, see Set query governance settings for a datasource.

Date/time data types

These Databricks date/time data types are supported:

Object types

These Databricks object types are supported:

  • Tables
  • Views

Partitions

Databricks datasources support partitions, multi-partitions, and partition time zones.

Databricks Partner Connect datasources

The Databricks Partner Connect free trial uses Unity Catalog support to streamline the creation of datasources for multiple catalogs. These datasources remain in the trial workspace after the trial ends, but you can recreate them in other workspaces by reusing the configuration settings for each datasource.

Supported Versions

Lightup currently supports:

  • Databricks Hive Metastore via SQL warehouse (SQL endpoint)
  • Databricks Unity Catalog
  • Databricks Photon