Thursday, June 5, 2025

How to load data from a Data Warehouse in Azure Databricks Catalog using Power BI

How to load data from a Data Warehouse in Azure Databricks Catalog with Native SQL Query using Power BI
Unity Catalog:
It is a data governance solution in Databricks that provides centralized access control, auditing, lineage tracking, and data discovery across multiple workspaces. It ensures secure and organized data management.

Catalog:
It is a top-level container within Unity Catalog that holds multiple Databases (Schemas). It helps in data isolation and logical organization.
Example: A SalesData catalog may contain databases like Retail, Wholesale, E-commerce.

Databricks primarily operates on a Lakehouse architecture, which combines elements of data lakes and data warehouses. While Databricks SQL Warehouses are used for querying and analytics, Databricks also supports Databases (Schemas) within Unity Catalog for structured data organization.

SQL Warehouses: 
Compute resources optimized for running SQL queries on structured data. Supports BI tools, Delta Lake integration, and fast query execution.

Databases (Schemas): 
The Database (Schema) exists inside a Catalog and is used for logical grouping of related data. It contains tables, views, functions, and models.
Example: Inside the Retail database, you might store tables like Customers, Orders, and Products.

Notes:
When connecting to Databricks Unity Catalog, Power Query may flatten the hierarchy and display the Catalog as a Database.
This happens because traditional databases (like SQL Server) don’t have a Catalog layer, so Power Query maps the Catalog to a Database for compatibility.

Since Databricks Unity Catalog introduces a three-tier hierarchy (Catalog → Database → Table/View), Power Query may simplify this by treating the Catalog as a Database.
Databricks does not function like a traditional database system (such as SQL Server or PostgreSQL), but it does provide database-like structures within Unity Catalog for managing data efficiently.

--------------------------------------------------------------------------------------------------
Scenario:
To connect to and load data from a Data Warehouse in Azure Databricks Catalog, we need two mandatory details: the Server Hostname and the HTTP Path.

I have defined Parameters in Power Query to hold the Server Hostname, HTTP Path, Catalog/DB Name, and Schema Name, with the sample (dummy) values shown below.

We will use these parameters in the following examples. Please make sure to pass your own correct values for these parameters while you are testing.

p_ServerHostName =  adb-server_host_id.3.azuredatabricks.net
p_http_Path = sql/protocolv1/o/server_host_id/cluster_id
p_Catalog_DB = MyCatlog_db
p_Schema = MySchema

p_Load_By_Year = 2021
This Parameter is used to limit the data loaded by the Query.
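
For reference, the parameters above can be written out as M code. The following is only a sketch, assuming the parameters were created through Manage Parameters with the Text and Decimal Number types. The section form and the meta records reflect how Power Query typically stores parameter queries and may differ in your file. The values are the same dummy values as above.

```powerquery
section Section1;

// Text parameters holding the connection details (dummy values)
shared p_ServerHostName = "adb-server_host_id.3.azuredatabricks.net" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true];
shared p_http_Path = "sql/protocolv1/o/server_host_id/cluster_id" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true];
shared p_Catalog_DB = "MyCatlog_db" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true];
shared p_Schema = "MySchema" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true];

// Numeric parameter; marked as not required so that it can be left blank (null)
shared p_Load_By_Year = 2021 meta [IsParameterQuery=true, Type="Number", IsParameterQueryRequired=false];
```

Keeping p_Load_By_Year as a Number (not Text) matters for the Native SQL Query example later in this post, which passes it to Number.ToText.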

Notes:
In the Azure Databricks Workspace, the Server Hostname and HTTP Path values can be found in either of the following places:
Compute > [Cluster_Name] > Advanced Options > JDBC/ODBC:
Server hostname: adb-server_host_id.3.azuredatabricks.net
HTTP path: sql/protocolv1/o/server_host_id/cluster_id

SQL Warehouses > [MyWarehouse_Name] > Connection Details:
Server hostname: adb-server_host_id.3.azuredatabricks.net 
HTTP path: /sql/1.0/warehouses/warehouse_id

1) Loading a View or Table from the Data Warehouse in Azure Databricks Catalog:
In Power BI, we use the default "Azure Databricks" connector to connect to and load a View or Table from a Data Warehouse in Azure Databricks Catalog.

Only the Server Hostname and HTTP Path need to be passed in the connection dialog.

After connecting to the source, the Navigator dialog box opens, listing the available Catalogs with their underlying Databases (Schemas) and objects to select.
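
The same list of Catalogs can also be inspected from a query instead of the dialog box. The following is a minimal sketch, assuming the Parameters defined above and assuming that the top-level navigation table exposes the Name and Kind columns (the same columns used for the lookups in the examples of this post). The step name Src_Entries is only illustrative.

```powerquery
let
    // Connect with the two mandatory values: Server Hostname and HTTP Path
    Source = Databricks.Catalogs(p_ServerHostName, p_http_Path, [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    // Keep only the columns that identify each top-level entry
    Src_Entries = Table.SelectColumns(Source, {"Name", "Kind"})
in
    Src_Entries
```

Each Catalog is expected to appear here with Kind = "Database", which is the flattening of the hierarchy described in the Notes earlier.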

The following is the sample Power Query logic of an object (View) loaded, which is updated with Parameters defined above:

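let
    // Connect to Databricks; the result is a navigation table of Catalogs
    Source = Databricks.Catalogs(p_ServerHostName, p_http_Path, [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    // The Catalog is exposed by the connector as Kind="Database"
    Src_Database = Source{[Name=p_Catalog_DB,Kind="Database"]}[Data],
    // The Database (Schema) inside the Catalog
    Src_Schema = Src_Database{[Name=p_Schema,Kind="Schema"]}[Data],
    // The object (View) to load
    Src_Object = Src_Schema{[Name="MyViewName",Kind="View"]}[Data]
in
    Src_Object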

Note:
To load a Table instead of a View, change Kind="View" to Kind="Table".
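
As an illustration of the note above, the following sketch loads a Table. It assumes the same Parameters as before. "MyTableName" is a hypothetical object name, so replace it with a Table that exists in your Schema.

```powerquery
let
    Source = Databricks.Catalogs(p_ServerHostName, p_http_Path, [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    Src_Database = Source{[Name=p_Catalog_DB,Kind="Database"]}[Data],
    Src_Schema = Src_Database{[Name=p_Schema,Kind="Schema"]}[Data],
    // Only the last step differs from the View example: Kind="Table"
    Src_Object = Src_Schema{[Name="MyTableName",Kind="Table"]}[Data]
in
    Src_Object
```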

2) Loading data using a Native SQL Query from the Data Warehouse in Azure Databricks Catalog:
To connect and load data by running a Native SQL Query against the Data Warehouse in Azure Databricks Catalog, the Server Hostname, HTTP Path, and Catalog Name are all mandatory.
Please make sure to include the Schema.Table/View name in the Native SQL Query.

The following is the sample Power Query logic of data loaded using Native SQL Query, which is updated with Parameters defined above:
let
    // Current calendar year, used for the default three-year window
    vCY = Date.Year(DateTime.LocalNow()),
    // Default filter: the current year and the two preceding years
    vYearsList = " IN (" & Number.ToText(vCY) & "," & Number.ToText(vCY-1) & "," & Number.ToText(vCY-2) & ")",
    // Use the default window when the Parameter is blank; otherwise load from the given year onward
    vLoadYears = if p_Load_By_Year is null then vYearsList else ">= " & Number.ToText(p_Load_By_Year),
    vSrc_Qry = "SELECT transact_id, transact_date, field3, field4, field5, fieldN
    FROM " & p_Schema & ".fact_Transactions fact
    WHERE YEAR(fact.transact_date) " & vLoadYears,
    // Run the query against the Catalog, which the connector exposes as Kind="Database"
    Source = Value.NativeQuery(Databricks.Catalogs(p_ServerHostName, p_http_Path,
    [Catalog=p_Catalog_DB, Database=null, EnableAutomaticProxyDiscovery=null]){[Name=p_Catalog_DB,Kind="Database"]}[Data],
    vSrc_Qry, null, [EnableFolding=true]),
    ChangeType = Table.TransformColumnTypes(Source,{{"transact_date", type date}})
in
    ChangeType

Notes:
p_Load_By_Year is a Year Parameter used to limit the data loaded by the Query. If this Parameter is blank (null), the data is loaded by default for the last 3 years (the current year and the two preceding years). Otherwise, the data is loaded for years >= the Parameter value (for example, >= 2021).
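
To see what each case appends to the WHERE clause, the filter logic can be tried on its own without connecting to Databricks. The following is a minimal sketch. fnYearFilter is a hypothetical helper name that is not part of the query above, and the current year is passed in as a fixed value (2025) so that the result does not depend on the day it is run.

```powerquery
let
    // Hypothetical helper: returns the text placed after "YEAR(fact.transact_date) "
    fnYearFilter = (currentYear as number, loadByYear as nullable number) as text =>
        if loadByYear = null then
            // Blank Parameter: the current year and the two preceding years
            " IN (" & Number.ToText(currentYear) & "," & Number.ToText(currentYear - 1) & "," & Number.ToText(currentYear - 2) & ")"
        else
            // Parameter supplied: load from that year onward
            ">= " & Number.ToText(loadByYear),
    // Evaluate both branches side by side
    Result = [
        WithParameter = fnYearFilter(2025, 2021),
        BlankParameter = fnYearFilter(2025, null)
    ]
in
    Result
```

With these inputs, WithParameter should evaluate to ">= 2021" and BlankParameter to " IN (2025,2024,2023)".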


--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
