Programmatically Creating, Managing Lakehouses in Fabric

Using mssparkutils to create, manage, update lakehouses in Microsoft Fabric

At MS Ignite, Microsoft unveiled a variety of new APIs designed for working with Fabric items, such as workspaces, Spark jobs, lakehouses, warehouses, ML items, and more. You can find detailed information about these APIs here. These APIs will be critical in the automation and CI/CD of Fabric workloads.

With the release of these APIs, a new method has been added to the mssparkutils library to simplify working with lakehouses. In this blog, I will explore the available options and provide examples. Please note that at the time of writing this blog, the information has not been published on the official documentation page, so keep an eye on the documentation for changes.

You can use the lakehouse REST API but I find it easier to use mssparkutils.

Lakehouse Method

To check all the available methods, use .help()

from notebookutils import mssparkutils
mssparkutils.lakehouse.help()

There are five methods create, get, update, delete and list. You can use mssparkutils.lakehouse.help("methodName") to get the description and the arguments.

create()

Use this to create a lakehouse in any of the Fabric workspaces.

create(name: String, description: String = "", workspaceId: String = "")

I created a test lakehouse as below. It created the lakehouse, SQL endpoint and a default semantic model. Note that it takes ~5-10 s for the SQL endpoint/lakehouse to get created so wait for ~10s before adding new items to this newly created lakehouse. Lakehouse names cannot have spaces or special characters like @,#,$,%. Underscores are allowed.

💡
workspaceID is optional. If no workspaceID is provided, the workspace hosting the notebook is used. To get the id of the workspace, use spark.conf.get("trident.workspace.id")

get()

get() returns the details of the lakehouse. You can specify either the lakehouse name or the id. As mentioned above, excluding the workspace id will return the lakehouse of that name from the attached workspace.

update()

update() as the name suggests can be used to change the name or the description of the specified lakehouse. update does not change the lakehouse id, only the name and the description. newName is a required parameter.

list()

Use this to get a list of all the lakehouses in the specified workspace which is great.

Below I used the list method to get a list of all the lakehouses in all the workspaces the user has access to. Note that since these methods are calling APIs, the throttling limits are applicable. In the below code, I have added a 10s wait to overcome that, adjust as needed.

import requests
import pandas as pd
import time
from notebookutils.mssparkutils.credentials import getToken
from notebookutils.mssparkutils.lakehouse import list as list_lakehouse

def get_lakehouse_list():
    """
    Sandeep Pawar   |    faric.guru

    Retrieves a list of all lakehouses in all workspaces accessible to the user.
    Only Fabric and Premium workspaces are used.
    Refer https://learn.microsoft.com/en-us/rest/api/fabric/articles/throttling for throttling limits

    Returns a pandas datafrema
    """
    base_url = "https://api.powerbi.com/v1.0/myorg"
    token = getToken("https://analysis.windows.net/powerbi/api")
    headers = {"Authorization": f"Bearer {token}"}

    response = requests.get(f"{base_url}/groups", headers=headers)
    workspaces_data = response.json().get('value', [])

    #Premium/Fabric only
    premium_workspaces = pd.DataFrame(workspaces_data).query('isOnDedicatedCapacity == True')[["name", "id"]]

    lakehouses = []
    for ws_id in premium_workspaces['id'].unique():
        lakehouses.append(pd.DataFrame(list_lakehouse(workspaceId=ws_id)))
        time.sleep(10)  # Throttling limit for compliance

    return pd.concat(lakehouses, ignore_index=True)

# Call the function
lakehouse_data = get_lakehouse_list()
lakehouse_data

Using the Fabric REST API

Alternatively, you can use the Fabric REST API to get a list of all the lakehouses as below:

import requests
import pandas as pd
from requests.exceptions import HTTPError

def get_lakehouse_list_api():

    '''
    Sandeep Pawar  |   fabric.guru

    This function uses the Fabric REST API to get all the lakehouses in the tenant that the user has access to.

    '''
    base_url = "https://api.fabric.microsoft.com/v1/admin/items?type=Lakehouse"
    token = mssparkutils.credentials.getToken("https://api.fabric.microsoft.com/")
    headers = {"Authorization": f"Bearer {token}"}

    try:
        response = requests.get(base_url, headers=headers)
        response.raise_for_status()

        data = response.json()

        # Check if 'itemEntities' exists, may not exist for all items types
        if 'itemEntities' not in data:
            raise KeyError("'itemEntities' key not found in the response data")

        lakehouses = pd.json_normalize(data['itemEntities'], sep='_')
        return lakehouses

    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")

get_lakehouse_list_api()

REST API returns a few additional details not provided by mssparkutils.

You may also find my other blog on mounting a lakehouse helpful. As a side note, to get the lakehouse attached to a notebook use:

spark.conf.get("trident.lakehouse.id")
spark.conf.get("trident.lakehouse.name")

Update Jan 12, 2024

With Semantic-Link v 0.5.0, you can use the create_lakehouse method to a create lakehouse in any workspace. By default, it will be created in the workspace attached to the notebook.

#install semantic link >= v0.5.0
!pip install --upgrade semantic-link --q

import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException

lh_name = "Sales_Lakehouse"
ws_name = None #if None use the default workspace else use the specified
try:
    lh = fabric.create_lakehouse(display_name=lh_name, workspace=ws_name)
    print(f"Lakehouse {lh_name} with ID {lh} successfully created")

except FabricHTTPException as exc:
    print(f"An error occurred: {exc}")

Did you find this article valuable?

Support Sandeep Pawar by becoming a sponsor. Any amount is appreciated!