Guide to Transferring Power BI Tables to Fabric Data Warehouse

Have you ever had data in a Power BI file locally or in the service that you wish could be exported easily?

Microsoft has released a Python library called Semantic-Link that lets us access Power BI Semantic Models from Fabric Notebooks. With this library we can easily extract data from Power BI!

The steps in this guide are:

  1. Upload a Power BI file to the Workspace
  2. Install and try the Semantic-Link library using a Spark Notebook
  3. Load a Power BI table to a DataFrame
  4. Clean the column names
  5. Store the DataFrame in a Lakehouse

Note: In this guide I will use the term “Semantic Model” for the Power BI model, since in Fabric they are the same thing. To avoid confusion: a DataFrame is a table of data held in the Spark cluster that runs our Notebook.

Prerequisites

For this lab you need a Fabric Capacity. If you do not yet have one you can sign up for a free trial.

You also need a Power BI file to upload to the service. Microsoft has a Regional Sales Sample that is perfect for this. The Download link can be found on this page (you will want to download the *.pbix file).

Step 1 – Upload a Power BI file to the Workspace

Sign in to Fabric and open your Workspace. To add a Power BI report press Upload and select the Power BI file you wish to add to the workspace.

After the file is uploaded Fabric creates two things for us: a Power BI Report that contains the user interface and a Semantic Model that contains the data.

Step 2 – Install and try the Semantic-Link library using a Spark Notebook

Go to “Data Engineering” mode in Fabric (bottom left button) and create a new Notebook. Make sure you are using the default Language “PySpark (Python)”.

Before we can begin we need to install the Semantic-Link Python library that contains functions we can use to access the Semantic Model. Things we are able to access are: tables, model relationships, measures and much more!

In the first cell run the code:

#Install semantic link
%pip install semantic-link

pip install is a command used to download and install packages from the Python Package Index (PyPI). In this case we are installing a package from Microsoft, but note that installing third-party packages always carries some security risk.

In the next cell add the following code to list the available Datasets (Semantic Models):

import sempy.fabric as fabric

# The name of the workspace; replace 'Anvil' with your own workspace name
workspace = 'Anvil'
df_datasets = fabric.list_datasets(workspace)
display(df_datasets)

After running the cell you should see all available Datasets in the workspace.

With the following code we can list all the tables in a Semantic Model:

mydataset = 'Regional Sales Sample'
# workspace was defined in the previous cell

tables = fabric.list_tables(mydataset, workspace=workspace)
display(tables)

Here are the results we expect from running the two code cells above:
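As a side note, the library can also list the measures and relationships in the model, as mentioned earlier. A minimal sketch, assuming the same mydataset and workspace variables and that your installed version of Semantic-Link exposes these helper functions:

# List the measures and relationships defined in the Semantic Model
measures = fabric.list_measures(mydataset, workspace=workspace)
display(measures)

relationships = fabric.list_relationships(mydataset, workspace=workspace)
display(relationships)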

Step 3 – Load a Power BI table to a DataFrame

The tables we listed above belong to the Semantic Model, and just like in Power BI Desktop or DAX Studio we can run DAX queries against this model in Fabric using the evaluate_dax() function from the sempy.fabric package.

One of the most basic DAX statements we can use is EVALUATE('TableName'), which simply returns the entire table. We could use other DAX functions here, such as FILTER or GROUPBY, but let's just get the complete 'Accounts' table from our Semantic Model:

pdf = fabric.evaluate_dax(
    mydataset,
    """
    EVALUATE('Accounts')
    """,
    workspace=workspace,
)

# Convert the Pandas DataFrame to a Spark DataFrame and display the first rows
df = spark.createDataFrame(pdf)
display(df.limit(10))

The Semantic-Link library returns a Pandas DataFrame, which is similar to a Spark DataFrame, but we use the built-in spark.createDataFrame() function to convert it into a Spark DataFrame.

We could continue in Spark and transform and validate the data if we wish.
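Instead of pulling the whole table, we could also push the filtering down into the DAX query itself, as hinted above. A minimal sketch, where the 'Accounts'[State] column is only a hypothetical example; check the real column names in your model:

# Hypothetical filtered query; replace 'Accounts'[State] with a real column from your model
pdf_filtered = fabric.evaluate_dax(
    mydataset,
    """
    EVALUATE
    FILTER('Accounts', 'Accounts'[State] = "Ohio")
    """,
    workspace=workspace,
)
display(spark.createDataFrame(pdf_filtered).limit(10))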

Step 4 – Clean the column names

The eagle-eyed developer may have noticed that the column names in our DataFrame contain characters that are not allowed in a Fabric Delta table. Before we can store the DataFrame in our Lakehouse we must clean the column names.

If we regularly work with tables from Semantic Models, a reusable Python function to clean the column headers would be nice to have (see the sketch after the next code block). Here we simply remove spaces and the illegal bracket characters “[]” from the column names:

# Example renaming logic: strip spaces and brackets from every column name
new_column_names = [col.replace(" ", "").replace("[", "").replace("]", "") for col in df.columns]

# Applying the new names to the DataFrame
df = df.toDF(*new_column_names)

# df now has the updated column names.
# You can show the updated DataFrame to verify the changes
display(df.limit(10))
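As mentioned above, a small reusable helper is handy if we do this often. A minimal sketch; the function name clean_column_names is my own choice and not part of Semantic-Link:

# Reusable helper: strip spaces and bracket characters that Delta tables do not accept
def clean_column_names(sdf):
    cleaned = [c.replace(" ", "").replace("[", "").replace("]", "") for c in sdf.columns]
    return sdf.toDF(*cleaned)

df = clean_column_names(df)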

Note that if you take another table that contains dates or timestamps you might get a type mismatch between the date format used in Power BI and the Delta format used by the Lakehouse. To keep this guide short I will not delve into the topic of date and time formats.
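If you do hit such a mismatch, one possible workaround is to cast the offending columns before writing. A minimal sketch under that assumption; your model may need a different conversion:

from pyspark.sql.functions import col
from pyspark.sql.types import DateType

# Cast any DateType columns to timestamp before writing (does nothing if there are none)
for field in df.schema.fields:
    if isinstance(field.dataType, DateType):
        df = df.withColumn(field.name, col(field.name).cast("timestamp"))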

Step 5 – Store the DataFrame in a Lakehouse

Before storing the DataFrame in a Delta table, make sure you have created or attached a Lakehouse to the Notebook. To attach a Lakehouse, open the Explorer sidebar, press “+ Data sources” and create or add a Lakehouse.

The following code stores our DataFrame in a table named accounts:

# Specify the path where the Delta table will be stored
delta_table_path = "Tables/accounts"

# Write the DataFrame to a Delta table
df.write.format("delta").mode("overwrite").save(delta_table_path)

After creating a new table you need to click the three dots next to “Tables” in your Lakehouse and choose refresh to see the table.
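As an alternative to writing to a path, with a default Lakehouse attached the DataFrame can also be saved as a managed table. A minimal sketch:

# Alternative: save as a managed Delta table in the attached default Lakehouse
df.write.format("delta").mode("overwrite").saveAsTable("accounts")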

Conclusion

Using a Notebook in Fabric we are able to query and manipulate a Semantic Model using the Semantic-Link library. Once we have the data in a Spark DataFrame it is easy to store and reuse the data.

Possible use cases are: reusing tables created in Dataflows or Power BI, data validation in Python, dumping data from *.pbix files and much more.
