Integrating Python with Power BI

3 min read

Power BI is Microsoft’s proprietary product for performing business intelligence tasks. Since 2018, integrating statistical and general-purpose software like R and Python has been made possible by Power BI.

How does it help you? If you are a business intelligence (BI) professional and want to perform certain data science tasks, you have to rely on the data science team. On the other hand, a Python developer needs to rely on the BI team to showcase their analysis in a presentable format, maybe in a dashboard. However, Power BI gets rid of this co-dependency as you can now run Python within an integrated environment.

It is imperative for the new age data scientist to understand the complete pipeline of solving complex business problems. Generally, this includes gathering, cleaning, exploring, transforming data to come up with predictions for future events. Finally, the analysis is presented in a dashboard or a report. Thus, the traditional roles of a business intelligence professional and predictive modeling expert are getting blurred.

This enhances the need for a comprehensive tool that will perform all the above-mentioned tasks in the analytical pipeline. If this tool comes from the makers of Excel, nothing like it. Thus, Power BI is the new talk of the town for performing embedded analytics.

In this tutorial, we will understand the integrating capabilities of Power BI and Python and go hands-on to perform certain tasks that a standalone system would not achieve easily.

The first step is to get an integrated environment up and running. To do this, you should have a distribution of Python installed on your machine. For this purpose, I prefer the base distribution of Python. For all my coding related tasks, I use Anaconda. Nonetheless, integrating Anaconda with Power BI can be a complicated exercise.

Get the AI & data signal, daily.

335k+ subscribers read this every morning. One email, both newsletters. Unsubscribe anytime.

Post-installation, the integrated environment requires you to install four Python packages. They are Pandas (for data manipulation and analysis), Matplotlib and Seaborn (for plotting), and Numpy (for scientific calculations).

You may use the pip command in your command-line tool to install these packages.

After installing these packages we have to enable Python Scripting in Power BI. You can open Power BI to check whether it automatically detects the Python distribution installed on your machine. Go to Files -> Options and Settings -> Options. Under Python Scripting you should see the Home Directory for Python installed on your machine.

Now, you can run a quick test to check whether Python works within the Power BI stack. To begin with, you can import a small dataset in Power BI using Python script.

For this purpose, go to the Home ribbon, click on Get Data and select Other. This section allows you to import data from a varied list of sources, especially, Web, Hadoop Distributed File System (HDFS), Spark, etc. apart from using scripts like R or Python. Here, we will import the Churn Prediction dataset that is stored on my machine.

Click on Connect. It will open up a section where you can write the following Python script:

Clicking on OK will load the Navigator and it will ask you to select the churn data, then click on Load. You can go to the data view to check whether the data has been loaded. Now, you are ready to use Power Query to perform one-click data transformations.

Continue Reading

Enjoyed this summary? Read the complete article at the source:

Continue at analyticsvidhya.com →

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.