Enhancing Data Analysis: Integrating Python into Tableau

Tools used in this project: Tableau, Python
Difficulty level: Advanced

In the era of Big Data, analysing and visualising complex data is essential for any Digital Data Analyst. Tableau has an intuitive data visualization interface but is limited in its native statistical modelling capabilities.

Use Python with Tableau

Fortunately, Tableau allows users to overcome these limitations through external connections, particularly with Python, a programming language mainly used in data science.

Introduction

This article explores how Python can extend Tableau’s capabilities, allowing you to perform more detailed and customized analysis. We will set up the system by installing Python and Anaconda, then walk through the steps to establish an external connection to Python from Tableau. This approach combines Tableau’s visualizations with Python’s robust libraries and opens the door to more advanced data analysis.

To make this information accessible to beginners and advanced users, this article will be a comprehensive guide to successfully integrating Python into your data analysis routine with Tableau. By following the detailed tips and procedures we are about to discuss, you can turn your data visualizations into powerful insight and decision-making tools.

What is Python?

Python is a high-level, interpreted, object-oriented programming language that has gained immense popularity in software development because of its simple and readable syntax. Initially created by Guido van Rossum in 1991, Python has become an indispensable tool in numerous technology areas, from web programming to machine learning.

Wide Application in Data Science

In the field of data science, Python is particularly valuable for several reasons:

  • Powerful libraries: Python offers a wide range of specialized libraries such as NumPy for numerical computation, Pandas for data manipulation, and Matplotlib for data visualization. These libraries facilitate the analysis and manipulation of extensive data sets with less code and more efficiency.
  • Flexibility and Scalability: Whether you are working on a small data analysis project or developing complex machine learning systems, Python scales effectively to fit different needs.
  • Community and support: Python is one of the most popular programming languages, with a large community of developers. This translates into excellent peer support, technical resources, and constant updates to its libraries and features.

Use in general programming

In addition to data science, Python has applications in many other fields:

  • Web development: frameworks such as Django and Flask enable developers to build robust and scalable web applications.
  • Automation: Python is frequently used to write scripts that automate daily tasks and system operations, making processes more efficient and less prone to human error.
  • Artificial Intelligence: Python is the lingua franca in AI, with libraries such as TensorFlow and Keras facilitating the construction and training of advanced machine learning models.

Python’s simplicity, combined with its powerful suite of tools and libraries, makes it the language of choice for professionals who want to analyze, visualize, and interpret data to turn complex information into actionable insights.

What is Tableau?

Tableau is a leading data visualization software that transforms raw data into intuitive and easily interpreted visual formats. Its user-friendly approach and ability to handle large volumes of data make it a tool of choice for data analysts, digital analysts, business intelligence professionals and decision-makers in various industries.

Data visualization capabilities

Tableau is distinguished by its powerful visualization capabilities that enable users to:

  • Create interactive dashboards: users can combine different visualizations into an interactive dashboard, making data more accessible and understandable for all stakeholders.
  • Visual Exploration of Data: Through drag-and-drop and zoom capabilities, users can explore data more dynamically, uncovering patterns and correlations that may not be apparent in traditional reports.
  • Customizable Visualizations: Tableau offers a wide range of chart types and visualizations, from geographic maps to complex bar charts, allowing detailed customizations to fit the user’s specific needs.

Ease of use and accessibility

One of Tableau’s main strengths is its intuitive interface:

  • Ease of use: even without deep technical knowledge, users can create meaningful visualizations through a drag-and-drop interface.
  • Connectivity to data: Tableau connects easily to almost any data source, from Excel files to large SQL databases, cloud data such as Google BigQuery, or real-time data.

Organizational Impact

Using Tableau in an organization can significantly improve data-driven decision-making. It makes analytics accessible to a broader audience and enables companies to respond more quickly to emerging trends and market dynamics.

In summary, Tableau facilitates data visualisation and enriches strategic decision-making by transforming data into visual insights that can drive innovation and business success.

Why integrate Python with Tableau?

The integration of Python into Tableau represents a significant evolution in data analysis. It combines the power of programming with the ease of visualization. This combination offers several advantages that overcome the limitations of Tableau’s native functions, thereby expanding the user’s analytical capabilities and flexibility.

Overcoming native limitations

While Tableau is excellent for basic visualizations and analysis, some analysis scenarios require more sophisticated computational capabilities, such as:

  • Advanced Statistical Modeling: Python supports advanced statistical analysis and machine learning techniques beyond the standard Tableau capabilities.
  • Manipulation of complex data: Python allows for more complex and detailed data manipulation using libraries such as Pandas, which handle cleaning, preparation, and other operations on large data sets.

Automation and efficiency

Python can automate many processes within Tableau, improving efficiency and reducing the time needed for analysis:

  • Workflow Automation: Python scripts can automate repetitive workflows in Tableau, such as data updates and transformations, allowing analysts to focus on more strategic tasks.
  • Customizing calculations and functions: Python allows you to write custom functions that can be executed directly within Tableau dashboards.

Extension of Analytical Skills

By integrating Python, Tableau users can take advantage of the wide range of Python libraries and modules to extend their analysis:

  • Integration of Machine Learning libraries: use libraries such as scikit-learn to implement predictive models directly within Tableau.
  • Text Analysis and NLP: Natural Language Processing (NLP) techniques can be applied via Python to analyze text data directly in Tableau.

Advanced interactivity

The use of Python within Tableau can also increase the interactivity of visualizations:

  • Dynamic scripts: Python scripts can be executed in response to user interactions with the dashboard, allowing dynamic and custom displays based on real-time input.

Scalability and community

The global community of Python developers provides a constant stream of new tools and libraries that can be integrated into Tableau, ensuring that solutions remain state-of-the-art and easily scalable to adapt to new analytical challenges.

In conclusion, Python’s integration with Tableau not only overcomes the limitations of the software’s native capabilities but also opens new doors for analytical innovation, making deeper data insights and more customized analyses possible.

Prerequisites

A well-configured working environment is essential for integrating Python with Tableau. This section outlines the prerequisites needed to establish a working connection between Python and Tableau, ensuring that you can take full advantage of both tools’ capabilities.

Python

  • Version: make sure you have Python 3.x installed, as Python 3 is the current major version and the one supported by data analysis libraries. Check the TabPy documentation for the specific 3.x releases it supports.
  • Installation: Python can be downloaded and installed directly from the official website. During installation, selecting the option to add Python to the operating system PATH is essential. This makes it easier to run Python scripts from any command prompt.

Tableau Desktop

  • Version: This feature requires a version of Tableau that supports external connections, such as Tableau Desktop. Check that your license and version of Tableau are up to date.
  • Installation: Tableau Desktop can be purchased and downloaded from the official website.

Anaconda

  • Utility: Anaconda is a Python distribution that simplifies package and environment management. It is particularly useful for data science and statistical analysis.
  • Installation: Download and install Anaconda from the official website. This installs Python together with many useful data analysis packages, already configured.
Anaconda Navigator

TabPy (Tableau Python Server)

  • Function: TabPy is a Python server that allows Python scripts to run directly within Tableau, facilitating the integration of Python’s analytical capabilities into Tableau visualizations.
  • Installation: TabPy can be installed via pip (the Python package manager) with the command `pip install tabpy`. More details can be found in the official TabPy documentation.
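
As a quick check after installation, a short script can report whether the package is visible to the current Python interpreter. This is only a sketch: it assumes the package is distributed under the name `tabpy` (the name used in the installation commands later in this article), and `installed_version` is a helper name made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "tabpy" is the distribution name assumed here (as in `pip install tabpy`).
    version = installed_version("tabpy")
    if version is None:
        print("TabPy is not installed in this environment")
    else:
        print(f"TabPy {version} is installed")
```

Run it with the same interpreter (and the same Anaconda environment) that you intend to use for TabPy; a different environment may give a different answer.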

Network connectivity

  • Configuration: Ensure your computer is configured to connect Tableau and the Python server (TabPy). This may require configuring the firewall or other security settings to allow communication between the two programs.
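
If you suspect that a firewall or security setting is blocking communication, one way to check is to attempt a plain TCP connection to the server from Python. The sketch below assumes TabPy runs on the same machine (`localhost`) and listens on port 9004, its default; `is_port_open` is a hypothetical helper name, and both values should be adjusted to your setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # localhost and 9004 are assumptions: 9004 is the TabPy default port.
    print("TabPy port reachable:", is_port_open("localhost", 9004))
```

A result of `False` only says that nothing accepted the connection; it does not distinguish between a server that is not running and a connection that was blocked.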

Basic knowledge

  • Python and Tableau: It is helpful to have a basic knowledge of Python and Tableau. A specific understanding of Python libraries for data analysis and familiarity with the Tableau user interface can significantly help.

These prerequisites are essential to fully utilise Python’s integration into Tableau, significantly improving your analytical and visual skills.

Creating an Environment in Anaconda

Configuring an Anaconda environment specifically for use with Tableau helps ensure that your data analysis sessions are efficient and separate from other Python projects. It keeps dependencies organized and avoids conflicts between packages. Here is how to create and configure an Anaconda environment for use with Tableau.

Step 1: Open Anaconda Navigator or command prompt.

  • Anaconda Navigator: You can open Anaconda Navigator from the Windows Start menu or from Launchpad on macOS.

Step 2: Create a new environment

  • Via Anaconda Navigator:
    • In the left sidebar, click on ‘Environments’.
Anaconda Navigator - Environments
    • Click on ‘Create’.
Anaconda Navigator - Environments - Create

    • In the dialogue box that appears, enter a name for your environment, for example, `tableau_python`.
    • Choose ‘Python’ as the package to install, and select the version of Python you wish to use, preferably one compatible with Tableau and your libraries.
    • Click on ‘Create’ to initiate the creation of the environment.
Anaconda Navigator - Environments - Create - Settings

Step 3: Activating the environment

  • Anaconda Navigator:
    • Choose ‘Home’ from the left side menu.
    • Select the new `tableau_python` environment from the ‘All applications on’ drop-down list.
    • The environment will now be active, and you can install additional packages from Navigator.
Anaconda Navigator - Home - tableau_python
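
To confirm which environment a Python session is actually using, you can inspect it from Python itself. This is a minimal sketch to run from a terminal opened in the new environment: it relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated, and `describe_environment` is a helper name invented for this example.

```python
import os
import sys


def describe_environment(environ=None) -> dict:
    """Report which conda environment and interpreter the current process uses."""
    environ = os.environ if environ is None else environ
    return {
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # the value is None when no conda environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "python_executable": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `conda_env` does not show `tableau_python` (or whatever name you chose), packages installed from that terminal will end up in a different environment.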

Step 4: Installing the necessary packages

  • Installation of TabPy and other packages:
    • While your environment is active, open a terminal in it and run the following commands to upgrade pip and install TabPy:
      • python -m pip install --upgrade pip
      • pip install tabpy
Anaconda Navigator - Environments - Open Terminal

  • Installation can take a few minutes because pip installs TabPy together with all of its dependencies. When it finishes, you will see a message that the packages were installed successfully, and the command prompt will reappear. You are now ready to start the local server that Tableau will connect to. To start the Python server for Tableau, enter the following command and press Enter:
    • tabpy
  • A warning will indicate that you are starting the TabPy server without authentication configured, and it will ask whether you want to proceed (y/N), as shown in the figure below. If you wish, you can configure a username and password. That procedure is beyond the scope of this article; refer to the official TabPy documentation if you want to enable it.
Anaconda Navigator - TabPy settings

  • Type “y” and press Enter. This will start TabPy, and you will see some information about the server appear. Most important is the port on which the web service is listening. By default, it is port 9004. Keep it in mind for when you switch to Tableau.
  • With the server running in the command terminal, open Tableau and establish a connection to the external resource. To do so, click on “Help” in the top navigation menu, hover your mouse over “Settings and Performance,” and then click on “Manage Analytics Extension Connection,” as shown in the following figure.
Tableau - Connecting to external analytics extension

  • You will see a window appear asking you to select a connection type. Choose “TabPy“.
Tableau - TabPy connection

  • On the next screen, enter “localhost” as “Hostname” and 9004 as “Port” (or whatever other port you have configured), as shown below; then click on “Test Connection.”
Tableau - TabPy settings
  • If the information is correct and the server is running, you will see a message from the command prompt saying that Tableau Desktop has successfully connected to the extension.
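
The same check can be made outside Tableau by asking the running server to describe itself. The sketch below assumes TabPy is listening at `http://localhost:9004` without authentication and uses the `/info` endpoint of TabPy's REST interface; `info_url` and `fetch_tabpy_info` are helper names invented here, and the exact fields returned depend on your TabPy version.

```python
import json
from urllib import request


def info_url(base_url: str) -> str:
    """Build the address of the server's /info endpoint from its base URL."""
    return base_url.rstrip("/") + "/info"


def fetch_tabpy_info(base_url: str, timeout: float = 5.0) -> dict:
    """Request the /info endpoint and return the decoded JSON body."""
    with request.urlopen(info_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        # Assumed address: local server on the default port.
        print(json.dumps(fetch_tabpy_info("http://localhost:9004"), indent=2))
    except (OSError, ValueError) as error:
        # OSError covers connection failures; ValueError covers a non-JSON reply.
        print("Could not read TabPy info:", error)
```

If the script prints server details while Tableau still cannot connect, the problem is more likely in the hostname or port entered in Tableau than in the server itself.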

Step 5: Verification of the environment

If the connection is successful, click “Save” to close the menu. Now, you can write Python scripts within Tableau Desktop. These scripts will run in the environment you created, and their results will be returned to Tableau Desktop as calculated fields. As a test, write a simple calculation that takes the sales values from the Superstore dataset bundled with Tableau Desktop and multiplies them by 5 in Python. Python will return the result as a calculated field that you can use in Tableau Desktop. To begin, create a new calculated field, “Python script example”, and enter the following calculation:

SCRIPT_INT("return [int(x * 5) for x in _arg1]",SUM([Sales]))

Now, create a small cross table that you can check to ensure the values are correct. Add “Sub-Category” to “Rows” and then drag “Sales” and “Python script example” to the “Text” property in the “Marks” box with the mouse. You should see results similar to those in the following figure:

Tableau - Python script example

This is a simple but helpful way to confirm that everything is working correctly.
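
To see what the script does on the Python side, it helps to run the same logic outside Tableau. The sketch below assumes that the aggregated values arrive in `_arg1` as a list, one value per row of the view, and that the script body behaves like the body of a function; the function name and the sample numbers are made up for illustration.

```python
def python_script_example(_arg1):
    """Same logic as the calculated field: multiply by 5 and truncate to int."""
    return [int(x * 5) for x in _arg1]


# Hypothetical aggregated Sales values, one per Sub-Category row.
sample_sales = [100.0, 20.5, 3.0]
print(python_script_example(sample_sales))  # [500, 102, 15]
```

Note that `int()` truncates rather than rounds, so 102.5 becomes 102. The list returned has the same length as the list received, which is what lets Tableau match each result back to its row.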

Once you have finished working in Tableau, always close the connection to the server from both Tableau Desktop and the terminal. Return to the “Manage Analytics Extension Connection” menu in Tableau Desktop and select “Disconnect.”

Tableau - Disconnect TabPy

Then go back to the terminal screen and press Ctrl + C.

Terminal - Shutting down TabPy

You’ll see a message that says “Shutting down TabPy…”, and your command line will reappear.

If you use Tableau Cloud or Tableau Server

If you use Tableau Cloud or Tableau Server, you must deploy a TabPy server on a cloud or web hosting platform. You can do this using Docker or directly deploying the server on a service such as Heroku.

To run the remote server on Heroku:

  • Log in to your Heroku account via a browser or, if you don’t have one, sign up for free.
  • Go to TabPy’s GitHub repository and click “Deploy to Heroku” in the README section.

Deploying TabPy to Heroku

  • Follow the instructions. Type in the server’s name and select its geographic location (it is essential to choose “Europe” for those who reside in Europe and must comply with GDPR rules).
  • Set username and password.

Once the server is activated, you can connect it to Tableau Cloud/Server using the URL and port number.

Example: Using Python with Tableau calculations

For this example, we will use a Kaggle data set related to a store’s sales.

Store sales dataset from Kaggle

You will use Python to calculate the Pearson correlation between sales and profits using NumPy’s np.corrcoef, which returns a matrix of correlation coefficients. Running this calculation in Tableau lets you assess the strength of the relationship between two variables directly in visualizations, helping you better understand how changes in one variable are associated with changes in the other.

Before going any further, however, let us clarify what Pearson’s Correlation Coefficient is.

Pearson’s correlation coefficient, often denoted “r,” varies between -1 and +1. A value of +1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 means no linear correlation between the two variables. Therefore:

  • +1: A correlation of +1 indicates that an increase in one variable is always associated with a proportional increase in the other.
  • -1: A correlation of -1 indicates that an increase in one variable is always associated with a proportional decrease in the other.
  • 0: A correlation of 0 indicates no linear relationship between the two variables.
  • Intermediate Values: Values between -1 and +1 indicate the degree of linear relationship between the variables. The closer the value is to the extremes (-1 or +1), the stronger the correlation.
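
To make the definition concrete, the coefficient can be computed by hand: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations. The following is a minimal sketch using only the standard library; `pearson_r` is a name chosen for this example and the input values are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing relationship
```

The first call yields a value at the +1 extreme and the second at the -1 extreme, matching the two limiting cases described above. The coefficient is undefined when one variable never changes, which is why the function raises an error in that case.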

Pearson’s correlation coefficient is widely used in many fields, such as economics, biology, social sciences, marketing, and others, to:

  • Determine the strength of a potential relationship between two variables before conducting further, more complex analyses.
  • Help with variable selection in linear regression models.

Using this coefficient in analytical contexts such as Tableau enriches data analysis by allowing decisions based on specific quantitative insights regarding interdependencies between variables.


We can continue with the example now that we have clarified what Pearson’s Correlation Coefficient is.

To run a Python script in a Tableau calculated field, we need one of these script functions based on the output:

  • SCRIPT_BOOL
  • SCRIPT_REAL
  • SCRIPT_INT
  • SCRIPT_STR

For example, if our function returns boolean values, we must use the SCRIPT_BOOL function. Remember that you can always get integer values and convert them to other types using native functions.
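
The choice of function therefore follows from the type of value your Python script returns. As an illustration only, the mapping can be written as a small Python helper; `script_function_for` is a hypothetical name, and the helper simply names the matching Tableau function rather than calling Tableau in any way.

```python
def script_function_for(value) -> str:
    """Name the Tableau SCRIPT_ function matching a Python result's type."""
    # A list result is judged by its first element
    # (this assumes a non-empty list whose items share one type).
    sample = value[0] if isinstance(value, (list, tuple)) else value
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(sample, bool):
        return "SCRIPT_BOOL"
    if isinstance(sample, int):
        return "SCRIPT_INT"
    if isinstance(sample, float):
        return "SCRIPT_REAL"
    if isinstance(sample, str):
        return "SCRIPT_STR"
    raise TypeError(f"no SCRIPT_ function for {type(sample).__name__}")


print(script_function_for([True, False]))  # SCRIPT_BOOL
print(script_function_for(0.87))           # SCRIPT_REAL
```

The ordering of the checks matters: in Python, `True` is also an `int`, so testing for `int` first would misclassify boolean results.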

For our calculation, we will use SCRIPT_REAL, which takes two parts: the Python script in quotes and the aggregated arguments. We will use Python’s NumPy to calculate the correlation between Sales and Profit.
Since we cannot reference the fields directly inside the script, we use placeholders such as “_arg1” and “_arg2” instead, numbered in the order in which the arguments are passed. For example, SUM([Profit]) is the second argument and is linked to “_arg2.” We then extract the correlation coefficient from the matrix returned by np.corrcoef, at position [0,1], so that the script returns a single value.

SCRIPT_REAL("import numpy as np 
return np.corrcoef(_arg1,_arg2)[0,1]", 
SUM([Sales]),SUM([Profit]))
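
The reason for the `[0,1]` index becomes clearer when the same call is run in plain Python. In the sketch below, the two lists are invented numbers standing in for the aggregated Sales and Profit values that Tableau would pass as `_arg1` and `_arg2`.

```python
import numpy as np

# Hypothetical per-customer aggregates standing in for SUM([Sales]) and SUM([Profit]).
_arg1 = [120.0, 340.0, 90.0, 410.0, 275.0]  # sales
_arg2 = [30.0, 85.0, 10.0, 120.0, 60.0]     # profit

matrix = np.corrcoef(_arg1, _arg2)
print(matrix.shape)  # (2, 2)

# The diagonal holds each variable's correlation with itself; the off-diagonal
# entries [0, 1] and [1, 0] both hold the sales/profit correlation.
r = float(matrix[0, 1])
print(round(r, 3))
```

With two input variables, np.corrcoef returns a 2x2 symmetric matrix, so `[0,1]` and `[1,0]` are interchangeable; either one gives the single coefficient that the calculated field needs.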

After adding the script to the calculation field, click the “Apply” button. The script will run and return values corresponding to the Customer Name. Before pressing the OK button, click on the link “Default Table Calculation” and change the option from “Automatic” to “Customer Name“.

We conclude by visualizing the scatter plot of product category and customer segmentation:

  • Drag the Category and Sales fields to Columns.
  • Then, drag the Customer Segment and Profit fields to Rows.
  • Drag Customer Name to Detail in the Marks section.
  • Finally, drag the newly created calculated field to Label in the Marks section.

The graph shows the correlation coefficient of customers by product category and customer segment. The correlation is high for the combination of the Furniture category and the Small Business segment.

Finally, as indicated above, remember to close the connection to the server in both Tableau Desktop and the terminal.

That’s it!

Conclusion

Integrating Python into Tableau opens a vast field of analytical possibilities that significantly extend Tableau’s native capabilities. Using Python, analysts can implement statistical functions, such as Pearson’s correlation coefficient, directly in their dashboards, enriching the analysis with insights beyond simple data visualizations. This approach deepens the analysis and allows for greater flexibility in data manipulation and processing. With the right tools and knowledge, Python and Tableau can transform data analysis, making information more accessible and interpretable for many business users.
