This article explores the critical role of statistics in data analysis and how Tableau can facilitate understanding complex concepts through visualisation.
Introduction
In the information age in which we live, data is everywhere. From business decisions to scientific advances, the ability to interpret and analyse data has become a core competence. But what makes this interpretation possible? The answer lies in statistics, the branch of mathematics that allows us to collect, analyse, interpret, present and organise data. Despite its importance, statistics can seem intimidating to many, given its complexity and the need for a deep understanding to apply it correctly.
This is where Tableau, a revolutionary tool in data analysis, comes in. With its intuitive interface and powerful visualisation capabilities, Tableau transforms complex numbers and intricate datasets into understandable, interactive graphics. This not only makes data analysis accessible to a broader audience but also facilitates the discovery of patterns, trends and anomalies that would otherwise remain hidden.
This article aims to guide readers through the fundamental statistics concepts, illustrating them with practical examples in Tableau. Whether you are an experienced data analyst or a beginner in the world of data, the aim is to demystify statistics and show how, with the right tools, they can become a valuable resource rather than an obstacle.
Through a series of concrete examples, we will explore the different types of data, how to interpret critical measures in datasets, the meaning of distributions and how visualisation can play a crucial role in making statistics not only more accessible but also more engaging. We will prepare the ground so that even the most advanced concepts, such as hypothesis testing and variance analysis, become understandable and applicable in your daily data analysis.
With the advent of tools like Tableau, the barrier to entry for data analysis is significantly lowered. This article aims to show that statistics is far from an abstract and distant subject: it is a vibrant and fundamental discipline capable of shaping how we understand the world through data. Let us begin this visual journey and discover how data can tell stories, guide decisions and ultimately improve our understanding of the world.
Note: You can access the dashboard on my personal Tableau Public page by clicking on each image.
Types of variables
This topic forms the foundation for understanding how data can be collected, analysed and interpreted. This section will emphasise the distinction between discrete and continuous variables, illustrating how each can be represented and analysed in Tableau.
Statistics mainly distinguishes between two types of variables: discrete and continuous. This distinction is crucial for choosing appropriate data analysis techniques and visualisations.
Discrete variables
Discrete variables are those that take countable, separate values. Such data come in distinct units and are easy to count. Typical examples include the number of children in a household, the number of parked cars, or the number of sales completed in a day. Discrete variables are often represented using bar graphs in Tableau, as these make the separation between different units visible.
Which graphs should be used with Tableau? A bar graph showing the number of sales per product. Each bar represents a distinct product, and its height indicates the total number of sales for that product. This type of visualisation helps to quickly identify which products are the most popular or those that need more attention.
Example with Tableau: the bar graph below shows the total sales for each product subcategory. The subcategory acts as the discrete variable: each bar represents a different subcategory, and the height of the bar indicates the total sales in that subcategory. This visualisation is useful for comparing performance between categories and identifying which subcategories contribute more or less to the overall turnover. It is immediately apparent which subcategories are the best and worst-selling, allowing analysts and decision-makers to identify areas of strength and areas for improvement.
Continuous variables
In contrast, continuous variables can take on any value within a continuous interval. These data represent measurements and can be divided into ever smaller units, such as time, temperature, or income. Unlike discrete variables, continuous variables offer a more fluid and detailed view of the phenomenon under investigation.
Which graphs should be used with Tableau? A line graph illustrating the change in temperature over a day. Each point on the graph represents the temperature measured at a particular time, and the line connecting the points shows how this changes over time. This type of visualisation is ideal for observing trends and patterns over time.
Example with Tableau: the line graph below shows the total sales (on the Y axis) against months (on the X axis), with each dot indicating the total sales for one month. The line connecting the dots makes it easy to follow sales fluctuations over time. It highlights trends, cycles, or seasonal patterns and shows how sales vary month by month, offering an immediate view of any increases or decreases that could be related to specific events, promotions, or market trends. This visualisation is useful because it allows stakeholders and the decision-making team to quickly identify high and low-performance periods, thus informing sales and marketing strategy.
Displaying discrete and continuous variables in Tableau
Tableau offers powerful tools to visualise discrete and continuous variables, enabling analysts to transform complex data into intuitive and informative graphs. Using the different chart types available, the unique characteristics of each variable can be effectively explored and communicated.
Tip: When working with Tableau, it is helpful to experiment with different visualisations to find out which best communicates the information in your data. Remember to consider the intended audience of the analysis and the study’s objective.
In conclusion, understanding the distinction between discrete and continuous variables is crucial for any data analysis. This knowledge guides the choice of the most suitable analysis techniques and visualisations, enabling analysts to extract and communicate valuable insights from their datasets. The following section will explore how key measures such as mean, variance and standard deviation can be calculated and visually represented for both variables.
Measurements in datasets
In this section, we focus on summarising and interpreting datasets using basic statistical measures. Understanding these measures is vital for any data analyst, as they provide an immediate overview of a dataset’s main characteristics. Using Tableau, we can visualise these measurements to facilitate their interpretation.
Mean
The mean represents the average value of a set of numbers and is calculated by adding up all the values and dividing the result by the total number of values. The mean is often used to identify the central position of data in a set but can be influenced by extremely high or low values (outliers).
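As a minimal sketch of the calculation described above, the following Python snippet computes the mean by hand and with the standard library, then shows how a single extreme value shifts it. The sales figures are hypothetical and chosen only for illustration; they do not come from the dashboard.

```python
from statistics import mean

# Hypothetical daily sales figures (illustrative only).
daily_sales = [120, 135, 128, 142, 130]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(daily_sales) / len(daily_sales)
library_mean = mean(daily_sales)

# A single extreme value (outlier) pulls the mean strongly upwards.
with_outlier = daily_sales + [900]
mean_with_outlier = mean(with_outlier)

print(manual_mean)        # 131.0
print(mean_with_outlier)  # roughly 259.17
```

With one outlier added, the mean nearly doubles even though five of the six values are unchanged, which is why the mean should be read with care when extreme values are present.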
Which graphs should be used with Tableau? A bar graph with a horizontal line representing the average value. This can be particularly useful for visualising how individual values are distributed relative to the average.
Example with Tableau: the following bar graph shows the average sales for each subcategory within the broader product categories. This type of graph helps compare average sales at different levels of product detail and reveals which subcategories perform better or worse, on average, within each category.
Variance and standard deviation
The variance measures how much the data spread out from the mean; in other words, it indicates how much the values in the dataset differ. The standard deviation is the square root of the variance. It provides a measure of the dispersion of the data in original units, making it easier to interpret than the variance.
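The relationship between the two measures can be sketched in a few lines of Python. The values below are hypothetical. The snippet uses the population formulas (`pvariance`, `pstdev`) and also shows the sample variance, which divides by n - 1 instead of n; which of the two a given tool reports is something to check in its documentation.

```python
import math
from statistics import pstdev, pvariance, variance

# Hypothetical measurements (illustrative only); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: average squared distance from the mean.
pop_var = pvariance(values)

# Standard deviation: square root of the variance, in the original units.
pop_sd = pstdev(values)

# Sample variance divides by (n - 1) instead of n, so it is slightly larger.
sample_var = variance(values)

print(pop_var, pop_sd)                         # 4 and 2.0
print(math.isclose(pop_sd, math.sqrt(pop_var)))  # True
```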
Which graphs should be used with Tableau? A histogram showing the data distribution, with vertical lines superimposed to show the mean and standard deviation. This type of visualisation helps to understand the dispersion of the data around the mean.
Example with Tableau: In the following graph, we have used the analytical capabilities of Tableau to create a ‘control chart’, a quality tool used to monitor the stability and control of a process over time. A control chart helps identify anomalies and trends and enables companies to maintain efficient and predictable processes. The graph displays an example of monthly sales with several key elements integrated:
- Time series: sales are represented in a time series to capture their evolution over time.
- Average sales: a reference line crosses the graph, showing the average monthly sales as a central reference point.
- Control limits: we set the upper and lower control limits at a chosen number of standard deviations (from one to three) above and below the mean. This defines the expected range of variation and identifies any data point that falls outside this range as a potentially significant signal.
- Indications of anomalies: points representing monthly sales above or below these thresholds indicate unusual events. For example, months with sales more than three standard deviations from the mean are considered statistically improbable and might suggest the need for further investigation.
- Interactivity: the graph is interactive. Users can change the number of standard deviations for the control limits using a parameter, which allows a more or less conservative analysis. This offers a magnifying glass through which to examine fluctuations in sales: a parameter set at one standard deviation will highlight even minor volatility, whereas a parameter set at three standard deviations will show only the most drastic variations.
This graph serves as a diagnostic tool to identify anomalies over time and can also guide operational and strategic decisions, such as inventory management or sales performance analysis. Through Tableau, sales data is transformed into a powerful storytelling tool that guides corporate action based on concrete and measurable insights.
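The logic behind the control limits can be sketched outside Tableau as well. The Python snippet below is only an illustration under stated assumptions: the monthly figures are hypothetical, the function names `control_limits` and `flag_out_of_control` are invented for this example, and the population standard deviation of the whole series is used, which may differ from the estimate a given control-chart method or Tableau calculation applies.

```python
from statistics import mean, pstdev


def control_limits(values, k=3.0):
    """Return (lower, centre, upper) limits at k standard deviations."""
    centre = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return centre - k * spread, centre, centre + k * spread


def flag_out_of_control(values, k=3.0):
    """Return the positions of values outside the k-sigma limits."""
    lower, _, upper = control_limits(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly sales (illustrative only); the last month is unusual.
monthly_sales = [100, 102, 98, 101, 99, 100, 103, 97, 100, 160]

for k in (1, 2, 3):
    print(k, flag_out_of_control(monthly_sales, k))
```

In this small series the unusual month is flagged at one and two standard deviations but not at three, because the extreme value itself inflates the standard deviation. This mirrors the role of the parameter in the dashboard: the narrower the limits, the more points are treated as signals.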
How to visualise measurements in datasets with Tableau
Tableau visualisations can transform these abstract measurements into intuitive graphical representations, making trends, patterns and anomalies immediately evident.
- Mean: use line or bar graphs to superimpose the mean on the data. This helps to visualise the relationship between the individual data points and the overall average.
- Variance and standard deviation: graphs such as histograms or box plots are particularly effective for showing the dispersion of data. The standard deviation can be displayed as an interval around the mean, providing a clear visual representation of the data’s variability.
Considerations on the use of measurements in datasets
When analysing these indicators, it is essential to remember that outliers can influence the mean and standard deviation. In such cases, other measures of central tendency, such as the median, can provide a more faithful representation of the data’s distribution. Tableau makes exploring these alternatives through different charts and configurations easy, allowing analysts to choose the visualisation best suited to their specific analysis context.
In conclusion, understanding and visualising fundamental measures in datasets is essential for data analysis. Tableau provides the tools to represent these measures in ways that make statistical concepts accessible and immediately understandable, allowing analysts to extract valuable insights and communicate them effectively. The following section will explore data distributions, another key concept for interpreting and analysing datasets in depth.
Distributions
This section deals with how data are distributed across different values or categories, offering valuable insights into the nature of the phenomena studied. Understanding distributions is crucial for data analysis, as it provides the basis for further statistical investigations, such as hypothesis testing and regression. We can easily visualise these distributions through Tableau, making the concepts more accessible and interpretable.
Frequency and proportional distributions
- Frequency distributions: show how data are grouped by different values or categories, counting the number of occurrences for each. This type of distribution helps identify patterns, such as the prevalence of specific values or categories in the data.
- Proportional distributions: Unlike frequency distributions, which focus on counting, proportional distributions examine the percentage or proportion that each group represents in the total data. They are handy for comparing groups of different sizes.
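The difference between the two kinds of distribution can be illustrated with a short Python sketch. The list of regions below is hypothetical and stands in for one row per order.

```python
from collections import Counter

# Hypothetical region of each order (illustrative only).
orders_by_region = [
    "West", "East", "West", "Central",
    "South", "West", "East", "Central",
]

# Frequency distribution: number of occurrences per category.
frequency = Counter(orders_by_region)

# Proportional distribution: share of the total per category.
total = sum(frequency.values())
proportion = {region: count / total for region, count in frequency.items()}

for region, count in frequency.most_common():
    print(f"{region}: {count} orders, {proportion[region]:.1%} of total")
```

The counts depend on the size of the dataset, whereas the proportions always sum to one, which is what makes them comparable across groups of different sizes.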
Which graphs should be used with Tableau? Create a pie chart or bar graph to visualise the proportions of different categories in a dataset, such as the sales distribution by region. These graphs make it possible to quickly visualise which category dominates or is least represented in the dataset.
Example with Tableau: the histogram below displays the frequency distribution of the quantities of products sold, broken down by category. Each column, or ‘bin’, represents a range of product quantities sold, while the height and colour of the columns reflect the count and proportion of sales within that bin. The histogram is a powerful visualisation tool for sales analysis, revealing the most frequently purchased quantities and allowing a direct comparison between categories. For example, we can immediately observe that most ‘Office Supplies’ transactions involve purchasing small quantities, a valuable insight for optimising stock and marketing strategies. The proportions shown in the graph further emphasise the prevalence of these small transactions and can guide operational decisions such as inventory planning or customising promotions based on the most common purchase quantities. This type of analysis and visualisation can help companies better understand the behaviour of their customers and adjust their operational strategies accordingly.
Symmetric and asymmetric distributions
- Symmetrical distributions: In a symmetrical distribution, the data are distributed equally around a central point, typically the mean. The graph of a symmetrical distribution looks the same on both sides of the centre.
- Skewed distributions: an asymmetrical distribution can be skewed to the right (positively skewed) or to the left (negatively skewed). This indicates that one tail of the distribution is longer than the other, suggesting a group of outliers or a trend in the data that deviates from the norm.
Which graphs should be used with Tableau? A histogram shows whether a distribution is symmetrical or asymmetrical. Histograms offer an immediate visualisation of the shape of the distribution, making asymmetries evident.
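Alongside the visual check, skewness can also be expressed as a number. The sketch below uses one common definition, the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); other definitions exist, and the datasets are hypothetical. A value near zero suggests symmetry, a positive value a longer right tail, and a negative value a longer left tail.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness (one common definition)."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - centre) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical datasets (illustrative only).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]           # long tail of high values
left_skewed = [-v for v in right_skewed]     # mirror image: long low tail

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```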
Normal distribution
- The normal, or Gaussian, distribution is one of the most important statistical distributions. Characterised by a symmetrical bell shape, it describes how data tend to cluster around a mean, with the frequency of values decreasing as one moves away from the mean. Knowledge of the normal distribution is crucial, as many statistical techniques assume that the data follow this distribution.
Which graphs should be used with Tableau? Create a bell-shaped graph to represent the normal distribution of data. This type of graph effectively shows how most of the data is concentrated around the mean and how the frequency of data decreases as the distance from the mean increases.
Visualising distributions with Tableau
Tableau’s ability to transform complex data into clear, interactive visualisations allows analysts to explore and communicate the characteristics of data distributions effectively. Experimenting with different types of graphs can help you find the most suitable visualisation for the message you want to convey, thus improving your understanding and interpretation of the data.
Example with Tableau: the graph below reflects a detailed analysis of the distribution of profits in the “Sample – Superstore” dataset. Using both a histogram and a density curve, we can precisely observe the nature of the profit data’s distribution and determine whether it exhibits symmetry and normality characteristics.
- The shape of distribution: the symmetrical figure of the histogram and the corresponding density curve indicate a symmetrical distribution, where the data are equally distributed on both sides of the mean. This equilibrium is a hallmark of normal distributions, which are fundamental in statistics because of their properties and applications in hypothesis testing and statistical inference.
- The bell-shaped density curve: the density curve that emerges from the histogram has the characteristic shape of a bell, suggesting a normal distribution of profits. In a perfectly normal distribution, about 68% of the data would fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
- Interpretation of asymmetrical distributions: If the histogram had shown a longer tail on one side or the other, this would have indicated an asymmetrical distribution, with data that tend to skew towards values higher or lower than the mean. Such distributions require special attention, as statistical techniques for normal distributions may not be appropriate.
Our graph confirms that profits have a general trend towards normality and that there are no indications of significant asymmetry that could indicate deviations from the standard operating routine. This information can be vital for companies seeking to establish financial forecasts, set performance benchmarks or assess the probability of rare events.
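The percentages quoted for a perfectly normal distribution can be verified with a few lines of Python. For a normal variable, the share of values within k standard deviations of the mean equals erf(k / √2), where erf is the error function; the function name `share_within` is introduced here only for illustration.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

Real data rarely match these figures exactly, so comparing the observed shares with the theoretical ones is a quick informal check of how close a distribution is to normal.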
In the next section, we will explore in detail how visualisations, such as histograms and box plots, can be used to analyse and interpret distributions, thus offering further insight into the dynamics within your data.
Visualisations
Visualisations can transform abstract numbers into intuitive and easily interpretable graphs. Visualisation is a crucial part of data analysis, as it allows patterns, trends, and outliers to be uncovered that would not be immediately obvious just by examining the raw numbers. Two of the most powerful tools available are the histogram and the box plot, each offering a unique way of representing data distributions.
Histograms
Histograms show the data distribution by grouping the information into bins (intervals) and counting the number of observations in each bin. They are handy for visualising the shape of the data distribution, such as whether the data are symmetrical, asymmetrical (skewed) or follow a normal distribution.
Which graphs to use with Tableau: By creating a histogram of monthly sales totals, you can quickly see which sales ranges are most common and whether months with unusually high or low sales occur. Looking at the shape of the histogram, one can then investigate seasonal causes or events that may have influenced sales.
Box Plot
Box plots (or box-and-whisker diagrams) provide a visual summary of five main statistical measures: the minimum, the first quartile (Q1), the median, the third quartile (Q3) and the maximum. They offer a compact view of the data distribution, highlighting the presence of outliers and how symmetrical the distribution is.
Which graphs to use with Tableau: A box plot can be used to analyse the distribution of sales prices of different products. This can quickly identify which products have more significant price variability and whether any outliers might need further investigation.
How to visualise distributions with Tableau
Creating histograms
- Data selection: choose the data set to be analysed.
- Bin definition: set bins to reflect meaningful intervals for analysis.
- Visualisation: drag the bin field on the X-axis and the data count on the Y-axis to create the histogram.
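What the bin definition does to the data can be sketched in Python. This is only an illustration of the general idea, assuming fixed-width bins labelled by their lower edge; the function name `histogram_counts` and the quantities are hypothetical, and Tableau’s own binning may treat edge cases differently.

```python
from collections import Counter


def histogram_counts(values, bin_size):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_size) * bin_size for v in values)
    return dict(sorted(counts.items()))


# Hypothetical quantities sold per order (illustrative only).
quantities = [1, 2, 2, 3, 3, 3, 4, 5, 5, 7, 8, 12]

# Bins of width 3: [0, 3), [3, 6), [6, 9), ... (empty bins are omitted).
print(histogram_counts(quantities, 3))
# {0: 3, 3: 6, 6: 2, 12: 1}
```

Changing `bin_size` changes the apparent shape of the distribution, which is why the choice of bins should reflect intervals that are meaningful for the analysis.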
Creation of box plots
- Data selection: identify the variable of interest for the analysis.
- Positioning: place the variable on the Y-axis and a categorical variable on the X-axis if you wish to compare distributions between categories.
- Choice of graph: select the box-and-whisker plot option in Tableau’s Show Me panel to generate the diagram.
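The five measures that a box plot summarises can be computed directly, as in the Python sketch below. The data are hypothetical, the function name `five_number_summary` is introduced for illustration, and the quartiles use the ‘inclusive’ method of Python’s `statistics.quantiles`; other tools may use slightly different quartile conventions. The outlier rule shown (points more than 1.5 times the interquartile range beyond the quartiles) is a widely used convention, assumed here rather than taken from the article.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sales values (illustrative only); 30 is an unusual value.
sales = [1, 2, 3, 4, 5, 6, 7, 8, 30]

minimum, q1, median, q3, maximum = five_number_summary(sales)

# Common convention: flag points beyond 1.5 x IQR from the quartiles.
iqr = q3 - q1
outliers = [v for v in sales if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(minimum, q1, median, q3, maximum)  # 1 3.0 5.0 7.0 30
print(outliers)                          # [30]
```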
Example with Tableau (1): The following box plot focuses on sales distribution within product categories, giving an overview of how sales vary between categories. It is helpful to highlight differences in sales patterns between categories, such as the ‘Technology’ category, which may have higher average sales and more significant variability. This visualisation can be handy if we are trying to explain how different types of products perform in the market or if we are trying to analyse pricing or promotion strategies for specific categories. Furthermore, this visualisation is suitable in contexts that focus more on aspects of product assortment, category management, and pricing strategies, as it clearly represents the distribution of sales across product categories, which is crucial for inventory management and the allocation of marketing resources.
Example with Tableau (2): The box plot below, on the other hand, shows the distribution of sales among different customer segments and highlights customer-specific sales, thus offering a view of the purchasing behaviour of customers according to their segment. This is particularly useful if we discuss the customisation of sales strategies or the importance of recognising and incentivising critical customers in each segment. It also provides a clear illustration of how specific customers can significantly impact overall sales and could inspire strategies to engage with these individuals or companies. Compared to the previous visualisation, which focused on product category analysis and business strategy management, this one is more oriented towards analysing customer behaviour and optimising customer relationships and value.
Using visualisations for analysis
Visualisations help not only to understand the data at hand but also to communicate it effectively to an audience. Histograms and box plots, in particular, are essential tools in a data analyst’s arsenal to illustrate a dataset’s key characteristics. Tableau allows for customisation of these visualisations to suit the specific needs of the analysis, making the results accessible even to those without advanced statistical training.
In the next section, we will elaborate on the concept of hypothesis testing and how visualisations can support interpreting the results of such tests, offering even deeper insights into the analysed data.
Hypothesis testing
Hypothesis tests play a crucial role in statistical analysis, allowing analysts to evaluate statements or theories about data. This process uses sample data to test whether a certain hypothesis about the population from which the sample was taken is plausible. Using Tableau, we can visualise the results of hypothesis tests in a way that is understandable even for those without advanced statistical training.
Fundamental concepts
- Null hypothesis (H0): the null hypothesis assumes no significant difference or relationship exists between the groups or variables under consideration. It is the hypothesis that the test seeks to disprove.
- Alternative hypothesis (H1): the alternative hypothesis proposes a significant difference or relationship between the groups or variables. This is what it is hoped to demonstrate through the test.
- P-value: the p-value indicates the probability of obtaining the observed results, or more extreme ones, assuming that the null hypothesis is true. A low p-value (typically less than 0.05) suggests that such data would be unlikely if the null hypothesis were true, which leads us to reject the null hypothesis.
- Confidence and confidence interval: the confidence interval provides a range within which we expect the actual value of the population to be found, with a certain confidence level (often 95%).
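To make the idea of a p-value concrete, the following Python sketch runs an exact permutation test on two small hypothetical groups. This is one simple way of obtaining a p-value, chosen here for illustration only; it is not the method used in the article’s dashboard. Under the null hypothesis the group labels are interchangeable, so the p-value is the share of all possible regroupings whose difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning len(group_a) values to the first group.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical order values for two groups (illustrative only).
group_a = [12, 14, 15, 16, 18]
group_b = [20, 21, 22, 24, 25]

p_value = permutation_p_value(group_a, group_b)
print(f"p-value: {p_value:.4f}")
print("reject H0 at the 0.05 level" if p_value < 0.05 else "do not reject H0")
```

Only 2 of the 252 possible regroupings are as extreme as the observed one, so the p-value is about 0.008 and the null hypothesis of no difference would be rejected at the usual 0.05 threshold.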
Visualisation of hypothesis tests in Tableau
While Tableau does not directly perform hypothesis testing in the way statistical packages would, it can be used to visualise data so that decisions made by hypothesis testing are transparent and based on visual evidence.
- Scatter plots: these show relationships between two variables, helping to identify patterns that support or contradict the alternative hypothesis.
- Box plots: these help compare the distributions of a variable across different groups. Clear differences in medians between groups can support rejecting the null hypothesis, subject to confirmation by a formal test.
- Bar and line graphs: these show differences in means or proportions between groups, which may indicate whether the alternative hypothesis has a statistical basis.
Interpretation of Results
The key to interpreting hypothesis test results is understanding that we are working with probability. Rejecting the null hypothesis does not definitively prove the alternative hypothesis but indicates that the data collected are unlikely if the null hypothesis were true. Visualisations in Tableau can help communicate these concepts, making the results of hypothesis tests more intuitive and less abstract.
Considerations
Hypothesis testing is a fundamental statistical analysis component, providing a framework for making data-driven decisions. Visualising the results with Tableau helps interpret the data and facilitates sharing these results with a broader audience, making data analysis more collaborative and inclusive. Remember, the ultimate goal is to use data to make informed statements and evidence-based decisions.
Example with Tableau
The following scatter plot illustrates the relationship between the average discounts applied and the average profits obtained from the products in the “Sample-Superstore” dataset. Using a linear regression model, we can predict the average impact of discounts on profits.
The formula for the regression line is as follows: Expected Profit = (Intercept) + (Slope) x (Average Discount), where ‘Intercept’ represents the profit value when the discount is zero and ‘Slope’ indicates the change in expected Profit per unit change in the discount.
For example, with the value “38.91” for the intercept and “-133.86” for the slope obtained from the regression model, to calculate the expected profit with an average discount of 10%, we insert 0.1 (10% expressed as a decimal) into the equation: 38.91 + (-133.86 x 0.1) = 38.91 – 13.39 = 25.52. Therefore, if the average discount applied is 10%, the model predicts the average profit would be $25.52. This analysis provides a better understanding of how profits vary as the discounts applied change and can help companies optimise their pricing strategies to maximise profitability.
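The calculation above can be reproduced with a short Python sketch. The intercept and slope are the values quoted in the article; the function name `expected_profit` is introduced here only for illustration, and the model should be trusted only within the range of discounts actually present in the data.

```python
# Coefficients quoted in the article for the regression line.
INTERCEPT = 38.91
SLOPE = -133.86


def expected_profit(avg_discount):
    """Predicted average profit for a discount expressed as a decimal."""
    return INTERCEPT + SLOPE * avg_discount


# A 10% average discount is entered as 0.10.
print(round(expected_profit(0.10), 2))  # 25.52

# Expected change in profit for one additional percentage point of discount.
change_per_point = SLOPE * 0.01
print(round(change_per_point, 2))       # -1.34
```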
Note: In the context of linear regression, it is essential to check that the slope and intercept are correctly indicated in the graph or the analysis software report. The slope coefficient indicates how much the average profit is expected to change per unit change in the average discount. Because the discount is expressed as a decimal, one unit corresponds to 100 percentage points, so a one-percentage-point increase in the average discount corresponds to an expected change of about -1.34 in average profit (-133.86 x 0.01). A negative value for the slope (-133.86) indicates a negative relationship between the discount and the profit, i.e. as the average discount increases, the average profit is expected to decrease. The very low p-value (0.0005003) suggests that the relationship between average discount and average profit is statistically significant, at least within the data sample depicted in the graph.
Conclusion
Concluding our journey through the foundations of statistics and the power of data visualisation with Tableau, it is essential to reflect on the importance of these tools and concepts in data analysis. The ability to understand and apply statistical principles, together with the effective use of visualisation tools such as Tableau, not only significantly improves our ability to interpret data but also opens the door to more informed decisions and deeper insights.
Statistics is more than a set of numbers and calculations; it is the language through which data speak. By using measurements in datasets, understanding distributions and applying hypothesis tests, we can extract meaningful insights from large datasets, revealing the hidden stories behind the numbers. However, the real magic happens when we combine these analyses with the power of data visualisation offered by tools like Tableau. This not only makes the analysis results accessible to a wider audience but also facilitates the discovery of patterns, trends and anomalies that might otherwise remain hidden.
Histograms, box plots and other visualisations become windows through which we can observe the world of data in new and illuminating ways. These tools allow us to present our results in a way that is immediately understandable, regardless of the technical background of the audience. The ability to effectively communicate the analysis’s results is just as important as the analysis itself, and data visualisation plays a key role in this process.
Ultimately, integrating statistics and data visualisation into our analytical toolkit enriches our understanding of data and improves our ability to make decisions based on it. Whether optimising business strategies, guiding public policy or advancing scientific research, data analysis is indispensable, supported by a sound understanding of statistics and enriched by compelling visualisations.
Next Steps
While this article introduces the fundamentals of statistics and data visualisation with Tableau, many other techniques and tools remain to be explored. I invite you to practise further, experimenting with different types of data and visualisations and applying these concepts to new and different analytical challenges. Remember: every dataset has a story to tell, and with the right tools, you can be the one to tell it.
Thank you for sharing this brief introductory journey into statistical analysis with Tableau. I hope you feel better equipped to explore the vast and dynamic world of data, armed with statistics and Tableau. Happy analysing!
Note (1): the dashboard with the examples in the article is available on my personal Tableau Public page.
Note (2): The blog will soon feature several articles that address these topics separately and in more detail, including a tutorial explaining the use and understanding of scatterplots in Tableau.
FAQ
The FAQ section aims to answer some of the most common questions concerning statistical analysis and the use of Tableau for data visualisation. These questions can help readers clarify doubts and further deepen their understanding of the concepts discussed in the article.
This FAQ provides a starting point for exploring data analysis and the use of Tableau further. Continuing to ask questions and experiment with data is crucial to developing a solid understanding of statistical analysis and visualisation.
If you enjoyed the article and want to explore these issues further or connect with me to exchange ideas, please visit my LinkedIn profile. I look forward to hearing from you!