Data blending in Tableau is the operation of combining multiple data sources into the same view by finding common fields between them to join on. Unlike an ordinary join, which combines data sources at the lowest granularity before any aggregation is done, a data blend can join data sources after aggregation is performed on the individual sources; ultimately limiting the number of records that are joined together and maximizing computational efficiency. The diagram below describes the blending process in Tableau.
Even to an experienced SQL query writer, the mechanics of blending data, rather than joining it, can be difficult to understand upon initial exposure. Since data blending is a fairly advanced concept, this article explains the concept of data blending at a basic level using Excel as an example.
You may ask, “How is blending different from a regular join?”
To answer this question, consider the following example, which is the equivalent of a data blend in Tableau but is executed within the familiar environment of Excel.
Suppose you have data in two different Excel spreadsheets: the first sheet contains a table of data on sales broken out by country, the second sheet contains a table of sales quotas for sales agents, split out by the sales agent name and his or her assigned country.
|Sales Agent||Country||Sales Quota|
Notice how these two tables are at different granularities, as the Sales table lacks a Sales Agent dimension. If you wanted to do a side by side comparison of Sales vs. Sales Quota by Country, you would likely first aggregate the Sales Quota data to be summed by Country, then perform a VLOOKUP to the Sales Quota table on each Country in the Sales table to grab the total Sales Quota for that Country, and place the result next to the Sales column for the corresponding Country.
The steps would look like this:
- Aggregate Sales Quota by Country
- Perform a vlookup from Sales to Sales Quota on Country
This concept is referred to as a post-aggregate join, as the VLOOKUP to the sales quota spreadsheet did not occur until after the sales data was already aggregated by country. Therefore, the “join” to the second data source was done post-aggregation. This process of joining data sources post-aggregation is referred to as data blending in Tableau.
Had we performed a VLOOKUP on the Sales data before aggregating by Country, we would likely have multiple Sales records for each Country (one row for each Sales Agent) with the Sales Quota next to each record that repeats itself for the same countries. In order to get the desired comparison, we would have to aggregate the data by summing the Sales column by Country, and grabbing the max (or min) Sales Quota by Country.
The steps in an ordinary join would follow the process below:
- Join Sales and Sales Quota tables on common fields (Country)
|Sales Agent||Country||Sales||Sales Quota|
- Use the above table to aggregate Sales and Sales Quota by Country. i.e. take the max of Sales by Country and the sum of Sales Quota by Country. Remove the Sales Agent column from view.
This process is understandably more work, as two different aggregations have to occur, and much more data ends up being processed at once because the VLOOKUP occurred before the aggregation. Both of these operations are computationally expensive.
Creating a blended join in Tableau would mean having your Sales data in one data source, Sales Quota data in another data source, creating an aggregated view of the sum of Sales by Country, and simply dragging Sales Quota onto the visualization next to the Sales column. Tableau will automatically recognize the common field of Country between the two data sources, and use that to do a post-aggregate join. Tableau does not join the two data sources until after the data is already aggregated.
The post-aggregate join is much better for performance than joining at the row-level before developing the view and then performing the aggregate calculations.
Questions on how to blend data in Tableau? Leave a comment below.