Why You Should Blend Instead of Join in Tableau

Why You Should Blend Instead of Join in Tableau

Data blending in Tableau is the operation of combining multiple data sources into the same view by finding common fields between them to join on. Unlike an ordinary join, which combines data sources at the lowest granularity before any aggregation is done, a data blend can join data sources after aggregation is performed on the individual sources; ultimately limiting the number of records that are joined together and maximizing computational efficiency. The diagram below describes the blending process in Tableau.

Blended Image

Even to an experienced SQL query writer, the mechanics of blending data, rather than joining it, can be difficult to understand upon initial exposure. Since data blending is a fairly advanced concept, this article explains the concept of data blending at a basic level using Excel as an example.

You may ask, “How is blending different from a regular join?”

To answer this question, consider the following example, which is the equivalent of a data blend in Tableau but is executed within the familiar environment of Excel.

Suppose you have data in two different Excel spreadsheets: the first sheet contains a table of data on sales broken out by country, the second sheet contains a table of sales quotas for sales agents, split out by the sales agent name and his or her assigned country.

Sales

Country Sales
United States $1,000,000
Canada $750,000
Mexico $250,000

 

Sales Quota

Sales Agent Country Sales Quota
Sal United States $400,000
Mary United States $350,000
Bob Canada $300,000
Jim Canada $200,000
Elizabeth Mexico $100,000

 

Notice how these two tables are at different granularities, as the Sales table lacks a Sales Agent dimension. If you wanted to do a side by side comparison of Sales vs. Sales Quota by Country, you would likely first aggregate the Sales Quota data to be summed by Country, then perform a vlookup to the Sales Quota table on each Country in the Sales table to grab the total Sales Quota for that Country, and place the result next to the Sales column for the corresponding Country.

The steps would look like this:

  1. Aggregate Sales Quota by Country
Country Sales Quota
United States $750,000
Canada $500,000
Mexico $100,000

 

  1. Perform a vlookup from Sales to Sales Quota on Country
Country Sales Sales Quota
United States $1,000,000 $750,000
Canada $750,000 $500,000
Mexico $250,000 $100,000

 

This concept is referred to as a post-aggregate join, as the vlookup to the sales quota spreadsheet did not occur until after the sales data was already aggregated by country. Therefore, the “join” to the second data source was done post-aggregation. This process of joining data sources post-aggregation is referred to as data blending in Tableau.

Had we performed a vlookup on the Sales data before aggregating by Country, we would likely have multiple Sales records for each Country (one row for each Sales Agent) with the Sales Quota next to each record that repeats itself for the same countries. In order to get the desired comparison, we would have to aggregate the data by summing the Sales column by Country, and grabbing the max (or min) Sales Quota by Country.

The steps in an ordinary join would follow the process below:

  1. Join Sales and Sales Quota tables on common fields (Country)
Sales Agent Country Sales Sales Quota
Sal United States $1,000,000 $400,000
Mary United States $1,000,000 $350,000
Bob Canada $750,000 $300,000
Jim Canada $750,000 $200,000
Elizabeth Mexico $250,000 $100,000

 

  1. Use the above table to aggregate Sales and Sales Quota by Country. i.e. take the max of Sales by Country and the sum of Sales Quota by Country. Remove the Sales Agent column from view.
Country Sales Sales Quota
United States $1,000,000 $750,000
Canada $750,000 $500,000
Mexico $250,000 $100,000

 

This process is understandably more work, as two different aggregations have to occur, and much more data ends up being processed at once because the vlookup occurred before the aggregation. Both of these operations are computationally expensive.

Creating a blended join in Tableau would mean having your Sales data in one data source, Sales Quota data in another data source, creating an aggregated view of the sum of Sales by Country, and simply dragging Sales Quota onto the visualization next to the Sales column. Tableau will automatically recognize the common field of Country between the two data sources, and use that to do a post-aggregate join. Tableau does not join the two data sources until after the data is already aggregated.

The post-aggregate join is much better for performance than joining at the row-level before developing the view and then performing the aggregate calculations.

Questions on how to blend data in Tableau? Leave a comment below.

5 Comments

  • Justin August 16, 2016 8:11 pm

    What about when you have two different complex models blending on each other? We ran into data problems doing that but are seeing improvements by combining into one source.

    • Salman Khan August 17, 2016 12:57 pm

      I’m not sure what data problems you experienced but typically blending will be better for performance when the data sources contain too many records for a join to be practical or if you want to display summary information and details at the same time.

      In other situations like yours it may make more sense to combine the data sources before aggregating.

      Thanks for your input!

      • Justin August 22, 2016 8:31 am

        Blending should not be the go-to when combining data. I look at blending as using it to connect to a lookup table or joining monthly files together…everything else I join before it goes into Tableau. I’ve used sources in different scenarios and have found that blending is not always optimal.

        • Salman Khan August 22, 2016 10:10 pm

          You’re right, it’s not the go-to in all situations. It is preferred, as per Tableau’s knowledge base article linked below, in situations where there are too many records for a join to be practical or summary information and details should be displayed simultaneously.

          The summary-detail data is the lookup table you are describing. The example in this article was an example of such a scenario.

          http://kb.tableau.com/articles/en_US/knowledgebase/join-vs-relationship

  • William April 17, 2017 5:51 pm

    Good insight for data blending. However in blending can’t do the lod, such as this formula every sales agent how much to Sales = Sales – Sale quota by agents. Any suggestion for this kind of problem?

Your email address will not be published. Required fields are marked *

Phone: 312-602-4000
Email: marketing@westmonroepartners.com
222 W. Adams
Chicago, IL 60606
Show Buttons
Share On Facebook
Share On Twitter
Share on LinkedIn
Hide Buttons