Why use AddColumns with SummarizeColumns in DAX ?

If you are a seasoned PowerBI developer then it is expected that you should be aware of pitfalls of not using ADDCOLUMNS when using SUMMARIZE when creating an extension column in your DAX query.
But first, what is an extension column ?
Lets take the example of the following DAX expression
EVALUATE
SUMMARIZE (
Customer,
Customer[Name],
"LastPurchaseYear", YEAR ( MAX ( Sales[Order Date] ) ),
"Sales", [Sales Amount]
)
In the above expression, we created LastPurchaseYear as our extension column.
As best practice and performance, to create extension column it is always advisable to use ADDCOLUMNS with SUMMARIZE instead of only SUMMARIZE.
For example, the below expression
ADDCOLUMNS(
SUMMARIZE( <table>, <group by column> ),
<column_name>, CALCULATE( <expression> )
)
should be preferred over
SUMMARIZE( <table>, <group_by_column>, <column_name>, <expression> )
You might ask as to why we need CALCULATE while using a combination of ADDCOLUMNS/SUMMARIZE ?
This is because ADDCOLUMNS executes under a row context and if you use you use an expression with an ADDCOLUMNS, the value of the expression executes under the row context leading to meaningless results.
For more details please refer to the following article : https://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/
Now to the title of this article.
Do we need ADDCOLUMNS with SUMMARIZECOLUMNS as well ?
Lets look at a simple DAX expression that calculates the Sales Amount for a unique combination of Brand, Continent and Color.
EVALUATE
SUMMARIZECOLUMNS (
Product[Brand],
Customer[Continent],
Product[Color],
"Sales", [Sales Amount]
)
ORDER BY
'Product'[Brand],
Customer[Continent],
'Product'[Color]
We have the following output. There are 310 rows in the output.
Lets now change the above expression adding ADDCOLUMNS to the expression.
EVALUATE
ADDCOLUMNS (
SUMMARIZECOLUMNS ( Product[Brand], Customer[Continent], Product[Color] ),
"Sales Amount", [Sales Amount]
)
ORDER BY
'Product'[Brand],
Customer[Continent],
'Product'[Color]
We have the following output
The output has 333 rows , 23 more rows compared to the expression that did not use ADDCOLUMNS.
If you look at the output closely you would realize that in the 3rd row for Brand : A.Datum, Continent : Asia and Color : Blue we have (Blank) Sales Amount.
Lets narrow it down and check details for Brand : A.Datum and Color : Blue for all continents.
EVALUATE
CALCULATETABLE (
ADDCOLUMNS (
SUMMARIZECOLUMNS ( Product[Brand], Customer[Continent], Product[Color] ),
"Sales Amount", [Sales Amount]
),
TREATAS ( { ( "Blue", "A. Datum" ) }, Product[Color], 'Product'[Brand] )
)
ORDER BY
'Product'[Brand],
Customer[Continent],
'Product'[Color]
Now, lets not use ADDCOLUMNS in the expression
EVALUATE
CALCULATETABLE (
SUMMARIZECOLUMNS (
Product[Brand],
Customer[Continent],
Product[Color],
"Sales Amount", [Sales Amount]
),
TREATAS ( { ( "Blue", "A. Datum" ) }, Product[Color], 'Product'[Brand] )
)
ORDER BY
'Product'[Brand],
Customer[Continent],
'Product'[Color]
and we have only two rows in the output
Looking at the raw data for the Brand : A. Datum , we see that for color Blue, we have Sales for Europe and North America but no sales for Asia.
That’s the reason why we have a blank values for the sales amount for A.Datum Color: Blue and Continent : Asia in the output of the expression that uses ADDCOLUMNS.
Lets try to understand why
If we use SUMMARIZECOLUMNS without ADDCOLUMNS pattern, the three columns (Brand, Continent, Color) are grouped together along with the measure directly inside the function. This will result in groups being excluded that have measure returning blank values for a group. Similar to an SQL inner join where the Group makes an inner join with the Sales Amount that is executed under the current filter context.
Adding ADDCOLUMNS to the expression makes the expression behave differently. We have introduced the extension column vis-à-vis Sales Amount measure as part of the ADDCOLUMNS and not as SUMMARIZECOLUMNS.
So when SUMMARIZECOLUMNS is being evaluated the columns (Brand, Continent, Color) are grouped together without having understanding of the presence of measure [Sales Amount].Once the grouping is complete, ADDCOLUMNS starts evaluating the measure [Sales Amount] for each group.
But you might say that ADDCOLUMNS run under row context then how can [Sales Amount] be evaluated for individual group. Remember that a measure causes a context transition which leads the measure being calculated for a unique combination (Brand, Continent, Color).and this leads for the measure to return blank values for combination of columns that don’t have Sales Amount. Similar to an SQL left join.
To understand what I mean when I say if we don’t use measure for ADDCOLUMNS, the expression gets evaluated under a row context and not filter context.
Lets look at an example.
EVALUATE
ADDCOLUMNS (
SUMMARIZECOLUMNS ( Customer[Name] ),
"Currency", DISTINCTCOUNT ( Sales[Currency Code] )
)
In the above expression we trying to get DISTINCTCOUNT of currency a customer uses but with the above expression we get meaningless result because the count of currency type is repeated across each customer. DISTINCTCOUNT does not cause context transition and hence does not get evaluated under filter context.
Conclusion
In this article we explored the various scenarios where you might want to use ADDCOLUMNS in combination with SUMMRIZECOLUMNS. The right of usage might depend on specific use case and requirement.
The decision to use one pattern over another such as placing the measure directly inside SUMMRIZECOLUMNS versus using ADDCOLUMNS to append it afterward depends on the specific use case and reporting requirements. Factors such as performance, handling of blank rows, filter context behavior and readability all play a role in choosing the most appropriate approach.
Thanks for reading !!!




