Common Table Expressions (CTE) and Derived Tables are very useful when you are reporting from transactional tables as opposed to analytical, or OLAP tables (i.e. SQL Server Analysis Services, or SSAS). The reason being is this, while OLAP tables are built for the purpose of summaries and reporting, transactional tables typically do not summarize records, instead they contain records of transactions or records of the objects that conducted the transactions (and/or related info). I’ll explain with an example.
Imagine a table of a team’s baseball games. Each record contains the team, it’s opponent, and the result (win or loss). You can count all games played by a team very simply:
SELECT TeamName, count(*) as GameCount FROM Game GROUP by TeamName
You can also count all the wins by each team, just by adding a WHERE clause, like this:
SELECT TeamName, count(*) as GameCount FROM Game WHERE Result = 'Win' GROUP by TeamName
Problem: If you want to produce a report of each team’s won-loss record, it would take two passes at the data — first, get the game count, and then get the win count (and subtract games minus wins to get the losses).
Solution #1: Derived Tables
If you use a derived table, you can produce each team’s won-loss record with one SQL statement. To do this, take one of the statements above and use it almost like it exists a table in the SQL database. Here is what it looks like:
SELECT TeamGames.Team , WinCount as Wins , count(*) - WinCount as Losses FROM Game as TeamGames JOIN ( SELECT Team, count(*) as WinCount FROM GameResult WHERE Result = 'Win' GROUP by Team ) as TeamWins on TeamGames.Team = TeamWins.Team GROUP by TeamGames.Team, WinCount
See how I joined onto the second statement as if it was table within the parentheses? There is no TeamWins summary table, but the second statement is what a TeamWins summary table would look like. Therefore, we join onto it by placing the code from the statement in between the parentheses. Then we name the table, and provide the join criteria.
Actually, there is a problem with the third statement. Any team that has not won a game will not show up at all in the results. This is because our join statement is implied as an INNER JOIN in T-SQL. We need to make a few changes to perfect this, first by using a LEFT OUTER JOIN and then by adding the isNULL function to convert any NULLs into zeroes. Here is the finished code:
SELECT TeamGames.Team , isNULL(WinCount,0) as Wins , count(*) - ISNULL(WinCount,0) as Losses FROM Game as TeamGames LEFT OUTER JOIN ( SELECT Team, count(*) as WinCount FROM Game WHERE Result = 'Win' GROUP by Team ) as TeamWins on TeamGames.Team = TeamWins.Team GROUP by TeamGames.Team, WinCount
Solution #2: Common Table Expressions
Derived tables have been used in SQL for a long time, but the newer way to do this is via a CTE. This syntax is part of the ANSI SQL 99 specification and T-SQL (Microsoft) beginning with version 2005. In a CTE, you define the table as an expression in the beginning of the statement as shown in this code:
WITH CTE_TeamWins as ( SELECT Team, count(*) as WinCount FROM GameResult WHERE Result = 'Win' GROUP by Team ) SELECT TeamGames.Team , WinCount as Wins , count(*) - WinCount as Losses FROM Game as TeamGames JOIN TeamWins on TeamGames.Team = TeamWins.Team GROUP by TeamGames.Team, WinCount
With the example above, it’s hard to tell why the CTE method is typically preferred over derived tables. The CTE actually uses more code, and — in my opinion — is less readable for this very simple example. However, as things become more complex, the advantage of using CTE instead of derived tables becomes more apparent.
Here are advantages of using a CTE versus a derived table:
- CTE can be referenced multiple times in the statement, but a derived table would need to be typed out for each reference.
- You can create CTE that reference CTE, keeping your code clean, and easier to read.
- You can create recursive CTE which is useful to traverse hierarchies.
Word to the Wise – Performance Issues
You can use CTE for some reporting like the examples shown above. However, use of CTE (or derived tables) in large ETL solutions can cause performance issues. In those cases, it’s typically better to use temporary tables in place of CTE.