Sunday 14 June 2009

Data Pivoting Techniques in SQL Server 2008 & 2005

The problem

A common problem in SQL database queries is the need to generate cross-tabulation reports to summarize data, By cross-tabulation or abbreviated as cross-tab, what i mean is converting rows to columns and columns to rows.

Lets take a look at the following example.

   1: --Declare the table variable = @TblOrderDetails
   2:  
   3: DECLARE @TblOrderDetails TABLE(
   4:     Id              INT NOT NULL PRIMARY KEY, --1
   5:     UniqueOrderKey  UNIQUEIDENTIFIER DEFAULT(NEWSEQUENTIALID()), --2,3
   6:     Employee        NVARCHAR(50) NOT NULL,
   7:     ProductId       INT NOT NULL,
   8:     Sales           DECIMAL(8,3)        
   9: )
  10:  
  11: /*NOTE:--------------------------------------------------------------------------
  12: 1. I've used an integer datatype to build the default index on the PK.
  13: 2. Uniqueidentifier type for true uniqueness(useful for replication scenarios)
  14: 3. You can use newid() or newsequentialid() for creating a new unique identifier,
  15:    for performance reasons, newsequentialid() is preferred.
  16: ---------------------------------------------------------------------------------*/
  17:  
  18:  
  19: --Insert Test Data into @TblOrderDetails
  20:  
  21: INSERT INTO @TblOrderDetails(Id,Employee,ProductId,Sales)
  22: SELECT 1,'Sudhir.Murthy',5,2000
  23: UNION ALL
  24: SELECT 2,'Tom.Hanks',6,3000
  25: UNION ALL
  26: SELECT 3,'Jim.Griffiths',5,1000
  27: UNION ALL
  28: SELECT 4,'Sue.Steely',7,1000
  29: UNION ALL
  30: SELECT 5,'Sudhir.Murthy',8,2000
  31: UNION ALL
  32: SELECT 6,'James.Martin',5,1000
  33: UNION ALL
  34: SELECT 7,'Paul.Smith',5,1000
  35: UNION ALL
  36: SELECT 8,'Sudhir.Murthy',6,1000
  37: UNION ALL
  38: SELECT 9,'Tom.Hanks',7,2000
  39: UNION ALL
  40: SELECT 10,'Sue.Steely',4,2000
  41: UNION ALL
  42: SELECT 11,'Jim.Griffiths',6,2000
  43: UNION ALL
  44: SELECT 12,'James.Martin',7,5000
  45: UNION ALL
  46: SELECT 13,'Paul.Smith',4,4000
  47: UNION ALL
  48: SELECT 14,'Sue.Steely',3,1000
  49: UNION ALL
  50: SELECT 15,'Tom.Hanks',5,1000
  51: UNION ALL
  52: SELECT 16,'Jim.Griffiths',6,1000
  53: UNION ALL
  54: SELECT 17,'Sudhir.Murthy',3,2000
  55: UNION ALL
  56: SELECT 18,'Tom.Hanks',2,5000
  57: UNION ALL
  58: SELECT 19,'Sue.Steely',2,2000
  59: UNION ALL
  60: SELECT 20,'Sudhir.Murthy',7,1000
  61:  
  62: --Fetch from @TblOrderDetails
  63: SELECT * FROM @TblOrderDetails

Your @TblOrderDetails table data should now look like this :-

 

Id    UniqueOrderKey                          Employee          ProductId    Sales
----------------------------------------------------------------------------------------
1 BD8BAC49-B6A1-E211-BE97-68A86D1C3B19 Sudhir.Murthy 5 2000.000

2 BE8BAC49-B6A1-E211-BE97-68A86D1C3B19 Tom.Hanks 6 3000.000

3 BF8BAC49-B6A1-E211-BE97-68A86D1C3B19 Jim.Griffiths 5 1000.000

4 C08BAC49-B6A1-E211-BE97-68A86D1C3B19 Sue.Steely 7 1000.000

5 C18BAC49-B6A1-E211-BE97-68A86D1C3B19 Sudhir.Murthy 8 2000.000

6 C28BAC49-B6A1-E211-BE97-68A86D1C3B19 James.Martin 5 1000.000

7 C38BAC49-B6A1-E211-BE97-68A86D1C3B19 Paul.Smith 5 1000.000

8 C48BAC49-B6A1-E211-BE97-68A86D1C3B19 Sudhir.Murthy 6 1000.000

9 C58BAC49-B6A1-E211-BE97-68A86D1C3B19 Tom.Hanks 7 2000.000

10 C68BAC49-B6A1-E211-BE97-68A86D1C3B19 Sue.Steely 4 2000.000

11 C78BAC49-B6A1-E211-BE97-68A86D1C3B19 Jim.Griffiths 6 2000.000

12 C88BAC49-B6A1-E211-BE97-68A86D1C3B19 James.Martin 7 5000.000

13 C98BAC49-B6A1-E211-BE97-68A86D1C3B19 Paul.Smith 4 4000.000

14 CA8BAC49-B6A1-E211-BE97-68A86D1C3B19 Sue.Steely 3 1000.000

15 CB8BAC49-B6A1-E211-BE97-68A86D1C3B19 Tom.Hanks 5 1000.000

16 CC8BAC49-B6A1-E211-BE97-68A86D1C3B19 Jim.Griffiths 6 1000.000

17 CD8BAC49-B6A1-E211-BE97-68A86D1C3B19 Sudhir.Murthy 3 2000.000

18 CE8BAC49-B6A1-E211-BE97-68A86D1C3B19 Tom.Hanks 2 5000.000

19 CF8BAC49-B6A1-E211-BE97-68A86D1C3B19 Sue.Steely 2 2000.000

20 D08BAC49-B6A1-E211-BE97-68A86D1C3B19 Sudhir.Murthy 7 1000.000


Now, From the above schema of the table, you can find out how much sales has an employee made, how many products has he sold etc., For instance, the below query



   1: SELECT      Employee,
   2:             COUNT(ProductId) AS TotalProducts,
   3:             SUM(Sales) AS TotalSales
   4: FROM        @TblOrderDetails 
   5: GROUP BY    Employee
   6: ORDER BY    Employee

fetches you the result :-

 


Employee        TotalProducts    TotalSales
--------------------------------------------
James.Martin 2 6000.000
Jim.Griffiths 3 4000.000
Paul.Smith 2 5000.000
Sudhir.Murthy 5 8000.000
Sue.Steely 4 6000.000
Tom.Hanks 4 11000.000


But, you would want to generate a cross tabular report, for instance, which give you a result of employees vs products, which looks like this ?

Employee   Product2    Product3    Product4    Product5    Product6    Product7   Product8
------------------------------------------------------------------------------------------

 

In order to achieve the above desired result, we would have to transpose the rows in the select query to columns, in other worlds pivot the Employee and ProductId fields. This can be achieved using the CASE Construct and the PIVOT operator. Both of which are discussed extensively below.

 

1. Solution using CASE Constructs

Lets take a look at the old school CASE Construct in SQL Server. By using a CASE construct, We can rotate the rows into columns.


example 1.



   1: /*Rotate Rows To Columns using CASE*/
   2:  
   3: SELECT    Employee,
   4:          (CASE WHEN ProductId=2 THEN ProductId ELSE NULL END) AS Product2,
   5:          (CASE WHEN ProductId=3 THEN ProductId ELSE NULL END) AS Product3,
   6:          (CASE WHEN ProductId=4 THEN ProductId ELSE NULL END) AS Product4,
   7:          (CASE WHEN ProductId=5 THEN ProductId ELSE NULL END) AS Product5,
   8:          (CASE WHEN ProductId=6 THEN ProductId ELSE NULL END) AS Product6,
   9:          (CASE WHEN ProductId=7 THEN ProductId ELSE NULL END) AS Product7,
  10:          (CASE WHEN ProductId=8 THEN ProductId ELSE NULL END) AS Product8  
  11: FROM     @TblOrderDetails

 

example 1 result.


   1: Employee        Product2  Product3  Product4  Product5  Product6  Product7  Product8
   2: ------------------------------------------------------------------------------------
   3: Sudhir.Murthy    NULL     NULL        NULL      5        NULL       NULL    NULL
   4: Tom.Hanks        NULL     NULL        NULL    NULL        6         NULL    NULL
   5: Jim.Griffiths    NULL     NULL        NULL      5        NULL       NULL    NULL
   6: Sue.Steely       NULL     NULL        NULL    NULL       NULL         7     NULL
   7: Sudhir.Murthy    NULL     NULL        NULL    NULL       NULL        NULL    8
   8: James.Martin     NULL     NULL        NULL      5        NULL        NULL    NULL
   9: Paul.Smith       NULL     NULL        NULL      5        NULL        NULL    NULL
  10: Sudhir.Murthy    NULL     NULL        NULL    NULL        6          NULL    NULL
  11: Tom.Hanks        NULL     NULL        NULL    NULL       NULL         7      NULL
  12: Sue.Steely       NULL     NULL         4      NULL       NULL        NULL    NULL
  13: Jim.Griffiths    NULL     NULL        NULL    NULL        6          NULL    NULL
  14: James.Martin     NULL     NULL        NULL    NULL       NULL         7      NULL
  15: Paul.Smith       NULL     NULL         4      NULL       NULL        NULL    NULL
  16: Sue.Steely       NULL     3           NULL    NULL       NULL        NULL    NULL
  17: Tom.Hanks        NULL     NULL        NULL     5         NULL        NULL    NULL
  18: Jim.Griffiths    NULL     NULL        NULL    NULL        6          NULL    NULL
  19: Sudhir.Murthy    NULL     3           NULL    NULL       NULL        NULL    NULL
  20: Tom.Hanks         2       NULL        NULL    NULL       NULL        NULL    NULL
  21: Sue.Steely        2       NULL        NULL    NULL       NULL        NULL    NULL
  22: Sudhir.Murthy    NULL     NULL        NULL    NULL       NULL         7      NULL

However, Note that employees repeat in the ‘Employee’ column. This is probably not the expected result. what we expect is to have an aggregate of all ProductIds over Employee.


In order to do that we take a look at the below example.


example 2.



   1: /*Rotate Rows To Columns using COUNT ( CASE..)*/
   2:  
   3: SELECT      Employee,
   4:             COUNT(CASE WHEN ProductId=2 THEN ProductId ELSE 0 END) AS Product2,
   5:             COUNT(CASE WHEN ProductId=3 THEN ProductId ELSE 0 END) AS Product3,
   6:             COUNT(CASE WHEN ProductId=4 THEN ProductId ELSE 0 END) AS Product4,
   7:             COUNT(CASE WHEN ProductId=5 THEN ProductId ELSE 0 END) AS Product5,
   8:             COUNT(CASE WHEN ProductId=6 THEN ProductId ELSE 0 END) AS Product6,
   9:             COUNT(CASE WHEN ProductId=7 THEN ProductId ELSE 0 END) AS Product7,
  10:             COUNT(CASE WHEN ProductId=8 THEN ProductId ELSE 0 END) AS Product8  
  11: FROM        @TblOrderDetails
  12: GROUP BY    Employee
  13:  

example 2 result.


Employee       Product2  Product3  Product4  Product5  Product6  Product7  Product8
-----------------------------------------------------------------------------------
James.Martin 2 2 2 2 2 2 2
Jim.Griffiths 3 3 3 3 3 3 3
Paul.Smith 2 2 2 2 2 2 2
Sudhir.Murthy 5 5 5 5 5 5 5
Sue.Steely 4 4 4 4 4 4 4
Tom.Hanks 4 4 4 4 4 4 4

Here as expected, The COUNT() aggregate function aggregates the ProductIds over Employees giving you an overview of how many products each employee sold. Also, in daily usage and of course dependent on problems, a most  common scenario is to use the aggregate function SUM(). I will leave that for you to fiddle around.

 

2. Solution using PIVOT


PIVOT rotates a table-valued expression by turning the unique values form one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.


UNPIVOT performs the opposite operation to PIVOT by rotating the columns of a table valued expression into row values.


NOTE :- IMPORTANT


When PIVOT and UNPIVOT are used against databases that are upgraded to SQL Server 2005 or later, the compatibility level of the database must be set to 90 or higher.


Syntax for using PIVOT


The following is annotated syntax for PIVOT.
SELECT <;non-pivoted column>,
[first pivoted column] AS <;column name>,
[second pivoted column] AS <;column name>,
...
[last pivoted column] AS <;column name>
FROM
(<;SELECT query that produces the data>)
AS <;alias for the source query>
PIVOT
(
<;aggregation function>(<column being aggregated>)
FOR
[<;column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS <;alias for the pivot table>
<optional ORDER BY clause>;

example 3.


Here, I’m going to demonstrate the PIVOT operation to pivot table valued expression rows to columns. The SQL Server PIVOT keyword / construct is only available from SQL Server 2005 onwards.


In it’s simple form, The PIVOT construct looks as below. It has a Aggregate function, which is COUNT() here in our case. We are then using this to Count (ProductId) From the Query and pivotting the distinct row values of the ProductId which are  [2],[3],[4],[5],[6],[7],[8] to columns in the Select Query.


/*********************************************************************
Query to yeild result set of employees vs all products that they sold.
Here we are pivotting the ProductIds.
**********************************************************************/

SELECT Employee,
[2] AS Product2,
[3] AS Product3,
[4] AS Product4,
[5] AS Product5,
[6] AS Product6,
[7] AS Product7,
[8] AS Product8
FROM @TblOrderDetails
PIVOT (
COUNT(ProductId)
FOR ProductId IN ([2],[3],[4],[5],[6],[7],[8])
)Pvt


/*Notice that after pivotting the ProductId column,
It will not be available in the SELECT statement.*/

 

example 3 result.


Employee    Product2    Product3    Product4    Product5    Product6    Product7    Product8
--------------------------------------------------------------------------------------------
Sudhir.Murthy 0 0 0 1 0 0 0
Tom.Hanks 0 0 0 0 1 0 0
Jim.Griffiths 0 0 0 1 0 0 0
Sue.Steely 0 0 0 0 0 1 0
Sudhir.Murthy 0 0 0 0 0 0 1
James.Martin 0 0 0 1 0 0 0
Paul.Smith 0 0 0 1 0 0 0
Sudhir.Murthy 0 0 0 0 1 0 0
Tom.Hanks 0 0 0 0 0 1 0
Sue.Steely 0 0 1 0 0 0 0
Jim.Griffiths 0 0 0 0 1 0 0
James.Martin 0 0 0 0 0 1 0
Paul.Smith 0 0 1 0 0 0 0
Sue.Steely 0 1 0 0 0 0 0
Tom.Hanks 0 0 0 1 0 0 0
Jim.Griffiths 0 0 0 0 1 0 0
Sudhir.Murthy 0 1 0 0 0 0 0
Tom.Hanks 1 0 0 0 0 0 0
Sue.Steely 1 0 0 0 0 0 0
Sudhir.Murthy 0 0 0 0 0 1 0


As you can notice, this is not quite what we aimed for initially, The employees are repeated here. We would ideally want to have a consolidated view of all *distinct* employees and the number of products that they’ve sold.


example 4.


In this example, we are going to extend our query from example 3 to achieve what we call as ‘Grouping’ using PIVOT. As you can see from the query below, We have used the Aggregate function COUNT() inside the PIVOT construct and we have a select query to select Employee and ProductId’s. But notice the construct has changed in the From clause. By using the Employee & ProductId in the From Clause here, every employee servers as a value column against which the unique rows - [2],[3],[4],[5],[6],[7] & [8] are grouped to columns.


/***************************************************************
Query to yeild the consolidated result set of employees vs
products that they sold.
****************************************************************/

SELECT Employee,
[2] AS Product2,
[3] AS Product3,
[4] AS Product4,
[5] AS Product5,
[6] AS Product6,
[7] AS Product7,
[8] AS Product8
FROM (
SELECT Employee,ProductId FROM @TblOrderDetails
) AS Qry
PIVOT (
COUNT(ProductId)
FOR ProductId IN ([2],[3],[4],[5],[6],[7],[8])
) Pvt


/*Notice that after pivotting the ProductId column,
It will not be available in the SELECT statement.*/


example 4 result.


   1: Employee      Product2   Product3    Product4   Product5   Product6   Product7  Product8
   2: ----------------------------------------------------------------------------------------
   3: James.Martin       0        0            0          1          0        1           0
   4: Jim.Griffiths      0        0            0          1          2        0           0
   5: Paul.Smith         0        0            1          1          0        0           0
   6: Sudhir.Murthy      0        1            0          1          1        1           1
   7: Sue.Steely         1        1            1          0          0        1           0
   8: Tom.Hanks          1        0            0          1          1        1           0

Note: When aggregate functions are used with PIVOT, the presence of any null values in the value column are not considered when computing an aggregation.


UNPIVOT example.


UNPIVOT is the reverse operation of PIVOT. i.e It converts the table valued expression to rows by rotating columns into rows. Suppose the table produced in the previous example is stored in the table ‘@TblEmployeeProducts’, and you want to rotate the column identifiers Product2, Product3, Product4, Product5, Product6, Product7 and Product8 into row values that correspond to a particular Employee.





   1: /*Declare the table variable = @TblEmployeeProducts
   2: Please Note: The table is only designed this way to 
   3: keep the example simple and straightforward. 
   4: In an ideal world, Products will not be repeated this
   5: way to violate Third and BC Normal Forms.
   6: */
   7:  
   8: DECLARE @TblEmployeeProducts TABLE(
   9:     Id              INT NOT NULL PRIMARY KEY,
  10:     UniqueOrderKey  UNIQUEIDENTIFIER DEFAULT(NEWSEQUENTIALID()),
  11:     Employee        NVARCHAR(50) NOT NULL,
  12:     Product2        INT NOT NULL,
  13:     Product3        INT NOT NULL,
  14:     Product4        INT NOT NULL,
  15:     Product5        INT NOT NULL,
  16:     Product6        INT NOT NULL,
  17:     Product7        INT NOT NULL,
  18:     Product8        INT NOT NULL
  19: )
  20:  
  21: --Insert Test Data into @TblEmployeeProducts
  22:  
  23: INSERT @TblEmployeeProducts
  24: (Id,Employee,Product2,Product3,Product4,Product5,Product6,Product7,Product8) 
  25: SELECT 1,'James.Martin',0,0,0,1,0,1,0
  26: UNION  ALL
  27: SELECT 2,'Jim.Griffiths',0,0,0,1,2,0,0
  28: UNION  ALL
  29: SELECT 3,'Paul.Smith',0,0,1,1,0,0,0
  30: UNION  ALL
  31: SELECT 4,'Sudhir.Murthy',0,1,0,1,1,1,0
  32: UNION  ALL
  33: SELECT 5,'Sue.Steely',1,1,1,0,0,1,0
  34: UNION  ALL
  35: SELECT 6,'Tom.Hanks',1,0,0,1,1,1,0
  36:  
  37: ---------------------------------------
  38:  
  39: SELECT  Employee,
  40:         Product2 as Prod2,
  41:         Product3 as Prod3,
  42:         Product4 as Prod4,
  43:         Product5 as Prod5,
  44:         Product6 as Prod6,
  45:         Product7 as Prod7,
  46:         Product8 as Prod8
  47: FROM    @TblEmployeeProducts
  48:  


After you execute the select statement (above) , Your table should look like.,
 

Employee        Prod2    Prod3    Prod4    Prod5    Prod6    Prod7    Prod8
----------------------------------------------------------------------------
James.Martin 0 0 0 1 0 1 0
Jim.Griffiths 0 0 0 1 2 0 0
Paul.Smith 0 0 1 1 0 0 0
Sudhir.Murthy 0 1 0 1 1 1 0
Sue.Steely 1 1 1 0 0 1 0
Tom.Hanks 1 0 0 1 1 1 0


Now, To rotate the columns of the above table values into row values, We can use UNPIVOT. This means that you must identify two additional columns. The column that will contain the column values that you are rotating (Product2, Product3,...) will be called ProductId, and the column that will hold the values that currently reside under the columns being rotated will be called TotalOrders. These columns correspond to the pivot_column and value_column, respectively, in the Transact-SQL definition.

Here is the query:-

 

example 5.


   1: /*Query to unpivot the Employee and Products to 
   2: rows of Employee, Products and TotalOrders*/
   3:  
   4: SELECT Employee,ProductId,TotalOrders
   5: FROM (
   6:         SELECT  Employee,
   7:                 Product2,
   8:                 Product3,
   9:                 Product4,
  10:                 Product5,
  11:                 Product6,
  12:                 Product7,
  13:                 Product8
  14:         FROM    @TblEmployeeProducts
  15: )Qry
  16: UNPIVOT (
  17:         TotalOrders FOR ProductId IN(Product2,
  18:                                      Product3,
  19:                                      Product4,
  20:                                      Product5,
  21:                                      Product6,
  22:                                      Product7,
  23:                                      Product8)
  24: )AS UnPvt
  25:     
  26:     

 

example 5 result.


   1: Employee        ProductId    TotalOrders
   2: -----------------------------------------
   3: James.Martin      Product2    0
   4: James.Martin      Product3    0
   5: James.Martin      Product4    0
   6: James.Martin      Product5    1
   7: James.Martin      Product6    0
   8: James.Martin      Product7    1
   9: James.Martin      Product8    0
  10: Jim.Griffiths     Product2    0
  11: Jim.Griffiths     Product3    0
  12: Jim.Griffiths     Product4    0
  13: Jim.Griffiths     Product5    1
  14: Jim.Griffiths     Product6    2
  15: Jim.Griffiths     Product7    0
  16: Jim.Griffiths     Product8    0
  17: Paul.Smith        Product2    0
  18: Paul.Smith        Product3    0
  19: Paul.Smith        Product4    1
  20: Paul.Smith        Product5    1
  21: Paul.Smith        Product6    0
  22: Paul.Smith        Product7    0
  23: Paul.Smith        Product8    0
  24: Sudhir.Murthy     Product2    0
  25: Sudhir.Murthy     Product3    1
  26: Sudhir.Murthy     Product4    0
  27: Sudhir.Murthy     Product5    1
  28: Sudhir.Murthy     Product6    1
  29: Sudhir.Murthy     Product7    1
  30: Sudhir.Murthy     Product8    0
  31: Sue.Steely        Product2    1
  32: Sue.Steely        Product3    1
  33: Sue.Steely        Product4    1
  34: Sue.Steely        Product5    0
  35: Sue.Steely        Product6    0
  36: Sue.Steely        Product7    1
  37: Sue.Steely        Product8    0
  38: Tom.Hanks         Product2    1
  39: Tom.Hanks         Product3    0
  40: Tom.Hanks         Product4    0
  41: Tom.Hanks         Product5    1
  42: Tom.Hanks         Product6    1
  43: Tom.Hanks         Product7    1
  44: Tom.Hanks         Product8    0

 

Note on UNPIVOT

 

Notice that UNPIVOT is not the exact reverse of PIVOT. PIVOT performs an aggregation and, therefore, merges possible multiple rows into a single row in the output. UNPIVOT does not reproduce the original table-valued expression result because rows have been merged. Besides, null values in the input of UNPIVOT disappear in the output, whereas there may have been original null values in the input before the PIVOT operation.

 

References : 1. Microsoft, MSDN.


Source Code : Complete SQL Script Here

No comments:

Post a Comment