Monday, August 16, 2010

PIVOT TRANSFORMATION

What does the Pivot Transformation do?
This bit is fairly straightforward – it takes row values and converts them to columns, moving – but not aggregating -  a value as it does so.

How to configure the Pivot Transformation
Step 1: Configure the Inputs
The first step is to open the Pivot Transformation and on the Input Columns tab select the columns that are going to be used in the Pivot. Once that is done, switch to the Input and Output Properties tab. Here we are going to set how each column is used in the Pivot operation. This is done by setting the PivotUsage property of the Input Columns. This can be found by expanding Pivot Default Input > Input Columns and selecting the relevant Input Column. Down at the bottom of the list under the Custom header is the property PivotUsage. This can be set to one of four values:
0 – the column is passed through unaffected
1 – the column values become the rows of the pivot (aka the Set Key)
2 – the column values become the column names of the pivot (aka the Pivot Column)
3 - the column values that are pivoted in the pivot
Now something you have to configure upstream from this is that whatever column will be the rows of the pivot (PivotKeyUsage = 1) needs to be sorted for the component to work properly. The Pivot component iteslf doesn’t force this (as it does in a Merge Join) so you need to go back and make sure that your column is sorted by the time it hits the pivot. Setting the IsSorted and SortKeyPosition properties of the Set Key column upstream will make no difference as the Pivot Transformation won’t actually pay any attention to these settings, but it may cause unexpected results if the data itself isn’t sorted. You can see how this affects the output by running Data Flow 1a, which is the same as Data Flow 1 but not sorted properly.
Step 2: Configure the Outputs
Despite everything you’ve done so far, this component will have exactly zero outputs – each has to be manually defined. So you will need an output for your Rows  and an output for each of your Pivoted column’s values that you want to output. So in my example I will need to create an output for each year that is in the source data – you cannot ignore a value as the component will return an error. For example, if I forgot to add a column for Components, I would get the error “The pivot key value “Components” is not valid” when I tried to execute.
To create an output, on the Inputs and Outputs tab expand the Pivot Default Output > OutputColumns and start clicking the Add Column button until you have enough columns. Then for each column you need to set the following properties:
Name – the name for the output column – so if you were pivoting a value of year you could rename it to year2001 in the output
PivotKeyValue – the value in the pivoted column that will go into this output
SourceColumn – the Lineage ID of the column that is to be used in the pivot. What this means is you need to go back to your input columns, and get the value of the LineageID property of the column that you set the PivotUsage of to 3 for pivoted columns, and the LineageID of the Column you set to PivotUsage of 1 for the Row Column.































Print this post

No comments: