Sunday, July 25, 2010

AGGREGATE TRANSFORMATION



What is the Aggregate Transformation?

The Aggregate Transformation provides a means of carrying out some simple aggregations on data pushed through SSIS, similar to those found in SQL where using “Group By” clause. The available aggregations are:
Group by
Sum
Average
Count
Count distinct
Minimum
Maximum



























Improving Performance in the Aggregate Transformation
The Aggregate Transformation is pretty quick as it runs in memory, but if you are shifting very large volumes of data through it and it is slowing down there are a few tweaks available. First is the Keys and KeyScale properties. These tell the component how many “Group By” distinct groups it should be prepared to handle. By default the value for KeyScale is “Unspecified”, but can be set to low (up to 500,000 keys), medium (up to 5m keys) or high (25m keys). If you are more certain of how many Keys you will be writing you can use the Keys property, which overrides KeyScale, and you can enter the amount of expected Keys. This can either be set per Aggregation output in the advanced editor grid, or globally using the Advanced tab of the editor. If you are using a CountDistinct aggregation you can set the CountDistinctScale and CountDistinctKeys properties which operate in the same way. Usually there is no need to adjust these properties.

When would you use the Aggregate Transformation?
As a rough rule, you should use the Aggregate transformation only when the data source cannot efficiently support the Aggregation processes by itself. If you are reading in data from a relational source, usually it will make more sense to have the server aggregate the data in a query before passing it into SSIS. An exception to this may be if you are hitting a live system and cannot afford to (or are not allowed to) load the server with queries. If you were reading from a Flat File source then you would have to use the Aggregate Transformation as the File System doesn’t provide any means to perform data operations.


Print this post