Module Objectives
After completing this unit, you should understand:
–The Map_Operation transform
–The Table_Comparison transform
–The History_Preserving transform
–Auditing
–Handling Slowly Changing Dimensions
MAP_OPERATION
• Allows conversions between data manipulation operations.
• The Map_Operation transform allows you to change
operation codes on data sets to produce the desired output.
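The transform itself is configured in the Designer, not written as code, but its behavior can be sketched in a few lines of Python. The opcode names match the flags used throughout this unit; the function and the shape of the mapping are illustrative assumptions:

```python
# Minimal sketch of Map_Operation semantics (illustrative only; the real
# transform is configured in the Data Integrator Designer, not in code).
def map_operation(rows, mapping):
    """Re-flag each (opcode, record) pair according to `mapping`.

    `mapping` is a dict such as {"UPDATE": "INSERT"}; rows mapped to
    DISCARD are dropped from the output data set.
    """
    for opcode, record in rows:
        new_op = mapping.get(opcode, opcode)   # unmapped codes pass through
        if new_op != "DISCARD":
            yield (new_op, record)

# Example: treat every incoming UPDATE as an INSERT (useful when the
# target should accumulate rows instead of overwriting them) and drop
# DELETEs entirely.
rows = [("NORMAL", {"id": 1}), ("UPDATE", {"id": 2}), ("DELETE", {"id": 3})]
print(list(map_operation(rows, {"UPDATE": "INSERT", "DELETE": "DISCARD"})))
```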
TABLE_COMPARISON
• Compares two data sets and produces the difference
between them as a data set with rows flagged as INSERT, UPDATE, or DELETE.
• The Table_Comparison transform allows you to detect and
forward changes that have occurred since the last time a target was updated.
DATA INPUTS
• The data set from a source or the output from another
transform. Only rows flagged as NORMAL are considered by the transform. This data
is referred to as the input data set.
• The specification for a database table to compare to
the input data set. This table is referred to as the comparison table.
COMPARISON METHOD
Row-by-row select
—Select this option to have the transform look up the
target table using SQL every time it receives an input row. This option is best
if the target table is large compared to the number of rows the transform will
receive as input.
Cached comparison table
—Select this option to load the comparison table into
memory. In this case, queries to the comparison table access memory rather than
the actual table. This option is best when you are comparing the entire target
table. Data Integrator uses pageable cache as the default. If the table fits
in the available memory, you can change the cache type to in-memory in the data
flow Properties.
Sorted input
—Select this option to read the comparison table in the order
of the primary key column(s) using sequential read. This option improves
performance because Data Integrator reads the comparison table only once.
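Neither strategy is exposed as code in Data Integrator, but the difference between the first two can be sketched in Python. The function names, the DB-API cursor, and the %s placeholders are assumptions made for illustration:

```python
# Hypothetical sketch of the two lookup strategies (not product code).
comparison_cache = {}          # populated once for "Cached comparison table"

def lookup_row_by_row(cursor, table, pk_cols, key):
    """Row-by-row select: issue one SQL lookup per input row.
    Cheap when input rows are few relative to the target table."""
    where = " AND ".join(f"{c} = %s" for c in pk_cols)
    cursor.execute(f"SELECT * FROM {table} WHERE {where}", key)
    return cursor.fetchone()

def lookup_cached(key):
    """Cached comparison: the whole table was loaded into memory up front,
    so each probe is a dictionary access instead of a database round trip."""
    return comparison_cache.get(key)
```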
OPTIONS
•Input primary key column(s)
–The input data set columns that uniquely identify each
row. These columns must be present in the comparison table with the same column
names and data types.
•Compare columns
–(Optional) Improves performance by comparing only the
subset of columns you drag into this box from the input schema. If no columns
are listed, all columns in the input data set that are also in the comparison
table are used as compare columns.
•Generated Key Column
–For an UPDATE, the output data set will contain the
largest generated key found for a given primary key.
•Detect deleted row(s) from comparison table
–(Optional) Generates DELETEs for all rows that are in
the comparison table and not in the input set. Assumes the input set
represents the complete data set. By default this option is turned off.
DATA OUTPUTS
• A data set containing rows flagged as INSERT, UPDATE, or
DELETE.
• This data set contains only the rows that make up the
difference between the two input sources: one from the input to the transform
(input data set), and one from a database table you specify in the transform
(the comparison table).
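Conceptually, the transform performs a keyed diff between the two inputs. A minimal Python sketch, assuming both data sets fit in memory as lists of dicts; the function and its parameters are illustrative, not the product's API:

```python
def table_comparison(input_rows, comparison_table, pk_cols, compare_cols,
                     detect_deletes=False):
    """Flag rows as INSERT, UPDATE, or DELETE by comparing the input data
    set against the comparison table (both given as lists of dicts)."""
    existing = {tuple(r[c] for c in pk_cols): r for r in comparison_table}
    seen = set()
    for row in input_rows:
        key = tuple(row[c] for c in pk_cols)
        seen.add(key)
        old = existing.get(key)
        if old is None:
            yield ("INSERT", row)
        elif any(row[c] != old[c] for c in compare_cols):
            yield ("UPDATE", row)          # unchanged rows produce no output
    if detect_deletes:                     # "Detect deleted row(s)" option
        for key, old in existing.items():
            if key not in seen:
                yield ("DELETE", old)
```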
HISTORY_PRESERVING
• The History_Preserving transform allows you to produce
a new row in your target rather than updating an existing row.
• You can indicate in which columns the transform identifies
changes to be preserved.
• If the values of certain columns change, this transform
creates a new row for each row flagged as UPDATE in the input data set.
DATA INPUTS
• A data set that is the result of a comparison between
two images of the same data in which changed data from the newer image are
flagged as UPDATE rows and new data from the newer image are flagged as INSERT
rows.
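What History_Preserving does with those flagged rows can be sketched in Python. This assumes a date-based Type 2 target with valid_from/valid_to columns; the column names, HIGH_DATE value, and function shape are illustrative:

```python
import datetime

HIGH_DATE = datetime.date(9999, 12, 31)   # illustrative "open-ended" date

def history_preserving(flagged_rows, preserve_cols, target, pk_cols):
    """Turn UPDATEs that touch preserved columns into a close-out of the
    old row plus an INSERT of a new version; all other rows pass through.
    `target` maps primary-key tuples to the current row in the target."""
    today = datetime.date.today()
    for opcode, row in flagged_rows:
        if opcode == "UPDATE":
            key = tuple(row[c] for c in pk_cols)
            old = target[key]
            if any(row[c] != old[c] for c in preserve_cols):
                yield ("UPDATE", dict(old, valid_to=today))   # close old row
                yield ("INSERT", dict(row, valid_from=today,
                                      valid_to=HIGH_DATE))    # new version
                continue
        yield (opcode, row)                                   # pass through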
SCD-2
A Type 2 slowly changing dimension preserves history by inserting a new row
for each change. The current row can be identified in one of two ways:
•Date Based
•Flag Based
FLAG BASED DIMENSION
• Compare columns
–Rows flagged as INSERT should be inserted with the
current flag set to ‘A’.
–Input might have been flagged as UPDATE because of a
phone number change (not a compare column). This row should be updated in
place; the current flag does not change.
–If input is flagged as UPDATE because of a state change
(a compare column), this row should be inserted with the current flag ‘A’,
and the existing row should be updated with the current flag set to ‘I’.
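The three cases above can be sketched in Python. This assumes ‘state’ is a compare column, ‘phone’ is not, and cust_id is the primary key; all names are made up for the example:

```python
def scd2_flag_based(flagged_rows, compare_cols, current_rows):
    """Apply the three flag-based rules: plain INSERTs get flag 'A';
    UPDATEs on non-compare columns update in place; UPDATEs on compare
    columns retire the old row ('I') and insert a new current row ('A')."""
    for opcode, row in flagged_rows:
        if opcode == "INSERT":
            yield ("INSERT", dict(row, current_flag="A"))
        elif opcode == "UPDATE":
            old = current_rows[row["cust_id"]]           # existing 'A' row
            if any(row[c] != old[c] for c in compare_cols):
                yield ("UPDATE", dict(old, current_flag="I"))  # retire old
                yield ("INSERT", dict(row, current_flag="A"))  # new current
            else:
                yield ("UPDATE", dict(row, current_flag="A"))  # in place

# Example: a phone change updates in place; a state change would instead
# version the row.
current = {1: {"cust_id": 1, "phone": "555-0100", "state": "NY",
               "current_flag": "A"}}
changes = [("UPDATE", {"cust_id": 1, "phone": "555-0199", "state": "NY"})]
print(list(scd2_flag_based(changes, ["state"], current)))
```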
AUDITING
• Auditing provides a way to ensure that a data flow
loads correct data into the
warehouse. Use auditing to perform the following tasks:
• Define audit points to collect run time statistics
about the data that flows out of
objects.
• Define rules with these audit statistics to ensure that
the data at the following
points in a data flow is what you expect:
–Extracted from sources
–Processed by transforms
–Loaded into targets
• Generate a run time notification that includes the
audit rule that failed.
AUDIT CONFIGURATION
•Audit label—The
unique name in the data flow that Data Integrator generates for the audit statistics
collected for each audit function that you define. You use these labels to define
audit rules for the data flow.
•Audit rule—A Boolean
expression in which you use audit labels to verify the Data Integrator job. If you
define multiple rules in a data flow, all rules must succeed or the audit fails.
•Actions on audit failure—One
or more of three ways to generate notification of an audit rule (or rules) failure:
email, custom script, raise exception.
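Audit points and rules are defined in the Designer rather than in code, but rule evaluation amounts to checking Boolean expressions over the collected statistics. A minimal sketch, with made-up label names:

```python
def check_audit_rules(stats, rules, notify):
    """Evaluate each Boolean audit rule against collected statistics;
    if any rule fails, the whole audit fails and a notification is raised."""
    failed = [expr for expr in rules if not eval(expr, {}, stats)]
    for expr in failed:
        notify(f"Audit rule failed: {expr}")     # email / script / exception
    return not failed                            # all rules must succeed

# Example: row counts collected at two audit points (labels are made up).
stats = {"Count_Source": 10_000, "Count_Target": 9_998}
ok = check_audit_rules(stats, ["Count_Source == Count_Target"], print)
```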