Module Objectives
After completing this unit, you should be able to understand:
– Date_Generation transform
– Key_Generation transform
– Data_Transfer transform
DATE_GENERATION TRANSFORM
• Produces a series of dates incremented as you specify.
• Use this transform to produce the key values for a time dimension target.
• From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.
• Data outputs: a data set with a single column named DI_GENERATED_DATE containing the date sequence (a SQL sketch follows the options below).
• The rows generated are flagged as INSERT.
• Generated dates can range from 1900.01.01 through 9999.12.31.
OPTIONS
• Start date
– The first date in the output sequence. Specify this date using the format YYYY.MM.DD, where YYYY is a year, MM is a month value, and DD is a day value.
• End date
– The last date in the output sequence. Use the same format used for Start date.
• Increment
– The interval between dates in the output sequence. Select Daily, Monthly, or Weekly.
• Join rank
– A positive integer indicating the weight of the output data set if the data set is used in a join. Sources in the join are accessed in order based on their join ranks; the highest-ranked source is accessed first to construct the join.
• Cache
– Select this check box to hold the output from the transform in memory for use in subsequent transforms. Select Cache only if the resulting data set is small enough to fit in memory.
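For illustration, here is a minimal SQL sketch of the kind of sequence this transform produces, not the transform itself. The recursive-CTE syntax assumes a PostgreSQL-style database, and the start and end dates are arbitrary examples:

    -- Hypothetical SQL equivalent of Date_Generation with a Daily increment
    -- (PostgreSQL syntax; date arithmetic varies by database).
    WITH RECURSIVE dates (DI_GENERATED_DATE) AS (
        SELECT DATE '2020-01-01'                      -- Start date
        UNION ALL
        SELECT DI_GENERATED_DATE + 1                  -- Daily increment: add one day
        FROM dates
        WHERE DI_GENERATED_DATE < DATE '2020-12-31'   -- End date
    )
    SELECT DI_GENERATED_DATE,
           EXTRACT(DOW FROM DI_GENERATED_DATE) AS day_of_week  -- derived field via a query function
    FROM dates;

All rows come out flagged as INSERT; in a real time dimension you would derive further columns (month, quarter, and so on) in a downstream Query the same way.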
KEY GENERATION
• Generates new keys for new rows in a data set.
• When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys (see the sketch after the options below).
• The transform expects the generated key column to be part of the input schema.
OPTIONS
• Table name
– The fully qualified name of the source table from which the maximum existing key is determined. This table should already be imported.
• Generated key column
– The column in the key source table containing the existing key values. A column with the same name must exist in the input data set.
• Increment value
– The interval between generated key values.
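As a rough conceptual sketch, the two statements below show what Key_Generation amounts to. The names dim_customer, cust_key, cust_name, and stage_customer are hypothetical, and the window function assumes a reasonably modern SQL database:

    -- Step 1: look up the maximum existing key in the key source table.
    SELECT MAX(cust_key) FROM dim_customer;                -- suppose this returns 1000

    -- Step 2: assign new keys starting from max + increment (increment value 1).
    INSERT INTO dim_customer (cust_key, cust_name)
    SELECT 1000 + ROW_NUMBER() OVER (ORDER BY cust_name),  -- new keys: 1001, 1002, ...
           cust_name
    FROM stage_customer;

In the transform itself the input rows already carry the generated key column (here cust_key), and the transform fills in its values.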
DATA TRANSFER TRANSFORM
• Data Integrator generates SQL SELECT statements to retrieve data from source databases.
• Data Integrator automatically distributes the processing workload by pushing down as much as possible to the source database server.
• Pushing down operations provides the following advantages:
– Uses the power of the database server to execute SELECT operations (such as joins, GROUP BY, and common functions such as decode and string functions). The database is often optimized for these operations.
– Minimizes the amount of data sent over the network. Fewer rows are retrieved when the SQL statements include filters or aggregations.
• You can also do a full push down from the source to the target, which means Data Integrator sends SQL INSERT INTO...SELECT statements to the target database (see the sketch after this list).
• The Data_Transfer transform writes the data from a source or the output from another transform into a transfer object and subsequently reads data from the transfer object.
• The transfer type can be a relational database table, persistent cache table, or file.
• Use the Data_Transfer transform to push down operations to the database server when the transfer type is a database table.
• You can push down resource-consuming operations such as joins, GROUP BY, and sorts.
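As a sketch of the full push down mentioned above: when source and target sit in the same database, Data Integrator can hand the whole data flow to the server as a single INSERT INTO...SELECT. The table and column names below are hypothetical:

    -- Full push down: the entire operation runs inside the database server,
    -- so no rows travel through the Data Integrator engine.
    INSERT INTO sales_summary (country, region, total_sales)
    SELECT country, region, SUM(sales_amount)
    FROM sales_fact
    GROUP BY country, region;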
EXAMPLE
• This simple data flow contains a Query transform that does a lookup of sales subtotals and groups the results by country and region. Suppose the GROUP BY operation processes millions of rows.
• Data Integrator cannot push the GROUP BY operation down to the database because the Query before it contains a lookup_ext function, which Data Integrator cannot push down.
• You can add a Data_Transfer transform to split the lookup_ext function and the GROUP BY operation into two sub data flows, enabling Data Integrator to push the GROUP BY down to the target database (sketched below).
• When you execute the job, Data Integrator displays messages for each sub data flow. You can also watch the transfer table being created and dropped during the job run.
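Conceptually, the two sub data flows behave like the hypothetical statements below: the first stages the engine-computed lookup results into the transfer table, and the second pushes the GROUP BY down to the database. All names and values are illustrative:

    -- Sub data flow 1 (engine side): lookup_ext runs in the Data Integrator
    -- engine and its output rows are loaded into the transfer table.
    INSERT INTO transfer_sales (country, region, sub_total)
    VALUES ('US', 'West', 1250.00);        -- one such row per engine output row

    -- Sub data flow 2 (database side): the GROUP BY is pushed down as a
    -- single INSERT INTO...SELECT against the transfer table.
    INSERT INTO sales_by_region (country, region, total_sales)
    SELECT country, region, SUM(sub_total)
    FROM transfer_sales
    GROUP BY country, region;

This matches the run-time messages: the transfer table is created for the first sub data flow and dropped after the second completes.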
BUILT-IN TRANSFORMS & OPERATION CODES
• Transforms manipulate data inputs and produce one or more output data sets.
• Operation codes maintain the status of each row in each data set described by inputs to and outputs from objects in data flows.
OPERATION CODES
• NORMAL – Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted from a source table. Most transforms operate only on rows flagged as NORMAL.
• INSERT – Rows can be flagged as INSERT by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as INSERT. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as INSERT as input.
• DELETE – Ignored by the target; rows flagged as DELETE are not loaded. Rows can be flagged as DELETE in the Map_Operation and Table_Comparison transforms. Only the History_Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.
• UPDATE – Rows can be flagged as UPDATE by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as UPDATE. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as UPDATE as input.