Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping
The definition of the relationship and data flow between source and target objects.
Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
Staging Area
A place where data is processed before entering the warehouse.
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source to a data warehouse.
Target System
A database, application, file, or other storage facility to which the transformed source data is loaded in a data warehouse.
Guidelines to work with Informatica Power Center
• Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store, and manage metadata.
• Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.
• Repository Server: The Repository Server manages all connections between the repository and the Power Center Client.
• Power Center Server: The Power Center Server extracts data from sources and loads it into targets.
• Designer: Source Analyzer, Mapping Designer, and Warehouse Designer are tools that reside within the Designer. The Source Analyzer is used for extracting metadata from source systems.
Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target.
Warehouse Designer is used for extracting metadata from target systems; alternatively, target metadata can be created in the Designer itself.
• Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming, and standardizing address data. A person's address may not be the same in all source systems because of typos, or the postal code and city name may not match the rest of the address. Such errors can be corrected during the data cleansing process so that standardized data is loaded into the target systems (data warehouse).
• Transformation: Transformations transform the source data according to the requirements of the target system. Sorting, filtering, aggregation, and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.
• Workflow Manager: A workflow helps to load the data from source to target in the correct sequence. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates foreign key validation. To avoid this, workflows are created to enforce the correct flow of data from source to target.
• Workflow Monitor: This tool is used for monitoring and tracking the workflows created on each Power Center Server.
• Power Center Connect: This component helps to extract data and metadata from ERP and other enterprise systems such as PeopleSoft, SAP, Siebel, and IBM MQSeries, as well as other third-party applications.
• Power Center Exchange: Like Power Center Connect, this component helps to extract data and metadata from enterprise systems such as PeopleSoft, SAP, Siebel, and IBM MQSeries, as well as other third-party applications.
Power Exchange:
Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time, and changed data capture options on mainframe (DB2, VSAM, IMS, etc.), midrange (AS/400 DB2, etc.), and relational databases (Oracle, SQL Server, DB2, etc.), as well as flat files on UNIX, Linux, and Windows systems.
Power Channel:
This helps to transfer large amounts of encrypted and compressed data over LAN or WAN, through firewalls, transfer files over FTP, etc.
Meta Data Exchange:
Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, or Sybase PowerDesigner for developing data models, and the functional and technical teams may have spent considerable time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which leverages that time and effort. There is no need for the Informatica developer to create these data structures again.
Power Analyzer:
Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. PowerAnalyzer enables organizations to gain insight into business processes and develop business intelligence.
With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational database, but it can also run reports on data in any relational table that does not conform to the dimensional model.
Super Glue:
SuperGlue is used for loading metadata into a centralized place from several sources. Reports can be run against SuperGlue to analyze metadata.
Power Mart:
Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, while Power Mart is used for departmental data warehouses such as data marts. Power Center supports global and networked repositories and can be connected to several sources; Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment makes for good developer productivity.
Active Transformation
An active transformation can change the number of rows that pass through it from source to target; for example, it can eliminate rows that do not meet the transformation condition.
Passive Transformation
A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.
Transformations can be Connected or UnConnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.
Following are the list of Transformations available in Informatica:
• Aggregator Transformation
• Expression Transformation
• Filter Transformation
• Joiner Transformation
• Lookup Transformation
• Normalizer Transformation
• Rank Transformation
• Router Transformation
• Sequence Generator Transformation
• Stored Procedure Transformation
• Sorter Transformation
• Source Qualifier Transformation
• Update Strategy Transformation
• XML Source Qualifier Transformation
• Advanced External Procedure Transformation
• External Procedure Transformation
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation. This transformation is useful for performing calculations such as averages and sums (mainly calculations on multiple rows or groups), for example, to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, and SUM can be used in the Aggregator transformation.
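A rough SQL equivalent of what an Aggregator transformation computes, using a hypothetical SALES table for illustration:

SELECT region,
       SUM(sale_amount) AS total_sales,   -- aggregate per group
       AVG(sale_amount) AS avg_sales,
       COUNT(*)         AS sale_count
FROM   sales
GROUP  BY region;                         -- the group-by ports of the Aggregator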
Expression Transformation
Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target, for example, to calculate the discount for each product, to concatenate first and last names, or to convert a date to a string field.
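The same row-level logic sketched in Oracle-style SQL against a hypothetical EMPLOYEES table (each expression operates on one row at a time):

SELECT emp_id,
       first_name || ' ' || last_name   AS full_name,      -- concatenation
       salary * 0.10                    AS discount_amt,   -- derived value
       TO_CHAR(hire_date, 'YYYY-MM-DD') AS hire_date_str   -- date to string
FROM   employees;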
Filter Transformation
Filter transformation is an Active and Connected transformation. It can be used to filter out rows in a mapping that do not meet the condition, for example, to find all the employees who work in Department 10, or the products priced between $500 and $1000.
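A SQL sketch of the same filter conditions, assuming hypothetical EMPLOYEES and PRODUCTS tables:

SELECT * FROM employees WHERE dept_no = 10;                 -- only Department 10 rows pass
SELECT * FROM products  WHERE price BETWEEN 500 AND 1000;   -- rows outside the range are dropped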
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location, for example, to join a flat file and a relational source, two flat files, or a relational source and an XML source.
In order to join two sources, there must be at least one matching port. While joining two sources, one source must be specified as master and the other as detail.
The Joiner transformation supports the following types of joins:
• Normal
• Master Outer
• Detail Outer
• Full Outer
Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
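As a rough illustration, the four join types correspond approximately to the following SQL, assuming a hypothetical DEPT table as the master source and EMP as the detail source:

-- Normal join: only matching rows survive
SELECT e.emp_id, d.dept_name FROM emp e JOIN dept d ON e.dept_no = d.dept_no;

-- Master outer join: keep all detail (EMP) rows, matching master (DEPT) rows only
SELECT e.emp_id, d.dept_name FROM emp e LEFT OUTER JOIN dept d ON e.dept_no = d.dept_no;

-- Detail outer join: keep all master (DEPT) rows, matching detail (EMP) rows only
SELECT e.emp_id, d.dept_name FROM dept d LEFT OUTER JOIN emp e ON e.dept_no = d.dept_no;

-- Full outer join: keep all rows from both sources
SELECT e.emp_id, d.dept_name FROM emp e FULL OUTER JOIN dept d ON e.dept_no = d.dept_no;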
Lookup Transformation
Lookup transformation is Passive and can be either Connected or UnConnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from source or from target tables.
For example, suppose we want to retrieve the sales of a product with ID 10, and the sales data resides in another table. Instead of using the sales table as one more source, a Lookup transformation can be used to look up the data for product ID 10 in the sales table.
Difference between Connected and UnConnected Lookup Transformation:
A Connected lookup receives input values directly from the mapping pipeline, whereas an UnConnected lookup receives values from a :LKP expression in another transformation.
A Connected lookup can return multiple columns from the same row, whereas an UnConnected lookup has one return port and returns one column from each row.
A Connected lookup supports user-defined default values, whereas an UnConnected lookup does not.
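In SQL terms, a connected lookup behaves roughly like an outer join against the lookup table; a sketch using hypothetical PRODUCTS (pipeline) and SALES (lookup) tables:

SELECT p.product_id,
       p.product_name,
       s.sale_amount       -- value returned by the lookup; NULL (or a default) when no match
FROM   products p
LEFT OUTER JOIN sales s
       ON s.product_id = p.product_id;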
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources where most of the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a single row of data.
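For example, a denormalized row holding quarterly sales in four columns can be split into four rows; a rough SQL sketch using a hypothetical QTR_SALES table:

SELECT account_id, 1 AS quarter, q1_sales AS sales FROM qtr_sales
UNION ALL
SELECT account_id, 2, q2_sales FROM qtr_sales
UNION ALL
SELECT account_id, 3, q3_sales FROM qtr_sales
UNION ALL
SELECT account_id, 4, q4_sales FROM qtr_sales;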
Rank Transformation
Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data, for example, to select the top 10 regions where sales volume was highest or the 10 lowest-priced products.
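An analytic-SQL sketch of a top-10 rank, assuming a hypothetical REGION_SALES table:

SELECT region, total_sales
FROM  (SELECT region,
              total_sales,
              RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
       FROM   region_sales) ranked
WHERE  sales_rank <= 10;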
Router Transformation
Router is an Active and Connected transformation. It is similar to the Filter transformation; the difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet any condition. It is useful for testing multiple conditions, and it has input, output, and default groups. For example, rows can be routed to different tables based on conditions such as State = Michigan, State = California, and State = New York, with all remaining states going to the default group.
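A rough SQL analogue of routing rows to multiple targets, written as an Oracle-style multi-table insert with a hypothetical CUSTOMERS source and per-state target tables:

INSERT ALL
  WHEN state = 'MI' THEN INTO customers_mi    (cust_id, name) VALUES (cust_id, name)
  WHEN state = 'CA' THEN INTO customers_ca    (cust_id, name) VALUES (cust_id, name)
  WHEN state = 'NY' THEN INTO customers_ny    (cust_id, name) VALUES (cust_id, name)
  ELSE                   INTO customers_other (cust_id, name) VALUES (cust_id, name)  -- default group
SELECT cust_id, name, state FROM customers;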
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values or cycle through a sequential range of numbers or to replace missing keys.
It has two output ports, NEXTVAL and CURRVAL, which can be connected to other transformations (no ports can be added to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (with the default increment of 1, that is NEXTVAL plus one).
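In a database, the same effect of generating surrogate key values is achieved with a sequence; an Oracle-style sketch with hypothetical STG_CUSTOMER and CUSTOMER_DIM tables:

CREATE SEQUENCE customer_key_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (customer_key, customer_id, customer_name)
SELECT customer_key_seq.NEXTVAL,   -- next unique surrogate key value
       customer_id,
       customer_name
FROM   stg_customer;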
Stored Procedure Transformation
Stored Procedure transformation is a Passive transformation and can be used Connected or UnConnected. It is useful for automating time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining free space in the database, specialized calculations, etc.
The stored procedure must exist in the database before creating a Stored Procedure transformation, and it can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script containing SQL statements, control statements, user-defined variables, and conditional statements.
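A minimal Oracle PL/SQL sketch of the kind of procedure such a transformation might call (the procedure name, index, and table are hypothetical):

CREATE OR REPLACE PROCEDURE drop_and_rebuild_idx AS
BEGIN
  -- drop and recreate an index around a large load
  EXECUTE IMMEDIATE 'DROP INDEX crdt_card_fct_idx01';
  EXECUTE IMMEDIATE 'CREATE INDEX crdt_card_fct_idx01 ON credit_card_fct (card_key)';
END;
/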
Sorter Transformation
Sorter transformation is a Connected and Active transformation. It allows data to be sorted in ascending or descending order according to specified fields. It can also be configured for case-sensitive sorting and to produce only distinct output rows.
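The SQL equivalent of sorting with distinct output, on a hypothetical ORDERS table:

SELECT DISTINCT customer_id, order_date
FROM   orders
ORDER  BY order_date DESC, customer_id ASC;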
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation. When a relational or flat file source definition is added to a mapping, it must be connected to a Source Qualifier transformation. The Source Qualifier performs tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.
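A sketch of the kind of SQL override a Source Qualifier might use to join and filter two hypothetical source tables in a single query:

SELECT e.emp_id, e.emp_name, d.dept_name
FROM   emp e, dept d
WHERE  e.dept_no = d.dept_no      -- join pushed down to the source database
AND    e.status  = 'ACTIVE';      -- source-side filter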
Update Strategy Transformation
Update Strategy transformation is an Active and Connected transformation. It is used to flag rows bound for the target table, either to maintain a history of the data or to capture only recent changes. You can specify how to treat source rows: insert, update, delete, or data driven.
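The combined insert-or-update behaviour that an update strategy typically drives can be sketched in SQL as a MERGE, using hypothetical STG_CUSTOMER (source) and CUSTOMER_DIM (target) tables:

MERGE INTO customer_dim t
USING stg_customer s
ON (t.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET t.customer_name = s.customer_name          -- rows flagged as update
WHEN NOT MATCHED THEN
  INSERT (customer_key, customer_id, customer_name)     -- rows flagged as insert
  VALUES (customer_key_seq.NEXTVAL, s.customer_id, s.customer_name);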
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.
Advanced External Procedure Transformation
Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.
External Procedure Transformation
External Procedure transformation is an Active transformation and can be Connected or UnConnected. Sometimes, standard transformations such as the Expression transformation may not provide the functionality you want. In such cases an External Procedure is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.
Differences between Advanced External Procedure and External Procedure Transformations:
An External Procedure returns a single value, whereas an Advanced External Procedure can return multiple values.
An External Procedure supports COM and Informatica procedures, whereas an AEP supports only Informatica procedures.
Data Modelling
A data model visually represents the nature of the data, the business rules governing the data, and how the data will be organized in the database. A data model comprises two parts: a logical design and a physical design.
The data model helps the functional and technical teams in designing the database. The functional team normally consists of one or more business analysts, business managers, subject matter experts, end users, etc., and the technical team consists of one or more programmers, DBAs, etc. Data modelers are responsible for designing the data model, and they communicate with the functional team to get the business requirements and with the technical team to implement the database.
The concept of data modeling can be better understood if we compare the development cycle of a data model to the construction of a house. For example, Company ABC is planning to build a guest house (database); it calls a building architect (data modeler) and presents its building requirements (business requirements). The building architect (data modeler) develops the plan (data model) and gives it to Company ABC. Finally, Company ABC calls civil engineers (DBAs) to construct the guest house (database).
Need for developing a Data Model:
• A new application for OLTP (Online Transaction Processing), an ODS (Operational Data Store), a data warehouse, or data marts.
• Rewriting data models from existing systems when reporting needs change.
• Incorrect data modeling in the existing systems.
• A database that has no data model.
Advantages and Importance of Data Model
• The goal of a data model is to make sure that all data objects provided by the functional team are completely and accurately represented.
• The data model is detailed enough to be used by the technical team to build the physical database.
• The information contained in the data model is used to define business rules, relational tables, primary and foreign keys, stored procedures, and triggers.
• The data model can be used to communicate the business within and across organizations.
Data Modeling Tools
There are a number of data modeling tools that transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate the SQL code for creating the database.
Popular Data Modeling Tools
Tool Name - Company Name
Erwin - Computer Associates
Embarcadero - Embarcadero Technologies
Rational Rose - IBM Corporation
Power Designer - Sybase Corporation
Oracle Designer - Oracle Corporation
Data Modeling Tools: What to Learn?
Data modeling tools are the primary means of creating robust data models. The following are the features to know and learn in a data modeling tool before starting to build data models.
Logical Data Model:
» How to create entity and add definition, business rule?
» How to create domains?
» How to create an attribute and add definition, business rule, validation rules like default values and check constraint?
» How to create supertypes, subtypes?
» How to create primary keys, unique constraint, foreign key relationships, and recursive relationships?
» How to create identifying and non-identifying relationship?
» How to assign relationship cardinality?
» How to phrase relationship connecting two tables?
» How to assign role names?
» How to create key groups?
» How to create sequence numbers?
Physical Data Model:
» How to rename a table?
» How to rename a column and edit validation rules like default values and check constraints?
» How to assign NULL and NOT NULL to columns?
» How to name foreign key constraints?
» How to connect to databases like MS Access, Oracle, Sybase, Teradata, etc.?
» How to generate SQL code from the data model to run against databases like MS Access, Oracle, Sybase, Teradata, etc.?
» How to create a data model from an existing database like MS Access, Oracle, Sybase, Teradata, etc.?
» How to add database related properties to tables, indexes?
» How to check different versions of the data model?
» How many data modelers can concurrently work on the same version of a data model?
Dimensional Data Model:
» Is there any specific notation to identify a Data Warehouse/Data mart data models?
Subject Area:
» How to create subject area and assign relevant entities to subject area?
Reports:
» How to generate reports from data model and export to .XLS, .DOC, .XML file formats?
Naming Options:
» Is there any method to change the entity/table, attribute/column name from upper case to lower case or lower case to upper case?
Import & Export:
» How to create data models from .xls, .txt files etc.?
» How to import and export meta data into ETL tools?
Abbreviation Document:
» How to create/attach a standard abbreviation document (for naming tables, columns, etc.)?
Print:
» How to send data models to printer/plotter/Acrobat Reader?
Backup:
» How to take backup of data model?
Others:
» How to split a data model to logical and physical data model?
» How to copy and paste objects within data model and across data models?
» How to search an object within a data model?
» How to change the font size and color of entities, attributes, and relationship lines?
» How to create a legend?
» How to show a data model in different levels like entity level, attribute level, and definition level?
Erwin Tutorial
AllFusion Erwin Data Modeler, commonly known as Erwin, is a powerful and leading data modeling tool from Computer Associates. Computer Associates delivers several software products for enterprise management, storage management, security, application life cycle management, data management, and business intelligence.
Erwin makes database creation very simple by generating DDL (SQL) scripts from a data model using its Forward Engineering technique, or it can create data models from an existing database using its Reverse Engineering technique.
Erwin workplace consists of the following main areas:
• Logical: In this view, data model represents business requirements like entities, attributes etc.
• Physical: In this view, data model represents physical structures like tables, columns, datatypes etc.
• ModelMart: Many users can work on the same data model concurrently.
What can be done with Erwin?
• Logical, Physical and dimensional data models can be created.
• Data models can be created from existing systems (RDBMS, DBMS, files, etc.).
• Different versions of a data model can be compared.
• Data model and database can be compared.
• SQL scripts can be generated to create databases from the data model.
• Reports can be generated in different file formats like .html, .rtf, and .txt.
• Data models can be opened and saved in several different file types like .er1, .ert, .bpx, .xml, .ers, .sql, .cmt, .df, .dbf, and .mdb files.
• By using ModelMart, concurrent users can work on the same data model.
In order to create data models in Erwin, you need to have AllFusion Erwin Data Modeler installed on your system. If you have installed ModelMart, more than one user can work on the same model.
How to create a Logical Data Model:
In the following section, a simple example with a step-by-step procedure to create a logical data model with two entities and their relationship is explained in detail.
1: Open All Fusion Erwin Data Modeler software.
2: Select the view as "Logical" from the drop-down list. By default, Logical is the active workplace.
3: Click New from File menu. Select the option "Logical/Physical" from the displayed wizard. Click Ok.
4: To create an Entity, click the icon "Entity" and drop it on the workplace. By default E/1 will be displayed as the entity name. Change it to "Country".
5: To create an Attribute, place the cursor on the entity "Country" and right-click it. From the displayed menu, click Attributes, which opens the attribute wizard. Click the "New" button in the wizard and type the attribute name "Country Code". Select the data type "String" and click OK. Select the Primary Key option to identify the attribute "Country Code" as the primary key. Follow the same approach to create another attribute, "Country Name", without selecting the primary key option. Click OK; you will now have two attributes, Country Code and Country Name, under the entity "Country" in the current logical workplace.
6: Create another entity "Bank" with two attributes namely Bank Code and Bank Name by following steps 4 and 5.
7: In order to relate these two entities, Country and Bank, a foreign key relationship must be created. To create the relationship, follow these steps.
(a) Click the symbol "Non Identifying Relationship".
(b) Place the cursor on the entity "Country".
(c) Place the cursor on the entity "Bank". Now you can see the relationship (a line drawn from Bank to Country) between "Country" and "Bank". Double-click on that relationship line to open the "Relationships" wizard and change the option from "Nulls Allowed" to "No Nulls", since a bank must have a country code.
The Logical Data Model created by following the above steps looks similar to the following diagram.
How to create a Physical Data Model:
1: Change the view from "Logical" to "Physical" in the drop-down list.
2: Click "Database" from main menu and then click "Choose Database" from the sub menu. Then select your target database server where the database has to be created. Click ok.
3: Place the cursor on the table "Country" and right-click it. From the displayed menu, click Columns, which opens the column wizard. Click the "Database" tab, which is next to the "General" tab, and assign the datatypes VARCHAR2(10) and VARCHAR2(50) to the columns COUNTRY_CODE and COUNTRY_NAME respectively. Change the default NULL to NOT NULL for the column COUNTRY_NAME. Repeat the above step for the BANK table. Once you have done all of this, you can see the physical version of the logical data model in the current workplace.
The Physical Data Model created by following the above steps looks similar to the following diagram.
How to generate DDL(sql) scripts to create a database:
1: Select the view as Physical from the drop down list.
2:Click "Tools" from main menu and then click "Forward Engineer/Schema Generation" from the sub menu which will take you to the "Schema Generation Wizard". Select the appropriate properties that satisfies your database requirements like schema, table, primary key etc. Click preview to see your scripts. Either you can click to generate the table in a database or you can store the scripts and run against the database later.
The DDL(sql) scripts generated by Erwin by following the above steps looks similar to the following script.
CREATE TABLE Country (
  Country_Code VARCHAR2(10) NOT NULL,
  Country_Name VARCHAR2(50) NOT NULL,
  CONSTRAINT PK_Country PRIMARY KEY (Country_Code)
);

CREATE TABLE Bank (
  Bank_Code    VARCHAR2(10) NOT NULL,
  Bank_Name    VARCHAR2(50) NOT NULL,
  Country_Code VARCHAR2(10) NOT NULL,
  CONSTRAINT PK_Bank PRIMARY KEY (Bank_Code)
);

ALTER TABLE Bank ADD (
  CONSTRAINT FK_Bank FOREIGN KEY (Country_Code) REFERENCES Country
);
Data Modeling Development Cycle
Gathering Business Requirements - First Phase: Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
Conceptual Data Modeling(CDM) - Second Phase:
This data model includes all major entities and relationships; it does not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.
Logical Data Modeling(LDM) - Third Phase:
This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.
Physical Data Modeling(PDM) - Fourth Phase:
This is a complete model that includes all required tables, columns, relationships, and database properties for the physical implementation of the database.
Database - Fifth Phase:
DBAs instruct the data modeling tool to create SQL code from physical data model. Then the SQL code is executed in server to create databases.
Standardization Needs | Modeling data:
Several data modelers may work on different subject areas of a data model, and all of them should use the same naming conventions and the same way of writing definitions and business rules.
Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps in understanding the business in a better way; inconsistency across column names and definitions would create chaos across the business.
For example, when a data warehouse is designed, it may get data from several source systems, and each source may have its own names, data types, etc. These anomalies can be eliminated if proper standardization is maintained across the organization.
Table Names Standardization:
Giving tables full names gives an idea of what the data is about. Generally, do not abbreviate table names; however, this may differ according to the organization's standards. If a table name's length exceeds the database limit, then abbreviate the table name. Some general guidelines are listed below that may be used as a prefix or suffix for the table.
Examples:
• Lookup - LKP - Used for Code, Type tables by which a fact table can be directly accessed.
e.g. Credit Card Type Lookup - CREDIT_CARD_TYPE_LKP
• Fact - FCT - Used for transaction tables:
e.g. Credit Card Fact - CREDIT_CARD_FCT
• Cross Reference - XREF - Tables that resolve many-to-many relationships.
e.g. Credit Card Member XREF - CREDIT_CARD_MEMBER_XREF
• History - HIST - Tables that store history.
e.g. Credit Card Retired History - CREDIT_CARD_RETIRED_HIST
• Statistics - STAT - Tables that store statistical information.
e.g. Credit Card Web Statistics - CREDIT_CARD_WEB_STAT
Column Names Standardization:
Some general guidelines are listed below that may be used as a prefix or suffix for the column.
Examples:
• Key - KEY - System-generated surrogate key.
e.g. Credit Card Key - CRDT_CARD_KEY
• Identifier - ID - Character column that is used as an identifier.
e.g. Credit Card Identifier - CRDT_CARD_ID
• Code - CD - Numeric or alphanumeric column that is used as an identifying attribute.
e.g. State Code - ST_CD
• Description - DESC - Description for a code, identifier or a key.
e.g. State Description - ST_DESC
• Indicator - IND - to denote indicator columns.
e.g. Gender Indicator - GNDR_IND
Database Parameters Standardization:
Some general guidelines are listed below that may be used for other physical parameters, followed by a short DDL sketch.
Examples:
• Index - Index - IDX - for index names.
e.g. Credit Card Fact IDX01 - CRDT_CARD_FCT_IDX01
• Primary Key - PK - for Primary key constraint names.
e.g. Credit Card Fact PK01 - CRDT_CARD_FCT_PK01
• Alternate Keys - AK - for Alternate key names.
e.g. Credit Card Fact AK01 - CRDT_CARD_FCT_AK01
• Foreign Keys - FK - for Foreign key constraint names.
e.g. Credit Card Fact FK01 - CRDT_CARD_FCT_FK01
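As an illustration only, the naming conventions above could appear in DDL like this (the tables, columns, and the referenced STATE_LKP lookup table are hypothetical, following the earlier examples):

CREATE TABLE credit_card_fct (
  crdt_card_key NUMBER       NOT NULL,
  crdt_card_id  VARCHAR2(20) NOT NULL,
  st_cd         VARCHAR2(2),
  CONSTRAINT crdt_card_fct_pk01 PRIMARY KEY (crdt_card_key)   -- primary key naming standard
);

CREATE INDEX crdt_card_fct_idx01 ON credit_card_fct (st_cd);  -- index naming standard

ALTER TABLE credit_card_fct
  ADD CONSTRAINT crdt_card_fct_fk01 FOREIGN KEY (st_cd)       -- foreign key naming standard
  REFERENCES state_lkp (st_cd);                               -- assumes a STATE_LKP lookup table keyed on ST_CD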
Steps to create a Data Model
These are general guidelines for creating a standard data model; in practice, a data model may not be created in exactly this sequence. Based on the enterprise's requirements, some steps may be excluded, or other steps may be added.
Sometimes, data modeler may be asked to develop a data model based on the existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.
1» Get Business requirements.
2» Create High Level Conceptual Data Model.
3» Create Logical Data Model.
4» Select target DBMS where data modeling tool creates the physical schema.
5» Create standard abbreviation document according to business standard.
6» Create domain.
7» Create Entity and add definitions.
8» Create attribute and add definitions.
9» Based on the analysis, try to create surrogate keys, super types and sub types.
10» Assign datatype to attribute. If a domain is already present then the attribute should be attached to the domain.
11» Create primary or unique keys to attribute.
12» Create check constraint or default to attribute.
13» Create unique index or bitmap index to attribute.
14» Create foreign key relationship between entities.
15» Create Physical Data Model.
16» Add database properties to the physical data model.
17» Create SQL scripts from the physical data model and forward them to the DBA.
18» Maintain the logical and physical data models.
19» For each release (version of the data model), compare the present version with the previous version. Similarly, compare the data model with the database to find the differences.
20» Create a change log document describing the differences between the current version and the previous version of the data model.
Data Modeler Role
Business Requirement Analysis:
» Interact with Business Analysts to get the functional requirements.
» Interact with end users and find out the reporting needs.
» Conduct interviews and brainstorming discussions with the project team to gather additional requirements.
» Gather accurate data by data analysis and functional analysis.
Development of data model:
» Create standard abbreviation document for logical, physical and dimensional data models.
» Create logical, physical and dimensional data models(data warehouse data modelling).
» Document logical, physical and dimensional data models (data warehouse data modelling).
Reports:
» Generate reports from data model.
Review:
» Review the data model with functional and technical team.
Creation of database:
» Create SQL code from the data model and coordinate with DBAs to create the database.
» Check that the data models and databases are in sync.
Support & Maintenance:
» Assist developers, ETL, BI team and end users to understand the data model.
» Maintain change log for each data model.
Data Modeling Report
Using data modeling tools, reports can easily be generated for technical and business needs. Reports generated from the logical data model are called business reports, and reports generated from the physical data model are called technical reports. Most data modeling tools provide default reports such as subject area reports, entity reports, attribute reports, table reports, column reports, indexing reports, and relationship reports. The advantage of these reports is that everybody, whether technical or non-technical, can understand what is going on within the organization.
Beyond the default reports provided by data modeling tools, a data modeler can also create customized reports to suit the needs of an organization. For example, if someone asks for both the logical and physical reports of a particular subject area in one file (e.g. in .xls), the logical and physical reports can easily be merged and generated accordingly. Data modeling tools provide sorting and filtering options, and the reports can be exported into file formats like .xls, .doc, .xml, etc.
Logical Data Model Report:
Logical Data Model Report describes information about business such as the entity names, attribute names, definitions, business rules, mapping information etc.
Logical Data Model Report Example:
Physical Data Model Report:
The Physical Data Model Report describes information such as the ownership of the database, the physical characteristics of the database (in Oracle: tablespaces, extents, segments, blocks, partitions, etc.), performance tuning (processors, indexing), table names, column names, data types, relationships between the tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is used by the technical team.
Physical Data Model Report Example:
Conceptual Data Modeling
A conceptual data model includes all major entities and relationships; it does not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.
The conceptual data model is created by gathering business requirements from various sources such as business documents and discussions with functional teams, business analysts, subject matter experts, and end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review.
Conceptual Data Model - Highlights
• CDM is the first step in constructing a data model in top-down approach and is a clear and accurate visual representation of the business of an organization.
• CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization.
• CDM discussion starts with main subject area of an organization and then all the major entities of each subject area are discussed in detail.
• CDM comprises entity types and relationships. The relationships between the subject areas and between the entities in a subject area are drawn using symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities: one-to-one, one-to-many, or many-to-many.
• CDM contains data structures that have not been implemented in the database.
• In the CDM discussion, technical as well as non-technical teams present their ideas for building a sound logical data model.