Business Object
Data Integration
Module
1: Introduction to BI and ETL
Module
ObjectivesAfter completing this unit , you should be able to :
•Describe
about Business Intelligence
•Describe
about Databwarefeatures, Datawarehouseschemas
•List
and describe about OLTP and DWH
•Describe
about ETL Process
•List
and Describe about Various BI tools including ETL,Database, Reporting and
design tools
What is BI?
•Business
intelligence (BI) is a broad category of application programs and technologies
for gathering, storing, analyzing, and providing access to data to help
enterprise users make better business decisions.
•BI
applications include the activities of decision support, query and reporting,
online analytical processing (OLAP), statistical analysis, forecasting, and
data mining.
BI-Nutshell
Data, Data
everywhere yet ...
•I can’t
find the data I need
–data is
scattered over the network
–many
versions, subtle differences
I can’t get
the data I need
need an expert
to get the data
I can’t
understand the data I found
available data
poorly documented
I can’t use
the data I found
results are
unexpected
data needs to
be transformed from one form to other
What is a
Data Warehouse?
A single,
complete and consistent store of data obtained from a variety of different
sources made available to end users in a what they can understand and use in a
business context.
[Barry Devlin]
What are the
users saying...
•Data
should be integrated across the enterprise
•Summary
data has a real value to the organization
•Historical
data holds the key to understanding data over time
•What-if
capabilities are required
What is Data
Warehousing?
A processof
transforming data into information and making it available to users in a timely
enough manner to make a difference
Data
Warehouse
•A data
warehouse is a
–subject-oriented
–integrated
–time-varying
–non-volatile
–Accessible
collection of
data that is used primarily in organizational decision making.
--Bill Inmon,
Building the Data Warehouse 1996
OLTP vs Data Warehouse
|
|
•OLTP
|
•Warehouse (DSS)
|
–Application Oriented
|
–Subject Oriented
|
–Used to run business
|
–Used to analyze business
|
–Detailed data
|
–Summarized and refined
|
–Current up to date
|
–Snapshot data
|
–Isolated Data
|
–Integrated Data
|
–Repetitive access
|
–Ad-hoc access
|
–Clerical User
|
–Knowledge User (Manager)
|
OLTP vs Data Warehouse
|
|
•OLTP
|
•Data Warehouse
|
–Performance Sensitive
|
–Performance relaxed
|
–Few Records accessed at a time (tens)
|
–Large volumes accessed at a time(millions)
|
–Read/Update Access
|
–Mostly Read (Batch Update)
|
–No data redundancy
|
–Redundancy present
|
–Database Size 100MB -100 GB
|
–Database Size 100 GB -few terabytes
|
–Thousands of users
|
–Hundreds of users
|
To summarize
...
•OLTP
Systems are used to “run”a business
•The
Data Warehouse helps to “optimize”the business
Data
Warehouses:Architecture, Design & Construction
•DW
Architecture
•Loading,
refreshing
•Structuring/Modeling
•DWs and
Data Marts
Data Warehouse
Architectures
•Generic
Two-Level Architecture
–Independent
Data Mart
–Dependent
Data Mart and Operational Data Store
All involve some form of extraction, transformationand
loading(ETL)
Conceptual
Modeling of Data WarehousesModeling data warehouses: dimensions & measures
–Star schema
–Snowflake
schema
–Fact
constellations
*********************************************************************************
*****************************************************************************
The ETL Process
•Capture
•Scrub or data
cleansing
•Transform
•Load
ETL = Extract,
transform, and load
*****************************************************************************************************************************************************************
********************************************************************************
*********************************************************************************
Print this post
No comments:
Post a Comment