Thursday, April 21, 2011

ETL Process



ETL process

ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:

- extracting the data from source systems (SAP, ERP, other oprational systems), data from different source systems is converted into one consolidated data warehouse format which is ready for transformation processing.

# - transforming the data may involve the following tasks: applying business rules (so-called derivations, e.g., calculating new measures and dimensions),
# cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.),
# filtering (e.g., selecting only certain columns to load),
# splitting a column into multiple columns and vice versa,
# joining together data from multiple sources (e.g., lookup, merge),
# transposing rows and columns,
# applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty then reject the row from processing)

- loading the data into a data warehouse or data repository other reporting applications

ETL Tols:

ETL tools

List of the most popular ETL tools:

# Informatica - Power Center
# IBM - Websphere DataStage(Formerly known as Ascential DataStage)
# SAP - BusinessObjects Data Integrator
# IBM - Cognos Data Manager (Formerly known as Cognos DecisionStream)
# Microsoft - SQL Server Integration Services
# Oracle - Data Integrator (Formerly known as Sunopsis Data Conductor)
# SAS - Data Integration Studio
# Oracle - Warehouse Builder
# AB Initio
# Information Builders - Data Migrator
# Pentaho - Pentaho Data Integration
# Embarcadero Technologies - DT/Studio
# IKAN - ETL4ALL
# IBM - DB2 Warehouse Edition
# Pervasive - Data Integrator
# ETL Solutions Ltd. - Transformation Manager
# Group 1 Software (Sagent) - DataFlow
# Sybase - Data Integrated Suite ETL
# Talend - Talend Open Studio
# Expressor Software - Expressor Semantic Data Integration System
# Elixir - Elixir Repertoire
# OpenSys - CloverETL

DWH Databse Softwares

OLAP DATA BASES (DWH)LIST

1.TERADATA from NCR
2.GREENPLUM from EMC (Build on POST GRE SQL)
3.NETIZZA
4.ORACLE (after10 version)
5.MS SQL SERVER
These are supported SMP,MPP

MEATADATA DATA BASE

ORACLE
MS SQL SERVER

OLTP SOURCE DATABASES
ORACLE
MSSQL
MYSQL
Progress
SAP (inside ORACLE DATABASE only)
CYBASE

Teradata's main competitors are other high-end solutions from vendors such as Oracle, IBM, and Sybase IQ
which is based on a massively parallel/shared nothing architecture. Recent competition has arisen from data warehouse appliance vendors such as Netezza[14] (IBM is set to acquire Netezza[15]), DATAllegro (acquired in August 2008 by Microsoft), Aster Data Systems (Teradata bought an 11 % stake in Aster Data Systems in September 2010, and agreed to pay additional $263m to take a full ownership in March 2011[16]), ParAccel, Greenplum (acquired in July 2010 by EMC), and Vertica Systems (acquired in Feb 2011 by HP), and from packaged data warehouse applications such as SAP and Kalido. These have slowed Teradata's penetration into the mid-market and some verticals, particularly energy. Kickfire, once a competitor, was acquired by Teradata in August 2010.

BI Software Tool

Business Intelligence(BI) software vendors List
The business intelligence market includes a number of platform or software vendors, often categorized into:
The consolidated big four "megavendors", which are
1. SAP BusinessObjects
2. IBM Cognos
3. Oracle Hyperion -ESSBase-OWB,OBI
4. Microsoft BI
These are corporations who have entered the BI market through a recent trend of acquisitions in the industry.
The remaining independent "pure-play" vendors,
The largest and invidual being
1.SAS
2.MicroStrategy