A data warehouse has four basic characteristics. Together they produce an integrated, centralized collection of data that supports decision making by managerial staff in organizations and companies. Although building a data warehouse can be time consuming and expensive, it offers many benefits. It is typically used for data analysis and queries, with the data accessed through application software that presents it in a usable form. The four data warehouse characteristics are:
Integrated – The data warehouse consolidates data taken from all sources in an organization into one format that is shared across sources and departments. For example, one department might rank a product as “1,” “2,” or “3,” while another department ranks its products as “A,” “B,” or “C.” A common format is needed to avoid such inconsistencies. Integrating data into a single uniform format supports the operational structure of the organization by making every data element unambiguous.
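The ranking example above can be sketched in a few lines of Python. This is a minimal illustration only: the department names, the common codes (HIGH, MEDIUM, LOW), and the assumption that "1" and "A" both mean the top rank are hypothetical and do not come from any particular system.

```python
# Hypothetical source encodings: one department ranks products 1-3,
# another ranks them A-C. Both are mapped to one shared warehouse code.
# Assumption: "1" and "A" both denote the highest rank.
RANK_MAPS = {
    "dept_a": {"1": "HIGH", "2": "MEDIUM", "3": "LOW"},
    "dept_b": {"A": "HIGH", "B": "MEDIUM", "C": "LOW"},
}


def integrate_rank(source: str, raw_rank: str) -> str:
    """Translate a department-specific rank into the common format."""
    try:
        return RANK_MAPS[source][raw_rank.strip().upper()]
    except KeyError:
        raise ValueError(
            f"unknown rank {raw_rank!r} for source {source!r}"
        ) from None


# Rows as they might arrive from each department: (source, product, rank).
rows = [("dept_a", "widget", "1"), ("dept_b", "gadget", "C")]
integrated = [(product, integrate_rank(src, rank)) for src, product, rank in rows]
print(integrated)  # [('widget', 'HIGH'), ('gadget', 'LOW')]
```

Rejecting unknown ranks with an error, rather than passing them through, is one way to keep inconsistent codes out of the warehouse at load time.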
Subject-Oriented – Data in a data warehouse is organized by subject. Subjects may include finance, statistics, sales, production, marketing, etc. Further, each subject can contain additional data topics such as individual customers or regions. Subject-oriented data differs from process-oriented data in that it keeps track of specific subjects rather than the operations or processes that act on them.
Time-Variant – When data is entered into a data warehouse it is time stamped, or given a time ID, which cannot be changed. This supports accurate historical reports. Time stamping makes the data more useful, because reports can show how data changes over a given period, whether a day, a week, a month, or a year. With time-variant data, organizations can also make projections, because the warehouse is continually updated with newly loaded data. Time-variant data lets organizations see a snapshot of data history at a point in time.
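As a rough sketch of how time stamps make period reports possible, the following Python groups time-stamped records by month. The record layout (load date, region, sales amount) and the sample values are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical time-stamped records: (load_date, region, sales_amount).
snapshots = [
    (date(2009, 1, 15), "north", 100.0),
    (date(2009, 1, 28), "north", 150.0),
    (date(2009, 2, 3), "north", 90.0),
]


def totals_by_month(records):
    """Sum amounts per (year, month) using each record's time stamp."""
    totals = defaultdict(float)
    for load_date, _region, amount in records:
        totals[(load_date.year, load_date.month)] += amount
    return dict(totals)


monthly = totals_by_month(snapshots)
print(monthly)  # {(2009, 1): 250.0, (2009, 2): 90.0}
```

The same grouping could be done by day, week, or year simply by changing the key derived from the time stamp.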
Nonvolatile – Data entered into the data warehouse is not changed or deleted by routine operations; it remains static. This preserves an accurate data history while the database grows as new data is continually added. A data warehouse can grow to require vast amounts of storage space, into the multi-terabytes, which often requires organizations to “roll off” the oldest data as new data is entered.
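One way to picture nonvolatile storage with roll-off is an append-only store that offers no update operation and removes only whole rows older than a cutoff. The class below is a toy sketch under that assumption; the name AppendOnlyStore and its methods are hypothetical and do not correspond to any real product.

```python
from datetime import date


class AppendOnlyStore:
    """Toy store: rows can be added and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def append(self, load_date, payload):
        """Add a new row tagged with its load date."""
        self._rows.append((load_date, payload))

    def rows(self):
        """Return a tuple so callers cannot modify the stored list."""
        return tuple(self._rows)

    def roll_off(self, cutoff):
        """Drop rows loaded before cutoff; return how many were removed."""
        before = len(self._rows)
        self._rows = [row for row in self._rows if row[0] >= cutoff]
        return before - len(self._rows)


store = AppendOnlyStore()
store.append(date(2008, 12, 31), "old row")
store.append(date(2009, 6, 1), "new row")
removed = store.roll_off(cutoff=date(2009, 1, 1))
print(removed, len(store.rows()))  # 1 1
```

Roll-off here is the only path by which data leaves the store, which mirrors the idea that history is kept intact until storage limits force the oldest rows out.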
Normalized versus dimensional approach for storage of data
There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. Supporters of the dimensional approach, sometimes called “Kimballites,” follow Ralph Kimball’s view that the data warehouse should be modeled using a dimensional model (star schema). Supporters of the normalized approach, also called the 3NF model, are sometimes called “Inmonites”; they follow Bill Inmon’s view that the data warehouse should be modeled using an E-R model (normalized model).
In a dimensional approach, transaction data are divided into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
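The split between facts and dimensions can be sketched as a small star schema using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical, and for brevity the order date is kept as a plain column on the fact table rather than as its own dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: reference information that gives context to facts.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_number TEXT);

-- Fact table: numeric measurements plus keys pointing at the dimensions.
CREATE TABLE fact_sales (
    order_date  TEXT,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    quantity    INTEGER,  -- fact: number of products ordered
    price_paid  REAL      -- fact: price paid for the products
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product  VALUES (10, 'P-100'), (11, 'P-200');
INSERT INTO fact_sales VALUES
    ('2009-10-01', 1, 10, 2, 20.0),
    ('2009-10-02', 1, 11, 1, 15.0),
    ('2009-10-02', 2, 10, 5, 50.0);
""")

# A typical dimensional query: aggregate a fact, grouped by a dimension.
revenue_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.price_paid)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(revenue_by_customer)  # [('Acme', 35.0), ('Globex', 50.0)]
```

Each report needs only one join per dimension it uses, which is part of why dimensional structures tend to be easy to query.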
A key advantage of a dimensional approach is that the data warehouse is easy for business users to use and understand, because the structure is divided into measurements (facts) and context (dimensions). Additionally, retrieval of data from the data warehouse tends to be very fast. Facts are related to the organization’s business processes and operational systems, whereas the dimensions surrounding them contain context about the measurement (Kimball, Ralph 2008).
The main disadvantages of the dimensional approach are:
- In order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated.
- It is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.
In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises, the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into a separate physical table when the database is implemented (Kimball, Ralph 2008). The main advantage of this approach is that it is straightforward to add information to the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to:
- join data from different sources into meaningful information and then
- access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.
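To illustrate why the number of tables makes a normalized warehouse harder to query, here is a small sketch using Python's built-in sqlite3 module. The schema, table names, and sample rows are hypothetical; a real enterprise model would have many more tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each entity gets its own table, linked to the others by keys.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, order_date TEXT,
                       customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, unit_price REAL);
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity INTEGER);

INSERT INTO region   VALUES (1, 'North'), (2, 'South');
INSERT INTO customer VALUES (1, 'Acme', 1), (2, 'Globex', 2);
INSERT INTO orders   VALUES (100, '2009-10-01', 1), (101, '2009-10-02', 2);
INSERT INTO product  VALUES (10, 10.0), (11, 15.0);
INSERT INTO order_line VALUES (100, 10, 2), (100, 11, 1), (101, 10, 5);
""")

# Revenue by region: the user must know the path through five tables.
revenue_by_region = conn.execute("""
    SELECT r.region_name, SUM(ol.quantity * p.unit_price)
    FROM order_line AS ol
    JOIN orders   AS o ON o.order_id    = ol.order_id
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN region   AS r ON r.region_id   = c.region_id
    JOIN product  AS p ON p.product_id  = ol.product_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(revenue_by_region)  # [('North', 35.0), ('South', 50.0)]
```

Even this small question requires four joins and knowledge of how every table relates to the next, which is the difficulty the two points above describe.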
Both normalized and dimensional models can be represented in entity-relationship diagrams, as both contain joined relational tables. The difference between the two models is the degree of normalization.
These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph 2008).
References: "Intranet Journal: Feature: Finding Your Way Around E-commerce." Intranet Journal. Web. 18 Oct. 2009. <http://www.intranetjournal.com/features/datawarehousing.html>.
Rob, Peter, and Carlos Coronel. Database Systems Design, Implementation, and Management, Eighth Edition. Boston: Course Technology, 2009.
"Kimball Group: Data Warehouse Training: Articles." Kimball Group: Data Warehouse Training, Consulting, and Kimball University. Web. 30 Nov. 2010. <http://www.kimballgroup.com/html/articles.html>.