文章基本信息

标题：Data warehouse: emphasis in decision support.
作者：Malley, John ; Griffin, Ken ; Smallwood, Keith 等
期刊名称：Academy of Information and Management Sciences Journal
印刷版ISSN：1524-7252
出版年度：1999
期号：July
语种：English
出版社：The DreamCatchers Group, LLC
关键词：Corporations;Customer relations;Information storage and retrieval;Metadata;Risk assessment

Data warehouse: emphasis in decision support.

Malley, John ; Griffin, Ken ; Smallwood, Keith 等

INTRODUCTION

A data warehouse is the main essential element to the development of strategic information for the majority of corporations. A data warehouse is a read only analytical database that is used as the foundation of a decision support system (Baum, 1997). A decision support system supplies information to assist employees in making decisions and to enhance job performance. Decision support systems can be used for short term tactical decision making or for long term strategic decision making (Davis, 1996). Many support systems provide for operational systems which run the day-to-day business of the company.

Analytical databases provide information which is used to analyze a problem or situation. Analytical processing is primarily done through comparisons, or by examining patterns or trends. Analytical databases provide a snapshot of data (generally time specific) and are often quite large because they track huge volumes of historical data (Baum, 1997). A data warehouse is an analytical database that is used as the foundation of a decision support system.

Data warehouses exist to facilitate strategic and tactical decision making. A data warehouse is updated periodically, on a predefined basis; an operational system is updated in real time. Management Information System (MIS) reporting systems deliver standardized reports that are limited to a small number templates and the data warehouse must support users performing iterative, ad hoc analysis (Baum, 1997). The data warehouse provides users an analytical foundation for making decisions. The historical information contained in data warehouse is used to develop applications, algorithms, and models for business decisions. An integrated data warehouse gives an organization consistent quantitative figures, so that decision makers can all use the same numbers. The data warehouse has a much higher success rate if it is separate from the operational database, in a hardware environment distinct from operational systems, so that end users can use the warehouse with out effecting the day-to-day operations of the business (Griffin, 1996).

SUCCESSFUL WAREHOUSES

In successful warehouses, historical data is used to develop applications with clear business benefits that directly impact the bottom line. Applications that maximize core business benefits include (Poe, et al., 1998):

* Fraud detection systems detect and prevent fraud before losses are incurred

* Target marketing systems give the business an understanding of customer behaviors and product needs so marketing campaigns can be directed towards those individuals

* Profitability analysis shows which individual customers are profitable and which are not, allowing them to develop appropriate customer management programs

* Customer retention applications help companies identify and keep their profitable customers, which is much more cost- effective than acquiring new customers.

* Inventory management allows retailers and manufacturers to have the right products in the right place at the right time, rather than incurring heavy losses from out-of-stock items

* Credit risk analysis enables companies to avoid bad debt by identifying the best risks among prospects with mixed credit histories.

* Long-term value assessment enables companies to predict which customers will be profitable in the future and which will not.

* Competitive pricing enables companies to develop fundamental new pricing structures by understanding product demand, competitive positioning in the marketplace, and profit margins.

The critical success factors involved with a data warehouse include: designing the data warehouse with a focus on the business and not the technology, use of an iterative development methodology with short cycles and frequent deliveries, and including end users on the implementation team (Poe, et al., 1998). The data warehouse provides the necessary information to key level decision makers in an organization so they can make strategic (long run) or operational (short run) decisions about their organizations (Hackathorn, 1995).

WAREHOUSE ARCHITECTURE

An architecture is a set of rules or structures providing a framework for the overall design of a system or product. The data architecture provides the framework by identifying and understanding how the data will move throughout the system and how it will be used within the corporation.

[FIGURE 1 OMITTED]

The data architecture for a data warehouse has the following characteristics (Poe, et al., 1998):

* Data is extracted from source systems, databases, and files

* The data from the source systems is integrated before being loaded into the data warehouse

* The data warehouse is a separate, read only database designed specifically for decision support processing of large volumes of data

* Users access the data warehouse via some front-end tool or application

A primary component of data architecture for a data warehouse is a read-only database used for decision support. Source fields for a data warehouse may come from different databases, platforms, and a variety of data types and formats. The architecture should help resolve the technical solutions needs of the decision support system and create a solid data warehouse architecture within the parameters you have to work with (Griffin, 1996).

Technical infrastructures are the technologies, platforms, databases, gateways, and other components necessary to make the architecture functional within the corporation. Infrastructures provides the means through which independent users operate and manipulate information resources within the data warehouse (Singh, 1996). The typical levels of users for many organizations are (Poe, et al., 1998):

* Novice--this is the causal user

* Business Analyst--makes decisions from provided information

* Power User--Person that has interaction with different functional areas

* Application developer--designer of the system

[FIGURE 2 OMITTED]

Identifying data warehouse architecture and infrastructures should be a separate project from the actual development of the data warehouse. Figure 3 presents the relationship between architecture and infrastructures.

[FIGURE 3 OMITTED]

DECISION SUPPORT SYSTEM

A DSS (Decision Support System) application is collection of one or more predefined reports, analyses, or data navigation paths which are developed in advance by an application developer or a power user (Hackathorne, 1995). A specific predefined report can generate many unique variations simply by changing the constraints. The data warehouse stores two types of data: source data (run day-to-day business) and target data (data the is inserted into fields within the data warehouse database). The classes of tools users have available to them to manipulate the information include (Poe, et al., 1998):

* Data access/query tools provide a graphical user database to the data warehouse.

* Report writers may also provide a layer of abstraction that allows the assigning of business names to the different columns and tables.

* Multidimensional Database Management Systems (MDBMSs) provide advanced metric support with extensive cut and paste capabilities

* Advanced decision support tools provide advanced multidimensional analysis directly against the relational database management system.

* Executive Information Systems (EISs) provide a structured, big button interface to predefined reports that provide highly summarized top-line information about the business.

The Decision Support Life Cycle flows through ten general phases (Poe, et al., 1998):

* Planning

* Gathering data requirements and modeling

* Physical database design and development

* Data souring, integration, and mapping

* Populating the data warehouse

* Automating the data management process

* Creating the starter set of reports

* Data validation and testing

* Training

* Rollout

PLANNING

Planning involves creating a project plan and defining realistic time estimates which may be difficult because there are altogether new tasks within the decision support life cycle. It is imperative that the data warehouse data architecture and technical infrastructures be thought through before the project development begins. If the data architecture and technical infrastructures have not been established then all of the architecture and infrastructure analysis will need to be added to the project plan. Planning, is basically concerned with the following: defining and/ or clarifying the project scope, creating the project plan, defining the necessary technical resources, both internal and external, defining the business participants and responsibilities, defining the tasks and deliverables, defining the time lines, and defining the final project deliverables.

DATA MODELING AND DESIGN

Gathering data requirements and modeling is concerned with the understanding of the business needs and data requirements of the users of the system. Gathering data requirements includes understanding the following (Poe, et al., 1998):

* How the user does business

* What the business drivers are

* What attributes the user needs

* Which attributes are absolutely required and which attributes are a "wish list"

* What are the business hierarchies

* What data users have now and what would they like to have

* What levels of detail or summary the users need

* What type of front-end data access tools will be used

* How the user expects to see the results of their queries

To minimize these dilemmas, tasks, deliverables, and schedules should be defined that will assist analysts in moving through this phase quickly. The process of building a data warehouse is iterative in nature. Once the first round of data is loaded into the data warehouse and users have a chance to see what data is available to them, there will be changes and additions requested (Darling, 1997). Information collected during this phase will directly feed the data modeling phase.

The central focus of data modeling is to provide a logical data model covering the scope of the development project including relationships, cardinality, attributes, definitions and candidate keys (Greenfield, 1996). The dimensional business model that diagrams the facts, dimensions, hierarchies, relationships, and candidate keys for the scope of the development project. These issues can affect the timing of the data warehouse development and should be addressed before development begins if they were not resolved as technical infrastructures.

The physical database design and development phase covers database design and denormalization. The design and development phase focuses on:

* Designing the database, including fact tables, relationship tables, and description (lookup) tables

* Denormalizing the data

* Identifying keys

* Creating indexing strategies

* Creating appropriate database objects

For this phase, it is imperative that the user have an understanding of and training in the following: decision support concepts, the concepts of hierarchical dimensions and facts, and star schema database design concepts(type of database design used to support analytical processing). This phase of development should also be concerned with (Poe, et al., 1998): development of aggregation strategies, development of partitioning strategies, and refining capacity planning estimates.

The data souring, integration, and mapping phase is done in conjunction with the database design phase, because of the need to target data warehouse database design for the source of target mapping. Figure 4 displays data integration in a data warehouse

[FIGURE 4 OMITTED]

This phase will accomplish the following (Baum, 1997):

* Defining the possible source systems

* Defining file layouts

* Performing data analysis to determine the best (and cleanest) source of data

* Performing data analysis to integrate the data

* Developing written data conversion specifications for each field and refining the integration strategy

* Mapping source to target data

The data that is actually possible to source, which is often quite different from the data requested by end users, may modify your requirements, dimensional business model, and database design.

Populating the data warehouse is concerned with the full process of extracting, converting, and loading data into the target database. This process is often done with the assistance of data conversion technology. Using a data conversion tool will affect the timing of the life-cycle phases and may consolidate tasks and deliverables. This phase focuses on (Poe, et al., 1998):

* Developing programs or using tools to extract and move the data

* Developing load strategies

* Developing the procedures to load the data into the warehouse

* Developing programs or using data conversion tools to integrate data

* Developing update/ refresh strategies

* Testing extract, integration, and load programs and procedures

Technical infrastructures should be in place to assist with the crucial steps of data mapping, conversion, extraction, and loading. These infrastructures may include (Baum, 1997): DBA expertise, data conversion tool programming expertise, source programming expertise, quality assurance procedures, capacity planning expertise, and system/ platform expertise. Figure 5 is an example of technical infrastructures in a data warehouse architecture.

Automating the data load process is primarily concerned with automating the extraction, integration, and load of the data warehouse. This phase includes five steps (Poe et al., 1998):

* Automating and scheduling the data extraction process

* Automating and scheduling the data conversion process

* Automating and scheduling the data load process

* Creating backup and recovery procedures

* Conducting a full test of all the automated procedures

The development of a starter set of reports can begin as soon as the user loads a test subset of data. DSS application development is generally done through the use of data access tools to pre-build several reports. This phase is primarily concerned with (Poe, et al., 1998):

* Creating the starter set of predefined reports

* Testing reports

* Documenting applications

* Developing navigation paths

[FIGURE 5 OMITTED]

Data validation and testing processes should be included throughout the data extract, integration, and load development phases. Basically data validation can occur by three means: using the set of starter reports, using standard processes, and repeatedly changing the data. The new data modifications will be located, extracted, mapped, integrated, and loaded into the data warehouse (Varney, 1996). The training phase of the decision support life cycle is focused on creating training programs for the user community. To gain real business value from the warehouse development, users of all levels will need to be trained in (Poe, et al., 1998):

* The scope of the data in the warehouse

* The front-end access tool and how it works

* How to access and navigate metadata to get information on the data in the warehouse

* The DSS application or starter set of reports--the capabilities and navigation paths

* Ongoing training/user assistance as the system evolves

The rollout phase of the life cycle includes the necessary tasks for the deployment of the data warehouse to the user community. The rollout phase includes (Poe, et al., 1998):

* Installing the physical infrastructures for all users. The components that must be in place for the end user are the LAN/WAN, database connectivity, configured workstations, data access software, and managed metadata

* Deploying the DSS application

* Creating end-user support structures

* Creating procedures for adding new reports and expanding the DSS application

* Setting up procedures to back up the DSS application, not just the data warehouse

* Creating procedures for investing and resolving data integrity and related issues.

* Setting up procedures for metadata management

* Creating change management procedures

Ad hoc feedback from system users should be obtained to modify these steps accordingly as this phase develops.

The overall steps involved in developing the decision support system should match the requirements of the company. The benefits realized from an efficient DSS are (Baum, 1997): gaining a competitive advantage, increased revenues, reduced costs, improved profit, and creating new opportunities. The DSS life cycle is a tedious task and should include horizontal decision making across functional areas in the organization in order to include all constraints. Before moving into the creation of a full blown data warehouse, the company should first develop a pilot conversion project. The pilot project allows users to gain experience with the system, show users the value of decision support information, and clarifies the purpose of the pilot project. There are two main pilot implementation programs available: proof of concept pilot and the architecture and infrastructure. The architecture and infrastructure pilot is concerned with: understanding the complexities involved in developing a data warehouse for decision support, gaining experience with new tools and technologies, getting a sense of realistic time lines and learning curves for tasks, and providing a data warehouse for the purpose of supplying decision support information to users. The proof of the pilot conversion is a short run process and involves the use and conversion of existing technologies and equipment. Doing a pilot of either kind will result in the following advantages (Poe, et al. 111-112):

* A first cut of a dimensional business model

* Ideas on how to design your physical database

* An understanding on the cleanliness of your data

* A working prototype of a data warehouse

* Concrete analytical examples to serve as thought starters for end users

* An understanding of how the data access tools work

The pilot should be treated as a development process separate from the main conversion project. The main technical infrastructures should be in place in order to insure that the conversion process can be accomplished (Baum, 1997). The main understanding involved should clarify several key areas such as: owner of project, scope of project, data access software, training of personnel, finding the best source of data, choosing platforms, creating starter set of reports, and establishing securities. The pilot is a conversion stage that is just preliminary to the design of databases for the warehouse. The entire process is the components that derive the Decision Support Database (DSD). Decision Support Databases are designed to allow users to access information quickly and easily. The data analysis is historical in nature: daily, weekly, or monthly even yearly reports. Decision support systems have the following characteristics:

* Understandable: Data structures must be readily understood by users

* Mostly static: Most changes to the database occur in a controlled manner when data is loaded according to a predefined schedule

* Unpredictable and complex SQL queries: SQL query statements vary considerably from query to query.

* Advanced business measurements often require multiple SQL statements

* Multiple/large/iterative result sets should be supported

* Recoverable: Regular backups, or snapshots, of the static database ensure against data loss

A typical decision support database goals are achieved through the use of a star schema design. A star schema is a simple structure with few tables and well-defined join paths. This database design provides fast query response time and a simple schema that is readily understood by analysts and end users. The star is often used for data warehouse design because it provides faster response times, a simple database design allows users to yield better execution plans, simplifies the understanding and use of metadata for developers, and broadens the choices of front-end data access tools (Poe, et al., 1998).

METADATA

Metadata is data about data and is an important concept development of efficient warehouses. Figure 6 shows how Metadata interfaces with the data warehouse and end user.

A star schema provides two types of tables, fact tables and dimensional tables. Fact tables contain quantitative or factual data about a business and the information being queried. Dimensional tables hold descriptive data that reflects the dimensions and scopes of the business. A star design contains multiple fact tables and the primary key for the fact table is comprised of foreign keys from the dimensional tables. In schemas where the foreign keys from the dimensional tables do not provide a unique identifier, a multi-star schema can be used. In a multi-star schema, the fact table has both a set of foreign keys referencing the dimensions, and a primary key to provide a unique identifier to each row. Dimensions often contain business hierarchies to allow users to drill up and down to the level of detail necessary to provide answers. Aggregation is accumulating fact data along predefined attributes. Aggregation data that is requested by users on a daily basis will often be pre-calculated and loaded into the data warehouse to improve end-user query performance and reduce the number of CPU cycles needed (Zweig, 1996). The determination of which aggregates should be pre-stored will be based upon the frequency of end-user access, as well as the reduction in the total number of rows returned from a query. Data access is a major issue in developing a Decision Support System (DSS). The most common ways people receive information from the data warehouse include (Poe, et al., 1998):

[FIGURE 6 OMITTED]

* Parameter based ad-hoc report--Fixed report formats where the user can change the parameters.

* Electronic access to predefined reports--locations must be easily accessible locations for users to pull up for viewing as needed.

* Full ad hoc analysis--The user interacts directly with the tool to create a brand new analysis from scratch.

* Hard copy reports--predefined fixed format reports are generated, printed, and delivered to the user.

* Executive information system--provides navigation along predefined paths to access predefined analysis.

* Structured decision supports--provides navigation along predefined paths to access predefined and ad hoc reports.

* Unstructured decision support--provides access to all predefined and ad hoc reports.

Not all business users across functional areas have the same data and analytical requirements. An environment for data access includes the data access software, training, support, and a starter set of applications to enable users to access information from the data warehouse. A DSS application is a "starter set" of predefined reports created in the front-end tool to accommodate the need for different levels of users to have pre-built reports to begin their analysis (Baum, 1997). Different classes of tools may be used: report writers, data access/query tools, advanced decision support tools, and multidimensional database management systems (Zweig, 1996). Generally data access have the following characteristics: visualization of the data warehouse, formulation of the request, processing the request, and presentation of the results (Griffin, 1996).

IMPLEMENTATION AND INSTALLATION

The implementation and installation of a decision support data warehouse can not be accomplished effectively without the proper training, support, and rollout. Training should be comprehensive and focus on the following (Poe, et al., 1998):

* Introduction of data warehouse concepts

* Introduction to data, location in the warehouse, and how it relates to already installed reports and systems

* The mechanics of using the tool (Navigation)

* Type of analysis that can be preformed

* Using the tool against the data.

The best instruction is usually customized classes which utilize the particular tools of the required data. Proper support provided either through third party vendors of the installer of the system should be in place. The major support issues involved include (Poe, et al., 1998):

* Validity of the data

* Data use

* Pick reports

* Changes in the front-end application

* Building applications

* Navigation through metadata

* Adding new subject areas (data) to the existing warehouse

The best possible approach to supporting uses is to anticipate problems. Proactive decision thinking will allow system developers to anticipate problems and establish predefined solution models (Greenfield, 1996).

In planning a rollout of the DSS, a company should focus on the user requirements and establish effective strategies in order to effectively meet their needs. A phased rollout will allow management and users of the system to convert key systems first and allow debugging to occur before implementing other areas (Poe, et al., 1998). A rollout plan should have an established time line in order for managers and implementations to judge how effective their efforts at conversion are (Greenfield, 1996).

CONCLUSION

Overall, a database warehouse system is a strong development tool for providing information to key managers for decision support. The process is ad hoc in nature and no predefined established rules govern particular companies. The data warehouse is not an operational system and in many cases users are not required to use it. If the system is too difficult to use and appropriate levels of support and training are not provided then personnel may not use them. From the development of personal computers and the use of strategic information many companies have realized the importance of decision support information and the advances that it can bring to their organizations. The strategic use of information is an important component of successful businesses and will continue to be very important in the future. Managers should develop the skills necessary to manage this technology into the next millennium.

REFERENCES

Baum, D. (1997). Planning and implementing a data warehouse. Byte, June, 120c-120d.

Darling, C. (1997). Ease implementation woes with packaged datamarts. Datamation, March, 94-98.

Davis, D. (1996). Warehousing wannabe. Datamation, November, 146.

Gardner, S. (1997). The quest to standardize Metadata. Byte, November, 47-48.

Greenfield, L. (1996). Don't let data warehousing gotchas getcha. Datamation, Mar., 76-77.

Griffin, J. (1996). Avoid data warehousing maintenance migraines. Datamation, Aug. 74-76.

Hackathorn, R. (1995). Data warehousing energizes your enterprise. Datamation, Feb. 38-40.

Poe, V., P. Klauer & S. Brobst. (1998). Building a data warehouse for decision support. Englewood Cliffs: Prentice Hall Press.

Singh, D. (1996). An empirical investigation of the impact of process monitoring on computer-mediated decision-making performance. Organizational Behavior and Human Decision Processes, Aug., 156-169.

Varney, S. (1996). Datamarts: Coming to an IT mall near you. Datamation, June, 44-46.

Watterson, K. (1997). Attention, data-mart shoppers. Byte, July, 73-74.

Wilson, R. (1994). GIS and decision support systems. Journal of Systems Management, 36-40.

Zweig, P. (1996). Beyond bean-counting. Business Week, Oct. 28, 130-132.

John Malley, University of Central Arkansas

Ken Griffin, Francis Marion University

Keith Smallwood, University of Central Arkansas