Data warehouse: emphasis in decision support.
Malley, John ; Griffin, Ken ; Smallwood, Keith 等
INTRODUCTION
A data warehouse is the main essential element to the development
of strategic information for the majority of corporations. A data
warehouse is a read only analytical database that is used as the
foundation of a decision support system (Baum, 1997). A decision support
system supplies information to assist employees in making decisions and
to enhance job performance. Decision support systems can be used for
short term tactical decision making or for long term strategic decision
making (Davis, 1996). Many support systems provide for operational
systems which run the day-to-day business of the company.
Analytical databases provide information which is used to analyze a
problem or situation. Analytical processing is primarily done through
comparisons, or by examining patterns or trends. Analytical databases
provide a snapshot of data (generally time specific) and are often quite
large because they track huge volumes of historical data (Baum, 1997). A
data warehouse is an analytical database that is used as the foundation
of a decision support system.
Data warehouses exist to facilitate strategic and tactical decision
making. A data warehouse is updated periodically, on a predefined basis;
an operational system is updated in real time. Management Information
System (MIS) reporting systems deliver standardized reports that are
limited to a small number templates and the data warehouse must support
users performing iterative, ad hoc analysis (Baum, 1997). The data
warehouse provides users an analytical foundation for making decisions.
The historical information contained in data warehouse is used to
develop applications, algorithms, and models for business decisions. An
integrated data warehouse gives an organization consistent quantitative
figures, so that decision makers can all use the same numbers. The data
warehouse has a much higher success rate if it is separate from the
operational database, in a hardware environment distinct from
operational systems, so that end users can use the warehouse with out
effecting the day-to-day operations of the business (Griffin, 1996).
SUCCESSFUL WAREHOUSES
In successful warehouses, historical data is used to develop
applications with clear business benefits that directly impact the
bottom line. Applications that maximize core business benefits include
(Poe, et al., 1998):
* Fraud detection systems detect and prevent fraud before losses
are incurred
* Target marketing systems give the business an understanding of
customer behaviors and product needs so marketing campaigns can be
directed towards those individuals
* Profitability analysis shows which individual customers are
profitable and which are not, allowing them to develop appropriate
customer management programs
* Customer retention applications help companies identify and keep
their profitable customers, which is much more cost- effective than
acquiring new customers.
* Inventory management allows retailers and manufacturers to have
the right products in the right place at the right time, rather than
incurring heavy losses from out-of-stock items
* Credit risk analysis enables companies to avoid bad debt by
identifying the best risks among prospects with mixed credit histories.
* Long-term value assessment enables companies to predict which
customers will be profitable in the future and which will not.
* Competitive pricing enables companies to develop fundamental new
pricing structures by understanding product demand, competitive
positioning in the marketplace, and profit margins.
The critical success factors involved with a data warehouse
include: designing the data warehouse with a focus on the business and
not the technology, use of an iterative development methodology with
short cycles and frequent deliveries, and including end users on the
implementation team (Poe, et al., 1998). The data warehouse provides the
necessary information to key level decision makers in an organization so
they can make strategic (long run) or operational (short run) decisions
about their organizations (Hackathorn, 1995).
WAREHOUSE ARCHITECTURE
An architecture is a set of rules or structures providing a
framework for the overall design of a system or product. The data
architecture provides the framework by identifying and understanding how
the data will move throughout the system and how it will be used within
the corporation.
[FIGURE 1 OMITTED]
The data architecture for a data warehouse has the following
characteristics (Poe, et al., 1998):
* Data is extracted from source systems, databases, and files
* The data from the source systems is integrated before being
loaded into the data warehouse
* The data warehouse is a separate, read only database designed
specifically for decision support processing of large volumes of data
* Users access the data warehouse via some front-end tool or
application
A primary component of data architecture for a data warehouse is a
read-only database used for decision support. Source fields for a data
warehouse may come from different databases, platforms, and a variety of
data types and formats. The architecture should help resolve the
technical solutions needs of the decision support system and create a
solid data warehouse architecture within the parameters you have to work
with (Griffin, 1996).
Technical infrastructures are the technologies, platforms,
databases, gateways, and other components necessary to make the
architecture functional within the corporation. Infrastructures provides
the means through which independent users operate and manipulate
information resources within the data warehouse (Singh, 1996). The
typical levels of users for many organizations are (Poe, et al., 1998):
* Novice--this is the causal user
* Business Analyst--makes decisions from provided information
* Power User--Person that has interaction with different functional
areas
* Application developer--designer of the system
[FIGURE 2 OMITTED]
Identifying data warehouse architecture and infrastructures should
be a separate project from the actual development of the data warehouse.
Figure 3 presents the relationship between architecture and
infrastructures.
[FIGURE 3 OMITTED]
DECISION SUPPORT SYSTEM
A DSS (Decision Support System) application is collection of one or
more predefined reports, analyses, or data navigation paths which are
developed in advance by an application developer or a power user
(Hackathorne, 1995). A specific predefined report can generate many
unique variations simply by changing the constraints. The data warehouse
stores two types of data: source data (run day-to-day business) and
target data (data the is inserted into fields within the data warehouse
database). The classes of tools users have available to them to
manipulate the information include (Poe, et al., 1998):
* Data access/query tools provide a graphical user database to the
data warehouse.
* Report writers may also provide a layer of abstraction that
allows the assigning of business names to the different columns and
tables.
* Multidimensional Database Management Systems (MDBMSs) provide
advanced metric support with extensive cut and paste capabilities
* Advanced decision support tools provide advanced multidimensional
analysis directly against the relational database management system.
* Executive Information Systems (EISs) provide a structured, big
button interface to predefined reports that provide highly summarized
top-line information about the business.
The Decision Support Life Cycle flows through ten general phases
(Poe, et al., 1998):
* Planning
* Gathering data requirements and modeling
* Physical database design and development
* Data souring, integration, and mapping
* Populating the data warehouse
* Automating the data management process
* Creating the starter set of reports
* Data validation and testing
* Training
* Rollout
PLANNING
Planning involves creating a project plan and defining realistic
time estimates which may be difficult because there are altogether new
tasks within the decision support life cycle. It is imperative that the
data warehouse data architecture and technical infrastructures be
thought through before the project development begins. If the data
architecture and technical infrastructures have not been established
then all of the architecture and infrastructure analysis will need to be
added to the project plan. Planning, is basically concerned with the
following: defining and/ or clarifying the project scope, creating the
project plan, defining the necessary technical resources, both internal
and external, defining the business participants and responsibilities,
defining the tasks and deliverables, defining the time lines, and
defining the final project deliverables.
DATA MODELING AND DESIGN
Gathering data requirements and modeling is concerned with the
understanding of the business needs and data requirements of the users
of the system. Gathering data requirements includes understanding the
following (Poe, et al., 1998):
* How the user does business
* What the business drivers are
* What attributes the user needs
* Which attributes are absolutely required and which attributes are
a "wish list"
* What are the business hierarchies
* What data users have now and what would they like to have
* What levels of detail or summary the users need
* What type of front-end data access tools will be used
* How the user expects to see the results of their queries
To minimize these dilemmas, tasks, deliverables, and schedules
should be defined that will assist analysts in moving through this phase
quickly. The process of building a data warehouse is iterative in
nature. Once the first round of data is loaded into the data warehouse
and users have a chance to see what data is available to them, there
will be changes and additions requested (Darling, 1997). Information
collected during this phase will directly feed the data modeling phase.
The central focus of data modeling is to provide a logical data
model covering the scope of the development project including
relationships, cardinality, attributes, definitions and candidate keys
(Greenfield, 1996). The dimensional business model that diagrams the
facts, dimensions, hierarchies, relationships, and candidate keys for
the scope of the development project. These issues can affect the timing
of the data warehouse development and should be addressed before
development begins if they were not resolved as technical
infrastructures.
The physical database design and development phase covers database
design and denormalization. The design and development phase focuses on:
* Designing the database, including fact tables, relationship
tables, and description (lookup) tables
* Denormalizing the data
* Identifying keys
* Creating indexing strategies
* Creating appropriate database objects
For this phase, it is imperative that the user have an
understanding of and training in the following: decision support
concepts, the concepts of hierarchical dimensions and facts, and star
schema database design concepts(type of database design used to support
analytical processing). This phase of development should also be
concerned with (Poe, et al., 1998): development of aggregation
strategies, development of partitioning strategies, and refining
capacity planning estimates.
The data souring, integration, and mapping phase is done in
conjunction with the database design phase, because of the need to
target data warehouse database design for the source of target mapping.
Figure 4 displays data integration in a data warehouse
[FIGURE 4 OMITTED]
This phase will accomplish the following (Baum, 1997):
* Defining the possible source systems
* Defining file layouts
* Performing data analysis to determine the best (and cleanest)
source of data
* Performing data analysis to integrate the data
* Developing written data conversion specifications for each field
and refining the integration strategy
* Mapping source to target data
The data that is actually possible to source, which is often quite
different from the data requested by end users, may modify your
requirements, dimensional business model, and database design.
Populating the data warehouse is concerned with the full process of
extracting, converting, and loading data into the target database. This
process is often done with the assistance of data conversion technology.
Using a data conversion tool will affect the timing of the life-cycle
phases and may consolidate tasks and deliverables. This phase focuses on
(Poe, et al., 1998):
* Developing programs or using tools to extract and move the data
* Developing load strategies
* Developing the procedures to load the data into the warehouse
* Developing programs or using data conversion tools to integrate
data
* Developing update/ refresh strategies
* Testing extract, integration, and load programs and procedures
Technical infrastructures should be in place to assist with the
crucial steps of data mapping, conversion, extraction, and loading.
These infrastructures may include (Baum, 1997): DBA expertise, data
conversion tool programming expertise, source programming expertise,
quality assurance procedures, capacity planning expertise, and system/
platform expertise. Figure 5 is an example of technical infrastructures
in a data warehouse architecture.
Automating the data load process is primarily concerned with
automating the extraction, integration, and load of the data warehouse.
This phase includes five steps (Poe et al., 1998):
* Automating and scheduling the data extraction process
* Automating and scheduling the data conversion process
* Automating and scheduling the data load process
* Creating backup and recovery procedures
* Conducting a full test of all the automated procedures
The development of a starter set of reports can begin as soon as
the user loads a test subset of data. DSS application development is
generally done through the use of data access tools to pre-build several
reports. This phase is primarily concerned with (Poe, et al., 1998):
* Creating the starter set of predefined reports
* Testing reports
* Documenting applications
* Developing navigation paths
[FIGURE 5 OMITTED]
Data validation and testing processes should be included throughout
the data extract, integration, and load development phases. Basically
data validation can occur by three means: using the set of starter
reports, using standard processes, and repeatedly changing the data. The
new data modifications will be located, extracted, mapped, integrated,
and loaded into the data warehouse (Varney, 1996). The training phase of
the decision support life cycle is focused on creating training programs
for the user community. To gain real business value from the warehouse
development, users of all levels will need to be trained in (Poe, et
al., 1998):
* The scope of the data in the warehouse
* The front-end access tool and how it works
* How to access and navigate metadata to get information on the
data in the warehouse
* The DSS application or starter set of reports--the capabilities
and navigation paths
* Ongoing training/user assistance as the system evolves
The rollout phase of the life cycle includes the necessary tasks
for the deployment of the data warehouse to the user community. The
rollout phase includes (Poe, et al., 1998):
* Installing the physical infrastructures for all users. The
components that must be in place for the end user are the LAN/WAN,
database connectivity, configured workstations, data access software,
and managed metadata
* Deploying the DSS application
* Creating end-user support structures
* Creating procedures for adding new reports and expanding the DSS
application
* Setting up procedures to back up the DSS application, not just
the data warehouse
* Creating procedures for investing and resolving data integrity
and related issues.
* Setting up procedures for metadata management
* Creating change management procedures
Ad hoc feedback from system users should be obtained to modify
these steps accordingly as this phase develops.
The overall steps involved in developing the decision support
system should match the requirements of the company. The benefits
realized from an efficient DSS are (Baum, 1997): gaining a competitive
advantage, increased revenues, reduced costs, improved profit, and
creating new opportunities. The DSS life cycle is a tedious task and
should include horizontal decision making across functional areas in the
organization in order to include all constraints. Before moving into the
creation of a full blown data warehouse, the company should first
develop a pilot conversion project. The pilot project allows users to
gain experience with the system, show users the value of decision
support information, and clarifies the purpose of the pilot project.
There are two main pilot implementation programs available: proof of
concept pilot and the architecture and infrastructure. The architecture
and infrastructure pilot is concerned with: understanding the
complexities involved in developing a data warehouse for decision
support, gaining experience with new tools and technologies, getting a
sense of realistic time lines and learning curves for tasks, and
providing a data warehouse for the purpose of supplying decision support
information to users. The proof of the pilot conversion is a short run
process and involves the use and conversion of existing technologies and
equipment. Doing a pilot of either kind will result in the following
advantages (Poe, et al. 111-112):
* A first cut of a dimensional business model
* Ideas on how to design your physical database
* An understanding on the cleanliness of your data
* A working prototype of a data warehouse
* Concrete analytical examples to serve as thought starters for end
users
* An understanding of how the data access tools work
The pilot should be treated as a development process separate from
the main conversion project. The main technical infrastructures should
be in place in order to insure that the conversion process can be
accomplished (Baum, 1997). The main understanding involved should
clarify several key areas such as: owner of project, scope of project,
data access software, training of personnel, finding the best source of
data, choosing platforms, creating starter set of reports, and
establishing securities. The pilot is a conversion stage that is just
preliminary to the design of databases for the warehouse. The entire
process is the components that derive the Decision Support Database
(DSD). Decision Support Databases are designed to allow users to access
information quickly and easily. The data analysis is historical in
nature: daily, weekly, or monthly even yearly reports. Decision support
systems have the following characteristics:
* Understandable: Data structures must be readily understood by
users
* Mostly static: Most changes to the database occur in a controlled
manner when data is loaded according to a predefined schedule
* Unpredictable and complex SQL queries: SQL query statements vary
considerably from query to query.
* Advanced business measurements often require multiple SQL
statements
* Multiple/large/iterative result sets should be supported
* Recoverable: Regular backups, or snapshots, of the static
database ensure against data loss
A typical decision support database goals are achieved through the
use of a star schema design. A star schema is a simple structure with
few tables and well-defined join paths. This database design provides
fast query response time and a simple schema that is readily understood
by analysts and end users. The star is often used for data warehouse
design because it provides faster response times, a simple database
design allows users to yield better execution plans, simplifies the
understanding and use of metadata for developers, and broadens the
choices of front-end data access tools (Poe, et al., 1998).
METADATA
Metadata is data about data and is an important concept development
of efficient warehouses. Figure 6 shows how Metadata interfaces with the
data warehouse and end user.
A star schema provides two types of tables, fact tables and
dimensional tables. Fact tables contain quantitative or factual data
about a business and the information being queried. Dimensional tables
hold descriptive data that reflects the dimensions and scopes of the
business. A star design contains multiple fact tables and the primary
key for the fact table is comprised of foreign keys from the dimensional
tables. In schemas where the foreign keys from the dimensional tables do
not provide a unique identifier, a multi-star schema can be used. In a
multi-star schema, the fact table has both a set of foreign keys
referencing the dimensions, and a primary key to provide a unique
identifier to each row. Dimensions often contain business hierarchies to
allow users to drill up and down to the level of detail necessary to
provide answers. Aggregation is accumulating fact data along predefined
attributes. Aggregation data that is requested by users on a daily basis
will often be pre-calculated and loaded into the data warehouse to
improve end-user query performance and reduce the number of CPU cycles
needed (Zweig, 1996). The determination of which aggregates should be
pre-stored will be based upon the frequency of end-user access, as well
as the reduction in the total number of rows returned from a query. Data
access is a major issue in developing a Decision Support System (DSS).
The most common ways people receive information from the data warehouse
include (Poe, et al., 1998):
[FIGURE 6 OMITTED]
* Parameter based ad-hoc report--Fixed report formats where the
user can change the parameters.
* Electronic access to predefined reports--locations must be easily
accessible locations for users to pull up for viewing as needed.
* Full ad hoc analysis--The user interacts directly with the tool
to create a brand new analysis from scratch.
* Hard copy reports--predefined fixed format reports are generated,
printed, and delivered to the user.
* Executive information system--provides navigation along
predefined paths to access predefined analysis.
* Structured decision supports--provides navigation along
predefined paths to access predefined and ad hoc reports.
* Unstructured decision support--provides access to all predefined
and ad hoc reports.
Not all business users across functional areas have the same data
and analytical requirements. An environment for data access includes the
data access software, training, support, and a starter set of
applications to enable users to access information from the data
warehouse. A DSS application is a "starter set" of predefined
reports created in the front-end tool to accommodate the need for
different levels of users to have pre-built reports to begin their
analysis (Baum, 1997). Different classes of tools may be used: report
writers, data access/query tools, advanced decision support tools, and
multidimensional database management systems (Zweig, 1996). Generally
data access have the following characteristics: visualization of the
data warehouse, formulation of the request, processing the request, and
presentation of the results (Griffin, 1996).
IMPLEMENTATION AND INSTALLATION
The implementation and installation of a decision support data
warehouse can not be accomplished effectively without the proper
training, support, and rollout. Training should be comprehensive and
focus on the following (Poe, et al., 1998):
* Introduction of data warehouse concepts
* Introduction to data, location in the warehouse, and how it
relates to already installed reports and systems
* The mechanics of using the tool (Navigation)
* Type of analysis that can be preformed
* Using the tool against the data.
The best instruction is usually customized classes which utilize
the particular tools of the required data. Proper support provided
either through third party vendors of the installer of the system should
be in place. The major support issues involved include (Poe, et al.,
1998):
* Validity of the data
* Data use
* Pick reports
* Changes in the front-end application
* Building applications
* Navigation through metadata
* Adding new subject areas (data) to the existing warehouse
The best possible approach to supporting uses is to anticipate
problems. Proactive decision thinking will allow system developers to
anticipate problems and establish predefined solution models
(Greenfield, 1996).
In planning a rollout of the DSS, a company should focus on the
user requirements and establish effective strategies in order to
effectively meet their needs. A phased rollout will allow management and
users of the system to convert key systems first and allow debugging to
occur before implementing other areas (Poe, et al., 1998). A rollout
plan should have an established time line in order for managers and
implementations to judge how effective their efforts at conversion are
(Greenfield, 1996).
CONCLUSION
Overall, a database warehouse system is a strong development tool
for providing information to key managers for decision support. The
process is ad hoc in nature and no predefined established rules govern
particular companies. The data warehouse is not an operational system
and in many cases users are not required to use it. If the system is too
difficult to use and appropriate levels of support and training are not
provided then personnel may not use them. From the development of
personal computers and the use of strategic information many companies
have realized the importance of decision support information and the
advances that it can bring to their organizations. The strategic use of
information is an important component of successful businesses and will
continue to be very important in the future. Managers should develop the
skills necessary to manage this technology into the next millennium.
REFERENCES
Baum, D. (1997). Planning and implementing a data warehouse. Byte,
June, 120c-120d.
Darling, C. (1997). Ease implementation woes with packaged
datamarts. Datamation, March, 94-98.
Davis, D. (1996). Warehousing wannabe. Datamation, November, 146.
Gardner, S. (1997). The quest to standardize Metadata. Byte,
November, 47-48.
Greenfield, L. (1996). Don't let data warehousing gotchas
getcha. Datamation, Mar., 76-77.
Griffin, J. (1996). Avoid data warehousing maintenance migraines.
Datamation, Aug. 74-76.
Hackathorn, R. (1995). Data warehousing energizes your enterprise.
Datamation, Feb. 38-40.
Poe, V., P. Klauer & S. Brobst. (1998). Building a data
warehouse for decision support. Englewood Cliffs: Prentice Hall Press.
Singh, D. (1996). An empirical investigation of the impact of
process monitoring on computer-mediated decision-making performance.
Organizational Behavior and Human Decision Processes, Aug., 156-169.
Varney, S. (1996). Datamarts: Coming to an IT mall near you.
Datamation, June, 44-46.
Watterson, K. (1997). Attention, data-mart shoppers. Byte, July,
73-74.
Wilson, R. (1994). GIS and decision support systems. Journal of
Systems Management, 36-40.
Zweig, P. (1996). Beyond bean-counting. Business Week, Oct. 28,
130-132.
John Malley, University of Central Arkansas
Ken Griffin, Francis Marion University
Keith Smallwood, University of Central Arkansas