Article information

Title: Part 4: Getting to the Bottom of Things
Author: Rooney, James J
Journal: The Journal for Quality and Participation
Print ISSN: 1040-9602
Electronic ISSN: 1931-4019
Year: 2005
Volume: Summer 2005
Publisher: American Society for Quality

Part 4: Getting to the Bottom of Things

Rooney, James J

A seven-part series to improve your organization's problem-solving efforts

In the past three issues, we have reviewed ways to improve structured problem solving by making the process more enjoyable for the participants. The six steps for problem solving described in the sidebar (see p. 16) are designed to identify and eliminate root causes. This series focuses on identifying the human psychological factors that generate resistance to a structured, "facts-and-data-based" approach to problem solving and on presenting suggestions for tapping into the participants' natural creativity and intuition without jeopardizing reliability.

In the first installment, we compared the six-step methodology to the process used for treasure hunting, a pursuit that most people find fun and rewarding. The next two segments reviewed how to define the problem and gather data about the problem's current state.

Moving From Suspected Causes to Root Causes

As described in the spring issue, the tasks in the second step of the structured problem-solving process involve brainstorming a list of potential hypotheses (suspected causes) and conducting a data-gathering jamboree to compile facts and data in support or refutation of those hypotheses. The three most likely potential causes are then identified and additional, more in-depth data are collected for each of them.

Analysis of the more comprehensive facts and data generally indicates that one of the three most likely causes contributes more significantly to the problem. The Pareto principle, or the 80/20 rule, predicts that 80% of the problem is generated by 20% of the causes. In other words, a vital few causes contribute significantly to the poor performance results and warrant problem-solving efforts. The "trivial many" causes generally can be ignored, allowing team efforts to focus on solving other problems.
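The Pareto logic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the team has already tallied how many occurrences of the problem were traced to each candidate cause; the `pareto_vital_few` helper, the cause names, and the counts are hypothetical. Causes are ranked by count and accumulated until they account for the chosen share (here 80%) of all occurrences.

```python
def pareto_vital_few(tally, threshold_pct=80):
    """Rank causes by occurrence count and keep the smallest set of
    top-ranked causes that accounts for at least threshold_pct percent
    of all occurrences (the "vital few")."""
    total = sum(tally.values())
    ranked = sorted(tally.items(), key=lambda item: item[1], reverse=True)
    vital_few = []
    cumulative = 0
    for cause, count in ranked:
        # Integer comparison avoids floating-point rounding at the cutoff.
        if 100 * cumulative >= threshold_pct * total:
            break
        cumulative += count
        vital_few.append((cause, count, round(100 * cumulative / total, 1)))
    return vital_few


# Hypothetical tally: occurrences of the problem traced to each cause.
tally = {
    "cause A": 50, "cause B": 30, "cause C": 5, "cause D": 4, "cause E": 3,
    "cause F": 3, "cause G": 2, "cause H": 1, "cause I": 1, "cause J": 1,
}

for cause, count, cumulative_pct in pareto_vital_few(tally):
    print(f"{cause}: {count} occurrences ({cumulative_pct}% cumulative)")
# cause A: 50 occurrences (50.0% cumulative)
# cause B: 30 occurrences (80.0% cumulative)
```

Here two of ten causes (20%) account for 80% of the occurrences. With real data the split rarely lands exactly on 80/20, which is why the cutoff is a parameter rather than a fixed rule.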

If, however, none of the initially investigated causes appears to greatly influence the problem, the team returns to its original brainstorming and identifies three other hypotheses to investigate. Once a significant cause emerges, the third step, "determine the root cause," begins.

What is a Root Cause?

Although there is substantial debate concerning the term root cause, we'll define it as a specific, underlying cause that can be identified reasonably and for which effective recommendations for preventing recurrence can be generated (ABS Consulting, 1999). Additionally, we will limit root causes to those over which the team or its sponsors can exert influence and control.

This definition contains the following four key elements:

* Root causes are underlying causes. One pitfall that many teams encounter is confusing symptoms and causes. This matter is discussed in the next section of this article, "Symptom or Cause?"

* Root causes can be identified reasonably. Occurrence investigations must be cost beneficial. It is not practical to keep valuable manpower occupied indefinitely searching for the root causes of occurrences. Another article in this issue of The Journal for Quality and Participation, "Practical Tips: How Many Causes Should You Pursue?" offers a framework for determining how many root causes to eliminate before the team moves on to a new problem.

* Effective recommendations can be generated for root causes. Recommendations should address the root causes directly and should describe specific change requirements. Teams should avoid using general cause classifications such as "employee error" or "equipment failure" or offering vague recommendations, such as "Improve adherence to written policies and procedures." Only when a specific root cause is identified can a specific solution be developed and a change plan implemented.

* The team and/or its sponsors have influence and control over root causes. We also must identify a root cause for which the team and/or its managers can drive change. The root cause "Supplier sends us off-specification parts" is outside the control of the team and its sponsors and is really an excuse, not a root cause. On the other hand, "We accept off-specification parts" and "We haven't properly defined our requirements" are root causes that the organization can address.
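One way to make the four criteria concrete is to treat them as a checklist the team applies to each candidate cause. The Python sketch below is a hypothetical illustration, not a published tool: the `CandidateCause` class and its field names are assumptions, and the true/false judgments are the team's to make, not something code can determine. The supplier and acceptance examples come from the text; "Employee error" illustrates the general classification warned against above.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    """A suspected cause, with the team's judgment on each of the four
    root-cause criteria (field names are illustrative assumptions)."""
    statement: str
    underlying: bool               # a cause, not merely a symptom
    reasonably_identifiable: bool  # can be found at a reasonable cost
    specific_fix_possible: bool    # supports a specific, actionable recommendation
    within_control: bool           # the team or its sponsors can drive change

    def failed_criteria(self):
        """Return the names of the criteria this candidate does not meet."""
        checks = {
            "underlying cause": self.underlying,
            "reasonably identifiable": self.reasonably_identifiable,
            "specific recommendation possible": self.specific_fix_possible,
            "within team/sponsor control": self.within_control,
        }
        return [name for name, met in checks.items() if not met]

    def is_root_cause(self):
        return not self.failed_criteria()


# The true/false judgments below are hypothetical team assessments.
candidates = [
    CandidateCause("Supplier sends us off-specification parts", True, True, True, False),
    CandidateCause("We accept off-specification parts", True, True, True, True),
    CandidateCause("Employee error", False, True, False, True),
]

for c in candidates:
    verdict = "root cause" if c.is_root_cause() else f"fails: {', '.join(c.failed_criteria())}"
    print(f"{c.statement} -> {verdict}")
```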

Symptom or Cause?

Webster's Online Dictionary defines a symptom as "a sign or token; that which indicates the existence of something else." Indeed, the "something else" is the cause, which is defined as "that which produces or effects a result; that from which anything proceeds, and without which it would not exist."

Doctors clearly understand the difference between these two phenomena. Although we may receive treatment to alleviate symptoms, few of us would feel satisfied if our medical professionals didn't identify and attempt to eliminate (or instruct us on how to eliminate) the underlying causes. For example, topical anti-itch lotions or corticosteroid injections may relieve the symptoms of poison ivy, but exposure to urushiol, the plant's active allergen, is the cause.

In the workplace, however, it may not be so easy to separate symptoms from causes. The issue is made worse by team members' tendency to have preconceived solutions in mind and, therefore, to rush through the root-cause analysis step so they can design creative fixes. Numerous studies of successful problem solving have shown that identifying true root causes, and later implementing solutions that permanently eliminate them, is the single most important factor in generating enduring change and improved results.

Root-Cause Analysis

Root-cause analysis is a process designed for investigating and identifying not only what happened and how it happened but also why it happened (AIChE, 1992). Only when teams are able to determine why a problem occurred are they able to devise workable corrective measures that prevent its recurrence.

Imagine a problem occurs when an employee is instructed to open door one but instead closes door two. A typical investigation might conclude that "employee error" was the cause. This is an accurate description of what happened and how it happened.

In this case, we probably would see recommendations such as "Retrain the employee on the procedure," "Remind all employees to be alert when using doors," or "Emphasize to all personnel that careful attention to the job should be maintained at all times." These recommendations would do little to prevent future occurrences of the problem.

Generally, mistakes do not "just happen." They can be traced to some well-defined causes. In the case of the door error, we might use a process often referred to as "asking why five times" to get to the root cause. This approach is direct and works well when the initial cause relates to machinery, materials, measurements, methods, information, or environment.

When the initial causes involve specific people, their decisions, or their actions, asking "why" may make them feel as if you are blaming them for the problem; therefore, it is best to rephrase the questions in a way that explores why without seeming to censure the involved parties. We do this by exploring the circumstances that surrounded the initial cause.

Let's revisit the example of the employee closing door two instead of opening door one. Instead of saying, "Why did you do that?" you can ask questions such as the following:

* What verbal/written instructions did you receive?

* What training did you receive?

* Have you ever performed this task before?

* What did you do immediately prior to performing this task?

* What did you do immediately after performing this task?

* Are the doors clearly labeled?

* Was door one unlocked, unblocked, and in usable condition?

Even an indirect question, such as "What caused you to close door two?" or "What prevented you from opening door one?" is less likely to create tension than asking direct "why" questions.

The answers to these and other questions will help determine why the error took place and what the organization can do to prevent recurrence.
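The answers gathered this way can be recorded as a chain, each answer explaining the one before it, and followed until no deeper answer is available or five levels are reached. The Python sketch below assumes that record is a simple mapping from each finding to the circumstance behind it; the `follow_why_chain` helper and every answer in the door example are hypothetical, invented for illustration.

```python
def follow_why_chain(problem, explanations, max_depth=5):
    """Walk from the problem through recorded explanations ("what caused
    this?") until no deeper explanation exists or max_depth is reached.
    Returns the full chain; its last entry is the deepest cause found."""
    chain = [problem]
    for _ in range(max_depth):
        deeper = explanations.get(chain[-1])
        if deeper is None:
            break  # no further answer recorded: investigate more or stop here
        chain.append(deeper)
    return chain


# Hypothetical answers to the circumstance questions for the door example.
explanations = {
    "Door two was closed instead of door one being opened":
        "The doors are not clearly labeled",
    "The doors are not clearly labeled":
        "The labels were painted over during maintenance",
    "The labels were painted over during maintenance":
        "The maintenance procedure has no step for restoring labels",
}

chain = follow_why_chain(
    "Door two was closed instead of door one being opened", explanations)
for level, finding in enumerate(chain):
    print("  " * level + finding)
```

The depth limit mirrors the rule of thumb of asking "why" five times; in practice the team stops when it reaches a cause it can reasonably identify and act on.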

Identifying root causes is the key to preventing similar occurrences of the problem in the future. An added benefit of effective root-cause analysis is that, over time, root causes associated with multiple problems can be used to target major opportunities for improvement. For example, if a significant number of analyses point to inadequacies in the procurement processes, resources can be focused on improving this system. This means that the "trivial many" causes eventually are tackled and solved if they contribute to many problems and significantly degrade the organization's overall performance.

Cause Systems

We can categorize causes into systems that show the relations among the causes. In some cases, the causes in a system are independent: an occurrence of any one of them causes the problem to occur to some degree. In other cases, the causes in a system are dependent: they must occur in combination for the problematic result to occur. In many complex problems, both independent and dependent cause systems are present.

Independent causes are often hierarchical in nature. If we ask "why" a particular independent cause happens, we will find that there is a lower-level cause that triggers it, like a tree with leaves on branches that connect to the trunk, which finally connects to the roots.
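Such a hierarchy can be represented as a nested mapping in which each cause maps to the lower-level causes that trigger it. The Python sketch below, assuming that representation, lists every path from the problem down to a lowest-level cause; the `root_paths` helper and the car-starting causes are hypothetical and are not taken from the article's figures.

```python
def root_paths(tree, path=()):
    """Yield each path from the top of a cause hierarchy down to a cause
    that has no recorded lower-level cause (a candidate root)."""
    for cause, lower_level in tree.items():
        current = path + (cause,)
        if lower_level:
            yield from root_paths(lower_level, current)
        else:
            yield current


# Hypothetical hierarchy: each key is a cause, each value the causes beneath it.
hierarchy = {
    "Car won't start": {
        "Battery is dead": {
            "Headlights were left on": {},
            "Alternator is not charging": {"Drive belt is broken": {}},
        },
        "No fuel reaches the engine": {"Fuel tank is empty": {}},
    }
}

for path in root_paths(hierarchy):
    print(" <- ".join(path))
```

Because each path on its own explains the problem, this representation fits independent causes; dependent causes need the AND logic that fault trees, described below, provide.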

Several data collection and analysis tools are used regularly to help display the relationships among causes and help the team determine which causes are significant enough to warrant change. These include the cause-and-effect diagram, fault tree, and causal factor charts, as described below.

Although we will not describe the specific steps used to generate these tools, many books and articles are available from the American Society for Quality's online library, Quality InfoSearch (http://qic.asq.org/) to help team members learn the steps. We will focus on how these tools are used to help teams fulfill the third step of the structured problem-solving process. Examples are provided for each tool, but they are greatly simplified so that the emphasis is on the format and purpose of the tool, not the specific content of the example.

Cause-and-Effect Diagram

The cause-and-effect diagram, pioneered by Kaoru Ishikawa, arranges causes according to their level of importance or detail, depicting relationships and a hierarchy of events. A statement of the problem (i.e., the effect) appears on the right side of the diagram and is enclosed in a box. The left side shows major potential causes as branches and related subcauses as "twigs" (Kane, 1989). Because these diagrams resemble the skeleton of a fish, they often are called fishbone diagrams. This diagram can help team members search for root causes and compare the relative importance of different causes.

One format for cause-and-effect diagrams is called "4Ms and a PIE," or dispersion analysis. It organizes the causes into a basic set of seven major branches: machinery, methods, materials, measurements (4Ms), and people, information, and environment (PIE). This format is particularly useful when you want to trigger brainstorming about potential causes, as the team does when it "gathers data about the current state" in the second step of the problem-solving process. It provides a simple direction to guide the brainstorming process without limiting the input. Figure 1 shows a dispersion analysis cause-and-effect diagram for the problem of a car that won't start.
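A facilitator could keep the seven "4Ms and a PIE" branches as a fixed skeleton and check which branches have not yet drawn any ideas, prompting the team without restricting what it adds. The Python sketch below is one hypothetical way to do that; the helper names and the car-starting ideas are illustrative assumptions, not the contents of Figure 1.

```python
BRANCHES = (
    "machinery", "methods", "materials", "measurements",  # the 4Ms
    "people", "information", "environment",               # PIE
)


def new_diagram(effect):
    """Create an empty dispersion-analysis diagram for the stated effect."""
    return {"effect": effect, "branches": {branch: [] for branch in BRANCHES}}


def add_idea(diagram, branch, idea):
    """Record a brainstormed potential cause on one of the seven branches."""
    if branch not in diagram["branches"]:
        raise ValueError(f"unknown branch: {branch!r}")
    diagram["branches"][branch].append(idea)


def unexplored_branches(diagram):
    """Branches with no ideas yet: prompts for the next round of brainstorming."""
    return [branch for branch, ideas in diagram["branches"].items() if not ideas]


diagram = new_diagram("Car won't start")
add_idea(diagram, "machinery", "Starter motor is faulty")                # hypothetical
add_idea(diagram, "materials", "Fuel tank is empty")                     # hypothetical
add_idea(diagram, "environment", "Overnight temperature was very low")   # hypothetical
print("Ask the team about:", ", ".join(unexplored_branches(diagram)))
```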

In a cause-systems diagram, the second type of cause-and-effect diagram, the major bones (branches) are categories of causes and the minor bones (twigs) are more specific, lower-level causes. This format is most useful for planning data analysis because it groups similar causes together; generally, similar causes are proved or disproved by the same set of collected data.

When teams struggle with differentiating symptoms and causes, they can use a modified cause-systems diagram that displays symptoms on the major bones and related causes on the minor bones. Figure 2 shows an example of this for a medical problem. Although for a particular patient any one of the causes shown on this diagram might prove the most significant and require treatment, it becomes clear that diabetes is the most commonly occurring cause and is therefore worth attention.

Less familiar to many practitioners is the process flow cause-and-effect diagram, which displays the problematic process steps on the major bones and the potential causes related to those steps on the minor bones. This format is particularly valuable when the onset of the problem is traceable to a particular date or dates, which usually correspond to a change in process design or a periodic influence on the process. Figure 3 displays a process flow cause-and-effect diagram for an overdrawn checking account.

Fault Trees

A fault tree shows the different combinations of events that must occur for the problem, which is called a loss event, to occur (Ferry, 1988). The tree is developed level by level, starting at the top with the problem and working down. As each level is developed, data are gathered to determine if each branch could have contributed to the event. Development of branches that are eliminated is stopped. For each branch that is not eliminated, the tree is developed further until the root causes are identified. This results in a complete, effective, and efficient analysis.

The fault tree in Figure 4 depicts a situation in which a critical pump is not operating as intended. That situation occurs because of both the failure of the online pump and the lack of a timely response to that failure. The pump failure could be caused by any of the three subsidiary failures shown in the fault tree; similarly, the lack of a timely response has two possible subsidiary causes. Verification testing eliminates three of the potential causes, indicating that the problem occurred because of two contingent causes: the downstream valve spuriously failed to open, and no position indication for the downstream valve or pump parameters appeared on the control panel.
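The logic of Figure 4 can be sketched as AND and OR gates evaluated against the verification results. In this hypothetical Python sketch, a gate is a `(type, children)` tuple and a basic event is a string; a branch whose events were all eliminated returns `None`, mirroring the rule that eliminated branches are not developed further. The two confirmed causes come from the text, but the three eliminated causes are placeholders, since the article does not name them.

```python
def explain(node, confirmed):
    """Return the set of confirmed basic events that explain `node`,
    or None if verification has ruled the node out."""
    if isinstance(node, str):  # basic event
        return {node} if node in confirmed else None
    gate, children = node
    results = [explain(child, confirmed) for child in children]
    if gate == "AND":  # every input must have occurred
        if any(result is None for result in results):
            return None
        return set().union(*results)
    if gate == "OR":  # any one input is enough
        surviving = [result for result in results if result is not None]
        return set().union(*surviving) if surviving else None
    raise ValueError(f"unknown gate: {gate!r}")


VALVE = "Downstream valve spuriously failed to open"
NO_INDICATION = "No position indication on the control panel"

# Top event: critical pump not operating as intended.
pump_fault_tree = ("AND", [
    ("OR", [VALVE,                                   # online pump failure
            "Eliminated pump cause 1 (placeholder)",
            "Eliminated pump cause 2 (placeholder)"]),
    ("OR", [NO_INDICATION,                           # no timely response
            "Eliminated response cause (placeholder)"]),
])

# Basic events that verification testing did NOT eliminate.
confirmed = {VALVE, NO_INDICATION}
print(sorted(explain(pump_fault_tree, confirmed)))
```

If only the valve failure had been confirmed, `explain` would return `None`: the AND gate would not be satisfied, signaling that the team has not yet explained the loss event and must keep investigating.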

Fault trees help the team think logically through the problem, focus its data-gathering efforts, and identify causal factors and root causes in the most efficient and effective manner.

Causal Factor Chart

A causal factor chart displays the sequence of events that led up to the problem/loss event and the cause-and-effect relationships between those events (Department of Energy, 1985).

Causal factor charts provide a structure for team members to organize and analyze the information gathered during the investigation and to identify gaps and deficiencies in knowledge as the investigation progresses. The causal factor chart is simply a sequence diagram with logic tests that describe the events leading up to an occurrence, as well as the conditions surrounding these events (Department of Energy, 1991).

Causal factors are the key events that, had they been prevented or handled differently, would have kept the problem from occurring. They are steps in the process that either should not have occurred or should have occurred in a different way. They may or may not be the same as the root causes of the problem.

For instance, suppose we go back to our example of the two doors. As we investigate the problem we learn that there are two reasons why employees do not consistently close door one. The problem-solving team finds that one group of employees never received training on the need to keep door one closed during general operations; in these cases, the root cause of the problem is the lack of training.

On the other hand, the team also learns that door one provides access to the sprinkler control. Occasionally, a trained employee will approach door one to close it and discover that fire department personnel are conducting an inspection, which requires the door to remain open. Unfortunately, local fire codes require that the sprinkler system be maintained in this area; therefore, the team and its sponsors do not have the authority to change this usage of the door. In this case, the team has identified a causal factor that is not also a root cause. At that point, the team must investigate further to determine why the process continued during the fire department inspection, given that having door one open negatively affected its results. The answer to that question becomes the root cause.

The causal factor chart illustrated in Figure 5 details an investigation into the release of a flammable material inside a building, as indicated in the solid block on the right side of the diagram.

The chart is developed from left to right and from bottom to top. The main event line at the bottom of the chart, which is outlined in color, summarizes the process that led up to the problematic incident.

Once the main event line is established, each of those events is investigated in more detail by asking "why" it occurred. The causal factors identified in this way appear as the shaded boxes in this diagram, and the preventive solution developed by the team is designed to prevent their recurrence. In the case shown in Figure 5, the loss event had five causal factors.

This technique can be very useful for evaluating problems that involve many individuals and/or a large number of human actions. All of the data gathered during the analysis (from interviews, parts analysis, paper reviews, etc.) are combined on one chart. This helps to identify inconsistencies in the data, irrelevant information, and gaps in understanding the sequence of events.
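At its core, a causal factor chart is an ordered list of events, each backed (or not yet backed) by evidence. The Python sketch below is a hypothetical simplification that ignores the chart's graphical layout: it keeps the main event line in chronological order, flags causal factors, and reports events that no interview, parts analysis, or document review yet supports. The `Event` class, the helper names, and the events shown are invented for illustration and are not the contents of Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    description: str
    evidence: list[str] = field(default_factory=list)  # sources supporting this event
    causal_factor: bool = False  # would changing this event have prevented the loss?


def causal_factors(chart):
    """Causal factors in the order they occurred on the event line."""
    return [event.description for event in chart if event.causal_factor]


def knowledge_gaps(chart):
    """Events not yet supported by any evidence: targets for more data gathering."""
    return [event.description for event in chart if not event.evidence]


# Hypothetical main event line, left to right (earliest first).
chart = [
    Event("Operator starts solvent transfer", ["operator interview"]),
    Event("Transfer hose connected to wrong fitting",
          ["operator interview", "site photographs"], causal_factor=True),
    Event("Fitting gasket degrades", [], causal_factor=True),
    Event("Flammable vapor released inside building", ["gas detector log"]),
]

print("Causal factors:", causal_factors(chart))
print("Gaps:", knowledge_gaps(chart))
```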

Causal factor charts have benefits similar to those for fault trees. In addition, these charts provide a useful illustration to describe the sequence of events to others.

Verifying the Root Cause(s)

Data gathering and analysis lead the team to identify causal factors and root causes based on logical conclusions that emerge from the investigation process. Before the team begins to develop preventive solutions, however, the structured problem-solving process requires that the root causes be verified through experimentation.

An experiment is, "a trial or special observation, made to confirm or disprove something uncertain; especially one under controlled conditions," according to Webster's Online Dictionary. It is, "an act or operation undertaken in order to discover some unknown principle or effect, or to test, establish, or illustrate some hypothesis, theory, or known truth; practical test; proof."

Since the six-step problem-solving process is based on the scientific method, it's easy to understand why the root causes identified through inductive reasoning must be verified through formal experimentation. Team members often find this task cumbersome and may even consider it a waste of time. By the time team members have reached agreement on the root causes, their hearts and minds have accepted the hypothesized causes completely, and they are eager to devise solutions.

As mentioned in the last issue, "Although instinct can serve as an important part of the process of proposing and narrowing the field of potential hypotheses, only reliable processes for gathering and analyzing data should be accepted for proving/disproving those hypotheses...."

There are many reasons why the experimental verification step is necessary, but the most obvious involves avoiding implementation of a solution that doesn't significantly reduce occurrences of the problem. In many cases, solution implementation requires a major investment of time, effort, and money. All personnel involved with the process must change in some way, disrupting their comfort zones and potentially affecting productivity during the learning period. Proceeding directly from logically induced causes to solution design could even be risky, introducing the potential for unanticipated safety, health, or environmental side effects. Much like the data gathering and analysis step, the key to success is to balance team members' experience and intuition against the need for reliable corroboration.

Research by social psychologists has indicated some interesting findings regarding how individuals in groups react to tasks. "Peer pressure" or group influence can be a highly motivating factor in the performance of simple tasks. On the other hand, it can undermine performance on more difficult tasks and even create "social loafing" and active resistance. The difference seems to relate to whether the people involved are confident in their ability to perform the task and view their peers as a supportive and enthusiastic audience. As Dr. David Myers writes, "What you do well, you are likely to do better in front of an audience, especially a friendly audience; what you normally find difficult may seem all but impossible when you are being watched."

How does this research affect team members' ability to stay on task and verify root causes? Well, it gives clues on how to structure the experimentation process. Many team coaches/facilitators are trained in experimental design and statistics. Their concept of a well-designed verification experiment would be far more rigorous than the typical team member's, creating a gap in perceived performance capability.

In most cases, however, a rigorous experimental design is not necessary to verify the root cause. Instead, the process needs only to ensure that the root cause is eliminated and to verify that the problem disappears or is greatly reduced under that circumstance. Ideally, the result that is addressed in the problem statement will change to approximately the same degree projected by the root-cause analysis.

A simple "rubber bands and chewing gum" approach to the experiment is usually sufficient to block the effect of the root cause temporarily and make it possible for the team to compare "before" and "after" performance results. The experiment need not evaluate every possible set of conditions, just the expected extremes and the normal operating level. If the problem goes away when the root cause is blocked under normal and extreme conditions, the risk of implementing an unnecessary solution is low enough to warrant moving forward. Use simple graphs and charts, rather than intensive statistical analyses, to verify the experimental results.
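A minimal Python sketch of this before-and-after check, assuming defects are counted per 1,000 units at the low extreme, normal, and high extreme operating levels. The counts, the 80% projected reduction, the 15-point tolerance, and the `verify` helper are hypothetical choices for illustration. A plain text bar chart stands in for the simple graphs recommended above.

```python
def verify(before, after, projected_reduction, tolerance=0.15):
    """Return (passed, reductions). The root cause counts as verified if,
    at every tested condition, blocking it reduced the problem by at least
    the projected fraction minus the tolerance. Assumes nonzero 'before' counts."""
    reductions = {cond: 1 - after[cond] / before[cond] for cond in before}
    passed = all(r >= projected_reduction - tolerance for r in reductions.values())
    return passed, reductions


# Hypothetical defects per 1,000 units, before and after blocking the root cause.
before = {"low extreme": 120, "normal": 100, "high extreme": 150}
after = {"low extreme": 30, "normal": 20, "high extreme": 40}

for cond in before:  # simple text chart: one '#' per 10 defects
    print(f"{cond:>12} before {'#' * (before[cond] // 10)}")
    print(f"{'':>12} after  {'#' * (after[cond] // 10)}")

passed, reductions = verify(before, after, projected_reduction=0.80)
print({cond: f"{r:.0%}" for cond, r in reductions.items()}, "verified:", passed)
```

Requiring the check to pass at both extremes, not just at the normal operating level, reflects the text's point that the experiment should cover the expected extremes; the tolerance is a judgment call the team would set before running the experiment.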

Since the team members have knowledge and experience related to the problematic process, they are able to encourage each other to stay on task while authenticating the root cause. Team members will feel that they have the skills to conduct the experiment reliably, and they'll work more effectively. Once the significantly contributing root causes are proven, the team can move on to developing comprehensive solutions to permanently eliminate them.

In the Next Issue

In this article, root-cause analysis and verification were discussed, along with the differences among symptoms, causal factors, and root causes. The remainder of this series will show how to generate and maintain solutions.

References

American Institute of Chemical Engineers, Center for Chemical Process Safety, Guidelines for Investigating Chemical Process Incidents. (New York, NY: AIChE, 1992).

Department of Energy, Accident/Incident Investigation Manual, second ed. (DOE/SSDC 76-45/27).

Department of Energy, Events and Causal Factors Charting. (DOE/SSDC 76-45/14, 1985).

Department of Energy, Root Cause Analysis Handbook. (WSRC-IM91-3, 1991 and earlier versions).

Ferry, Ted S., Modern Accident Investigation and Analysis, second ed. (New York, NY: John Wiley and Sons, 1988).

Kane, Victor E., Defect Prevention: Use of Simple Statistical Tools. (New York, NY: Marcel Dekker, Inc., 1989), p. 542.

Myers, David G., Psychology. (New York, NY: Worth Publishers, 2004), pp. 709-711.

Root Cause Analysis Handbook: A Guide to Effective Investigation. (Knoxville, TN: ABSG Consulting Inc., 1999).

James J. Rooney is a senior risk and reliability engineer with ABS Consulting's risk consulting division in Knoxville, TN. He earned a master's degree in nuclear engineering from the University of Tennessee. Rooney is a Fellow of ASQ and an ASQ certified quality auditor, quality auditor-HACCP, quality engineer, quality improvement associate, quality manager, and reliability engineer.

Deborah Hopen is editor of The Journal for Quality and Participation and News for a Change. She is a past president of the American Society for Quality. After more than 20 years as a practitioner in quality and human resources management, she began consulting to the private, public, and not-for-profit sectors. She can be reached at debhopen@nventure.com.