An attribute repeatability & reproducibility study.
Simion, Carmen ; Bondrea, Ioan
Abstract: A high degree of competitiveness in the national and
international manufacturing markets has created a demand for process
controls that build quality into products, rather than relying on
inspection to sort out costly rejects. Manufacturers use Statistical
Process Control to determine and control the variations in products
during the manufacturing process, but SPC alone does not ensure that
the measurement of the product is correct. Unfortunately, organizations
frequently overlook the impact of not having quality measurement
systems. This paper describes an attribute Repeatability &
Reproducibility study carried out at a local company in the Sibiu region
that supplies parts for the automotive industry.
Key words: attribute data, measurement system analysis,
repeatability and reproducibility.
1. INTRODUCTION
Competition for markets has caused manufacturers to question their
reliance on inspections as a means of achieving quality in the products
they build. Manufacturers use Statistical Process Control (SPC) to
determine and control the variations in products during the
manufacturing process; this allows continued improvement of processes
and products. However, SPC alone does not ensure that the measurement of
the product is correct. In fact, measurements taken on products have
hidden errors that cause unknown variations. These errors are
measurement system errors: gage errors (a gage being any device used to
obtain measurements; the term frequently refers specifically to the
devices used on the shop floor and includes Go/No-Go devices), fixture
errors, personnel errors, procedure errors and environmental influences
on measurements (Hart, 1994).
Measurement system errors can be classified into two categories:
accuracy (bias, linearity and stability) and precision (repeatability
and reproducibility). To estimate repeatability, each appraiser measures
each part at least twice. To estimate reproducibility, at least two
appraisers must measure the parts.
To assess the adequacy of a measurement system, different
strategies have been developed; they are outlined in the "Measurement
Systems Analysis" reference manual published by the Automotive
Industry Action Group (AIAG, 2002).
2. PROCEDURE FOR AN ATTRIBUTE GAGE R&R STUDY
An approach commonly used is a repeatability and reproducibility
(R&R) study. This attempts to determine measurement errors by making
repeated measurements of products on a gage and comparing the results to
the product variation to determine measurement influence; it also
requires comparison of those measurements with measurements of the same
group of products by other appraisers to determine appraiser influence.
The gage R&R study as it applies to continuous data is widely used
and written about. But another form of this tool, the attribute gage
R&R (Windor, 2003) can improve process yields and reduce costs
dramatically.
The attribute gage R&R study can be used in a very simple
form (the short method) or expanded to include confidence intervals and
probabilities of defects within particular ranges (the long method). The
study is conducted by selecting at least 25 to 30 parts, both
"conforming" and "nonconforming", and numbering them. Two or three
appraisers then measure each part independently at least twice, in a
random order that prevents appraiser bias. The appraisers record their
results on a check sheet, from which the following are obtained:
* Repeatability or appraiser (inspector) score: the ability of the
appraiser to "repeat" his/her decisions for the same inspected
part (agreement with his/her own results). It is calculated as "number
of agreements per number of parts inspected" and commonly expressed as
a percentage; 90% is usually considered acceptable (AIAG, 2002);
* Reproducibility or between appraisers (inspectors): the ability of
all the appraisers as a whole to "repeat" their decisions
among themselves (agreement with each other on all trials). It is
calculated as "number of agreements among all appraisers per number of
parts inspected", as a percentage; 90% is again considered acceptable
(AIAG, 2002).
Since the exact condition of each part is already known, the data
analysis indicates:
> Agreement or concordance (agreed with standard): each
appraiser versus the standard. Calculated as "number of parts that
agree with the standard per number of parts inspected", as a
percentage; 90% is again considered acceptable (AIAG, 2002).
> Disagreement OK/NOK, Wrong Classification or Miss. Calculated
as "number of parts classified as conforming when in fact they are
nonconforming per number of nonconforming parts", as a percentage; 2%
is acceptable (AIAG, 2002). This is a serious type of error, since a
nonconforming part is accepted.
> Disagreement NOK/OK or False Alarm. Calculated as "number
of parts classified as nonconforming when in fact they are conforming
per number of conforming parts", as a percentage; 5% is acceptable
(AIAG, 2002). This type of error is not as serious as a Wrong
Classification, since a conforming part is rejected. However, rejecting
a conforming part causes rework and reinspection to be performed when
they are not necessary. If the False Alarm rate gets too large, large
sums of money are wasted on rework and reinspection.
> Disagreement mixed or simply Mixed. Calculated as "number
of parts that were classified inconsistently across the inspections per
number of parts inspected".
> Overall effectiveness or all appraisers vs. standard: agreed
with each other and with the standard. Calculated as "the
percentage of time each inspector agreed with himself and with the
standard"; 90% is usually considered acceptable (AIAG, 2002).
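The definitions above can be turned into a short calculation. The following Python sketch is only an illustration, not the procedure used in the study: the "OK"/"NOK" labels, the function names and the four-part check sheet are assumptions made for the example. Miss and False Alarm use the denominators stated in the list (nonconforming and conforming parts respectively).

```python
from typing import Dict, Sequence

OK, NOK = "OK", "NOK"  # assumed labels for conforming / nonconforming


def pct(count: int, total: int) -> float:
    """Percentage; returns 0.0 when the denominator is empty."""
    return 100.0 * count / total if total else 0.0


def appraiser_metrics(trials: Sequence[Sequence[str]],
                      standard: Sequence[str]) -> Dict[str, float]:
    """trials[t][p] is the decision on part p in trial t;
    standard[p] is the known condition of part p."""
    n = len(standard)
    n_nok = sum(1 for ref in standard if ref == NOK)
    agreement = miss = false_alarm = mixed = 0
    for p, ref in enumerate(standard):
        decisions = {trial[p] for trial in trials}
        if len(decisions) > 1:
            mixed += 1          # the trials disagree with each other
        elif ref in decisions:
            agreement += 1      # consistent and equal to the standard
        elif ref == NOK:
            miss += 1           # nonconforming part accepted
        else:
            false_alarm += 1    # conforming part rejected
    return {
        "repeatability": pct(n - mixed, n),
        "agreement": pct(agreement, n),
        "miss": pct(miss, n_nok),
        "false_alarm": pct(false_alarm, n - n_nok),
        "mixed": pct(mixed, n),
    }


def study_metrics(sheets: Dict[str, Sequence[Sequence[str]]],
                  standard: Sequence[str]) -> Dict[str, float]:
    """Reproducibility and overall effectiveness across all appraisers."""
    n = len(standard)
    unanimous = with_standard = 0
    for p, ref in enumerate(standard):
        decisions = {trial[p] for trials in sheets.values() for trial in trials}
        if len(decisions) == 1:
            unanimous += 1              # everyone agreed on every trial
            if ref in decisions:
                with_standard += 1      # and the shared decision is correct
    return {
        "reproducibility": pct(unanimous, n),
        "overall_effectiveness": pct(with_standard, n),
    }


# Hypothetical check sheet: 4 parts, 2 appraisers, 2 trials each.
standard = [OK, OK, NOK, NOK]
sheets = {
    "A": [[OK, OK, NOK, OK], [OK, NOK, NOK, OK]],
    "B": [[OK, OK, NOK, NOK], [OK, OK, NOK, NOK]],
}
metrics_a = appraiser_metrics(sheets["A"], standard)
overall = study_metrics(sheets, standard)
print(metrics_a)
print(overall)
```

In this made-up example appraiser A is inconsistent on one part and accepts one nonconforming part on both trials, which is enough to pull reproducibility and overall effectiveness down to half of the parts.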
The decision criteria in an attribute R&R study are:
* > 90% => the inspection process is acceptable;
* 70% - 90% => corrective action is required, focused on the specific
problem area, after which the R&R study must be repeated;
* < 70% => the inspection process is unacceptable.
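The three bands can be written as a small lookup. The sketch below is hypothetical: the function name is invented, and the handling of the boundary values is an assumption, since the list does not say which band exactly 90% belongs to. Here 90% is treated as acceptable (as in the metric definitions) and 70% as requiring corrective action (as in the discussion of the second study).

```python
def classify(score_percent: float) -> str:
    """Map an agreement-type percentage to one of the three decision bands.

    Boundary handling is an assumption: 90 counts as acceptable,
    70 counts as 'corrective action required'.
    """
    if score_percent >= 90.0:
        return "acceptable"
    if score_percent >= 70.0:
        return "corrective action required"
    return "unacceptable"


for score in (95.0, 83.3, 26.7):
    print(score, "->", classify(score))
```

The bands apply to the agreement-type percentages (repeatability, reproducibility, agreement, overall effectiveness); the Miss and False Alarm rates have their own upper limits of 2% and 5%.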
In conclusion, the inspection process is acceptable if all
measurement decisions agree. If the measurement decisions do not agree,
the inspection process must be improved and reevaluated. If the
inspection process cannot be improved, it is unacceptable and an
acceptable alternate measurement system should be found.
3. CASE STUDY
An investigation was conducted at a local company from Sibiu
supplying parts for the automotive industry, to analyze the inspection
process for a new product that is to be introduced into series
production. The appearance of the part material was 100% visually
inspected. The simplest form of gage R&R was applied. The study was
conducted with a sample of 30 parts selected by their degree of
compliance with the actual engineering requirement: fourteen of the parts
were considered unacceptable to varying degrees and sixteen were
considered acceptable. The case study presented (shown in Table 1) is
typical of a process that must be improved, because:
--inspector 1 had agreement between the first and the second
attempt on 25 of the 30 parts, so inspector 1 agreed with himself in
83.3% of the cases; for 14 of the 30 parts his results were
consistent between the trials and with the standard, so inspector 1
agreed with the standard in 46.7% of the cases;
--inspector 2 had agreement between the first and the second
attempt on 24 of the 30 parts, so inspector 2 agreed with himself in 80%
of the cases; for 13 parts his results were consistent
between the trials and with the standard, so inspector 2 agreed with the
standard in 43.3% of the cases;
--inspector 3 had agreement between the first and the second
attempt on 23 of the 30 parts, so inspector 3 agreed with himself in
76.7% of the cases; for 17 parts his results were
consistent between the trials and with the standard, so inspector 3
agreed with the standard in 56.7% of the cases;
--in total, the percentage of cases in which every inspector agreed
with his own results and with the standard was 26.7%;
--the inspector scores (repeatability) between 70% and 90%, the
agreement with the standard under 70% and the overall effectiveness
under 70% show that all inspectors must be retrained;
--in 33.3% of the cases, the inspectors agreed with each other on
both trials but not necessarily with the standard. This
reproducibility (between-inspector) percentage, under 70%, underlines the
need to clarify the acceptance criteria/admissible limits for the
appearance of the part material; either the inspectors are not well
trained, or they cannot tell conforming parts from nonconforming
parts;
--the risk of sending nonconforming parts to the customer is high:
26.7% for inspector 1 and inspector 2, and 10% for inspector 3.
Inspectors 1 and 2 each consistently accepted discrepant parts on both
trials on 8 occasions, while inspector 3 did so on only 3 occasions. The
conclusion is that the acceptance criteria are not well enough known, so
there is a risk of PPM (defective parts per million) at the customer; a
priority is to define inspection standards;
--there is also a risk of rejecting conforming parts, not as high, but
also above the AIAG recommended percentage; the percentages of 10% for
inspectors 1 and 3 and 6.67% for inspector 2 (Table 1) denote a risk of
over-inspection. Inspectors 1 and 3 are more likely to reject a
conforming part than inspector 2.
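The percentages quoted in the list above follow directly from the reported counts. The short Python sketch below recomputes them as a cross-check; the variable and function names are illustrative only. One observation from the arithmetic, not something the paper states: the 26.7% and 10% Wrong Classification figures equal 8/30 and 3/30, which suggests that the case study uses all 30 inspected parts as the denominator rather than the 14 nonconforming parts.

```python
N_PARTS = 30            # parts in the study
N_NONCONFORMING = 14    # parts judged unacceptable beforehand

# Counts quoted in the text:
# (agreed with himself, agreed with the standard, nonconforming accepted twice)
counts = {
    "inspector 1": (25, 14, 8),
    "inspector 2": (24, 13, 8),
    "inspector 3": (23, 17, 3),
}


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal, as in Table 1."""
    return round(100.0 * count / total, 1)


results = {
    name: {
        "repeatability": percent(self_agree, N_PARTS),
        "agreement": percent(std_agree, N_PARTS),
        "wrong_classification": percent(accepted_nok, N_PARTS),
    }
    for name, (self_agree, std_agree, accepted_nok) in counts.items()
}

for name, row in results.items():
    print(name, row)

# For comparison only: the same count over the nonconforming parts.
print("inspector 1 misses per nonconforming part:", percent(8, N_NONCONFORMING))
```

With the nonconforming parts as denominator the Wrong Classification rate for inspectors 1 and 2 would be considerably higher, so the choice of denominator should be stated whenever such results are compared with the 2% limit.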
In conclusion, this case study identified the following problems:
a risk of PPM (nonconforming parts reaching the customer), a risk of
over-inspection, and the inability of the appraisers to repeat their own
decisions consistently for the same inspected part.
The data show that inspectors did not understand the requirements,
took a more critical view of the requirements or were just afraid to
accept any part that had a small inconsequential defect. To address the
problem, the company's engineering and quality representatives
worked with the customer's quality group to create a standard for
the most common defect types, along with minimum/maximum-type photos.
All inspectors were then trained in this specification, and the actual
requirements were discussed.
In the weeks after the training session, the gage R&R study was
performed again using 20 of the original parts, with the results shown
in Table 2.
The new results show that the inspection process was improved: the
repeatability of every inspector is greater than or equal to 90%.
However, the reproducibility and overall effectiveness are both 70%
(between 70% and 90%), and the risk of sending nonconforming
parts to the customer (Disagreement OK/NOK or Wrong Classification) is
still greater than 2% for inspectors 1 and 2. More corrective action is
therefore required, focused especially on additional inspection/test
samples (inspection standards) for the most common defect types,
followed by further inspector training.
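The per-inspector limits can be checked against the Table 2 values mechanically. The sketch below is illustrative only: the dictionary and function names are invented, the second column of Table 2 is assumed to be inspector 2, and a value of exactly 90% is treated as meeting the 90% limit, as in the paragraph above.

```python
# Limits taken from the metric definitions (AIAG, 2002), in percent.
MIN_LIMITS = {"Repeatability": 90.0, "Agreement": 90.0}
MAX_LIMITS = {"Wrong Classif.": 2.0, "False Alarm": 5.0}

# Values copied from Table 2 (second study, 20 parts).
table2 = {
    "Inspect.1": {"Repeatability": 100.0, "Agreement": 90.0,
                  "Wrong Classif.": 10.0, "False Alarm": 0.0},
    "Inspect.2": {"Repeatability": 95.0, "Agreement": 75.0,
                  "Wrong Classif.": 20.0, "False Alarm": 0.0},
    "Inspect.3": {"Repeatability": 90.0, "Agreement": 90.0,
                  "Wrong Classif.": 0.0, "False Alarm": 0.0},
}


def failing_metrics(row: dict) -> list:
    """Return the names of the metrics that violate their limit."""
    too_low = [m for m, limit in MIN_LIMITS.items() if row[m] < limit]
    too_high = [m for m, limit in MAX_LIMITS.items() if row[m] > limit]
    return too_low + too_high


report = {name: failing_metrics(row) for name, row in table2.items()}
for name, failures in report.items():
    print(name, "->", failures or "all limits met")
```

Under these assumptions only inspector 3 meets every individual limit; the remaining gap lies in Wrong Classification for inspectors 1 and 2 and in agreement with the standard for inspector 2.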
4. CONCLUSION
The application of the attribute R&R demonstrates the variation
in inspection methods between inspectors when inspection standards are
not utilized. The control phase of a project involving visual
repeatability and reproducibility is an important consideration.
Publication and ongoing document control for visual standards, along
with periodic training, are critical to ensure visual inspection methods
remain consistent.
An attribute gage R&R can normally be performed at very low
cost with little impact on the process. Significant benefits can be
gained from looking at even the most basic processes, because an
attribute gage R&R can improve process yields and reduce costs
dramatically.
5. REFERENCES
Automotive Industry Action Group (2002). Measurement Systems
Analysis-MSA Reference Manual, 3rd edition, AIAG, U.S.A.
Hart, M. & Hart, R. F. (1994). The Evaluation of a Measurement
System. Production and Inventory Management Journal, (Fourth Quarter
1994), page 22-26.
Hill, T. & Lewicki, P. (2006). STATISTICS Methods and
Applications. StatSoft, Tulsa. Available from:
http://www.statsoft.com/textbook/stathome.html.
Montgomery, D.C. (1994). Design and Analysis of Experiments, 5th
edition, John Wiley & Sons, New York.
Wheeler, D. J. & Lyday, R. W. (1989). Evaluating the
Measurement Process, 2nd edition, SPC Press, Inc., Knoxville, Tennessee.
Windor, S.E. (2003). Attribute Gage R&R. Six Sigma Forum
Magazine, Vol. 2, No. 4, (August 2003), page 23-28.
Table 1. Initial attribute gage R&R results (%)
Assessments        Inspect.1   Inspect.2   Inspect.3
Repeatability      83.3        80.0        76.7
Agreement          46.7        43.3        56.7
Wrong Classif.     26.7        26.7        10.0
False Alarm        10.0        6.67        10.0
Mixed              16.7        20.0        23.3
Reproducibility    33.3 (all inspectors)
Overall effect.    26.7 (all inspectors)
Table 2. Second attribute gage R&R results (%)
Assessments        Inspect.1   Inspect.2   Inspect.3
Repeatability      100.0       95.0        90.0
Agreement          90.0        75.0        90.0
Wrong Classif.     10.0        20.0        0.0
False Alarm        0.0         0.0         0.0
Mixed              0.0         5.0         10.0
Reproducibility    70.0 (all inspectors)
Overall effect.    70.0 (all inspectors)