Interpretation of shifted binary interpretive framework coefficients using a classical regression problem.
Gober, R. Wayne ; Freeman, Gordon L.
ABSTRACT
In regression analysis, dummy variables are usually introduced by
using binary coding and the designation of a single reference group for
the purpose of interpretation. Other interpretive frameworks are
available that allow comparison of designated category coefficients to
an "average" value for the overall sample dependent variable.
These shifted interpretive framework coefficients are usually easier to
understand and interpret than the coefficients based on the binary-coded
framework. Utilizing the processes suggested (1) by Suits and (2) by
Sweeney-Ulveling, the binary framework coefficients can be shifted to
allow for comparison about an "average" value. Each framework
shifting process is accomplished without the assistance of a computer
statistical package.
The purpose of this paper is to use binary framework coefficients,
taken from a classical regression problem presented by Chatterjee and
Price, 1977, to illustrate the two shifted interpretive frameworks and
to discuss the interpretation of the resultant coefficients.
INTRODUCTION
In most introductory courses in statistics, the binary interpretive
framework is the framework of choice when introducing the topic of dummy
variables in regression analysis (Chatterjee and Price, 1977; Daniel and
Terrell, 1992; Weiers, 2002). A recent Internet search for "dummy
variable interpretation" returned hundreds of links, a majority of
which emphasized interpretations based on the binary-coded framework.
The binary interpretive framework uses binary coding and the
omission of one category of the dummy variable. The interpretation of
each category coefficient is made relative to the omitted category. When
a regression model has two or more dummy variables, the interpretation
of the binary dummy coefficients becomes even more complex.
The interpretation of the binary interpretive framework
coefficients can be simplified by shifting the
binary-coded framework. The shifted interpretive frameworks allow
interpretations of the coefficients for every dummy classification
relative to an "average" for the dependent variable. The
shifted interpretive frameworks are preferable particularly when the
regression analysis is being employed by practitioners who are not
accomplished statisticians, or when the results of the analysis are to
be disseminated to individuals who differ in their dummy classification
memberships. The shifted interpretive
processes are accomplished without the use of a statistical computer
package. The purpose of this paper is to use existing binary
interpretive framework coefficients, taken from a classical regression
problem, to illustrate two interpretive framework shifting processes and
to discuss the interpretation of the resulting coefficients.
A CLASSICAL PROBLEM
In 1977, Chatterjee and Price presented a problem utilizing the
binary interpretive framework and included two dummy variables. A few
years later, Berenson, Levine and Goldstein (1983) used the
Chatterjee-Price problem in their presentation of the binary
interpretive framework. The authors considered the Chatterjee-Price
problem as a standard in regard to the presentation and discussion of
the binary interpretive frameworks of two or more dummy variables. This
paper utilizes the Chatterjee-Price binary interpretive framework
coefficients to illustrate and discuss two delineated shifted binary
interpretive frameworks. Thus, the Chatterjee-Price problem is
designated as a classical problem.
The Chatterjee-Price problem was developed from a salary survey of
computer professionals in a large corporation. The objective of the
survey was to identify and quantify those factors that determine salary
differential. The salary variable was measured in dollars per annum. The
explanatory variables included such factors as education, years of
experience, and management responsibilities. The years of experience
variable was measured in years. Education and management
responsibilities were treated as categorical variables. Education was
coded as 1 for completion of high school, 2 for completion of a college
degree, and 3 for completion of an advanced degree. Management
responsibility was coded as 1 for a person with such responsibility and
0 otherwise. The binary interpretive framework was used in the coding of
the two categorical variables, with advanced degree as the omitted category
for education and without management responsibility as the omitted category
for management responsibility.
The binary interpretive framework dummy variables for education
were D1 = (1,0) for high school graduate (HS), D2 = (0,1) for
bachelor's degree (BS), and D3 = (0,0) for advanced degree (AD).
The management responsibility dummy variables were D4 = (1,0) for
management responsibility and D5 = (0,0) otherwise. The codes used in
the computer solution are presented in Table 1.
For this study, the experience variable was recoded as deviations
from the mean of experience. The deviations are denoted as X. The
effects of all three variables on salary were measured using linear
regression analysis performed on a computer and are presented in Table
2.
The general regression model for the Chatterjee-Price classical
problem is as follows:
\[ Y = b_0 + b_1 X + b_2 D_1 + b_3 D_2 + b_5 D_4 \tag{1} \]
where \(b_0\) is the intercept or constant for the model. The
fitted regression model for the Chatterjee-Price problem is stated as:
\[ Y = 15128.2 + 546.2 X - 2996.2 D_1 + 147.8 D_2 + 6883.5 D_4 \tag{2} \]
Appropriate residual and influence analyses should be applied to
the fitted model in order to satisfy the use and interpretation of the
model's estimated coefficients. For this study, the fitted model is
assumed to be satisfactory.
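As an illustration of how Equation (2) is evaluated, the following minimal sketch computes a predicted salary from the fitted coefficients. The function name `predict_binary` and its argument names are assumptions made for this example; only the coefficient values come from the fitted model.

```python
# Fitted binary-framework coefficients taken from Equation (2).
BINARY = {"intercept": 15128.2, "X": 546.2, "D1": -2996.2, "D2": 147.8, "D4": 6883.5}


def predict_binary(x_dev, education, manager):
    """Predicted annual salary under the binary interpretive framework.

    x_dev     -- years of experience expressed as a deviation from the mean
    education -- "HS", "BS", or "AD" (AD is the omitted category)
    manager   -- True for management responsibility (False is omitted)
    """
    if education not in ("HS", "BS", "AD"):
        raise ValueError("education must be 'HS', 'BS', or 'AD'")
    d1 = 1 if education == "HS" else 0
    d2 = 1 if education == "BS" else 0
    d4 = 1 if manager else 0
    return (BINARY["intercept"] + BINARY["X"] * x_dev
            + BINARY["D1"] * d1 + BINARY["D2"] * d2 + BINARY["D4"] * d4)


# An advanced-degree holder without management duties at mean experience
# receives the intercept; other profiles add the relevant increments.
print(round(predict_binary(0.0, "AD", False), 1))
print(round(predict_binary(2.0, "HS", True), 1))
```

Because both omitted categories are coded zero, the intercept is the predicted salary for the reference profile only, which is the interpretive difficulty the shifted frameworks are meant to ease.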
BINARY INTERPRETIVE FRAMEWORK INTERPRETATION
In terms of salary for computer professionals, the coefficients for
Equation (2) are interpreted as follows:
1. The experience coefficient, \(b_1\), is $546.16, meaning that
each additional year of experience is estimated to be worth an annual
salary increment of $546.16;
2. The coefficient of the management responsibility dummy variable,
\(b_5\), is estimated to be $6,883.50. This amount is interpreted to
be the average incremental value in annual salary associated with a
management position;
3. a) The HS education coefficient, \(b_2\), is -$2,996.20 and
measures the salary differential for the HS category relative to the AD
category;
b) The BS education coefficient, \(b_3\), is $147.80 and measures
the salary differential for the BS category relative to the AD category;
and
c) The difference, \(b_3 - b_2\), is $3,144.00 and measures
the salary differential for the BS category relative to the HS category.
The salary differentials may be restated as follows: AD is worth
$2,996.20 more than HS, whereas BS is worth $147.80 more than AD, and BS
is worth about $3,144.00 more than HS. The fitted regression model
represented by Equation (2) assumes that these salary differentials hold
for all fixed levels of experience.
SHIFTED BINARY INTERPRETIVE FRAMEWORKS
The choice of a middle category as a reference category rather than
one of the extreme categories for a categorical variable is sometimes
considered a way of constructing a form of group comparison that
contrasts the categories to middle or "average" groups.
However, this procedure does not address the complexity of coefficient
interpretations because the interpretations remain relative to an
omitted category. Perhaps a better way of constructing category
comparisons is to shift the interpretive framework to an
"average" of the dependent variable. The shift in the
interpretative framework is such that the contrast of a regression
coefficient for a designated category is made to an "average"
value for the dependent variable and not to a specified zero-coded
category. While the shifting processes will yield numerically different
coefficients, the overall fit and significance of the regression model
remain unchanged. A main advantage of shifting the interpretative
framework of binary-coded dummy variables to an "average" is
that the coefficients are no longer sensitive to which class is treated
as the omitted class.
The process of shifting the interpretive framework of binary-coded
coefficients can be made without the use of a computer program by adding
a constant, k, to the coefficients within each set of coefficients for a
qualitative variable and subtracting k from the regression equation constant or intercept. The general relationship for determining k is:
\[ \sum_i b_i^{*} = \sum_i w \, (b_i + k) \tag{3} \]
where \(b_i\) represent the binary-coded regression coefficients,
\(b_i^{*}\) represent the shifted regression coefficients, and w
represents a weight for the importance of each coefficient within a set
of regression coefficients for a qualitative variable. The value of k is
chosen so that the weighted sum on the right-hand side equals zero, that
is, \(k = -\sum_i w \, b_i / \sum_i w\). The resulting
value of k yields the condition that the new set of coefficients,
\(b_i^{*}\), will average zero.
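The determination of k can be sketched in a few lines. The sketch assumes that Equation (3) is read as requiring the weighted shifted coefficients of one qualitative variable to sum to zero, so that k equals the negative weighted sum of the binary coefficients divided by the sum of the weights. The function name `shift_constant` is hypothetical.

```python
def shift_constant(coefs, weights):
    """Return k such that sum(w * (b + k)) == 0 for one qualitative variable.

    coefs   -- binary-framework coefficients, with 0 for the omitted category
    weights -- one weight per category (all equal to 1 for equal weighting)
    """
    if len(coefs) != len(weights):
        raise ValueError("need exactly one weight per coefficient")
    total_weight = sum(weights)
    return -sum(w * b for w, b in zip(weights, coefs)) / total_weight


# Education coefficients of the fitted model: HS, BS, and the omitted AD category.
education = [-2996.2, 147.8, 0.0]
k_equal = shift_constant(education, [1, 1, 1])
shifted = [b + k_equal for b in education]

print(round(k_equal, 1))                 # constant added to each coefficient
print([round(b, 1) for b in shifted])    # shifted coefficients
```

With equal weights the shifted coefficients have a simple mean of zero; supplying category proportions as weights gives a weighted mean of zero instead.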
The two shifting processes used in this study are (1) the Suits
process (Suits, 1983) and (2) the Sweeney-Ulveling process (Sweeney and
Ulveling, 1972). Starting with binary-coded coefficients, usually
generated with the assistance of a statistical computer package, the
shifting process can be accomplished with or without the assistance of a
computer program. Each process uses the extended regression model. The
extended model includes dummy variables for the omitted categories. The
general extended regression model is stated as:
\[ Y = b_0 + b_1 X + b_2 D_1 + b_3 D_2 + b_4 D_3 + b_5 D_4 + b_6 D_5 \tag{4} \]
The fitted extended binary model for Equation (4) is stated as:
\[ Y = 15128.2 + 546.2 X - 2996.2 D_1 + 147.8 D_2 + 0 D_3 + 6883.5 D_4 + 0 D_5 \tag{5} \]
The coefficients \(b_4\) and \(b_6\) are for the omitted
category dummy variables, D3 and D5, respectively. For the binary
interpretive framework, these coefficients have a value of zero.
Suits (1983) suggested a shifting process, Shifting Process I,
which expresses the category regression coefficients as deviations from
an "average," where the "average" is the unweighted
mean of the dependent variable across all categories for a categorical
variable. In calculating the unweighted mean of means, each category
receives an equal weight of 1, regardless of the number of cases in that
category. Thus, when binary-coded coefficients are shifted using
Shifting Process I, the value of w in Equation (3) is set at 1. The
unweighted mean of all group means is reported as the regression
equation constant, \(b_0\), and is the reference point from which all
category differences can be calculated. Since two sets of dummy
variables are included in the Chatterjee-Price problem, a constant must
be computed for each set, \(k_1\) and \(k_2\), and added to the
coefficients of the respective set. The sum of the constants, k, is
subtracted from \(b_0\). Referring to Equation (5), the dummy
variables representing education are D1, D2 and D3, and the constant
\(k_1\) is computed as -(-2996.2 + 147.8 + 0) / 3. Likewise, for the
dummy variables representing management responsibility, the constant
\(k_2\) is computed as -(6883.5 + 0) / 2. The required constants are
\(k_1\) = 949.5 and \(k_2\) = -3441.8. The sum of the constants, k,
is -2492.3.
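The arithmetic of Shifting Process I can be reproduced with the short sketch below. The coefficient values are those of Equation (5); the function name `suits_shift` and the dictionary layout are assumptions made for illustration.

```python
# Extended binary-framework model, Equation (5); omitted categories carry 0.
intercept = 15128.2
sets = {
    "education": {"D1": -2996.2, "D2": 147.8, "D3": 0.0},
    "management": {"D4": 6883.5, "D5": 0.0},
}


def suits_shift(intercept, sets):
    """Shift each set so that its coefficients have an unweighted mean of zero."""
    shifted = {}
    k_total = 0.0
    for coefs in sets.values():
        # Every category receives an equal weight of 1.
        k = -sum(coefs.values()) / len(coefs)
        k_total += k
        for name, b in coefs.items():
            shifted[name] = b + k
    # The sum of the set constants is subtracted from the intercept.
    return intercept - k_total, shifted


suits_intercept, suits_coefs = suits_shift(intercept, sets)
print("intercept", round(suits_intercept, 1))
for name, value in suits_coefs.items():
    print(name, round(value, 1))
```

The experience coefficient is not part of either set and is therefore left unchanged by the shift.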
To shift the interpretation framework of the coefficients to an
"average" that is the overall mean of the dependent variable,
referred to as Shifting Process II, Sweeney and Ulveling (1972)
suggested using the sample proportions for categories of each
qualitative variable as weights in Equation (3). For the
Chatterjee-Price problem, a summary of the qualitative variables,
education level and management responsibility, is presented in Table 3.
Using Table 3 and Equation (5), for education level, \(k_1\) is
computed as -(0.30435 * (-2996.2) + 0.41304 * 147.8) and, for management
responsibility, \(k_2\) is
computed as -(0.43478 * 6883.5). The required constants are \(k_1\) =
850.8 and \(k_2\) = -2992.8. The sum of the constants, k, is -2142.0.
As in Shifting Process I, k is subtracted from the intercept, and each
constant, \(k_1\) and \(k_2\), is added to the coefficients of its
respective set of dummy variables in Equation (5). For a more
complete discussion and illustration of these processes see Gober
(2003).
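Shifting Process II can be sketched in the same manner, using the category frequencies of Table 3 to form the proportion weights. The function name `sweeney_ulveling_shift` and the data layout are assumptions made for this example; small differences from the reported values arise from rounding.

```python
# Coefficients from Equation (5) paired with category frequencies (n = 46)
# from Table 3.
intercept = 15128.2
sets = {
    "education": {"D1": (-2996.2, 14), "D2": (147.8, 19), "D3": (0.0, 13)},
    "management": {"D4": (6883.5, 20), "D5": (0.0, 26)},
}


def sweeney_ulveling_shift(intercept, sets):
    """Shift each set so its proportion-weighted coefficients sum to zero."""
    shifted = {}
    k_total = 0.0
    for categories in sets.values():
        n = sum(count for _, count in categories.values())
        # The weight of each category is its sample proportion, count / n.
        k = -sum(b * count / n for b, count in categories.values())
        k_total += k
        for name, (b, _) in categories.items():
            shifted[name] = b + k
    return intercept - k_total, shifted


su_intercept, su_coefs = sweeney_ulveling_shift(intercept, sets)
print("intercept", round(su_intercept, 1))
for name, value in su_coefs.items():
    print(name, round(value, 1))
```

Unlike the equal-weight shift, this process cannot be carried out from the coefficients alone, since the category frequencies are also required.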
The adjustment terms for the delineated interpretive frameworks are
presented in Table 4. The Suits process yields coefficients that
estimate the difference between the mean value of annual salary for a
category and the unweighted mean of the means of annual salary across
all categories. The Sweeney-Ulveling process yields coefficients that
estimate the difference between the mean value of annual salary for a
category and the weighted mean for annual salary.
From Table 4, the shifted interpretive framework adjustments for
the Suits process are $949.50 for each level of education, -$3,441.80
for each management level, and $2,492.30 for the intercept. Likewise,
the adjustments for the Sweeney-Ulveling process are $850.80 for
education levels, -$2,992.80 for management levels, and $2,142.00 for the
intercept. The extended binary framework coefficients and the shifted
framework coefficients are presented in Table 5.
The Suits interpretive framework model is stated as:
\[ Y = 17620.5 + 546.2 X - 2046.7 D_1 + 1097.3 D_2 + 949.5 D_3 + 3441.8 D_4 - 3441.8 D_5 \tag{6} \]
and the Sweeney-Ulveling interpretive framework model is stated as:
\[ Y = 17270.2 + 546.2 X - 2145.4 D_1 + 998.7 D_2 + 850.8 D_3 + 3890.7 D_4 - 2992.8 D_5 \tag{7} \]
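Since the shifts only move a constant between the intercept and the dummy coefficients, the binary, Suits, and Sweeney-Ulveling models should give the same predicted salary for any profile. The sketch below checks this with the published, rounded coefficients; the dictionary keys and the function name `predict` are assumptions made for illustration.

```python
from itertools import product

# Published (rounded) coefficients of the binary, Suits, and
# Sweeney-Ulveling framework models.
MODELS = {
    "binary": {"b0": 15128.2, "X": 546.2, "HS": -2996.2, "BS": 147.8,
               "AD": 0.0, "Yes": 6883.5, "No": 0.0},
    "suits": {"b0": 17620.5, "X": 546.2, "HS": -2046.7, "BS": 1097.3,
              "AD": 949.5, "Yes": 3441.8, "No": -3441.8},
    "sweeney_ulveling": {"b0": 17270.2, "X": 546.2, "HS": -2145.4, "BS": 998.7,
                         "AD": 850.8, "Yes": 3890.7, "No": -2992.8},
}


def predict(framework, x_dev, education, management):
    """Predicted salary: intercept + experience effect + category increments."""
    m = MODELS[framework]
    return m["b0"] + m["X"] * x_dev + m[education] + m[management]


max_gap = 0.0
for education, management in product(("HS", "BS", "AD"), ("Yes", "No")):
    salaries = [predict(name, 0.0, education, management) for name in MODELS]
    max_gap = max(max_gap, max(salaries) - min(salaries))
    print(education, management, [round(s, 1) for s in salaries])
print("largest gap across frameworks:", round(max_gap, 1))
```

Any gap that remains is attributable to the one-decimal rounding of the published coefficients.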
INTERPRETATION OF SHIFTED INTERPRETIVE FRAMEWORK MODELS
The Suits and Sweeney-Ulveling models yield the same coefficient
for the quantitative variable, experience, as the binary interpretive
model. The Suits shifted coefficients for the categorical variables are
interpreted as follows:
1. a) Management responsibility, \(b_5\), adds $3,441.80 to the
"unweighted average" of annual salary (averaged over all
subgroups);
b) Without management responsibility, \(b_6\), subtracts $3,441.80 from
the "unweighted average" of annual salary (averaged over all
subgroups);
2. a) The HS education category, \(b_2\), -$2,046.70, measures the salary
differential for the HS category relative to the "unweighted
average" of annual salary;
b) The BS education category, \(b_3\), $1,097.30, measures the salary
differential for the BS category relative to the "unweighted
average" of annual salary; and
c) The AD education category, \(b_4\), $949.50, measures the salary
differential for the advanced degree category relative to the
"unweighted average" of annual salary.
The Suits coefficients yield salary differentials for the
categorical variables as follows: AD is worth $2,996.20 (2046.7 + 949.5)
more than HS. BS is worth $147.80 (1097.3-949.5) more than an AD, and BS
is worth about $3,144.00 (2046.7 + 1097.3) more than HS. These salary
differentials are the same as the differentials using the binary
framework.
The Sweeney-Ulveling model coefficients are interpreted as
follows:
1. a) Management responsibility, \(b_5\), adds $3,890.70 to the
"weighted average" of annual salary (averaged over all
subgroups);
b) Without management responsibility, \(b_6\), subtracts $2,992.80 from
the "weighted average" of annual salary (averaged over all
subgroups);
2. a) The HS education category, \(b_2\), -$2,145.40, measures the salary
differential for the HS category relative to the "weighted
average" of annual salary;
b) The BS education category, \(b_3\), $998.70, measures the salary
differential for the BS category relative to the "weighted
average" of annual salary; and
c) The AD education category, \(b_4\), $850.80, measures the salary
differential for the advanced degree category relative to the
"weighted average" of annual salary.
The Sweeney-Ulveling coefficients yield the following salary
differentials for the categorical variables: AD is worth $2,996.20
(2145.4 + 850.8) more than HS, BS is worth $147.90
(998.7 - 850.8) more than AD, and BS is worth about $3,144.10 (2145.4 +
998.7) more than HS. Except for differences due to rounding, these
salary differentials are the same as the differentials for the binary
and Suits frameworks.
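The invariance of the education differentials can be verified directly, because the constant added to a set cancels whenever two coefficients of that set are subtracted. The following sketch uses the published education coefficients of the three frameworks; the function name `differential` is hypothetical.

```python
# Education coefficients under each framework (published, rounded values).
EDUCATION = {
    "binary": {"HS": -2996.2, "BS": 147.8, "AD": 0.0},
    "suits": {"HS": -2046.7, "BS": 1097.3, "AD": 949.5},
    "sweeney_ulveling": {"HS": -2145.4, "BS": 998.7, "AD": 850.8},
}


def differential(coefs, higher, lower):
    """Salary differential of category `higher` over category `lower`."""
    return coefs[higher] - coefs[lower]


for name, coefs in EDUCATION.items():
    print(
        name,
        "AD over HS:", round(differential(coefs, "AD", "HS"), 1),
        "BS over AD:", round(differential(coefs, "BS", "AD"), 1),
        "BS over HS:", round(differential(coefs, "BS", "HS"), 1),
    )
```

The three rows agree to within the rounding of the published coefficients.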
Equations (5), (6), and (7) differ in appearance, but all have the
same coefficient of determination and the same standard error of
estimate. The three frameworks allow for interpretations that are viewed
from different angles. Rather than assessing each category relative to a
particular omitted category that sometimes is chosen arbitrarily, the
shifted interpretive frameworks show the extent to which management and
education level salary coefficients deviate from the company
"average" salary. An approximation of the salary of any
individual or group of individuals can be computed by simply determining
to which classifications the individual or group belongs, and then
summing the increments (positive or negative) associated with these
classifications with the "average" salary. The exactness of
the approximation depends upon the degree of interaction that exists
between the explanatory variables, because the equations assume no
interaction among the explanatory variables.
As an additional note, when Equation (2) coefficients are
generated, the variances for the coefficients are generally available.
Thus, a t test applied to one of the coefficients will test whether salary in
the selected category differs from the salary for the omitted category for that
particular dummy variable, whereas a t test applied to one of the
coefficients of Equations (6) or (7) will test the salary level for that
category against the "average" salary for the company. A
significant t test for any selected category indicates the selected
category is significantly different from the "average" of the
response variable. The variances of the coefficients for Equations (6)
or (7) are different from those for Equation (2) and may be calculated
from the variance-covariance matrix for Equation (2). When using a
computer statistical package, dummy variable coding schemes are
available which yield the shifted interpretive coefficients and their
variances.
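As a hedged illustration of how a shifted-coefficient variance follows from the binary-framework results, consider the management set, where only one coefficient is estimated. Each shifted coefficient is a linear combination of the binary coefficients, so its variance is a quadratic form in their variance-covariance matrix. The sketch below assumes the sample proportions are treated as fixed constants; the function name `linear_combination_se` is hypothetical. The education set would also need the covariance between the HS and BS coefficients, which is not reported here.

```python
import math


def linear_combination_se(a, cov):
    """Standard error of sum_j a[j] * b[j], given the covariance matrix of b."""
    n = len(a)
    variance = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


b5 = 6883.5            # binary management coefficient, Table 2
se_b5 = 313.9          # its standard error, Table 2
cov = [[se_b5 ** 2]]   # 1x1 matrix: b5 is the only estimated coefficient in the set
p_yes = 20 / 46        # proportion with management responsibility, Table 3

# Equal weights: shifted coefficient = b5 - b5 / 2 = 0.5 * b5
se_suits = linear_combination_se([0.5], cov)
t_suits = (0.5 * b5) / se_suits

# Proportion weights: shifted coefficient = b5 - p_yes * b5 = (1 - p_yes) * b5
se_su = linear_combination_se([1 - p_yes], cov)
t_su = ((1 - p_yes) * b5) / se_su

print(round(se_suits, 2), round(t_suits, 2))
print(round(se_su, 2), round(t_su, 2))
```

For a two-category variable the shifted coefficient is a fixed multiple of the binary coefficient, so under these assumptions the t ratio is the same as in the binary framework; with three or more categories the ratios generally differ.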
SUMMARY
Regression models that contain dummy variables are economically
fitted by using the binary interpretive framework, Equation (2). Once
Equation (2) is available, shifted interpretive frameworks (6) and (7)
can be used to form coefficients that are more easily interpreted. For
the Chatterjee-Price problem, the salary differentials for the
categorical variables were shown to have the same values and
interpretations for each of the three equations.
The Suits process of shifting the interpretive framework is quickly
accomplished with just the knowledge of the binary interpretive
framework coefficients. However, the Sweeney-Ulveling framework requires
the frequency of occurrence for the categories within each categorical
variable. When available, the Sweeney-Ulveling framework should be
chosen to simplify the interpretation of coefficients because the
coefficients are conveniently interpreted as deviations about the
average of the dependent variable. The Suits framework provides coefficients that
are interpreted as deviations about the unweighted average of the
dependent variable when averaged across all subcategory averages. When
the Sweeney-Ulveling framework is not available, the Suits framework is
suggested as the framework of choice to simplify the interpretation of
coefficients when compared to the binary interpretive framework.
REFERENCES
Anderson, D., D. Sweeney and T. Williams (2002). Statistics for
Business and Economics, (8th Ed.), Cincinnati, OH: South-Western
Publishing.
Berenson, M., D. Levine and M. Goldstein (1983). Intermediate
Statistical Methods and Applications: A Computer Package Approach,
Englewood Cliffs, NJ: Prentice Hall.
Chatterjee, S., and B. Price (1977). Regression Analysis by
Example, New York, NY: John Wiley & Sons.
Daniel, W. and J. Terrell (1992). Business Statistics, (6th Ed.),
Boston, MA: Houghton Mifflin Company.
Gober, R. Wayne (2003). Shifting the Interpretive Framework of
Binary Coded Dummy Variables, Academy of Information and Management
Sciences Journal, 1(6), 1-8.
Suits, Daniel B. (1983). Dummy Variables: Mechanics v.
Interpretation, The Review of Economics and Statistics, (66), 177-180.
Sweeney, R. and E. Ulveling (1972). A Transformation for
Simplifying the Interpretation of Coefficients of Binary Variables in
Regression Analysis, The American Statistician, 5(26), 30-32.
Weiers, R. M. (2002). Introduction to Business Statistics, (4th
Ed.), Belmont, CA: Duxbury.
R. Wayne Gober, Middle Tennessee State University
Gordon L. Freeman, Middle Tennessee State University
Table 1
Computer Coding for the Binary Interpretive Framework

                      Dummy Variable     Management        Dummy Variable
Education             D1    D2    D3     Responsibility    D4    D5
High School            1     0     0     Yes                1     0
Bachelor's Degree      0     1     0     No                 0     1
Advanced Degree        0     0     1
Table 2
Binary Interpretive Framework Fitted Model

Variable Name                      Coefficient Estimate   Standard Error   t
(Intercept)                        15128.2                349.6            43.28
Experience (X)                     546.2                  30.52            17.9
High School Graduate (D1)          -2996.2                411.8            -7.28
Bachelor's Degree (D2)             147.8                  387.7            0.38
Management Responsibility (D4)     6883.5                 313.9            21.93

R^2 = 95.7%
S = 1027
Table 3
Summary of the Cases (n = 46) for the Qualitative Variables and Categories

              Education Level                                     Management Responsibility
Cases         High School   Bachelor's Degree   Advanced Degree   Yes        No
Frequency     14            19                  13                20         26
Proportion    0.30435       0.41304             0.28261           0.43478    0.56522
Table 4
Adjustments for Shifted Frameworks

Coefficients       Suits      Sweeney-Ulveling
(Intercept) (-)    -2492.3    -2142
Education (+)      949.5      850.8
Management (+)     -3441.8    -2992.8
Table 5
Interpretive Framework Coefficients

                                     Framework
Variable Name                        Binary     Suits      Sweeney-Ulveling
(Intercept)                          15128.2    17620.5    17270.2
Experience (X)                       546.2      546.2      546.2
High School Graduate (D1)            -2996.2    -2046.7    -2145.4
Bachelor's Degree (D2)               147.8      1097.3     998.7
Advanced Degree (D3)                 0          949.5      850.8
Management Responsibility (D4)       6883.5     3441.8     3890.7
No Management Responsibility (D5)    0          -3441.8    -2992.8