Article information

Title: Interpretation of shifted binary interpretive framework coefficients using a classical regression problem.
Authors: Gober, R. Wayne; Freeman, Gordon L.
Journal: Academy of Information and Management Sciences Journal
Print ISSN: 1524-7252
Publication year: 2005
Issue: January
Language: English
Publisher: The DreamCatchers Group, LLC


ABSTRACT

In regression analysis, dummy variables are usually introduced by using binary coding and the designation of a single reference group for the purpose of interpretation. Other interpretive frameworks are available that allow comparison of designated category coefficients to an "average" value for the overall sample dependent variable. These shifted interpretive framework coefficients are usually easier to understand and interpret than the coefficients based on the binary-coded framework. Utilizing the processes suggested (1) by Suits and (2) by Sweeney-Ulveling, the binary framework coefficients can be shifted to allow for comparison about an "average" value. Each framework shifting process is accomplished without the assistance of a computer statistical package.

The purpose of this paper is to use binary framework coefficients, taken from a classical regression problem presented by Chatterjee and Price (1977), to illustrate the two shifted interpretive frameworks and to discuss the interpretation of the resultant coefficients.

INTRODUCTION

In most introductory courses in statistics, the binary interpretive framework is the framework of choice when introducing the topic of dummy variables in regression analysis (Chatterjee and Price, 1977; Daniel and Terrell, 1992; Weiers, 2002). A recent Internet search for "dummy variable interpretation" returned hundreds of links. A majority of these links emphasized interpretations based on the binary-coded framework.

The binary interpretive framework uses binary coding and the omission of one category of the dummy variable. The interpretation of each category coefficient is made relative to the omitted category. When a regression model has two or more dummy variables, the binary dummy coefficients become even more difficult to interpret. The interpretation can be simplified by shifting the binary-coded framework. The shifted interpretive frameworks allow the coefficients for every dummy classification to be interpreted relative to an "average" for the dependent variable. The shifted interpretive frameworks are preferable particularly when the regression analysis is being employed by practitioners who are not accomplished statisticians, or when the results of the analysis are to be disseminated to individuals who belong to different dummy classifications. The shifting processes are accomplished without the use of a statistical computer package. The purpose of this paper is to use existing binary interpretive framework coefficients, taken from a classical regression problem, to illustrate two interpretive framework shifting processes and to discuss the interpretation of the resulting coefficients.

A CLASSICAL PROBLEM

In 1977, Chatterjee and Price presented a problem utilizing the binary interpretive framework and included two dummy variables. A few years later, Berenson, Levine and Goldstein (1983) used the Chatterjee-Price problem in their presentation of the binary interpretive framework. The authors considered the Chatterjee-Price problem as a standard in regard to the presentation and discussion of the binary interpretive frameworks of two or more dummy variables. This paper utilizes the Chatterjee-Price binary interpretive framework coefficients to illustrate and discuss two delineated shifted binary interpretive frameworks. Thus, the Chatterjee-Price problem is designated as a classical problem.

The Chatterjee-Price problem was developed from a salary survey of computer professionals in a large corporation. The objective of the survey was to identify and quantify those factors that determine salary differentials. The salary variable was measured in dollars per annum. The explanatory variables included such factors as education, years of experience, and management responsibilities. The years of experience variable was measured in years. Education and management responsibilities were treated as categorical variables. Education was coded as 1 for completion of high school, 2 for completion of a college degree, and 3 for completion of an advanced degree. Management responsibility was coded as 1 for a person with such responsibility and 0 otherwise. The binary interpretive framework was used in the coding of the two categorical variables, with advanced degree as the omitted category for education and without management responsibility as the omitted category for management responsibility.

In the binary interpretive framework, education is represented by the pair of dummy variables (D1, D2): (1, 0) for high school graduate (HS), (0, 1) for bachelor's degree (BS), and (0, 0) for advanced degree (AD), the omitted category. Management responsibility is represented by D4, which equals 1 for management responsibility and 0 otherwise. The codes used in the computer solution are presented in Table 1, which also lists D3 and D5, the dummy variables for the omitted categories.
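The coding described above can be sketched in a few lines of Python. This is only an illustration: the function name `binary_dummies` and its dictionary output are assumptions made here, not part of the Chatterjee-Price solution. It takes the survey codes given in the text (education 1, 2, or 3; management 1 or 0) and returns the dummies that enter the binary model.

```python
def binary_dummies(education: int, management: int) -> dict:
    """Map survey codes to the binary-framework dummies D1, D2, and D4.

    Hypothetical helper. Education: 1 = high school, 2 = bachelor's,
    3 = advanced degree (omitted category). Management: 1 = yes, 0 = no
    (omitted category).
    """
    if education not in (1, 2, 3) or management not in (0, 1):
        raise ValueError("education must be 1, 2, or 3; management must be 0 or 1")
    return {
        "D1": 1 if education == 1 else 0,   # high school graduate
        "D2": 1 if education == 2 else 0,   # bachelor's degree
        "D4": 1 if management == 1 else 0,  # management responsibility
    }


# An advanced-degree holder without management duties falls in both
# omitted categories, so every dummy is zero.
print(binary_dummies(3, 0))
```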

For this study, the experience variable was recoded as deviations from the mean of experience. The deviations are denoted as X. The effects of all three variables on salary were measured using linear regression analysis performed on a computer and are presented in Table 2.

The general regression model for the Chatterjee-Price classical problem is as follows:

$$Y = b_0 + b_1 X + b_2 D_1 + b_3 D_2 + b_5 D_4 \qquad (1)$$

where b0 is the intercept or constant for the model. The fitted regression model for the Chatterjee-Price problem is stated as:

$$Y = 15128.2 + 546.2 X - 2996.2 D_1 + 147.8 D_2 + 6883.5 D_4 \qquad (2)$$

Appropriate residual and influence analyses should be applied to the fitted model in order to satisfy the use and interpretation of the model's estimated coefficients. For this study, the fitted model is assumed to be satisfactory.
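As a small illustration of how the fitted model is evaluated, the following Python sketch computes an estimated salary from the coefficients of Equation (2). The function name `predict_binary` and its argument names are assumptions introduced here; `x_dev` stands for experience expressed as a deviation from mean experience, as in the text.

```python
# Coefficients of the fitted binary-framework model, Equation (2).
B0, B_EXP, B_HS, B_BS, B_MGT = 15128.2, 546.2, -2996.2, 147.8, 6883.5


def predict_binary(x_dev: float, d1: int, d2: int, d4: int) -> float:
    """Estimated annual salary in dollars (hypothetical helper).

    x_dev is years of experience minus the sample mean of experience;
    d1, d2, d4 are the binary dummies for HS, BS, and management.
    """
    return B0 + B_EXP * x_dev + B_HS * d1 + B_BS * d2 + B_MGT * d4


# At mean experience, an advanced-degree holder without management
# responsibility is estimated at the intercept itself.
print(predict_binary(0.0, 0, 0, 0))
```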

BINARY INTERPRETIVE FRAMEWORK INTERPRETATION

In terms of salary for computer professionals, the coefficients for Equation (2) are interpreted as follows:

1. The experience coefficient, b1, is $546.16, meaning that each additional year of experience is estimated to be worth an annual salary increment of $546.16;

2. The coefficient of the management responsibility dummy variable, b5, is estimated to be $6,883.50. This amount is interpreted to be the average incremental value in annual salary associated with a management position;

3. a) The HS education coefficient, b2, is -$2,996.20 and measures the salary differential for the HS category relative to the AD category;

b) The BS education coefficient, b3, is $147.80 and measures the salary differential for the BS category relative to the AD category; and

c) The difference, b3 - b2, is $3,144.00 and measures the salary differential for the BS category relative to the HS category.

The salary differentials may be restated as follows: AD is worth $2,996.20 more than HS, whereas BS is worth $147.80 more than AD, and BS is worth about $3,144.00 more than HS. The fitted regression model represented by Equation (2) assumes that these salary differentials hold for all fixed levels of experience.

SHIFTED BINARY INTERPRETIVE FRAMEWORKS

The choice of a middle category as a reference category rather than one of the extreme categories for a categorical variable is sometimes considered a way of constructing a form of group comparison that contrasts the categories to middle or "average" groups. However, this procedure does not address the complexity of coefficient interpretations because the interpretations remain relative to an omitted category. Perhaps a better way of constructing category comparisons is to shift the interpretive framework to an "average" of the dependent variable. The shift in the interpretive framework is such that the contrast of a regression coefficient for a designated category is made to an "average" value for the dependent variable and not to a specified zero-coded category. While the shifting processes will yield numerically different coefficients, the overall fit and significance of the regression model remain unchanged. A main advantage of shifting the interpretive framework of binary-coded dummy variables to an "average" is that the coefficients are no longer sensitive to which class is treated as the omitted class.

The process of shifting the interpretive framework of binary-coded coefficients can be made without the use of a computer program by adding a constant, k, to the coefficients within each set of coefficients for a qualitative variable and subtracting k from the regression equation constant or intercept. The general relationship for determining k is:

$$\sum_i b_i^{*} = \sum_i w \, (b_i + k) \qquad (3)$$

where b_i represents the binary-coded regression coefficients, b*_i represents the shifted regression coefficients, and w represents a weight for the importance of each coefficient within a set of regression coefficients for a qualitative variable. The value of k is chosen so that the new set of coefficients, b*_i, has a weighted average of zero.
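One way to read Equation (3), offered as a sketch and not as the authors' own derivation, is to assume that each category i in a set carries its own weight w_i, that every shifted coefficient equals b_i + k, and that the weighted sum of the shifted coefficients is required to be zero. Under those assumptions k can be solved for directly:

$$\sum_i w_i \,(b_i + k) = 0 \quad\Longrightarrow\quad k = -\frac{\sum_i w_i \, b_i}{\sum_i w_i}$$

With equal weights w_i = 1 over m categories this reduces to k = -(b_1 + ... + b_m) / m, and with sample proportions as weights (which sum to one) it reduces to k = -(p_1 b_1 + ... + p_m b_m). In both cases the omitted category contributes a coefficient of zero to the sum.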

The two shifting processes used in this study are (1) the Suits process (Suits, 1983) and (2) the Sweeney-Ulveling process (Sweeney and Ulveling, 1972). Starting with binary-coded coefficients, usually generated with the assistance of a statistical computer package, the shifting process can be accomplished with or without the assistance of a computer program. Each process uses the extended regression model. The extended model includes dummy variables for the omitted categories. The general extended regression model is stated as:

$$Y = b_0 + b_1 X + b_2 D_1 + b_3 D_2 + b_4 D_3 + b_5 D_4 + b_6 D_5 \qquad (4)$$

The fitted extended binary model for Equation (4) is stated as:

$$Y = 15128.2 + 546.2 X - 2996.2 D_1 + 147.8 D_2 + 0 D_3 + 6883.5 D_4 + 0 D_5 \qquad (5)$$

The coefficients b4 and b6 are for the omitted category dummy variables, D3 and D5, respectively. For the binary interpretive framework, these coefficients have a value of zero.

Suits (1983) suggested a shifting process, Shifting Process I, which expresses the category regression coefficients as deviations from an "average," where the "average" is the unweighted mean of the dependent variable across all categories for a categorical variable. In calculating the unweighted mean of means, each category receives an equal weight of 1, regardless of the number of cases in that category. Thus, when binary-coded coefficients are shifted using Shifting Process I, the value of w in Equation (3) is set at 1. The unweighted mean of all group means is reported as the regression equation constant, b0, and is the reference point from which all category differences can be calculated. Since two sets of dummy variables are included in the Chatterjee-Price problem, a constant must be computed for each set, k1 and k2, and added to the coefficients of the respective set. The sum of the constants, k, is subtracted from b0. Referring to Equation (5), the dummy variables representing education are D1, D2 and D3, and the constant k1 is computed as -(-2996.2 + 147.8 + 0) / 3. Likewise, for the dummy variables representing management responsibility, the constant k2 is computed as -(6883.5 + 0) / 2. The required constants are k1 = 949.5 and k2 = -3441.8. The sum of the constants, k, is -2492.3.
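The arithmetic of Shifting Process I can be reproduced with a short Python sketch. The function name `suits_shift` and the dictionary layout are assumptions made for illustration; the coefficients are those of Equation (5), with the omitted categories entered as zero.

```python
def suits_shift(coeffs: dict) -> tuple:
    """Shift one set of binary-coded coefficients (hypothetical helper).

    coeffs must include the omitted category with a coefficient of 0.0.
    Returns the constant k and the shifted coefficients, whose
    unweighted mean is zero.
    """
    k = -sum(coeffs.values()) / len(coeffs)
    return k, {name: b + k for name, b in coeffs.items()}


education = {"D1": -2996.2, "D2": 147.8, "D3": 0.0}
management = {"D4": 6883.5, "D5": 0.0}

k1, edu_shifted = suits_shift(education)
k2, mgt_shifted = suits_shift(management)

# The sum of the constants is subtracted from the binary intercept.
intercept = 15128.2 - (k1 + k2)

print(round(k1, 1), round(k2, 1), round(intercept, 1))
print({name: round(b, 1) for name, b in {**edu_shifted, **mgt_shifted}.items()})
```

Working from unrounded constants, as this sketch does, can differ from the hand calculation in the last printed digit.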

To shift the interpretation framework of the coefficients to an "average" that is the overall mean of the dependent variable, referred to as Shifting Process II, Sweeney and Ulveling (1972) suggested using the sample proportions for categories of each qualitative variable as weights in Equation (3). For the Chatterjee-Price problem, a summary of the qualitative variables, education level and management responsibility, is presented in Table 3.

Using Table 3 and Equation (5), k1 for education level is computed as -(0.30435 * (-2996.2) + 0.41304 * 147.8 + 0.28261 * 0), and k2 for management responsibility is computed as -(0.43478 * 6883.5 + 0.56522 * 0). The required constants are k1 = 850.8 and k2 = -2992.8. The sum of the constants, k, is -2142.0. As in Shifting Process I, k is subtracted from the intercept, and each constant, k1 and k2, is added to the coefficients of its respective set of dummy variables in Equation (5). For a more complete discussion and illustration of these processes see Gober (2003).
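Shifting Process II can be sketched the same way, using the category frequencies from Table 3 as weights. The function name `weighted_shift` and the dictionary layout are assumptions made for illustration only.

```python
def weighted_shift(coeffs: dict, counts: dict) -> tuple:
    """Shift one set of coefficients using sample proportions as weights.

    Hypothetical helper. coeffs includes the omitted category as 0.0;
    counts gives the number of cases in each category. The shifted
    coefficients have a frequency-weighted mean of zero.
    """
    n = sum(counts.values())
    k = -sum(counts[name] / n * b for name, b in coeffs.items())
    return k, {name: b + k for name, b in coeffs.items()}


education = {"D1": -2996.2, "D2": 147.8, "D3": 0.0}
edu_counts = {"D1": 14, "D2": 19, "D3": 13}   # Table 3 frequencies
management = {"D4": 6883.5, "D5": 0.0}
mgt_counts = {"D4": 20, "D5": 26}             # Table 3 frequencies

k1, edu_shifted = weighted_shift(education, edu_counts)
k2, mgt_shifted = weighted_shift(management, mgt_counts)

# The sum of the constants is subtracted from the binary intercept.
intercept = 15128.2 - (k1 + k2)

print(round(k1, 1), round(k2, 1), round(intercept, 1))
```

Using the frequencies directly avoids the small rounding error introduced by five-decimal proportions, so the last digit may differ slightly from a hand calculation.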

The adjustment terms for the delineated interpretive frameworks are presented in Table 4. The Suits process yields coefficients that estimate the difference between the mean value of annual salary for a category and the unweighted mean of the means of annual salary across all categories. The Sweeney-Ulveling process yields coefficients that estimate the difference between the mean value of annual salary for a category and the weighted mean for annual salary.

From Table 4, the shifted interpretive framework adjustments for the Suits process are $949.50 for each level of education, -$3,441.80 for each management level, and $2,492.30 for the intercept. Likewise, the adjustments for the Sweeney-Ulveling process are $850.80 for education levels, -$2,992.80 for management levels, and $2,142.00 for the intercept. The extended binary framework coefficients and the shifted framework coefficients are presented in Table 5.

The Suits interpretive framework model is stated as:

$$Y = 17620.5 + 546.2 X - 2046.7 D_1 + 1097.3 D_2 + 949.5 D_3 + 3441.8 D_4 - 3441.8 D_5 \qquad (6)$$

and the Sweeney-Ulveling interpretive framework model is stated as:

$$Y = 17270.2 + 546.2 X - 2145.4 D_1 + 998.7 D_2 + 850.8 D_3 + 3890.7 D_4 - 2992.8 D_5 \qquad (7)$$
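Because each shift adds a constant to a set of coefficients and removes the same total from the intercept, the three models should give the same estimated salary for every combination of categories. The following Python sketch checks this with the published coefficients of Equations (5), (6), and (7). The names `MODELS` and `predict` are assumptions made here, and the half-dollar tolerance is an arbitrary allowance for the one-decimal rounding of the published values.

```python
from itertools import product

# Intercept, education coefficients (HS, BS, AD), and management
# coefficients (yes, no) as published in Equations (5), (6), and (7).
MODELS = {
    "binary": {"b0": 15128.2, "edu": (-2996.2, 147.8, 0.0), "mgt": (6883.5, 0.0)},
    "suits": {"b0": 17620.5, "edu": (-2046.7, 1097.3, 949.5), "mgt": (3441.8, -3441.8)},
    "sweeney_ulveling": {"b0": 17270.2, "edu": (-2145.4, 998.7, 850.8), "mgt": (3890.7, -2992.8)},
}


def predict(model: str, edu_idx: int, mgt_idx: int, x_dev: float = 0.0) -> float:
    """Estimated salary for one education and management category."""
    m = MODELS[model]
    return m["b0"] + 546.2 * x_dev + m["edu"][edu_idx] + m["mgt"][mgt_idx]


for edu_idx, mgt_idx in product(range(3), range(2)):
    preds = [predict(name, edu_idx, mgt_idx) for name in MODELS]
    # The three frameworks agree up to rounding of the coefficients.
    assert max(preds) - min(preds) < 0.5
    print(edu_idx, mgt_idx, [round(p, 1) for p in preds])
```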

INTERPRETATION OF SHIFTED INTERPRETIVE FRAMEWORK MODELS

The Suits and Sweeney-Ulveling models yield the same coefficient for the quantitative variable, experience, as the binary interpretive model. The Suits shifted coefficients for the categorical variables are interpreted as follows:

1. a) Management responsibility, b5, adds $3,441.80 to the "unweighted average" of annual salary (averaged over all subgroups);

b) Without management responsibility, b6, subtracts $3,441.80 from the "unweighted average" of annual salary (averaged over all subgroups);

2. a) The HS education category, b2, -$2,046.70, measures the salary differential for the HS category relative to the "unweighted average" of annual salary;

b) The BS education category, b3, $1,097.30, measures the salary differential for the BS category relative to the "unweighted average" of annual salary; and

c) The AD education variable, b4, $949.50, measures the salary differential for the advanced degree category relative to the "unweighted average" of annual salary.

The Suits coefficients yield salary differentials for the categorical variables as follows: AD is worth $2,996.20 (2046.7 + 949.5) more than HS, BS is worth $147.80 (1097.3 - 949.5) more than AD, and BS is worth about $3,144.00 (2046.7 + 1097.3) more than HS. These salary differentials are the same as the differentials using the binary framework.

The Sweeney-Ulveling model coefficients are interpreted as follows:

1. a) Management responsibility, b5, adds $3,890.70 to the "weighted average" of annual salary (averaged over all subgroups);

b) Without management responsibility, b6, subtracts $2,992.80 from the "weighted average" of annual salary (averaged over all subgroups);

2. a) The HS education category, b2, -$2,145.40, measures the salary differential for the HS category relative to the "weighted average" of annual salary;

b) The BS education category, b3, $998.70, measures the salary differential for the BS category relative to the "weighted average" of annual salary; and

c) The AD education variable, b4, $850.80, measures the salary differential for the advanced degree category relative to the "weighted average" of annual salary.

The Sweeney-Ulveling coefficients yield the following salary differentials for the categorical variables: AD is worth $2,996.20 (2145.4 + 850.8) more than HS, BS is worth $147.90 (998.7 - 850.8) more than AD, and BS is worth about $3,144.10 (2145.4 + 998.7) more than HS. Except for differences due to rounding, these salary differentials are the same as the differentials for the binary and Suits frameworks.
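The agreement among the frameworks follows from the shift itself. As a brief sketch, assuming categories i and j belong to the same qualitative variable and therefore receive the same constant k, the constant cancels in any difference between two shifted coefficients:

$$b_i^{*} - b_j^{*} = (b_i + k) - (b_j + k) = b_i - b_j$$

The choice of weights changes k, and with it the reference "average," but it cannot change a differential between two categories of the same variable.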

Equations (5), (6), and (7) differ in appearance, but all have the same coefficient of determination and the same standard error of estimate. The three frameworks allow for interpretations that are viewed from different angles. Rather than assessing each category relative to a particular omitted category that sometimes is chosen arbitrarily, the shifted interpretive frameworks show the extent to which management and education level salary coefficients deviate from the company "average" salary. An approximation of the salary of any individual or group of individuals can be computed by simply determining to which classifications the individual or group belongs, and then summing the increments (positive or negative) associated with these classifications with the "average" salary. The exactness of the approximation depends upon the degree of interaction that exists between the explanatory variables, because the equations assume no interaction among the explanatory variables.

As an additional note, when Equation (2) coefficients are generated, the variances for the coefficients are generally available. Thus, a t test applied to one of the dummy variable coefficients tests whether the salary in the selected category differs from the salary for the omitted category of that dummy variable. In contrast, a t test applied to one of the coefficients of Equation (6) or (7) tests the salary level for that category against the "average" salary for the company. A significant t test for any selected category indicates that the category is significantly different from the "average" of the response variable. The variances of the coefficients for Equations (6) and (7) differ from those for Equation (2) and may be calculated from the variance-covariance matrix for Equation (2). When using a computer statistical package, dummy variable coding schemes are available that yield the shifted interpretive coefficients and their variances.
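The variance calculation mentioned above can be sketched as follows. This is an illustration under stated assumptions and not a formula taken from the paper: the weights w_j within a set are normalized to sum to one (1/m for the Suits process, the sample proportions for the Sweeney-Ulveling process) and treated as fixed constants, and the omitted category's coefficient is identically zero, so its variance and covariances are zero. Each shifted coefficient is then a linear combination of the binary coefficients in its set:

$$b_i^{*} = b_i - \sum_j w_j \, b_j$$

$$\operatorname{Var}(b_i^{*}) = \operatorname{Var}(b_i) - 2 \sum_j w_j \operatorname{Cov}(b_i, b_j) + \sum_j \sum_l w_j \, w_l \operatorname{Cov}(b_j, b_l)$$

The variances and covariances on the right-hand side are the entries of the variance-covariance matrix for Equation (2).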

SUMMARY

Regression models that contain dummy variables are economically fitted by using the binary interpretive framework, Equation (2). Once Equation (2) is available, shifted interpretive frameworks (6) and (7) can be used to form coefficients that are more easily interpreted. For the Chatterjee-Price problem, the salary differentials for the categorical variables were shown to have the same values and interpretations for each of the three equations.

The Suits process of shifting the interpretive framework is quickly accomplished with just the knowledge of the binary interpretive framework coefficients. However, the Sweeney-Ulveling framework requires the frequency of occurrence for the categories within each categorical variable. When available, the Sweeney-Ulveling framework should be chosen to simplify the interpretation of coefficients because the coefficients are conveniently interpreted as deviations about the average of the dependent variable. The Suits framework provides coefficients that are interpreted as deviations about the unweighted average of the dependent variable when averaged across all subcategory averages. When the Sweeney-Ulveling framework is not available, the Suits framework is suggested as the framework of choice to simplify the interpretation of coefficients when compared to the binary interpretive framework.

REFERENCES

Anderson, D., D. Sweeney and T. Williams (2002). Statistics for Business and Economics, (8th Ed.), Cincinnati, OH: South-Western Publishing.

Berenson, M., D. Levine and M. Goldstein (1983). Intermediate Statistical Methods and Applications: A Computer Package Approach, Englewood Cliffs, NJ: Prentice Hall.

Chatterjee, S., and B. Price (1977). Regression Analysis by Example, New York, NY: John Wiley & Sons.

Daniel, W. and J. Terrell (1992). Business Statistics, (6th Ed.), Boston, MA: Houghton Mifflin Company.

Gober, R. Wayne (2003). Shifting the Interpretive Framework of Binary Coded Dummy Variables, Academy of Information and Management Sciences Journal, 1(6), 1-8.

Suits, Daniel B. (1983). Dummy Variables: Mechanics v. Interpretation, The Review of Economics and Statistics, 66, 177-180.

Sweeney, R. and E. Ulveling (1972). A Transformation for Simplifying the Interpretation of Coefficients of Binary Variables in Regression Analysis, The American Statistician, 26(5), 30-32.

Weiers, R. M. (2002). Introduction to Business Statistics, (4th Ed.), Belmont, CA: Duxbury.

R. Wayne Gober, Middle Tennessee State University
Gordon L. Freeman, Middle Tennessee State University

Table 1
Computer Coding for the Binary Interpretive Framework

                     Dummy Variable                                 Dummy Variable
Education            D1    D2    D3     Management Responsibility   D4    D5

High School           1     0     0     Yes                          1     0
Bachelor's Degree     0     1     0     No                           0     1
Advanced Degree       0     0     1

Table 2
Binary Interpretive Framework Fitted Model

Variable Name                     Coefficient Estimate    Standard Error        t

(Intercept)                                    15128.2             349.6    43.28
Experience (X)                                   546.2             30.52    17.9
High School Graduate (D1)                      -2996.2             411.8    -7.28
Bachelor's Degree (D2)                           147.8             387.7     0.38
Management Responsibility (D4)                  6883.5             313.9    21.93

R^2 = 95.7%
S = 1027

Table 3
Summary of the Cases (n = 46) for the Qualitative Variables and Categories

              Education Level                                      Management Responsibility
Cases         High School   Bachelor's Degree   Advanced Degree    Yes        No

Frequency     14            19                  13                 20         26
Proportion    0.30435       0.41304             0.28261            0.43478    0.56522

Table 4
Adjustments for Shifted Frameworks

Coefficients          Suits    Sweeney-Ulveling

(Intercept) (-)     -2492.3             -2142.0
Education (+)         949.5               850.8
Management (+)      -3441.8             -2992.8

Table 5
Interpretive Framework Coefficients

                                                 Framework
Variable Name                          Binary      Suits    Sweeney-Ulveling

(Intercept)                           15128.2    17620.5             17270.2
Experience (X)                          546.2      546.2               546.2
High School Graduate (D1)             -2996.2    -2046.7             -2145.4
Bachelor's Degree (D2)                  147.8     1097.3               998.7
Advanced Degree (D3)                      0        949.5               850.8
Management Responsibility (D4)         6883.5     3441.8              3890.7
No Management Responsibility (D5)         0      -3441.8             -2992.8