Article information

Title: An empirical analysis of factors affecting honors program completion rates.
Authors: Savage, Hallie; Raehsler, Rod D.; Fiedor, Joseph; et al.
Journal: Journal of the National Collegiate Honors Council
Print ISSN: 1559-0151
Year: 2014
Issue: March
Language: English
Publisher: National Collegiate Honors Council
Keywords: Educational programs; Honors curriculum

An empirical analysis of factors affecting honors program completion rates.

Savage, Hallie; Raehsler, Rod D.; Fiedor, Joseph; et al.

INTRODUCTION

One of the most important issues in any educational environment is identifying factors that promote academic success. A plethora of research on such factors exists across most academic fields, involving a wide range of student demographics, and the definition of student success varies across the range of studies published. While much of the research is devoted to looking at student performance in particular courses and concentrates on examination scores and grades, many authors have directed their attention to student success in the context of an entire academic program; student success in this context usually centers on program completion or graduation and student retention. The analysis in this paper follows the emphasis of McKay on the importance of conducting repeated research on student completion of honors programs at different universities for different time periods. This paper uses a probit regression analysis as well as the logit regression analysis employed by McKay in order to determine predictors of student success in the honors program at a small, public university, thus attempting to answer McKay's call for a greater understanding of honors students and factors influencing their success. The use of two empirical models on completion data, employing different base distributions, provides more robust statistical estimates than those observed in similar studies.

PREVIOUS LITERATURE

The early years of our research were concurrent with the work of McKay, who studied the 2002-2005 entering honors classes at the University of North Florida and published his work in 2009. The development of our methodology depended on important previous work in this area. Yang and Raehsler, in an article published in 2005, described their use of an ordered probit model to show that the total score on the Scholastic Aptitude Test (SAT), the cumulative grade point average, and the choice of academic major significantly influenced expected grades in an intermediate microeconomics course. This paper follows their use of a probit model, which differs from the logit model only in its underlying probability distribution, and also applies logit model analysis.

Research in program effectiveness rather than success in a particular class varies across many different student cohorts. In a 2007 qualitative analysis of field research, for instance, Creighton outlines important factors influencing graduation rates among minority student populations. The study concentrates equally on institutional factors, personal factors, environmental factors, individual student attributes, and socio-cultural characteristics to explain differences in graduation rates for underrepresented student populations. The basic issues in that study are complex, and unfortunately no clear empirical evidence is provided. Zhang et al. do provide an earlier (2002) empirical analysis of student success in engineering programs across nine universities for the years 1987 through 2000. That paper boasted a sample of 39,277 students and used a multiple logistic regression model to show that high school grade point average and mathematics scores on the Scholastic Aptitude Test (SAT) were positively correlated with an increase in graduation and retention rates among engineering students. Interestingly, verbal scores on the SAT examination were negatively correlated with graduation and retention rates among engineering students in the longitudinal study. In 2007, Geiser and Santelices described expanding this work in a study of the relevance of high school GPAs to college GPAs among 80,000 students admitted to the University of California system. Using a linear regression model, they found that high school GPAs were consistently the strongest predictors of college grades across all academic disciplines and campuses in the study. They determined that this predictive power actually became stronger after the freshman year.

McKay used a logit regression model to study retention in the honors program at the University of North Florida. Using a sample of 1017 students in the honors program from 2002 through 2005, he found that high school GPA was the best predictor of program completion. The study also found that gender was a strong predictor of student success in the honors program while SAT scores did not display a significant relationship with program completion. Our study builds on this work by employing a different model and incorporating the academic discipline of each student in the analysis. We also divide the SAT score between math and verbal scores similar to that observed in the 2002 Zhang et al. study.

In more recent work published in 2013, Keller and Lacy studied student participation levels in the large honors program at Colorado State University and found that female students and students majoring in the liberal arts and natural sciences were more active in the program. Male students, along with business and engineering majors, tended to be less active in the program as measured by an index developed by the authors. Also in 2013, Goodstein and Szarek discussed program completion from an alternative view; rather than empirically studying factors influencing program completion, the authors outlined common reasons why students might not complete an honors program, especially the need for extra time to study for professional school entrance examinations, an inability to find a workable thesis topic, and additional coursework required after adding another academic major. This area of inquiry is interesting as it provides a possible future line of empirical research.

DATA

Data for this study came from Clarion University, a public university in western Pennsylvania. Enrollment at Clarion University is approximately 6,000, and the school is part of the Pennsylvania State System of Higher Education, a collection of fourteen universities that collectively make up the largest higher education provider in the state of Pennsylvania (106,000 students across all campuses). The sample of 449 individuals used for this study includes students who were admitted to the Clarion University Honors Program for the years 2003 through 2013. Data for each student include whether or not the student successfully completed the Honors Program (COMP), the college affiliation of his or her academic major (using three dummy variables named ARTSC for the College of Arts and Sciences, BUS for the College of Business Administration, and EDUC for the College of Education), the student's gender (GENDER), high school grade point average (HSGPA), and both verbal and math SAT scores (VSAT and MSAT). The size of the entering class (SIZE) is also included in the analysis. Dummy variables included in the model all take values of either 0 or 1 and are meant to distinguish between different qualitative characteristics of students in the sample. The dependent variable in this analysis, COMP, takes on a value of 1 if the student successfully completed the Clarion University Honors Program and 0 otherwise. Likewise, GENDER is assigned a value of 1 when the student is male and a 0 when the student is female. ARTSC is set at 1 if the student is in the College of Arts and Sciences (0 otherwise), BUS is 1 if the student is in the College of Business Administration (0 otherwise), and EDUC is 1 if the student is in the College of Education (0 otherwise).
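The 0/1 coding described above can be illustrated with a minimal Python sketch. The records and the `encode` helper are hypothetical and are not taken from the study's data; only the variable names (COMP, GENDER, ARTSC, BUS, EDUC) come from the text. The final check shows why the appendix leaves EDUC out of the regression: the three college dummies always sum to 1 and would duplicate an intercept term.

```python
# Hypothetical records (not the study's data): (college, gender, completed).
records = [
    ("Arts and Sciences", "M", True),
    ("Business Administration", "F", True),
    ("Education", "F", False),
    ("Education", "M", True),
]

def encode(college, gender, completed):
    """Turn one student's qualitative attributes into 0/1 indicators."""
    return {
        "COMP": int(completed),
        "GENDER": int(gender == "M"),  # 1 = male, 0 = female
        "ARTSC": int(college == "Arts and Sciences"),
        "BUS": int(college == "Business Administration"),
        "EDUC": int(college == "Education"),
    }

rows = [encode(*record) for record in records]

# The three college dummies always sum to 1, so together they duplicate a
# regression intercept; dropping one of them removes that exact dependency.
sums_to_one = all(r["ARTSC"] + r["BUS"] + r["EDUC"] == 1 for r in rows)
reduced_sums_to_one = all(r["ARTSC"] + r["BUS"] == 1 for r in rows)
print(sums_to_one, reduced_sums_to_one)
```

With EDUC omitted, the coefficients on ARTSC and BUS are read relative to education majors, who form the reference group.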

Given differences in requirements and grading practices across academic disciplines, there is some theoretical support for including dummy variables on academic major (or the college of the academic major) in the analysis. McKay found gender and high school GPA to be significant predictors of success in honors program retention using a slightly different empirical model. As a consequence, we include these variables in our analysis. Table 1 below provides descriptive statistics for each variable in the sample.

Descriptive statistics show that a little over 66% of students in the sample completed the Clarion University Honors Program during the sample period. Approximately 32% of the students in the sample are male. Academic major by college affiliation, as reported in Table 1, breaks down to approximately 45% in the College of Arts and Sciences, 13% in the College of Business Administration, and 42% in the College of Education. Students in the sample have an average high school GPA of 3.82 with an average SAT score (combining math and verbal scores) of approximately 1240. Since students in this sample are part of a university honors program, average grades and test scores far exceed similar statistics for the general university student population. The SIZE variable, measuring the number of students in each entering class, averages nearly 42 students per year. With an average 66% completion rate, one would anticipate seeing around 28 students complete the honors program each year.

The measure of skewness provides information on how each variable is distributed around the mean and introduces the first statistical test in this analysis. A value of zero indicates a perfectly symmetric distribution; the normal distribution is the classic example. A significantly negative skewness value suggests a long tail (or relatively few observations) in the lower part of the distribution. A significantly positive skewness measure suggests the reverse. Critical analysis of skewness statistics displayed in Table 1 will be conducted at the beginning of the results section below.
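A short Python sketch can make the skewness test concrete. It assumes a moment-based skewness estimate and the common large-sample approximation that the standard error of skewness is sqrt(6/n); the paper does not state which test it used, so both choices are assumptions, as is applying n = 449 to every variable. The toy GPA values are hypothetical.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_z(skew, n):
    """z-score of a skewness estimate, ASSUMING a standard error of sqrt(6/n)."""
    return skew / math.sqrt(6.0 / n)

# Hypothetical GPA-like values with a long lower tail (not the study's data).
toy_gpas = [4.0, 4.0, 3.9, 3.9, 3.8, 3.8, 3.7, 3.0]
toy_skew = sample_skewness(toy_gpas)

# Skewness values reported in Table 1; n = 449 is assumed for every variable.
reported = {"SIZE": -0.73, "VSAT": 0.06, "MSAT": 0.06, "HSGPA": -2.46}
z_scores = {name: skewness_z(skew, 449) for name, skew in reported.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

Under these assumptions the z-scores for SIZE and HSGPA lie far beyond the usual 0.01 critical value of about 2.58 in absolute terms, while those for VSAT and MSAT do not, which is in line with the significance markers in Table 1.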

RESULTS AND DISCUSSION

Before looking at the empirical estimates of the logit and probit models described in the appendix, it is worthwhile to look back at basic statistics involving the distribution of the data set utilized. Measures of skewness in Table 1 do not provide surprising results. Entering high school GPA is highly skewed to the left, indicating that very few students admitted have low GPAs. In addition to summarizing descriptive statistics for variables used in this study, we also need to look at how the measures are correlated with each other to obtain a sense of what variables to consider in the final empirical model. Table 2 displays a correlation matrix of all variables collected in the sample. A positive correlation, significant at the 0.01 level, exists between the high school GPA and completion of the honors program. A weaker positive relation, significant at the 0.10 level, exists between the business student dummy variable and honors program completion. In other words, students with higher high school grades, and students who chose business majors, tend to complete the honors program more often. No other variables are significantly correlated with completion.

Other values in the correlation matrix are interesting from a pure discussion standpoint and might be worthy of more detailed analysis in the future. For example, some gender differences occur regarding SAT performance and choice of academic major in this sample of honors students. Male students in the sample seem significantly more likely to score higher on the math portion of the SAT given the positive correlation between GENDER and MSAT. Some slight negative correlation between GENDER and VSAT suggests that female students are more likely to score higher on the verbal section of the SAT, but this relationship is not statistically significant. Likewise, male students are more likely to choose an academic major in the College of Arts and Sciences (positive correlation between GENDER and ARTSC) while females are more likely to choose a major in education among students in this select sample (negative correlation between GENDER and EDUC). High school GPA has a significant positive correlation with scores in the math section of the SAT in this sample but not with verbal scores. The correlation matrix thus establishes a positive correlation between HSGPA and COMP and between HSGPA and MSAT but not between COMP and MSAT: among students qualifying for the honors program, a high GPA in high school is associated both with program completion and with higher scores on the math section of the SAT. High scores on the math section of the SAT alone, however, do not help predict completion rates in the honors program, suggesting some inherent measure in high school grades that is not captured in the math portion of the SAT. Some would argue that high school grades incorporate a measure of effort that would positively link to completion rates for any academic program. A specific empirical determination of this linkage remains for future study.

Figures 1 and 2 illustrate how completion rates differ across academic majors and genders in the sample used for this analysis. Figure 1 clearly indicates that the average completion rates among students with majors in the College of Business Administration are substantially higher than honors program completion rates for students in other colleges. Figure 2 illustrates that completion rates are somewhat higher among female students in the honors program than among male students in the program. While results across gender are similar to those seen in McKay, the results concerning academic majors are substantially different from those observed in Keller and Lacy.

A primary drawback to relying entirely on correlation data is that the precise relation between program completion rate (COMP) and each of the explanatory variables is hidden. For example, it is difficult to predict how a change in the high school GPA will influence the probability of honors program completion without a more detailed empirical model. Clearly, the explanatory variables are linked, and simple correlation will not typically provide a complete story of how COMP is influenced by other measures in the sample. Also problematic is a study of correlation values when the primary variable of interest is qualitative (COMP takes on a value of either 0 or 1).

The virtues of the logit and probit models are described in the appendix, and in Table 3 we present maximum likelihood estimates of the latent regression in the most relevant logit and probit model specifications. Logit model 1 includes all the variables in the specification while logit model 2 includes only the most statistically significant explanatory variables (using a 0.10 significance level as a determinant). Likewise, probit model 1 and probit model 2 use the same model specifications for the probit model estimation procedure. In both the logit and probit estimates, high school GPA is the most important predictor of honors program completion, while the business college dummy variable (BUS) is significant at the 0.10 level in the reduced specifications (model 2). No other explanatory variables were found to be statistically significant.

From a statistical standpoint, the latent regression estimates fit the data well as judged by the likelihood-ratio (LR) statistic. All p-values for LR are well below 0.01, indicating that variations in the program completion variable (COMP) are substantially explained by variations in the explanatory variables chosen in the analysis. As stated above, high school GPA and the business college dummy variable are most significant. The positive sign on the coefficient for HSGPA indicates that a higher high school GPA predicts a higher probability of honors program completion. Likewise, the positive sign of BUS suggests that students with majors in the College of Business Administration are more likely to complete the program than students with majors in other colleges. While SAT scores are used to screen students wishing to enter the honors program, they do not help predict completion rate probabilities in the program. Gender is also not a significant predictor of program completion.
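The LR p-values in Table 3 can be cross-checked with a short Python sketch. It assumes that each LR statistic is compared with a chi-square distribution whose degrees of freedom equal the number of slope coefficients (7 in model 1, 2 in model 2); the paper does not state the degrees of freedom, so that mapping is an assumption. The `chi2_sf` helper is written here for illustration and uses the closed-form tail probability for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)**j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus a finite sum of x**(j - 1/2) / (1 * 3 * ... * (2j - 1)).
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total
    return math.erfc(math.sqrt(x / 2.0)) + tail

# LR statistics from Table 3 with the ASSUMED degrees of freedom.
lr_tests = {
    "Logit Model 1": (22.82, 7),
    "Logit Model 2": (19.67, 2),
    "Probit Model 1": (22.86, 7),
    "Probit Model 2": (21.66, 2),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in lr_tests.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

Under that assumption all four p-values fall well below 0.01, matching the statement in the text.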

For more precision, marginal effects of each variable on COMP using the logit and probit model estimates need to be calculated. Estimates above for the latent regression equations do not incorporate the non-linear nature of probability. Using the cumulative logistic and normal distributions, marginal effects are calculated for each of the four specifications presented in Table 3. Empirical results matching the marginal effects on program completion (COMP) with each change in explanatory variable are presented in Table 4.

The variables that matter the most in Table 4 are high school GPA and the business school dummy variable, so logit model 2 and probit model 2 are the primary specifications to consider. Results are provided for changes in the high school GPA, including an increase of 0.2, an increase of 0.5, and an increase of 1.0. Results for the logit model specification show that an increase of HSGPA by 0.2 leads to an increase in COMP of 0.067, or a 6.7 percentage point increase in the probability of program completion. The probit model specification provides a similar estimate of a 6.8 percentage point increase for the same grade point interval. When the high school GPA is 0.5 higher, the predicted probability of program completion increases by 14.9 and 15.4 percentage points when using the logit and probit model estimates respectively. A full increase of 1.0 points in the HSGPA variable increases the probability of completion by 24.0 and 25.2 percentage points for logit and probit model specifications respectively. Clearly a student's high school GPA can effectively predict completion outcomes in the honors program.

For the business college dummy variable (BUS), a value of 0 means that the student is not in the business college while a value of 1 means the student does have an academic major within the business college. The 0.111 estimate using logit model 2 means that, all else being equal, a student who selects a major in the business college has a predicted completion probability 11.1 percentage points higher than that of students with majors outside the college. The estimate using probit model 2 provides an identical 11.1 percentage point increase. This shows that the academic major selection with respect to the College of Business Administration does make a difference in predicted completion rates.
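The way such marginal effects follow from the Table 3 estimates can be sketched in Python. The sketch plugs the model 2 coefficients into the logistic and normal cumulative distributions and evaluates the change in predicted probability at the Table 1 sample means (HSGPA 3.82, BUS 0.13). The evaluation point is an assumption, since the paper does not say where its effects were evaluated, so the output should be close to Table 4 without necessarily reproducing it. The helper names are illustrative.

```python
import math

def logistic_cdf(z):
    """Cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Model 2 estimates from Table 3, ordered as (constant, HSGPA, BUS).
LOGIT_2 = (-5.58, 1.61, 0.54)
PROBIT_2 = (-3.38, 0.98, 0.33)

def completion_prob(coefs, cdf, hsgpa, bus):
    """Predicted probability of completion for one (HSGPA, BUS) profile."""
    const, b_gpa, b_bus = coefs
    return cdf(const + b_gpa * hsgpa + b_bus * bus)

def effect(coefs, cdf, base, changed):
    """Change in predicted probability when moving from base to changed."""
    return completion_prob(coefs, cdf, *changed) - completion_prob(coefs, cdf, *base)

# ASSUMPTION: effects are evaluated at the Table 1 sample means.
MEAN_GPA, MEAN_BUS = 3.82, 0.13

results = {
    "logit, HSGPA +0.2": effect(LOGIT_2, logistic_cdf,
                                (MEAN_GPA, MEAN_BUS), (MEAN_GPA + 0.2, MEAN_BUS)),
    "probit, HSGPA +0.2": effect(PROBIT_2, normal_cdf,
                                 (MEAN_GPA, MEAN_BUS), (MEAN_GPA + 0.2, MEAN_BUS)),
    "logit, BUS 0 to 1": effect(LOGIT_2, logistic_cdf, (MEAN_GPA, 0), (MEAN_GPA, 1)),
    "probit, BUS 0 to 1": effect(PROBIT_2, normal_cdf, (MEAN_GPA, 0), (MEAN_GPA, 1)),
}
for label, value in results.items():
    print(f"{label}: {value:+.3f}")
```

Evaluated this way, the HSGPA effects come out near 0.07 and the BUS effects near 0.11, in the neighborhood of the Table 4 entries. Because the probability curve is non-linear, a different evaluation point would give somewhat different numbers.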

Remaining variables in the analysis are displayed in logit model 1 and probit model 1. Since results are nearly identical, a cursory analysis can be made by just looking at the probit model results. Female students, for example, have a predicted completion rate that is approximately three percentage points higher than that of males in the sample. An increase in verbal SAT score by 100 predicts a 0.1 percentage point higher completion rate, while Table 4 shows a 0.9 percentage point increase for a 50-point increase in the math SAT score and a 2.5 percentage point increase for a 100-point increase. Both results are relatively small when compared to high school GPA results. A class size larger by ten students and the choice of an academic major in the College of Arts and Sciences lower predicted completion rates by 1.5 and 1.4 percentage points respectively. Again, these results are not statistically significant.

CONCLUSION

This study serves as an important addition to the existing literature in that it provides some empirical support for previous work with some interesting variations. As McKay observed, we find that the high school GPA for students in the honors program emerges as the most significant predictor of program completion. The fact that SAT scores do not significantly help predict expected completion rates suggests that high school GPAs may include measures beyond the basic knowledge indicated in standardized tests. A paradox is generated in that both high school GPAs and SAT scores are used to determine whether entering students qualify for the Clarion University Honors Program. One explanation is that, while SAT scores provide a basis for determining academic potential, high school GPAs reflect an individual's overall work ethic and effort. We read of students who underperform in high school yet score high on standardized tests. These types of students, as predicted by this analysis, would not be as likely to complete the honors program using the same level of effort in college. An empirical determination of what GPA measures would be an interesting extension of this analysis. One possible policy implication of this result is that, if an honors program or college wishes to increase its completion or participation rate, a director or dean should target for special scrutiny those individuals coming in with below-average high school GPAs as they are more likely to drop the program.

Results in this analysis showing that business college students are more likely than students in the arts and sciences or in education to complete the honors program are different from previous studies. The overall discussion in Goodstein and Szarek may support these findings. Most students from the Clarion University College of Arts and Sciences are natural science majors, typically in biology and physics. Most of these students study for professional (especially medical) or graduate school exams, and the prospect of working on a thesis at the same time can be daunting. Likewise, students in our college of education are busy with student teaching, which takes time away from the senior project. Business students do not consistently face these obstacles, so they may remain in the program, but additional work needs to be done to see if this is the case. Future analysis will attempt to determine how completion rates are influenced by student involvement and whether differences exist among an expanded demographic of students enrolled in the program.

APPENDIX

Because of the discrete nature of the dependent variable in this study (COMP takes on a value of either 0 or 1), ordinary least squares regression would be an inappropriate model. The two most common models utilized when the dependent variable is discrete and binary are the logit and the probit models. The logit model utilizes the logistic function, which is built from the exponential function, and is the model of choice in McKay (2009). The probit model utilizes the standard normal distribution in developing probabilities and is the additional method utilized in this analysis. The underlying standard normal distribution allows for a more uniform probability of obtaining a 0 or a 1 when compared to the logistic function; however, both models tend to provide similar results for relatively small changes in the independent variables. It is beneficial to report results from both the logit and probit estimation procedures in order to observe any possible variation in results. If the empirical results show a great deal of variation, the model specification would be placed in question as it is dependent on the assumed distribution of the dependent variable. On the other hand, if the marginal impacts of changes in each variable on the probability of program completion among honors students are consistent, a robust quantitative estimate is verified.
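The similarity of the two models can be illustrated with a small Python sketch. It relies on a widely used rule of thumb, not stated in the paper, that logit slope estimates are roughly 1.6 times the corresponding probit estimates because the logistic distribution has a larger variance than the standard normal. The sketch checks that ratio against the slopes in Table 3 and measures how far apart the two cumulative distributions are once the logistic argument is rescaled by 1.6.

```python
import math

def logistic_cdf(z):
    """Cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Slope estimates from Table 3 as (logit, probit) pairs.
PAIRS = {
    "HSGPA, model 1": (1.58, 0.95),
    "HSGPA, model 2": (1.61, 0.98),
    "BUS, model 1": (0.53, 0.32),
    "BUS, model 2": (0.54, 0.33),
}
ratios = {name: logit / probit for name, (logit, probit) in PAIRS.items()}

# Largest gap between the normal CDF and a logistic CDF rescaled by the
# rule-of-thumb factor 1.6, checked on a grid from -4 to 4 in steps of 0.1.
SCALE = 1.6
grid = [step / 10.0 for step in range(-40, 41)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)

for name, ratio in ratios.items():
    print(f"{name}: logit/probit = {ratio:.2f}")
print(f"largest CDF gap: {max_gap:.3f}")
```

All four ratios fall between 1.6 and 1.7, and the rescaled curves differ by only a couple of percentage points at most, which is why the two sets of marginal effects in Table 4 track each other so closely.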

The standard binary logit or probit model is widely used for this dependent variable type and is built around a latent regression of the following form:

(1) $y^{*} = x'\beta + e$

where $x$ and $\beta$ are the standard variable and parameter matrices, and $e$ is a vector of normally distributed error terms. The initial model considered for the latent regression can be formulated as:

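(2) $y_i = \beta_0 + \beta_1\,\mathrm{GENDER}_i + \beta_2\,\mathrm{SIZE}_i + \beta_3\,\mathrm{VSAT}_i + \beta_4\,\mathrm{MSAT}_i + \beta_5\,\mathrm{HSGPA}_i + \beta_6\,\mathrm{ARTSC}_i + \beta_7\,\mathrm{BUS}_i + e_i$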

The dummy variable EDUC is not included in the latent regression model in order to avoid the dummy variable trap. For convenience, rather than writing out the entire latent regression formula, the equation above can also be written as:

(3) $y_i = \beta'x$

In both equation (2) and equation (3) the variable $y_i$ is the COMP variable, equal to 0 if student $i$ did not finish the Clarion University Honors Program and 1 if that student did successfully complete the program. For the probit model, the probability that $y = 1$ can be calculated as

(4) $\int_{-\infty}^{\beta'x} \phi(t)\,dt = \Phi(\beta'x)$

where $\phi$ is the standard normal density function and $\Phi$ is the cumulative standard normal distribution function. For the logit function, the same probability would be

(5) $\dfrac{e^{\beta'x}}{1 + e^{\beta'x}}$

for each value of $x$. With some additional calculation, the coefficients of a binary logit or probit model can be interpreted. Rather than treating the slope parameters in a linear fashion, the marginal effect of each explanatory variable can be calculated using the cumulative standard normal distribution in the case of the probit model or the cumulative logistic function for logit analysis. Using the notation above, the marginal effect of variable $x_i$ on the dependent variable ($y$, or COMP in this analysis) can be calculated using the following equation for the probit analysis:

(6) $\dfrac{\partial E(y \mid x)}{\partial x_i} = \left[\dfrac{dF(\beta'x)}{d(\beta'x)}\right] \beta_i = \Delta\Phi(\beta'x)\,\beta_i$

where $\Delta$ represents the change in the cumulative normal distribution when $x_i$ is changed. Analysis of the marginal effect of each explanatory variable provides a better empirical description of how each variable influences the probability of a student completing the Clarion University Honors Program given the value of all other explanatory variables. Parameters for the probit model are obtained using standard maximum likelihood estimation. Simply put, the marginal effects of any variable in a probit model are determined by calculating the change observed in the cumulative normal distribution when the variable in question incrementally changes.

Likewise, marginal values for the logit model are obtained from the following:

(7) $\dfrac{\partial E(y \mid x)}{\partial x_i} = \left[\dfrac{dF(\beta'x)}{d(\beta'x)}\right] \beta_i = \Delta\left(\dfrac{1}{1 + e^{-\sum \beta x}}\right)$

Maximum likelihood estimates are calculated in a similar fashion for the logit model. Comparative statics for each variable can be done to determine how each measure affects the probability that students will complete the Honors Program. Again, it is important to use both logit and probit analyses since each assumes a different base distribution in calculating probabilities. As with the probit model, the marginal changes are calculated by looking at changes in the cumulative logistic function due to changes in the variable of interest.
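For reference, the standard textbook expressions behind these marginal effects can be written out explicitly. They are general results for binary choice models rather than formulas quoted from this paper, and the symbol $\Lambda$ for the cumulative logistic function is introduced here only for illustration.

```latex
\begin{aligned}
\text{Probit:}\quad
  & \frac{\partial \Phi(\beta'x)}{\partial x_i} = \phi(\beta'x)\,\beta_i \\[4pt]
\text{Logit:}\quad
  & \Lambda(\beta'x) = \frac{e^{\beta'x}}{1 + e^{\beta'x}}, \qquad
    \frac{\partial \Lambda(\beta'x)}{\partial x_i}
      = \Lambda(\beta'x)\,\bigl[1 - \Lambda(\beta'x)\bigr]\,\beta_i \\[4pt]
\text{Dummy variable } x_i:\quad
  & F(\beta'x \mid x_i = 1) - F(\beta'x \mid x_i = 0),
    \qquad F \in \{\Phi, \Lambda\}
\end{aligned}
```

The derivative forms apply to small changes in continuous variables such as HSGPA. For discrete steps such as an increase of 0.5 in HSGPA or a switch of BUS from 0 to 1, the difference in cumulative probabilities on the last line is the appropriate calculation.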

REFERENCES

Creighton, L.M. (2007). Factors affecting the graduation rates of university students from underrepresented populations. International Electronic Journal for Leadership in Learning, 11(Article 7), 1-12.

Geiser, S. & Santelices, M.V. (2007). Validity of high school grades in predicting student success beyond the freshman year: High-school record vs. standardized tests as indicators of four-year college outcomes. University of California-Berkeley Center for Studies in Higher Education Research and Occasional Paper Series, CSHE.6.07.

Goodstein, L. & Szarek, P. (2013). They come but do they finish? Journal of the National Collegiate Honors Council, Fall/Winter, 14 (2), 85-104.

Keller, R.R. & Lacy, M.G. (2013). Propensity score analysis of an honors program's contribution. Journal of the National Collegiate Honors Council, Fall/Winter, 14 (2), 73-84.

McKay, K. (2009). Predicting retention in honors programs. Journal of the National Collegiate Honors Council, Spring/Summer, 10 (1), 77-87.

Yang, C.W. & Raehsler, R.D. (2005). An economic analysis on intermediate microeconomics: An ordered probit model. Journal for Economic Educators, 5 (3), 1-11.

Zhang, G., Anderson, T., Ohland, M., Carter, R., & Thorndyke, B. (2002). Identifying factors influencing engineering student graduation and retention: A longitudinal and cross-institutional study. Proceedings of the Annual Conference and Exposition for the American Society for Engineering Education.

HALLIE SAVAGE

Clarion University of Pennsylvania

and the National Collegiate Honors Council

ROD D. RAEHSLER

Clarion University of Pennsylvania

JOSEPH FIEDOR

Indiana University of Pennsylvania

The authors may be contacted at rraehsler@clarion.edu.

Table 1: Summary of Descriptive Statistics

Variable   Mean    Standard    Minimum   Maximum   Skewness
                   Deviation

COMP       0.66      0.47         0         1         NA
SIZE       41.60     11.65       19        53      -0.73 ***
VSAT        620      55.95       480       800       0.06
MSAT        621      53.94       490       790       0.06
HSGPA      3.82      0.22       2.33      4.00     -2.46 ***
GENDER     0.32      0.47         0         1         NA
ARTSC      0.45      0.50         0         1         NA
BUS        0.13      0.34         0         1         NA
EDUC       0.42      0.49         0         1         NA

* significant at the 0.10 level

** significant at the 0.05 level

*** significant at the 0.01 level

Table 2: Correlation Matrix of Variables

          COMP       SIZE       VSAT       MSAT       HSGPA      GENDER     ARTSC      BUS        EDUC

COMP      1
SIZE      -.010      1
VSAT      -.006      -.019      1
MSAT      .050       .063       .018       1
HSGPA     .188 ***   .120 ***   .042       .178 ***   1
GENDER    -.053      .031       -.052      .283 ***   -.146 ***  1
ARTSC     -.051      .025       .151 ***   .121 ***   -.037      .173 ***   1
BUS       .082 *     .019       -.174 ***  .030       .021       .056       -.350 ***  1
EDUC      -.004      -.038      -.033      -.143 ***  .023       -.213 ***  -.768 ***  -.332 ***  1

* significant at the 0.10 level

** significant at the 0.05 level

*** significant at the 0.01 level

Table 3: Logit and Probit Model Equation Estimates

Variable         Logit     Logit     Probit    Probit
or Measure       Model 1   Model 2   Model 1   Model 2

CONSTANT         -5.82     -5.58     -3.53     -3.38
                 (.006)    (.007)    (.006)    (.000)
GENDER           -0.14               -0.08
                 (.567)              (.569)
SIZE (x 10^2)    -0.71               -0.41
                 (.427)              (.456)
VSAT (x 10^5)                        478
                 (.977)              (.967)
MSAT (x 10^3)    1.13                0.70
                 (.582)              (.572)
HSGPA            1.58      1.61      0.95      0.98
                 (.000)    (.000)    (.000)    (.000)
ARTSC            -0.06               -0.04
                 (.783)              (.774)
BUS              0.53      0.54      0.32      0.33
                 (.133)    (.099)    (.130)    (.091)
LR STATISTIC     22.82     19.67     22.86     21.66
                 (.002)    (.000)    (.002)    (.000)

p-values are in parentheses

Table 4: Marginal Probability Effects on Completion Probability for Logit and Probit Models

Marginal change          Logit     Logit    Probit    Probit
                        Model 1   Model 2   Model 1   Model 2

GENDER 0 to 1           -0.030              -0.030
SIZE increase by 10     -0.016              -0.015
VSAT increase by 50     +0.000              +0.001
VSAT increase by 100    +0.001              +0.001
MSAT increase by 50     +0.009              +0.009
MSAT increase by 100    +0.024              +0.025
HSGPA increase by 0.2   +0.065    +0.067    +0.066    +0.068
HSGPA increase by 0.5   +0.147    +0.149    +0.150    +0.154
HSGPA increase by 1.0   +0.237    +0.240    +0.248    +0.252
ARTSC 0 to 1            -0.014              -0.014
BUS 0 to 1              +0.109    +0.111    +0.108    +0.111

Figure 1: Completion by Academic Major (completion rate, %)

Arts and Sciences   63.682
Business            76.271
Education           66.138

Note: Table made from bar graph.

Figure 2: Completion by Gender (completion rate, %)

Full sample   66.370
Males         62.759
Females       68.092

Note: Table made from bar graph.