Gender issues in the student ratings of school of business instructors at a regional university.
Caines, W. Royce; Shurden, Mike C.
INTRODUCTION
Over the past several years, student evaluations of instructors
have taken on increasing importance as administrators have sought to use
as many objective measures as possible to justify tenure/promotion
decisions, to differentiate pay raises, and to provide feedback to the
public. Student evaluations have been convenient tools because they
produce numbers that appear to represent objective assessments of the
effectiveness of the instructor. Student evaluation instruments are
usually administered during a class session late in the semester and are
relatively inexpensive and easy to use. Also, such evaluations give the
students the appearance of having input into decisions that relate to
the quality of educational services offered.
Many instructors have expressed concern about how the data
collected will be interpreted. Traditionally, instructors have
preferred that ratings be used for personal improvement rather than as
administrative evaluations of their ability and effectiveness. Issues
of variation in ratings, even in the same courses, have
been noted for some time. For example, Greenwald (1995) reported that
his ratings for a course taught one semester placed him in the highest
10 percent of faculty ratings at his university. The next semester, he
taught the same class using the same plan, but student ratings placed
him in the second-lowest decile of the university faculty. How could
such variation occur, assuming similar effort on the part of the
faculty member?
Administrators often appear to subscribe to the view that
"student ratings tend to be statistically reliable, valid, and
relatively free from bias or the need for control; probably more so than
any other data used for evaluation" (Cashin, 1995). Studies such as
that by Marsh and Hocevar (1991), based on data from 6,024 classes and
195 instructors, support the view that student evaluations of teaching
performance are stable and consistent over time.
Given that student evaluations are being used extensively to make
administrative decisions, it becomes increasingly important to
understand variation and in particular to ascertain whether any
systematic bias occurs when students evaluate instructors. For many
years, female faculty members have raised the issue of gender bias in
higher education. Concerns about hiring practices, pay differences, and
other issues have been widely addressed (see Kane, 1998; Colon, 1998);
however, very little research appears to have been done on whether
students show a systematic bias with regard to the gender of the
course instructor. Daufin (1995) declared that "research says
students judge white women and people of color more harshly than white
male professors" but offered no evidence of such research. Chandler (1996) cited a 1975 study that reported that female students have higher
regard for female instructors.
Bennet (1982) reported that both male and female students placed
greater demands on female instructors for student contact and support,
but found no evidence of direct gender bias in teaching evaluations.
Basow and Silburg (1987) reviewed evaluations from
553 male students and 527 female students using multivariate analysis of
variance. Based on the data, they concluded that female professors were
rated lower than male professors on the issue of instructor interaction
with the individual student by both male and female students with some
differences between disciplines. Basow (1995) then completed a more
in-depth study that included four semesters of data from a private
liberal arts college. Her results showed no significant effect of
student gender on the ratings of male instructors, but female students
rated female instructors significantly higher than did male students.
The focus of this paper is to analyze the student evaluations of
business instructors at a small regional university over a period of
fourteen semesters. The evaluations of female instructors are compared
to those of male instructors to determine whether there are significant
differences. The purpose is not to discover whether the gender of the
student affects the rating for the professor, but rather to ascertain
whether business students in general rate instructors differently based
on the gender of the instructor.
DATA AND METHODOLOGY
Data for this study were collected from all courses taught in the
School of Business at a small (2800 students) regional public university
during each regular semester (summer school excluded) from Fall 1992
through Spring 1999, a total of fourteen semesters. Over that period, a
total of 693 sections were taught in the School of Business. Several
sections were taught by part-time instructors, but most were taught by
full-time faculty. Of the total sections, 159 (23%) were taught by
female instructors and 534 (77%) sections were taught by male
instructors. There were 22 sections taught by female part-time
instructors (13.83% of female-taught sections) and 56 sections taught by
male part-time instructors (10.49% of male-taught sections).
The student evaluation instrument is purchased from the Center for
Teaching and Faculty Development of a large state university and has
been tested for reliability and validity. In addition, instructor
reports are compared to a norm group composed of the other institutions
using the instrument. The surveys are administered on a formal schedule
near the end of each semester.
In each section, an instructor who is not the instructor of the
course is assigned to administer the student evaluations at the
beginning of the class period. A student is selected to collect the
completed forms and deliver them to the School of Business office. They
are then mailed to the test center for analysis and results are returned
to the School of Business office shortly after the end of the semester.
Instructors receive a copy of the reports at the beginning of the next
semester. The reports show comparisons of the individual instructor to
others in the national group but do not show comparisons to instructors
at their own institution.
This analysis is divided into two parts. The first part deals
with questions that relate to the overall evaluation of the course.
Five questions address overall perceptions of the individual course and
the individual instructor.
The second part of this analysis focuses on five measures related
to methods used by instructors. Thus it is possible to evaluate both
the perceived overall effectiveness of female versus male instructors
and the perceptions of the methods employed by instructors of each
gender.
Simple summary statistics form the first part of this analysis.
Mean, median, range, standard deviation, and coefficient of variation
are calculated to compare female and male instructors. Next, the
student evaluation ratings are tested for statistical significance
using the Wilcoxon Rank Sum Test for differences between two medians.
The Wilcoxon Rank Sum Test is used because it is a nonparametric
procedure that does not require the assumption of normality. It is a
powerful test even when conditions of normality are met and is more
appropriate when they are not (Levine et al., 1997). The test statistic
is approximately normally distributed for large sample sizes (Levine et
al., 1997).
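As an illustration of the procedure described above, the sketch below computes the summary statistics and the Wilcoxon Rank Sum Test (normal approximation) for a single evaluation question. The paper does not state what software was used, and the ratings shown are placeholder values rather than data from the study; Python and SciPy are used here purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical percentile ratings for one question, one value per course
# section (the study compares 159 female-taught and 534 male-taught sections);
# these numbers are placeholders, not data from the paper.
female_ratings = np.array([62.0, 58.0, 71.0, 49.0, 80.0, 55.0, 66.0, 73.0])
male_ratings = np.array([54.0, 60.0, 47.0, 52.0, 73.0, 41.0, 58.0, 66.0, 50.0])

def summarize(x):
    """Mean, median, range, standard deviation, and coefficient of variation."""
    sd = x.std(ddof=1)  # sample standard deviation
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "range": x.max() - x.min(),
        "std_dev": sd,
        "coef_var": 100.0 * sd / x.mean(),  # in percent, as in Tables 1 and 2
    }

print("Female:", summarize(female_ratings))
print("Male:  ", summarize(male_ratings))

# Wilcoxon Rank Sum Test using the large-sample normal approximation; it is
# nonparametric and does not require the ratings to be normally distributed.
z_stat, p_value = stats.ranksums(female_ratings, male_ratings)
print(f"Z = {z_stat:.3f}, two-sided p = {p_value:.3f}")
```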
The student evaluation instrument includes a large number of
questions designed to diagnose specific strengths and weaknesses of
each instructor. The ratings are compared to those of peer instructors
across all courses and across similar courses, where similar courses
are defined as those of similar class size and self-reported student
motivation level. Thus the reported results do not allow comparison to
courses specific to the discipline, but rather to the national test
group.
The reports that are printed by the national test center include
the results from a few summary questions, which form an overview of the
instructor and course. For the first part of this analysis, five of
those summary items/questions are compared as shown in Table 1.
Questions 1-3 are reported as percentile scores: each instructor's
score is placed in a percentile of the norm group of similar classes,
as described in the preceding paragraph. Questions 4 and 5 are reported
as raw scores on a 1 (strongly disagree) to 5 (strongly agree) scale,
with 5 being the highest (preferred) score.
In addition to the results from summary questions, data are also
collected to compare methods of teaching. The complete analysis includes
twenty questions that are aggregated into four major categories for the
purpose of this study in keeping with the procedures used on the
instrument.
In addition, the students' rating of the amount of work required
in the course is also reported in this section to evaluate whether
students perceive a difference between the workload required by female
instructors and that required by male instructors. Questions 6-9 are
reported as difference scores, where the individual instructor's scores
are compared to the national average for similar courses. Question 10
is reported as a raw score on a 1 (strongly disagree) to 5 (strongly
agree) scale.
RESULTS
First, summary statistics were calculated for all sections taught
in the School of Business. Results for the female group and the male
group are reported in Table 1. There are some differences as well as
some similarities in the results for the two groups. For question 1,
ratings for the female instructors are consistently better than those
for their male counterparts: females have higher mean and median
ratings with less variation. Recall that these percentile rankings
refer to the norm group representing all institutions using this
instrument.
However, the ratings for male instructors are higher for questions
2 and 3 with less variation in the ratings. Thus it appears that
students may prefer male instructors as shown by the ratings on question
2, but at the same time progress on objectives is rated better in
courses taught by female instructors.
Questions 4 and 5 are reported on a different scale than questions
1 through 3. These questions are reported as raw scores on a scale of
1-5 with 5 the more desirable score. Thus there is no comparison to a
norm group. For question 4, female instructors received higher mean
ratings (medians were identical) with less variation. On question 5,
female and male mean and median ratings were identical with female
ratings exhibiting less variation.
The results for the methods questions in Part 2 again show some
differences (Table 2). Female instructors have higher ratings for
"involving students," "communicating content and purpose," and
"creating enthusiasm." The scores are also statistically higher for
"preparing exams," but that is misleading because the questions in that
section are designed so that lower scores are preferable. Female
instructors also have higher numerical ratings on the question related
to the amount of work in the course.
While these summary statistics provide a basis for visual comparison,
the question of statistical significance must be addressed before any
conclusions can be drawn about differences between ratings. Only those
differences that are statistically significant can be distinguished
from random fluctuation.
Therefore, the next step is to apply the Wilcoxon Rank Sum Test for
differences of medians. In each case, the scores of the female
instructors are compared to the ratings of the male instructors. The
Wilcoxon Rank Sum Test statistic is approximately normally distributed;
thus a Z-score is calculated and its p-value examined to test whether
significant differences exist.
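To illustrate how a Z-score from this test corresponds to a reported p-value, the short sketch below converts the Table 3 Z-scores for questions 1-5 into one-tailed p-values under the standard normal distribution. The one-tailed convention is an inference from comparing the reported Z-scores and p-values, not something the paper states, and SciPy is again used only for illustration.

```python
from scipy.stats import norm

# Z-scores reported in Table 3 for the overall perception questions (1-5).
z_scores = {1: 2.664, 2: -2.948, 3: -1.367, 4: 1.025, 5: -0.374}

for question, z in z_scores.items():
    # One-tailed p-value: area of the standard normal curve beyond |z|.
    p_one_tailed = norm.sf(abs(z))
    print(f"Question {question}: Z = {z:6.3f}, p = {p_one_tailed:.3f}")
```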
Table 3 gives the calculated Z-scores and associated p-values for
the questions related to the overall perceptions of the instructor and
course. The results indicate that some significant differences do
exist, but they are not consistent. For question 1 (progress on
relevant objectives), female instructors receive significantly higher
ratings than male instructors. Thus students appear to perceive that
the female instructors do a better job of stating objectives and
covering material that achieves those objectives.
However, question 2 (would like instructor again) yields a different
perspective. Female instructors receive ratings that are significantly
lower than male instructors receive. The immediate question is why
students would prefer instructors who are not rated as well at meeting
course objectives. Perhaps students do not place great emphasis on the
importance of meeting stated objectives.
Similar to question 2, the results for question 3 (course improved
attitude toward field) are somewhat surprising. Again, the students
rated females significantly lower (though the differences are less
statistically significant). Students appear to develop a better attitude
about the business disciplines from interaction with male versus female
instructors. Again, issues not related to the relevant course objectives
must be involved.
The results for questions 4 and 5 are also somewhat surprising.
Students perceive no significant differences in the overall excellence
of instruction between male and female instructors (question 4) though
they indicate they would prefer the male instructors based on the
results for question 2. In this case at least, students appear to make
unbiased evaluations of instructional effectiveness regardless of their
"likes". Similarly, students do not indicate any significant
differences in learning in the course (question 5). Again, students
appear not to be biased by gender in evaluating the learning experience
even though they are biased in which instructors they would "like
to have again" and even though they report development of improved
attitudes about the particular business discipline when interaction
occurs with male instructors. The results for Part 2 question are
perhaps helpful in explaining the results for the Part 1 analysis.
Students rate female instructors significantly higher for questions
6 through 8, which may explain why female instructors receive
significantly higher ratings for question 1. However, female
instructors also receive significantly less favorable ratings for
question 9, which deals with preparation of exams (their scores are
significantly higher, and lower scores are preferred on this question).
In addition, students perceive that the amount of work
expended is significantly higher in courses taught by female
instructors. The results from those two questions may well affect the
results for questions 2 and 3 and explain why students prefer male
instructors though they see no difference in the excellence of
instruction or in the amount of learning achieved.
SUMMARY AND CONCLUSIONS
The results of this analysis indicate that significant differences
exist in the student evaluations of the instructors of business courses
at the regional university that provided the data. The results of the
Wilcoxon Rank Sum test indicate that students do perceive male and
female instructors differently. Though students rate female instructors
higher on meeting course objectives and equal in excellence of
instruction as well as in the learning accomplished, students indicate
they would prefer to have the male instructors again and report a
better attitude toward the discipline in courses taught by male
instructors.
These results are somewhat encouraging in that students did not let
their "likes" bias them when rating instructional performance
and outcomes as shown by questions 1, 4, and 5. An analysis of the
methods questions indicates that female instructors are rated better
than male instructors except in the area of preparing examinations. In
addition, students rate the amount of work required to be greater in
courses taught by female instructors. It is not apparent from the
ratings whether students relate the amount of work to the difficulty of
the examinations. No effort was made to determine the gender mix of
students in these classes or whether it affected the outcomes. The vast
majority of the courses were required courses, and it is believed that
the genders were roughly equally represented, although the researchers
have not verified this.
REFERENCES
Basow, S. A. (1995). Student evaluations of college professors:
When gender matters. Journal of Educational Psychology, 87(4), 656-665.
Basow, S. A. & Silburg, N. T. (1987). Student evaluations of
college professors: Are male and female professors rated differently?
Journal of Educational Psychology, 79(3), 308-314.
Bennet, S. K. (1982). Student perceptions and expectations for male
and female instructors: Evidence relating to the question of gender bias
in teaching evaluation. Journal of Educational Psychology, 74(2),
170-179.
Cashin, W. E. (1995). Student ratings of teaching: IDEA Paper No.
32. Manhattan, KS: Kansas State University, Center for Faculty
Evaluations and Development.
Chandler, C. (1996). Mentoring and women in academia: Reevaluating
the traditional model: Part one. Contemporary Women's Issues Database,
8, 79-86.
Colon, T. L. (1998). Academia: A bastion of sexism. University
Wire, 05-15-1998.
Daufin, E. K. (1995). Confessions of a womanist professor. Black
Issues in Higher Education, 03-09-1995.
Greenwald, A.G. (1995). Applying social psychology to reveal a
major (but correctable) flaw in student evaluations of teaching, Paper
presented at the annual meeting of the American Psychological
Association, New York, NY.
Kane, E. (1998). Gender bias divides faculty salaries. University
Wire, 10-16-1998.
Levine, D. M., M. I. Berenson & D. Stephan. (1997). Statistics
for Managers Using Microsoft Excel. Upper Saddle River, NJ: Prentice
Hall.
Marsh, H. W. & D. Hocevar. (1991). Students' evaluations of
teaching effectiveness: The stability of mean ratings of the same
teachers over a 13-year period. Teaching and Teacher Education, 7(4),
303-314.
W. Royce Caines, Lander University
Mike C. Shurden, Lander University
TABLE 1
Summary Statistics: Student Evaluations of School of
Business Instructors' Overall Perception Questions
                                                     Standard    Coefficient
                            Mean   Median   Range   Deviation   of Variation
Question 1--Progress on Relevant Objectives
  Female                      59       60      95        24           41
  Male                        52       54      98        29           55
Question 2--Would Like Instructor Again
  Female                      45       43      98        27           59
  Male                        52       52      98        29           55
Question 3--Improved Attitude Toward Field
  Female                      46       43      96        25           55
  Male                        49       52      98        25           51
Question 4--Overall, I Rate This Instructor An Excellent Instructor
  Female                    3.96     4.00     3.4       .48         12.0
  Male                      3.89     4.00     3.2       .59         15.3
Question 5--Overall, I Learned a Great Deal in this Course
  Female                    3.96      4.1     3.0       .59         14.9
  Male                      3.96      4.1     3.4       .66         16.6
TABLE 2
Summary Statistics: Student Evaluations of School of
Business Instructors, Methods Questions
                                                     Standard    Coefficient
                            Mean   Median   Range   Deviation   of Variation
Question 6--Involving Students
  Female                     .10      .12     2.6        .7         710.8
  Male                      -.002     .03     3.0        .6        2608
Question 7--Communicating Content and Purpose
  Female                     .10      .13     2.2        .4         342.2
  Male                       .00      .10     2.6        .4        1196
Question 8--Creating Enthusiasm
  Female                     .10      .16     1.4        .20        154.2
  Male                       .0025    .04     1.7        .30      11662
Question 9--Preparing Examinations
  Female                      .3      .23     2.8        .6         183.9
  Male                        .2      .13     3.4        .5         281.9
Question 10--Amount of Work Required
  Female                     3.8      3.8     2.6        .7          17.7
  Male                       3.3      3.2     3.0        .6          16.8
TABLE 3
Tests of Significance: Median Student Ratings of Female
Instructors Compared to Male Instructors
Question           1         2         3         4         5
Z-Score        2.664    -2.948    -1.367     1.025     -.374
p-value         .004      .002      .085      .153      .369

Question           6         7         8         9        10
Z-Score        1.885     1.643     5.836     2.48      8.08
p-value         .030      .050      .000      .007      .000