Gender issues in the student ratings of school of business instructors at a regional university.
Caines, W. Royce; Shurden, Mike C.
INTRODUCTION
Over the past several years, student evaluations of instructors
have taken on increasing importance as administrators have sought to use
as many objective measures as possible to justify tenure/promotion
decisions, to differentiate pay raises, and to provide feedback to the
public. Student evaluations have been convenient tools because they
produce numbers that appear to represent objective assessments of the
effectiveness of the instructor. Student evaluation instruments are
usually administered during a class session late in the semester and are
relatively inexpensive and easy to use. Also, such evaluations give the
students the appearance of having input into decisions that relate to
the quality of educational services offered.
Many instructors have expressed concern about how the data
collected will be interpreted. Traditionally, instructors have
preferred that ratings be used for personal improvement rather than as
administrative evaluations of their ability and effectiveness. Issues
of variation in ratings, even in the same courses, have
been noted for some time. For example, Greenwald (1995) reported that
his ratings for a course taught one semester placed him in the highest
10 percent of faculty ratings at his university. The next semester, he
taught the same class using the same plan, but student ratings placed
him in the second-lowest decile of the university faculty. How could
such variation occur, assuming similar effort on the part of the
faculty member?
Administrators often appear to subscribe to the view that
"student ratings tend to be statistically reliable, valid, and
relatively free from bias or the need for control; probably more so than
any other data used for evaluation" (Cashin, 1995). Studies such as
that by Marsh and Hocevar (1991), based on data from 6,024 classes and
195 instructors, support the view that student evaluations of teaching
performance are stable and consistent over time.
Given that student evaluations are being used extensively to make
administrative decisions, it becomes increasingly important to
understand variation and in particular to ascertain whether any
systematic bias occurs when students evaluate instructors. For many
years, female faculty members have raised the issue of gender bias in
higher education. Concerns about hiring practices, pay differences, and
other issues have been widely addressed (see Kane, 1998; Colon, 1998);
however, very little research appears to have been done on whether
students show a systematic bias with regard to the gender of the
course instructor. Daufin (1995) declared that "research says
students judge white women and people of color more harshly than white
male professors" but offered no evidence of such research. Chandler (1996) cited a 1975 study that reported that female students have higher
regard for female instructors.
Bennet (1982) reported that both male and female students placed
greater demands on female instructors for student contact and support,
but found no evidence of direct gender bias in teaching evaluations.
Basow and Silburg (1987) reviewed evaluations from
553 male students and 527 female students using multivariate analysis of
variance. Based on the data, they concluded that female professors were
rated lower than male professors on the issue of instructor interaction
with the individual student by both male and female students with some
differences between disciplines. Basow (1995) then completed a more
in-depth study that included four semesters of data from a private
liberal arts college. Her results showed no significant effect of
student gender on the ratings of male instructors, but female students
rated female instructors significantly higher than did male students.
The focus of this paper is to analyze the student evaluations of
business instructors at a small regional university over a period of
fourteen semesters. The evaluations of female instructors are compared
to those of male instructors to determine whether there are significant
differences. The purpose is not to discover whether the gender of the
student affects the rating for the professor, but rather to ascertain
whether business students in general rate instructors differently based
on the gender of the instructor.
DATA AND METHODOLOGY
Data for this study were collected from all courses taught in the
School of Business at a small (2800 students) regional public university
during each regular semester (summer school excluded) from Fall 1992
through Spring 1999, a total of fourteen semesters. Over that period, a
total of 693 sections were taught in the School of Business. Several
sections were taught by part-time instructors, but most were taught by
full-time faculty. Of the total sections, 159 (23%) were taught by
female instructors and 534 (77%) sections were taught by male
instructors. There were 22 sections taught by female part-time
instructors (13.83% of female-taught sections) and 56 sections taught by
male part-time instructors (10.49% of male-taught sections).
The student evaluation instrument is purchased from the Center for
Teaching and Faculty Development of a large state university and has
been tested for reliability and validity. In addition, instructor
reports are compared to a norm group composed of the other institutions
using the instrument. The surveys are administered on a formal schedule
near the end of each semester.
In each section, an instructor who is not the instructor of the
course is assigned to administer the student evaluations at the
beginning of the class period. A student is selected to collect the
completed forms and deliver them to the School of Business office. They
are then mailed to the test center for analysis and results are returned
to the School of Business office shortly after the end of the semester.
Instructors receive a copy of the reports at the beginning of the next
semester. The reports show comparisons of the individual instructor to
others in the national group but do not show comparisons to instructors
at their own institution.
This analysis is divided into two parts. The first part deals
with questions that relate to the overall evaluation of the course.
Five questions address overall perceptions of the individual course and
the individual instructor.
The second part of this analysis focuses on five measures related
to methods used by instructors. Thus it is possible to evaluate both
the perceived overall effectiveness of female versus male instructors
and the perceptions of the methods employed by instructors of each
gender.
Simple summary statistics form the first part of this analysis.
Mean, median, range, standard deviation, and coefficient of variation
are calculated to compare female and male instructors. Next, the
student evaluation ratings are tested for statistical significance
using the Wilcoxon Rank Sum Test for differences between two medians.
The Wilcoxon Rank Sum Test is used because it is a nonparametric
procedure that does not require the assumption of normality. It is a
powerful test even when conditions of normality are met and is more
appropriate when they are not (Levine et al., 1997). The test statistic
is approximately normally distributed for large sample sizes (Levine et
al., 1997).
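As an illustration of the procedure described above, the sketch below computes the summary statistics and the Wilcoxon Rank Sum Test (normal approximation) for a single evaluation question. The paper does not state what software was used, and the ratings shown are placeholder values rather than data from the study; Python and SciPy are used here purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical percentile ratings for one question, one value per course
# section (the study compares 159 female-taught and 534 male-taught sections);
# these numbers are placeholders, not data from the paper.
female_ratings = np.array([62.0, 58.0, 71.0, 49.0, 80.0, 55.0, 66.0, 73.0])
male_ratings = np.array([54.0, 60.0, 47.0, 52.0, 73.0, 41.0, 58.0, 66.0, 50.0])

def summarize(x):
    """Mean, median, range, standard deviation, and coefficient of variation."""
    sd = x.std(ddof=1)  # sample standard deviation
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "range": x.max() - x.min(),
        "std_dev": sd,
        "coef_var": 100.0 * sd / x.mean(),  # in percent, as in Tables 1 and 2
    }

print("Female:", summarize(female_ratings))
print("Male:  ", summarize(male_ratings))

# Wilcoxon Rank Sum Test using the large-sample normal approximation; it is
# nonparametric and does not require the ratings to be normally distributed.
z_stat, p_value = stats.ranksums(female_ratings, male_ratings)
print(f"Z = {z_stat:.3f}, two-sided p = {p_value:.3f}")
```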
The student evaluation instrument includes a large number of
questions designed to diagnose specific strengths and weaknesses of
each instructor. The ratings are compared to those of peer instructors
across all courses and across similar courses, where similar courses
are defined as those of similar class size and self-reported student
motivation level. Thus the reported results do not allow comparison to
courses specific to the discipline, but rather to the national test
group.
The reports that are printed by the national test center include
the results from a few summary questions, which form an overview of the
instructor and course. For the first part of this analysis, five of
those summary items/questions are compared as shown in Table 1.
Questions 1-3 are reported as percentile scores: each instructor's
score is placed in a percentile of the norm group of similar classes,
as described in the preceding paragraph. Questions 4 and 5 are reported
as raw scores on a 1 (strongly disagree) to 5 (strongly agree) scale,
with 5 being the highest (preferred) score.
In addition to the results from summary questions, data are also
collected to compare methods of teaching. The complete analysis includes
twenty questions that are aggregated into four major categories for the
purpose of this study in keeping with the procedures used on the
instrument.
In addition, the students' rating of the amount of work required
in the course is also reported in this section to evaluate whether
students perceive a difference between the workload required by female
instructors and that required by male instructors. Questions 6-9 are
reported as difference scores, where the individual instructor's scores
are compared to the national average for similar courses. Question 10
is reported as a raw score on a 1 (strongly disagree) to 5 (strongly
agree) scale.
RESULTS
First, summary statistics were calculated for all sections taught
in the School of Business. Results for the female group and the male
group are reported in Table 1. There are some differences as well as
some similarities in the results for the two groups. For question 1,
ratings for the female instructors are consistently better than those
for their male counterparts: females have higher mean and median
ratings with less variation. Recall that these percentile rankings
refer to the norm group representing all institutions using this
instrument.
However, the ratings for male instructors are higher for questions
2 and 3 with less variation in the ratings. Thus it appears that
students may prefer male instructors as shown by the ratings on question
2, but at the same time progress on objectives is rated better in
courses taught by female instructors.
Questions 4 and 5 are reported on a different scale than questions
1 through 3. These questions are reported as raw scores on a scale of
1-5 with 5 the more desirable score. Thus there is no comparison to a
norm group. For question 4, female instructors received higher mean
ratings (medians were identical) with less variation. On question 5,
female and male mean and median ratings were identical with female
ratings exhibiting less variation.
The results for the methods questions in Part 2 again show some
differences (Table 2). Female instructors have higher ratings for
"involving students," "communicating content and purpose," and
"creating enthusiasm." The scores are also statistically higher for
"preparing exams," but that is misleading because the questions in that
section are designed so that lower scores are preferable. Female
instructors also have higher numerical ratings on the question related
to the amount of work in the course.
While these summary statistics provide a basis for visual comparison,
the question of statistical significance must be addressed before any
conclusions can be drawn about differences between ratings. Only those
differences that are statistically significant can be distinguished
from random fluctuation.
Therefore, the next step is to apply the Wilcoxon Rank Sum Test for
differences of medians. In each case, the scores of the female
instructors are compared to the ratings of the male instructors. The
Wilcoxon Rank Sum Test statistic is approximately normally distributed;
thus a Z-score is calculated and its p-value examined to test whether
significant differences exist.
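To illustrate how a Z-score from this test corresponds to a reported p-value, the short sketch below converts the Table 3 Z-scores for questions 1-5 into one-tailed p-values under the standard normal distribution. The one-tailed convention is an inference from comparing the reported Z-scores and p-values, not something the paper states, and SciPy is again used only for illustration.

```python
from scipy.stats import norm

# Z-scores reported in Table 3 for the overall perception questions (1-5).
z_scores = {1: 2.664, 2: -2.948, 3: -1.367, 4: 1.025, 5: -0.374}

for question, z in z_scores.items():
    # One-tailed p-value: area of the standard normal curve beyond |z|.
    p_one_tailed = norm.sf(abs(z))
    print(f"Question {question}: Z = {z:6.3f}, p = {p_one_tailed:.3f}")
```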
Table 3 gives the calculated Z-scores and associated p-values for
the questions related to the overall perceptions of the instructor and
course. The results indicate that some significant differences do
exist, but they are not consistent. For question 1 (progress on
relevant objectives), female instructors receive significantly higher
ratings than male instructors. Thus students appear to perceive that
the female instructors do a better job of stating objectives and
covering material that achieves those objectives.
However, question 2 (would like instructor again) yields a different
perspective. Female instructors receive ratings that are significantly
lower than male instructors receive. The immediate question is why
students would prefer instructors who are not rated as well at meeting
course objectives. Perhaps students do not place great emphasis on the
importance of meeting stated objectives.
Similar to question 2, the results for question 3 (course improved
attitude toward field) are somewhat surprising. Again, the students
rated females significantly lower (though the differences are less
statistically significant). Students appear to develop a better attitude
about the business disciplines from interaction with male versus female
instructors. Again, issues not related to the relevant course objectives
must be involved.
The results for questions 4 and 5 are also somewhat surprising.
Students perceive no significant differences in the overall excellence
of instruction between male and female instructors (question 4) though
they indicate they would prefer the male instructors based on the
results for question 2. In this case at least, students appear to make
unbiased evaluations of instructional effectiveness regardless of their
"likes". Similarly, students do not indicate any significant
differences in learning in the course (question 5). Again, students
appear not to be biased by gender in evaluating the learning experience
even though they are biased in which instructors they would "like
to have again" and even though they report development of improved
attitudes about the particular business discipline when interaction
occurs with male instructors. The results for Part 2 question are
perhaps helpful in explaining the results for the Part 1 analysis.
Students rate female instructors significantly higher for questions
6 through 8, which may explain why female instructors receive
significantly higher ratings for question 1. However, female
instructors also receive significantly less favorable ratings for
question 9, which deals with preparation of exams (their scores are
significantly higher, and lower scores are preferred on this question).
In addition, students perceive that the amount of work
expended is significantly higher in courses taught by female
instructors. The results from those two questions may well affect the
results for questions 2 and 3 and explain why students prefer male
instructors though they see no difference in the excellence of
instruction or in the amount of learning achieved.
SUMMARY AND CONCLUSIONS
The results of this analysis indicate that significant differences
exist in the student evaluations of the instructors of business courses
at the regional university that provided the data. The results of the
Wilcoxon Rank Sum test indicate that students do perceive male and
female instructors differently. Though students rate female instructors
higher on meeting course objectives and equal in excellence of
instruction as well as in the learning accomplished, students indicate
they would prefer to have the male instructors again and report a
better attitude toward the discipline in courses taught by male
instructors.
These results are somewhat encouraging in that students did not let
their "likes" bias them when rating instructional performance
and outcomes as shown by questions 1, 4, and 5. An analysis of the
methods questions indicates that female instructors are rated better
than male instructors except in the area of preparing examinations. In
addition, students rate the amount of work required to be greater in
courses taught by female instructors. It is not apparent from the
ratings whether students relate the amount of work to the difficulty of
the examinations. No effort was made to determine the gender mix of
students in these classes or whether it affected the outcomes. The vast
majority of the courses were required courses, and it is believed that
the genders were roughly equally represented, although the researchers
have not verified this.
REFERENCES
Basow, S. A. (1995). Student evaluations of college professors:
When gender matters. Journal of Educational Psychology, 87(4), 656-665.
Basow, S. A. & Silburg, N. T. (1987). Student evaluations of
college professors: Are male and female professors rated differently?
Journal of Educational Psychology, 79(3), 308-314.
Bennet, S. K. (1982). Student perceptions and expectations for male
and female instructors: Evidence relating to the question of gender bias
in teaching evaluation. Journal of Educational Psychology, 74(2),
170-179.
Cashin, W. E. (1995). Student ratings of teaching: IDEA Paper No.
32. Manhattan, KS: Kansas State University, Center for Faculty
Evaluations and Development.
Chandler, C. (1996). Mentoring and women in academia: Reevaluating
the traditional model: Part one. Contemporary Women's Issues Database,
8, 79-86.
Colon, T. L. (1998). Academia: A bastion of sexism. University
Wire, 05-15-1998.
Daufin, E. K. (1995). Confessions of a womanist professor. Black
Issues in Higher Education, 03-09-1995.
Greenwald, A.G. (1995). Applying social psychology to reveal a
major (but correctable) flaw in student evaluations of teaching, Paper
presented at the annual meeting of the American Psychological
Association, New York, NY.
Kane, E. (1998). Gender bias divides faculty salaries. University
Wire, 10-16-1998.
Levine, D. M., M. I. Berenson & D. Stephan. (1997). Statistics
for Managers Using Microsoft Excel. Upper Saddle River, NJ: Prentice
Hall.
Marsh, H. W. & D. Hocevar. (1991). Students' evaluations of
teaching effectiveness: The stability of mean ratings of the same
teachers over a 13-year period. Teaching and Teacher Education, 7(4),
303-314.
W. Royce Caines, Lander University
Mike C. Shurden, Lander University
TABLE 1
Summary Statistics: Student Evaluations of School of
Business Instructors' Overall Perception Questions
                                                     Standard    Coefficient
                            Mean   Median   Range   Deviation   of Variation
Question 1--Progress on Relevant Objectives
  Female                      59       60      95        24           41
  Male                        52       54      98        29           55
Question 2--Would Like Instructor Again
  Female                      45       43      98        27           59
  Male                        52       52      98        29           55
Question 3--Improved Attitude Toward Field
  Female                      46       43      96        25           55
  Male                        49       52      98        25           51
Question 4--Overall, I Rate This Instructor An Excellent Instructor
  Female                    3.96     4.00     3.4       .48         12.0
  Male                      3.89     4.00     3.2       .59         15.3
Question 5--Overall, I Learned a Great Deal in this Course
  Female                    3.96      4.1     3.0       .59         14.9
  Male                      3.96      4.1     3.4       .66         16.6
TABLE 2
Summary Statistics: Student Evaluations of School of
Business Instructors, Methods Questions
                                                     Standard    Coefficient
                            Mean   Median   Range   Deviation   of Variation
Question 6--Involving Students
  Female                     .10      .12     2.6        .7         710.8
  Male                      -.002     .03     3.0        .6        2608
Question 7--Communicating Content and Purpose
  Female                     .10      .13     2.2        .4         342.2
  Male                       .00      .10     2.6        .4        1196
Question 8--Creating Enthusiasm
  Female                     .10      .16     1.4        .20        154.2
  Male                       .0025    .04     1.7        .30      11662
Question 9--Preparing Examinations
  Female                      .3      .23     2.8        .6         183.9
  Male                        .2      .13     3.4        .5         281.9
Question 10--Amount of Work Required
  Female                     3.8      3.8     2.6        .7          17.7
  Male                       3.3      3.2     3.0        .6          16.8
TABLE 3
Tests of Significance: Median Student Ratings of Female
Instructors Compared to Male Instructors
Question           1         2         3         4         5
Z-Score        2.664    -2.948    -1.367     1.025     -.374
p-value         .004      .002      .085      .153      .369

Question           6         7         8         9        10
Z-Score        1.885     1.643     5.836     2.48      8.08
p-value         .030      .050      .000      .007      .000