Bias in Course Evaluations

Increasingly, studies have focused on gender and racial biases in the course evaluations students routinely complete at the end of a course. For example, this large-scale study of thousands of course evaluations from across a university over a seven-year period found evidence for both gender and cultural bias (defined as bias against faculty from a non-English speaking background, as determined by country of birth and/or language spoken at home). Overall, female faculty and faculty from non-English speaking backgrounds received lower ratings than their male or English-speaking counterparts. The effect was particularly pronounced for the intersectional analysis (female faculty from non-English speaking backgrounds compared to male faculty from English-speaking backgrounds) and within the Science faculty. Interestingly, these effects vanished when the researchers looked at the questions related to the course itself, as opposed to the instructor, suggesting that “biases creep in when students evaluate the person, not the course” (p. 10).

Various solutions have been proposed for mitigating bias in course evaluations, including making students more aware of the potential for bias in their responses; increasing the proportion of female and underrepresented faculty; focusing questions on student learning and specific instructional practices, rather than more broadly on the instructors themselves; and reducing or even eliminating the emphasis on course evaluations in tenure and promotion decisions.

A more recent study, reported on by Inside Higher Ed on February 17, 2021 (article here, link to download the full study here), compiled findings from more than 100 articles on bias in Student Evaluations of Teaching (SETs). Key findings include:

  • SETs show evidence of measurement bias, such that courses with lighter workloads, those with more favorable grade distributions, non-elective courses, and upper-level discussion-based courses receive better scores from students
  • Students tend to rate courses in the natural sciences lowest and humanities highest
  • An instructor’s gender, race, ethnicity, accent, sexual orientation, or disability status all impact student ratings
    • Male instructors are perceived as more accurate in their teaching, more educated, less sexist, more enthusiastic, more competent, more organized, easier to understand, and more prompt in providing feedback; they are also penalized less for being tough graders
    • Both male and female students expect women and men to conform to prescribed gender roles; students seem to prefer professors with masculine traits and penalize women who don’t conform to feminine stereotypes – a sort of double jeopardy for female faculty
    • Students show a “gender affinity” in which they prefer professors of the same gender as themselves
    • Though there is evidence of bias within other categories of identity (particularly against Black, Asian, and Latinx faculty), the authors criticize the lack of research in this area, as well as the lack of intersectional research looking at the impact of multiple converging identities on SET bias
    • The authors attribute this lack of research in “no small part [to] the underrepresentation of people of color among faculty [. . .] there are often too few people of color to make reasonable inferences from the data.”
  • The authors offer recommendations for reforming the SET process, arguing that SETs should be used to contextualize students’ experiences in the classroom, not to evaluate teaching
    • They recommend using SETs not as a comparative metric across faculty, but to compare a faculty member’s own teaching “trajectory” over time
    • Because the distribution of SET scores tends to skew negative, the median or modal response should be used instead of the mean
    • Invitations for open-ended comments should be avoided, as these tend to produce the strongest evidence of bias; instead, students should respond to specific prompts
    • Alternative and complementary measures of teaching effectiveness (e.g., peer evaluations, teaching portfolios, reviews of course materials) should be used in addition to SETs
    • SETs should be used with the utmost caution in hiring, tenure, and promotion decisions; extra caution should be used for SETs collected during the COVID-19 era
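
To illustrate the recommendation to report the median or modal response rather than the mean, here is a minimal sketch in Python. The ratings are invented for illustration (a hypothetical 1–5 scale for a single course) and are not drawn from the study; the point is only that a few very low ratings can pull the mean down while the median and mode still reflect the typical response.

```python
from statistics import mean, median, mode

# Hypothetical ratings on a 1-5 scale for one course (invented for illustration,
# not taken from the study). Most students rate high; two rate very low.
ratings = [5, 5, 5, 4, 5, 4, 5, 1, 1, 5]

summary = {
    "mean": mean(ratings),      # sensitive to the two outlying 1s
    "median": median(ratings),  # middle of the sorted responses
    "mode": mode(ratings),      # most common response
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up example the mean is 4 while both the median and the mode are 5, so a comparison based on the mean would understate how most students actually responded.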

Other articles related to this topic, including some recent popular press articles from Inside Higher Ed and The Chronicle, are included below: