@nicoguaro
Created October 8, 2023 18:04
# -*- coding: utf-8 -*-
"""Build a word cloud from a plain-text corpus."""
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

#%% Load the corpus
with open('output_file.txt', 'r', encoding="utf8") as f:
    text = f.read()

# Extra stopwords (optional Spanish lists can be appended here)
stop = []
# with open('stop-words-spanish-snowball-mod.txt', 'r', encoding="utf8") as f:
#     stop += f.read().split()
# with open('spanish_stopwords.txt', 'r', encoding="utf8") as f:
#     stop += f.read().split()

# Build the word cloud
wordcloud = WordCloud(stopwords=STOPWORDS.union(set(stop)),
                      background_color='#3c3c3c',
                      width=1800,
                      height=1400,
                      max_words=100,
                      colormap="magma",
                      #font_path='./CabinSketch-Bold.ttf'
                      )
wordcloud.generate(text)

# Plot and save the figure
plt.figure(figsize=(9, 7))
plt.imshow(wordcloud)
plt.axis('off')
plt.tight_layout()
plt.savefig('word_cloud-SLR.png', dpi=300, transparent=True)
plt.show()
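A minimal sketch of the optional stopword loading, equivalent to un-commenting the two blocks above but guarded so the script still runs when the files are missing (the file names are the ones in the comments):

from pathlib import Path

stop = []
for name in ("stop-words-spanish-snowball-mod.txt", "spanish_stopwords.txt"):
    path = Path(name)
    if path.exists():
        # Each list is assumed to be a plain-text file of whitespace-separated stopwords.
        stop += path.read_text(encoding="utf8").split()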
This file has been truncated.
Refocusing on the Traditional and Effective Teaching Evaluation:
Rational Thoughts About SETEs in Higher Education
Guo Cui
Panzhihua University
Zhong Ni
Huaihua University
Camilla Hong Wang
(Corresponding Author)
Shantou University
Some higher education institutions use a student evaluation of teaching effectiveness (SETE) as the
only way to evaluate teaching. Unfortunately, this instrument often fails to serve as a tool for improving
instruction. It often serves as a disincentive to introducing rigor. Studies have found that student feedback
is not enough to be the basis for evaluating teaching. This paper reviews the literature on student evaluations as measures of teaching effectiveness. Problems are highlighted, and suggestions are offered to
improve SETEs and refocus teaching effectiveness on outcome-based academic standards.
Keywords: SETE, teaching interaction, teaching evaluation, performance assessment
INTRODUCTION
Student evaluation of teaching effectiveness (SETE) originated in the United States (Zhou, 2009).
Experts who support SETE believe that students’ evaluation of teachers’ teaching is objective (Zhang, Ma,
and Jiang, 2017). From students’ perspective, the teaching effect can reflect classroom quality and be used
as the primary method to evaluate teaching quality in universities and vocational colleges (Wang and Yu,
2016). However, some scholars argue that if SETE is used alone and not combined with other evaluation bases, students effectively become the decision-makers in teachers' appointment, evaluation, promotion, and salary increases (Uttl, White, & Gonzalez, 2017). Others argue that evaluating teachers by student satisfaction directly empowers students to judge teaching effectiveness, which would significantly lower teaching quality (Emery, Kramer, & Tian, 2003).
Many universities and higher vocational schools regard students as consumers rather than products
(Emery & Tian, 2002). As a result, SETE tends to reflect the popularity of teachers rather than the actual
quality of teaching. SETE results are subject to many factors and do not depend entirely on teachers’
teaching levels and effectiveness. A study conducted by Chang et al. found that students' "attitude toward teaching evaluation," "attitude toward learning," and "attitude toward the course" significantly affected the data error of SETE (Dong, 2014). The author argues that the existing SETE-based evaluation method can hardly improve teaching quality, so it is necessary to examine the advantages and disadvantages of the current method through literature analysis and case studies.
LITERATURE REVIEW
SETE was embraced by U.S. colleges and higher vocational education administrators as early as the
1960s and has been prevalent in U.S. higher education for more than 50 years because of its practicality,
sophistication, and accessibility. However, SETE is not the only or the best way to assess the quality of
teaching and learning. The author analyzes and concludes different dimensions of research cases regarding
the reliability and validity of SETE.
Personal Traits and Popularity
Most educational researchers believe that SETE essentially has nothing to do with teaching. In some
courses, the same materials and assessment methods are used, but different instructors teach them, and the
assessment results of teaching effectiveness are not the same for each instructor. Several Chinese and
foreign scholars have reached conclusions supportive of these ideas (Dooris, 1997; Xie & Zhang, 2019;
Guan, 2012; Wu, 2013; Zhong, 2012; Aleamoni, 1987). Research findings indicate that teachers’
performance significantly impacts SETE results but not student achievement (Feldman, 1978). At the time
of SETE, students often base their evaluations on teachers’ attributes (Abrami, Leventhal & Perry, 1982).
Feldman noted a positive correlation between teacher personality and assessment results when evaluations
are based on what students or colleagues know about the teachers (Feldman, 1978). Abrami et al. have
suggested that schools should not decide teacher promotions and tenure based solely on SETE because
teachers who are popular with students receive good SETE scores regardless of teaching ability. Thus, using
SETE to assess teaching quality can be challenging academically (Abrami, Leventhal & Perry, 1982).
Student Achievement
Numerous studies have shown that student achievement is not related to actual evaluation results of
teaching effectiveness. Cohen noted that the coefficient of variation in overall SETE results due to
differences in student achievement was only 14.4% (Cohen, 1983). Dowell and Neal suggested that the
correlation between student achievement and SETE results was only 3.9% (Dowell & Neal, 1982). In a
broader study, Damron noted that SETE scores were not related to teachers’ ability to improve student
achievement. If the weight of classroom satisfaction on SETE results were increased, teachers would
receive lower evaluation scores, potentially depriving teachers of opportunities for promotion, salary
increase, or even succession (Damron, 1996).
Situational Factors and Effectiveness
Some researchers have proposed that situational factors can interfere with SETE (Damron, 1996),
making the results not representative (Cohen, 1983). Cashin noted that there is a sizeable disciplinary bias
in SETE. Some surveys suggest that teachers in the arts and humanities consistently score higher on the
SETE results, while teachers in business, mathematics, and engineering consistently score lower. In
addition, differences between compulsory and optional courses and between senior and junior students may
affect the evaluation results (Aleamoni, 1989). The amount and intensity of course assignments can also
influence students’ evaluations of teachers. For example, a faculty member at a university teaches an introductory course that follows a collectively developed syllabus, with no coursework and only three multiple-choice exams. As a result, students give the teacher high evaluations every year, with scores above the college average. Two other courses taught by the same teacher receive low evaluations because the teacher developed their syllabi independently and assigns more coursework.
It should be noted that the teacher is the leading scholar of these two courses. The textbook used is also
authored by the teacher, who is pretty familiar with the content of the course but has received poor
evaluations simply because of the large amount of coursework. In one of these courses, the average student evaluation score was 73, but the standard error was as high as 35, which makes one wonder about the validity of such a teaching evaluation.
Assessors
The issue of assessors in SETE deserves attention. Assessors who are not familiar with the assessment
system may be misled by useless data and draw conclusions that deviate from the facts. The evaluation of
teaching effectiveness should focus on scientific statistics, and any sample of fewer than 30 respondents is
a small sample, which requires appropriate small-sample statistical methods. An unscientific statistical approach may lead to three types of errors: first, the data are processed unscientifically; second, assessors confuse critical and non-critical sources of difference; and third, assessors cannot reasonably explain differences among respondents or identify the sources of those differences. Therefore, college administrators should master scientific statistical analysis theories and methods (Zhong, 2012).
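As an illustration of the small-sample point (the numbers here are hypothetical, not from the paper), an interval estimate based on the t distribution shows how unstable the mean rating of a class with fewer than 30 respondents can be:

# Hypothetical class of 12 ratings on a 5-point scale; illustrates why samples
# under 30 respondents call for small-sample methods such as the t distribution.
import math
from statistics import mean, stdev
from scipy import stats

ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 5, 1]
n = len(ratings)
m, s = mean(ratings), stdev(ratings)
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
half_width = t_crit * s / math.sqrt(n)
print(f"mean = {m:.2f}, 95% CI = ({m - half_width:.2f}, {m + half_width:.2f})")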
Qualifications
Many researchers argue that students who lack critical thinking skills cannot properly assess teachers. Most researchers therefore believe that SETE can serve as a form of teaching evaluation, but only to the extent that the student evaluators are qualified (Wu, 2004). It has also been proposed
that assessors receive appropriate training before evaluating (Aleamoni, 1989). Assessments are subject to defamation law, and the protection of reputation is a fundamental civil right (Cascio and Bernardin, 1981). If unqualified assessors nonetheless assess others, those being assessed can sue the assessors for defamation (Chen, 2012).
CASE ANALYSIS
The literature review revealed that administrators’ practice of using SETE as the sole basis for making
decisions about faculty promotions and salary increases had been widely resented and opposed by the
faculty. The following is an analysis from teachers’ and students’ perspectives, illustrating how to
rationalize this approach.
Case 1. What Is Excellent Teaching?
A professor at a university in the United States had a SETE average of 4.25 (out of 5) in the first
semester, 4.23 in the second semester, and 4.21 in the third semester. The professor constantly reflected on
his teaching and made improvements over the past three semesters, but his SETE scores were always below
average. The professor was recognized as an outstanding faculty member, with excellent performance on
all aspects of the performance evaluation. However, based on his SETE score, he was not awarded the
Excellence in Teaching Award. The award was granted to another professor who had a high SETE score
but performed poorly on the performance evaluation. This phenomenon was brought to the president’s
attention, who became aware that the SETE system was flawed (Emery, Kramer & Tian, 2003).
It is also worth noting that the professor’s scores are all above 4.0. In this regard, the authors questioned
how to achieve “good” if a score higher than 4.0 out of 5 is considered not good. If other factors are not
considered, how should SETE scores be measured? If these so-called “other factors” are more influential
than SETE, why is the SETE method used to assess teaching and learning?
Case 2. Differences in Scores of Different Classes Taught by the Same Professor
A professor at Anhui University of Finance and Economics taught four classes in one semester. His SETE score in one class was 94.33 out of 100, which ranked 6th in the university, while his score in another class was 62.5, the lowest in the university. In other words, the same professor was considered by one class to be among the best teachers in the university, while students in another class considered him one of the worst. If SETE were an indicator of the actual situation, the scores of the same professor should be very close. Such a significant contrast calls into question the objectivity and validity of SETE (Dong, 2014).
Case 3. Differences From the Control Group
A professor at a U.S. university who was not yet tenured received 4.10 and 4.24 in the two classes he
taught in the fall semester. In the following spring semester, he taught the same course at the same university
and scored 4.04 and 4.33 in the two classes. The average score for the entire university was 3.99 in the fall
semester and 4.31 in the spring semester. The professor’s scores differed little between the two semesters
when compared longitudinally. However, compared to the school average, his teaching performance was
worse in the spring semester than in the fall semester. Could it be attributed to the improved quality of
teaching throughout the university during the spring semester? The answer is no. To some extent, these
differences depend on the composition of the faculty participating in SETE. In the fall semester, all faculty
members are required to take SETE, whereas, in the spring semester, only non-tenure-track professors and
teaching assistants are required to take SETE (Emery, Kramer & Tian, 2003).
Many researchers believe that teaching assistants are often more “likely” to meet student expectations
and, therefore, are more likely to receive high scores. In addition, because SETE has a significant impact
on faculty careers, non-tenure-track professors tend to make more effort to gain favor with students and
thus earn higher scores. Both of these factors contribute to higher SETE scores for the entire university.
Since SETE scores have little impact on their teaching careers, tenure-track professors are not required to
please their students to get higher student evaluations. Therefore, the overall average score decreases when
tenure-track professors are also involved in the SETE process. This phenomenon is quite common in U.S.
colleges and universities. In this way, does it mean that tenure-track and experienced professors are
considered inferior teachers (Feldman, 1986)?
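The arithmetic behind Case 3 can be made explicit (our own illustrative recomputation, using only the numbers quoted above): the professor's raw scores barely move, but his position relative to the pool of faculty actually evaluated flips sign because the pool itself changes between semesters.

# Case 3 figures from the text: the professor's mean changes little, but the gap
# to the university average flips from positive to negative as the evaluated pool changes.
fall   = {"professor": [4.10, 4.24], "university_average": 3.99}
spring = {"professor": [4.04, 4.33], "university_average": 4.31}

for term, data in (("fall", fall), ("spring", spring)):
    prof_mean = sum(data["professor"]) / len(data["professor"])
    gap = prof_mean - data["university_average"]
    print(f"{term}: professor mean = {prof_mean:.2f}, gap vs. university average = {gap:+.2f}")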
Case 4. Score Differences and Teachers’ Teaching Styles
A researcher from Nanjing Communications Institute of Technology analyzed the correlation between teachers’ personality traits and SETE results, based on surveys and interviews with full-time teachers at several higher vocational colleges and universities, and developed a comparison table of teaching style indicators. SETE scores are relatively low for teachers who are demanding about student attendance and classroom discipline and high for teachers who are not. Likewise, SETE scores are lower for teachers who are rigorous and formal in their classroom style or appearance and higher for those who are not, as shown in Table 1 (Schmelkin, Spencer & Gellman, 1997).
TABLE 1
COMPARISON TABLE OF TEACHING STYLE INDICATORS FOR TEACHERS WITH SIGNIFICANT DIFFERENCES IN SETE SCORES

Attendance and classroom discipline
Faculty group with lower SETE scores: Be strict in attendance. Teacher and student are like father and son, and the teacher should criticize the student if they make mistakes deserving criticism.
Faculty group with higher SETE scores: Teachers are not necessarily rigorous; teachers and students are like friends, and teachers should be tolerant of students.

Classroom style/teaching manner and appearance
Faculty group with lower SETE scores: The teachers are strict and severe and dress traditionally or with slight variation.
Faculty group with higher SETE scores: The teachers are relaxed and lively (female) or humorous (male), and dress in fashionable and neat styles.

Classroom communication and break-time interaction
Faculty group with lower SETE scores: Teachers maintain the dignity of the teacher, maintain the psychological distance between teacher and student, and hold “orthodox” values.
Faculty group with higher SETE scores: Teachers and students are friends; teachers can comment on fashion or criticize current affairs and communicate with students without distance.

Extracurricular communication and life interactions
Faculty group with lower SETE scores: Teachers rarely communicate with students outside of class and do not communicate with them on matters other than academic work.
Faculty group with higher SETE scores: Teachers want students to talk to them, even if it is not related to their studies.

Examination standards and requirements
Faculty group with lower SETE scores: Teachers should not leave students unattended and should not lower their standards to cater to them, or else the quality of graduates is bound to decline.
Faculty group with higher SETE scores: Teachers should “teach students according to their abilities” so that students’ performance can be reasonably distributed and as many “good students” as possible can emerge.

Teaching and research/teaching preferences
Faculty group with lower SETE scores: The teachers prefer academic research, are willing to teach cutting-edge educational theories, and are meticulous in deriving formulas.
Faculty group with higher SETE scores: The teachers are skilled in case-study or scenario-based teaching and enjoy writing school-based textbooks, reference books, or teaching casebooks.
Case 5. Students’ Use of the Right to Evaluate Teaching at a University
A random sample of 350 students at a university was surveyed on how students evaluate their teachers.
The results showed that 68% of the students said they evaluated their teachers based on how much they liked them. In other words, 68% of the students valued the teacher’s personality more than basic
teaching skills or effectiveness. At the same time, 47% of the students surveyed admitted a disciplinary bias
when evaluating their teachers. A student who prefers music to physical education is likely to give a higher
rating to the music teacher and a lower rating to the physical education teacher (See Table 2).
TABLE 2
QUESTIONNAIRE FOR STUDENTS’ EVALUATION OF TEACHERS

Statement: I do not attach much importance to the final course evaluation, and I do not think it has much influence on the teachers
  I agree / I strongly agree: 181 respondents (51.7%)
  I don’t know: 83 respondents (23.7%)
  I can’t entirely agree / I strongly disagree: 82 respondents (23.4%)

Statement: The mechanism of student evaluation of teachers weakens the authority of teachers
  I agree / I strongly agree: 179 respondents (51.1%)
  I don’t know: 104 respondents (29.7%)
  I can’t entirely agree / I strongly disagree: 62 respondents (17.7%)

*Only valid data were selected.
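As a quick consistency check (our own arithmetic, not part of the survey), the percentages in Table 2 correspond to the respondent counts taken over the 350 students sampled:

# Reproduce Table 2's percentages from the respondent counts and the sample of 350 students.
counts = {
    "evaluation has little influence on teachers": {"agree/strongly agree": 181, "don't know": 83, "disagree": 82},
    "evaluation weakens teachers' authority": {"agree/strongly agree": 179, "don't know": 104, "disagree": 62},
}
for statement, responses in counts.items():
    for answer, n in responses.items():
        print(f"{statement} | {answer}: {n / 350:.1%}")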
To ensure the rigor and accuracy of the study, a questionnaire on the credit system and teacher
evaluation was distributed to the students to explore the relationship between course evaluation and teachers
and students in a quantitative way. We found that teacher evaluation did not seem to have the desired effect
based on the in-depth interviews. As shown in Table 2, more than half (51.7%) of the students thought that
course evaluation had little impact on the teachers, while only 23.4% disagreed with this statement. Thus,
it can be seen that most students do not think that course evaluation has much impact on teachers, so students can hardly take the evaluations seriously. As a result, students may give teachers arbitrary positive or negative comments, which can discourage teachers and weaken the teacher-student relationship.
In addition, more than half (51.1%) of the students were more optimistic about the statement that “The
mechanism of student evaluation of teachers weakens the authority of teachers,” and only 17.7% of the
students disagreed with this statement. This result is highly consistent with our interviews with some
teachers. It indicates that most students believe that student assessment of teachers’ courses could affect
teachers’ sense of authority. It can be inferred from both teachers and students that teachers’ power has
been weakened due to the SETE mechanism, which is far from the value of “a one-day teacher is a lifelong
father” in traditional Chinese culture. It has a significant negative impact on the teacher-student relationship
in colleges and universities.
At the same time, in-depth interviews also showed that 74% of students would change their opinion of a teacher, and thus their evaluation score, if they received some special benefit from the teacher outside of teaching. A teacher who treats students to chocolate increases student favorability, resulting in higher scores
on student evaluations, which is highly consistent with Professor Emery’s findings (Emery & Tian, 2002).
In addition, it is interesting to note that 52% of the students did not evaluate the teaching based on the
teacher’s actual performance but gave the teacher a full 5 out of 5. This group of students gave two reasons: one is that they think the teachers work very hard and should be recognized and
appreciated; the other is that they believe it is convenient to achieve all 5s and complete the SETE task
quickly.
DISCUSSIONS
Many scholars believe that the SETE method has more disadvantages than advantages: (1) SETE tends
to train mediocre people and discourages people from taking risks. (2) The SETE method focuses on short-term performance and lacks a long-term perspective, ignoring critical factors that are not easily measured.
(3) This method focuses on individuals and is not conducive to teamwork. (4) This method is based on
detection, not aimed at prevention. (5) The method is unfair, and the assessment is highly subjective. (6)
This system does not distinguish between endogenous factors of individual differences and exogenous
factors that are not under human control (Huang and Qi, 2014; Trout, 2000; McGregor, 1972; Meyer, Kay
and French, 1965).
American scholars Milliman and McFadden conducted a study in which they found that 90% of GM
employees considered themselves to be among the top 10% of employees in the company. In this regard, the
two scholars asked these employees whether their motivation would be seriously undermined if managers
did not evaluate their performance highly. It can be seen that the scientific evaluation of employee
performance has a significant impact on the labor productivity of the company. Likewise, if employees are allowed to evaluate their supervisors in turn, it can seriously affect supervisors’ managerial motivation and, as a result, hurt the company’s labor productivity (Milliman & McFadden, 1997). For this reason, Deming strongly condemned such performance evaluation procedures (Deming, 1986). Porter and Lawler’s expectancy model of motivation explains why this matters: if employees do not believe that “the harder they work, the greater the reward,” they will not work as hard as they should and will lose their way (Porter & Lawler,
1968).
In our opinion, the evaluation of teaching has two primary purposes: one is to serve as a basis for reward and punishment, and the other is to serve as a reference for development. In the case of evaluation for reward and
punishment, the evaluation results are used as the basis for teachers’ promotion and salary increase. In
contrast, in the case of evaluation for development, the evaluation results are used as a reference and
suggestion for teachers to improve their teaching and enhance their teaching skills. However, from our
observation and research, in China’s universities, rewards and punishments overwhelm development in
practice, and teaching evaluation is more like a convenient means of administrative control. As a result,
teachers who desire to receive feedback from students and improve their teaching seek alternative
approaches.
We also believe that the most significant value of evaluating teaching is to provide a platform for
teachers and students to communicate with each other. In implementing the evaluation system, school
administrators must clarify that evaluation scores should be used only as a reference for teachers to improve
their teaching. The evaluation scores should not be used as the basis for appraisal and promotion, at least not as the only or primary basis for review and advancement. In short, business managers may use performance appraisal methods to provide employees with feedback on their work so that they are aware of their strengths and weaknesses. To a certain extent, performance appraisals are helpful for companies to
make decisions related to employee management. The author believes that the primary purpose of the SETE
for educational administrators is to provide information and feedback, but not to serve as a basis for making
decisions about teachers’ promotion. Refocusing on the essence of teaching in higher education and attaching importance to the practical effectiveness of education should be the key to the sustainable development of teaching evaluation (Tan, 2014).
CONCLUSIONS AND RECOMMENDATIONS
The SETE approach, which is widely used today, in effect rewards teachers who obtain high SETE scores by catering to students, thereby lowering expectations for students and thus diminishing the quality of teaching (Emery, Kramer, and Tian, 2003; Zhong, 2012; Feldman, 1986; Tan, 2014). The purpose
of teaching evaluation is to help teachers improve their performance. Still, in practice, administrators use it
to make decisions about the fate of teachers (Abrami, d’Apollonia & Cohen, 1990). Worse still, many colleges and universities have adopted various means and regulations to get students involved in teaching evaluation. Some universities require students to evaluate their teachers before they can check their final grades. Others require students to assess their teachers before they can register for a course, and still others penalize students’ final grades if they do not evaluate their teachers. The author believes that performance evaluation
is necessary for making decisions about individual teachers. SETE results should only be used as a reference
factor and not as a determinant. In this regard, some recommendations for management are proposed:
(1) The SETE method should be oriented to teaching performance rather than student satisfaction;
simultaneously, the sources of the evaluation data should be broadened, and SETE results
should not be used as the sole basis for measuring teaching quality.
(2) Teachers should be evaluated against some criteria, not just a cross-sectional comparison
between universities. Also, comparisons of course evaluations should be made between similar
courses.
(3) It should be ensured that the measures are feasible and that the data are statistically sound. If a student gives a grade below satisfactory, the student should be requested to write a comment to add credibility to the negative assessment (one possible implementation is sketched after this list).
(4) Assessors and third-party monitors should be trained to ensure that the evaluation system is
legitimate, adaptable, and diverse.
(5) Graduates can be invited to evaluate their former teachers. When there is no longer a stake
between teachers and students, and students are more mentally sophisticated due to their social
experience, the evaluation will be more objective, fair and rational.
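One way recommendation (3) could be enforced inside an evaluation system is sketched below; this is purely our illustration, and the threshold, scale, and field names are assumptions rather than anything prescribed by the paper.

# Illustrative enforcement of recommendation (3): discard class reports with too few
# respondents, and count a below-satisfactory rating only if it comes with a comment.
MIN_RESPONDENTS = 30        # assumed threshold for a usable class-level report
SATISFACTORY = 3            # assumed cut-off on a 5-point scale

def usable_ratings(responses):
    """Return the ratings that may be counted, or None if the sample is too small."""
    if len(responses) < MIN_RESPONDENTS:
        return None          # small sample: needs separate, small-sample treatment
    return [r for r in responses
            if r["score"] >= SATISFACTORY or r.get("comment")]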
In short, we should all believe in the principle that the teachers are responsible for teaching and the
students are accountable for their success. Likewise, we should encourage evaluation procedures that
evaluate professors based on their teaching performance. Teaching is essentially an interpersonal
interaction, and it cannot be separated from the students’ perceptions of the teacher’s characteristics.
Therefore, teaching evaluation must be based on teaching performance, and any other factors should be considered secondary and supplementary.
REFERENCES
Abrami, P.C., d’Apollonia, S., & Cohen, P.A. (1990). Validity of Student Ratings of Instruction: What
We Know and What We Do Not. Journal of Education Psychology, 82(2), 219–231.
Abrami, P.C., Leventhal, L., & Perry, R.P. (1982). Educational seduction. Review of Educational
Research, 32, 446–464.
Aleamoni, L. (1987). Student rating: myths versus research facts. Journal of Personnel Evaluation in
Education, 1, 111–119.
Aleamoni, L. (1989). Typical faculty concerns about evaluation of teaching. In L.M. Aleamoni (Ed.),
Techniques for Evaluating and Improving Instruction. San Francisco, CA: Jossey-Bass.
Cascio, W.F., & Bernardin, H.J. (1981). Implications of performance appraisal litigation for personnel
decisions. Personnel Psychology, 34, 211–226.
Cashin, W.E. (1989). Defining and evaluating college teaching. IDEA Paper No. 21, Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS.
Cashin, W.E. (1990). Students do rate different academic fields differently. In M. Theall & J. Franklin
(Eds.), Student Ratings of Instruction: Issues for Improving Practice. San Francisco, CA: Jossey-Bass.
Cashin, W.E. (1996). Developing an effective faculty evaluation system. IDEA Paper No. 33, Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS.
Chen, Q. (2012). On the Development Path of Civil Rights Protection in the United States. The Journal of
Shandong Agricultural Administrators’ College, 6, 71–73.
Cohen, P.A. (1983). Comment on a selective review of the validity of student ratings of teaching. Journal
of Higher Education, 54, 448–458.
Damron, J.C. (1996). Instructor personality and the politics of the classroom. Douglas College, New
Westminster, British Columbia, Canada.
Deming, W.E. (1986). Out of the Crisis. MIT Center for Advanced Engineering Study, Cambridge, MA.
Dong, G.C. (2014). A Study of Non-Classroom Factors in SETE. Higher Education Exploration, 2, 104–
106.
Dooris, M.J. (1997). An Analysis of the Penn State Student Rating of Teaching Effectiveness. A Report
Presented to The University Faculty Senate of the Pennsylvania State University.
Dowell, D.A., & Neal, J.A. (1982). A selective review of the validity of student ratings of teaching.
Journal of Higher Education, 53, 51–62.
Dowell, D.A., & Neal, J.A. (1983). The validity and accuracy of student ratings of instruction: A reply to
Peter A. Cohen. Journal of Higher Education, 54, 459–463.
Emery, C., & Tian, R. (2002). Schoolwork as Products, Professors as Customers: A Practical Teaching
Approach in Business Education. Journal for Business Education, 78(2), 97–102.
Emery, C.R., Kramer, T.R., & Tian, R.G. (2003). Return to Academic Standards: A Critique of Student
Evaluations of Teaching Effectiveness. Quality Assurance in Education, 11(1), 37–46.
Feldman, K.A. (1978). Course characteristics and college students’ ratings of their teachers: What we
know and what we don’t. Research in Higher Education, 9, 199–242.
Feldman, K.A. (1986). The perceived instructional effectiveness of college teachers as related to their
personality and attitudinal characteristics: a review and synthesis. Research in Higher Education,
24, 139–213.
Guan, H.H. (2012). An Empirical Study on the Effectiveness of SETE in Ningde Normal University.
Journal of Ningde Normal University, 3, 103–109.
Huang, T.Y., & Qi, H.X. (2014). An Analysis of the Factors Influencing SETE Based on Individual
Teachers’ Perspectives. Education and Vocation, 3, 103–105.
McGregor, D. (1972). An uneasy look at performance appraisal. Harvard Business Review, pp. 19–27.
Meyer, H.H., Kay, E., & French, J.R. (1965). Split roles in performance appraisal. Harvard Business
Review, pp. 28–37.
Milliman, J.F., & McFadden, F.R. (1997). Toward changing performance appraisal to address TQM
concerns: The 360-degree feedback process. Quality Management Journal, 4(3), 44–64.
Mohrman, A.M. (1989). Deming Versus Performance Appraisal: Is There a Resolution. Center for
Effective Organisations. Los Angeles, CA: University of Southern California.
Porter, L.W., & Lawler, E.E. (1968). Managerial Attitudes and Performance. Burr Ridge, IL: Irwin
Publishing.
Schmelkin, L.P., Spencer, K.J., & Gellman, E.S. (1997). Faculty perspectives on course and teacher
evaluations. Research in Higher Education, pp. 575–592.
Tan, Y.E. (2014). Reflection and Trend of Teaching Evaluation in Universities. Chongqing Higher
Education Research, 2(5), 83–87.
Trout, P.A. (2000). Flunking the Test: The Dismal Record of Student Evaluations. The Touchstone, 10(4),
11–15.
Uttl, B., White, C.A., & Gonzalez, D.W. (2017). Meta-analysis of faculty’s teaching effectiveness:
Student evaluation of teaching ratings and student learning are not related. Studies in Educational
Evaluation, 54, 22–42.
Wang, J., & Yu, J.J. (2016). Teaching-centered or Learning-centered Teacher Ratings by Students: An
Analysis Based on Indexes of 30 Institutions of Higher Education. Journal of Soochow University
(Educational Science Edition), 02, 104–112.
Wu, S. (2013). Study on the Factors Affecting SETE in China’s Universities. Dalian: Dalian University of
Technology.
Wu, Y.Q. (2004). The Actual Malice Rule as Applied Under American Defamation Law. National Chung
Cheng University Law Journal, 15, 1–97.
Xie, J.L., & Zhang, C. (2019). A Study on the Influence of Non-Instructional Factors on the Effectiveness
of SETE in Higher Education - Based on the Perspective of Student Subjects. Heilongjiang
Education (Higher Education Research & Appraisal), 7, 25–28.
Zhang, G.J., Ma, X.P., & Jiang, T.K. (2017). On the Feedback of SETE Outcomes. University Education,
7, 194–195.
Zhong, G.Z. (2012). Validity of College Students’ Evaluation of Teaching and Its Optimization
Strategies. Journal of Jimei University, 13(1), 74–77.
Zhou, W. (2009). SETE System in U.S. Colleges and Universities and Its Inspirations. Journal of
Hulunbeier College, 4, 107–110.
Teaching in Higher Education
Critical Perspectives
ISSN: 1356-2517 (Print) 1470-1294 (Online) Journal homepage: https://www.tandfonline.com/loi/cthe20
Course evaluation scores: valid measures for
teaching effectiveness or rewards for lenient
grading?
Guannan Wang & Aimee Williamson
To cite this article: Guannan Wang & Aimee Williamson (2020): Course evaluation scores: valid
measures for teaching effectiveness or rewards for lenient grading?, Teaching in Higher Education,
DOI: 10.1080/13562517.2020.1722992
To link to this article: https://doi.org/10.1080/13562517.2020.1722992
Published online: 05 Feb 2020.
TEACHING IN HIGHER EDUCATION
https://doi.org/10.1080/13562517.2020.1722992

Course evaluation scores: valid measures for teaching effectiveness or rewards for lenient grading?

Guannan Wang (Accounting Department, Suffolk University, Boston, MA, USA) and Aimee Williamson (Institute for Public Service, Suffolk University, Boston, MA, USA)

ABSTRACT
Course Evaluation Instruments (CEIs) are critical aspects of faculty assessment and evaluation across most higher education institutions, but heated debates surround the value and validity of such instruments. While some argue that CEI scores are valid measures of course and instructor quality, others argue that faculty members can game the system, most notably with lenient grading practices to achieve higher student ratings. This article synthesizes the literature on course evaluation instruments as they relate to student grades to assess the evidence supporting and refuting the major theoretical frameworks (i.e. leniency hypothesis and validity hypothesis), explores the implications of research design and methods and proposes practical recommendations for colleges and universities. This paper also goes beyond the CEI-grade relationship and provides a framework that illustrates the relationships between teaching quality and CEI scores, and the potential confounding factors and omitted variables which may significantly deteriorate the informativeness of the CEI score.

ARTICLE HISTORY
Received 25 July 2019; Accepted 23 January 2020

KEYWORDS
Course evaluation instrument; expected grade; teaching quality; student learning

JEL Classification: I20; I21
1. Introduction
Course evaluation processes are critical and influential components of teaching, with
significant weight for review, tenure, and promotion decisions across most universities.
The course evaluation instrument (CEI) is widely used by institutions of higher education to evaluate and improve teaching quality. Student evaluations of courses are
common among colleges and universities and virtually all business schools use some
form of student evaluations (Clayson 2009; Brockx, Spooren, and Mortelmans 2011).
The first student rating forms were completed at the University of Washington in
the 1920s, and the first research on student ratings followed soon after (Kulik 2001).
Despite closing in on a century of use, there is still much debate as to the validity
and appropriate use of student evaluations of courses. Given the important role
these evaluations play in faculty tenure and promotion processes, it is not very surprising that such student evaluations continue to generate significant debate and attention
in the literature.
CONTACT: Guannan Wang, igwang@suffolk.edu, Suffolk University, 120 Tremont Street, Boston, MA 02108, USA
© 2020 Informa UK Limited, trading as Taylor & Francis Group
Student rating programs were originally designed, and continue to be used, for two
main reasons: (1) to help instructors improve their teaching and (2) to help administrators
oversee teaching quality across the institution and make related decisions (Kulik 2001;
Brockx, Spooren, and Mortelmans 2011). These broad goals have evolved into many significant and influential uses of student evaluations. These include the use of course evaluation scores in hiring new full-time and adjunct faculty, annual review processes,
promotion and tenure decisions, teaching awards, assignment of faculty to courses,
accreditation reviews, development of professional development programs, merit pay,
and student selection of courses (Kulik 2001; Barth 2008; Benton 2011; Brockx,
Spooren, and Mortelmans 2011; Catano and Harvey 2011; Chulkov and Alstine 2011).
Some schools have thresholds for student course evaluation scores, below which a
faculty member is ineligible for tenure. One or two bad CEI scores may also mean that
an adjunct faculty member will not be given another opportunity to teach in a school.
It is critical that we do our best to fully understand the course evaluation process,
create valid and informative course evaluation forms, and use them in the most appropriate manner.
Student evaluations are the most widely used source for evaluating teaching effectiveness, even serving as the only source in many colleges (Benton 2011). The use and
influence of such evaluations have increased in recent years, however, in part due to
broader trends in accountability and marketization of higher education (Brockx,
Spooren, and Mortelmans 2011). Accreditation requirements may also drive the use of
student evaluations (Brockx, Spooren, and Mortelmans 2011). Even with the availability
of other forms of evaluation, student evaluations typically have the most impact and
receive the most attention (Dodeen 2013).
Research on CEIs has identified relationships between CEI scores and a variety of
factors, such as course grades (Krautmann and Sander 1999; McPherson 2006; Weinberg,
Hashimoto, and Fleisher 2009; Brockx, Spooren, and Mortelmans 2011; among others),
class attendance (Arnold 2009; Brockx, Spooren, and Mortelmans 2011; Braga, Paccagnella, and Pellizzari 2014), discipline (McPherson 2006; Nowell 2007; Driscoll and
Cadden 2010; Matos-Díaz and Ragan 2010), class type (Krautmann and Sander 1999;
Centra 2003; Driscoll and Cadden 2010), class level (Nelson and Lynch 1984; Nowell
2007; Driscoll and Cadden 2010; Ewing 2012) and many other factors.
The interpretation of these relationships has generated even further debate. It is critical
that we develop a good understanding of this process and its impact. As others have
suggested, if evaluation scores can be ‘bought’, the instrument most used for measuring
teaching effectiveness is flawed and may contribute to grade inflation at a more systemic
level (Krautmann and Sander 1999).
At its very core is the debate over whether course evaluation instruments are valid
measures of teaching. As Kulik (2001) succinctly states, ‘[t]o say that student ratings
are valid is to say that they reflect teaching effectiveness’ (p. 10). While some faculty
members see CEI scores as valid measures that inform their teaching and bring needed
accountability to higher education, others view CEI scores as invalid measures that are
more likely to reflect student bias and retaliation than instructor performance. Some
point out that student evaluation ratings are more appropriately measures of ‘satisfaction’
than outcomes or teaching value (Benton 2011). Given that other measures of teaching
performance, such as exam scores and peer evaluations, carry similar or even stronger
concerns about validity and reliability, there is no holy grail by which to measure teaching
effectiveness and compare it to CEI scores (Kulik 2001).
2. Research questions & method
As explained above, one of the most controversial topics in the CEI literature is the association between students’ expected grades and CEI scores. We identified two critical questions surrounding this debate: (1) Is there a relationship between grades (actual, expected,
etc.) and CEI scores? (2) If so, what is the nature of or explanation for that relationship?
While previous studies have argued responses to these questions, the findings are mixed,
demonstrating a strong need for a more comprehensive analysis.
To answer these questions and inform the debate surrounding the validity and leniency
hypotheses, we conducted a comprehensive survey of the CEI literature, identifying and
analyzing pedagogical studies that shed light on the relationship between grades and
CEI scores, particularly student expected grades. First, we searched for educational articles
related to course evaluation in the major databases, including ABI/INFORM, Business
Source Complete, ScienceDirect, and Google Scholar by using a list of keywords.1 The
initial search found 72 published articles related to student evaluations. Second, given
that our focus is the impact of grades on student evaluations, we further limited the
sample to studies incorporating grade (actual grade or expected grade) in their study.
That narrowed the sample down to the 28 studies listed in Tables 1 and 2.
Tables 1 and 2 summarize the research type, research question, and data source of the
related literature. Table 3 presents a summary of the choice of research method, dependent
variables, independent variables, statistical results, and control variables. Our analysis
includes an evaluation of the arguments in the existing literature, implications of research
designs and methods, confounding factors, and practical implications. Among the 28
studies reviewed in Tables 1 and 2, 24 of them are empirical analyses and thus discussed
in Table 3.
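The two-step screening described above amounts to a keyword filter followed by a grade-variable filter; the sketch below is schematic only (the record fields and keyword list are our assumptions, not the authors' actual tooling):

# Schematic version of the screening: (1) keep articles matching the course-evaluation
# keywords, (2) keep only studies that incorporate an actual or expected grade variable.
from dataclasses import dataclass

@dataclass
class Article:                      # illustrative record; fields are assumptions
    title: str
    uses_grade_variable: bool

KEYWORDS = ("course evaluation", "student evaluation", "student ratings")

def screen(articles):
    hits = [a for a in articles if any(k in a.title.lower() for k in KEYWORDS)]
    return [a for a in hits if a.uses_grade_variable]

sample = [
    Article("Grades and student evaluations of teachers", True),
    Article("Student ratings and class size", False),
]
print([a.title for a in screen(sample)])   # only the first article survives both filters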
3. Prior discussion on the CEI-Grade relationship
3.1. Leniency hypothesis vs. validity hypothesis
As noted above, there has been widespread debate around the association between students’ grades and CEI scores. Many studies show consistent evidence that course
grades, both expected and relative among peers, have a positive relationship with the
CEI score (Marsh and Roche 2000; Isely and Singh 2005; Driscoll and Cadden 2010;
Brockx, Spooren, and Mortelmans 2011). However, several researchers have cast doubt
on that contention and find no significant association between course grades and CEI
scores or find that the impact of expected grades on CEI scores is subtle and can be
explained by other factors (Centra 2003; Arnold 2009). Among the 28 studies we surveyed,
24 studies performed statistical analyses on the relationship between grades and CEI
scores. 19 of these studies demonstrate a positive association to some degree between
the (average) CEI score and (average) grade expectation, 4 studies do not find any significant association, and 1 study finds a negative association. The most common measures
and proxies for grade are individual expected grade, class-average expected grade,
Table 1. Pedagogical research on the impact of course grade on student evaluation: publication outlet and research question.

Arnold, I. J. M. (2009). "Do examinations influence student evaluations?" International Journal of Educational Research. Research question: measures the impact of timing on student evaluations.
Bausell, R. B., and J. Magoon (1972). "Expected grade in a course, grade point average, and student ratings of the course and the instructor." Educational and Psychological Measurement. Research question: examines the relation between expected grade and the course rating.
Beleche, T., D. Fairris, and M. Marks (2012). "Do course evaluations truly reflect student learning? Evidence from an objectively graded post-test." Economics of Education Review. Research question: the relationship between student course evaluations and an objective measure of student learning.
Braga, M., M. Paccagnella, and M. Pellizzari (2014). "Evaluating students' evaluations of professors." Economics of Education Review. Research question: contrasts measures of teacher effectiveness.
Brockx, B., P. Spooren, and D. Mortelmans (2011). "Taking the grading leniency story to the edge. The influence of student, teacher, and course characteristics on student evaluations of teaching in higher education." Educational Assessment, Evaluation and Accountability. Research question: examines the influence of course grades and other characteristics of students on student evaluations.
Butcher, K. F., P. J. McEwan, and A. Weerapana (2014). "The effects of an anti-grade-inflation policy at Wellesley College." Journal of Economic Perspectives. Research question: evaluates the consequences of the mandatory grade ceiling on student evaluations.
Centra, J. A. (2003). "Will teachers receive higher student evaluations by giving higher grades and less course work?" Research in Higher Education. Research question: examines the relationship between expected grades, the level of difficulty, workload in courses, and course rating.
Clayson, D. E. (2009). "Student evaluations of teaching: Are they related to what students learn? A meta-analysis and review of the literature." Journal of Marketing Education. Research question: the relationship between the evaluations and learning.
Driscoll, J., and D. Cadden (2010). "Student evaluation instruments: the interactive impact of course requirements, student level, department and anticipated grade." American Journal of Business Education. Research question: examines the relationship between measures of teaching effectiveness and several factors, including the students' anticipated grade.
Ewing, A. M. (2012). "Estimating the impact of relative expected grade on student evaluation of teachers." Economics of Education Review. Research question: investigates instructors' incentives to 'buy' higher evaluation scores by inflating grades.
Gorry, D. (2017). "The impact of grade ceilings on student grades and course evaluations: Evidence from a policy change." Economics of Education Review. Research question: the effects of a grade ceiling policy on grade distributions and course evaluations.
Greenwald, A. G., and G. M. Gillmore (1997a). "Grading leniency is a removable contaminant of student ratings." American Psychologist. Research question: examines the relation between grading leniency and student evaluations.
Greenwald, A. G., and G. M. Gillmore (1997b). "No pain, no gain? The importance of measuring course workload in student ratings of instruction." Journal of Educational Psychology. Research question: examines the relation between course grade and student evaluations.
Hoefer, P., J. Yurkiewicz, and J. C. Byrne (2012). "The association between students' evaluation of teaching and grades." Decision Sciences Journal of Innovative Education. Research question: examines the relation between course grade and course rating, and the moderating role of gender, academic level, and field.
Isely, P., and H. Singh (2005). "Do higher grades lead to favorable student evaluations?" The Journal of Economic Education. Research question: examines the relation between the expected grade in other classes of the same course and student evaluations.
Krautmann, A. C., and W. Sander (1999). "Grades and student evaluations of teachers." Economics of Education Review. Research question: examines the relation between grading practices and student evaluations.
Love, D. A., and M. J. Kotchen (2010). "Grades, course evaluations, and academic incentives." Eastern Economic Journal. Research question: investigates how incentives created by academic institutions affect students' evaluations of faculty and grade inflation.
Marsh, H. W., and L. A. Roche (2000). "Effects of grading leniency and low workload on students' evaluations of teaching." Journal of Educational Psychology. Research question: examines the relation between grading leniency and student evaluations.
Matos-Díaz, H., and J. R. Ragan Jr (2010). "Do student evaluations of teaching depend on the distribution of expected grade?" Education Economics. Research question: examines the relation between the distribution of expected grades and student evaluations.
McPherson, M. A. (2006). "Determinants of how students evaluate teachers." The Journal of Economic Education. Research question: grade expectations and student evaluation of teaching.
Millea, M., and P. W. Grimes (2002). "Grade expectations and student evaluation of teaching." College Student Journal. Research question: examines the links between course rigor and grades to evaluation scores.
Nelson, J. P., and K. Lynch (1984). "Grade inflation, real income, simultaneity, and teaching evaluations." The Journal of Economic Education. Research question: examines the relation between student evaluation and grade inflation and the moderating role of faculty real income.
Nowell, C. (2007). "The impact of relative grade expectations on student evaluation of teaching." International Review of Economics Education. Research question: examines the relation between student evaluations and relative grades among peers.
Remedios, R., and D. A. Lieberman (2008). "I like your course because you taught me well: The influence of grades, workload, expectations and goals on students' evaluations of teaching." British Educational Research Journal. Research question: investigates how factors such as students' pre-course expectations, achievement goals, grades, workload, and perceptions of course difficulty affect how they rate their courses.
Stumpf, S. A., and R. D. Freedman (1979). "Expected grade covariation with student ratings of instruction: Individual versus class effects." Journal of Educational Psychology. Research question: compares individual and class effects and their role on student rating of instruction.
Uttl, B., C. A. White, and D. W. Gonzales (2017). "Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related." Studies in Educational Evaluation. Research question: re-estimates previously published meta-analyses and examines the relationship between CEI score and student learning.
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams (2016). "Factors most likely to contribute to positive course evaluations." Innovative Higher Education. Research question: determines the extent to which students differentially rated ten factors likely to affect their ratings on overall course evaluations.
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher (2009). "Evaluating teaching in higher education." Journal of Economic Education. Research question: examines the relation between grading practices and student evaluations and the role of learning.
Table 2. Pedagogical research on the impact of course grade on student evaluation: research type, data source, and sample size.

Arnold, I. J. M. (2009). Archival. Erasmus School of Economics. Sample size: around 3,000 students.
Bausell, R. B., and J. Magoon (1972). Archival. University of Delaware. Sample size: over 17,000 students.
Beleche, T., D. Fairris, and M. Marks (2012). Archival. Unidentified four-year public university. Sample size: 4,293 students.
Braga, M., M. Paccagnella, and M. Pellizzari (2014). Archival. Bocconi University. Sample size: 1,206 students.
Brockx, B., P. Spooren, and D. Mortelmans (2011). Archival. University of Antwerp. Sample size: 1,244 students.
Butcher, K. F., P. J. McEwan, and A. Weerapana (2014). Archival. Wellesley College. Sample size: 104,454 students.
Centra, J. A. (2003). Archival. Student Instructional Report II by Educational Testing Service. Sample size: 55,000 classes.
Clayson, D. E. (2009). Meta-analysis. More than 17 prior archival research studies. Sample size: N/A.
Driscoll, J., and D. Cadden (2010). Archival. Quinnipiac University. Sample size: 29,596 students.
Ewing, A. M. (2012). Archival. University of Washington. Sample size: 53,658 classes.
Gorry, D. (2017). Archival. Unidentified state university. Sample size: 281 classes.
Greenwald, A. G., and G. M. Gillmore (1997a). Theory. N/A. Sample size: N/A.
Greenwald, A. G., and G. M. Gillmore (1997b). Archival. University of Washington. Sample size: 200 classes.
Hoefer, P., J. Yurkiewicz, and J. C. Byrne (2012). Archival. Pace University. Sample size: 381 classes.
Isely, P., and H. Singh (2005). Archival. Grand Valley State University. Sample size: 260 classes.
Krautmann, A. C., and W. Sander (1999). Archival. DePaul University in Chicago. Sample size: 258 classes.
Love, D. A., and M. J. Kotchen (2010). Theory. N/A. Sample size: N/A.
Marsh, H. W., and L. A. Roche (2000). Archival. American University. Sample size: 5,433 classes.
Matos-Díaz, H., and J. R. Ragan Jr (2010). Archival. University of Puerto Rico at Bayamón. Sample size: 1,232 classes.
McPherson, M. A. (2006). Archival. University of North Texas. Sample size: 607 classes.
Millea, M., and P. W. Grimes (2002). Archival. Mississippi State University. Sample size: 149 students.
Nelson, J. P., and K. Lynch (1984). Archival. Penn State University. Sample size: 146 classes.
Nowell, C. (2007). Archival. A large public university in the US. Sample size: 716 students.
Remedios, R., and D. A. Lieberman (2008). Archival. Scottish university. Sample size: 610 students.
Stumpf, S. A., and R. D. Freedman (1979). Archival. New York University. Sample size: 5,894 students and 197 classes.
Uttl, B., C. A. White, and D. W. Gonzales (2017). Meta-analysis. More than 58 prior research studies. Sample size: N/A.
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams (2016). Archival. A large state university in the southeastern US. Sample size: 148 students.
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher (2009). Archival. Ohio State University. Sample size: 26,666 students.
individual expected grade divided by GPA, individual expected grade relative to the
section average, individual expected grade relative to the actual grade, actual course
grade, overall GPA, grade in the subsequent course, and high school grades. The most
used measures of CEI scores include overall course rating, overall instructor rating, and
rating on instructor’s teaching ability. We present a summary of the choices of research
methods, dependent variables, independent variables, statistical results, and control variables in Table 3.
While many studies have provided empirical evidence supporting the relationship
between grades and CEI scores, the interpretation of such a relationship is under
debate. Greenwald and Gillmore (1997) suggest that the grade–rating correlation primarily results from instructors’ grading leniency. This study established the fundamental
theory of the relationship between course grades and CEI scores and represents the
leniency hypothesis. Another interpretation is the validity hypothesis which posits that
Table 3. Sample selection and variable definition.

Bausell, R. B., and J. Magoon (1972). Sample level: student. Dependent variables: course evaluation score / instructor evaluation score. Independent variable: expected grade. Result: ns. Control variables: N/A.
Stumpf, S. A., and R. D. Freedman (1979). Sample level: student and class. Dependent variable: ratings of courses and instructors. Independent variable: expected grade. Results: positive, ** / positive, ***. Control variables: N/A.
Nelson, J. P., and K. Lynch (1984). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average expected grade. Result: positive, *. Control variables: average instructor evaluation, average present grade, instructor's average real income by rank, instructor's access, instructor's interest, instructor's organization, class time, class size, Saturday meeting time, instructor experience, class level, exam grade, and workload.
Greenwald, A. G., and G. M. Gillmore (1997b). Sample level: class. Dependent variable: average course evaluation score. Independent variables: absolute expected grade / relative expected grade. Results: positive, *** / positive, ***. Control variables: self-progress, same instructor, workload.
Krautmann, A. C., and W. Sander (1999). Sample level: student. Dependent variable: course evaluation score. Independent variable: expected grade. Result: positive, ***. Control variables: instructor gender, instructor rank, class size, class type, and class level.
Marsh, H. W., and L. A. Roche (2000). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average expected grade. Result: ns. Control variables: perceived learning and course workload.
Millea, M., and P. W. Grimes (2002). Sample level: student. Dependent variables: the overall evaluation, the rating directly related to the quality of the course, and the rating directly related to the quality of the instructor. Independent variables: attitude about remaining graded work / current earned grade. Results: positive, *** / positive, ***. Control variables: student's gender, student's race, student's age, student's intellectual ability, and course difficulty.
Centra, J. A. (2003). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average expected grade. Result: ns. Control variables: course difficulty, course workload, student effort and involvement, course type, course level, class size, institutional type, teaching by lecture, teaching by discussion or laboratories, and course outcomes.
Isely, P., and H. Singh (2005). Sample level: class. Dependent variable: average course evaluation score. Independent variables: average expected grade / relative expected grade (the gap between expected grade and cumulative grade point average of incoming students). Results: ns / positive. Control variables: class size, percentage of students taking a required course, percentage of students that are majors, average cumulative GPA of students in each class, intensive writing requirements, length of class, class time, class location, percentage of the class represented in the course evaluation, and number of years an instructor has taught at the university.
McPherson, M. A. (2006). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average expected grade. Result: negative, ***. Control variables: discipline and the proportion of students who completed the evaluation questionnaire.
Nowell, C. (2007). Sample level: student. Dependent variable: course evaluation score. Independent variables: individual expected grade / individual expected grade divided by GPA / individual expected grade relative to the section average / individual expected grade relative to the course average / individual expected grade relative to the average grade given by the instructor in all classes. Results: positive, ** / positive, * / positive, ** / ns / positive, **. Control variables: whether the instructor was part-time, the percentage of the student's grade that was based on testing, course level, class size, the number of times the class met each week, discipline, the student's age, the student's gender, and self-reported measures of the student's effort in the class.
Remedios, R., and D. A. Lieberman (2008). Sample level: student. Dependent variable: course evaluation score. Independent variable: course grade. Result: indirect. Control variables: achievement motivation, study hours, perceived difficulty, and pre-course expectations.
Arnold, I. J. M. (2009). Sample level: student. Dependent variables: the overall course evaluation score and the scores on separate items. Independent variable: course grade. Result: positive, ***. Control variables: high school grade, self-reported measures of students' class attendance and study effort.
Weinberg, B. A., and M. Hashimoto (2009). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average course grade. Result: positive, ***. Control variables: grades in future sections, female instructor, foreign-born instructor, lecturer, graduate associate, instructor has PhD, instructor's experience, multi-section class, honors class, and class time.
Driscoll, J., and D. Cadden (2010). Sample level: student. Dependent variables: instructor's teaching ability and whether the student would recommend this instructor to a friend. Independent variable: expected grade. Result: positive, ***. Control variables: discipline, course type, course level.
Matos-Díaz, H., and J. R. Ragan Jr (2010). Sample level: class. Dependent variable: average course evaluation score. Independent variable: average expected grade. Result: positive, **. Control variables: actual GPA, instructor's rank, instructor's degree, instructor's age, instructor's gender, class size, class time, discipline, and academic term.
Brockx, B., P. Spooren, and D. Mortelmans (2011). Sample level: student. Dependent variable: course evaluation score. Independent variables: course grade / overall grade. Results: positive, ** / ns. Control variables: course type, class attendance, instructor's gender, instructor's age, student's gender, course workload, class size, and examination period in which the students received their highest course grades.
2012
Student
Course evaluation score
Grade in the current course/ Grade in the
subsequent course
Positive; **/ ns
Ewing
2012
Class
Average course evaluation score
Average relative expected grade
Positive; ***
Hoefer, P.,J.
Yurkiewicz, and
J. C. Byrne
Braga, Paccagnella
and Pellizzari
2012
Class
Average course evaluation score
Normalized student grade
Positive; *
2014
Class
Average course evaluation score
Average high school grade/ Overall
teaching quality/ Overall clarity of the
lectures
Positive; **/
Negative; **/
Negative; **
Butcher, K. F.,
P. J. McEwan, and
A. Weerapana
VanMaaren, V. G.,
C. M.Jaquett, and
R. L.Williams
2014
Class
Average course evaluation score
Mandatory grade cap
Negative; ***
2016
Student
Final grade
Expected grade
Positive; ***
Gorry
2017
Class
Average course evaluation score
Average course grade
Positive; *
Cumulative high school GPA, placement
score, SAT verbal, SAT writing, ACT, and
indicators for missing SAT, ACT or
placement score, student’s age,
student’s gender, student’s ethnicity,
student’s housing status, first
generation, low income, term,
enrollment, course evaluation response
rate, withdrawal rate, and percent of
students repeating the class
Actual average grade, instructor’s ranking,
course level, class size, course evaluation
response rate, discipline, class time, and
class frequency.
Gender, academic level, discipline
Class size, class attendance, high school
grade, entry test score, percentage of
females, percentage of non-local
students, percentage of late enrollees,
student ability, class time, room’s floor,
and classroom building.
Age, faculty gender, faculty tenure status,
class level, class size
Gender, academic classification, class
characteristics such as relevant class
discussion, extra credit, well-organized
classes, small-group activities, course
papers, student presentation and course
standards.
Ceiling policy, class size, instructor, and
academic semester
TEACHING IN HIGHER EDUCATION
Beleche, Fairris, and
Marks
9
10
G. WANG AND A. WILLIAMSON
In the following sections, we provide a detailed discussion of the two competing hypotheses proposed by prior studies.
3.1.1. Leniency hypothesis
The leniency hypothesis posits that students give higher CEI scores to instructors from
whom they receive higher grades. Supporters of the leniency hypothesis generally argue
that instructors can effectively 'buy' higher ratings by grading more leniently (Krautmann and
Sander 1999; McPherson 2006; Weinberg, Hashimoto, and Fleisher 2009; among others).
In an early study, Greenwald and Gillmore (1997) find that courses that receive higher
CEI scores are those in which students expect to receive higher grades or a lighter workload, not necessarily those with higher teaching quality. Many studies interpret the
relationship between course grades and CEI scores to support the leniency hypothesis.
For example, Krautmann and Sander (1999) show that a one-point increase in the
expected classroom grade point average (GPA) leads to an improvement of between
0.34 and 0.56 in the CEI score. Similarly, McPherson (2006) finds that an increase of
one point on a four-point expected grade scale results in an improvement in the CEI
score of around 0.34 for foundational courses and 0.30 for upper-level courses. Brockx,
Spooren, and Mortelmans (2011) find that when a student’s course grade increases by
one point, the CEI score increases by 0.33 (grand-mean centered) and 1.56 (group-mean centered). Millea and Grimes (2002) report similar findings that both the current
grade and expected grade have a positive relationship with the CEI score.
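To make the form of these estimates concrete, the sketch below fits the kind of ordinary least squares specification that underlies such coefficients: a class-average CEI score regressed on the average expected grade plus a few controls. Everything in it is illustrative; the data are simulated, and the variable names and the assumed grade effect of about 0.4 are placeholders chosen only to fall within the range reported above, not a reproduction of any cited study.

# -*- coding: utf-8 -*-
# Illustrative sketch of a class-level grade-CEI regression; all data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500  # hypothetical course sections

df = pd.DataFrame({
    "expected_gpa": rng.uniform(2.0, 4.0, n),   # mean expected grade in the section
    "class_size": rng.integers(10, 200, n),     # control: enrollment
    "upper_level": rng.integers(0, 2, n),       # control: 1 = upper-level course
})
# Simulate a CEI score on a 1-5 scale with an assumed grade effect of ~0.4.
df["cei_score"] = (2.0 + 0.4 * df["expected_gpa"]
                   - 0.002 * df["class_size"]
                   + 0.1 * df["upper_level"]
                   + rng.normal(0, 0.3, n)).clip(1, 5)

model = smf.ols("cei_score ~ expected_gpa + class_size + upper_level", data=df).fit()
print(model.summary().tables[1])  # the expected_gpa coefficient mirrors the grade-CEI slope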
Some studies dig deeper to provide clearer evidence of the leniency hypothesis. According to Handelsman et al. (2005), most college students can be classified as performance-oriented rather than mastery-oriented, indicating that their satisfaction with a course is
largely based on their grade in that course. Braga, Paccagnella, and Pellizzari (2014) perform a similar analysis of teaching effectiveness and find that teaching quality is negatively correlated with students' CEI scores.
In addition to empirical evidence, both Gorry (2017) and Butcher, McEwan, and Weerapana (2014) provide anecdotal evidence regarding the impact of a change of grading
policy on CEI scores at Wellesley College. Butcher, McEwan, and Weerapana (2014)
examine the policy change at Wellesley College by comparing the CEI scores between
departments that were obligated to lower their grades with the outcomes in departments
that were not. The study finds that students in the ‘grading-decreasing’ courses lowered
their evaluations of the instructors accordingly. Similarly, Gorry (2017) analyzes the
effects of a grade ceiling policy implemented by a large state university on grade distributions and CEI scores; such research shows that lowering the grade ceiling significantly
decreases CEI scores across a variety of measures.
3.1.2. Validity hypothesis
The main difference between the leniency and validity hypotheses is whether student
evaluations reflect the quality of teaching or simply capture the grading-satisfaction
game between the instructors and students. Supporters of the validity hypothesis argue
that instructors who teach more effectively receive better evaluation scores because their
students learn more, thereby earning higher grades. In other words, CEI is a valid instrument (Centra 2003; Barth 2008; Remedios and Lieberman 2008; Arnold 2009; Clayson
2009). Essentially, the validity hypothesis suggests that even if there is a strong correlation
between student grades and CEI scores, we cannot be sure that there is causality.
Using more than 50,000 CEI scores, Centra (2003) investigates the previously examined
relationship between grades and student evaluations. Unlike previous researchers, Centra
(2003) controls for a series of variables in regression analyses, including factors such as
subject area, class size, teaching method, and student-perceived learning outcomes. Contrary to many other analyses, Centra (2003) does not find convincing evidence that students’ course ratings are influenced by the grades they receive from their instructors
when controlling for other factors. Rather, the findings suggest a curvilinear relationship
between the difficulty/workload level of courses and the CEI score, all of which are more
indicative of students’ learning experiences.
Centra’s (2003) arguments are further confirmed by a few other studies. Remedios and
Lieberman (2008) find that grades only have a small impact on student ratings compared
with other influential factors. By controlling for students’ achievement goals and expectations at the beginning of the semester, Remedios and Lieberman (2008) show that students’ course ratings are largely determined by the extent to which the students find their
courses stimulating, interesting, and useful. The impact of grades and course difficulty
appears to be small. Marsh and Roche (2000) find similar results that many CEI scores
are not related to grading leniency; rather, they are more related to the learning experience
and teaching efforts. Clayson (2009) conducts a meta-analysis of more than thirty studies
and shows that a small average relationship exists between learning and the CEI score.
However, the author highlights that such a relationship is situational and may vary
across teachers, disciplines, or class levels. Barth (2008) shows that the overall instructor
rating is primarily driven by the quality of instruction. Beleche, Fairris, and Marks (2012)
examine the learning–CEI association by using a centrally graded exam as a proxy for
actual student learning. This exam was not related to any specific course, so the sample was independent of course type, faculty grading policy, and students' grade expectations. Even with this more objective learning measure, they find a small but robust positive association between learning and course evaluations, consistent with the validity hypothesis.
The literature also suggests inconsistencies and a lack of linearity. For example, Arnold
(2009) finds that successful students do not increase the CEI score in response to their successful performance, whereas unsuccessful students externalize failure by lowering the CEI
score. Such results are inconsistent with the common criticism of CEIs,
which is that students use them as a tool to reward or penalize teachers.
3.2. Other factors that impact CEI scores
As suggested above, it is well documented that the CEI-grade relationship varies considerably across different subgroups of observations, and that other factors are as impactful as, if
not more so than, the grade itself. This paper focuses on the relationship between student
grades and CEI scores, but it is important to remember that this is just one piece of a
complex picture. Figure 1 proposes a diagram representing various factors that impact
CEI scores and their relationships, including student grades. Going into depth on all of
these factors is beyond the scope of this paper, but they are important to keep in mind, particularly in cases where confounding factors may intersect strongly with the
student grade-CEI score relationship. The most commonly documented confounding
factors include workload, course discipline, course level, class size, class attendance, percentage of non-local students, percentage of late enrollees, student effort, class time, class location, class frequency, instructor's ranking, instructor's gender, course evaluation response rate, etc. (Krautmann and Sander 1999; Millea and Grimes 2002; Centra 2003; among others). We will highlight a few factors found to have a strong impact on the CEI score.
Figure 1. A Framework for Understanding the Relationship Between Teaching Quality and CEI Scores.
3.2.1. Workload
A number of studies find that there is a negative relationship between workload and CEI
score, as students typically rate courses higher if they are more manageable (Feldman
1978; Marsh 1987; Paswan and Young 2002; Centra 2003; Clayson 2009; Driscoll and
Cadden 2010). The results from Marsh and Roche (2000) and Centra (2003) indicate
that courses with lighter workloads, such as lower ‘hours per week required outside of
class’, receive higher student ratings.
3.2.2. Course characteristics
Course type, course level, and discipline all have a significant impact on CEI scores.
Brockx, Spooren, and Mortelmans (2011) conclude that instructors teaching elective
courses receive higher scores than instructors teaching required ones. Benton and
Cashin (2012) conclude that higher-level courses tend to receive higher evaluation
ratings in comparison to lower-level courses. Similarly, Ewing (2012) finds that graduate courses tend to receive better evaluations than undergraduate courses. Such factors can
be so strong that they mitigate or exacerbate the CEI-grade relationship. For example,
Hoefer, Yurkiewicz, and Byrne (2012) extend the discussion and find the correlation
between grade and CEI score is stronger for courses that are for undergraduates and
those in some specific disciplines, such as management and marketing. Their results also
indicate that CEI scores vary considerably across disciplines. Prior studies suggest that
the highest SET scores are received in arts and humanities, followed by biological and
social sciences, business, computer science, math, engineering and physical science
(Matos-Díaz and Ragan 2010; Brockx, Spooren, and Mortelmans 2011). Nowell (2007)
finds that courses will receive higher CEI scores if students exert more effort in the
course or the class meets at least two times per week. Such variation creates endogeneity
issues when CEI scores are used to assess the instructor's performance, because courses
with different characteristics may not be truly comparable. Driscoll and Cadden (2010)
suggest that, given that CEI scores vary significantly across courses, instructors should
be evaluated within their respective departments by a department average rather than
by an overall university measure.
3.2.3. Instructor characteristics
In addition, literature has documented that full-time faculty members generally receive
higher scores than part-time faculty (Nowell 2007; Driscoll and Cadden 2010). Ewing
(2012) further documents that pre-tenure professors tend to receive lower evaluation
scores than tenured professors. An instructor’s age may also have an impact on the CEI
score. Interestingly, this is not in the direction that would be predicted based on an expectation that experience improves teaching. Rather, Brockx, Spooren, and Mortelmans
(2011) find that younger professors tend to receive better evaluations. Driscoll and
Cadden’s (2010) literature review reports that other studies have found perceptions of
an instructor’s personality and/or enthusiasm to be strong factors in course evaluation
instruments (Clayson and Sheffet 2006; Clayson 2009; Driscoll and Cadden 2010).
Again, some factors have been found to strengthen the CEI-grade relationship, with
Hoefer, Yurkiewicz, and Byrne (2012) finding the correlation between grade and CEI
score to be stronger for courses taught by female faculty.
4. The caveats of CEI score as a measure of teaching quality
To examine the relationship between grades and CEI scores, prior literature builds
different empirical models and uses various proxies for grades and CEI scores.
Beyond the CEI-grade relationship documented by prior literature (see 3.1 and 3.2 for
detailed review), there are a number of caveats which concern the validity of CEI as a
measure of teaching quality. In this section, we will discuss the possible biases introduced
by CEI: (1) relative performance and peer effect, (2) selection biases, and (3) grade
inflation.
4.1. Relative performance and peer effect
While most of the variables included in these studies capture an individual’s absolute
grade or CEI score, the relative student standing is also shown to have a significant
impact on the student’s decision making regarding CEI scores. Economists and sociologists have found that individuals’ satisfaction depends not only on their own performance but also on their circumstances relative to a reference group (Becker 1974).
Therefore, it is possible that although students’ satisfaction with a course – as captured
by CEI scores – may be influenced by individual performance, it may also be influenced by
their relative performance among their peers. Knowing the impact of peer effect is important, as suggested by Nowell (2007):
If students reward teachers for high relative grades as opposed to simply high absolute grades,
there may be limits to an instructor’s ability to ‘purchase’ better teaching evaluations by
increasing the grades of all students. Conversely, if individual students reward teachers for
their own high grades as well as the high grades of their peers, it becomes expensive to
give low grades to anyone in the class and increases the incentive to ‘buy’ higher SET
ratings. (p. 44)
Stumpf and Freedman (1979) provide early evidence of the relationship between grades
and student ratings at both the individual and class levels. Their results suggest that both
the individual’s expected grade and the instructor’s overall expected grading policy contribute to the grade–rating relationship, and that the latter tends to have a stronger
impact. As an extension of Stumpf and Freedman (1979), several studies further
explore the relationship between relative performance and CEI score. Common measures
for relative performance include: (1) the difference between the expected grade for the
current course and the students’ historical GPA, (2) the average grade earned by all students who take the same course, (3) the expected grades in other classes in which the
student is enrolled, and (4) the distribution of expected grades.
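As a rough illustration, the pandas sketch below computes versions of these four measures from a small, hypothetical gradebook; the DataFrame, its column names, and the leave-one-out averaging are assumptions for demonstration rather than the procedure used in any particular study.

# -*- coding: utf-8 -*-
# Illustrative relative-performance measures from a hypothetical gradebook.
import pandas as pd

grades = pd.DataFrame({
    "student_id":     [1, 1, 2, 2, 3, 3],
    "course_id":      ["ECON101", "MATH201", "ECON101", "MATH201", "ECON101", "MATH201"],
    "expected_grade": [3.7, 3.0, 2.7, 3.3, 3.3, 4.0],
    "historical_gpa": [3.5, 3.5, 3.0, 3.0, 3.6, 3.6],
})

# (1) Expected grade relative to the student's own cumulative GPA
grades["rel_to_own_gpa"] = grades["expected_grade"] - grades["historical_gpa"]

# (2) Average expected grade of all students taking the same course
grades["course_avg_grade"] = grades.groupby("course_id")["expected_grade"].transform("mean")

# (3) Average expected grade in the student's other courses (leave-one-out mean;
#     NaN for students enrolled in only one course)
sums = grades.groupby("student_id")["expected_grade"].transform("sum")
counts = grades.groupby("student_id")["expected_grade"].transform("count")
grades["other_courses_avg"] = (sums - grades["expected_grade"]) / (counts - 1)

# (4) Spread of expected grades within the course, one summary of their distribution
grades["course_grade_std"] = grades.groupby("course_id")["expected_grade"].transform("std")

print(grades)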
Isely and Singh (2005) measure peer performance with two variables: expected grades
in other classes taught by the same instructor and the gap between the expected grade in
the current course and the students’ cumulative GPA. Their findings indicate that if an
instructor has other classes in which students expect higher grades, then the average
CEI score tends to be higher.
Analogous to the findings in Isely and Singh (2005), Nowell (2007) adopts three
measurements for peer performance: the difference between the expected grade for the
current course and the student’s historical GPA, the average grade earned by all students
who take the same course, and the expected grades in other classes in which the student is
enrolled. The study reveals that the grade students care most about has a considerable
impact on the CEI score. If students use their own grades as the benchmark, then the
grade–rating relationship is stronger. In contrast, if students use their peers' grades as
the benchmark, then the grade–rating relationship is weakened.
Matos-Díaz and Ragan (2010) explore the impact of the expected grade on the CEI
score from another perspective. They draw inferences from economics theories about
risk and uncertainty and argue that the variance of expected grades signals the teacher’s
reward structure. The narrow distribution of expected grades indicates that the penalty
for lower study time or unfavorable performance (e.g. poor performance on an examination or assignment) is relatively low and is, therefore, more likely to lead to favorable
student ratings. As expected, Matos-Díaz and Ragan (2010) report a negative relationship
between the variance of the expected grade and CEI score, showing that instructors can
strategically obtain favorable ratings by narrowing the grade distribution. This finding
also weakens the argument that students care more about their relative performance in
a class.
Overall, the literature on peer effect suggests that instructors can significantly increase
CEI scores, not only by increasing grades for individual students, but by lowering the
grading standards for the entire class. In this scenario, the incentives and costs of ‘buying’
high CEI scores may be greater than has been suggested by the literature documented in
sections 3.1 and 3.2.
4.2. Self-selection bias
A favorable CEI score may also reflect factors that increase students’ satisfaction, but are
unrelated to teaching quality, such as students’ initial ability, course type, and instructor
grading leniency. To better isolate the link between the CEI score and teaching quality, it
is necessary to introduce objective measures of student characteristics at the individual
level to control for the impact of learning ability on the students’ evaluation of the
instructors. However, due to the anonymous nature of CEI processes, it is challenging
to incorporate individual level variables and self-selection bias may occur. CEI scores
are mostly calculated as course means and only present a subset of students who
choose to fill out the evaluations (Beleche, Fairris, and Marks 2012). This introduces
serious measurement errors, especially when the pool of students who complete the
CEI differs from the total student population (Clayson 2009; Isely and Singh 2005;
Kherfi 2011).
Second, students who participate in the administration of a CEI cannot fully represent
the total student population. The course evaluation response rate is normally less than 100
percent and it is questionable to simply assume that the students who do not complete the
survey are well represented by the students who do complete it. Even assuming a random sample of students, as the number of students incorporated into CEI scores decreases, the effect of individual variation and bias becomes stronger (Isely and Singh 2005). Average CEI scores are also more statistically influenced by such bias when the class size is small.
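A small simulation makes this sample-size point concrete. Under an assumed 'true' class mean and rating spread (both invented for illustration), the observed class-average CEI drifts much further from that true mean when only a handful of students respond.

# -*- coding: utf-8 -*-
# Illustrative simulation: fewer respondents means a noisier class-average CEI.
import numpy as np

rng = np.random.default_rng(42)
true_class_mean, rating_sd = 4.0, 0.8   # assumed underlying rating distribution (1-5 scale)
n_simulations = 10_000

for n_respondents in (5, 15, 40, 100):
    # Draw n_respondents ratings many times and see how much the class mean varies
    samples = rng.normal(true_class_mean, rating_sd, size=(n_simulations, n_respondents))
    observed_means = samples.clip(1, 5).mean(axis=1)
    print(f"{n_respondents:>3} respondents: observed class mean ranges "
          f"{observed_means.min():.2f}-{observed_means.max():.2f} "
          f"(sd = {observed_means.std():.2f})")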
4.3. Grade inflation
CEI processes may exacerbate the problem of grade inflation and can even decrease a professor’s teaching effort (Krautmann and Sander 1999; Love and Kotchen 2010; Butcher,
McEwan, and Weerapana 2014). Love and Kotchen (2010) examine the effects of CEI
use on faculty behavior and show that excessive institutional emphasis on teaching,
research, or both can exacerbate the problems of grade inflation and result in diminished
faculty teaching effort. To better align instructors’ incentives with the institution’s objectives on teaching and research, the authors suggest that universities should ensure uniform
grade distributions for individual classes and restrain grade inflation.
Nelson and Lynch (1984) reach similar conclusions, finding that the evaluation process
produces grade inflation. They also determine that faculty members' grading policies
are related to their real incomes because faculty members are more willing to adopt easier
grading policies when the real income from teaching is falling.
Given the pressures on faculty to maintain favorable CEI scores and the impact of
expected grades on instructors' evaluations, enforcing lower expected grades may inevitably have adverse consequences for an instructor's evaluation. Institutions should carefully
evaluate such impacts, especially when CEI scores are used in tenure and promotion
decisions. To ensure fairness across faculty, it would be important to ensure even
application of uniform grade distributions across faculty and programs, and to account for
any overall reduction in CEI scores.
5. Discussion & recommendations
Overall, the literature suggests that course grades are positively correlated with CEI scores,
but there is considerably less evidence as to whether that relationship is properly attributed
to the leniency hypothesis or the validity hypothesis. Given the evidence of correlation
between grades and CEI scores and the lack of clear indication that the validity hypothesis
is more accurate, colleges and universities should consider potential actions to mitigate the
potential for various forms of bias in CEIs. We propose the following to continue efforts to
assess this relationship and mitigate its potential impact: (1) ensuring quality design of the
instrument, (2) attention to qualitative items on CEIs, (3) university level internal analyses
to identify (and address) potential biases and validity issues, (4) consideration of a portfolio approach to instructor evaluation, and (5) increased efforts to tease out the nature of
the relationship in future research.
In this section, we aim to provide examples and practical techniques that schools can use
to improve the objectivity and informativeness of teaching evaluations. Particularly, we
tailor the recommendation section for schools and institutions that are going through a
CEI adoption or revision process.
5.1. Quality instrument design
First and foremost, it is imperative that colleges and universities review CEIs and design or
adopt a quality instrument. While instrument design alone cannot alleviate student biases,
a poorly designed instrument can exacerbate such biases. In particular, we advocate for
clarity in the wording of the items on CEIs and a clear separation of instructor versus
course questions to help avoid the exacerbation of biases. Item clarity is important to
reduce misinterpretation of items. While items should be broad enough to refer to all
types of courses and instructors, clear and directed questions will give the respondent
something specific to reflect upon.
Given that many instructors do not have complete control over course characteristics,
we also advocate for a clear separation of items focused on the instructor versus those
focused on the course. Based on this analysis of prior studies, it is clear that student
course ratings are determined by multiple variables beyond the instructor’s teaching performance, such as course characteristics, course grade, student qualities, and student
biases. Among these factors, course characteristics, which are frequently not under an
individual instructor’s control, have considerable impact on the student’s perception of
the course. This problem is particularly common when multiple sections of the same
course are taught by different instructors while the textbook, course syllabus, exams,
and other materials are all designed by one faculty member or a small group. In such
cases, instructors tend to have limited freedom in choosing course content or structure,
but these factors are still counted into the instructor’s evaluation.
To separate the uncontrollable factors from instructor effectiveness, universities can
design the course evaluation questionnaire to improve item clarity and reduce response
bias. For example, we recommend presenting questions related to ‘Evaluation of
Instructor’ and questions related to ‘Evaluation of Course’ separately to students. In cases
where a faculty member has little to no control over course content and design, the 'Evaluation of Instructor' items provide a more objective evaluation of the instructor's teaching
quality for hiring, tenure and promotion purposes. The ‘Evaluation of Course’ provides
insights on both course-level pedagogy and program-level curriculum, and can be used
by faculty members to improve and enhance their teaching skills. Below is a sample
CEI from a business school located in Boston, MA.
Evaluation of Instructor
The instructor was well prepared and organized for class.
The instructor communicated information effectively.
The instructor promoted useful classroom discussions, as appropriate for the course.
The instructor demonstrated the importance of the subject matter.
The instructor provided timely and useful feedback.
The instructor was responsive to students outside the classroom.
Overall rating of this instructor.
Evaluation of Course
The syllabus clearly described the goals, content, and requirements of the course.
The course materials, assigned text(s), and/or other resources helped me understand concepts and ideas related to the
course.
The workload for this course (reading, assignments, papers, homework, etc.) was manageable given the subject matter
and course level.
Assignments (exams, quizzes, papers, etc.) adequately reflected course concepts.
Overall rating of this course.
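As one possible way to operationalize this separation, the sketch below aggregates hypothetical responses to the two sections into distinct instructor and course composites; the item keys, data, and equal weighting are assumptions, not the school's actual scoring rules.

# -*- coding: utf-8 -*-
# Illustrative aggregation of a CEI with separate instructor and course sections.
import pandas as pd

INSTRUCTOR_ITEMS = ["prepared", "communication", "discussion", "importance",
                    "feedback", "responsive", "overall_instructor"]
COURSE_ITEMS = ["syllabus", "materials", "workload", "assignments", "overall_course"]

# Each row is one (anonymous) student response on a 1-5 scale.
responses = pd.DataFrame(
    [[5, 4, 5, 4, 5, 4, 5, 3, 3, 2, 3, 3],
     [4, 4, 4, 5, 4, 5, 4, 4, 3, 3, 4, 3],
     [5, 5, 4, 4, 4, 4, 5, 3, 4, 2, 3, 3]],
    columns=INSTRUCTOR_ITEMS + COURSE_ITEMS,
)

# Average within each section per response, then across responses.
instructor_composite = responses[INSTRUCTOR_ITEMS].mean(axis=1).mean()
course_composite = responses[COURSE_ITEMS].mean(axis=1).mean()
print(f"Instructor composite: {instructor_composite:.2f}")
print(f"Course composite:     {course_composite:.2f}")

Keeping the two composites apart lets course-level issues (e.g. an unmanageable workload set by a course coordinator) surface without being folded into the individual instructor's score.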
5.2. Attention to qualitative items on CEIs
One disadvantage of quantitative CEI (scaled) questions is that the questions are specifically pre-designed, and the dimensions covered might be somewhat narrow. Qualitative
evaluation questions provide students opportunities to provide in-depth feedback on
broader dimensions, resulting in an extensive examination of the student experience
(Steyn, Davies, and Sambo 2019). Consistent with this argument, Sherry, Fulford, and
Zhang (1998) examine the accuracy, utility, and feasibility of both quantitative and qualitative evaluation approaches and find that both approaches efficiently capture the aspects
of the instructional climate. Grebennikov and Shah (2013) also focus on qualitative evaluation feedback from students and find that efficient use of such feedback, together with a timely response to it, helps increase student satisfaction and retention.
5.3. University-level analyses
The complexity of the literature, variety of findings, and heterogeneity of CEIs themselves
suggest that colleges and universities may wish to examine these questions internally to
evaluate the validity of their own instruments as a measure of teaching effectiveness,
assess the impact of grading policies, and identify potential biases.
While the literature provides mixed findings on the validity/leniency approach, universities often have thousands of data points they could use to conduct internal analyses of
CEI scores. With the variation in CEIs, grading scales, and other confounding factors
across universities, internal analyses could provide clearer evidence of the state of the
student grade-CEI score relationship as it exists in a particular university. In particular,
analyzing the grade-CEI score relationship for specific faculty members’ courses over
time would control for teaching quality to some extent, especially if there is sufficient data
to analyze by specific course or type of course given that teaching quality could easily vary
based on a faculty member’s expertise in a particular course topic.
As an extension of this, universities could also consider adoption of a relative performance approach to mitigate the effects of teaching courses or disciplines that typically result
in lower course averages, such as more quantitatively focused courses or particularly challenging first year courses. A relative performance approach that compares the ratings of
the instructors with others who teach the same or similar courses, or at least within the
same discipline, can help reduce student grade effects on CEI scores.
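One way such an internal, relative-performance analysis could look is sketched below: each section's CEI score is standardized within its own course before comparison, and the grade-CEI relationship is inspected course by course. The DataFrame, column names, and toy values are hypothetical.

# -*- coding: utf-8 -*-
# Illustrative within-course standardization of CEI scores (hypothetical data).
import pandas as pd

sections = pd.DataFrame({
    "instructor": ["Ann", "Ben", "Cara", "Ann", "Ben", "Cara"],
    "course":     ["Calculus I"] * 3 + ["Art History"] * 3,
    "cei_score":  [3.4, 3.1, 3.7, 4.5, 4.2, 4.6],
    "avg_grade":  [2.8, 2.6, 3.0, 3.6, 3.4, 3.7],
})

# Z-score each section's CEI within its own course, so instructors of tougher,
# lower-rated courses are not penalized by a raw university-wide comparison.
grp = sections.groupby("course")["cei_score"]
sections["cei_z_within_course"] = (sections["cei_score"] - grp.transform("mean")) / grp.transform("std")

# A simple internal check of the grade-CEI relationship within each course
print(sections.groupby("course")[["cei_score", "avg_grade"]].corr())
print(sections.sort_values("cei_z_within_course", ascending=False))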
5.4. Consideration of a portfolio approach to expand the measures of teaching
quality
The prior recommendations focus on the CEI itself and ways it can be designed or analyzed to mitigate the potential for student grades to drive CEI scores inappropriately.
The extent to which the instruments themselves, both quantitative and qualitative
items, can do this, however, is limited. Thus, we also recommend that schools consider
a portfolio approach to expand measures of teaching quality, particularly in cases where
the internal analyses recommended above suggest significant student biases or the
ability for instructors to effectively ‘buy’ grades.
A portfolio approach is based on a combination of measures, such as student evaluations, peer evaluations, chair evaluations, and self-evaluations. Portfolio approaches are
well discussed in the current literature (Mullens et al. 1999; Laverie 2002; Berk 2005;
Chism and Banta 2007) and the details of such an approach are beyond the scope of
this article, so we will limit our discussion. As examples, Berk (2005) discusses some
potential sources of evidence of teaching effectiveness including student ratings, peer
ratings, self-evaluation, student interviews, alumni ratings, teaching scholarship, learning
outcome measures, etc. While portfolio approaches cannot alleviate any student grade
biases in CEI scores, they allow for alternative measures of teaching effectiveness to
provide a more holistic evaluation.
While some of this information is more difficult to collect than others, these different
sources of information focus on different aspects of teaching effectiveness. The instructor’s
self-evaluation may provide informal evidence of teaching performance. Information provided by the department chair or course coordinator may highlight the instructor’s compliance with the internal policies and procedures. Colleagues who have expertise in the
discipline can provide important feedback through classroom visits or course material
reviews. Schools and departments can randomly select courses and solicit evaluations
from external professors in the same field. This approach allows the school and department to evaluate the teacher’s teaching skills from an educator’s point of view in addition
to the recipients’ (students’) perspectives, but there are recognizably more resource allocation costs involved with such an approach.
It is also important to recognize and balance the benefits and caveats for different types of
peer review. For instance, internal reviewers have a good understanding of schools’ institutional backgrounds, but may feel social pressure to overpraise the reviewees or understate
the concerns. Institutions need to balance the benefits and costs associated with these
different approaches, and such debates might be further explored in future research.
5.5. Further research
Finally, given the literature's mixed findings and continued debate over the relationship
between student grades and CEI scores, most notably whether or not such relationships
are causal, there is a strong need for continued research in this area, specifically targeted
to teasing out the nature of the relationship. As the discussion above demonstrates, it is not
sufficient to argue that there is a relationship between student grades and CEI scores, if the
argument can also be made that effective teaching leads to increased student grades. What
we really want to know is the extent to which student grades or expected grades bias
student evaluations of instructors.
Given that universities across the nation are already collecting troves of CEI data, the
real need is for strong methodologists to design studies that can better determine or refute
claims to causality. This is not to suggest that there are not challenges involved. Anonymity is a critical feature of CEIs, so disaggregating data to the individual student level is problematic, but with the increase in online CEI distribution, it may be increasingly possible
to do so. On a related note, further refinement of effective quantitative measurements and
analyses of CEI scores is advised. While the challenges of finding adequate proxies for
student learning are clear, additional efforts on this front are worthwhile, as the critical
nature of CEIs in higher education should not be underestimated.
Our study also suggests the need for more qualitative research in this area, as most prior
research in this stream of literature uses quantitative research designs such as correlation
tests or multivariate regression. Qualitative research, such as focus group interviews or
quasi-experiments, will provide valuable insights on how these course evaluation questions are truly perceived by students. Future studies can also investigate students’ judgments and decision making with regard to their responses to the quantitative CEI
questions. Such a study would help CEI designers to better align CEI questions with students’ perceptions of their own learning.
6. Conclusions and remarks
As described at length above, prior research has provided ample evidence on the relationship between CEI scores and various instructor and course factors, including grades and
many other characteristics. However, most previous literature either focuses on a single
study, or takes a broad look at the extensive factors that play a role, without fully unpacking particular relationships. In addition, there has not been a conceptual framework that
synthesizes these existing theories and research findings. This article seeks to fill that gap,
focusing on the relationship between student grades and CEI scores, to synthesize the
findings to date, assess the leniency and validity hypotheses, identify closely related
factors, discuss potential biases, and make practical recommendations for schools and
universities.
Overall, the literature suggests that course grades are positively correlated with CEI
scores, but there is considerably less evidence as to whether that relationship is properly
attributed to the leniency hypothesis or the validity hypothesis. In this paper, we survey
28 prior studies and discuss the impact of course grades on course evaluation scores.
We specifically explore the leniency hypothesis, which posits that students give higher
CEI scores to instructors from whom they receive higher grades, and the validity
hypothesis, which posits that instructors who teach more effectively receive better evaluation scores because their students learn more and therefore earn higher grades. Our
review reveals that existing research focuses more on the extent of the relationship than
the nature of that relationship. The empirical studies that do assess this, however, tend
to be more consistent with the leniency hypothesis.
One of the major implications of these findings is that colleges and universities should
be thoughtful about their reliance on CEI scores in the broader faculty evaluation process
and consider a variety of approaches to meet their needs. To address these serious limitations on CEI and provide a more objective evaluation of the instructor’s teaching quality,
we propose five recommendations: quality design of the instrument, attention to qualitative items, university level internal analyses, a portfolio approach to instructor evaluation,
and increased efforts to tease out the nature of the relationship in future research.
In addition, as shown in Figure 1, this study proposes a conceptual framework that
illustrates the relationships between actual teaching quality and CEI scores, and suggests
where confounding factors may play a role. While we are trying to focus on one specific
relationship between CEI scores and grades, we believe that a broad overview of the evaluation-teaching quality relationships is informative to the readers of this study. The
proposed framework lays the groundwork for future research regarding the potential confounding factors and omitted variables that may significantly undermine the informativeness of the CEI score.
Note
1. The keywords include teaching evaluation, course evaluation, student evaluation, student
feedback, student perception, and student rating.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
Arnold, I. J. M. 2009. "Do Examinations Influence Student Evaluations?" International Journal of
Educational Research 48 (4): 215–224.
Barth, M. M. 2008. “Deciphering Student Evaluations of Teaching: A Factor Analysis Approach.”
Journal of Education for Business 84 (1): 40–46.
Bausell, R. B., and J. Magoon. 1972. “Expected Grade in a Course, Grade Point Average, and Student
Ratings of the Course and the Instructor.” Educational and Psychological Measurement 32 (4):
1013–1023.
Becker, G. S. 1974. “A Theory of Social Interactions.” Journal of Political Economy 82 (6): 1063–
1093.
Beleche, T., D. Fairris, and M. Marks. 2012. “Do Course Evaluations Truly Reflect Student
Learning? Evidence From an Objectively Graded Post-Test.” Economics of Education Review
31 (5): 709–719.
Benton, S. 2011. “Using Student Course Evaluations to Design Faculty Development Workshops.”
Academy of Educational Leadership Journal 15 (2): 41–53.
Benton, S., and W. E. Cashin. 2012. Student Ratings of Teaching: A Summary of Research and
Literature. IDEA Paper No. 50.
Berk, R. A. 2005. “Survey of 12 Strategies to Measure Teaching Effectiveness.” International Journal
of Teaching and Learning in Higher Education 17 (1): 48–62.
Braga, M., M. Paccagnella, and M. Pellizzari. 2014. “Evaluating Students’ Evaluations of Professors.”
Economics of Education Review 41 (August): 71–88.
Brockx, B., P. Spooren, and D. Mortelmans. 2011. “Taking the Grading Leniency Story to the Edge.
The Influence of Student, Teacher, and Course Characteristics on Student Evaluations of
Teaching in Higher Education.” Educational Assessment, Evaluation and Accountability 23
(4): 289–306.
Butcher, K. F., P. J. McEwan, and A. Weerapana. 2014. “The Effects of an Anti-Grade Inflation
Policy at Wellesley College.” Journal of Economic Perspectives 28 (3): 189–204.
Catano, V., and S. Harvey. 2011. Student Perception of Teaching Effectiveness: Development and
Validation of the Evaluation of Teaching Competencies Scale (ETCS). Halifax, Nova Scotia,
Canada: Routledge.
Centra, J. A. 2003. “Will Teachers Receive Higher Student Evaluations by Giving Higher Grades
and Less Course Work?” Research in Higher Education 44 (5): 495–518.
Chism, N. V. N., and T. W. Banta. 2007. “Enhancing Institutional Assessment Efforts Through
Qualitative Methods.” New Directions for Institutional Research 136 (winter): 15–28.
Chulkov, D. V., and J. V. Alstine. 2011. “Challenges in Designing Student Teaching Evaluations in a
Business.” International Journal of Educational Management 26 (2): 162–174.
Clayson, D. E. 2009. "Student Evaluations of Teaching: Are They Related to What Students Learn? A
Meta-Analysis and Review of the Literature.” Journal of Marketing Education 31 (1): 16–30.
Clayson, D. E., and M. J. Sheffet. 2006. “Personality and the Student Evaluation of Teaching.”
Journal of Marketing Education 28 (2): 149–160.
Dodeen, H. 2013. “Validity, Reliability, and Potential Bias of Short Forms of Students’ Evaluation of
Teaching: The Case of UAE University.” Educational Assessment 18 (4): 235–250.
Driscoll, J., and D. Cadden. 2010. “Student Evaluation Instruments: The Interactive Impact of
Course Requirements, Student Level, Department and Anticipated Grade.” American Journal
of Business Education 3 (5): 21–30.
Ewing, A. M. 2012. “Estimating the Impact of Relative Expected Grade on Student Evaluations of
Teachers.” Economics of Education Review 31: 141–154.
Feldman, K. A. 1978. “Course Characteristics and Variability Among College Students in
Rating Their Teachers and Courses: A Review and Analysis.” Research in Higher Education 9:
199–242.
Gorry, D. 2017. “The Impact of Grade Ceilings on Student Grades and Course Evaluations:
Evidence from a Policy Change.” Economics of Education Review 56 (February): 133–140.
Grebennikov, L., and M. Shah. 2013. “Student Voice: Using Qualitative Feedback from Students to
Enhance Their University Experience.” Teaching in Higher Education 18 (6): 606–618.
Greenwald, A. G., and G. M. Gillmore. 1997a. “Grading Leniency is a Removable Contaminant of
Student Ratings.” American Psychologist 52 (11): 1209–1217.
Greenwald, A. G., and G. M. Gillmore. 1997b. “No Pain, no Gain? The Importance of Measuring
Course Workload in Student Ratings of Instruction.” Journal of Educational Psychology 89 (4):
743–751.
Handelsman, M. M., W. L. Briggs, N. Sullivan, and A. Towler. 2005. “A Measure of College Student
Course Engagement.” The Journal of Educational Research 98 (3): 184–192.
Hoefer, P., J. Yurkiewicz, and J. C. Byrne. 2012. “The Association between Students’ Evaluation of
Teaching and Grades.” Decision Sciences Journal of Innovative Education 10 (3): 447–459.
Isely, P., and H. Singh. 2005. "Do Higher Grades Lead to Favorable Student Evaluations?" The
Journal of Economic Education 36 (1): 29–42.
Kherfi, S. 2011. “Whose Opinion is it Anyway? Determinants of Participation in Student Evaluation
of Teaching.” The Journal of Economic Education 42 (2): 19–30.
Krautmann, A. C., and W. Sander. 1999. “Grades and Student Evaluations of Teachers.” Economics
of Education Review 18 (1): 59–63.
Kulik, J. A. 2001. “Student Ratings: Validity, Utility, and Controversy.” In The Student Ratings
Debate: Are They Valid? How Can We Best Use Them? Vol. 2001, edited by Michael Theall,
Philip C. Abrami, and Lisa A. Mets, 9–25.
Laverie, D. A. 2002. “Improving Teaching Through Improving Evaluation: A Guide to Course
Portfolios.” Journal of Marketing Education 24 (2): 104–113.
Love, D. A., and M. J. Kotchen. 2010. “Grades, Course Evaluations, and Academic Incentives.”
Eastern Economic Journal 36 (2): 151–163.
Marsh, H. W. 1987. “Students’ Evaluations of University Teaching: Research Findings,
Methodological Issues, and Directions for Future Research.” International Journal of
Educational Research 11 (3): 253–388.
Marsh, H. W., and L. A. Roche. 2000. “Effects of Grading Leniency and Low Workload on Students’
Evaluations of Teaching: Popular Myth, Bias, Validity, or Innocent Bystanders?” Journal of
Educational Psychology 92 (1): 202–228.
Matos-Díaz, H., and J. R. Ragan Jr. 2010. “Do Student Evaluations of Teaching Depend on the
Distribution of Expected Grade?” Education Economics 18 (3): 317–330.
McPherson, M. A. 2006. “Determinants of how Students Evaluate Teachers.” The Journal of
Economic Education 37 (1): 3–20.
Millea, M., and P. W. Grimes. 2002. “Grade Expectations and Student Evaluation of Teaching.”
College Student Journal 36 (4): 582–590.
Mullens, J., M. S. Leighton, K. G. Laguarda, and E. O’Brian. 1999. Student Learning, Teaching
Quality, and Professional Development: Theoretical Linkages, Current Measurement, and
Recommendations for Future Data Collection. Working paper.
Nelson, J. P., and K. Lynch. 1984. “Grade Inflation, Real Income, Simultaneity, and Teaching
Evaluations.” The Journal of Economic Education 15 (1): 21–37.
Nowell, C. 2007. “The Impact of Relative Grade Expectations on Student Evaluation of Teaching.”
International Review of Economics Education 6 (2): 42–56.
Paswan, A. K., and J. A. Young. 2002. “Student Evaluation of Instructor: A Nomological
Investigation Using Structural Equation Modeling.” Journal of Marketing Education 24 (3):
193–202.
Remedios, R., and D. A. Lieberman. 2008. “I Liked Your Course Because you Taught me Well: The
Influence of Grades, Workload, Expectations and Goals on Students’ Evaluations of Teaching.”
British Educational Research Journal 34 (1): 91–115.
Sherry, A. C., C. Fulford, and S. Zhang. 1998. “Assessing Distance Learners’ Satisfaction with
Instruction: A Quantitative and a Qualitative Measure.” American Journal of Distance
Education 12 (3): 4–28.
Steyn, C., D. Davies, and A. Sambo. 2019. “Eliciting Student Feedback for Course Development: the
Application of a Qualitative Course Evaluation Tool among Business Research Students.”
Assessment and Evaluation in Higher Education 44 (1): 11–24.
Stumpf, S. A., and R. D. Freedman. 1979. “Expected Grade Covariation with Student Ratings of
Instruction: Individual Versus Class Effects.” Journal of Educational Psychology 71 (3): 293–302.
Uttl, B., C. A. White, and D. W. Gonzales. 2017. “Meta-analysis of Faculty’s Teaching Effectiveness:
Student Evaluation of Teaching Ratings and Student Learning are not Related.” Studies in
Educational Evaluation 54 (1): 22–42.
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams. 2016. “Factors Most Likely to Contribute to
Positive Course Evaluations.” Innovative Higher Education 41 (5): 425–440.
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher. 2009. “Evaluating Teaching in Higher
Education.” Journal of Economic Education 40 (3): 227–261.
Appropriate and inappropriate uses of students' assessment of instruction
David M. McCord, Western Carolina University
Article · January 2013 · https://www.researchgate.net/publication/259503622
Top five flashpoints in the assessment of teaching effectiveness
Ronald A. Berk, Johns Hopkins University
Article in Medical Teacher · October 2012 · DOI: 10.3109/0142159X.2012.732247
2013; 35: 15–26
Top five flashpoints in the assessment of
teaching effectiveness
RONALD A. BERK
Johns Hopkins University, USA
Abstract
Background: Despite thousands of publications over the past 90 years on the assessment of teaching effectiveness, there is still
confusion, misunderstanding, and hand-to-hand combat on several topics that seem to pop up over and over again on listservs,
blogs, articles, books, and medical education/teaching conference programs. If you are measuring teaching performance in
face-to-face, blended/hybrid, or online courses, then you are probably struggling with one or more of these topics or flashpoints.
Aim: To decrease the popping and struggling by providing a state-of-the-art update of research and practices and a "consumer's guide to trouble-shooting these flashpoints."
Methods: Five flashpoints are defined, the salient issues and research described, and, finally, specific, concrete recommendations
for moving forward are proffered. Those flashpoints are: (1) student ratings vs. multiple sources of evidence; (2) sources
of evidence vs. decisions: which come first?; (3) quality of "home-grown" rating scales vs. commercially-developed scales;
(4) paper-and-pencil vs. online scale administration; and (5) standardized vs. unstandardized online scale administrations. The first
three relate to the sources of evidence chosen and the last two pertain to online administration issues.
Results: Many medical schools/colleges and higher education in general fall far short of their potential and the available
technology to comprehensively assess teaching effectiveness. Specific recommendations were given to improve the quality and
variety of the sources of evidence used for formative and summative decisions and their administration procedures.
Conclusions: Multiple sources of evidence collected through online administration, when possible, can furnish a solid foundation
from which to infer teaching effectiveness and contribute to fair and equitable decisions about faculty contract renewal, merit pay,
and promotion and tenure.
Introduction
FLASHPOINT: a critical stage in a process, trouble
spot, discordant topic, or lowest temperature at
which a flammable liquid will give off enough
vapor to ignite.
If you have read any of my previous articles, you know I have
given off buckets of vapor. For you language scholars,
"flashpoint" is derived from two Latin words, "flashus,"
meaning "your shorts," and "pointum," meaning "are on fire."
Why flashpoints?
This article is not another review of the research on student
ratings. It is a state-of-the-art update of research and practices,
primarily since 2006 (Berk 2006; Seldin & Associates 2006;
Arreola 2007), with specific TARGETS: the flashpoints that have
emerged, which are critical issues, conflicts, contentious
problems, and volatile hot buttons in the assessment of
teaching effectiveness. They are the most prickly, thorny,
vexing, and knotty topics that every medical school/college
and institution in higher education must confront.
These flashpoints cause confusion, misunderstanding, dissension, hand-to-hand combat, and, ultimately, inaccurate and unfair decisions about faculty.
Practice points
• Polish your student rating scale, but start building multiple sources of evidence to assess teaching effectiveness.
• Match your highest quality sources to the specific formative and summative decisions using the 360° MSF model.
• Review current measures of teaching effectiveness with your faculty and plan specifically how you can improve their psychometric quality.
• Design an online administration system in-house or out-house with a vendor to conduct the administration and score reporting.
• Standardize directions, administration procedures, and a narrow window for completion of your student rating scale and other measures of teaching effectiveness.
Although there are many more
than five in this percolating cauldron of controversy, the ones
tackled here seem to pop up over and over again on listservs,
blogs, articles, books, and medical education/teaching conference programs, plus they generate a firestorm of debate by
faculty and administrators more than others. This contribution
is an attempt to decrease some of that percolating and
popping.
Trouble-shooting flashpoints
If you are currently using any instrument to measure teaching
performance in face-to-face, blended/hybrid, or online
courses, then you are probably struggling with one or more
flashpoints. This article is a "consumer's guide to troubleshooting these flashpoints." The motto of this article is: "Get to
the flashpoint and the solution."
This is the inauguration of my new PBW series on problem-based writing. Your problems are the foci of my writing. The
structure of each section will be governed by the PBW
perspective:
(1) Definition: Each flashpoint will be succinctly defined.
(2) Options: The options available based on research and practice will be described.
(3) Recommended Solution: Specific, concrete recommendations for faculty and administrators will be proffered to move them to action.
There does not seem to be any short-cut, quick fix, or multilevel marketing scheme to improve the quality of teaching.
Tackling these flashpoints head-on will hopefully be one
positive step toward that improvement.
The top five flashpoints are: (1) student ratings vs. multiple
sources of evidence; (2) sources of evidence vs. decisions:
which come first?; (3) quality of "home-grown" rating scales
vs. commercially-developed scales; (4) paper-and-pencil vs.
online scale administration; and (5) standardized vs. unstandardized online scale administration. The first three relate to
critical decisions about the sources of evidence chosen and the
last two pertain to online scale administration issues.
Top five flashpoints
Student ratings vs. multiple sources of evidence
FLASHPOINT 1: Student rating scales have dominated as the primary or, usually, the only measure of
teaching effectiveness in medical schools/colleges
and universities worldwide and in a few remote
planets. This state of practice is contrary to the advice
of a cadre of experts and the limitations of student
input to comprehensively evaluate teaching effectiveness. Several other measures should be used in
conjunction with student ratings.
Student ratings. Historically, student rating scales have been
the primary measure of teaching effectiveness for the past 50
years. Students have had a critical role in the teaching–learning
feedback system. The input from their ratings in summative
decision making has been recommended on an international
level (Strategy Group 2011; Surgenor 2011).
There are nearly 2000 references on the topic (Benton &
Cashin 2012) with the first journal article published 90 years
ago (Freyd 1923). There is more research and experience in
higher education with student ratings than with all of the other
measures of teaching effectiveness combined (Berk 2005,
2006). If you need to be brought up to speed quickly with the
research on student ratings, check out these up-to-date
reviews (Gravestock & Gregor-Greenleaf 2008; Benton &
Cashin 2012; Kite 2012).
Unfortunately, in medical/healthcare education, student
ratings have not received the same level of research attention.
There is only a sprinkling of studies over the last 20 years
(e.g., Hoeks & van Rossum 1988; Jones & Froom 1994; Mazor
et al. 1999; Elzubeir & Rizk 2002; Barnett et al. 2003; Kidd &
Latif 2004; Pierre et al. 2004; Turhan et al. 2005; Maker et al.
2006; Ahmady et al. 2009; Barnett & Matthews 2009; Berk
2009a; Chenot et al. 2009; Donnon et al. 2010; Boerboom et al.
2012; Stalmeijer et al. 2010). There is far more research on peer
observation (e.g., Berk et al. 2004; Siddiqui et al. 2007; Wellein
et al. 2009; DiVall et al. 2012; Pattison et al. 2012; Sullivan et al.
2012). There are also a few qualitative studies that are
peripherally related (Stark 2003; Steinert 2004; Martens et al.
2009; Schiekirka et al. 2012).
With this volume of scholarly productivity and practice in
academia worldwide, student ratings seem like the solution to
assessing teaching effectiveness in medical/healthcare education and higher education in general. So, what is the problem?
Limitations of student ratings. As informative as student
ratings can be about teaching, there are numerous behaviors
and skills defining teaching effectiveness which students are
NOT qualified to rate, such as a professor’s knowledge and
content expertise, teaching methods, use of technology,
course materials, assessment instruments, and grading practices (Cohen & McKeachie 1980; Calderon et al. 1996;
d’Apollonia & Abrami 1997a; Ali & Sell 1998; Green et al.
1998; Hoyt & Pallett 1999; Coren 2001; Ory & Ryan 2001;
Theall & Franklin 2001; Marsh 2007; Svinicki & McKeachie
2011). Students can provide feedback at a certain level in each
of those areas, but it will take peers and other qualified
professionals to rate those skills in depth. BOTTOM LINE:
Student ratings from well-constructed scales are a necessary,
but not sufficient, source of evidence to comprehensively assess
teaching effectiveness.
Student ratings provide only one portion of the information
needed to infer teaching effectiveness. Yet, that is pretty much
all that is available at most institutions. When those ratings
alone are used for decision making, they will be incomplete
and biased. Without additional evidence of teaching effectiveness, student ratings can lead to incorrect and unfair career
decisions about faculty that can affect their contract renewal,
annual salary increase, and promotion and tenure.
It is the process of evaluation or assessment that permits
several sources of appropriate evidence to be collected
for the purpose of decision making. Assessment is a
‘‘systematic method of obtaining information from [scales]
and other sources, used to draw inferences about characteristics of people, objects, or programs,’’ according to
the US Standards for Educational and Psychological
Testing (AERA, APA, & NCME Joint Committee on
Standards 1999, p. 272). Student ratings represent one
measure and just one source of information in that process.
Multiple sources of evidence. Over the past decade, there has
been a trend toward augmenting student ratings with other
data sources of teaching effectiveness. Such sources can serve
to broaden and deepen the evidence base used to assess
courses and the quality of teaching (Theall & Franklin 1990;
Braskamp & Ory 1994; Hoyt & Pallett 1999; Knapper &
Cranton 2001; Ory 2001; Cashin 2003; Berk 2005, 2006; Seldin
2006; Arreola 2007; Theall & Feldman 2007; Gravestock &
Gregor-Greenleaf 2008; Benton & Cashin 2012). In fact, several
comprehensive models of ‘‘faculty evaluation’’ have been
proposed (Centra 1993; Braskamp & Ory 1994; Berk 2006,
2009a; Arreola 2007; Gravestock & Gregor-Greenleaf 2008),
which include multiple sources of evidence with some models
attaching greater weight to student and peer ratings and less
weight to self-, administrator, and alumni ratings, and other
sources. All of these models are used to arrive at formative and
summative decisions.
15 Sources. There are 15 potential sources of evidence of
teaching effectiveness: (1) student ratings; (2) peer observations; (3) peer review of course materials; (4) external expert
ratings; (5) self-ratings; (6) videos; (7) student interviews;
(8) exit and alumni ratings; (9) employer ratings; (10) mentor’s
advice; (11) administrator ratings; (12) teaching scholarship;
(13) teaching awards; (14) learning outcome measures; and
(15) teaching (course) portfolio. Berk (2006) described several
major characteristics of each source, including type of measure
needed to gather the evidence, the person(s) responsible for
providing the evidence (students, peers, external experts,
mentors, instructors, or administrators), the person or committee who uses the evidence, and the decision(s) typically
rendered based on that data (formative, summative, or
program). He also critically examined the value and contribution of these sources for teaching effectiveness based on the
current state of research and practice. His latest recommendations will be presented in Flashpoint 2.
Triangulation. Much has been written about the merits and
shortcomings of these various sources of evidence (Berk 2005,
2006). Put simply: There is no perfect source or combination of
sources. Each source can supply unique information, but also
is fallible, usually in a way different from the other sources. For
example, the unreliability and biases of peer ratings are not the
same as those of student ratings; student ratings have other
weaknesses. By drawing on three or more different sources of
evidence, you can leverage the strengths of each source to
compensate for weaknesses of the other sources, thereby
converging on a decision about teaching effectiveness that is
more accurate and reliable than one based on any single
source (Appling et al. 2001). This notion of triangulation is
derived from a compensatory model of decision making.
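To make the compensatory idea concrete, here is a minimal sketch in Python (a hypothetical illustration: the sources, weights, and ratings below are invented for the example and are not taken from Berk or any cited model):

# Hypothetical compensatory (weighted) composite across several sources of
# evidence. Weights and mean ratings are invented for this sketch; a real
# model would use locally agreed weights and validated measures.
sources = {
    "student ratings": (0.40, 4.2),      # (weight, mean rating on a 1-5 scale)
    "peer observation": (0.30, 3.8),
    "self-rating": (0.15, 4.5),
    "administrator rating": (0.15, 4.0),
}

weighted_sum = sum(weight * rating for weight, rating in sources.values())
total_weight = sum(weight for weight, _ in sources.values())
composite = weighted_sum / total_weight  # strengths in one source offset weaknesses in another
print(f"Composite teaching effectiveness rating: {composite:.2f}")

In a compensatory composite of this kind, a strong peer-observation rating can offset a weaker student rating, which is exactly the triangulation behavior described above.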
Given the complexity of measuring the act of teaching in a
real-time classroom environment or online course, it is
reasonable to expect that multiple sources can provide a
more accurate, reliable, and comprehensive picture of teaching effectiveness than just one source. However, the decision
maker should integrate the information from only those
sources for which validity evidence is available (see Standard
14.13). The quality of the sources chosen should be beyond
reproach, according to the Standards (AERA, APA, & NCME
Joint Committee on Standards 1999).
Since there is not enough experience with multiple sources,
there is a scarcity of empirical evidence to support the use of
any particular combination of sources (e.g., Barnett et al. 2003;
Stalmeijer et al. 2010; Stehle et al. 2012). There are a few
surveys of the frequency of use of individual sources (Seldin
1999; Barnett & Matthews 2009). Research is needed on
various combinations of measures for different decisions to
determine ‘‘best practices.’’
Recommendations. All experts on faculty evaluation recommend multiple sources of evidence to assess teaching effectiveness. Beyond student ratings, is it worth the extra effort,
time, and cost to develop the additional measures suggested in
this section? Just what new information do you have to gain?
As those instruments are being built, it should become clear
that they are intended to measure different teaching behaviors
that contribute to teaching effectiveness. Each measure should
bite off a separate chunk of behaviors. They should be
designed to be complementary, not redundant, although there
may be justification for some overlap for corroboration.
There is even research evidence on the relationships
between student ratings and several other measures to support
their complementarity. Benton and Cashin’s (2012) research
review reported the following validity coefficients with student
ratings: trained observers (0.50 with global ratings), self (0.30–
0.45), alumni (0.54–0.80), and administrators (0.47–0.62; 0.39
with global ratings). Since 0.50 is only 25% explained variance
and 75% unexplained or new information, these coefficients
suggest a lot of insight can be gained using observers’, self, and
administrators’ ratings as sources of evidence.
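The arithmetic behind that last sentence is simply the squared validity coefficient; as a quick restatement of the figures above:

\[
r = 0.50 \;\Rightarrow\; r^{2} = 0.25 \ (25\%\ \text{shared variance}), \qquad 1 - r^{2} = 0.75 \ (75\%\ \text{potentially new information}),
\]

and even the highest alumni coefficient, \( r = 0.80 \), shares only \( r^{2} = 0.64 \) (64%) of its variance with student ratings.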
Sources of evidence vs. decisions: Which come
first?
FLASHPOINT 2: Rating scales are typically administered and then confusion occurs over what to do
with the results and how to interpret them for specific
decisions. A better strategy would be to do exactly the
opposite of that practice. Spin your head around
180°, exorcist style. The decision should drive the
selection of the appropriate sources of evidence, the
types of data needed for the decision, and the design
of the report form. Custom tailor the sources, data,
and form to fit the decision. The information and
format of the evidence a professor needs to improve his
or her teaching are very different from that required
by a department chair or associate dean for annual
review (contract renewal or merit pay) or by a faculty
committee for promotion and tenure review. The
sources of evidence and formats of the reports can
either hinder or facilitate the decision process.
Types of decisions. According to Seldin (1999), teaching is
the major criterion (98%) in assessing overall faculty performance in liberal arts colleges compared to student advising
(64%), committee work (59%), research (41%), publications
(31%), and public service (24%). Although these figures may
not hold up in research universities and, specifically, in
medical schools/colleges, teaching didactic, and/or clinical
courses is still a critical job requirement and criterion on which
most faculty members are assessed.
There are two types of individual decisions in faculty
assessment with which you may already be familiar in the
context of student assessment, plus one decision about
programs:
(1) Formative decisions. These are decisions faculty make to improve and shape the quality of their teaching. It is based on evidence of teaching effectiveness they gather to plan and revise their teaching semester after semester. This evidence and the subsequent adjustments in teaching can occur anytime during the course, so the students can benefit from those changes, or after the course in preparation for the next course.
(2) Summative decisions. These decisions are rendered by the administrative-type person who controls a professor’s destiny and future in higher education. This individual is usually the dean, associate dean, program director, or department head or chair. This administrator uses evidence of a professor’s teaching effectiveness along with other evidence of research, publications, clinical practice, and service to ‘‘sum up’’ his or her overall performance or status to decide about contract renewal or dismissal, annual merit pay, teaching awards, and promotion and tenure.
Although promotion and tenure decisions are often made by a faculty committee, a letter of recommendation by the dean is typically required to reach the committee for review. These summative decisions are high-stakes, final employment decisions reached at different points in time to determine a professor’s progression through the ranks and success as an academician.
(3) Program decisions. Several sources of evidence can also be used for program decisions, as defined in the Program Evaluation Standards by the US Joint Committee on Standards for Educational Evaluation (Yarbrough et al. 2011). They relate to the curriculum, admissions and graduation requirements, and program effectiveness. They are NOT individual decisions; instead, they focus on processes and products. The evidence usually is derived from various types of faculty and student input and employers’ performance appraisal of students. It is also collected to provide documentation to satisfy the criteria for accreditation review.
Matching sources of evidence to decisions. The challenge is
to pick the most appropriate and highest quality sources of
evidence for the specific decision to be made; that is, match the
sources to the decision. The decision drives your choices of
evidence. Among the aforementioned 15 sources of evidence of
teaching effectiveness, here are my best picks based on the
literature for formative, summative, and program decisions:
Formative decisions
. student ratings,
. peer observations,
. peer review of course materials,
. external expert ratings,
. self-ratings,
. videos,
. student interviews, and
. mentor’s advice.

Summative decisions (annual review for contract renewal and merit pay)
. student ratings,
. self-ratings,
. teaching scholarship,
. administrator ratings,
. teaching portfolio (for several courses over the year),
. peer observation (report written expressly for summative decision),
. peer review of course materials (report written expressly for summative decision), and
. mentor’s review (progress report written expressly for summative decision).

Summative decisions (promotion and tenure)
. student ratings,
. self-ratings,
. teaching scholarship,
. administrator ratings,
. teaching portfolio (across several years’ courses),
. peer review (written expressly for summative decision), and
. mentor’s review (progress report written expressly for summative decision).
Program decisions
. Student ratings
. Exit and alumni ratings
. Employer ratings
The multiple sources identified for each decision can be
configured into the 360° multisource feedback (MSF) model of
assessment (Berk 2009a, 2009b) or other model for accreditation documentation of teaching assessment. The sources for
each decision may be added gradually to the model. This is an
on-going process for your institution.
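One lightweight way to make such a configuration explicit while sources are added gradually is a simple decision-to-sources mapping. The sketch below is purely illustrative (hypothetical code; the lists merely echo the picks above and would be tailored locally):

# Illustrative mapping of decision types to the sources of evidence chosen for
# them; the lists restate the picks in this section and would be adapted locally.
evidence_by_decision = {
    "formative": [
        "student ratings", "peer observations", "peer review of course materials",
        "external expert ratings", "self-ratings", "videos",
        "student interviews", "mentor's advice",
    ],
    "summative (annual review)": [
        "student ratings", "self-ratings", "teaching scholarship",
        "administrator ratings", "teaching portfolio", "peer observation",
        "peer review of course materials", "mentor's review",
    ],
    "summative (promotion and tenure)": [
        "student ratings", "self-ratings", "teaching scholarship",
        "administrator ratings", "teaching portfolio", "peer review",
        "mentor's review",
    ],
    "program": ["student ratings", "exit and alumni ratings", "employer ratings"],
}

for decision, evidence in evidence_by_decision.items():
    print(f"{decision}: {', '.join(evidence)}")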
Recommendations. So now that you have seen my picks,
which sources are you going to choose? So many sources, so
little time! Which sources are already available in your
department? What is the quality of the measures used to
provide evidence of teaching effectiveness? Are the faculty
stakeholders involved in the current process?
You have some decisions to make. Where do you begin?
Here are a few suggestions:
(1) Start with student ratings. Consider the content and quality of your current scale and determine whether it needs a minor or major tune-up for the decisions being made.
(2) Review the other sources of evidence with your faculty to decide the next steps. Which sources will your faculty embrace which reflect best practices in teaching? Weigh the pluses and minuses of the different sources.
(3) Decide which combination of sources is best for your faculty. Identify which sources should be used for both formative and summative decisions, such as self- and peer ratings, and which sources should be used for one type of decision but not the other, such as administrator ratings and teaching portfolio.
(4) Map out a plan to build those sources, one at a time, to create an assessment model for each decision (see Berk 2009a).
Whatever combination of sources you choose to use, take
the time and make the effort to design the scales, administer
the scales, and report the results appropriately. The accuracy
of faculty assessment decisions depends on the integrity of the
process and the validity and reliability of the multiple sources
of evidence you collect. This endeavor may seem rather
formidable, but, keep in mind, you are not alone in this
process. Your colleagues at other institutions are probably
struggling with the same issues. Maybe you could pool
resources.
Quality of ‘‘home-grown’’ rating scales vs.
commercially-developed scales
FLASHPOINT 3: Many of the rating scales developed by faculty committees in medical schools/
colleges and universities do not meet even the most
basic criteria for psychometric quality required by
professional and legal standards. Most of the scales
are flawed internally, administered incorrectly, and
rarely is there any evidence of score reliability and
validity. The serious concern is that decisions about
the careers of faculty are being made with these
instruments.
Quality control. Researchers have reviewed the quality of
student rating scales used by colleges and universities
throughout the US and Canada (Berk 1979, 2006; Franklin &
Theall 1990; d’Apollonia & Abrami 1997b, 1997c; Seldin 1999;
Theall & Franklin 2000; Abrami 2001; Franklin 2001; Ory &
Ryan 2001; Arreola 2007; Gravestock & Gregor-Greenleaf
2008). The instruments are either commercially developed
scales with pre-designed reporting forms or ‘‘home-grown,’’
locally constructed measures built usually by faculty committees. The former exhibit the quality control of the company
that developed the scales and reports, such as Educational
Testing Service and The IDEA Center (see Flashpoint 4); the
latter have no consistency in the development process and
rarely any formal procedures for controlling psychometric
quality.
Quality of ‘‘home-grown’’ scales. That lack of quality control
may very well extend to institutions worldwide. It could be
due to a lack of commitment, importance, accountability, or
interest; inappropriate personnel without the essential skills; or
limited resources. No one knows for sure. Regardless of the
reason, the picture is ugly.
Reviewers of practices at institutions in North America have
found the following problems with ‘‘home-grown’’ scales:
.
.
.
.
.
.
.
.
poor or no specifications of teaching behaviors,
faulty items (statements and anchors),
ambiguous or confusing directions,
unstandardized administration procedures,
inappropriate data collection, analysis, and reporting,
no adjustments in ratings for extraneous factors,
no psychometric studies of score reliability and validity, and
no guidelines or training for faculty and administrators to
use the results correctly for appropriate decisions.
Does the term psychometrically putrid summarize current
practices? How does your scale stack up against those
problems? Fertilizer-wise, ‘‘home-grown’’ scales are not growing. Their development is arrested. They are more like ‘‘Peter
Pan scales.’’
The potential negative consequences of using faulty
measures to make biased and unfair decisions to guide
teaching improvement and faculty careers can be devastating.
Moreover, this assessment only addresses the quality of
student rating scales. What would be the quality of peer
observations, self-ratings, and administrator ratings and their
interpretations? Serious attention needs to be devoted to the
quality control of all ‘‘home-grown’’ scales.
From a broader perspective, poor quality scales violate US
testing/scaling standards according to the Standards for
Educational and Psychological Testing (AERA, APA, &
NCME Joint Committee on Standards 1999), Personnel
Evaluation Standards (Joint Committee on Educational
Evaluation Standards 2009), and the US Equal Employment
Opportunity Commission’s (EEOC) Uniform Guidelines on
Employee Selection Procedures (US Equal Employment
Opportunity Commission 2010). The psychometric requirements for instruments used for summative ‘‘employment’’
decisions about faculty are rigorous and appropriate for their
purposes.
Recommendations. This issue reduces to the leadership and
the composition of the faculty committee that accepts the
responsibility to develop the scales and reports and/or the
external consultant or vendor hired to guide the development
process. The psychometric standards for the construction,
administration, analysis, and interpretation of scales must be
articulated and guided by professionals trained in those
standards (AERA, APA, & NCME Joint Committee on Standards
1999). As Flashpoint 2 emphasized, if the committee does not
contain one or more professors with expertise in psychometrics, then it should be ashamed of itself. That is a prescription
for putridity and the previous problem list. Reviewers rarely
found any one with these skills on the committees of the
institutions surveyed.
It is also recommended that all faculty members be given
workshops on item writing and scale structure. In the
development process, they will be reviewing, selecting,
critiquing, adapting, and writing items. Even if faculty are
excellent test item writers, that does not mean they can write
scale items.
The structure and criteria for writing scale items are very
different from test items (Berk 2006), not difficult, just different.
Even with commercially developed instruments, professors are
usually given the option to add up to 10 course-specific items;
in other words, they will need to write items. Rules for writing
scale items are available in references on scale construction
(Netemeyer et al. 2003; Dunn-Rankin et al. 2004; Streiner &
Norman 2008; Berk 2006; deVellis 2012).
Paper-and-pencil vs. online scale administration
FLASHPOINT 4: The battle between paper-and-pencil versus online administration of student rating
scales is still being fought in medical schools and on
many campuses worldwide. Despite an international trend and numerous advantages and
improvements in online systems over the past
decade, there are faculty who still dig their heels in
and institutions that have resisted the conversion.
Much has been learned about how to increase
response rates, which is a flashpoint by itself, and
how to overcome many of the deterrents to adopting
an online system. Online administration, analysis,
and reporting can be executed in-house or by an
out-house vendor specializing in that processing.
Comparison of paper-and-pencil and online administration. A detailed examination of the advantages and disadvantages of the two modes of administration according to
15 key factors has been presented by Berk (2006). There are
major differences between them. Although it was concluded
that both are far from perfect, the benefits of the online mode
and the improvements in the delivery system with the research
and experiences over the past few years exceed the pluses of the
paper-based mode. Furthermore, most Net Geners do not
know what a pencil is. Unless it is an iPencil, it is not on their
radar or part of their mode.
The benefits of the online mode include ease of administration, administration flextime, low cost, rapid turnaround
time for results, ease of scale revision, and higher quality and
greater quantity of unstructured responses (Sorenson &
Johnson 2003; Anderson et al. 2005; Berk 2006; Liu 2006;
Heath et al. 2007). Students’ concerns with lack of anonymity,
confidentiality of ratings, inaccessibility, inconvenience, and
technical problems have been eliminated at many institutions.
Faculty resistance issues of low response rates and negative
bias and lower ratings than paper-based version have been
addressed (Berk 2006). Two major topics that still need
attention are lack of standardization (Flashpoint 5) and
response bias, which tends to be the same for both paper
and online.
Three online delivery options. Online administration, scoring,
analysis, and reporting of student ratings can be handled in
three ways: (1) in-house by the department of computer
services, IT, or equivalent unit; (2) out-house by a vendor that
provides all delivery services for the institution’s ‘‘home-grown’’ scale; or (3) out-house by a vendor that provides all services, plus their own scale or a scale you create from their catalog of items. These options are listed in order of increasing cost. Depending on in-house resources, it is possible to execute the entire processing in a very cost-effective manner. Alternatively, estimates from a variety of vendors should be obtained for the out-house options.
(1) In-house administration. If you have developed or
plan to develop your own scale, you should consider
this option. Convene the key players who can make
this happen, including administrators and staff from IT
or computer services, faculty development, and a
testing center, plus at least one measurement expert.
A discussion of scale design, scoring, analysis, report
design, and distribution can determine initially
whether the resources are available to execute the
system. Once a preliminary assessment of the resources
required has been completed, costs should be estimated for each phase. A couple of meetings can
provide enough information to consider the possibility.
Your in-house system components, products, and
personnel can then be compared to the two options
described next. As you go shopping for an online
system, at least you will have done your homework and
be able to identify what the commercial vendors offer,
including qualitative differences, that you cannot execute yourself. Although the cost could be the dealbreaker, you will know all the options available to
make an informed final decision. Further, you can
always change your system if your stocks plummet, the
in-house operation has too many bumps that cannot be
squished and ends up in Neverland, or the commercial
services do not deliver as promised.
(2) Vendor administration with ‘‘home-grown’’ scale.
If outsourcing to a vendor is your preference or you
just want to explore this option, but you want to
maintain control over your own scale content and
structure, there are certain vendors that can online your
scale. For some strange reason, they are all located in
Madagascar. Kidding. They include CollegeNET (What
Do You Think?), ConnectEDU (courseval), and IOTA
Solutions (MyClassEvaluation). They will administer
your scale online, perform all analyses, and generate
reports for different decision makers. Thoroughly
compare all of their components with yours. Evaluate
the pluses and minuses of each package.
Make sure to investigate the compatibility of the
packages with your course management system. The
choice of the system is crucial to provide the anonymity
for students to respond, which can boost response rates
(Oliver & Sautter 2005). Most of the vendors’ packages
are compatible with Blackboard, WebCT, Moodle,
Sakai, and other campus portal systems.
There are even free online survey providers, such as
Zoomerang (MarketTools 2006), which can be used
easily by any instructor without a course management
system (Hong 2008). Other online survey software, both
free and pay, has been reviewed by Wright (2005).
There are specific advantages and disadvantages of the
different packages, especially with regard to rating
scale structure and reporting score results (Hong 2008).
This is a viable online option worth investigating for
formative feedback.
(3) Vendor administration and rating scale. If you want a
vendor to supply the rating scale and all of the delivery
services, there are several commercial student rating
systems you should consider. Examples include Online
Course Evaluation, Student Instructional Report II,
Course/Instructor Evaluation Questionnaire, IDEA
Student Ratings of Instruction, Student Evaluation of
Educational Quality, Instructional Assessment System,
and Purdue Instructor Course Evaluation Service.
Sample forms and lists of services with prices are
given on the websites for these scales.
This is the simplest solution to the student rating
scale online system: Just go buy one. The seven
packages are designed for you, Professor Consumer.
The items are professionally developed; the scale has
usually undergone extensive psychometric analyses to
provide evidence of reliability and validity; and there
are a variety of services provided, including the scale,
online administration, scanning, scoring, and reporting
of results in a variety of formats with national norms.
For some, you can access a specimen set of rating
scales and report forms online. All of the vendors
provide a list of services and prices on their websites.
Carefully shop around to find the best fit for your
faculty and administrator needs and institutional
culture. The packages vary considerably in scale
design, administration options, report forms, norms,
and, of course, cost.
Comparability of paper-and-pencil and online ratings.
Despite all of the differences between paper-based and
online administrations and the contaminating biases that afflict
the ratings they produce, researchers have found consistently
that online students and their in-class counterparts rate
courses and instructors similarly (Layne et al. 1999; Spooner
et al. 1999; Waschull 2001; Carini et al. 2003; Hardy 2003;
McGee & Lowell 2003; Dommeyer et al. 2004; Avery et al.
2006; Benton et al. 2010b; Venette et al. 2010; Perrett 2011;
Stowell et al. 2012). The ratings on the structured items are not
systematically higher or lower for online administrations. The
correlations between online and paper-based global item
ratings were 0.84 (overall instructor) and 0.86 (overall course)
(Johnson 2003).
Although the ratings for online and paper are not identical,
with more than 70% of the variance in common, any
differences in ratings that have been found are small.
Further, interrater reliabilities of ratings of individual items
and item clusters for both modalities were comparable (McGee
& Lowell 2003), and so were the underlying factor structures
(Layne et al. 1999; Leung & Kember 2005). All of these
similarities were also found in comparisons between face-to-face and online courses, although response rates were slightly
lower in the online courses (Benton et al. 2010a).
Alpha total scale (18 items) reliabilities were similar for
paper-based (0.90) and online (0.88) modes when all items
appeared on the screen (Peer & Gamliel 2011). Slightly lower
coefficients (0.74–0.83) for online displays of one, two, or
four items only on the screen were attributable to response
bias (Gamliel & Davidovitz 2005; Berk 2010; Peer &
Gamliel 2011).
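For readers who want to compute the same kind of internal-consistency coefficient on their own scale data, here is a minimal sketch of Cronbach’s alpha in Python (the ratings matrix is invented for illustration; it is not data from Peer and Gamliel or any other cited study):

import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_respondents x n_items) matrix of ratings."""
    ratings = np.asarray(ratings, dtype=float)
    n_items = ratings.shape[1]
    sum_item_variances = ratings.var(axis=0, ddof=1).sum()
    total_score_variance = ratings.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_score_variance)

# Hypothetical ratings from six students on four scale items (1-5 scale).
sample = [
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]
print(f"alpha = {cronbach_alpha(sample):.2f}")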
The one exception to the above similarities is the unstructured items, or open-ended comment section. The research
has indicated that the flexible time permitted to the onliners
usually, but not always, yields longer, more frequent and
thoughtful comments than those of in-class respondents
(Layne et al. 1999; Ravelli 2000; Johnson 2001, 2003; Hardy
2002, 2003; Anderson et al. 2005; Donovan et al. 2006; Venette
et al. 2010; Morrison 2011). Typing the responses is reported
by students to be easier and faster than writing them, plus it
preserves their anonymity (Layne et al. 1999; Johnson 2003).
Recommendations. Weighing all of the pluses and minuses
in this section strongly suggests that the conversion from a
paper-based to online administration system seems worthy of
serious consideration by medical schools/colleges and
every other institution of higher education using student
ratings. When the concerns of the online approach are
addressed, its benefits for face-to-face, blended/hybrid, and
online/distance courses far outweigh the traditional paper-based approach. (NOTE: Online administration should also be
employed for alumni ratings and employer ratings. The costs
for these ratings will be a small fraction of the cost of the
student rating system.)
Standardized vs. unstandardized online scale
administration
FLASHPOINT 5: Standardized administration
procedures for any measure of human or rodent
behavior are absolutely essential to be able to
interpret the ratings with the same meaning for all
individuals who completed the measure. Student
rating scales are typically administered online at the
end of the semester without regard for any standardization or controls. There doesn’t seem to be any
sound psychometric reasons for why the administrations are scheduled the way they are. This is,
perhaps, the most neglected issue in the literature
and in practice.
Importance of standardization. A significant amount of
attention has been devoted to establishing standardized
times, conditions, locations, and procedures for administering
in-class tests and clinical measures, such as the OSCE, as well
as out-of-class admissions, licensing, and certification tests.
National standards for testing practices require this standardization to assure that students take tests under identical
conditions so their scores can be interpreted in the same way,
they are comparable from one student or group to another,
and they can be compared to norms (AERA, APA, & NCME
Joint Committee on Standards 1999).
Unfortunately, standardization has been completely
neglected in the faculty evaluation literature for the administration of online student rating scales (Berk 2006). This topic
was only briefly mentioned in a recent review of the student
ratings research (Addison & Stowell 2012). Although the
inferences drawn from the scale scores and other measures of
teaching effectiveness require the same administration precision as tests, procedures to assure scores will have the same
meaning from students completing the scales at the end of the
semester have not been addressed in research and practice.
Typically, students are given notice that they have 1 or 2
weeks to complete the student ratings form with the deadline
before or after the final exam/project.
Confounding uncontrolled factors. Since students can complete online rating scales during their discretionary time, there
is no control over the time, place, conditions, or any
situational factors under which the self-administrations occur
(Stowell et al. 2012). Most of these factors were controlled with
the paper-and-pencil, in-class administration by the instructor
or a student appointed to handle the administration.
In fact, in the online mode, there is no way to insure that
the real student filled out the form or did not discuss it with
someone who already did. It could be a roommate, partner,
avatar, alien, student who has never been to class doing a
favor in exchange for a pizza, alcohol, or drugs, or all of the
preceding. Any of those substitutes would result in fraudulent
ratings (Standard 5.6). Bad, bad ratings! Although there is no
standardization of the actual administration, at least the written
directions given to all students can be the same. Therefore, the
procedures that the students follow should be similar if they
read the directions.
Timing of administration. The timing of the administration
can also markedly affect the ratings. For example, if some
students complete the scale before the final review and final
exam, on the day of the final, or after the exam, their feelings
about the instructor/course can be very different. Exposure to
the final exam alone can significantly affect ratings,
particularly if there are specific items on the scale measuring
testing and evaluation methods. It could be argued that the
final should be completed in order to provide a true rating of
all evaluation methods.
Despite a couple of ‘‘no difference’’ studies of paper-and-pencil administrations almost 40 years ago (Carrier et al. 1974;
Frey 1976) and one study examining final exam day administration (Ory 2001), which produced lower ratings, there does
not seem to be any agreement among the experts on the best
time to administer online scales or on any specific standardization procedures other than directions.
What is clear is that whatever time is decided must be the
same for all students in all courses; otherwise, the ratings of
these different groups of students will not have the same
meaning. For example, faculty within a department should
agree that all online administrations must be completed before
the final or after, but not both. Faculty must decide on the best
time to get the most accurate ratings. That decision will also
affect the legitimacy of any comparison of the ratings to
different norm groups.
Standards for standardization. So what is the problem with
the lack of standardization? The ratings by students are
assumed to be collected under identical conditions according
to the same rules and directions. Standardization of the
administration and environment provides a snapshot of how
students feel at one point in time. Although their individual
ratings will vary, they will have the same meaning. Rigorous
procedures for standardization are required by the US
Standards for Educational and Psychological Testing (AERA,
APA, & NCME Joint Committee on Standards 1999).
Groups of students must be given identical instructions,
which is possible, administered the scale under identical
conditions, which is nearly impossible, to assure the comparability of their ratings (Standards 3.15, 3.19, and 3.20). Only
then would the interpretation of the ratings and, in this case,
the inferences about teaching effectiveness from the ratings be
valid and reliable (Standard 3.19). In other words, without
standardization, such as when every student fills out the scale
willy-nilly at different times of the day and semester, in
different places, under different conditions, using different
procedures, the ratings from student to student and professor
to professor will not be comparable.
Recommendations. Given the limitations of online administration, what can be done to approximate some semblance of
standardized conditions or, at least, minimize the extent to
which the bad conditions contaminate the ratings? Here are a
few options extended from Berk’s (2006) previous suggestions, listed from highest level of standardization and control to
lowest level:
(1) In-class administration before final for maximum control: Set a certain time slot in class, just like the paper-and-pencil version, for students to complete the forms on their own PC/Mac, iPad, iPhone, iPencil, or other device. The professor should leave the room and have a student execute and monitor the process. Adequate time should be given for students to type comments for the unstructured section of the scale. (NOTE: Not recommended if there are several items or a subscale that measures course evaluation methods, since the final is part of those methods.)
(2) Computer lab time slots before or after final: Set certain time slots in the computer lab or an equivalent location during which students can complete the forms. The controls exercised in the previous option should be followed in the lab. If available, techie-type students should proctor the slots to eliminate distractions and provide technical support for any problems that arise.
(3) One or two days before or after final at students’ discretion: This is the most loosey-goosey option with the least control, albeit, the most popular. Specify a narrow window within which the ratings must be completed, such as one or two days after the final class and before the final exam, or one or two days after the exam before grades are submitted and posted. This gives new meaning to ‘‘storm window.’’
Any of these three options will improve the standardization
of current online administration practices beyond the typical
1- or 2-week bay window. Experience and research on these
procedures will hopefully identify the confounding variables
that can affect the online ratings. Ultimately, concrete
guidelines to assist faculty in deciding on the most appropriate
administration protocol will result.
Top five recommendations
After ruminating over these flashpoints, it can be concluded
that there are a variety of options within the reach of every
medical school/college and institution of higher education to
improve its current practices with its source(s) of evidence and
administration procedures. Everyone is wrestling with these
issues and, although more research is needed to test the
options, there are tentative solutions to these problems. As
experience and research continue to accumulate, even better
solutions will result.
There is a lot of activity and discourse on these flashpoints
because we know that all of the summative decisions about
faculty will be made with or without the best information
available. Further, professors who are passionate about
teaching will also seek out sources of evidence to guide
their improvement.
The contribution of this PBW article rests on the value and
usefulness of the recommendations that you can convert into
action. Without action, the recommendations are just dead
words on a page. Your TAKE-AWAYS are the concrete action
steps you choose to implement to improve the current state of
your teaching assessment system.
Here are the top five recommendations framed in terms of
action steps:
(1) polish your student rating scale, but also start building additional sources of evidence, such as self, peer, and mentor scales, to assess teaching effectiveness;
(2) match your highest quality sources to the specific formative and summative decisions using the 360° MSF model;
(3) review current measures of teaching effectiveness with your faculty and plan specifically how you can improve their psychometric quality;
(4) design an online administration system in-house or out-house with a vendor to conduct the administration and score reporting for your own student rating scale or the one it provides; and
(5) standardize directions, administration procedures, and a narrow window for completion of your student rating scale and other measures of teaching effectiveness.
Taking action on these five can yield major strides in
improving the practice of assessing teaching effectiveness and
the fairness and equity of the formative and summative
decisions made with the results. Just how important is
teaching in your institution? Your answer will be expressed
in your actions. What can you contribute to make it better than
it has ever been? That is my challenge to you.
Declaration of interest: The author reports no conflicts of interest. The author alone is responsible for the content and writing of the article.

Notes on contributor
RONALD A. BERK, PhD, is Professor Emeritus, Biostatistics and
Measurement, and former Assistant Dean for Teaching at the Johns
Hopkins University, where he taught for 30 years. He has presented 400
keynotes/workshops and published 14 books, 165 journal articles, and 300
blogs. His professional motto is: ‘‘Go for the Bronze!’’
References
Medical/healthcare education
Ahmady S, Changiz T, Brommels M, Gaffney FA, Thor J, Masiello I. 2009.
Contextual adaptation of the Personnel Evaluation Standards for
assessing faculty evaluation systems in developing countries: The
case of Iran. BMC Med Educ 9(18), DOI: 10.1186/1472-6920-9-18.
Anderson HM, Cain J, Bird E. 2005. Online student course evaluations:
Review of literature and a pilot study. Am J Pharm Educ 69(1):34–43.
Available from http://web.njit.edu/bieber/pub/Shen-AMCIS2004.pdf.
Appling SE, Naumann PL, Berk RA. 2001. Using a faculty evaluation triad to
achieve evidenced-based teaching. Nurs Health Care Perspect
22:247–251.
Barnett CW, Matthews HW. 2009. Teaching evaluation practices in colleges
and schools of pharmacy. Am J Pharm Educ 73(6).
Barnett CW, Matthews HW, Jackson RA. 2003. A comparison between
student ratings and faculty self-ratings of instructional effectiveness.
J Pharm Educ 67(4).
Berk RA. 2009a. Using the 360 multisource feedback model to evaluate
teaching and professionalism. Med Teach 31(12):1073–1080.
Berk RA, Naumann PL, Appling SE. 2004. Beyond student ratings: Peer
observation of classroom and clinical teaching. Int J Nurs Educ
Scholarsh 1(1):1–26.
Boerboom TBB, Mainhard T, Dolmans DHJM, Scherpbier AJJA, van
Beukelen P, Jaarsma ADC. 2012. Evaluating clinical teachers with the
Maastricht clinical teaching questionnaire: How much ‘teacher’ is in
student ratings? Med Teach 34(4):320–326.
Chenot J-F, Kochen MM, Himmel W. 2009. Student evaluation of a primary
care clerkship: Quality assurance and identification of potential for
improvement. BMC Med Educ 9(17), DOI: 10.1186/1472-6920-9-17.
DiVall M, Barr J, Gonyeau M, Matthews SJ, van Amburgh J, Qualters D,
Trujillo J. 2012. Follow-up assessment of a faculty peer observation and
evaluation program. Am J Pharm Educ 76(4).
Donnon T, Delver H, Beran T. 2010. Student and teaching characteristics
related to ratings of instruction in medical sciences graduate programs.
Med Teach 32(4):327–332.
Elzubeir M, Rizk D. 2002. Evaluating the quality of teaching in medical
education: Are we using the evidence for both formative and
summative purposes? Med Teach 24:313–319.
Hoeks TW, van Rossum HJ. 1988. The impact of student ratings on a new
course: The general clerkship (ALCO). Med Educ 22(4):308–313.
Jones RF, Froom JD. 1994. Faculty and administration views of problems in
faculty evaluation. Acad Med 69(6):476–483.
Kidd RS, Latif DA. 2004. Student evaluations: Are they valid measures of
course effectiveness? J Pharm Educ 68(3).
Maker VK, Lewis MJ, Donnelly MB. 2006. Ongoing faculty evaluations:
Developmental gain or just more pain? Curr Surg 63(1):80–84.
Martens MJ, Duvivier RJ, van Dalen J, Verwijnen GM, Scherpbier AJ, van der
Vleuten. 2009. Student views on the effective teaching of physical
examination skills: A qualitative study. Med Educ 43(2):184–191.
Mazor K, Clauser B, Cohen A, Alper E, Pugnaire M. 1999. The dependability
of students’ rating of preceptors. Acad Med 74:19–21.
Pattison AT, Sherwood M, Lumsden CJ, Gale A, Markides M. 2012.
Foundation observation of teaching project – A developmental model
of peer observation of teaching. Med Teach 34(2):e36–e142.
Pierre RB, Wierenga A, Barton M, Branday JM, Christie CD. 2004.
Student evaluation of an OSCE in paediatrics at the University of
the West Indies, Jamaica. BMC Med Educ 4(22), DOI: 10.1186/1472-6920-4-22.
Schiekirka S, Reinhardt D, Heim S, Fabry G, Pukrop T, Anders S,
Raupach T. 2012. Student perceptions of evaluation in undergraduate
medical education: A qualitative study from one medical school.
BMC Med Educ 12(45), DOI: 10.1186/1472-6920-12-45.
Siddiqui ZS, Jonas-Dwyer D, Carr SE. 2007. Twelve tips for peer
observation of teaching. Med Teach 29(4):297–300.
Stalmeijer RE, Dolmans DH, Wolfhagen IH, Peters WG, van Coppenolle L,
Scherpbier AJ. 2010. Combined student ratings and self-assessment
provide useful feedback for clinical teachers. Adv Health Sci Educ
Theory Pract 15(3):315–328.
Stark P. 2003. Teaching and learning in the clinical setting: A qualitative
study of the perceptions of students and teachers. Med Educ
37(11):975–982.
Steinert Y. 2004. Student perceptions of effective small group teaching. Med
Educ 38(3):286–293.
Sullivan PB, Buckle A, Nicky G, Atkinson SH. 2012. Peer observation of
teaching as a faculty development tool. BMC Med Educ 12(26), DOI:
10.1186/1472-6920-12-26.
Turhan K, Yaris F, Nural E. 2005. Does instructor evaluation by students
using a web-based questionnaire impact instructor performance? Adv
Health Sci Educ Theory Pract 10(1):5–13.
Wellein MG, Ragucci KR, Lapointe M. 2009. A peer review process for
classroom teaching. Am J Pharm Educ 73(5).
General higher education
Abrami PC. 2001. Improving judgments about teaching effectiveness using
rating forms. In: Theall M, Abrami PC, Mets LA, editors. The student
ratings debate: Are they valid? How can we best use them? (New
Directions for Institutional Research, No. 109). San Francisco, CA:
Jossey-Bass. pp 59–87.
Addison WE, Stowell JR. 2012. Conducting research on student evaluations
of teaching. In: Kite ME, editor. Effective evaluation of teaching: A guide
for faculty and administrators. pp 1–12. E-book [Accessed 6 June 2012]
Available from the Society for the Teaching of Psychology website
http://teachpsych.org/ebooks/evals2012/index.php.
AERA (American Educational Research Association), APA (American Psychological Association), NCME (National Council on Measurement in Education) Joint Committee on Standards. 1999. Standards for educational and psychological testing. Washington, DC: AERA.
Ali DL, Sell Y. 1998. Issues regarding the reliability, validity and utility of
student ratings of instruction: A survey of research findings. Calgary,
AB: University of Calgary APC Implementation Task Force on Student
Ratings of Instruction.
Arreola RA. 2007. Developing a comprehensive faculty evaluation system:
A handbook for college faculty and administrators on designing and
operating a comprehensive faculty evaluation system. 3rd ed. Bolton,
MA: Anker.
Avery RJ, Bryan WK, Mathios A, Kang H, Bell D. 2006. Electronic course
evaluations: Does an online delivery system influence student
evaluations? J Econ Educ 37(1):21–37.
Benton SL, Cashin WE. 2012. Student ratings of teaching: A summary
of research and literature (IDEA Paper no. 50). Manhattan, KS:
The IDEA Center. [Accessed 8 April 2012] Available from http://
www.theideacenter.org/sites/default/files/idea-paper_50.pdf.
Benton SL, Webster R, Gross A, Pallett W. 2010a. An analysis of IDEA
Student Ratings of Instruction in traditional versus online courses (IDEA
Technical Report no. 15). Manhattan, KS: The IDEA Center.
Benton SL, Webster R, Gross A, Pallett W. 2010b. An analysis of IDEA
Student Ratings of Instruction using paper versus online survey
methods (IDEA Technical Report no. 16). Manhattan, KS: The IDEA
Center.
Berk RA. 1979. The construction of rating instruments for faculty
evaluation: A review of methodological issues. J Higher Educ
50:650–669.
Berk RA. 2005. Survey of 12 strategies to measure teaching effectiveness.
Int J Teach Learn Higher Educ 17(1):48–62. Available from http://
www.isetl.org/ijtlthe.
Berk RA. 2006. Thirteen strategies to measure college teaching:
A consumer’s guide to rating scale construction, assessment, and
decision making for faculty, administrators, and clinicians. Sterling, VA:
Stylus.
Berk RA. 2009b. Beyond student ratings: ‘‘A whole new world, a new
fantastic point of view.’’ Essays Teach Excellence 20(1). Available from
http://podnetwork.org/publications/teachingexcellence.htm.
Berk RA. 2010. The secret to the ‘‘best’’ ratings from any evaluation scale.
J Faculty Dev 24(1):37–39.
Braskamp LA, Ory JC. 1994. Assessing faculty work: Enhancing individual
and institutional performance. San Francisco, CA: Jossey-Bass.
Calderon TG, Gabbin AL, Green BP. 1996. Report of the committee on
promoting evaluating effective teaching. Harrisonburg, VA: James
Madison University.
Carini RM, Hayek JC, Kuh GD, Ouimet JA. 2003. College student responses
to web and paper surveys: Does mode matter? Res Higher Educ
44(1):1–19.
Carrier NA, Howard GS, Miller WG. 1974. Course evaluations: When?
J Educ Psychol 66:609–613.
Cashin WE. 2003. Evaluating college and university teaching: Reflections of
a practitioner. In: Smart JC, editor. Higher education: Handbook of
theory and research. Dordrecht, the Netherlands: Kluwer Academic
Publishers. pp 531–593.
Centra JA. 1993. Reflective faculty evaluation: Enhancing teaching and
determining faculty effectiveness. San Francisco: Jossey-Bass.
Cohen PA, McKeachie WJ. 1980. The role of colleagues in the evaluation of
teaching. Improving College Univ Teach 28(4):147–154.
Coren S. 2001. Are course evaluations a threat to academic freedom?
In: Kahn SE, Pavlich D, editors. Academic freedom and the inclusive
university. Vancouver, BC: University of British Columbia Press.
pp 104–117.
d’Apollonia S, Abrami PC. 1997a. Navigating student ratings of instruction.
Am Psychol 52:1198–1208.
d’Apollonia S, Abrami PC. 1997b. Scaling the ivory tower, part 1:
Collecting evidence of instructor effectiveness. Psychol Teach Rev
6:46–59.
d’Apollonia S, Abrami PC. 1997c. Scaling the ivory tower, part 2:
Student ratings of instruction in North America. Psychol Teach
Rev 6:60–76.
deVellis RF. 2012. Scale development: Theory and applications. 3rd ed.
Thousand Oaks, CA: Sage.
Dommeyer CJ, Baum P, Hanna RW, Chapman KS. 2004. Gathering
faculty teaching evaluations by in-class and online surveys: Their
effects on response rates and evaluations. Assess Eval Higher
Educ 29(5):611–623.
Donovan J, Mader CE, Shinsky J. 2006. Constructive student feedback:
Online vs. traditional course evaluations. J Interact Online Learn
5:283–296.
Dunn-Rankin P, Knezek GA, Wallace S, Zhang S. 2004. Scaling methods.
Mahwah, NJ: Erlbaum.
Franklin J. 2001. Interpreting the numbers: Using a narrative to help others
read student evaluations of your teaching accurately. In: Lewis KG,
editor. Techniques and strategies for interpreting student evaluations
(Special issue) (New Directions for Teaching and Learning, No. 87).
San Francisco, CA: Jossey-Bass. pp 85–100.
Franklin J, Theall M. 1990. Communicating student ratings to decision
makers: Design for good practice. In: Theall M, Franklin J, editors.
Student ratings of instruction: Issues for improving practice (Special
issue) (New Directions for Teaching and Learning, No. 43). San
Francisco, CA: Jossey-Bass. pp 75–93.
Frey PW. 1976. Validity of student instructional ratings as a function of their
timing. J Higher Educ 47:327–336.
Freyd M. 1923. A graphic rating scale for teachers. J Educ Res
8(5):433–439.
Gamliel E, Davidovitz L. 2005. Online versus traditional teaching
evaluation: Mode can matter. Assess Eval Higher Educ 30(6):
581–592.
Gravestock P, Gregor-Greenleaf E. 2008. Student course evaluations:
Research, models and trends. Toronto, ON: Higher Education
Quality Council of Ontario. E-book [Accessed 6 May 2012] Available
from http://www.heqco.ca/en-CA/Research/Research%20Publications/
Pages/Home.aspx.
Green BP, Calderon TG, Reider BP. 1998. A content analysis of teaching
evaluation instruments used in accounting departments. Issues Account
Educ 13(1):15–30.
Hardy N. 2002. Perceptions of online evaluations: Fact and fiction. Paper
presented at the annual meeting of the American Educational Research
Association, April 1–5 2002, New Orleans, LA.
Hardy N. 2003. Online ratings: Fact and fiction. In: Sorenson DL, Johnson
TD, editors. Online student ratings of instruction (New Directions for
Teaching and Learning, No. 96). San Francisco, CA: Jossey-Bass.
pp 31–38.
Heath N, Lawyer S, Rasmussen E. 2007. Web-based versus paper and
pencil course evaluations. Teach Psychol 34(4):259–261.
Hong PC. 2008. Evaluating teaching and learning from students’
perspectives in their classroom through easy-to-use online surveys.
Int J Cyber Soc Educ 1(1):33–48.
Hoyt DP, Pallett WH. 1999. Appraising teaching effectiveness: Beyond
student ratings (IDEA Paper no. 36). Manhattan, KS: Kansas State
University Center for Faculty Evaluation and Development.
Johnson TD. 2001. Online student ratings: Research and possibilities.
Invited plenary presented at the Online Assessment Conference,
September, Champaign, IL.
Johnson TD. 2003. Online student ratings: Will students respond?.
In: Sorenson DL, Johnson TD, editors. Online student ratings of
instruction (New Directions for Teaching and Learning, no. 96).
San Francisco, CA: Jossey-Bass. pp 49–60.
Joint Committee on Standards for Educational Evaluation. 2009. The
personnel evaluation standards: How to assess systems for evaluating
educators. 2nd ed. Thousand Oaks, CA: Corwin Press.
Kite ME, editor. 2012. Effective evaluation of teaching: A guide for faculty
and administrators. E-book [Accessed 6 June 2012] Available from the
Society for the Teaching of Psychology website http://teachpsych.org/
ebooks/evals2012/index.php.
Knapper C, Cranton P, editors. 2001. Fresh approaches to the evaluation
of teaching (New Directions for Teaching and Learning, no. 88).
San Francisco, CA: Jossey-Bass. pp 19–29.
Layne BH, DeCristoforo JR, McGinty D. 1999. Electronic versus
traditional student ratings of instruction. Res Higher Educ
40(2):221–232.
Leung DYP, Kember D. 2005. Comparability of data gathered from
evaluation questionnaires on paper through the Internet. Res Higher
Educ 46:571–591.
Liu Y. 2006. A comparison of online versus traditional student evaluation of
instruction. Int J Instr Technol Distance Learn 3(3):15–30.
MarketTools. 2006. Zoomerang: Easiest way to ask, fastest way to know.
[Accessed 17 July 2012] Available from http://info.zoomerang.com.
Marsh HW. 2007. Students’ evaluations of university teaching:
Dimensionality, reliability, validity, potential biases and usefulness.
In: Perry RP, Smart JC, editors. The scholarship of teaching and learning
in higher education: An evidence-based perspective. Dordrecht, the
Netherlands: Springer. pp 319–383.
McGee DE, Lowell N. 2003. Psychometric properties of student ratings of
instruction in online and on-campus courses. In: Sorenson DL, Johnson
TD, editors. Online student ratings of instruction (New Directions for
Teaching and Learning, no. 96). San Francisco, CA: Jossey-Bass.
pp 39–48.
Morrison R. 2011. A comparison of online versus traditional student end-of-course critiques in resident courses. Assess Eval Higher Educ
36(6):627–641.
Netemeyer RG, Bearden WO, Sharma S. 2003. Scaling procedures.
Thousand Oaks, CA: Sage.
Oliver RL, Sautter EP. 2005. Using course management systems to
enhance the value of student evaluations of teaching. J Educ Bus
80(4):231–234.
Ory JC. 2001. Faculty thoughts and concerns about student ratings.
In: Lewis KG, editor. Techniques and strategies for interpreting student
evaluations (Special issue) (New Directions for Teaching and Learning,
No. 87). San Francisco, CA: Jossey-Bass. pp 3–15.
Ory JC, Ryan K. 2001. How do student ratings measure up to a new validity
framework?. In: Theall M, Abrami PC, Mets LA, editors. The student
ratings debate: Are they valid? How can we best use them? (Special
issue) (New Directions for Institutional Research, 109). San Francisco,
CA: Jossey-Bass. pp 27–44.
Peer E, Gamliel E. 2011. Too reliable to be true? Response bias as a
potential source of inflation in paper and pencil questionnaire
reliability. Practical Assess Res Eval 16(9):1–8. Available from http://
pareonline.net/getvn.asp?v=16&n=9.
Perrett JJ. 2011. Exploring graduate and undergraduate course evaluations
administered on paper and online: A case study. Assess Eval Higher
Educ 1–9, DOI: 10.1080/02602938.2011.604123.
Ravelli B. 2000. Anonymous online teaching assessments: Preliminary
findings. [Accessed 12 June 2012] Available from http://www.edrs.com/
DocLibrary/0201/ED445069.pdf.
Seldin P. 1999. Current practices – good and bad – nationally. In: Seldin P &
Associates Changing practices in evaluating teaching: A practical guide
to improved faculty performance and promotion/tenure decisions.
Bolton, MA: Anker. 1–24.
Seldin P. 2006. Building a successful evaluation program. In: Seldin P &
Associates Evaluating faculty performance: A practical guide to
assessing teaching, research, and service. Bolton, MA: Anker 1–19.
Seldin P, Associates, editors. 2006. Evaluating faculty performance: A
practical guide to assessing teaching, research, and service. Bolton, MA:
Anker. pp 201–216.
Sorenson DL, Johnson TD, editors. 2003. Online student ratings of
instruction (New Directions for Teaching and Learning, no. 96).
San Francisco, CA: Jossey-Bass.
Spooner F, Jordan L, Algozzine B, Spooner M. 1999. Student ratings of
instruction in distance learning and on-campus classes. J Educ Res
92:132–140.
Stehle S, Spinath B, Kadmon M. 2012. Measuring teaching effectiveness:
Correspondence between students’ evaluations of teaching and
different measures of student learning. Res Higher Educ. DOI:
10.1007/s11162-012-9260-9.
Stowell JR, Addison WE, Smith JL. 2012. Comparison of online and
classroom-based student evaluations of instruction. Assess Eval Higher
Educ 37(4):465–473.
Strategy Group. 2011. National strategy for higher education to 2030 (Report of the Strategy Group). Dublin, Ireland: Department of Education and Skills, Government Publications Office. [Accessed 17 July 2012] Available from http://www.hea.ie/files/files/DES_Higher_Ed_Main_Report.pdf.
Streiner DL, Norman GR. 2008. Health measurement scales: A practical
guide to their development and use. 4th ed. New York: Oxford
University Press.
Surgenor PWG. 2011. Obstacles and opportunities: Addressing the
growing pains of summative student evaluation of teaching. Assess
Eval Higher Educ 1–14, iFirst Article. DOI: 10.1080/
02602938.2011.635247.
Svinicki M, McKeachie WJ. 2011. McKeachie’s teaching tips: Strategies,
research, and theory for college and university teachers. 13th ed.
Belmont, CA: Wadsworth.
Theall M, Feldman KA. 2007. Commentary and update on Feldman’s (1997)
‘‘Identifying exemplary teachers and teaching: Evidence from student
ratings’’. In: Perry RP, Smart JC, editors. The teaching and learning in
higher education: An evidence-based perspective. Dordrecht, the
Netherlands: Springer. pp 130–143.
Theall M, Franklin JL. 1990. Student ratings in the context of
complex evaluation systems. In: Theall M, Franklin JL, editors.
Student ratings of instruction: Issues for improving practice (New
Directions for Teaching and Learning, no. 43). San Francisco, CA:
Jossey-Bass. pp 17–34.
Theall M, Franklin JL. 2000. Creating responsive student ratings systems to
improve evaluation practice. In: Ryan KE, editor. Evaluating teaching in
higher education: A vision for the future (Special issue) (New Directions
for Teaching and Learning, no. 83). San Francisco, CA: Jossey-Bass.
pp 95–107.
Theall M, Franklin JL. 2001. Looking for bias in all the wrong places:
A search for truth or a witch hunt in student ratings of instruction?.
In: Theall M, Abrami PC, Mets LA, editors. The student ratings
debate: Are they valid? How can we best use them? (New Directions
for Institutional Research, no. 109). San Francisco, CA: Jossey-Bass.
pp 45–56.
US Equal Employment Opportunity Commission (EEOC). 2010. Employment
tests and selection procedures. [Accessed 20 August 2012] Available from
http://www.eeoc.gov/policy/docs/factemployment_procedures.html.
Venette S, Sellnow D, McIntire K. 2010. Charting new territory: Assessing
the online frontier of student ratings of instruction. Assess Eval Higher
Educ 35:101–115.
Waschull SB. 2001. The online delivery of psychology courses: Attrition,
performance, and evaluation. Comput Teach 28:143–147.
Wright KB. 2005. Researching internet-based populations: Advantages and
disadvantages of online survey research, online questionnaire
authoring software packages, and web survey services. J Comput
Mediated Commun 10(3). Available from http://jcmc.indiana.edu/
vol10/issue3/wright.html.
Yarbrough DB, Shulha LM, Hopson RK, Caruthers FA. 2011. The
program evaluation standards: A guide for evaluators and evaluation
users. 3rd ed. Thousand Oaks, CA: Sage.
Sosibo, Lungi (Cape Peninsula University of Technology). 2013. Views from below: Students' perceptions of teaching practice evaluations and stakeholder roles. Perspectives in Education, December 2013. Available from https://www.researchgate.net/publication/268522696
COLLEGE TEACHING, 60: 48–55, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 8756-7555 print / 1930-8299 online
DOI: 10.1080/87567555.2011.627896
Predicting Student Achievement in University-Level
Business and Economics Classes: Peer Observation
of Classroom Instruction and Student Ratings of
Teaching Effectiveness
Craig S. Galbraith
University of North Carolina Wilmington
Gregory B. Merrill
St Mary’s College of California
We examine the validity of peer observation of classroom instruction for purposes of faculty
evaluation. Using both a multi-section course sample and a sample of different courses across
a university’s School of Business and Economics we find that the results of annual classroom
observations of faculty teaching are significantly and positively correlated with student learning
outcome assessment measures. This finding supports the validity of classroom observation as
an assessment of teaching effectiveness. The research also indicates that student ratings of
teaching effectiveness (SETEs) were less effective at measuring student learning than annual
classroom observations by peers.
There is no question that teaching effectiveness is a very
personal, highly complex, and ever changing process involving a multitude of different skills and techniques. Teaching
is also part of the mission of every institution of higher learning, although certainly the weightings between teaching and
other components, such as scholarship, service, and regional
engagement, may vary between individual campuses. As the
primary institutional service providers for the core mission of
teaching, the faculty’s teaching effectiveness must be evaluated for various personnel decisions, such as promotion,
tenure, and retention. Today, most universities systematically
use a combination of peer evaluations and student ratings.
The notion of peer evaluations has evolved significantly
since the 1980s and now represents a relatively broad definition that includes both direct classroom observation and
review of a faculty member’s teaching portfolio of syllabi,
exam samples, and possibly other data points, such as statements of teaching philosophy and reflective reactions to student feedback. As Yon, Burnap, and Kohut (2002) observe,
Correspondence should be sent to Craig S. Galbraith, University of North
Carolina Wilmington, Cameron School of Business, 601 South College
Road, Wilmington, NC 28403, USA. E-mail: galbraithc@uncw.edu
“the expanding use of peers in the evaluation of teaching
is part of a larger trend in postsecondary education toward
a more systematic assessment of classroom performance”
(104). In fact, there now exists a broad normative literature describing the underlying theories, proposed protocols,
and content breadth of comprehensive peer evaluations (e.g.,
Centra 1993; Cavanagh 1996; Malik 1996; Hutchings 1996, 1998; Bernstein and Edwards 2001; Bernstein et al. 2006;
Arreola 2007; Chism 2007; Bernstein 2008).
Of all the elements in a typical university peer evaluation
process, direct classroom observation continues to be one
of the more controversial. Not only can issues of peer bias,
observer training, and classroom intrusion be raised (e.g.,
Cohen and McKeachie 1980; DeZure 1999; Yon et al.
2002; Costello et al. 2001; Arreola 2007; Courneya, Pratt,
and Collins 2007), but there remains a fundamental debate
whether classroom observation is most valid for formative
purposes in assisting faculty to improve their teaching effectiveness or for evaluative purposes in providing university
administrators and faculty colleagues useful data for personnel decisions (e.g., Cohen and McKeachie 1980; Centra
1993; Shortland 2004; Peel 2005; Chism 2007). In practice,
most universities that use classroom observation for evaluative purposes generally use observations from a single class
“visit,” and that assessment is then assumed to reflect an evaluation of the faculty member’s overall teaching ability during
that particular time period, or at least until another “visit” is
conducted. Despite the intermittent nature of some aspects of
peer evaluations, surveys tend to support the argument that
at least the faculty themselves believe that peer evaluations
can be an effective measure of teaching effectiveness (e.g.,
Peterson 2000; Yon et al. 2002; Kohut, Burnap, and Yon
2007).
While faculty may believe that peer evaluations and classroom observations, if done properly, are valid measures of
teaching effectiveness, it is difficult to draw any conclusion
at all from empirical validity studies of peer evaluations and
classroom observation. As with any instructional related metric used for faculty personnel decisions, the argument for, or
against, the validity of peer evaluations should be based upon
convincing evidence that they indeed measure teaching effectiveness or student learning. As Cohen and McKeachie
(1980) succinctly noted early in this debate, “clearly what
is needed are studies demonstrating the validity of colleague
ratings against other criteria of teaching effectiveness. One
possibility would be to relate colleague ratings to student
achievement” (149).
In spite of Cohen and McKeachie’s (1980) call for more
empirical research tied to student achievement, this type of
validity testing for peer evaluations or classroom observation
has simply not yet occurred. To date, almost all the empirical arguments for, or against, the validity of peer evaluations
and classroom observation are based upon their correlations
with student evaluations of teaching effectiveness (SETEs)
or some other purported measure of teaching excellence,
such as teaching awards. Feldman’s (1989b) meta-analysis
of these types of studies, for example, reports a mean correlation between peer evaluations and student ratings of 0.55,
with correlations ranging from 0.19 to 0.84. Empirical results since Feldman’s meta-analysis report similar correlations (e.g., Kremer 1990; Centra 1994). In general, higher
correlations with SETEs are found when peers examined a
complete teaching portfolio, and therefore may have been
influenced by student evaluations included in the portfolio,
while the lower correlations were from studies involving primarily classroom observations (Burns 1998).
While this line of research is interesting, given the fact that
SETEs themselves are often challenged as valid measures of
teaching effectiveness, studies that correlate peer evaluations
with SETEs simply provide little or no insight regarding
the validly of peer evaluations and classroom observations
for purposes of assessing faculty teaching effectiveness. The
validity of SETEs as a measure of teaching effectiveness has
been challenged for a number of reasons.
First, early research indicates only a moderate amount
of statistical variation in independent and objective measures of teaching effectiveness are explained by SETE
scores—depending on the meta-analysis study, between
about 4% and 20% for the typical "global" item on SETE instruments (Cohen, P. 1981, 1982, 1983; Costin 1987; Dowel
and Neal 1982, 1983; McCallum 1984; Feldman 1989a,
2007)—with many of the studies finding validity within the
“weak” category of scale criterion validity suggested by Cohen, J. (1969, 1981)1.
Second, it has been noted that the vast majority of this
early SETE research relied upon data from introductory undergraduate college courses at research institutions taught by
teaching assistants (TAs) following a textbook or departmental created lesson plan. These types of TA taught introductory classes, however, only account for a small percentage
of a university’s total course offerings, and may not be at
all representative for non-doctorate granting colleges. In addition, as Taylor (2007) notes, it is in the more advanced
core, elective, and graduate courses where faculty members
have the greatest flexibility over pedagogical style, course
content, and assessment criteria—the factors most likely to
drive classroom learning. In fact, recent empirical research
has indicated a possible negative, or negatively bi-modal, relationship between SETEs and student achievement in more
advanced university courses (Carrell and West, 2010; Galbraith, Merrill, and Kline, 2011).
Third, in the past two decades a number of articles have
appeared that specifically challenge various validity related
aspects of SETEs (e.g., Balam and Shannon, 2010; Campbell
and Bozeman 2008; Davies et al. 2007; Emery, Kramer, and
Tian 2003; Pounder 2007; Langbein 2008; Carrell and West
2010). These include arguments that student perceptions of
teaching are notoriously subject to various types of manipulation, such as the often debated “grading leniency” hypothesis, or even giving treats such as “chocolate candy” prior
to the evaluation (e.g., Blackhart et al. 2006; Bowling 2008;
Boysen 2008; Felton, Mitchell, and Stinson 2004; Youmans
and Jee 2007). Other research has demonstrated that student
ratings are influenced by race, gender, and cultural biases
as well as various “likability and popularity” attributes of
the instructor, such as physical looks and “sexiness” (e.g.,
Abrami, Levanthal, and Perry 1982; Ambady and Rosenthal
1993; Anderson and Smith 2005; Atamian and Ganguli 1993;
Buck and Tiene 1989; Davies et al. 2007; Felton, Mitchell,
and Stinson 2004; McNatt, 2010; Naftulin, Ware, and Donnelly 1973; Riniolo et al. 2006; Smith 2007; Steward and
Phelps 2000).
The lack of empirical studies linking classroom observations by peers to student achievement combined with the continuing questions surrounding the overall validity of SETEs
as an indicator of teaching effectiveness clearly underlines
the need for continued research as to how faculty members are evaluated.
1 Cohen, J. (1969, 1981) refers to r = 0.10 (1.0% variance explained) as a small effect, r = 0.30 (9.0% variance explained) as a medium effect, and r = 0.50 (25.0% variance explained) as a large effect. Many researchers have used an r < 0.30 (less than 9% variance explained) to signify a "small" effect for purposes of testing scale validity (e.g., Barrett et al. 2009; Hon et al., 2010; Varni et al. 2001; Whitfield et al. 2006).
In this study, we directly examine issues surrounding the validity of peer classroom observations in relationship to student learning. Our analysis differs from previous empirical efforts in several respects. First, we investigate
the validity of classroom peer observations by using standardized learning outcome measures set by an institutional
process rather than simply correlating peer evaluations with
SETEs. Second, our sample of advanced but required core
undergraduate and graduate courses represents a mid-range
of content control by individual instructors. Third, we compare the explanatory power of peer evaluation ratings with
SETEs, and fourth, we have both part-time instructors and
full-time faculty members in our sample. This allows for a
test regarding the possible impact of independence in selecting which “peers” observe a faculty member’s classroom
instruction.
Data
The data come from courses taught by thirty-four different
faculty at a “School of Business and Economics” for a private
university located in a large urban region. Classes are offered at both the undergraduate and graduate (masters) level.
Similar to many urban universities, a number of adjunct or
part-time instructors are used to teach courses. Some of the
adjunct instructors hold terminal degrees, however, and are
associated with other colleges in the region. Those part-time
instructors not holding terminal degrees would be considered
“professionally qualified” under standards set by the Association to Advance Collegiate Schools of Business (AACSB).
The university would be classified as a non-research intensive
institution offering masters degrees in the Carnegie Foundation classification, with a mission that is clearly “teaching”
in orientation.
Courses in the sample include the disciplines of marketing, management, leadership, finance, accounting, statistics,
and economics, with 48% of the sample being graduate
courses. Sixty percent of the sample courses were taught
by full-time instructors. Average class size is 16 students.
Measures
Teaching effectiveness—Achievement of student
learning outcomes (ACHIEVE)
Encouraged by the guidelines of various accrediting agencies, the School has used course learning outcomes for several years. Learning outcomes are established by a faculty
committee for each core and concentration required course
within the School. There is an average of six to ten learning outcomes per course, and these learning outcomes are
specifically identified in the syllabus of each course.
Recently the School has invested substantial time and
resources in revising and quantifying its learning outcome
assessment process. Quantified assessment of course learning outcome attainment by students is measured by a stan-
dardized student learning outcome test. The School’s student
learning outcome exams are developed individually for each
course in each program by a committee of content experts in
the subject area, with four questions designed to assess each
of the stated learning outcomes. Student outcome exams go
beyond simple final exam questions in that they are institutionally agreed upon and formally tied to programmatic objectives. Approximately one-third of the School’s core and
required courses are given student learning outcome exams
at the present time.
Student outcome exams are administered to every student
in every section of the course being assessed, regardless of
instructor and delivery mode. The same student outcome
assessment is used for all sections of the same course, and
instructors are not allowed to alter the questions. Student
learning outcome exams are given at the end of the course
period. Since there are four questions per learning outcome,
the exams are all scored on a basis of zero (0) to four (4)
points per learning outcome. In the present study, for the
ACHIEVE score we use the mean score of all the learning
outcome questions on that particular course exam.
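To make the aggregation concrete, here is a minimal Python sketch of computing a course-level ACHIEVE value as the mean of per-outcome exam scores on the 0-to-4 scale. The student identifiers, outcome labels, and scores are hypothetical; only the 0-to-4 scoring and the mean aggregation come from the description above.

import statistics

# Hypothetical per-student, per-outcome scores for one course section.
# Each learning outcome is assessed by four questions, so each score is
# the number of correct answers (0 to 4) for that outcome.
section_scores = {
    "student_1": {"LO1": 3, "LO2": 4, "LO3": 2},
    "student_2": {"LO1": 4, "LO2": 3, "LO3": 3},
    "student_3": {"LO1": 2, "LO2": 2, "LO3": 4},
}

# Course-level ACHIEVE: mean of all learning-outcome scores on the exam.
all_outcome_scores = [score
                      for outcomes in section_scores.values()
                      for score in outcomes.values()]
achieve = statistics.mean(all_outcome_scores)
print(f"ACHIEVE = {achieve:.2f} on the 0-4 scale")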
Depending upon the course and learning outcome, student learning outcome exams consist of multiple choice, or
occasionally, short essay questions. For the short essay questions a grading rubric is created so that there is consistency
in scoring across all sections of a particular course. Less than
10% of all student learning outcome exam content is essay,
with over 90% multiple choice. Exam design and rigor is
specifically modeled after professional certification exams
such as the Certified Public Accounting (CPA) exam. This
type of assessment data directly related to carefully articulated “course learning outcomes” is exactly what McKeachie
(1979) referred to when he noted, “we take teaching effectiveness to be the degree to which one has facilitated student
achievement of education goals” (McKeachie 1979, 385).
Although the student learning outcome exam questions
are designed to have the same format and the same level of difficulty, and all are scored on a 0 to 4 scale, for the full cross-sectional sample the ACHIEVE measure does come from
assessments for different courses using different questions.
For our full sample analysis we therefore dichotomize the
student learning data based upon the median (low student
achievement v. high student achievement). Dichotomizing
the outcome variable using the median is common in these
situations when outcome data come from a relatively small
cross-sectional sample, and there is not sufficient sample
size to accurately calculate multiple means to normalize the
outcome data across the different categories in the sample (e.g., Bolotin 2006; Baarveld, Kollen, Groenier 2007;
Mazumdar and Glassman 2000; Muennig, Sohler, and Mahato 2007; Muthén and Speckart 1983). In addition, the implied binary benchmarking of teaching effectiveness for faculty across different departments is common in practice. Most
obvious are tenure and promotion decisions (yes or no) for
full-time faculty at smaller teaching-driven colleges, annual
contract renewals for non-tenure track full-time and part-time
teaching lecturers, faculty teaching award nominations, and
the formal use of binary assessment metrics of faculty teaching effectiveness by some institutions (e.g., Glazerman et al.,
2010). Not surprisingly, faculty themselves often tend to informally categorize colleagues (or themselves) as effective or
“good” teachers versus being less effective in the classroom
(e.g., Fetterley 2005; Sutkin et al. 2008). However, when examining multiple sections of a single course using exactly
the same set of learning outcome questions, we use the raw
ACHIEVE score in our analysis.
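The median split described above can be illustrated with a short Python sketch; the course names and ACHIEVE values below are hypothetical and stand in for the authors' course-level data.

import pandas as pd

# Hypothetical course-level data: one row per class section.
df = pd.DataFrame({
    "course":  ["FIN510", "MKT301", "ECO201", "ACC320", "MGT505", "STA210"],
    "achieve": [3.1, 2.4, 2.9, 2.2, 3.4, 2.6],  # mean learning-outcome score, 0-4
})

# Dichotomize at the median: 1 = high student achievement, 0 = low.
median_achieve = df["achieve"].median()
df["achieve_high"] = (df["achieve"] > median_achieve).astype(int)
print(df)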
Classroom observation (PEER)
In our sample, faculty members are required to undergo
one classroom observation per year. The classroom observation procedure is typical of many universities—the peer
observer reviews the syllabus and arranges the time to visit
the class. Although faculty training in peer evaluation processes is often recommended (e.g., Cohen and McKeachie
1980; Bernstein et al. 2006, Chism 2007), in our sample faculty peer classroom observers were not provided any specific
training in observation techniques. While some universities
suggest multiple observers, our sample university required
only one observer per instructor.
An “evaluation” form is used where the classroom observer checks/scores various questions related to ten different
categories of teaching: class meets stated outcomes, level of
student understanding, enthusiasm for teaching, sensitivity to
student needs, giving clear explanations, use of instructional
material, teaching methods and pacing, knowledge of subject, clarity of syllabus, and the course’s assessment process.
The last two categories are from review of the syllabus. Comments can also be added. After reading the submitted written
observation form, the senior administrator gave a “class observation” score between “1” (low) and “7” (high) based upon
the scoring and information in the form. For this study we
used this numerical score. In our sample, the numerical classroom peer observation score ranged between “3” and “7”.
There was an important difference in the classroom observation process for part-time faculty versus full-time faculty.
Full-time faculty could generally request which colleague
observed his or her class, with an obvious possible bias toward requesting friends or colleagues who might provide
more favorable comments. In contrast, for part-time faculty
the classroom observer was appointed by the department
chairperson rather than requested by the instructor.
It should be noted that there are certainly other possible
differences between full-time faculty and part-time faculty,
such as tenure status, types of courses taught and terminal
degree education. However, in our sample we feel that the
most likely explanation for any differences between the ability of the peer evaluation ratings of part-time versus full-time
faculty to explain student achievement would come from differences in the “peer” selection process; that is, controlling
for the “peer selection bias” commonly mentioned in the
literature. In fact, no apparent bias in the peer evaluation
score was noted for the part-time faculty across a number
of variables. For example, there was no significant difference in the mean peer evaluation score between part-time
faculty with terminal degrees versus those without terminal
degrees. Similarly, although full-time faculty taught a greater
percentage of graduate classes versus part-time faculty there
was no significant difference in the peer evaluation score for
part-time faculty that taught graduate classes versus undergraduate classes.
Student perception of teaching effectiveness
(SETE)
As with most universities, student course evaluations are
based upon multiple item forms that gather student perceptions, with several questions directly related to perceptions of
the instructor’s skill. We used the comprehensively worded
“global” item (SETE Global Instructor) asking students to
rate the instructor with the wording, “overall, I rate the
instructor of this course an excellent teacher.” Most SETE
scales use such a final “global” question, and from the authors’ experience it is this question that tends to hold the
most weight in faculty performance evaluations.
Control variables
As control variables we used class size, whether the course
was a graduate course, and delivery method (on-site versus
distance). Class size appears to be a particularly important
control variable. Zietz and Cochran (1997) found a negative
relationship between class size and test results, while Lopus
and Maxwell (1995) found a positive relationship in business
related classes. Pascarella and Terenzini (2005) argue that the
connection is still unknown. A more recent large-scale study
of science classes by Johnson (2010) indicates that while
class size negatively impacts student learning (as measured
by grades), the impact diminishes as class size increases.
ANALYSIS
We model the analysis closely on actual practice in universities. In our sample we have ACHIEVE, SETE, and the
control variables for forty-six classes taught by thirty-four
faculty within a one-year period. We use the faculty member’s annual classroom observation scores (PEER) from a
single “face-to-face” classroom visit that is closest to the
one-year period of our class-specific data2. The other important component of assessing teaching effectiveness in practice would be the collection of student ratings for the various
classes during the time period.
2 We only have the numerical score for a faculty member's peer evaluation, not the specific class it came from.
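In data terms, the setup described here attaches each faculty member's single annual classroom observation score to every class he or she taught during the one-year window. A sketch of that join is shown below; the identifiers and values are hypothetical and do not reproduce the authors' data.

import pandas as pd

# Hypothetical class-level records (one row per class in the one-year window).
classes = pd.DataFrame({
    "faculty_id": [101, 101, 102, 103],
    "course":     ["FIN510", "FIN511", "MKT301", "ECO201"],
    "sete":       [4.2, 3.8, 4.5, 3.9],   # global instructor rating for the class
    "class_size": [14, 18, 22, 12],
})

# Hypothetical faculty-level records: one annual PEER observation score (1-7).
peer = pd.DataFrame({
    "faculty_id": [101, 102, 103],
    "peer":       [6, 5, 7],
})

# Attach the single annual PEER score to every class the faculty member taught.
analysis = classes.merge(peer, on="faculty_id", how="left")
print(analysis)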
Within our sample, the bivariate correlation between
PEER and SETE is 0.43. This directly compares with
Feldman’s (1989b) meta-analysis report of a mean correlation between peer reviews and student evaluations of 0.55.
Since research indicates that ratings from direct class observation have somewhat lower correlations with SETEs than
for broad peer evaluations (e.g, Burns 1998), the 0.43 correlation between PEER and SETE suggests our sample is
probably representative.
Ideally, the best test of validity would involve multiple
sections of the same course, taught by different instructors,
using a common measurement of student performance. This
has been noted by several authors. For example, in their
discussion of the need to establish peer evaluation validity,
Cohen and McKeachie (1980) write, “this would require a
multi-section course with a standard post-term achievement
measures, such an endeavor would prove valuable for assessing the validity of colleague ratings” (149). In our sample,
one course (a graduate finance class) had a sufficient number of different instructors (N = 5) to calculate a correlation
between the faculty member’s annual classroom observation
score (PEER) and student learning outcomes (ACHIEVE)3.
All five of the instructors were full-time faculty. Since the
student learning outcome exam for this particular finance
course was the same across all sections, we could use the
raw scores in this analysis. The bivariate correlation between
PEER and ACHIEVE for this particular multi-section course
was 0.675 (p < 0.10, one-tailed), statistically significant and
in the expected direction in spite of the very small sample
size. On the other hand, for this one multi-section course
analysis, the bivariate correlation between student evaluation of teaching (SETE) and ACHIEVE was only 0.289; a
positive relationship but not statistically significant. In fact,
the amount of variance (8.26%) in student achievement explained by SETE in our analysis is very similar to many
of the SETE validity studies reviewed by Feldman (1989a,
2007) and falls within the “weak” category of scale criterion
validity suggested by Cohen, J. (1969, 1981). On the other
hand, the faculty member’s annual course observation score
(PEER) explains 45.6% of the variance in student achievement in this sample, and therefore falls within the “strong”
category of scale criterion validity. Thus, within this well controlled, albeit small, multi-section case, the faculty member’s
annual classroom observation score explained a much higher
percentage of student achievement than their SETEs. Given
the small sample size these results should certainly be interpreted cautiously, however, it should be noted that Feldman’s
(1989a, 2007) often cited meta-analyses of SETE validity
also includes research with only five instructors/sections in
their multi-section samples.
3 The next largest multi-section course in our sample had only three different instructors, and they were a combination of part-time and full-time faculty.
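The variance-explained figures quoted above follow directly from squaring the reported correlations; the short Python check below uses only the two correlations given in the text (0.675 and 0.289) together with the effect-size benchmarks cited in footnote 1.

# Variance explained (r squared) for the two correlations reported in the
# multi-section analysis, with Cohen's effect-size benchmarks from footnote 1.
def variance_explained(r):
    return r ** 2

def cohen_label(r):
    r = abs(r)
    if r < 0.30:
        return "small"
    if r < 0.50:
        return "medium"
    return "large"

for label, r in [("PEER vs. ACHIEVE", 0.675), ("SETE vs. ACHIEVE", 0.289)]:
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.1%}, "
          f"effect size: {cohen_label(r)}")
# Output: roughly 45.6% for PEER and 8.4% for SETE; the article reports 8.26%,
# presumably computed from the unrounded correlation.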
TABLE 1
Binary Logistic Regression Analysis—Explaining Student Achievement (ACHIEVE)

Variables             Pooled-Sample    Full-Time Faculty    Part-Time Faculty
                      Regression       Regression           Regression
Constant              −7.738           −8.493               −13.847
Class Size            −0.117∗∗         −0.122∗              −0.077
Online Class           0.874            1.771∗              −0.120
Graduate Class         2.610∗∗∗         3.428∗∗              2.214∗
SETE                   1.178∗           1.351                1.139
PEER                   0.598∗           0.435                1.810∗∗
Nagelkerke R2          0.348            0.487                0.374
Cox and Snell R2       0.258            0.363                0.276
N                     46               28                   18

Note: ∗∗∗ p < 0.01; ∗∗ p < 0.05; ∗ p < 0.10
We are also interested in comparing the relationship between the two measures commonly used to evaluate a faculty
member’s teaching effectiveness (SETEs and PEER) and our
independent measure of student achievement (ACHIEVE)
across the full range of courses. This is important since most
universities compare, either directly or indirectly, a faculty
member’s teaching evaluation assessments with other faculty members across departments and schools during annual
review, tenure, and promotion decision discussions.
For this analysis, ACHIEVE was the dependent variable, while PEER, SETE, and the control variables were
independent variables. As previously discussed, for this
cross-sectional analysis we used the bivariate measure of
ACHIEVE, “high student achievement” and “low student
achievement”—the appropriate regression technique is therefore logistic regression. We estimate binary logistic regression models for the full pooled sample, and both the full-time
and part-time faculty sub-samples. Table 1 reports the results
of this analysis.
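For readers unfamiliar with the estimation step, the sketch below fits a binary logistic regression of the dichotomized ACHIEVE outcome on the predictors named above using statsmodels, and derives the two pseudo-R2 statistics shown in Table 1 from the model log-likelihoods. The data frame is randomly generated for illustration only; it is not the authors' sample, and the column names are placeholders.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical analysis frame: one row per class, dichotomized outcome plus predictors.
rng = np.random.default_rng(0)
n = 46
df = pd.DataFrame({
    "achieve_high": rng.integers(0, 2, n),   # 1 = high achievement, 0 = low
    "class_size":   rng.integers(8, 30, n),
    "online":       rng.integers(0, 2, n),
    "graduate":     rng.integers(0, 2, n),
    "sete":         rng.uniform(2.5, 5.0, n),
    "peer":         rng.integers(3, 8, n),   # annual classroom observation score, 1-7
})

X = sm.add_constant(df[["class_size", "online", "graduate", "sete", "peer"]])
model = sm.Logit(df["achieve_high"], X).fit(disp=False)
print(model.summary())

# Cox and Snell and Nagelkerke pseudo-R2, as reported in Table 1, from log-likelihoods.
ll1, ll0 = model.llf, model.llnull
cox_snell = 1 - np.exp((2 / n) * (ll0 - ll1))
nagelkerke = cox_snell / (1 - np.exp((2 / n) * ll0))
print(f"Cox and Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")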
With respect to the control variables, graduate classes and
smaller classes clearly tend to have higher levels of student
achievement. The graduate class variable was positive, and
statistically significant in all three models—the pooled sample, and both the full-time and part-time faculty sub-samples.
Class size, while negative in all three regressions was statistically significant in both in the pooled sample and the full-time
faculty sample. The on-line class variable had opposite signs
between the estimated regression models, and was statistically significant only in the full-time faculty sample. Overall,
all the regression models were statistically significant, and
had reasonably high R2s.
Of interest to our research are the two “teaching effectiveness” evaluation metrics: student evaluations of teaching
(SETE) and the faculty member’s annual classroom peer
observations (PEER). Both metrics were positive in all three
equations. The SETE variable, however, was statistically significant only for the pooled sample. The PEER variable was
also statistically significant in the pooled sample regression.
Most interesting are the results for the two sub-samples
of full-time and part-time faculty. As previously discussed,
there was a significant difference in the way “peers” were
selected between these two groups, with the selection of
“peers” for part-time faculty more of an independent, “armslength” process. Given this important difference, the pooled
sample may be too heterogeneous across the PEER variable
and the model estimates therefore misleading. Examining
the two sub-samples should provide additional insight. In
the full-time faculty sub-sample, the PEER variable, while
indicating a positive relationship, was not statistically significant. However, in the part-time faculty model, which has a
much stronger peer selection control process, the classroom
observation variable (PEER) was both positive and statistically significant. The SETE metric, while positive in both
equations, was not statistically significant in either.
DISCUSSION AND CONCLUSION
As the debate continues about which measures of teaching
effectiveness should be used to evaluate faculty for personnel
decisions, there is an increasing need for continued investigation into the validity of these different metrics. With respect
to student evaluations of teaching, McKeachie (1996) succinctly summarized the problem, “If student ratings are part
of the data used in personnel decisions, one must have convincing evidence that they add valid evidence of teaching
effectiveness” (McKeachie 1996, 3). The same can certainly
be said for faculty peer evaluations.
While there is a large body of empirical literature examining the validity of SETEs, the results of these studies are
open to vast differences in interpretation. To date, however,
the empirical basis for arguing for the validity of peer evaluations or classroom observations of teaching is based primarily
on studies that correlate peer evaluations with SETES. Unfortunately, there are few, if any, studies that examine the
relationship between peer evaluations and an actual, independent measure of student achievement, and then compare
the strength of this relationship with student ratings.
Our research represents an attempt to start filling this gap
in our knowledge. That major institutions of higher learning around the world regularly employ both student ratings and peer evaluations of teaching for faculty personnel decisions, without knowing more about the true validity of these two metrics in assessing teaching effectiveness, is somewhat surprising.
In our study we were able to examine the validity of one
important component of peer evaluations, the classroom observation, from two perspectives. Using a multi-section class
taught by different instructors, we compared the annual classroom observation ratings for faculty members against the results of an independent student learning outcome assessment
measure in courses those faculty members taught. Not only
did we find that the annual classroom observation metric was
significantly and positively correlated with student achievement, but that it was also a much better predictor of student
achievement than student ratings of teaching (SETEs) from
the classes. This is exactly the type of validity testing called
for by Cohen and McKeachie (1980).
We also wanted to examine the validity of both classroom
observation and SETEs in a manner which somewhat paralleled the way that most universities actually use such measures for personnel decisions, that is, across different instructors, courses and departments. Again, using the standardized
learning assessment, our analysis offered two conclusions. First, a faculty member's annual classroom observation
rating was positively related to student achievement, particularly when the process reflected a somewhat “arms-length”
selection of the actual observer. Second, under these conditions of stricter peer-selection control, a faculty member’s
annual classroom observation rating was more significantly
related to student achievement than the course SETEs. In addition, although not directly the focus of our study, we also
found evidence that class size was negatively related to student achievement, with smaller classes outperforming larger
classes on the average.
Our analysis supports the validity of university-level classroom observation by peers, particularly if done under relatively strict peer-selection controls. And it should be noted
that our peer evaluation process followed few of the complex
observation, training, feedback, and reporting protocols suggested by the rapidly expanding normative peer evaluation
literature—our reviewers were simply colleagues asked to
observe another’s class with a simple check-list.
Obviously there are limitations to our research. First, our
sample size was relatively small, particularly in our multisection analysis. While this should suggest caution in interpretation, this type of research will always struggle with sample size issues. Second, it would have been ideal if we could
have obtained the actual post-observation forms so that multiple, independent scorers could have provided the quantitative
ratings. This would have allowed for a test of inter-rater reliability. And finally, our data came from only one university,
albeit across different departments and disciplines.
Given these limitations, however, we feel that our results
are noteworthy, particularly since almost no published research has appeared that directly correlates classroom peer
observation results to an independent measure of student
achievement designed around agreed upon student learning outcomes. Although university accrediting bodies are
encouraging more assurance of learning outcome measurement, few universities at the present time are taking a standardized and quantifiable approach to assessing learning
outcomes across all disciplines that lend themselves to cross-institutional analysis. We hope that as more and more outcome data become standardized, quantified, and available from different institutions, additional empirical analysis will continue to examine these fascinating, and highly
charged debates.
REFERENCES
Abrami, P., L. Levanthal, & R. Perry. 1982. Educational seduction. Review
of Educational Research 52: 446–464.
Ambady, N., & Rosenthal, R. 1993. Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness.
Journal of Personality and Social Psychology 64: 431–441.
Anderson, K., & Smith, G. 2005. Students preconceptions of professors:
Benefits and barriers according to ethnicity and gender. Hispanic Journal
of Behavioral Sciences 27(2): 184–201.
Arreola, R. 2007. Developing a comprehensive faculty evaluation system.
3rd ed. Bolton, MA: Anker Publishing.
Atamian, R., & G. Ganguli. 1993. Teacher popularity and teaching effectiveness: Viewpoint of accounting students. Journal of Education for Business
68(3): 163–169.
Baarveld, F., B. Kollen, & K. Groenier 2007. Expertise in sports medicine
among family physicians: What are the benefits? The Open Sports
Medicine Journal 1: 1–4.
Balam, E., & D. Shannon. 2010. Student ratings of college teaching: A
comparison of faculty and their students. Assessment and Evaluation in
Higher Education 35(2): 209–221.
Barrett, J., K. Hart, J. Schmerier, K. Willmartch, J. Carey, & S. Mohammed.
2009. Criterion validity of the financial skills subscale of the direct assessment of functional status scale. Psychiatry Research 166(2/3): 148–
157.
Bernstein, D. 2008. Peer review and the evaluation of the intellectual work
of teaching. Change, March/April: 48–51.
Bernstein, D., A. Burnett, A. Goodburn, & P. Savory. 2006. Making teaching
and learning visible: Course portfolios and the peer review of teaching.
Bolton, MA: Anker Publishing.
Bernstein, D., & R. Edwards. 2001. We need objective, rigourous peer review
of teaching. Chronicle of Higher Education 47(17): B24.
Blackhart, G., B. Peruche, C. DeWall, & T. Joiner. 2006. Factors influencing
teaching evaluations in higher education. Teaching of Psychology 33:
37–39.
Bolotin, A. 2006. Fuzzy logic approach to robust regression of uncertain
medical categories. World Academy of Science, Engineering and Technology 22: 106–111.
Bowling, N. 2008. Does the relationship between student ratings of course
easiness and course quality vary across schools? The role of school academic rankings. Assessment and Evaluation in Higher Education 33(4):
455–464.
Boysen, G. 2008. Revenge and student evaluations of teaching. Teaching of
Psychology 35(3): 218–222.
Buck, S., & D. Tiene. 1989. The impact of physical attractiveness, gender,
and teaching philosophy on teacher evaluations. Journal of Educational
Research 82: 172–177.
Burns, C. 1998. Peer evaluation of teaching: Claims vs. research. University
of Arkansas, Little Rock, AK. http://eric.ed.gov/ERICWebPortal/search/
detailmini . jsp ? nfpb = trueand andERICExtSearch SearchValue 0 =
ED421470andERICExtSearch SearchType 0=noandaccno=ED421470
Campbell, J. and W. Bozeman. 2008. The value of student ratings: Perceptions of students, teachers, and administrators. Community College
Journal of Research and Practice 32(1): 13–24.
Carrell, S., & J. West. 2010. Does professor quality matter? Evidence from
random assignments of students to professors. Journal of Political Economy 118(3): 409–432.
Cavanagh, R. 1996. Formative and summative evaluation in the faculty
peer review of teaching. Innovative Higher Education 20(4): 235–
240.
Centra, J. 1993. Reflective faculty evaluation. San Francisco: Jossey-Bass.
Centra, J. 1994. The use of teaching portfolios and student evaluations for
summative evaluation. Journal of Higher Education 65: 555–570.
Chism, N. 2007. Peer review of teaching: A sourcebook. 2nd ed. Bolton,
MA: Anker Publishing.
Cohen, J. 1969. Statistical power analysis for the behavioural sciences, San
Diego, CA: Academic Press.
Cohen, J. 1981. Statistical power analysis for the behavioural sciences. 2nd
ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, P. 1981. Student ratings of instruction and student achievement.
Review of Educational Research 51(3): 281–309.
Cohen, P. 1982. Validity of student ratings in psychology courses: A research
synthesis. Teaching of Psychology 9(2): 78–82.
Cohen, P. 1983. Comment on a selective review of the validity of student
ratings of teaching. Journal of Higher Education 54(4): 448–458.
Cohen, P., & W. McKeachie. 1980. The role of colleagues in the evaluation
of college teaching. Improving college and university teaching 28(4):
147–154.
Costello, J., B. Pateman, H. Pusey, & K. Longshaw. 2001. Peer review
of classroom teaching: An interim report. Nurse Education Today 21:
444–454.
Costin, P. 1987. Do student ratings of college teachers predict student
achievement? Teaching of Psychology 5(2): 86–88.
Courneya, C., D. Pratt, & J. Collins. 2007. Through what perspective do
we judge the teaching of peers? Teaching and Teacher Education 24: 69–
79.
Davies, M., J. Hirschberg, J. Lye, & C. Johnston. 2007. Systematic influences
on teaching evaluations: The Case for Caution. Australian Economic
Papers 46(1): 18–38.
DeZure, D. 1999. Evaluating teaching through peer classroom observation.
In Changing practices in evaluating teaching, ed. P. Seldin. Bolton, MA:
Anker Publishing.
Dowel, D., & J. Neal. 1982. A selective review of the validity of student
ratings of teaching. Journal of Higher Education 32(1): 51–62.
Dowell, D., & J. Neal. 1983. The validity and accuracy of student ratings
of instruction: A reply to Peter A. Cohen. Journal of Higher Education
54(4): 459–463.
Emery, C., T. Kramer, & R. Tian. 2003. Return to academic standards: A critique of student evaluations of teaching effectiveness. Quality Assurance
in Education 11(1): 37–46.
Feldman, K. 1989a. The association between student ratings of specific
instructional dimensions and student achievement: Refining and extending the synthesis of data from multisection validity studies. Research in
Higher Education 30(6): 583–645.
Feldman, K. 1989b. Instructional effectiveness of college teachers as judged
by teachers themselves, current and former students, colleagues, administrators, and external (neutral) observers. Research in Higher Education
30(2): 137–194.
Feldman, K 2007. Identifying exemplary teachers and teaching: Evidence
from student ratings. In The scholarship of teaching and learning in
higher education: An evidence-based perspective, eds. R. Perry and J.
Smart, 93–129. Dordrecht, The Netherlands: Springer.
Felton, J., J. Mitchell, & J. Stinson. 2004. Web-based student evaluations
of professors: The relations between perceived quality, easiness and sexiness. Assessment and Evaluation in Higher Education, 29(1): 91–108.
Fetterley, J. 2005. Teaching and “my work”. American Literary History
17(4): 741–752.
Galbraith, C., G. Merrill, & D. Kline. 2011. Are student evaluations of
teaching effectiveness valid for measuring student learning outcomes
in business related classes? A neural network and Bayesian analysis. Research in Higher Education: 1–22. http://www.springerlink.com/
content/2058756205016652.
Glazerman, S., S. Loeb, D. Goldhaber, S. Raudenbush, D. Staiger, & G. Whitehurst. 2010. Evaluating teachers: The important role of value-added. Palo Alto, CA: Center for Educational Policy Analysis, Stanford University.
Hon, J., K. Lagden, A. McLaren, D. O’Sullivan, L. Orr, P. Houghton, & M.
Woodbury. 2010. A prospective multicenter study to validate use of the
PUSH© in patients with diabetic, venous, and pressure ulcers. Ostomy
Wound Management 56(2): 26–36.
Hutchings, P. 1996. The peer review of teaching: Progress, issues and
prospects. Innovative Higher Education 20(4): 221–234.
Hutchings, P., ed. 1998. The course portfolio. Sterling, VA: Stylus.
Johnson, I. 2010. Class size and student performance at a public research
university: A cross-classified model. Research in Higher Education.
http://www.springerlink.com/content/0l35t1821172j857/fulltext.pdf
Kremer, J. 1990. Construct validity of multiple measures in teaching, research, and service and reliability of peer ratings. Journal of Educational
Psychology 82: 213–218.
Kohut, G., C. Burnap, & M. Yon. 2007. Peer observation of teaching: Perceptions of the observer and the observed. College Teaching 55(1): 19–25.
Langbein, L. 2008. Management by results: Student evaluation of faculty
teaching and the mis-measurement of performance. Economics of Education Review 27(4): 417–428.
Lopus, J., & N. Maxwell. 1995. Should we teach microeconomic principles
before macroeconomic principles? Economic Inquiry 33(2): 336–350.
Malik, D. 1996. Peer review of teaching: External review of course content.
Innovative Higher Education. 20(4): 277–286.
Mazumdar, M., & R. Glassman. 2000. Categorizing a prognostic variable: Review of methods, code for easy implementation and applications to decision-making about cancer treatments. Statistics Medicine 19:
113–132
McCallum, L. 1984. A meta-analysis of course evaluation data and its use
in the tenure decision. Research in Higher Education 21: 150–158.
McKeachie, W. 1979. Student ratings of faculty: A reprise. Academe 65(6):
384–397.
McKeachie, W. 1996. Student ratings of teaching. Occasional Paper No.
33. American Council of Learned Societies, University of Michigan.
http://archives.acls.org/op/33 Professonal Evaluation of Teaching.htm
McNatt, B. 2010. Negative reputation and biases student evaluations of
teaching: Longitudinal results from a naturally occurring experiment.
Academy of Management Learning and Education 9(2): 225–242.
Muennig, P., N. Sohler, & B. Mahato. 2007. Socioeconomic status as an independent predictor of physiological biomarkers of cardiovascular disease: Evidence from NHANES. Preventive Medicine.
http://www.sciencedirect.com.
Muthén, B., & G. Speckart. 1983. Categorizing skewed, limited dependent
variables. Evaluation Review 7(2): 257–269.
Naftulin, D., J. Ware, & F. Donnelly. 1973. The Doctor Fox lecture: A
paradigm of educational seduction. Journal of Medical Education 48:
630–635.
Pascarella, E., & P. Terenzini. 2005. How college affects students: A third
decade of research. San Francisco: Jossey-Bass
Peel, D. 2005. Peer observation as a transformatory tool? Teaching in Higher
Education 10(4): 489–504.
Peterson, K. 2000. Teacher evaluation: A comprehensive guide to new directions and practices. 2nd ed.. Thousand Oaks, CA: Corwin Press.
Pounder, J. 2007. Is student evaluation of teaching worthwhile? An analytical framework for answering the question. Quality Assurance in Education 15(2): 178–191.
Riniolo, T., K. Johnson, T. Sherman, & J. Misso. 2006. Hot or not: Do professors perceived as physically attractive receive higher student evaluations?
The Journal of General Psychology 133(1): 19–35.
Shortland, S 2004. Peer observation: A tool for staff development or compliance? Journal of Further and Higher Education 28: 219–227.
Smith, B. 2007. Student ratings of teaching effectiveness: An analysis of end-of-course faculty evaluations. College Student Journal 471(4): 788–800.
Steward, R., & R. Phelps. 2000. Faculty of color and university students:
Rethinking the evaluation of faculty teaching. Journal of the Research
Association of Minority Professors 4(2): 49–56.
Sutkin, G., E. Wagner, I. Harris, & R. Schiffer. 2008. What makes a good clinical teacher in medicine? A review of the literature. Academic Medicine
83(5): 452–466.
Taylor, J. 2007. The teaching/research nexus: A model for institutional
management. Higher Education 54(6): 867–884.
Varni, J., M. Seid, & P. Kurtin. 2001. PedsQLTM4.0: Reliability and validity
of the Pediatric Quality of Life InventoryTMVersion 4.0 Generic Core
Scales in healthy and patient populations. Medical Care 39(8): 800–
812.
Whitfield, K., R. Buchbinder, L. Segal, & R. Osborne. 2006. Parsimonious and efficient assessment of health-related quality of life in
osteoarthritis research, validation of the Assessment of Quality of
Life (AQoL) instrument. Health and Quality of Life Outcomes 4(19).
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538577/#B19
Yon, M., C. Burnap, & G. Kohut. 2002. Evidence of effective teaching:
Perceptions of peer reviewers. College Teaching 50(3): 104–110.
Youmans, R., & B. Jee. 2007. Fudging the numbers: Distributing chocolate
influences student evaluations of an undergraduate course. Teaching of
Psychology 34(4): 245–247.
Zietz, J., & H. Cochran. 1997. Containing cost without sacrificing achievement: Some evidence from college-level economics classes. Journal of
Education Finance 23: 177–192.
Article
Reliability and Construct Validity of the edTPA for Music Education
Phillip M. Hash1
Journal of Music Teacher Education 1–15
© National Association for Music Education 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/10570837211007859
jmte.sagepub.com
Abstract
The purpose of this study was to examine the psychometric quality of Educative
Teacher Performance Assessment (edTPA) scores for 136 preservice music teachers
at a Midwest university. I addressed the factor structure of the edTPA for music
education, the extent to which the edTPA fits the one- and three-factor a priori
models proposed by the test authors, and the reliability of edTPA scores awarded
to music education students. Factor analysis did not support the a priori one-factor
model around teacher readiness, or the three-factor model based on the edTPA
tasks of Planning, Instruction, and Assessment. Internal consistency was acceptable
for all rubrics together and for the Instruction task. However, estimates of interrater
reliability fell substantially below those reported by test administrators. These findings
indicate the need for revision of the edTPA for music education and call into question
its continued use among music teacher candidates in its current form.
Keywords
assessment, edTPA, music teacher preparation, student teaching, teacher readiness
The Educative Teacher Performance Assessment (edTPA) is a portfolio-based subject-specific project completed by preservice candidates during their clinical semester.
Educator preparation programs in 41 states and the District of Columbia currently
administer the edTPA and at least 19 states and the District of Columbia use the assessment for initial licensure. The Stanford Center for Assessment, Learning, and Equity
(SCALE) is the sole developer of the edTPA and Stanford University is the exclusive
owner. The university has licensed the Evaluation Systems group of Pearson to provide
Illinois State University, Normal, USA
Corresponding Author: Phillip M. Hash, School of Music, Illinois State University, Campus Box 5660, Normal, IL 61790-5660, USA. Email: pmhash@ilstu.edu
operational support for national administration of the assessment (Powell & Parkes,
2020; SCALE, 2019a).
Candidates completing the edTPA engage in three tasks: Planning, Instruction, and
Assessment. The complete portfolio consists of several artifacts including lesson
plans, instructional materials, assessments, written commentaries, teaching videos,
and student work samples, as dictated by separate handbooks for 28 content areas.
Music candidates follow the K–12 Performing Arts Assessment Handbook (SCALE,
2018a), which also includes theater and dance. SCALE (2013) states that the theoretical framework for the edTPA evolved from a three-step process that included the
following:
1. Subject-specific expert design teams who provided content validity evidence of the specific job-related competencies assessed within each subject area.
2. A job analysis study to confirm the degree to which the job requirements of a teacher align to the edTPA.
3. A content validation committee to rate the importance, alignment, and representativeness of the knowledge and skills required for each edTPA rubric in relation to national pedagogical and content-specific standards.
Among other requirements, candidates must attend to academic language demands in
all three tasks, which include teaching subject-specific vocabulary, engaging in a language function (e.g., analyze, describe, identify, create), and demonstrating the use of
syntax and/or discourse (e.g., speaking or writing) within the discipline (SCALE,
2018a).
According to SCALE (2019a), handbooks in all content areas share approximately
80% of their design. The other 20% contains key subject-specific components of
teaching and learning drawn from the content standards authored by national organizations. However, it is unclear how the edTPA relates to standards of the National
Association of Schools of Music (2020), the National Association for Music Education
(2014), or any other arts organization.
Candidates submit their portfolios to Pearson, who employs independent evaluators
to score the materials. Scoring for most portfolios involves 15 rubrics, five per task,
graded on a scale of one (novice not ready to teach) to five (highly accomplished
beginner). This process results in a possible maximum total score of 75. Evaluators
review specified artifacts and written commentary separately for each rubric rather
than considering all parts of the assessment together. Candidates not achieving the
minimum benchmark set by their state or institution can revise and resubmit one, two,
or all three tasks (Parkes & Powell, 2015; SCALE, 2018b).
The pool of edTPA scorers includes P–12 teachers and college faculty with pedagogical content knowledge and experience preparing novice teachers. They possess
discipline-specific expertise and score only those portfolios for which they are qualified. Although the performing arts include music, theater, and dance, only scorers with
knowledge and experience in music evaluate candidates in this discipline (Pearson,
personal communication, April 21, 2020).
Evaluators complete an extensive training program and must demonstrate their
ability to determine scores consistently and accurately (SCALE, 2019a). SCALE randomly selects 10% of portfolios for double scoring to maintain reliability. In addition,
portfolios scored within a defined range above and below the state-specific (currently
35–41) or SCALE-recommended (currently 42) cut score undergo a second and sometimes third review. In these cases, a scoring supervisor resolves instances where Scorer
1 and Scorer 2 (a) are more than 1 point apart on any rubric or (b) determine total
scores on opposite sides of the cut score. The supervisor also resolves cases where
both scorers fall above or below the cut score but have five or more adjacent rubric
scores (SCALE, 2019b).
Proponents claim that the edTPA provides an authentic means of assessing teacher
readiness by measuring candidates’ ability to create lesson plans, implement instruction, and assess student learning in an actual classroom environment. Supporters also
emphasize the assessment’s uniformity across disciplines and seemingly impartial
evaluation, as well as the potential for the edTPA to shape teacher education programs
and curricula. Some college faculty believe that the edTPA has fostered their professional growth, while cooperating teachers in K–12 school districts report that the
assessment provides guidance for them in mentoring candidates during the student
teaching semester (Darling-Hammond & Hyer, 2013; Pecheone & Whittaker, 2016;
Sato, 2014).
Critics cite concerns with ecological validity of the edTPA and state that candidates
might make instructional decisions to meet the requirements of the rubrics rather than
long-term student needs (Parkes & Powell, 2015). In addition, the two required video
excerpts (maximum 10 minutes each) could alter the teaching environment, create
privacy concerns, foster anxiety among candidates, and fail to capture nuanced student
interactions and other aspects of teaching stipulated in the rubrics (Bernard & McBride,
2020; Choppin & Meuwissen, 2017).
Behizadeh and Neely (2018) questioned the consequential validity of the edTPA in
relation to positive and negative social outcomes, especially in an urban teacher preparation program focused on social justice. Participants (N = 16) in this study, who were
mostly candidates of color and first-generation college students, stated that the edTPA
increased their mental and financial stress and lacked a social justice orientation in the
scoring procedures. They also felt pressure to select the highest achieving classes for
their lesson segment and to teach content that fulfilled scoring criteria, regardless of
students’ needs. Authors have also criticized the corporate control of the scoring process, the high cost for teacher candidates ($300 for initial submission), and the effect
of the edTPA on preparation program autonomy (e.g., Dover et al., 2015; Heil & Berg,
2017; Parkes, 2020).
The content and evaluation standards of the edTPA can present problems specific to
the music classroom. For example, the timeline requiring candidates to teach their
entire unit in three to five consecutive lessons might not allow K–12 students to engage
in creative artistic processes authentically (Heil & Berg, 2017). The assessment can
also force candidates in secondary ensemble programs to teach edTPA lessons unrelated to the goals of the classroom and within a tight rehearsal schedule dictated by
public performances (Powell & Parkes, 2020).
SCALE (e.g., 2015, 2018c, 2019a) annually reports the reliability and validity of
the edTPA using data from statistical tests conducted on aggregated scores of all content areas with at least 10 portfolio submissions from the previous calendar year. In
2018, internal consistency as measured by Cronbach's α equaled .89 for the performing arts, and for all subjects combined. Interrater reliability estimates using the kappa_n (k_n) statistic averaged .91 among the 15 evaluation rubrics. Factor analysis
supported both the one-factor and three-factor models, with all loadings exceeding
.50. According to SCALE (2015), these results “confirm [ ] that the tasks are measuring a common unifying teaching construct and that there are three common latent
constructs . . . which [comprise] each of the three tasks” (p. 22). Factor correlations
in the three-factor model ranged from .71 to .78, which SCALE (2018c) claims “supports the edTPA structure consisting of three correlated abilities: Planning, Instruction,
and Assessment” (p. 25).
Gitomer et al. (2021) questioned the psychometric validity and reliability of the
edTPA due to (a) the use of aggregated data across content areas, (b) the supposed
existence of both a one- and a three-factor model, (c) measures of internal consistency
involving scores of all evaluators combined, and (d) the utilization of exact + adjacent
agreements rather than only exact agreements to calculate interrater reliability through
the k_n statistic. The authors illustrated the difference in interrater agreement indices
attained using exact agreements only versus exact + adjacent agreements as used by
SCALE. The simulation involved rubric scores from 184 students from one of the
author’s institutions and interrater agreement coefficients for all handbooks combined
from the 2017 edTPA Administrative Report (SCALE, 2018c). Results indicated that
interrater reliability for individual rubrics ranged from kappa indices of .06 to .32 (M
= .23) using only exact agreements, compared with .85 to .97 (M = .91) as reported
by SCALE. The authors acknowledged the need for analysis of individual content
areas and called for SCALE to make these data publicly available.
Musselwhite and Wesolowski (2019) used the Rasch Measurement Model to analyze edTPA scores of music students (N = 100) from three universities in the United
States. They examined (a) the validity and reliability of the 15 rubrics, (b) the extent
to which the rubric criteria fit the measurement model and vary in difficulty, and (c)
if category response structures for each criterion empirically cooperate to provide
meaningful measures. Reliability of separation, similar in interpretation to
Cronbach’s alpha, fell within the upper range of acceptability for students (Rel. =
.89) and rubric criteria (Rel. = .95), meaning edTPA scores could be used to separate
high- and low-achieving students and the most and least difficult rubric criteria.
Rubrics within each of the three tasks demonstrated adequate data-model fit.
However, based on underuse of the lowest (1) and highest (5) ratings, the authors
suggested that response categories were not capturing the full range of candidate
performance or the results may not reflect the expected and intended meaning of the
rubrics (e.g., “novice not ready to teach,” “highly accomplished beginner”). In addition, violations of monotonicity (i.e., the assumption that variables move consistently in the same or opposite directions) raised concerns with the overall rating
scale structure.
Austin and Berg (2020) analyzed the reliability, validity, and utility of edTPA scores
for music teacher candidates (N = 60) over a 3-year period from 2013 to 2015. Scores
for all three tasks (α = .76-.81) and the 15 rubrics combined (α = .84) demonstrated
adequate internal consistency. Factor analysis supported the construct validity of the
assessment and produced a clear structure that corresponded to the three edTPA tasks.
Criterion-related validity evidence was mixed, however, with most correlations
between edTPA scores and the 16 variables examined being of modest magnitude
(<.25).
Purpose and Need for the Study
Annual edTPA Administrative Reports (e.g., SCALE, 2015, 2018c, 2019a) provide
factor analysis and interrater agreement data for all content areas combined, as well as
Cronbach’s alpha for each handbook with at least 10 submissions. The reports provide
no data related to internal consistency (α) of each task or to factor analysis for specific
disciplines. The 2018 Administrative Report states that “factor analyses models of
latent structure are reviewed for each field [handbook] with appropriate sample size”
(SCALE, 2019a, p. 15). However, only state-level technical advisory committees have
access to these data (Pearson, personal communication, March 18, 2020).
Detailed reliability and validity data for individual subject areas assessed by the
edTPA are necessary for policymakers and teacher educators to evaluate the efficacy
of this instrument. However, SCALE does not make these data available to the public
(Gitomer et al., 2021). Therefore, the purpose of this study was to examine the psychometric quality of edTPA scores for portfolios completed by 136 preservice music
teachers. Research questions were as follows:
Research Question 1: What factor structure emerges from edTPA scores for music
education?
Research Question 2: To what extent do edTPA scores for music education fit the
one- and three-factor a priori models proposed by SCALE (2013)?
Research Question 3: What is the internal consistency and interrater reliability of
the edTPA for music education students?
This research will help estimate the reliability and construct validity of the edTPA
specifically among preservice music educators and provide discipline-specific data to
compare against that available publicly (e.g., SCALE, 2019a).
Method
Data
Data for this study consisted of all edTPA rubric scores (N = 2,040) attained between
fall 2015 and spring 2020 for preservice music educators (N = 136) at one large university in the Midwestern United States. The sample involved 61 males and 75 females.
All participants were pursuing a Bachelor of Music Education degree and following
either an instrumental (n = 93, 68.4%) or a vocal (n = 43, 31.6%) track. With one
exception, all students passed the edTPA on their initial attempt. The institution piloted
the edTPA beginning in 2013, two years before the state implemented the assessment
as a requirement for teacher licensure (Adkins et al., 2015).
A comparison of music scores from this study with national data for the K–12
Performing Arts Handbook indicated higher than average final (Music: M = 51.62;
Performing Arts: M = 46.36) and rubric scores (Music: M = 3.44; Performing Arts:
M = 3.18). Individual rubric means all exceeded national averages and the 3.0 benchmark associated with candidates being “competent and ready to teach.” In addition,
frequency counts and skewness indices indicated a normal distribution (see Table S1
in the online supplement). Consistent with national data for the K–12 Performing Arts
(SCALE, 2018c, 2019a, 2019c), 6% of rubric scores in this study consisted of a 1 or a
5 with about 95% of scores falling within the 2 to 4 range.
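The descriptive checks just described (rubric means against the 3.0 benchmark, frequency counts, and skewness indices) can be reproduced with standard tooling. The sketch below is illustrative only and is not the study's code; the file name edtpa_scores.csv and the column labels R1-R15 are assumptions for the example.

# Minimal sketch: descriptive statistics for edTPA rubric scores.
# Assumes a hypothetical CSV with one column per rubric (R1..R15),
# one row per candidate, and ratings on the 1-5 scale.
import pandas as pd

scores = pd.read_csv("edtpa_scores.csv")

rubric_means = scores.mean()          # compare against the 3.0 "ready to teach" benchmark
rubric_skew = scores.skew()           # values near zero suggest an approximately normal distribution
total_scores = scores.sum(axis=1)     # possible range 15-75

print(rubric_means.round(2))
print(rubric_skew.round(2))
print(total_scores.describe().round(2))
# Proportion of all rubric scores falling in each rating category (1-5):
print(scores.stack().value_counts(normalize=True).sort_index().round(3))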
Construct Validity
Preliminary examination of construct validity involved a series of factor analyses
using various methods and rotations to determine the best model fit based on criteria
for simple structure (Asmus, 1989; J. D. Brown, 2009):
1. Each variable produces at least one zero loading (−.10 to +.10) on some factor.
2. Each factor has at least as many zero loadings as there are factors.
3. Each pair of factors contains variables with significant loadings (≥.30) on one and zero loadings on the other.
4. Each pair of factors contains only a few complex variables (loading ≥.30 on more than one factor).
Final analysis for this study involved principal axis factoring using Kaiser normalization and promax rotation with kappa set at the default value of 4. The first analysis
used an eigenvalue of one criterion to determine if a factor structure other than that
determined by SCALE (2013) might emerge from the edTPA for music education.
Subsequent analysis tested the existence of the a priori models, which include a single-factor solution around teacher readiness and a three-factor model aligned with edTPA
tasks: Planning, Instruction, and Assessment.
I considered the effectiveness of the models based on communalities (proportion of
each variable’s total variance explained by all factors) and the extent to which items
achieved a high loading on their intended factor. Generally, researchers consider loadings of .30 to .40 meaningfully large (Miksza & Elpus, 2018). The pattern matrix
(unique contribution of each factor to a variable’s variance) served as the primary
determinant used to identify which items clustered into factors. I also examined the
structure matrix (correlation of each variable and factor) to verify the interpretation.
Bartlett’s test of sphericity indicated if there were adequate correlations for data
reduction, and the Kaiser-Meyer-Olkin measure determined sampling adequacy
(Asmus, 1989; J. D. Brown, 2009). Maximum interfactor correlations of .80 served as
the standard for adequate discriminant validity (T. A. Brown, 2015).
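As a concrete illustration of this workflow, the sketch below shows how Bartlett's test, the Kaiser-Meyer-Olkin measure, and principal axis factoring with a promax rotation could be run in Python with the factor_analyzer package, followed by a rough screening against the simple-structure criteria listed earlier. It is not the author's code; the input file, column names, and screening thresholds are assumptions for demonstration.

# Minimal EFA sketch with the factor_analyzer package (not the study's code).
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                              calculate_kmo)

scores = pd.read_csv("edtpa_scores.csv")      # hypothetical rubric scores, columns R1..R15

# Adequacy checks reported in the text: Bartlett's test of sphericity and KMO.
chi2, p = calculate_bartlett_sphericity(scores)
kmo_per_item, kmo_overall = calculate_kmo(scores)
print(f"Bartlett chi2 = {chi2:.1f}, p = {p:.4f}; KMO = {kmo_overall:.2f}")

# Principal axis factoring with an oblique (promax) rotation; factor_analyzer's
# promax rotation uses a power of 4 by default, matching the kappa = 4 setting above.
efa = FactorAnalyzer(n_factors=3, method="principal", rotation="promax")
efa.fit(scores)

loadings = pd.DataFrame(efa.loadings_, index=scores.columns)   # pattern matrix
print(loadings.round(2))
print("Communalities:", efa.get_communalities().round(2))
print("Interfactor correlations:")
print(pd.DataFrame(efa.phi_).round(2))        # check against the .80 discriminant-validity standard

# Rough simple-structure screening: complex variables (loading >= .30 on more
# than one factor) and near-zero loadings (|loading| <= .10) per factor.
complex_vars = loadings[(loadings.abs() >= .30).sum(axis=1) > 1].index.tolist()
zero_counts = (loadings.abs() <= .10).sum()
print("Complex variables:", complex_vars)
print("Near-zero loadings per factor:")
print(zero_counts)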
SCALE analyzes internal structure of the edTPA for all content areas combined
through a confirmatory factor analysis using maximum likelihood estimation, which
assumes a normal distribution and is most appropriate for large sample sizes (Costello
& Osborne, 2005; Miksza & Elpus, 2018). Principal axis factoring used in this study
better fit the data and proved more effective in achieving a simple solution (e.g., J. D.
Brown, 2009).
Reliability
Cronbach’s alpha provided a measure of internal consistency for the complete edTPA,
individual tasks determined by SCALE (2019a), and factors identified in this study. A
coefficient of α ≥ .80 served as the minimum acceptable benchmark as per general
practice in the social sciences (e.g., Carmines & Zeller, 1979; Krippendorff, 2013).
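For readers unfamiliar with the statistic, the following minimal function shows the standard Cronbach's alpha computation applied to a block of rubric scores; the data file and task grouping are hypothetical and serve only to illustrate the calculation reported in this section.

# Cronbach's alpha for a set of items: alpha = k/(k-1) * (1 - sum of item
# variances / variance of the total score). Illustrative sketch only.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = pd.read_csv("edtpa_scores.csv")                       # hypothetical rubric data
print("All 15 rubrics:", round(cronbach_alpha(scores), 2))
print("Task 2 (Instruction, R6-R10):",
      round(cronbach_alpha(scores[["R6", "R7", "R8", "R9", "R10"]]), 2))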
SCALE (2019a) analyzed interrater reliability for each rubric using the kappa_n (k_n) statistic:

k_n = \frac{A_O - 1/n}{1 - 1/n}
where A_O represents observed agreement and n equals the number of possible adjudication categories/classifications.[1] Due to the lack of agreement indices for data in this study, I replicated the procedure of Gitomer et al. (2021) and calculated Cohen's kappa formula instead:
k = \frac{A_O - A_C}{1 - A_C}
This estimate of interrater reliability used the proportions of exact agreement (A_O) reported in the 2018 edTPA Administrative Report for all content areas combined and chance agreement (A_C) coefficients from the music scores analyzed here. Chance agreement indices equaled the sum of the cross-multiplied proportions of rubric scores in each category (1–5) from portfolios that did not contain fractional numbers (e.g., 2.5) due to double scoring (n = 128).[2] Thus, kappa is higher to the extent that observed agreement exceeds the expected level of chance agreement (Brennan & Prediger, 1981). Due to the unavailability of data from two independent evaluators, calculations of chance agreement involved multiplying duplicate proportions of one scorer (Gitomer et al., 2021).
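To make the two formulas concrete, the toy example below computes both statistics from invented proportions (not SCALE's or this study's figures). Chance agreement is formed by cross-multiplying one scorer's category proportions with themselves, as described above, and the two observed-agreement values contrast exact with exact + adjacent agreement.

# Hypothetical numeric example of kappa_n and Cohen's kappa; all inputs invented.
import numpy as np

def kappa_n(a_o: float, n: int) -> float:
    # a_o = observed agreement, n = number of possible classifications
    return (a_o - 1 / n) / (1 - 1 / n)

def cohen_kappa(a_o: float, a_c: float) -> float:
    # a_o = observed agreement, a_c = chance agreement
    return (a_o - a_c) / (1 - a_c)

# Suppose one scorer's ratings are distributed across categories 1-5 as follows
# (most scores between 2 and 4, as in the data described above):
p = np.array([0.01, 0.20, 0.50, 0.27, 0.02])
a_c = float(np.sum(p * p))          # chance agreement from duplicated proportions of one scorer

a_o_exact = 0.55                    # hypothetical proportion of exact agreements
a_o_exact_plus_adjacent = 0.97      # hypothetical proportion of exact + adjacent agreements

print("Cohen's kappa, exact agreements only:", round(cohen_kappa(a_o_exact, a_c), 2))
print("kappa_n (n = 2), exact + adjacent:   ", round(kappa_n(a_o_exact_plus_adjacent, 2), 2))

With these invented inputs, the exact-only kappa comes out near .29 while the exact + adjacent kappa_n comes out near .94, which is the kind of gap this comparison is designed to expose.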
Like Gitomer et al. (2021), I only considered exact agreements when estimating
kappa to provide a more precise estimate of interrater reliability. Kappa coefficients
reported by SCALE (2019a) are likely inflated because calculations involved exact +
adjacent agreements on a 5-point scale, where about 95% of scores fell between 2 and
4 (Stemler & Tsai, 2008). Kappa can range from −1 to +1 with interpretations of poor
(below 0.00), slight (0.00-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial
(0.61-0.80), and almost perfect (0.81-1.00; Landis & Koch, 1977).
The purpose of estimating k was to demonstrate the difference in readings based on
exact + adjacent agreements versus those attained with exact agreements only. The
use of exact + adjacent agreements is problematic on a 5-point scale, especially when
scorers rarely use the highest and lowest categories. This approach is less problematic
when estimating reliability for longer scales because of the underlying possible score
range and the precision required to attain perfect agreement (Stemler & Tsai, 2008).
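A small simulation can illustrate why exact + adjacent agreement is so forgiving on a 5-point scale when most ratings fall between 2 and 4: even two statistically independent raters drawn from such a distribution agree exactly or adjacently most of the time. The rating distribution below is invented for demonstration.

# Simulation of two independent raters on a 5-point scale concentrated on 2-4.
import numpy as np

rng = np.random.default_rng(42)
categories = np.array([1, 2, 3, 4, 5])
probs = np.array([0.01, 0.20, 0.50, 0.27, 0.02])     # ~95% of ratings between 2 and 4

n = 100_000
rater1 = rng.choice(categories, size=n, p=probs)
rater2 = rng.choice(categories, size=n, p=probs)     # drawn independently of rater 1

exact = np.mean(rater1 == rater2)
exact_plus_adjacent = np.mean(np.abs(rater1 - rater2) <= 1)
print(f"Exact agreement:            {exact:.2f}")                 # roughly .36 with these proportions
print(f"Exact + adjacent agreement: {exact_plus_adjacent:.2f}")   # roughly .85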
Results
Factor Structure
I conducted an exploratory factor analysis (EFA) using principal axis factoring with
promax rotation and an eigenvalue of one criterion for extraction. Based on Bartlett’s
test output (χ2 = 726.9, p < .001) and Kaiser-Meyer-Olkin measure (.89), I concluded
the underlying data were adequately correlated and the sample size was appropriate
for conducting an EFA. Eigenvalues equaled 5.88 (Factor 1), 1.34 (Factor 2), and 1.06
(Factor 3), with Factor 1 accounting for 35.7% of the variance in edTPA ratings, followed by Factors 2 (5.8%) and 3 (3.4%) for a cumulative explained variance of 44.9%.
The resulting three-factor model met all criteria for simple structure with the exception
of three rubrics failing to achieve a 0 loading on any factor, a criterion which may be
difficult to meet with smaller sample sizes and fewer extracted factors.
The three-factor model resulting from edTPA scores for music education (see Table
S2 in the online supplement) did not support the a priori structure proposed by SCALE
(2019a) around the three tasks. R1-R5, R11, R12, and R15 from Tasks 1 and 3 clustered into Factor 1. Factor 2 consisted of R6-R9 from Task 2, and Factor 3 consisted of
R10 from Task 2, and R13 and R14 from Task 3. The eight rubrics comprising Factor
1 suggest an interpretation of “Planning and Assessment.” Factor 2, consisting of
R6-R9, resembled Task 2 (Instruction). Factor 3, containing R10, R13, and R14, defied
a clear interpretation.
With interfactor correlations ranging from .51 (Factors 2 and 3) to .67 (Factors 1
and 3), I concluded that the three-factor model provided adequate discriminant validity (T. A. Brown, 2015), despite lack of support for construct validity. An additional
analysis examined the one-factor solution around teacher readiness, which resulted in
factor loadings of .46 to .74 (M = .59; SD = .09) and explained just 35.1% of the
variance (see Table S2 in the online supplement).
Reliability
(In subsequent discussions, tasks refer to the a priori groupings of rubrics around
Planning, Instruction, and Assessment [e.g., SCALE, 2019a] and factors denote groupings that emerged from the analysis described here.) I report two forms of reliability
estimation for preservice music teachers’ edTPA scores—internal consistency (how
consistent ratings are within a priori edTPA tasks or factors extracted through EFA)
and interrater reliability (how consistent edTPA rubric scores are across evaluators).
Estimates of internal consistency (α) for the three a priori tasks ranged from .73 for
Task 1 (Planning) and .74 for Task 3 (Assessment) to .81 for Task 2 (Instruction).
Alpha coefficients for factors produced by the EFA ranged from .66 for Factor 3
(rubrics 10, 13, 14) to .81 for Factor 1 (rubrics 1-5, 11, 12, 15) and .82 for Factor 2
(rubrics 6-9). Regardless of whether tasks or factors served to frame the grouping of
rubrics, scores for rubrics thought to represent Instruction yielded the highest level of
internal consistency. When all 15 rubrics were considered together as a single measure
of teacher readiness, the resulting alpha was .88.
Estimated interrater reliability using only exact agreements for Cohen’s kappa
(Gitomer et al., 2021) ranged from .07 to .51 for individual rubrics and averaged .25
(SD = .12) overall. These findings are similar to estimated k for the Performing Arts
(Range = −.01-.32; M = .24; SD = .09) calculated from rubric scores reported in the
spring 2019 edTPA National Performance Summary (SCALE, 2019c) and exact agreement indices for all content areas combined from the 2018 Administrative Report.
Estimated k from both analyses differed greatly from kappa_n statistics reported by
SCALE (Range = .85-.98, M = .91, SD = .04) for all handbooks together (SCALE,
2019a; see Table S3 in the online supplement).
Discussion
In this study, I examined the reliability and construct validity of edTPA scores for
preservice music teachers. Readers should interpret results with caution due to limitations of the study. In addition to a relatively small nonrandom sample, all data came
from one institution and may not reflect broader trends. It is also important to note
differences between statistical procedures used in this study and those involved in
analyses published in the Administrative Reports (SCALE, 2013, 2015, 2018c, 2019a)
when making comparisons.
Construct Validity
The factor structure that I obtained through EFA raises important questions about the
construct validity of the edTPA for music education. According to SCALE (2019a), all
28 content areas share approximately 80% of their design around Planning, Instruction,
and Assessment. However, this design results in standardization that might fail to capture the uniqueness of teaching and learning in some disciplines (e.g., Powell &
Parkes, 2020). The percent of variance explained by the one- and three-factor solutions in this study indicates that the 15 rubrics do not represent the totality of what
occurs in the music classroom. Individual factor loadings also suggest that some
rubrics might measure elements of instruction connected less to music than other content areas. Variables related to academic language demands, for example, loaded .39 to
.55 on either model.
It is unclear why three of the Assessment (Task 3) rubrics (R11, R12, & R15) loaded
with the Planning (Task 1) rubrics (R1-R5) onto Factor 1. The titles of these tasks,
“Planning for Instruction and Assessment” and “Assessing Student Learning,” imply
a relationship. Maybe these tasks are more closely related in music than in other subjects. However, the factor structure that emerged in Austin and Berg (2020) clearly
aligned with the Planning, Instruction, and Assessment tasks, and does not support this
assertion. Perhaps the teacher preparation program involved in this study taught
assessment in such a way that caused students to view planning and assessment as
being so closely associated that the scores they received for the a priori assessment
rubrics did not coalesce in a meaningful way and, instead, loaded onto two different
factors.
The failure of the theoretical model (e.g., SCALE, 2019a) to emerge in this study is
problematic when scorers evaluate individual tasks. Rubrics in Task 1 (R1-R5) are not
inclusive of those that represented a single construct (i.e., Factor 1: R1-R5, R11, R12,
R15). Likewise, R10 from Task 2 did not load with other instructional rubrics (R6-R9), and rubrics associated with Task 3 (R11-R15) loaded onto two different factors.
SCALE could mitigate this concern by allowing graders to consult all materials as
evidence for any task. However, scoring rules treat the three tasks as separate entities
by prohibiting evaluators from considering evidence from one task when scoring
another. For example, a scorer cannot use lesson plans from Task 1 as evidence for
achievement on Task 3 (Parkes & Powell, 2015).
Reliability
Measures of internal consistency (α) in this study for all tasks and total scores exceeded
.70 and were similar to those attained by Austin and Berg (2020). However, only Task
2 (Instruction, R6-R10) met the .80 benchmark for acceptable reliability while Tasks 1
(Planning, R1-R5) and 3 (Assessment, R11-R15) did not. Lower alpha coefficients for
individual tasks could be a function of the number of items in each (Carmines & Zeller,
1979). Alpha readings in this study, like those by SCALE (2019a), might also be inaccurate due to combining (a) different observations by multiple evaluators and (b) nonindependent rubric scores assigned by single raters. This procedure ignores the effects
of individual scores on internal consistency of the edTPA evaluation form, which might
result in inflated alpha coefficients (Gitomer et al., 2021; Miksza & Elpus, 2018).
Interrater reliability estimates (k) in this study might be imprecise because of the
statistical procedures used in the absence of two sets of evaluator ratings for preservice
music teachers in this study. However, the wide disparity between these and k_n indices listed for all content areas combined (SCALE, 2019a) was likely due to SCALE's use of exact + adjacent agreements in the calculations rather than differences in k and k_n
formulas (Gitomer et al., 2021). Although coefficients based on adjacent + exact
agreements appear in the literature, their use depends on raters assigning scores across
all possible categories for discrete 5-point rubrics. Underuse of the highest and lowest
scoring options results in a scale where nearly all points will be adjacent and in agreement indices usually above 90% (Stemler & Tsai, 2008). About 95% of edTPA music
scores in this study and for content areas nationally (SCALE, 2019c) fell within a scale
of 2 to 4. Consequently, agreement indices for individual rubrics listed in the 2018
Administrative Report (2019a) ranged from .94 to .99.
Summary and Recommendations
Factor analysis indicated that while the three-factor model for the edTPA accounted
for almost one-half of the variance in music teacher readiness, the single-factor model
accounted for just over one-third. Although scores for all rubrics sufficiently loaded on
the single-factor model, the three-factor model lacked clarity and interpretability in
relation to a priori tasks proposed by the test authors (e.g., SCALE, 2019a). In addition, measures of internal consistency for two of the three tasks did not meet the .80
benchmark for acceptability (Carmines & Zeller, 1979), and estimated interrater
agreement ranged from only slight to moderate (Landis & Koch, 1977). These findings support the need for analysis by content area (e.g., Gitomer et al., 2021) and challenge the aggregated data published by SCALE.
Policymakers, teacher educators, and other stakeholders should consider findings
from this study when making decisions about implementation and continuation of the
edTPA. Although this research focused solely on psychometric qualities, decision-makers must also weigh ethical and philosophical concerns such as consequential and
ecological validity, socioeconomic factors, racial bias, and potential effects on K–12
student learning (e.g., Powell & Parkes, 2020). If the edTPA is to continue to serve as
a high-stakes assessment for preservice music teachers, it should act as only one component among multiple measures of readiness. Perhaps policymakers should allow
candidates scoring below their benchmark to make up the deficiency through grade
point averages, student teaching evaluations, content exams, or other criteria (e.g.,
Parkes, 2020).
SCALE should consider revising the edTPA for specific content areas, especially
when data do not support reliability and validity. For the performing arts, test authors
should divide music, theater, and dance into separate handbooks, and then work with
educators to develop scoring procedures and criteria to better reflect the specific types
of teaching and learning that occur in these classrooms. Changes might include altering
the number of rubrics and their descriptors to focus more on creating, performing, and
responding, and less on learning about the subject through writing and discussion.
These changes are not unprecedented. The world languages and classical languages
handbooks each contain 13 rubrics. In addition, one version of the elementary education handbook consists of four tasks with 18 rubrics total (SCALE, 2019a). Regardless,
scoring rules should allow evaluators to consult all materials throughout the grading
process to account for the holistic nature of teaching (e.g., Powell & Parkes, 2020) and
to compensate for different factor structures that might exist in various subject areas.
Public data published in the Administrative Reports for all handbooks combined
hold little meaning, since the assessment is designed, administered, and scored within
separate disciplines. Instead, these analyses should reflect a higher level of transparency and contain complete data for each area. Results from factor analysis, for example, should include information not currently available such as the percentage of
variance explained by each factor, communalities, and the type of matrix (e.g., pattern,
structure) used in the interpretation.
Internal consistency coefficients (α) should account for measurement error caused
by raters. One method might be to calculate α for all portfolios graded by an individual
scorer, and then report an average for each task and all 15 rubrics combined within a
content area. Test administrators should also consider a different procedure for calculating interrater reliability. The current method of combining exact + adjacent agreements for use in kn is too liberal, especially with underuse of the lowest and highest
ratings (Stemler & Tsai, 2008). Likewise, using only exact agreements in the measurement might be too conservative concerning the practical application of edTPA scores
in readiness-for-licensure decisions, which flow from total scores rather than individual rubric scores. Instead, SCALE should consider use of a weighted kappa to provide
a more accurate representation of interrater reliability. This procedure penalizes disagreements in terms of their severity, whereas unweighted kappa treats all disagreements equally (Sim & Wright, 2005). Regardless, agreement indices and proportions
of scores for all rubrics in each content area should appear with other public data so
that scholars outside of SCALE and Pearson can verify statistical analysis and conduct
further research.
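As a sketch of the weighted-kappa alternative suggested here, the example below uses scikit-learn's cohen_kappa_score with and without linear weights on two invented rating vectors; in practice the inputs would be the rubric scores assigned by two independent edTPA evaluators.

# Weighted versus unweighted kappa on invented ratings (illustrative only).
from sklearn.metrics import cohen_kappa_score

scorer1 = [3, 3, 2, 4, 3, 2, 3, 4, 3, 2, 3, 3, 4, 2, 3]
scorer2 = [3, 4, 2, 3, 3, 3, 3, 4, 2, 2, 4, 3, 4, 2, 3]

print("Unweighted kappa:       ", round(cohen_kappa_score(scorer1, scorer2), 2))
print("Linearly weighted kappa:", round(cohen_kappa_score(scorer1, scorer2, weights="linear"), 2))
# Linear (or quadratic) weights penalize a 1-point disagreement less than a
# 3-point disagreement, whereas unweighted kappa treats all disagreements the same.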
The high stakes nature of the edTPA for preservice teachers requires valid and reliable results in all disciplines. Continuous research is needed to monitor the psychometric qualities and identify weaknesses in this assessment. In the absence of publicly
available data, researchers could replicate this study and others (Austin & Berg, 2020;
Musselwhite & Wesolowski, 2019) by combining scores from multiple institutions to
create analyses that are more robust. Future studies should involve multiple statistical
procedures due to limitations and advantages of various methods. For example, the
Rasch model can compensate for differences in rater severity or sample characteristics
(Musselwhite & Wesolowski, 2019; Stemler & Tsai, 2008). Educator preparation programs considering the edTPA for internal use or states adopting the assessment as a
licensure requirement should not do so without evidence of validity and reliability for
each content area.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship,
and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this
article.
ORCID iD
Phillip M. Hash
https://orcid.org/0000-0002-3384-4715
Supplemental Material
Supplemental material for this article is available online.
Notes
1. SCALE (2019b) uses categories of agreement (n = 2) rather than rubric categories (n = 5) as the unit of n in their calculations, stating that, “given the three possible classifications of agreement (perfect, adjacent, and nonagreement), . . . perfect and adjacent were combined as the agreement statistic” (p. 6). SCALE does not provide details about these calculations beyond stating the use of kappa_n. However, calculating this statistic using the exact + adjacent agreements for each rubric provided by SCALE (2019) and 2 as the value for n resulted in the same k_n coefficients provided in the 2018 Administrative Report.
2. Rubrics that undergo double scoring are averaged when Scorer 1 and Scorer 2 reach adjacent agreement. Rubric scores more than one number apart are resolved by a scoring supervisor.
References
Adkins, A., Klass, P., & Palmer, E. (2015, January). Identifying demographic and preservice
teacher performance predictors of success on the edTPA [Conference presentation]. 2015
Hawaii International Conference on Education. Honolulu, Hawaii. http://hiceducation.org/
wp-content/uploads/proceedings-library/EDU2015.pdf
Asmus, E. P. (1989). Factor analysis: A look at the technique through the data of Rainbow.
Bulletin of the Council for Research in Music Education, 101, 1–29. www.jstor.org/stable/40318371
Austin, J. R., & Berg, M. H. (2020). A within-program analysis of edTPA score reliability,
validity, and utility. Bulletin of the Council for Research in Music Education, 226, 46–65.
https://doi.org/10.5406/bulcouresmusedu.226.0046
Behizadeh, N., & Neely, A. (2018). Testing injustice: Examining the consequential validity of
edTPA. Equity & Excellence in Education, 51(3–4), 242–264. http://doi.org/10.1080/1066
5684.2019.1568927
Bernard, C., & McBride, N. (2020). “Ready for primetime:” edTPA, preservice music educators,
and the hyperreality of teaching. Visions of Research in Music Education, 35, 1–26. wwwusr.rider.edu/%7Evrme/v35n1/visions/Bernard%20and%20McBride_Hyperreality%20
Manuscript.pdf
Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41(3), 687–699. https://doi.
org/10.1177/001316448104100307
Brown, J. D. (2009). Choosing the right type of rotation in PCA and EFA. Shiken: JALT Testing
& Evaluation Newsletter, 13(3), 20–25. http://hosted.jalt.org/test/PDF/Brown31.pdf
Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford
Press.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Sage.
Choppin, J., & Meuwissen, K. (2017). Threats to validity in the edTPA video component. Action
in Teacher Education, 39(1), 39–53, https://doi.org/10.1080/01626620.2016.1245638
Costello, A. B., & Osborne, J. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research, and
Evaluation, 10, Article 7. https://doi.org/10.7275/jyj1-4868
Darling-Hammond, L., & Hyer, M. E. (2013). The role of performance assessment in developing teaching as a profession. Rethinking Schools, 27(4). www.rethinkingschools.org/
articles/the-role-of-performance-assessment-in-developing-teaching-as-a-profession
Dover, A., Schultz, B., Smith, K., & Duggan, T. (2015). Embracing the controversy: edTPA,
corporate influence, and the cooptation of teacher education. Teachers College Record,
Article 18109. www.tcrecord.org/books/Content.asp?ContentID=18109
Gitomer, D. H., Martinez, J. F., Battey, D., & Hyland, N. E. (2021). Assessing the assessment:
Evidence of reliability and validity in the edTPA. American Educational Research Journal,
58(1), 3–31. https://doi.org/10.3102%2F0002831219890608
Heil, L., & Berg, M. H. (2017). Something happened on the way to completing the edTPA:
A case study of teacher candidates’ perceptions of the edTPA. Contributions to Music
Education, 42, 181–200. www.jstor.org/stable/26367442
Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). Sage.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical
data. Biometrics, 33(1), 159–174. http://dx.doi.org/10.2307/2529310
Miksza, P., & Elpus, K. (2018). Design and analysis for quantitative research in music education. Oxford University Press.
Musselwhite, D. J., & Wesolowski, B. C. (2019). Evaluating the psychometric qualities of the
edTPA in the context of pre-service music teachers. Research Studies in Music Education.
Advance online publication. https://doi.org/10.1177/1321103X19872232
National Association for Music Education. (2014). 2014 Music standards. https://nafme.org/
my-classroom/standards/core-music-standards/
National Association of Schools of Music. (2020). Handbook 2019-20. https://bit.ly/3jTVKQi
Parkes, K. A. (2020). Student teaching and certification assessments. In C. Conway, K.
Pellegrino, A. M. Stanley, & C. West (Eds.), Oxford handbook of preservice music teacher
education in the United States (pp. 231–252). Oxford University Press.
Parkes, K. A., & Powell, S. R. (2015). Is the edTPA the right choice for evaluating teacher
readiness? Arts Education Policy Review, 116(2), 103–113. https://doi.org/10.1080/1063
2913.2014.944964
Pecheone, R. L., & Whittaker, A. (2016). Well-prepared teachers inspire student learning. Phi
Delta Kappan, 97(7), 8–13. https://doi.org/10.1177/0031721716641641
Powell, S. R., & Parkes, K. A. (2020). Teacher evaluation and performativity: The edTPA as a
fabrication. Arts Education Policy Review, 121(4), 131–140. https://doi.org/10.1080/1063
2913.2019.1656126
Sato, M. (2014). What is the underlying conception of teaching of the edTPA? Journal of
Teacher Education, 65(5), 421–434. http://doi.org/10.1177/0022487114542518
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation,
and sample size requirements. Physical Therapy, 85(3), 257–268. https://doi.org/10.1093/
ptj/85.3.257
Stanford Center for Assessment, Learning, and Equity. (2013). 2013 edTPA field fest: Summary
report. https://secure.aacte.org/apps/rl/res_get.php?fid=827
Stanford Center for Assessment, Learning, and Equity. (2015). Educative assessment and
meaningful support: 2014 EdTPA administrative report. https://secure.aacte.org/apps/rl/
res_get.php?fid=2188&ref=edtpa
Stanford Center for Assessment, Learning, and Equity. (2018a). edTPA K-12 Performing arts
assessment handbook (Version 06). http://ceit.liu.edu/Certification/EdTPA/2018/edtpapfa-handbook%202018.pdf
Stanford Center for Assessment, Learning, and Equity. (2018b). Understanding rubric level
progressions: K–12 performing arts (Version 01). https://concordia.csp.edu/teachered/wpcontent/uploads/sites/3/K-12-Performing-Arts-Rubric-Progressions.pdf
Stanford Center for Assessment, Learning, and Equity. (2018c). Educative assessment and
meaningful support: 2017 EdTPA administrative report. https://secure.aacte.org/apps/rl/
res_get.php?fid=4271&ref=edtpa
Stanford Center for Assessment, Learning, and Equity. (2019a). Educative assessment and
meaningful support: 2018 EdTPA administrative report. https://secure.aacte.org/apps/rl/
res_get.php?fid=4769&ref=edtpa
Stanford Center for Assessment, Learning, and Equity. (2019b). Affirming the validity and
reliability of edTPA [White paper]. http://edtpa.aacte.org/wp-content/uploads/2019/12/
Affirming-Validity-and-Reliability-of-edTPA.pdf
Stanford Center for Assessment, Learning, and Equity. (2019c). edTPA EPP performance summary: January 2019 - June 2019. https://sasn.rutgers.edu/sites/default/files/sites/default/
files/inline-files/Jan%20to%20June%202019%20edTPA.pdf
Stemler, S. E., & Tsai, J. (2008). Best practices in interrater reliability: Three common
approaches. In J. Osborn (Ed.), Best practices in quantitative methods (pp. 29–49). Sage.
Revista Educación
ISSN: 0379-7082 / 2215-2644
Universidad de Costa Rica, Costa Rica
Revista Educación, vol. 46, no. 2, 2022
Available at: https://www.redalyc.org/articulo.oa?id=44070055006
DOI: https://doi.org/10.15517/revedu.v46i2.47955
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International license.
Scientific articles
Update of the Postgraduate Teaching Evaluation in a Multicampus University: Experience from the Santo Tomás University (Colombia)[1]
Freddy Patiño-Montero
Universidad Santo Tomás, Bogotá, Colombia
freddypam@hotmail.com
https://orcid.org/0000-0001-5795-4911
Diana Carolina Godoy-Acosta
Universidad Santo Tomás, Bogotá, Colombia
diana.godoya@gmail.com
https://orcid.org/0000-0002-1903-0854
Deyssy Catherine Arias Meza
Universidad Santo Tomás, Bogotá, Colombia
deyssy_90@hotmail.com
https://orcid.org/0000-0001-6689-5706
Received: August 20, 2021
Approved: September 20, 2021
Abstract:
This article presents the results of an evaluative research project whose objective was to carry out a meta-evaluation exercise of the postgraduate teacher evaluation of the Santo Tomás University during the period 2017-2020. The theoretical referents are organized according to the categories of educational evaluation, evaluation of teaching staff, and evaluative research, as well as the institutional referents that were considered within the process. The research is located in the qualitative paradigm and corresponds to an evaluative research methodology that permitted the eight-step design that guided the study. This made possible the meta-evaluation of the teaching evaluation of the postgraduate programs of the Santo Tomás University, where students, teachers, program directors, and deans of faculty participated in the diagnosis, working groups, piloting, evaluation of the final instrument, and implementation of the evaluation. The results achieved are presented in coherence with the methodology, considering that institutional policies and procedures were not being implemented and that the teachers' right to reply was almost nil. On the other hand, the most relevant outcome is the definition of a personalized teacher evaluation according to each teacher's work plan, and the evaluation of the performance of teaching staff hired under service-provision contracts.
All this led to the consolidation and parameterization of an institutional application. Finally, some conclusions of the process
and recommendations of a methodological nature are outlined to carry out this type of work in multi-campus higher education
institutions.
Keywords: Educational Evaluation, Teacher Evaluation, Higher Education, Evaluative Research.
Epigraph
“Either evaluation is useful or there is no point in carrying it out; it has to be an instrument for action and not a mere mechanism of justification or a way of easing consciences. We evaluators must be combative in this respect” (Escudero-Escorza, 2000, p. 406).
Introduction
The Universidad Santo Tomás [2] (USTA) is a private higher education institution with a national presence through its main campus in Bogotá, branch campuses in Bucaramanga and Tunja, and campuses in Medellín and Villavicencio. It also has Centros de Atención Universitaria (CAU) in 23 cities and municipalities across the country. Its academic offering comprises 76 undergraduate and 129 postgraduate programs, with close to 32,000 enrolled students, of whom 29,000 are undergraduates and 3,000 postgraduates. To fulfill its mission, USTA employs 2,350 teachers working full time, half time, or by the course hour (Mesa-Angulo, 2020).
From this context, the research process began with a diagnostic exercise, carried out in 2017, on the state of postgraduate teaching evaluation, which confirmed that it did not follow the same dynamics as undergraduate evaluation, to the point that each program had its own instruments and methodologies. In addition, several problems were identified:
The limited meaningfulness of the instrument completed by students, since most of the descriptors were written only with the face-to-face modality in mind.
Students' scant motivation and participation.
Intermittent participation by the teaching staff.
Limited implementation of improvement plans by the faculty.
Insufficient information for decision-making by program management regarding the continuity of teaching staff.
In light of the above, the main objective of this research was to carry out a meta-evaluation exercise of the postgraduate teaching evaluation at USTA. Its specific objectives were: a) to analyze the conceptual and methodological referents that underpin evaluative research and educational evaluation processes, b) to assess the level of implementation of the policies and procedures for postgraduate teaching evaluation at USTA, and c) to propose a new battery of instruments that responds to the needs of the postgraduate programs.
State of the question
Regarding educational evaluation, as is evident in the studies referred to throughout this publication, its evolution runs parallel to the very history of education, in terms of concept, scope, and methodologies. These studies also indicate that the twentieth century was especially significant in that educational evaluation achieved professional status and moved beyond the objectifying perspective centered on measurement. Thus, on the basis of the constructivist approach, the purpose became student learning, which implies a change of perspective regarding the aims of education and of evaluation itself, as can be seen in the analyses of Santos-Guerra (2010), Casanova (2007), Escudero-Escorza (2003), Rossett and Sheldon (2001), House (1995), Stufflebeam and Shinkfield (1987), and Guba and Lincoln (1989), among others.
Regarding the evaluation of teaching staff, the various theoretical reviews identify several tendencies, among which the following stand out: the complexity of teaching work; the lack of consensus about what it means to be a quality teacher at the university; the diversity of criteria for selection and evaluation, associated with underlying notions of good teaching; the tendency to reduce the functions of university faculty solely to teaching; the influence of teaching on educational quality; and the diversity of functions, agents, and evaluation methodologies, extending to salary incentives and the academic career, among others, according to the works of Rueda (2014), Ramírez-Garzón and Montoya-Vargas (2014), Montoya and Largacha (2013), Fernández and Coppola (2012), Escudero-Muñoz (2010), Murillo-Torrecilla (2008), and Tejedor-Tejedor and Jornet-Meliá (2008).
In a complementary way, the review confirmed that evaluative research has been consolidating over recent decades as a research methodology whose purpose centers on quality improvement, especially of the educational service, with a strong emphasis on generating participation by the parties involved and with broad methodological flexibility, as indicated by Belando-Montoro and Alanís-Jiménez (2019), Escudero-Escorza (2006, 2019), Tejedor-Tejedor and Jornet-Meliá (2008), Tejedor-Tejedor (2009), Litwin (2010), and Saravia-Gallardo (2004).
Conceptual referents
Educational evaluation
With respect to the first category, educational evaluation, it is understood as a formative process carried out on the actions developed within educational institutions, with the intention of detecting difficulties and implementing improvement plans that resolve them in a satisfactory and pertinent manner. It therefore involves aspects such as learning outcomes and institutional evaluation.
On the basis of this concept, what Casanova (2007) states in the Manual de Evaluación Educativa is relevant: evaluation “consists of a systematic and rigorous process of data collection” (p. 60), whose purpose is to have continuous and significant information available for forming value judgments and making decisions that improve educational activity.
The elements proposed by Casanova are relevant because evaluation is understood as the result of a set of clearly interrelated activities that take place in the everyday life of educational institutions, from the initial phase (diagnosis) to the delivery of results, but without ending there: once results are obtained, the improvement cycle begins, which means returning to each of the evaluated bodies, factors, and actors to establish the most appropriate routes to ensure that the process itself is progressively refined.
Along these lines, Scriven (1967) states that “evaluation is itself a methodological activity which is essentially similar whether we are trying to evaluate coffee machines or teaching machines, plans for a house or plans for a curriculum” (p. 40). Scriven frames evaluation as a procedure, which carries the implicit idea of sequence, of progression through a series of steps. In fact, a review of Scriven's proposal shows that his aim is to shift evaluation from objectives toward needs, insofar as all of it is oriented toward the consumer (user).
For their part, Rossett and Sheldon (2001) hold that evaluation is the process of examining a program or process to determine what works, what does not, and why. Evaluation determines the value of programs and acts as a model for judgment and improvement. On this point, the authors again use the term process to refer to evaluation, while including the value-judgment component that must take place within it, as well as the intention of using the results from an improvement perspective, in this case for programs.
Entonces, conviene preguntarse: ¿cuál evaluación y al servicio de quién? La evaluación educativa es un
tema que ha logrado mantener especial relevancia en el ámbito académico, en cuanto aspecto neurálgico
en los procesos educativos. Es decir, se ha evidenciado que la evaluación es un tema que trasciende los
espacios convencionales de debate, puesto que en ella convergen múltiples factores y relaciones humanas,
tales como: la dimensión social, en cuanto que es de alguna manera una forma de establecer relaciones entre
el estudiantado, entre el estudiantado y el profesorado, y entre el cuerpo docente, ya que se pregunta por
los valores, el respeto por las personas y el sentido de la justicia [por mencionar algunas] (Santos-Guerra,
2010); una dimensión política (House, 1995), por tanto, no debe ser solo veraz, sino justa, en la medida que
es tomada la mayoría de las veces como un instrumento de poder que, según su uso, llega a determinar la vida
de las personas; una dimensión filosófica, en cuanto que se debe preguntar por el fundamento, la razón de ser
de la acción evaluativa para que no quede reducida a una mera actividad desarticulada, es decir, al plano del
activismo sin ninguna reflexión; de igual manera una dimensión teleológica, es decir, que exista una mirada,
un horizonte claro hacia el cual se quiere llegar, un para qué, que ayude a darle sentido al proceso.
Ahora bien, de acuerdo con la línea teórica definida por personas autoras como Stake (2006), la evaluación educativa es el proceso de emitir un juicio de valor con base en evidencias objetivas sobre el mérito y las deficiencias de algo. Del mismo modo, Cordero y Luna (2010) argumentan que “la evaluación comprende dos
componentes: el estudio empírico, determinar los hechos y recolectar la información de manera sistemática;
y la delimitación de los valores relevantes para los resultados del estudio” (p. 193). Justamente esa es la postura
que se asume al inicio de este apartado.
En síntesis, como afirma Cabra-Torres (2014):
la evaluación ha servido de motor para gran parte de los cambios de orientación de los sistemas educativos, en razón de
la información que produce y de los interrogantes que despiertan la gestión y el análisis de los resultados que entrega a la
sociedad (p. 178).
Evaluación del profesorado
En cuanto a la evaluación del profesorado, se concibe como una herramienta de gestión que posibilita
el desarrollo de la carrera docente, en el marco de una institución educativa (en este caso, de educación
superior). En este sentido, implica la recolección de información por parte de los agentes e instancias en las
que se desempeña, en el marco de las funciones universitarias, con el fin de establecer estrategias y actividades
que le sirvan al personal docente para identificar sus fallas y mejorarlas con apoyo de la institución. Al
mismo tiempo, posibilita a las Instituciones de educación superior la implementación de planes de formación
docente que redunden en beneficios para quienes obtienen bajas calificaciones en este proceso, lo cual ha
permitido caracterizar cada vez más ideas acerca de los atributos del buen profesor (Belando-Montoro y Alanís-Jiménez, 2019). En última instancia, provee de herramientas a las Instituciones de Educación Superior
(IES) para la toma de decisiones informadas sobre la continuidad o no de un profesor o profesora.
Lo afirmado hasta el momento se encuentra en plena consonancia con los planteamientos de Montoya
y Largacha (2013), Vásquez-Rizo y Gabalán-Coello (2012), Fernández y Coppola (2010), y Luna-Serrano
(2008), quienes destacan que la evaluación de la docencia universitaria implica una amplia diversidad de
agentes evaluadores, en tanto que la profesión académica no se limita únicamente a la función docente,
sino que son amplias y diversas las funciones y roles del profesorado en las universidades. De allí que aún
hoy no haya consenso respecto a cómo evaluarla ni cuál es el mejor método. En consecuencia, como indica
Rueda (2014), “es necesario reconocer la relevancia del rol que puede cumplir la evaluación sistemática del
desempeño docente en la profesionalización y perfeccionamiento permanente del profesorado” (p. 99). Es
decir, se enmarcan en una categoría más amplia como es la profesión académica.
De allí que un error bastante común sea considerar la profesión académica desde un ideal de institución
de educación superior, desde el desempeño de la función docente propiamente dicha e incluso desde una
modalidad tradicional, puesto que “la actividad docente no se restringe a la interacción en el aula, existen
otros modelos de enseñanza como la formación en servicio o la educación a distancia” (Rueda, Luna, García
y Loredo, 2011; citado en Rueda, 2014, p. 100)
Así, por ejemplo, al revisar textos clásicos sobre evaluación del maestro y la maestra, se encuentra que ya en ellos se enunciaban retos a los que se enfrenta el profesorado, como el incremento de conocimientos, los cambios del estudiantado y, como se indicaba en su momento, “la creciente investigación en la psicología, la sociología y
campos afines, que es pertinente a la enseñanza y aprendizaje” (Simpson, 1967, p. 12), toda vez que tensionan
sus propias prácticas pedagógicas.
Lo planteado hasta el momento evidencia la necesidad de realizar un análisis multidimensional del trabajo
del profesorado, que atienda a diferentes perspectivas de quienes reciben su servicio, e incluso que los propios
miembros del cuerpo docente se puedan autoevaluar a partir de los mismos criterios con que son evaluados
externamente, de manera que este ejercicio realmente se haga a partir de aspectos conmensurables. En este
sentido, se debe tener en cuenta la finalidad del proceso evaluativo y autoevaluativo. Es decir, “para que
el maestro adquiera una preparación excelente y su enseñanza alcance un nivel superior, se requiere que
preste continua atención al problema de la autoevaluación y su meta reconocida: el automejoramiento del
maestro” (Simpson, 1967, p. 11).
Ahora bien, en cuanto a estos aspectos metodológicos, se encuentra que la estrategia y el instrumento más
común para realizar la evaluación es el uso de cuestionarios de opinión que responde el estudiantado, los
cuales remiten especialmente a aspectos didácticos y evaluativos, tal como se encuentra en las investigaciones
de Rueda, Luna, García y Loredo (2011; citado en Rueda, 2014) y Litwin (2010), realizadas en México y
Argentina, respectivamente.
Sobre los últimos rasgos mencionados, es pertinente enunciar que, efectivamente, también son utilizados
en la evaluación realizada en la USTA, como se aprecia en algunas referencias a documentos institucionales:
Referentes institucionales
TABLA 1
Referentes institucionales
Fuente: elaboración propia con base en documentos institucionales
Procedimientos Metodológicos
De acuerdo con diversas personas autoras, especialmente al revisar la obra de Escudero-Escorza (2000, 2006,
2016 y 2019), la investigación evaluativa es concebida como una metodología del ámbito de las ciencias
sociales que se ha fortalecido en las últimas décadas, en tanto que brinda las herramientas suficientes para
la implementación de un ejercicio de evaluación riguroso, con la participación de quienes están directamente involucrados.
Ello, en perspectiva de que sea posible la definición de los aspectos que requieren ajustes, modificaciones o
supresiones en el marco del mejoramiento de los procesos o las prácticas donde se requieran implementar
cambios a través de un ejercicio evaluativo más participativo, consciente y pertinente, que contribuya a la
calidad de la educación.
Delimitada de esta forma, se puede afirmar que se ubica en el campo de la investigación cualitativa, en tanto
que “incluye formulaciones paradigmáticas múltiples y, también, complejas críticas, epistemológicas y éticas,
a la metodología de investigación tradicional en las ciencias sociales” (Denzin y Lincoln, 2012, p. 24). Es
decir, al centrarse en la evaluación respecto a diferentes campos de conocimiento, advierte en sí misma una
intención de valorar y mejorar los procesos que se dan en su interior. Dicha perspectiva transformadora
enfatiza, como afirma Escudero-Escorza (2016), “la función de esta al servicio del cambio social y, en
concreto, al servicio de la mejora social” (p.14). En ese sentido, la investigación evaluativa en educación
beneficia de forma significativa a todos los agentes e instancias educativas y, por tanto, a las instituciones y
a la sociedad en general.
Dado que en parte su sello diferenciador radicó en la evaluación de programas sociales, en algún punto de su
desarrollo llegó a confundirse con esta actividad. Sin embargo, dada su amplitud y fundamentación terminó
por imponerse la tradición de la investigación evaluativa; “mientras que la evaluación de programas se definió
como la investigación evaluativa directamente aplicada a programas sociales” (Owen y Rogers, 1999, citado
en Escudero-Escorza, 2006, p. 181).
En el mismo texto, Escudero-Escorza (2006) presenta una serie de elementos que permiten identificar la
investigación evaluativa. A continuación, se retoman algunos de estos:
La solución de problemas concretos como finalidad.
Se investiga sobre todo en situaciones naturales.
La observación es la principal fuente de conocimiento.
Se emplean tanto métodos cuantitativos como cualitativos.
Se busca la mejora de programas sociales.
Se informa a los responsables de tomar decisiones sobre programas y prácticas. (pp. 180 - 181)
Asimismo, no se puede soslayar la intencionalidad o los propósitos con los cuales se realizan las evaluaciones
del profesorado, entre los cuales, según un estudio reciente, se cuentan cerca de 15 tipos distintos de
propósitos (Escudero-Escorza, 2019). A pesar de ello, en la actualidad sigue sin haber suficiente consenso
respecto a lo que es “un buen profesor” (Tejedor-Tejedor, 2009, p. 79). Por tanto, cobra sentido el hecho de
que sean las propias IES quienes, a la luz de su filosofía institucional y horizonte estratégico, puedan
“determinar el modelo de profesor que se quiere, estableciendo los comportamientos que se consideran
deseables para después analizar en qué medida la conducta del profesor satisface el referente de calidad
establecido” (Tejedor-Tejedor, 2009, p. 93). Esto, para el caso de la USTA, está claramente identificado
en los elementos destacados de sus documentos institucionales, tal como se evidenció en la Tabla 1.
En función de lo anterior y en atención al proyecto definido, se estableció una serie de etapas que hicieron posible el ejercicio y el alcance de sus objetivos. Como se percibe en la Figura 1, las personas
investigadoras trazaron una ruta metodológica propia para la investigación, compuesta por ocho (8) fases,
la cual recoge, en buena medida, las principales recomendaciones de los expertos en este campo, en el
entendido que “todas las aproximaciones metodológicas son útiles en algún momento y para alguna faceta
evaluativa y que todas tienen sus limitaciones y que en la práctica se requieren generalmente aproximaciones
metodológicas diversas y complementarias” (Escudero-Escorza, 2019, p. 24). Asimismo, dada la naturaleza
del estudio, se establece un muestreo cualitativo, en atención a las fases establecidas y a los agentes e instancias
intervinientes en el proceso, es decir, un tipo de muestreo no probabilístico, como se recomienda en estos casos
(Hernández-Sampieri y Mendoza-Torres, 2018). Por tanto, se fijó la muestra por criterios y por conveniencia
(Otzen y Manterola, 2017). Por criterios, porque debido a la estructura orgánica de la universidad fue
necesario garantizar representación de diversos organismos colegiados, y por conveniencia, en cuanto a que
se realizaron invitaciones al grupo de representantes referido más adelante, el cual participó de manera
voluntaria en diversos ejercicios que contribuyeron en las diferentes fases del proceso.
A continuación, se representan los momentos, los cuales son detallados posteriormente en el apartado
sobre resultados y discusión.
FIGURA 1.
Metaevaluación de la evaluación docente de posgrados
Fuente: elaboración propia
Análisis y discusión de resultados
Desarrollo de las fases:
1. Análisis contextual. Se realizó la revisión de referentes documentales para constatar la necesidad de consolidar un sistema de evaluación; en ella se encontró que en 2013 la Unidad de Posgrados
realizó un primer ejercicio diagnóstico y según el Informe de Gestión de la Vicerrectoría Académica General
(VAG) - Plan de Acción 2011-2013, se recomendó consolidar el sistema institucional de evaluación docente
a nivel de Posgrados. Por otro lado, se evidencia que entre los años 2014 y 2016 los programas de Posgrados
aplicaron instrumentos de evaluación de manera espontánea, no unificados ni sistemáticos; estos ejercicios evaluativos no fueron obligatorios y, en general, no contemplaron lo definido en la Dimensión de la Política Docente ([USTA], 2015), sino criterios establecidos al interior de cada programa. Se percibió una serie de dificultades asociadas a la baja participación de docentes y de estudiantes; esta última no llegaba al 30 %, lo que evidenció que buena parte de quienes participaban eran estudiantes que habían perdido asignaturas.
A la luz de los resultados obtenidos y tal como se muestran en el acápite anterior, cabe resaltar que es
imposible generar una propuesta nueva si se desconocen los esfuerzos previos que ha realizado la institución,
dado que es allí donde se obtienen experiencias e insumos sobre los cuales replantearse dinámicas y alcances
para que, según un contexto con sus particularidades, se logre cumplir con los objetivos académicos y
administrativos de los programas.
2. Revisión de referentes conceptuales e institucionales. Esta actividad fue realizada de manera independiente y posteriormente se unificaron y discutieron los hallazgos por parte de los equipos de trabajo de Currículo de la DUAD [3], en su momento VUAD [4], junto con los profesionales de la Unidad de Posgrados de la
sede Principal. A partir de allí, se consolida el proyecto de evaluación docente, que responde a los procesos
planeados en la articulación con la VAG, donde se realizó un primer diagnóstico acerca de las metodologías
e instrumentos aplicados en los diferentes programas de posgrados de la Sede Principal y de la DUAD. En
este punto se pudo constatar que, incluso, en términos de políticas y procedimientos institucionales, muchos
aspectos que estaban definidos en los documentos, o bien no se conocían o no se hacían efectivos en la DUAD.
3. Análisis de la implementación de la evaluación. Unido a los hallazgos del punto anterior, aquí se encontró
que la socialización de los resultados de su evaluación directamente con el profesorado y el derecho a réplica a
partir de estos, antes de consignarlos de manera definitiva, eran casi nulos. Asimismo, se encontró que un alto porcentaje de
docentes no consultaba los resultados de su evaluación, o lo hacía únicamente como requisito para participar
en las convocatorias de ascenso en el escalafón docente. Finalmente, y quizás, lo más relevante aquí fue
la definición de una evaluación docente personalizada en función del plan de trabajo elaborado por cada
docente al inicio de semestre, lo cual no ocurría y llamó la atención como la mayor oportunidad de mejora
en el nuevo procedimiento a implementar. También se identificó que los procesos de autoevaluación con fines de renovación del Registro Calificado y de Acreditación de Alta Calidad de los programas de posgrado exigían información documental sobre los procesos de evaluación docente, de cara a la definición de planes de mejora y de formación propios de la carrera docente; esto demandaba un ejercicio más riguroso respecto a la trazabilidad del ejercicio docente.
4. Evaluación de la batería de instrumentos utilizada. A partir del trabajo articulado entre la Unidad
de Posgrados y el Equipo de Currículo de la DUAD, se constata la existencia de múltiples instrumentos
utilizados por los programas, diferentes al oficial. Ahora bien, respecto al instrumento oficial definido por
la Unidad de Desarrollo Curricular y Formación Docente [UDCFD], se constató que buena parte de
los descriptores estaban determinados para la modalidad presencial y que incluso no correspondían a las
dinámicas propias de los posgrados. A continuación, se refieren algunos a modo de ejemplo:
Manual del deportista fue divulgado a tiempo y es de total conocimiento
El docente utiliza el gimnasio de la USTA como soporte de la preparación física integral.
Asimismo, se evidenció que los factores relacionados con la integración de las funciones sustantivas desde
el currículo, y que son evidentes para el estudiantado, no se evalúan de manera directa. Es decir, no se evalúan
acciones relacionadas con investigación y proyección social, que son trabajadas desde la docencia.
En este mismo sentido, es importante resaltar, de cara a implementaciones similares, que existen instituciones que, tal como la Universidad Santo Tomás, cuentan con distintas modalidades de enseñanza dentro de sus Facultades; esto hace aún más desafiante el reto de construir un único modelo de evaluación, pues el ejercicio
académico y administrativo requiere de especificidades acorde a las necesidades de cada modalidad. Esto
requiere de tiempo y negociaciones entre los diferentes actores participantes del instrumento de evaluación
para lograr que en consenso se acojan las particularidades de cada uno.
5. Formulación del nuevo instrumento de evaluación. Teniendo en cuenta el marco de referencia que ofrece
el documento Dimensión de la Política Docente [DPD], en el cual se definen todos los aspectos de orden
conceptual y metodológico de la evaluación docente, el equipo de trabajo decide acatarlos en su gran mayoría, especialmente aquellos de orden conceptual. Se contemplan adicionalmente las particularidades del personal
docente que está vinculado por orden de prestación de servicios (OPS), dado que suman un gran número en
los programas de posgrado. En el aspecto metodológico, específicamente en lo referido a los instrumentos y
a la escala de ponderación se proponen los cambios más significativos. A continuación, se presentan algunos
de ellos:
Luego de diversos análisis, el equipo define asumir la escala de valoración de la DPD, que contempla
seis niveles de ponderación que van del 0 al 5, correspondientes con los siguientes criterios: 0, No se
cumple; 1, Se cumple insuficientemente; 2, Se cumple con bajo grado; 3, Se cumple medianamente; 4,
Se cumple en alto grado; y 5, Se cumple plenamente.
La DPD define tres instrumentos: uno para estudiantes; otro para decanos; y uno para docentes.
Los tres son diferentes en cuanto a la redacción de los descriptores, número de descriptores por
aspecto evaluado, etc. Lo anterior se considera una oportunidad de mejora que se asume en la nueva
propuesta. Así, la nueva propuesta consta de los siguientes instrumentos:
I. Instrumento de evaluación de estudiantes sobre el desempeño docente de posgrado.
II. Instrumento de evaluación del director o líder del Programa Académico de Posgrado al docente.
III. Instrumento de evaluación del decano al docente; este se utilizaría en caso de que el programa de posgrado no cuente con la figura de director o líder del Programa Académico de Posgrado.
IV. Instrumento de Autoevaluación Docente de Posgrado.
Estos instrumentos están basados directamente en la evaluación que realiza el estudiantado sobre el
desempeño docente, ya que es este quien tiene la asignación más alta en la escala de ponderación: la de estudiantes
llega al 50 % y la autoevaluación docente a un 25 %, y la evaluación de la persona decana o directora de
programa el restante 25 %, para un total de 100 %.
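To make the weighting arithmetic explicit, the following minimal Python sketch (illustrative only; the names, values and structure are assumptions, not the institution's actual application) combines the three instruments on the 0-5 DPD scale with the 50/25/25 weights described above.

# Minimal sketch of the weighting described in the text: students 50 %,
# self-evaluation 25 %, dean/programme director 25 %, all on the 0-5 DPD scale.
WEIGHTS = {"students": 0.50, "self": 0.25, "director": 0.25}

SCALE_LABELS = {
    0: "No se cumple",
    1: "Se cumple insuficientemente",
    2: "Se cumple con bajo grado",
    3: "Se cumple medianamente",
    4: "Se cumple en alto grado",
    5: "Se cumple plenamente",
}

def overall_score(scores):
    """Weighted overall score; `scores` maps each instrument to its 0-5 average."""
    return sum(WEIGHTS[source] * scores[source] for source in WEIGHTS)

# Illustrative values only, not real evaluation data.
example = {"students": 4.2, "self": 4.8, "director": 4.4}
score = overall_score(example)
print(round(score, 2), "-", SCALE_LABELS[round(score)])  # -> 4.4 - Se cumple en alto grado

Because the weights sum to 1, the combined score stays on the same 0-5 scale as each individual instrument.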
En este punto, se remitió al Departamento de Talento Humano el nuevo instrumento propuesto para
evaluar la viabilidad jurídica relacionada con la contratación de personal docente OPS. Además, para efectos
de garantizar el cumplimiento de aspectos señalados en la DPD, desde la UDCFD se asigna a una persona
docente que haga parte del equipo de trabajo para el diseño e implementación de la propuesta de la Evaluación
Docente de Posgrados.
Es importante resaltar que para los programas de posgrado presenciales se logró, en articulación con el
departamento de TIC de la USTA, que la batería de preguntas se filtrara según el plan de trabajo de cada
docente para que solo aparecieran aquellas preguntas que estuvieran relacionadas con las actividades para las
cuales fueron asignadas horas desde la nómina de cada programa (Plan de Trabajo Docente). Para el caso de
la DUAD, debido a una incompatibilidad de sistema, se realizó en un formulario de Google que permitía
tener toda la batería de preguntas con la posibilidad de marcar “no aplica” en las actividades que no hacían parte del plan de trabajo.
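As a rough illustration of the filtering logic just described (the data structures and field names below are hypothetical, not the USTA application or its Google form), a question battery tagged by activity can be reduced to the items whose activity has hours assigned in a given teacher's work plan; anything else is effectively "no aplica".

# Minimal sketch, assuming each question is tagged with the activity it evaluates
# and each work plan lists activities with assigned hours. Field names are hypothetical.
QUESTION_BANK = [
    {"id": 1, "activity": "docencia", "text": "Cumple con el plan de la asignatura."},
    {"id": 2, "activity": "investigacion", "text": "Vincula al estudiantado a proyectos de investigación."},
    {"id": 3, "activity": "proyeccion_social", "text": "Articula la asignatura con proyección social."},
]

def questions_for(work_plan):
    """Keep only the questions whose activity has hours assigned in the work plan."""
    active = {activity for activity, hours in work_plan.items() if hours > 0}
    return [q for q in QUESTION_BANK if q["activity"] in active]

# Example: a teacher with hours only for teaching and research.
plan = {"docencia": 12, "investigacion": 6, "proyeccion_social": 0}
for q in questions_for(plan):
    print(q["id"], q["text"])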
6. Socialización y ajustes con los agentes involucrados. Se realizó la socialización con grupos de representantes
de cada uno de los agentes involucrados, tales como: estudiantes, docentes, comités de currículo de
Facultad (DUAD), personas coordinadoras de Programa, decanas, decanos, vicerrectoras y vicerrectores de
la modalidad distancia (2017-2018). En todos los casos se recibieron las sugerencias y recomendaciones en
términos de forma, redacción y número de descriptores propuestos, las cuales fueron asumidas casi en su totalidad.
7. Validación de métricas de los instrumentos. Habida cuenta del proceso anterior, se definió la versión
preliminar de los instrumentos para la Evaluación Docente de Posgrados, la cual se validó por parte de pares
académicos y de un experto en psicometría de la Facultad de Psicología. Conforme a la retroalimentación,
se realizaron los respectivos ajustes.
8. Implementación. La implementación comienza con un pilotaje con algunos grupos de estudiantes de los
programas de posgrado de modalidad distancia, que mostraban una continuidad en matrículas, tales como:
Maestría en Didáctica, Maestría en Educación, Especialización en Pedagogía para la Educación Superior,
Especialización en Patología de la Construcción, Maestría en Gestión de Cuencas Hidrográficas.
Además de lo anterior, se realizó una segunda validación del instrumento que se hizo a través de un segundo
ejercicio piloto para la parametrización del Aplicativo Institucional con los posgrados de Maestría en Calidad
y Gestión Integral, Especialización en Administración y Gerencia de Sistemas de la Calidad, Especialización
en Finanzas y Especializaciones en Finanzas y Gerencia Empresarial, todos ellos de modalidad presencial.
En este orden de ideas, parte de la innovación se concentró en la actualización y mejoramiento de la
herramienta de sistematización de los procesos de evaluación docente de posgrados en la Universidad Santo
Tomás, mediante la parametrización del instrumento, el cual consta de dos interfaces: la primera para el
administrador del aplicativo institucional a través de un micrositio, y la segunda para el usuario, en este caso
los actores involucrados (personal directivo, docentes y estudiantes).
Este instrumento genera información confiable y clasificada en 9 tipos de reportes que permiten evidenciar
oportunamente los resultados del proceso de Evaluación Docente en los Posgrados; también se puede acceder
fácilmente a esta información en línea por parte de los decanos, decanas, directoras, directores y docentes, lo anterior como insumo para la toma de decisiones en las diferentes instancias institucionales (Dirección
de Investigación e Innovación, 2021).
Después del desarrollo de la actualización de los instrumentos, la validación y la parametrización en
el aplicativo institucional, se continuó con la implementación de esta herramienta únicamente en los
posgrados de la sede principal en Bogotá, dadas las diferencias en los sistemas académicos entre los programas
presenciales y los programas a distancia, así como en las sedes y seccionales.
Así las cosas, el proceso de implementación se ha llevado a cabo desde el primer semestre de 2019 hasta el
segundo semestre de 2020. Sin embargo, dados los ajustes realizados a nivel institucional con ocasión de la
pandemia mundial por la COVID-19, durante el período 2020-1, en los programas presenciales no se realizó
la evaluación docente debido a una decisión de la alta dirección de la Universidad. En virtud de ello, en la
Figura 2, se presenta la información del proceso de aplicación en los diferentes períodos hasta el 2020-2:
FIGURA 2
Relación de participantes en la evaluación docente, período 2019-1.
Fuente: elaboración propia
Como se evidencia en la figura, en contraste con lo afirmado en los apartados anteriores, es notable la
participación en el primer ejercicio de evaluación docente correspondiente al nuevo instrumento, aplicativo
y procedimiento, en todos los casos con cifras superiores al 70 %, cuando anteriormente estas llegaban al 30
%. De esta manera, el grupo poblacional con mayor participación fue el personal directivo del programa, con un 96,43 %, seguido por el personal docente, con un 81,27 % y, finalmente, el estudiantado, con un 72,87 %.
Del mismo modo, en la Figura 3 se puede apreciar que en el 2019-2 el porcentaje de estudiantes que
participaron en el proceso de evaluación docente fue del 69,47 %, lo cual da cuenta de una leve disminución respecto al período anterior; no así en los casos del personal directivo del programa y del personal docente, donde la disminución en la participación es notoria. En el primer caso, tal contracción está en el orden del 14 %, mientras que en el segundo es cercana al 6 %, con respecto al período 2019-1. Lo anterior puede ser evidencia de la importancia de trabajar de forma sistemática en la cultura de la evaluación entre los miembros de la comunidad académica; además, es posible que la mayor participación del profesorado en el primer semestre obedezca a contar con un insumo indispensable para la convocatoria al ascenso en el escalafón
docente que se realiza en el segundo semestre.
FIGURA 3.
Relación participantes evaluación docente período 2019-2
Fuente: elaboración propia
Tal como se observa en la Figura 4, en el 2020-2 del total de estudiantes matriculados, el 68,58 % llevó
a cabo el proceso de evaluación docente. Por su parte, la autoevaluación contó con una participación del
81,86 % del total de docentes y un 78,13 % de las personas directoras de posgrados. Estos porcentajes
evidencian que se mantiene la disminución en la participación del estudiantado, mientras que se presenta la
mayor participación del cuerpo docente desde la implementación del nuevo instrumento y procedimiento.
Asimismo, se recupera la tasa de participación de las personas directoras de programa.
FIGURA 4.
Relación de participantes en la evaluación docente, período 2020-2.
Fuente: elaboración propia
En las Figuras 2, 3 y 4 se observa que, dada la naturaleza voluntaria de la participación por parte del
estudiantado, se ha contado con una participación cercana al 70 %, que evidencia la responsabilidad de las personas estudiantes en su proceso de aprendizaje y una conciencia respecto a las implicaciones que tiene su voz en el mejoramiento de los programas académicos. Para el cuerpo docente, el porcentaje de participación en su autoevaluación es cercano al 80 % y se considera una participación lejana del ideal, que es del 100 %, dadas las directrices institucionales que incentivan al personal docente a participar en su proceso de calificación. Caso parecido ocurre con las personas directoras de los programas, que, aunque participan en su gran mayoría, todavía no alcanzan la totalidad en su compromiso de evaluar al
estamento docente de los programas que dirigen.
Finalmente, los resultados obtenidos durante estos tres periodos académicos dan cuenta de la necesidad de seguir cultivando la participación de todos los actores, con el fin de llegar al 100 % de participación.
En este mismo orden de ideas, y con el fin de complementar los resultados, en la Figura 5 se observa el
porcentaje promedio de participantes para el corte de los tres períodos de implementación.
FIGURA 5.
Porcentaje promedio de participación en la evaluación de docentes de posgrado
Fuente: elaboración propia
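As a simple illustration of the kind of summary behind Figure 5 (not the authors' original figure; only the student participation percentages quoted above are used), the average participation across the three reported periods can be computed and plotted with matplotlib:

import matplotlib.pyplot as plt

# Student participation rates reported in the text for the on-campus postgraduate programmes.
periods = ["2019-1", "2019-2", "2020-2"]
student_participation = [72.87, 69.47, 68.58]

average = sum(student_participation) / len(student_participation)
print(f"Average student participation: {average:.2f} %")  # ~70.31 %

plt.bar(periods, student_participation, color="#3c3c3c")
plt.axhline(average, linestyle="--", label=f"average {average:.1f} %")
plt.ylabel("Participation (%)")
plt.legend()
plt.tight_layout()
plt.show()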
Por otra parte, en la Figura 6 se presenta el promedio obtenido en la evaluación global de docentes
de posgrado en la modalidad presencial en los períodos 2019-1 y 2019-2, lo cual da cuenta de una alta valoración, en consideración de que el valor máximo posible de la escala es 5,0.
FIGURA 6.
Promedio general - Evaluación docente para los periodos 2019-1 y 2019-2
Fuente: elaboración propia
Ahora bien, para el caso de los posgrados de la DUAD, se llevó a cabo la implementación de los nuevos
instrumentos de evaluación docente y la compilación de los datos a través de formularios inteligentes; en este caso se utilizó la plataforma Google Forms, de la cual se obtuvo la información de la Figura 7:
FIGURA 7.
Participación en la evaluación docente en los posgrados de la DUAD periodos 2019-1 a 2020-2
Fuente: elaboración propia
El estudiantado, como principal fuente de información, realiza de forma anónima el proceso de evaluación docente; por lo anterior, los datos extraídos de los formularios se calculan con base en el total de docentes asignados a espacios académicos en los diferentes periodos. Es decir, la participación presentada en la Figura 7 se determinó así:
Para el 2019-1, del total de 115 docentes, el 57 % fue evaluado mínimo por una persona estudiante, el 23
% llevó a cabo su autoevaluación y el 73 % de los directores de posgrado evaluaron al estamento docente;
para el 2019-2, del total de 102 docentes el 53 % fue evaluado mínimo por una persona estudiante, el 55 %
llevó a cabo su autoevaluación y el 64 % de las personas directoras de posgrados la realizaron; para el 2020-1
del total de 96 docentes el 74 % fue evaluado mínimo por una persona estudiante, el 60 % llevó a cabo su
autoevaluación y todas las personas directoras de posgrados evaluaron al equipo docente; para el 2020-2,
del total de 96 docentes, el 69 % fue evaluado mínimo por una persona estudiante, el 70 % llevó a cabo su
autoevaluación y el 72 % de las personas directoras de posgrados evaluaron al profesorado.
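These percentages are plain ratios over the number of teachers assigned in each period. A minimal sketch follows; the count of evaluated teachers is a hypothetical example chosen to reproduce the reported 57 % for 2019-1, since the text only gives totals and percentages.

# Minimal sketch of how the participation percentages above are computed:
# evaluated teachers divided by the total number of teachers assigned in the period.
def participation_rate(evaluated, total):
    """Percentage of teachers covered by at least one completed evaluation."""
    return 100 * evaluated / total

# 2019-1 had 115 teachers assigned; e.g. 66 of them evaluated by at least one student
# gives roughly the 57 % reported in the text. The count 66 is hypothetical.
print(f"{participation_rate(66, 115):.0f} %")  # -> 57 %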
TABLA 2.
Disminución de matrícula entre los períodos 2019-1 y 2020-2
Fuente: elaboración propia
En los datos de la Tabla 2 se aprecia la disminución de la matrícula entre los períodos relacionados, lo cual
da cuenta de un fenómeno nacional, marcado por una reducción significativa de la matrícula en la educación superior, la cual
aún hoy es objeto de estudio por parte de las IES, personas investigadoras y el mismo Ministerio de Educación
Nacional. Además, es evidente que, en el período de lanzamiento del nuevo instrumento de evaluación, se
presentó la tasa más alta de participación en los tres agentes establecidos. Sin embargo, llama la atención
que la participación de estudiantes y directivos ha venido decreciendo, lo cual implica la implementación de
nuevas estrategias de motivación y divulgación por parte de la Unidad de Posgrados, para poder retomar los
índices iniciales e incluso llegar a mejorarlos. Este fenómeno no ocurre con el cuerpo docente, que muestra
una participación sostenida durante los períodos, lo cual puede ser atribuible al interés que representan estos
resultados en aspectos relacionados con el escalafón docente.
Conclusiones
El objetivo principal de la investigación fue realizar un ejercicio de metaevaluación de la evaluación docente
de posgrado de la USTA, de manera que, a partir de allí, se pudiera proponer un nuevo procedimiento y
batería de instrumentos de evaluación unificado para todos los programas de posgrado. De allí que en 2019
se implementó el aplicativo institucional de Evaluación Docente de Posgrados en todos los programas de posgrado de la sede principal en Bogotá. Esto permitió contar con información confiable, oportuna y clasificada
para tomar decisiones y formular los distintos planes de acción según las directrices institucionales vigentes.
Por los planteamientos realizados a lo largo del texto, se afirma que se cumplió el objetivo propuesto, al
tiempo que se espera que la evaluación docente adquiera mayor relevancia en el mejoramiento de las prácticas pedagógicas en relación con el plan de trabajo docente; en la mejora de los resultados de aprendizaje del estudiantado; en la caracterización del personal docente para replantear la asignación de funciones en perspectiva de potenciar sus capacidades, y en el establecimiento de planes de formación que permitan incidir directamente en los aspectos a mejorar.
Es decir, se espera que, a partir de las buenas prácticas en la implementación de la nueva propuesta de
evaluación docente, se impacte directamente en la toma de decisiones, de forma que antes de determinar
la salida de docentes de la Universidad, se logre aprovechar realmente los resultados de la evaluación en su adecuada ubicación en las funciones que realizan mejor, al tiempo que se les capacite en aquellos aspectos en los que su formación previa, su experiencia o simplemente la ausencia de ella hayan llevado a
bajos resultados en los procesos de evaluación. Lo anterior, enmarcado plenamente en la cualificación de la
profesión académica al interior de la Universidad Santo Tomás.
Un aspecto importante dentro del proceso fue evidenciar que en las Universidades Multicampus, como es
el caso del estudio, aunque desde la gestión directiva se realizan esfuerzos significativos en el establecimiento
de políticas y procedimientos de carácter institucional, en ocasiones la inclusión de las tres modalidades de la oferta educativa de la USTA en estos aspectos sigue siendo un elemento con oportunidades de mejora. Ello, en tanto que en la actualidad la USTA cuenta con más de 34.000 estudiantes a
nivel nacional, lo que requiere sinergias entre sedes, seccionales y DUAD, para que se logre llevar a cabalidad la
planeación institucional que obedece en este caso al Plan Integral Multicampus, que proyecta la Universidad
hasta 2027.
La existencia de un nuevo procedimiento y batería de instrumentos, y su implementación, permitió a los
posgrados de la USTA contar con información y acceder a nuevos escenarios de participación para
la toma de decisiones en las diferentes instancias institucionales. Asimismo, el seguimiento a la aplicación
periódica de estos instrumentos permitió disminuir la brecha de la cultura evaluativa universitaria, que presentaba dificultades para sintonizar a los actores con el proceso de evaluación.
Finalmente, es importante recomendar el apoyo de las directivas a este tipo de investigaciones que implican
ejercicios de metaevaluación de las prácticas, procesos y procedimientos de la vida universitaria, en tanto que
este tipo de acciones requieren buena disposición, recursos y toma de decisiones respecto a las conclusiones
o innovaciones que deriven de ellas.
Por último, se recomienda a todas las personas que emprendan un ejercicio similar al presente propiciar una cultura institucional donde se reconozca la importancia de la evaluación en todos los niveles y participantes, no solo académicos sino también administrativos, ya que este proceso permea todas las áreas y permite alcanzar niveles de calidad que potencian las instituciones a lo largo del tiempo. Queda claro, entonces, que la evaluación no se reduce a un ejercicio puntual, sino que abarca un proceso de cambio constante que implica una revisión periódica de sus instrumentos y procedimientos.
Referencias
Belando-Montoro, M. y Alanís-Jiménez, J. F. (2019). Perspectivas Comparadas entre los Docentes de Posgrado de
Investigadores en Educación de la UNAM y la UCM. REICE: Revista Iberoamericana sobre Calidad, Eficacia y
Cambio en Educación, 17(4), 93-110. https://doi.org/10.15366/reice2019.17.4.005
Cabra-Torres, F. (2014). Evaluación y formación para la ciudadanía: una relación necesaria. Revista Ibero-Americana
De Educação, (64), 177-193. https://doi.org/10.35362/rie640413
Casanova, M. (2007). Manual de Evaluación Educativa. (9ª Ed.). La Muralla.
Cordero, G. y Luna, E. (2010). Revista Iberoamericana de Evaluación Educativa, 3(1e), 191-202. https://revistas.uam.es/riee/article/view/4503/4927
Dirección de Investigación e Innovación. (2021). Certificado de innovación de procedimiento y servicio. Universidad
Santo Tomás.
Denzin, N. K. y Lincoln, Y. S. (Coords.) (2012). Manual de investigación cualitativa. (Vol. 1). El campo de la
investigación cualitativa. Gedisa, S.A.
Escudero-Escorza, T. (2000). La evaluación y mejora de la enseñanza en la universidad: otra perspectiva. Revista de
Investigación Educativa, 18(2), 405-416. https://revistas.um.es/rie/article/view/121071/113761
Escudero-Escorza, T. (2003). Desde los test hasta la investigación evaluativa actual. Un siglo, el XX, de intenso
desarrollo de la evaluación en educación. Relieve, 9(1), 11-43. https://ojs.uv.es/index.php/RELIEVE/article/view/4348/4025
Escudero-Escorza, T. (2006). Claves identificativas de la investigación evaluativa: análisis desde la práctica. Contextos
Educativos, 8(9), 179-199. https://redined.educacion.gob.es/xmlui/handle/11162/47847
Escudero-Escorza, T. (2016). La investigación evaluativa en el Siglo XXI: Un instrumento para el desarrollo educativo
y social cada vez más relevante. RELIEVE, 22(1), 1-20. http://dx.doi.org/10.7203/relieve.22.1.8164
Escudero-Escorza, T. (2019). Evaluación del Profesorado como camino directo hacia la mejora de la Calidad Educativa.
Revista de Investigación Educativa, 37(1), 15–37. https://doi.org/10.6018/rie.37.1.342521
Escudero-Muñoz, J. M. (2010). La selección y la evaluación del profesorado. Revista Interuniversitaria de Formación
del Profesorado, 24(2), 201-221. https://www.redalyc.org/articulo.oa?id=27419198010
Fernández, N. y Coppola, N. (2010). La evaluación de la docencia universitaria desde un abordaje institucional. Revista
Iberoamericana de Evaluación Educativa, 3(1), 37-50. https://repositorio.uam.es/handle/10486/661582
Fernández, N. y Coppola, N. (2012). Aportes para la reflexión sobre la evaluación de función docente universitaria.
Revista Iberoamericana de Evaluación Educativa, 5(1e), 106-119. https://revistas.uam.es/riee/article/view/4430
Guba, E. G. y Lincoln, Y. S. (1989). Fourth Generation Evaluation [Evaluación de cuarta generación]. Sage
Hernández-Sampieri, R. y Mendoza-Torres, C. P. (2018). Metodología de la investigación: las rutas cuantitativa,
cualitativa y mixta. McGraw-Hill Interamericana Editores, S.A.
House, E. (1995). Evaluación, ética y poder. Morata.
Litwin, E. (2010). La evaluación de la docencia: plataformas, nuevas agendas y caminos alternativos. Revista Iberoamericana de Evaluación Educativa, 3(1), 51-59. https://revistas.uam.es/riee/article/view/4504
Luna-Serrano, E. (2008). Evaluación en contexto de la docencia en posgrado. REencuentro. Análisis de Problemas Universitarios, 75-84. https://www.redalyc.org/pdf/340/34005307.pdf
Mesa-Angulo, J. (2020). La Santo Tomás: una universidad país. Ediciones USTA. https://repository.usta.edu.co/handle/11634/29077?show=full
Montoya, J. y Largacha, E. (2013). Calidad de la educación superior: ¿Recursos, actividades o resultados? En L. Orozco-Silva (Ed.), La educación superior: retos y perspectivas (pp. 379-417). Ediciones Uniandes.
Murillo-Torrecilla, F. (2008). La evaluación del profesorado universitario en España. Revista Iberoamericana de
Evaluación Educativa, 1(3), 29-45. https://repositorio.uam.es/handle/10486/661532
Otzen, T. y Manterola, C. (2017). Técnicas de Muestreo sobre una Población a Estudio. International Journal of
Morphology, 35(1), 227-232. https://scielo.conicyt.cl/pdf/ijmorphol/v35n1/art37.pdf
Ramírez-Garzón, M. I. y Montoya-Vargas, J. (2014). La evaluación de la calidad de la docencia en la universidad: Una
revisión de la literatura. REDU. Revista de Docencia Universitaria, 12(2), 77-95. https://riunet.upv.es/handle/10251/139977
Rossett, A. y Sheldon, K. (2001). Beyond the Podium: Delivering Training and Performance to a Digital World. [Más
allá del podio: brindar capacitación y rendimiento en un mundo digital]. Jossey-Bass/Pfeiffer.
Rueda, M. (2014). Evaluación docente: La valoración de la labor de los maestros en el aula. Revista Latinoamericana de Educación Comparada, 5(6), 97-106. http://www.saece.com.ar/relec/revistas/6/art1.pdf
Santos-Guerra, M. (2010). La evaluación como aprendizaje: una flecha en la diana. (3a ed.). Bonum.
Stake, R. (2006). Evaluación comprensiva y evaluación basada en estándares. Editorial Graó.
Saravia-Gallardo, M. A. (2004). Evaluación del profesorado universitario. Un enfoque desde la competencia profesional
[Tesis Doctoral, Universidad de Barcelona]. https://dialnet.unirioja.es/servlet/tesis?codigo=3411
Scriven, M. (1967). The methodology of evaluation [La metodología de la evaluación]. En R. Tyler, R. Gagné y M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39-83). Rand McNally.
Simpson, R. H. (1967). La autoevaluación del maestro. (E. F. Setaro, Trad.). Paidós.
Stufflebeam, D. L. y Shinkfield, A. J. (1987). Evaluación sistemática. Guía teórica y práctica. Paidós-MEC.
Tejedor-Tejedor, F. J. (2009). Evaluación del profesorado universitario: enfoque metodológico y algunas aportaciones
de la investigación. Estudios Sobre Educación, (16). https://dadun.unav.edu/handle/10171/9169
Tejedor-Tejedor, F. J. y Jornet-Meliá, J. M. (2008). La evaluación del profesorado universitario en España. Revista
electrónica de investigación educativa, 10 (SPE), 1-29. https://redie.uabc.mx/redie/article/view/199
Universidad Santo Tomás. (2004a). Estatuto Docente. Ediciones USTA.
Universidad Santo Tomás. (2004b). Proyecto Educativo Institucional (3a ed.). https://usantotomas.edu.co/documentos-institucionales
Universidad Santo Tomás. (2010a). Dimensión de la Política Docente. https://usantotomas.edu.co/documentos-institucionales
Universidad Santo Tomás. (2010b). Modelo Educativo Pedagógico. USTA Ediciones.
Universidad Santo Tomás. (2015). Documento Marco Desarrollo Docente. USTA Ediciones.
Vásquez-Rizo, F. E. y Gabalán-Coello, J. (2012). La evaluación docente en posgrado: variables y factores influyentes.
Educación y Educadores, 15(3), 445-460. https://www.redalyc.org/pdf/834/83428627006.pdf
Notas
[1] Este artículo es resultado del proceso de metaevaluación adelantado sobre la Evaluación Docente (Procedimientos e
Instrumentos), realizado por la Unidad de Posgrados y el Equipo de Currículo de la Facultad de Ciencias y Tecnologías,
Universidad Santo Tomás. Bogotá, 2017-2020.
[2] Primera universidad privada del país en obtener la Acreditación Institucional de Alta Calidad en la modalidad
Multicampus (Resolución número 01456 del 29 de enero de 2016, MEN).
[3] DUAD: División de Educación Abierta y a Distancia.
[4] VUAD: Vicerrectoría de Educación Abierta y a Distancia
Información adicional
Cómo citar: Patiño-Montero, F., Godoy-Acosta, D. C. y Arias-Meza, D. C. (2022). Actualización de la
evaluación docente de posgrados en una universidad multicampus. Experiencia desde la Universidad Santo
Tomás (Colombia). Revista Educación, 46(2). http://doi.org/10.15517/revedu.v46i2.47955
Çukurova Üniversitesi Eğitim Fakültesi Dergisi
Vol: 48 Numb: 2 Page: 1299-1339
https://dergipark.org.tr/tr/pub/cuefd
Analyzing Academic Members’ Expectations from a Performance
Evaluation System and Their Perceptions of Obstacles to Such an
Evaluation System: Education Faculties Sample
Gürol YOKUŞ a*, Tuğba YANPAR YELKEN b
a Sinop Üniversitesi, Eğitim Fakültesi, Sinop/Türkiye
b Mersin Üniversitesi, Eğitim Fakültesi, Mersin/Türkiye
Article Info
DOI: 10.14812/cufej.467359
Article history:
Received 04.10.2018
Revised 25.03.2019
Accepted 18.10.2019
Keywords:
Performance evaluation,
Quality in higher education,
Accountability.
Abstract
The systematic assessment and evaluation of academic members in faculties is a crucial issue because higher education institutions place great emphasis on transparent, efficient and successful management. This study aims to conduct a mixed (quantitative and qualitative) investigation of Education Faculties’ academic members’ expectations from a performance evaluation approach and of the obstacles to such an evaluation system. A convergent parallel mixed-methods design has been preferred as the research model. The “Expectations from performance assessment” and “Barriers to performance assessment” subscales developed by Tonbul (2008) have been used as data collection tools. Independent-samples t-tests and ANOVA are used for the analysis of quantitative data, and content analysis for the analysis of qualitative data (see the sketch after the abstract).
As a result of this study, it is found out that academic members have a moderate level
of expectations from a performance evaluation approach. The highest expectations
belong to assistant professors while the lowest belong to professors. The most widely agreed expectations of academic members regarding a performance evaluation approach are found to be “developing a consensus about the criteria of an effective academician, affecting the professional development of academic members positively and increasing the workload of academic members”. The most frequent obstacles to a performance evaluation
approach emerged as “current organizational mechanism of higher education
institutions” and “workload of faculty academic members”. The scores of both
expectations and obstacles significantly differ depending on “taking academic incentive,
work experience in higher education, academic title, and academicians’ level of satisfaction with their institutions”. As a result of the qualitative analysis, many themes and codes related to a performance evaluation system emerged. In the “Attitude Towards Performance Approach” theme, the most frequent codes were “adopters, doubters”. In the Academicians’ Priorities theme, the codes emerged as “research and publications, evaluation of quality of instruction, advisory for undergraduates and postgraduates”; in the Positive Effects theme, as “motivation, financial support, search of quality”; in the Negative Effects theme, as “intra-institutional rivalry, academic dishonesty”; in the Obstacles theme, as “intense workload, lack of intrinsic motivation”; and finally, in the Suggestions theme, as “more officer employment, institutional support for academic efforts and research publications”.
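The sketch below illustrates, with synthetic scores and hypothetical group labels, the kind of comparisons named in the abstract: an independent-samples t-test for a two-level factor such as receiving the academic incentive, and a one-way ANOVA for a multi-level factor such as academic title. It is not the authors' analysis script.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic "expectations from performance evaluation" scores (Likert-like scale); illustrative only.
incentive_yes = rng.normal(3.6, 0.5, 40)
incentive_no = rng.normal(3.3, 0.5, 40)

# Independent-samples t-test: expectations by whether the academic incentive was received.
t_stat, t_p = stats.ttest_ind(incentive_yes, incentive_no)
print(f"t = {t_stat:.2f}, p = {t_p:.3f}")

# One-way ANOVA: expectations by academic title (three hypothetical groups).
professors = rng.normal(3.1, 0.5, 30)
assoc_professors = rng.normal(3.4, 0.5, 30)
assist_professors = rng.normal(3.7, 0.5, 30)
f_stat, f_p = stats.f_oneway(professors, assoc_professors, assist_professors)
print(f"F = {f_stat:.2f}, p = {f_p:.3f}")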
Eğitim Fakültesi Öğretim Elemanlarının Performans Değerlendirme
Yaklaşımından Beklentileri ve Performansın Önündeki Engellere İlişkin
Görüşlerinin İncelenmesi: Karma Yöntem Araştırması
Makale Bilgisi
Öz
DOI: 10.14812/cufej.467359
Öğretim elemanlarının performansının sistematik şekilde ölçülmesi ve değerlendirilmesi
yükseköğretim kurumlarının kalitesi için önemlidir. Bu çalışmanın amacı, çeşitli devlet
üniversitelerinin Eğitim Fakültelerinde görev yapan öğretim elemanlarının performans
değerlendirme yaklaşımından beklentileri ve performans değerlendirmenin önündeki
Makale Geçmişi:
Geliş
04.10.2018
Düzeltme 25.03.2019
Kabul
18.10.2019
Anahtar Kelimeler:
Performans değerlendirme,
Yükseköğretimde kalite,
Hesap verebilirlik.
engellere ilişkin görüşlerinin nicel ve nitel olarak incelenmesidir. Bu araştırma
kapsamında karma araştırma yöntemlerinden yakınsayan paralel karma desen tercih
edilmiştir. Veri toplama aracı olarak Tonbul (2008) tarafından geliştirilen “Performans
Değerlendirme Yaklaşımına İlişkin Beklentiler” altölçeği ve “Performans Değerlendirme
Sisteminin Önündeki Engeller” altölçeği kullanılmıştır. Nicel veriler için ilişkisiz
örneklemler t-testi, ANOVA; nitel veriler için içerik analizi tercih edilmiştir. Çalışmanın
sonucunda, öğretim elemanlarının performans değerlendirme yaklaşımıyla ilgili
beklentilerinin orta düzeyde olduğu, performans değerlendirme yaklaşımıyla ilgili en
yüksek beklentiye sahip olanların doktor öğretim üyeleri, en düşük beklentiye sahip
olanların ise profesörler olduğu ortaya çıkmıştır. Performans değerlendirmenin
önündeki en önemli iki engelin ise yükseköğretim kurumlarının mevcut örgütsel işleyişi
ve öğretim üyelerinin iş yükü olduğu görülmüştür. Performans Değerlendirmeye İlişkin
Beklentiler ve Engellerle ilgili puanlar “akademik teşvik alma, çalışma deneyimi,
akademik unvan ve kurumdan memnuniyet düzeyi”ne göre anlamlı farklılık
göstermektedir. Nitel analiz sonucunda ise en sık tekrar eden kodlara bakıldığında ise
Değerlendirmeye Karşı Tutum temasında “benimseyenler, şüpheyle yaklaşanlar”;
Akademisyenlerin Öncelikleri temasında “akademik yayınlar”, “öğretimin kalitesinin
değerlendirilmesi”, “lisans ve lisansüstü danışmanlık”; Olumlu etkileri temasında
“motivasyon”, “maddi destek”, “kalite arayışı”; Olumsuz etkileri temasında “kurum içi
rekabet, akademik sahtekarlıklar”; Engeller temasında “yoğun iş yükü, içsel motivasyon
eksikliği”; Öneriler temasında ise “memur istihdamı, yayın ve çalışmaların kurumca
desteklenmesi” kodları ortaya çıkmıştır.
Introduction
Nowadays, many organizations focus on making a systematic performance evaluation of their members for transparent, efficient and successful management. In higher education, public and private universities make an effort to produce a reliable evaluation system. Higher education institutions feel the need to identify performance indicators and to announce the extent to which they achieve their mission and strategies, for a variety of reasons such as global competitiveness and societal pressure for transparency (Hamid, Leen, Pei & Ijab, 2008). Especially in the competitive environment of the 21st century, a better performance evaluation system creates advantages for universities and offers opportunities for evaluating their own processes and members more effectively.
When the literature is reviewed, it is noticed that there are discussions about the accountability of higher education institutions. These discussions focus on evaluating the performance of institutions and publicly announcing the results, taking stakeholders’ views into account. Universities are also criticized because their academic members behave like a closed society in ivory towers (Glaser, Halliday, & Eliot, 2003). The criticisms are summarized by Esen and Esen (2015):
The research conducted by academic members does not focus on societal problems.
Their studies are too theoretical.
Societal resources are wasted in vain (Etzkowitz, Webster, Gebhardt, & Terra, 2000).
Research is not translated into communal benefit, and it is conducted esoterically.
Academicians turn into individuals with constricted autonomy who worry about disturbing the university or its administrative structure (Elton, 1999).
Although they function autonomously, higher education institutions should not be viewed as unaccountable organizations. Higher education institutions have the power to influence the society, economic structure and social life to which they belong. Therefore, instead of being ivory towers, universities should consider science, society and nation together, perform at international quality standards, and feel a conscientious responsibility to prioritize social benefit rather than career development. Vidovich and Slee (2001) claim that it is necessary to make performance evaluations in universities for the following reasons:
accountability to customers (continuous improvement activities for scientific research),
accountability to government (efficient and productive use of resources),
accountability to students and society (providing comprehensive educational experiences,
providing vocational training to improve the quality of life, meeting the labor force needs of the
society).
Since the beginning of the 21st century, higher education has gone through significant changes. UNESCO (2004) lists global developments that have new implications for higher education institutions: i) the emergence of new education providers such as multi-national companies, corporate
universities, and media companies; ii) new forms of delivering education including distance, virtual and
new face-to-face, such as private companies; iii) greater diversification of qualifications and certificates;
iv) increasing mobility of students, programmes, providers and projects across national borders; v) more
emphasis on lifelong learning which in turn increases the demand for postsecondary education; and vi)
the increasing amount of private investment in the provision of higher education. Considering all these
developments, higher education institutions have the capacity to affect the society, economic structure and social life around them. Therefore, instead of being ivory towers, they are expected to perform at international quality standards, considering science, community and nation altogether, and to prioritize societal benefit as well as career development. Vidovich and Slee (2001) emphasize that performance evaluation in universities is necessary in terms of accountability to members (sustainable enhancement efforts for scientific research), accountability to government (efficient and creative use of resources) and accountability to students and society (providing extensive educational experience, providing professional education to increase quality of life, and meeting the workforce needs of society).
Performance evaluation in higher education involves a variety of products and processes. In its
essence, performance evaluation indicates the minimum acceptable level in terms of quality and it
provides an opportunity for identifying the strengths and weaknesses of individuals and institutions. In this way,
individuals and institutions not only become aware of their weaknesses, but also recognize in which aspects
they are strong. Batool, Qureshi and Raouf (2010) state that performance evaluation might not include
all dimensions of this concept and that performance evaluation of an institution does not mean the same thing
as assessing academic programs, courses or the quality of graduates. They point out that
performance evaluation of an institution means assessing the current situation in terms of the quality and
effectiveness of the institution.
Within the context of this study, performance evaluation in higher education is defined as «assessing
the professional qualifications of academic members related to their instructional roles and their level of
contribution to accomplishing institutional goals». Therefore, a performance evaluation system is
necessary for three purposes: assessing academic members’ various activities such as research,
academic service, instruction and publications, offering them comprehensive feedback supporting their
self-development, and valuing their current performance. Vincent (2010) points out the advantages of a
performance evaluation approach in higher education:
• Development and progression of individuals stand on realistic goals.
• It creates conformity between individuals’ goals and institution’s goals.
• It helps to identify the strengths and weaknesses of individuals within an organization.
• It works as a feedback mechanism for purpose of enhancement.
• It helps to identify which courses and instruction are needed.
• It helps the institution to take a major role and responsibility in terms of education, society,
economics and politics.
Tonbul (2008) claims that performance evaluation practices increase the accomplishment level of
institutional goals, help to identify failing issues in organizational processes and provide specific data about
the effect of the organizational climate and culture on members, which in turn leads to an increase in
institutional performance. Organizations become more successful and lasting when they make
effective and functional use of feedback mechanisms in processes related to workflow and organization
(Latham & Pinder, 2005). Kalaycı (2009) draws attention to the fact that it is very unlikely to predict success or
failure in higher education without a proper evaluation; however, when the educational performance
of academic members is evaluated, it becomes open to criticism by other stakeholders, and this situation
is challenging. This issue might result in negative circumstances. For instance, Kim et al. (2016) claim that
a large number of professors put a low emphasis on their role of educator while putting a greater
emphasis on their researcher identity, because faculty evaluation systems are mainly based on research.
In order not to cause negative consequences, performance evaluations should not be done merely to fulfill
a formality or obligation. This threat is especially valid for public universities funded by government. Kalaycı
and Çimen (2012) draw attention to the fact that public universities now need quality studies and that it
is a necessity for them to carry out institutional quality processes not just for the purpose of
formality but for increasing quality and standing out in this competitive environment.
The major reasons which encourage universities to make performance evaluations in the 21st century
emerge as institutional image and reputation, internationalization and global university rankings. There
are many factors affecting institutional reputation and image. In a report published by the Higher Education
Authority (2013), it appears that academic members are closely interested in their field of expertise, which
indicates that they continually follow the studies conducted in the literature. When it
comes to internationalization, an institution’s including both national and international students and
academic members indicates that it has a global identity and is ready for competitiveness in the global
market (O'Connor et al., 2013). However, the number of students and academic members is not a
sufficient indicator of quality. The quality of academic members and the quality of their teaching
performance should also be assessed because they affect the quality of education and they are
regarded as an assurance for quality control (Açan and Saydan, 2009).
When the literature is reviewed, it is noticed that the most frequently used performance assessment and
evaluation techniques in higher education are Self-Assessment, Key Performance Indicators (KPI), Relative
Evaluation, Appraisal, Six Sigma and Total Quality Management (Çalışkan, 2006; Kalaycı, 2009; Paige,
2005). Not all of these techniques might be appropriate for assessing individual performances of academic
members. For instance, the performance comparison technique involves evaluating the current performance
of an individual against the performance of another who is accepted as a leader within the same context. This
might not be appropriate for evaluating academic members’ performance because it is strictly
dependent upon excellence of quality, whereas each individual differs from the others in terms of
working style and self-development. Among these techniques, Key Performance Indicators stand out as a
convenient evaluation method in higher education. In KPIs, performance indicators are
operationally defined and it is specified which operations constitute a concept; a sketch of this idea is given below.
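As an illustration only, the following minimal Python sketch shows how operationally defined indicators might be combined into a single score. The indicator names and weights are hypothetical assumptions made for illustration and are not taken from the studies cited here.

# Hypothetical KPI weights; an institution would operationally define its own indicators.
KPI_WEIGHTS = {
    "publications": 0.4,          # research and publication output
    "teaching_evaluations": 0.3,  # quality of instruction
    "graduate_advisory": 0.2,     # undergraduate and postgraduate advisory
    "community_service": 0.1,     # service to society and the institution
}

def kpi_score(indicators):
    # "indicators" maps each KPI name to a value normalised to the 0-1 range
    return sum(KPI_WEIGHTS[name] * indicators.get(name, 0.0) for name in KPI_WEIGHTS)

# e.g. kpi_score({"publications": 0.8, "teaching_evaluations": 0.6}) -> 0.5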
When current practices related to performance evaluation in Turkish higher education are reviewed,
it is criticized that only a quantitative assessment of academic members’ research and
publications is made and that the evaluation is based on subjective judgements (Esen and Esen, 2015). In this regard,
the Council of Higher Education started an academic incentive system in 2015 to increase academic members’
motivation in Turkey and to support their academic activities financially (Academic Incentive Grant
Regulation, 2015). Within this academic incentive regulation, the performance of the academic staff is
evaluated by the Council of Higher Education based on their national and international projects, research,
publications, exhibitions, patents received, references to their studies, and academic awards received. As
a result, faculty members who perform sufficient work are financially supported. Apart from the academic
incentive, there is a variety of performance evaluations of academic members in the Turkish higher education
system, such as:
a) Registry system
b) Academic promotion and appointment criteria
c) Questionnaires of Academic Member Evaluation
d) Annual reports
e) Surveys of Student Views
(Esen & Esen, 2015)
Performance evaluation in higher education is very important in terms of increasing the effectiveness
of the services provided; however, the criteria and reliability of this process are equally important. In this
regard, Çakıroğlu, Aydın and Uzuntiryaki (2009) state that there are very promising studies which indicate
the reliability of the evaluations made by experienced faculty members and they emphasize that the
following criteria should be taken into consideration during evaluation:
• collecting data from various sources related to teaching performance (such as colleagues,
students, advisors, master students, graduates) and in different formats (student assessment
surveys, student interviews, observation results, course materials, student products, etc.),
• clearly identifying evaluation criteria,
• informing about the evaluation process,
• informing the assessors on how to make an assessment,
• the candidates not playing an evaluative role,
• random selection of the assessors among those who meet the criteria,
• a minimum of 3 and a maximum of 5 members taking part in the jury.
The basis of evaluating the performance of faculty members is to increase the effectiveness of
universities. There is increasing pressure on national and global universities to systematically perform
performance evaluations due to concepts such as quality, efficiency, effectiveness and accountability. The
reason why education faculties are preferred in this study is that the Higher Education Council of Turkey
emphasizes accreditation studies especially in education faculties within the scope of the “Bologna Process”.
Higher education institutions in Turkey aim to increase their accountability as a quality indicator and
inform internal and external stakeholders of the current situation. In order to prove that they have
accomplished their mission and vision within this scope, universities carry out performance evaluation
studies of the instructors and present these to the knowledge of the public, students, families, government
and private sector. In the accreditation process carried out in education faculties, it is important to identify
academic staff’s expectations and the barriers to performance assessment. Therefore, while performance
evaluation is so important for higher education institutions, research is needed to determine the
expectations of the instructors whose performance is evaluated.
Within the context of this study, a quantitative and qualitative analysis is made of Education
Faculty academic members’ expectations from a performance evaluation system and of the obstacles to
such an evaluation system. The study attempts to answer the following research questions:
1. What are the expectations of academic members in Education Faculties from a performance
evaluation system?
1.1. Do the expectations of academic members from a performance evaluation system differ
depending on the following variables: academic title, academic experience, academic incentive status and
satisfaction with their institutions?
2. What are the perceptions of academic members in Education Faculties related to obstacles to a
performance evaluation system?
2.1. Do the perceptions of academic members related to obstacles to a performance evaluation
system differ depending on the following variables: academic title, academic experience, academic incentive
status and satisfaction with their institutions?
3. What are the general views of academic members in Education Faculties related to a performance
evaluation system?
Method
A convergent parallel mixed design has been preferred as the research model in this study. Quantitative and
qualitative data were collected simultaneously, analyzed independently and then converged in the
discussion. In a convergent mixed design there is an equal emphasis on both the quantitative and the qualitative part;
the analyses are conducted independently and eventually interpretations are made using both sets of data
(Creswell and Plano Clark, 2014). Figure 1 shows the mixed design used in this research:
Figure 1. A model for a convergent parallel design in mixed research studies: quantitative data collection
and analysis (descriptive statistics, t-test and ANOVA) and qualitative data collection and analysis (content
analysis) are carried out in parallel, and interpretations are then made of both the quantitative and the
qualitative analyses.
Participants
The data of this study were collected in 2018 from academic members in Education Faculties in Turkey,
including dr. research assistants, assistant professors, associate professors and professors. Participants
are from different regions of Turkey including Marmara, Black Sea, Aegean, Mediterranean and East
Anatolia. Instructors who have too heavy a course load were not included in the study group, and data were
collected only from faculty members who had completed their doctoral education. Within the context of
this study, the convenience sampling technique was used for selecting the quantitative sample, and data
were obtained from 104 academic members in six universities who agreed to participate in this research.
For the qualitative data, participants were selected with the maximum diversity sampling technique, one of
the purposeful sampling techniques, for the purpose of collecting all kinds of different views about the
current situation. Qualitative data were obtained from 50 academic members in Education Faculties.
The quantitative phase includes 25 dr. research assistants, 35 assistant professors, 31 associate professors
and 13 professors. Since convenience sampling was used, the sample was not selected according to department
criteria; ultimately, 22 percent of participants teach in the Science Education Department, 11 percent
teach in the Pre-School Education Department, 28 percent teach in the Educational Sciences Department and 31
percent teach in the Primary School Teaching Department. In the qualitative phase, the sample includes 13 research
assistants, 17 assistant professors, 15 associate professors and 5 professors. Maximum diversity has been
achieved according to the academic title and department variables. 20 percent of participants teach in the Science
Education Department, 10 percent teach in the Pre-School Education Department, 40 percent teach in the
Educational Sciences Department and 30 percent teach in the Primary School Teaching Department.
Data Collection Tool
In this study, a personal information form, the “Expectations from a Performance Evaluation Approach”
subscale with 16 items on a 4-point Likert scale and the “Obstacles to a Performance Evaluation Approach”
subscale with 10 items, developed by Tonbul (2008), were used for data collection. Exploratory factor analysis
and varimax rotation were applied during scale development. The internal consistency reliability of the subscale
“Expectations from a Performance Evaluation Approach” was found to be .92, and that of the subscale
“Obstacles to a Performance Evaluation Approach” was found to be .87. The internal consistency of these
subscales was recalculated in this study, and the reliability of the first subscale appeared as .84 and of the
second subscale as .78. If the Cronbach Alpha coefficient value - an indicator of homogeneity between scale
items - is between .60 and .80, it is evidence of high reliability (Tonbul, 2008). The items in these subscales are
accumulated in one factor and this one factor explains fifty-six percent of total variance. An illustrative
recalculation of this internal consistency is sketched below.
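As a minimal sketch, assuming the raw item scores are available as a respondents-by-items array, the internal consistency reported above could be recalculated with a routine such as the following; the variable names are hypothetical and this is not the authors' analysis script.

import numpy as np

def cronbach_alpha(items):
    # "items": respondents-by-items array of Likert ratings for one subscale
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the subscale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# e.g. cronbach_alpha(expectation_items) would be expected to return about .84 here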
Also, a questionnaire with open-ended questions was developed for the purpose of supporting the
quantitative data and making a deeper analysis. A professor from the Educational Sciences Department, an
Associate Professor from the Assessment and Evaluation Department and a Professor who works as an expert
in higher education quality studies analyzed the questions and made some suggestions. The questions
were revised in light of these suggestions. The final form of the questions includes:
2.1. What do you think about making a periodic and data-based assessment of academic members?
2.2. What criteria should be assessed within performance evaluation? Could you order these criteria
according to significance level for you?
2.3. What are the positive and negative consequences of making a performance evaluation of
academic members?
2.4. What are the obstacles to performance of academic members in higher education and what do
you suggest for overcoming these obstacles?
Data Analysis
The equality of variances and normality of data were checked in order to identify the analysis method
for the quantitative data. The skewness and kurtosis values ranged from -1 to +1, which indicated that the data
were distributed normally. Also, the sample size was bigger than 50 (N=104); therefore, the Kolmogorov-Smirnov
test was used for checking normality and it was found not to be significant (p>.05), which was an indicator of
normality. As a result, parametric tests were used in the study. An Independent Samples t-test was done for
checking whether there was a significant difference between participants in terms of the academic incentive
variable. One-Way Analysis of Variance (ANOVA) was done for checking whether there was a significant
difference between participants in terms of the variables of work experience, academic title and satisfaction
with institution. A sketch of these checks and tests is given below.
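The following is a minimal Python sketch of these checks, assuming the subscale scores sit in a pandas DataFrame with hypothetical columns "expectation", "incentive" and "experience"; it mirrors the steps described above rather than reproducing the authors' actual analysis, which was presumably run in a statistics package.

import pandas as pd
from scipy import stats

def run_parametric_checks(df):
    scores = df["expectation"]
    # Skewness and kurtosis should lie roughly within [-1, +1] for approximate normality
    print("skewness:", scores.skew(), "kurtosis:", scores.kurt())
    # Kolmogorov-Smirnov test against a normal distribution with the sample's parameters
    print(stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1))))
    # Independent-samples t-test for the academic incentive variable
    yes = df.loc[df["incentive"] == "yes", "expectation"]
    no = df.loc[df["incentive"] == "no", "expectation"]
    print(stats.ttest_ind(yes, no))
    # One-way ANOVA across work experience groups
    groups = [group["expectation"].values for _, group in df.groupby("experience")]
    print(stats.f_oneway(*groups))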
Inductive content analysis was done for analyzing the qualitative data. Inter-rater agreement
percentages were identified for the academic members’ views collected through the open-ended
questions. The views collected with the questionnaire were coded by the researcher and one
independent expert. Miles and Huberman’s (1994) reliability formula was used for the calculation of
agreement percentages:
Reliability = Agreement / (Agreement + Disagreement)
The inter-rater reliability across all codes identified by the two raters was found to be 0.89. It is possible
to assert that reliability is met for the data analysis because an agreement percentage of 80% and above is
accepted as sufficient (Mokkink et al., 2010). In this study, a variety of validity strategies listed by Creswell
(2003) which are frequently used in qualitative research methods, such as “Members’ Check”,
“External Audits”, “Rich, Thick Description” and “Chain of Evidence”, have been used. The participants were
asked whether the findings of the study reflected their own ideas correctly, an independent expert who had
little contact with the study participants and who knew the method of the study was consulted, and this study
remained as loyal to the nature of the data as possible through direct quotations. An illustrative sketch of the
agreement calculation is given below.
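For illustration, the agreement formula above can be expressed as a short function; the two code lists in the usage comment are hypothetical placeholders, not the study's actual codings.

def interrater_reliability(codes_rater1, codes_rater2):
    # Miles and Huberman (1994): reliability = agreement / (agreement + disagreement)
    agreement = sum(1 for a, b in zip(codes_rater1, codes_rater2) if a == b)
    disagreement = len(codes_rater1) - agreement
    return agreement / (agreement + disagreement)

# e.g. interrater_reliability(researcher_codes, expert_codes) corresponds to the 0.89 reported in this study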
Findings
3.1 Findings Related to Expectations From a Performance Evaluation System
This section attempts to answer the first research question of the study, “What are the perceptions of
academic members related to their expectations from a performance evaluation system?”. Table 1
presents the general score mean of the participants and Table 2 presents the score means depending on
academic titles.
Table 1.
The General Score Mean Related To Academic Members’ Expectations from a Performance Evaluation
System

                        N     Minimum   Maximum   Mean     Standard Deviation
Expectation Subscale    104   1,50      3,31      2,3023   ,43859
In Table 1, when the score means of academic members are reviewed, it is seen that their expectations
from a performance evaluation system are not at a high level (x̄ = 2,30) but at a moderate level (which means
“partially agree”). Table 2 presents the ANOVA test results which indicate whether academic members’
expectations significantly differ depending on academic titles:
Table 2.
The ANOVA Results Related To Whether Expectations from a Performance Evaluation System Differ
Depending On Academic Title

                      N     Mean     Standard Deviation
Research Assistant    25    2,4525   ,50688
Assistant Professor   35    2,4875   ,25174
Associate Professor   31    2,1754   ,44177
Professor             13    1,8173   ,16230
Total                 104   2,3023   ,43859

                 Sum of Squares   df    Mean of Squares   F       p
Between Groups   5,321            3     1,774             12,24   ,000
Within Groups    14,492           100   ,145

Source of Difference: Research Assistant > Associate Prof.; Assistant Prof. > Associate Prof.;
Associate Prof. > Prof.
When the arithmetic mean and standard deviation values according to academic titles are analyzed, it is
observed that assistant professors have the highest, while professors have the lowest, expectations from a
performance evaluation system. As there appears a significant difference between groups in Table 2, post hoc
tests have been used for identifying between which groups the significant difference lies. As the variances
were found not to be equal with the Levene F test, the Games-Howell statistical method, which works well with
unequal groups, has been preferred. As a result of the analysis, it is found that Research Assistants and
Assistant Professors have a higher level of expectations than Associate Professors and Professors. A sketch
of this post hoc procedure is given below.
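As an illustration of this procedure, the following minimal Python sketch uses the third-party pingouin package (an assumption; the authors presumably worked in a statistics package) on a hypothetical DataFrame with columns "expectation" and "title".

import pingouin as pg
from scipy import stats

def post_hoc_by_title(df):
    groups = [group["expectation"].values for _, group in df.groupby("title")]
    levene = stats.levene(*groups)   # Levene test for equality of variances
    if levene.pvalue < .05:          # unequal variances -> Games-Howell pairwise comparisons
        return pg.pairwise_gameshowell(data=df, dv="expectation", between="title")
    return pg.pairwise_tukey(data=df, dv="expectation", between="title")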
When the subscale is analyzed item by item, it appears that the highest-rated expectations from a
performance evaluation system are:
• It creates a consensus on the criteria of being an effective academic member (x̄ = 3,42)
• It positively affects academic members’ professional development (x̄ = 3,27)
• It increases the workload of academic members (x̄ = 2,40)
• It causes tension within the institution (x̄ = 2,39)
Academic members’ lowest-rated expectations from a performance evaluation system appear as:
• It increases academic members’ motivation (x̄ = 1,90)
• It contributes to the development of a qualified institutional culture (values, attitude towards work,
understanding of responsibility, relationships etc.) (x̄ = 1,76)
• It helps academic members to get better prepared for their courses (x̄ = 1,70)
Table 3 presents the analysis results related to whether academic members’ expectations from a
performance evaluation system differ depending on academic incentive status.
Table 3.
T-test Results Related to Whether Expectations from a Performance Evaluation System Differ Depending
On Academic Incentive Status

Academic Incentive   N    Mean   Standard Deviation
Yes, I take          52   2,43   ,38
No, I don’t          52   2,16   ,45

t = 3,22; p = ,002
Firstly, the equality of variances was checked with the Levene test and the significance value in the
appropriate t column was accepted. As a result of the analysis, expectations from a performance evaluation
system significantly differ depending on academic incentive status [t(102) = 3,22, p < .05]. Academic members
who take the academic incentive - a financial aid which is given to academic members who produce a certain
number of research studies and projects - have a significantly higher level of expectations than those who do not.
Table 4 presents the ANOVA results related to whether academic members’ expectations from a
performance evaluation system differ depending on work experience.
Table 4.
The ANOVA Results Related To Whether Expectations from a Performance Evaluation System Differ
Depending On Work Experience

                     N     Mean   Standard Deviation
0-5 years            17    2,43   ,51
6-10 years           38    2,43   ,28
11-15 years          14    2,51   ,44
More than 15 years   35    2,00   ,39
Total                104   2,30   ,43

                 Sum of Squares   df    Mean of Squares   F       p
Between Groups   4,67             3     1,55              10,28   ,000
Within Groups    15,1             100

Source of Difference: 0-5 years > more than 15 years; 6-10 years > more than 15 years;
11-15 years > more than 15 years
When Table 4 is reviewed, it is clearly seen that the lowest expectation scores belong to academic
members who have more than 15 years of working experience. The mean scores of the other three groups are
significantly higher than the mean score of this group, but there is no significant difference
between the mean scores of these three groups. Table 5 presents the ANOVA results related to whether
academic members’ expectations from a performance evaluation system differ depending on their
satisfaction level with their institutions.
Table 5.
The ANOVA Results Related To Whether Expectations from a Performance Evaluation System Differ
Depending On Satisfaction Level with Institution

            N     Mean   Standard Deviation
Low         10    2,70   ,31
Moderate    35    2,39   ,32
High        42    2,00   ,47
Very high   17    1,80   ,11
Total       104   2,30   ,43859

                 Sum of Squares   df    Mean Square   F        p
Between Groups   5,97             3     1,991         14,383   ,000
Within Groups    13,08            100   ,138

Source of variation: Low, Moderate > High, Very High
In Table 5, a significant difference is observed between the mean scores of the groups (p<.05); therefore,
the Games-Howell post hoc test is done, due to the inequality of variances, for identifying the source of variation.
As a result of the post hoc test, it is observed that academic members who have a low or moderate level of
satisfaction with their institutions have a significantly higher level of expectations from a performance
evaluation system than those who have a high or very high level of satisfaction with their institution.
3.2 The Obstacles to a Performance Evaluation System
The second research question of this study, “What are the perceptions of academic members related
to the obstacles to a performance evaluation system?”, is attempted to be answered here. Table 6 presents the
mean and standard deviation of the academic members’ scores.
Table 6.
The General Mean Score of Academic Members Related To the Obstacles to a Performance Evaluation
System

                         N     Minimum   Maximum   Mean   Standard Deviation
The Obstacles Subscale   104   2,20      3,80      3,02   ,57517
When Table 6 is reviewed, it is seen that the mean score of academic members is high (x̄ = 3,02), which
means that academic members agree with the items in this subscale as obstacles to a performance
evaluation system. When it is analyzed item by item, the most frequently agreed obstacles are:
• Higher education institutions’ current organizational structure (hierarchical organization, distribution
of authority and responsibilities, autonomy limits of units) (x̄ = 3,80)
• Academic members’ workload (x̄ = 3,68)
Academic members least agree on the following obstacle to a performance evaluation system: “cultural
structure (ignoring the problems, personal conflicts, extreme tolerance, discomfort with criticism, lack of
confidence, lack of competitive understanding at European standards)” (x̄ = 1,91)
Table 7 presents the analysis results related to whether academic members’ perceptions of obstacles
to a performance evaluation system differ depending on academic incentive status.
Table 7.
The T-Test Results Related To Whether Academic Members’ Perceptions of Obstacles to a Performance
Evaluation System Differ Depending On Academic Incentive Status

Academic Incentive   N    Mean   Standard Deviation
Yes, I take          52   2,14   ,54
No, I don’t          52   2,74   ,51

t = 5,77; p = ,000
When Table 7 is reviewed, it is seen that academic members’ perceptions of obstacles differ depending
on academic incentive status at a significant level [t(102) = 5,77, p < .05]. Academic members who take the
academic incentive have significantly lower perceptions of obstacles to a performance evaluation system.
Table 8 presents the ANOVA results related to whether academic members’ perceptions of obstacles to a
performance evaluation system differ depending on academic title.
Table 8.
The ANOVA Results Related To Whether Academic Members’ Perceptions of Obstacles to a Performance
Evaluation System Differ Depending On Academic Title

                     N     Mean   Standard Deviation
Research Assistant   25    2,98   ,30181
Assistant Prof.      35    3,42   ,36202
Associate Prof.      31    3,38   ,63314
Professor            13    2,96   ,83254
Total                104   3,02   ,61101

                 Sum of Squares   df    Mean Square   F        p
Between Groups   11,089           3     3,696         13,508   ,000
Within Groups    27,365           100   ,274

Source of variation: Assistant Prof. > Research Assistant, Professor; Associate Prof. > Research Assistant,
Professor
When Table 8 is reviewed, it is seen that there is a statistically significant difference between groups
(p<.05); therefore, the Games-Howell post hoc test (for cases of unequal variances) is used for
identifying the source of variation. As a result of the analysis, it is observed that the highest obstacle scores
belong to Assistant Professors and Associate Professors, while the lowest scores belong to Research Assistants
and Professors. Table 9 presents the ANOVA results related to whether academic members’ perceptions
of obstacles to a performance evaluation system differ depending on working experience.
Table 9.
The ANOVA Results Related To Whether Perceptions of Obstacles to a Performance Evaluation System
Differ Depending On Working Experience

                     N     Mean   Standard Deviation
0-5 years            17    2,72   ,51
6-10 years           38    3,26   ,28
11-15 years          14    3,78   ,44
More than 15 years   35    2,88   ,39
Total                104   3,02   ,54

                 Sum of Squares   df    F       p
Between Groups   21,938           3     44,27   ,000
Within Groups    16,51            100

Source of variation: 11-15 years, 6-10 years > 0-5 years, more than 15 years; 11-15 years > 6-10 years
When Table 9 is reviewed, it is seen that there is a statistically significant difference; therefore, the
Games-Howell post hoc test is done. As a result of the test, it appears that academic members who have 0-5
years of working experience have the lowest scores on the obstacles subscale, and those who have 11-15 years
of working experience have the highest scores on the obstacles subscale. Academic members who have worked
more than 10 and less than 15 years think that almost all items in the subscale really pose an obstacle to a
performance evaluation.
Table 10 presents the ANOVA results related to whether academic members’ perceptions of obstacles
to a performance evaluation system differ depending on satisfaction level with institution.
Table 10.
The ANOVA Results Related To Whether Academic Members’ Perceptions of Obstacles to a Performance
Evaluation System Differ Depending On Satisfaction Level With Institution

            N     Mean   Standard Deviation
Low         10    3,36   ,31
Moderate    35    3,58   ,32
High        42    2,62   ,47
Very high   17    2,58   ,11
Total       104   3,02   ,43859

                 Sum of Squares   df    Mean Square   F       p
Between Groups   5,97             3     1,991         14,38   ,00
Within Groups    13,08            100   ,138

Source of variation: Low, Moderate Level > High, Very high
When Table 10 is reviewed, it is seen that there is a statistically significant difference; therefore, the
Games-Howell post hoc test is done. As a result of the test, it appears that academic members who have a high
or very high level of satisfaction with their institutions have significantly lower scores of obstacles to a
performance evaluation system.
3.3 Qualitative Analysis of Academic Members’ General Views Related to the Performance Evaluation
System
Within the context of this study, qualitative data have been obtained from academic members related to
their views about a performance evaluation system. The data collected have been analyzed with content
analysis. As a result of the content analysis, the following six themes emerge: “attitude towards
performance evaluation, priorities of academic members, positive effects of performance
evaluation, negative effects of performance evaluation, obstacles to performance evaluation and
suggestions for obstacles”.
1. What do you think about making a periodic and data-based assessment of academic
members?
There is a difference of opinion among academic members in Education Faculties. Although it appears
that most of the academic members support a periodic and data-based assessment, there are some other
academic members who have negative attitudes towards and criticisms of such a system, asserting that it is
wide open to abuse. Table 11 presents the analysis of qualitative data about this theme.
Table 11.
The Analysis of Data Related to Making a Periodic Performance Evaluation

Theme: Attitude Towards Performance Evaluation
Description: Having positive, negative or hesitant attitudes about performance evaluation
Codes (Frequency): Adopters (28), Doubters (12), Resistants (10)
When Table 11 is reviewed, it is seen that most of the academic members adopt such a system. They
claim that performance evaluation would support their development in many aspects. The codes related
to views of academic members are given below:
Adopters: “I believe that performance evaluation will bring about good results in assuring quality in
higher education” (K6)
Doubters: “It is very nice to be supported by the system. But is it all about publishing? It is a matter of
question for me how this evaluation will be done, and by whom?” (K5)
Resistants: “performance cannot be assessed. It is ridiculous to compare individuals. It has been tried
many times before, but it has been found to be useless” (K13)
2. What criteria should be assessed within performance evaluation? Could you order these criteria
according to significance level for you?
Academic members in Education Faculties express a variety of views related to what criteria should be
included within the evaluation. They also express the significance level of these criteria, which provides
valuable qualitative data. Table 12 presents the analysis of qualitative data about which criteria should be
included within performance evaluation.
Table 12.
The Codes Related To Academic Members’ Preference for Performance Evaluation Criteria

Theme: Priorities of Academic Members
Descriptors: The criteria which should be included within performance evaluation and the ordering of these
criteria according to significance level
Codes (Frequency): Research and publications (17), The quality of instruction (10), Undergraduate and
postgraduate advisory (8), Workload (course hours etc.) (6), Jury memberships (thesis juries, Associate
Professorship juries etc.) (5), Personal interest and career (4)
When Table 12 is reviewed, it is seen that academic members first of all want their research
and publications to be assessed, and then their teaching quality in the classroom. According to them,
teaching quality includes the methods they use, the quality of the presentation of content, material use and
every piece of effort which makes learning permanent. Academic members also want their personal interests
and career to be assessed by the system. The codes related to the views of academic members about
performance evaluation criteria are given below:
Research and Publications: “The most important criteria in a performance evaluation system must be
academic members’ publications in terms of quantity and quality” (K6)
The quality of instruction: “Instruction is as important as academic studies. Classroom work, especially
activities and teaching methods can be assessed”
Undergraduate and postgraduate advisory: “We are not just researchers, but also advisors and this
issue is ignored by the system. For instance, thesis advisory is a tedious job and should be included within
evaluation” (K22)
Workload: “There is no time left for other things rather than teaching courses. An academic member
should be assessed with his/her courses, efforts he/she makes for students and administration. Academic
members who teach more courses are the best academicians.” (K30)
3. What are the positive and negative consequences of making a performance evaluation of academic
members?
Academic members in Education Faculties put emphasis on both positive and negative impacts of a
performance evaluation system. In “Positive Impacts” theme, the codes appear as “motivation”, “financial
support”, “search of quality”, “support of development via self-criticism”, “continuity of dynamism”;
however, in “Negative Impacts” theme, the codes appear as “intra-institutional rivalry”, “academic
dishonesty”, “cause of stress”, “domination of quantity over quality”. Table 13 presents the analysis of
qualitative data about positive and negative impacts of performance evaluation.
Table 13.
The Codes Related To Academic Members’ Views about Positive and Negative Impacts of Performance
Evaluation

Theme: Positive Impacts (positive consequences of the performance evaluation system)
Codes (Frequency): Motivation (12), Financial Support (8), Search of quality (8), Supporting development via
self-criticism (4), The continuity of dynamism (4)

Theme: Negative Impacts (negative consequences of performance evaluation)
Codes (Frequency): Intra-institutional rivalry (7), Academic dishonesty (6), Cause of stress (6), Domination of
quantity over quality (8)
When Table 13 is reviewed, it is seen that academic members express 36 views under 5 codes related
to the positive impacts theme and 27 views under 4 codes related to the negative impacts theme. The codes
related to the views of academic members about positive and negative impacts of performance evaluation
are given below:
Motivation: “This system motivates academic members to conduct new studies” (K24)
Search of quality: “academic members who are subject to an evaluation feel the need to pursue quality.
No one wants to be called a bad teacher” (K9)
The continuity of dynamism: “In public universities, especially old academic members are resistant to
renewing themselves. This situation leads to fossilization in higher education, because there is no evaluation
and sanction. Evaluation results in dynamism” (K29)
Intra-institutional rivalry: “it prevents cooperation, grows jealousy, a competitive environment
increases egoist behaviors rather than productivity” (K36)
Academic dishonesty: “conducting research with fake data, requesting others to write his/her name as
the last name in studies with no effort”
Domination of quantity over quality: “publish publish publish, it is enough. There are lots of academic
members who do research but what about the quality? No one asks this question. No one talks
about quality now”
4. What are the obstacles to the performance of academic members in higher education and what do you
suggest for overcoming these obstacles?
Academic members in Education Faculties list a number of obstacles to a performance evaluation
system and then offer some suggestions for overcoming these obstacles. In the “Obstacles” theme, the codes
appear as “intensive workload (courses, advisory, administrative duties)”, “efforts are not appreciated”,
“cumbersome organizational process”, “lack of internal motivation” and “too crowded classrooms”; in the
“Suggestions” theme, the codes appear as “reducing course loads of academic members”, “institutional
support for academic efforts and research publications”, “evaluation criteria determined by universities”,
“periodical budget allocation to academic members from the Council of Higher Education” and lastly
“employing more officers”. Table 14 presents the analysis of qualitative data related to academic
members’ views about obstacles and suggestions.
Table 14.
The Codes Related To Academic Members’ Views about Obstacles to a Performance Evaluation and Their
Suggestions

Theme: Obstacles (obstacles to a performance evaluation system)
Codes (Frequency): Intensive workload (courses, advisory, administrative duties) (18), Efforts are not
appreciated (12), Cumbersome organizational process (10), Too crowded classrooms (8), Lack of internal
motivation (6)

Theme: Suggestions (suggestions for overcoming obstacles)
Codes (Frequency): Reducing the course load of academic members (11), Institutional support for academic
efforts and research publications (11), Evaluation criteria should be determined by universities (10),
Periodical budget allocation to academic members from YÖK (8), Employing more officers (4)
When Table 14 is reviewed, it is seen that academic members express 48 views under 5 codes in the
“Obstacles” theme and 44 views under 5 codes in the “Suggestions” theme. The codes related to the views of
academic members about obstacles to a performance evaluation and their suggestions are given below:
Intensive workload: “it takes time to make something of high quality. There is no time left for academic
members. They teach courses, take care of students or are busy with administrative duties.” (K25)
Lack of internal motivation: “there are lots of things in academic life which decrease motivation. If an
individual starts this profession for some other reasons, he/she has a low level of motivation for
self-development” (K19)
Cumbersome organizational process: “bureaucracy and very slowly running processes put an obstacle to
performance while making projects or other studies” (K8)
Institutional support for academic efforts and research publications: “the most important suggestion
for a performance increase is that academic members should be supported by the institution. This might
include research, publication, congress participation or training for self-development” (K7)
Employing more officers: “If the institution employs more officers, academic members will be freed
from paperwork” (K21)
Periodic budget allocation to academic members from YÖK: “The Council of Higher Education should
allocate a certain amount of budget for academic members, ask them to plan their budget use and make
a budget-product comparison at the end of the period” (K10)
Discussion & Conclusion
In accordance with the findings of this study, it is observed that there is a difference of opinion among
academic members related to the performance evaluation system. It is seen that academic members in
education faculties who have more than 15 years of working experience and who are highly satisfied have
lower expectations about performance evaluations than others. When academic members’ views are reviewed
depending on academic title, it is seen that research assistants and assistant professors have a positive
attitude towards performance evaluation, while associate professors and professors show a lower level of
positive attitudes towards performance evaluation. Accordingly, Stonebraker and Stone (2015)
emphasize that there is an increase in the average age of academic members with the elimination of
mandatory retirement and this raises some concerns about the impact of this aging on productivity in
class. They claim that age has a negative impact on student ratings of faculty members that is strong
across genders and groups of academic disciplines. However, this negative effect begins after faculty
members reach their mid-forties. This explains the reason for the negative attitudes of professors towards
a performance evaluation system. This finding is also parallel with Esen and Esen’s (2015) study findings.
They find in their study that there is a decrease in academic members’ positive perception of
the positive impacts of performance evaluation as they progress in academic titles. Bianchini, Lissoni
and Pezzoni (2013) emphasize that students tend to evaluate professors’ performance more
negatively than assistant professors’. From a general point of view, it appears in this study that there is
hesitation and a lack of confidence in the academic community about the efficiency of a performance
evaluation system.
This study indicates that academic members expect a performance evaluation system to develop
a consensus about the criteria of an effective academician and to positively affect the professional development
of academic members; on the other hand, they expect it to increase the workload of academic members and
lead to intra-institutional tension. The qualitative analysis also shows that nearly half of the academic members
support performance evaluation while a certain number of academic members hesitate about how it will be
applied and by whom. The academic members in the faculties of education claim that performance
evaluation increases motivation and the search for quality, but it may also lead to competition within the
institution and academic fraud. Traditionally, performance evaluation in faculties tends to focus on
research indicators (Bogt and Scapens, 2012); therefore, higher education institutions plan their
evaluations considering governmental funding, research awards and high rankings, which all lead to an
evaluation which only favours academic members with top publications (Douglas, 2013; Hopwood, 2008).
These findings differ to a certain extent from the studies of Tonbul (2008), Esen and Esen (2015) and Başbuğ
and Ünsal (2010). Tonbul (2008) asserts that academicians have higher expectations from a performance
evaluation approach because they think the evaluation approach helps to identify the obstacles to an effective
performance and to recognize one’s own deficiencies. Accordingly, Esen and Esen (2015) emphasize that
academic members expect a performance evaluation system to develop a qualified organizational
culture, provide continuity of organizational innovation, positively affect the professional development of
academic members and help them recognize their own deficiencies. This study also indicates that the most
important obstacles to performance evaluation appear as the organizational processes of higher education
institutions, intensive workload and lack of intrinsic motivation. Within the scope of the proposals, academic
members request the employment of more officers and institutional support for their publications and academic
studies. As a result of Tonbul’s (2008) study, he lists the obstacles to performance evaluation as
inadequacy of organizational opportunities, the organizational culture and uncertainty in evaluation
criteria. In the study of Esen and Esen (2015), it is found that the most important factors which pose an
obstacle to performance evaluation are inadequacy of organizational opportunities, the current
organizational processes of higher education institutions and academic promotion criteria. Also, Başbuğ and
Ünsal (2010) claim that the lack of physical conditions for scientific research is the most significant factor
which poses an obstacle to academic performance.
Academic members in this study emphasize that they prefer to be evaluated according to the following
criteria: first of all their academic publications and research, secondly their quality of instruction and
thirdly their counseling service to postgraduates. This finding is supported by Braunstein and Benston
(1973), who find that research and visibility are highly related in the evaluation of the performance of
academic members, but effective teaching is only moderately related to these performance criteria. In
practice, the evaluation of academic members’ instruction is mostly done by students. Arnăutu and Panc
(2015) criticize this situation by claiming that research and scientific productivity, administrative capacity
and reputation are not represented in the evaluation made by students; therefore, students do not have the
information necessary to evaluate academic members’ role within the faculty. Ünver (2012) conducts
research about the evaluation of academic members by students and finds that most of the academic
members think that students fail to make an objective evaluation of academic members; therefore, they
prefer making academic studies rather than focusing on students’ views about their teaching
performance. Turpen, Henderson, and Dancy (2012) state that faculties focus on students’ test
performance and academic success as quality criteria while higher education institutions focus on
quantitative scoring of students when evaluating the quality of teaching. In this respect, the quality
of the measurement tools is very important for the assessment of teaching performance. Kalaycı and Çimen
(2012) examine the assessment tools used in the process of evaluating the instructional performance of
academicians in higher education institutions and find that quality of instruction and course
evaluation surveys are developed without any particular approach and twenty percent of items are
inappropriate according to item construction and writing rules; therefore, these assessment tools fail to
evaluate academic members’ performance. It is shown in some studies that the assessment of the
performance of instructors by students may be related to the quality of the teaching as well as to
qualities such as physical attraction and the comfort of the course, which are not related to teaching
(Hornstein, 2017; Tan et al., 2019). Shao, Anderson, and Newsome (2007) claim that academic members
request that peers/colleagues’ considerations be included in performance assessment, along with other
criteria such as class visits, preparation for class and follow-up of current developments in the field. There are
other factors affecting performance evaluations of academic members. Özgüngör and Duru (2014) find that
there is a deterioration in perceptions of the instructors as there is an increase in course load, instructors’
experience, and the total number of students taking the instructor’s course. It turns out that the students of
the Faculty of Education tend to give higher scores to faculty members than the students of all other
faculties, whereas the students of the Faculty of Technical Education and Engineering give lower scores
to faculty members. It is also revealed that faculty members with a course load of 45 hours or more
are evaluated more negatively than other faculty members with less course load. In the Faculty of Education,
the faculty members with 60-100 students receive the worst performance evaluations. Arnăutu and Panc
(2015) refer to students and academic members’ different expectations from each other, claiming that
students focus on communicative issues and expect from professors a good relationship and personalized
feedback, while professors believe that the attention should be focused on the quality of the education
process (such as information update).
In this study, it is found that the performance evaluation of academic members creates a
consensus on the criteria of the effective academic member and positively affects the professional
development of academic members. These qualifications enhance the professional quality of
academic members working in the faculties of education and provide a sustainable professional
development process. Filipe, Silva, Stulting and Golnik (2014) emphasize that sustainable professional
development improved through performance evaluation is not only limited to educational activities, but
also develops qualities such as management, teamwork, professionalism, interpersonal communication
and accountability. Açan and Saydan (2009) attempt to determine the academic quality characteristics of
academic members and come up with these criteria: “the teaching ability of the instructor, the
assessment and evaluation skills of the instructor, the empathy of the instructor, the professional
responsibility of the instructor, the instructor's interest in the course and the gentleness of the instructor”.
Esen and Esen (2015) state that the performances of faculty members in the United States are generally
based on four factors which include instruction, research (professional development), community service
and administrative service. Among them, they emphasize that the most important ones are the instruction
and research dimensions. Performance evaluation results are used for making decisions about whether
academic members are appropriate in their current position, promoting them or extending their working
periods.
In this study, it is seen that academic members who do not receive the academic incentive have lower
expectations than those who deserve such a payment. Kalaycı (2008), regarding the performance evaluation
system in Turkey, claims that it is not even at the preparation stage compared to global practices. However,
a number of promising developments have occurred in this area in Turkish higher education. Focusing
on this problem, the Council of Higher Education in Turkey decided to create the Higher Education Quality
Council in 2015 to provide assurance that “a higher education institution or program fully fulfills the
quality and performance processes in line with internal and external quality standards”. In parallel, the
Academic Incentive Award Regulation has been put into practice in order to evaluate the performance of
academic staff working in higher education according to standard and objective principles, to increase the
effectiveness of scientific and academic studies and to support academic members. It seems to achieve
its aim because in this study academic members who receive the incentive are highly motivated and they
reach a consensus on the criteria of the effective faculty member which are in compliance with the academic
incentive award.
It is important to make performance evaluations in higher education in terms of increasing the efficiency
of services; however, it is also important to determine which criteria will be used and to assure the reliability
of the assessment. In this respect, Çakıroğlu, Aydın and Uzuntiryaki (2009) claim that there is very promising
research about the reliability of experienced academic members’ evaluations and they emphasize that
the following criteria should be considered within the context of evaluations:
• Data about instructional performance should be collected from a variety of sources (colleagues,
students, advisors, postgraduate students, graduates etc.) and in a variety of formats (student evaluation
surveys, student interviews, observation results, course materials, student products etc.),
• clearly identifying evaluation criteria,
• informing evaluators about how to carry out the evaluation process,
• selecting evaluators randomly from candidates who meet the criteria of being an evaluator,
• the jury should include at least 3 and at most 5 members.
To sum up, academic members’ views about performance evaluation are analyzed and it is recognized
that there is no consensus among academic members about performance evaluation. Academic members
are aware of the positive impacts of such a system; however, they also have concerns about the reliability
of assessment, the evaluation criteria, the evaluation process and the evaluators. This study indicates that the
most important criteria for academic members which should be included in an evaluation are research and
publication, quality of instruction and undergraduate & postgraduate advisory. Among the positive impacts
of a performance evaluation system, it stands out that performance evaluation motivates academic
members, provides financial support and leads to a search for quality; however, academic members also put
emphasis on negative impacts of such a system, which include intra-institutional competition and
academic fraud. Academic members make some suggestions for overcoming obstacles, which include
reducing course loads, providing more institutional support for academic efforts, allocation of a certain
amount of budget for each member from the Council of Higher Education and employing more officers.
There is a variety of requests about performance evaluation criteria; however, it is important to establish
an effective evaluation system based on monitoring of performance with multiple data types in terms
of improving the quality of higher education and making systematic improvements.
As a result of this research, it is recommended that higher education institutions increase the
objectivity and efficiency of performance evaluations and create human resources services within
faculties. Also, they should design sustainable, strong performance plans, use a holistic evaluation cycle,
provide consultancy services to academic members, students and internal stakeholders on how to
improve performance, prepare understandable and objective guidelines for performance evaluators, and
develop an institutional culture which specifies that feedback is valuable, not judgmental.
Turkish Version
Introduction
The systematic measurement and evaluation of performance is a dimension that every kind of organization
treats with great care for the sake of a transparent, effective and successful operation. Most private and public
higher education institutions carry out various studies towards a systematic evaluation. Owing to reasons such
as increasing global competition and societal pressure for transparency, universities feel the need not only to
set standard performance indicators but also to demonstrate the extent to which they achieve their vision,
mission and strategies (Hamid, Leen, Pei & Ijab, 2008). Especially in today's competitive environment, a better
evaluation system offers universities guiding advantages and gives them the opportunity to evaluate their own
staff and operations. The public funding of state universities provides a suitable environment for carrying out
an efficient performance evaluation without being under pressure.
A review of the literature shows that there are various debates about the accountability of higher education
institutions. These debates are fundamentally concerned with evaluating the performance of institutions and
publishing the results openly in a way that allows the participation of other stakeholders. Another criticism
directed at higher education is the claim that academic members, who are among the most important
determinants of university performance, live in an “ivory tower” as a “closed society” detached from the world
(Glaser, Halliday, & Eliot, 2003). Esen and Esen (2015) summarize these criticisms as follows:
• the research conducted by academic members is not directed at societal problems,
• it is excessively theoretical,
• societal resources are wasted in vain (Etzkowitz, Webster, Gebhardt, & Terra, 2000),
• instead of being transformed into communal benefit, research is conducted one-sidedly and confined
to its own field,
• the academician identity turns into one with constricted autonomy, anxious about unsettling the
university or the administrative structure (Elton, 1999).
Yükseköğretim kurumları, her ne kadar özerk olarak görev yapsa da bireysel organizasyon ve kuruluşlar
gibi ele alınmamalıdır. Yükseköğretim kurumları ait oldukları toplumu, ekonomik yapıyı ve sosyal yaşamı
etkileme gücüne sahip kurumlardır. Dolayısıyla, üniversiteler fildişi kuleler yerine bilim, toplum ve ulusu
birarada ele alıp uluslararası kalite standartlarında performans göstermeli ve kariyer gelişimi yerine
toplumsal faydayı ön planda tutmayı vicdani bir sorumluluk olarak hissetmelidirler. Üniversitelerde
performans değerlendirmenin yapılması, çalışanlara hesap verebilirlik (bilimsel araştırmalar için sürekli
iyileştirme faaliyetleri), devlete hesap verebilirlik (kaynakların verimli ve üretken kullanımı), öğrenci ve
topluma hesap verebilirlik (kapsamlı eğitsel deneyimler sunma, yaşam kalitesini artıracak mesleki
eğitimler sunma, toplumun işgücü ihtiyacını karşılama) açısından gereklidir (Vidovich & Slee, 2001). Ayrıca,
yükseköğretimde performans değerlendirmeyi gerekli hale getiren küresel gelişmeleri UNESCO (2004)
“girişimci üniversiteler, şirket üniversiteleri gibi yeni kurumlar; uzaktan, sanal ve özel şirketler gibi yeni
eğitim hizmeti dağıtım türleri; yeterlilik ve sertifikaların daha fazla çeşitlenmesi; yurtdışına yönelik artan
öğrenci, program, tedarikçi ve proje hareketliliği; yükseköğretim sunumunda artan özel yatırımlar” şeklinde sıralamıştır. Bu gelişmelerin kalite, erişim, çeşitlilik ve finansman açısından yükseköğretime yönelik önemli çıkarımları bulunmaktadır (akt. Tezsürücü & Bursalıoğlu, 2013). Yükseköğretimde performans değerlendirmesi hem
çeşitli süreçleri hem de ürünleri kapsamaktadır. Temelinde, performans değerlendirmesi kalite açısından
kabul edilebilir minimum düzeyi göstermekte ve bireylerin/kurumların gelişmeye açık yönlerini
1319
Gürol YOKUŞ & Tuğba YANPAR YELKEN.– Çukurova Üniversitesi Eğitim Fakültesi Dergisi, 48(2), 2019, 1299-1339
tanımalarına olanak sağlamaktadır. Birey veya kurumlar, sadece gelişmeye açık yönlerinin farkına
varmamakta; mevcut haliyle hangi yönlerde güçlü olduğunu da saptamaktadırlar. Batool, Qureshi & Raouf
(2010), performans değerlendirmesi denildiğinde bu kavramın bütün boyutları kapsamayabileceğini,
kurumsal performans değerlendirmesinin, akademik programların, derslerin veya mezunların kalitesini
ölçmekle aynı anlama gelmediğini belirtmişlerdir. Kurumsal performans değerlendirme daha çok kurumun
kalite ve etkililiği açısından mevcut durumunun değerlendirilmesi demek olduğunun altını çizmişlerdir.
Bu çalışma kapsamında yükseköğretimde performans değerlendirmesi «öğretim elemanlarının
öğretimsel rollerine ilişkin mesleki yeterliliğinin ve aynı zamanda kurumsal hedeflerin yerine getirilmesine
katkı düzeyinin ölçülmesi» olarak tanımlanabilir. Öğretim elemanlarının araştırma, akademik hizmet,
eğitim-öğretim, yayın gibi çeşitli çalışmalarının değerlendirilmesi, geri dönüt verilerek bireylerin
gelişiminin desteklenmesi ve çalışmalarının takdir edilmesi, performans değerlendirme sisteminin varlığını
zorunlu hale getirmektedir. Vincent ve Nithila (2010), yükseköğretimde gerçekleştirilecek bir performans
değerlendirmesi yaklaşımının sağlayacağı avantajlar arasında şunları dile getirmektedir:
• Bireyin gelişim ve ilerlemesinin gerçekçi hedeflere dayanmasını sağlar.
• Bireyin hedefleriyle kurumun hedeflerini birbirine uygun hale getirir.
• Organizasyon içindeki bireylerin zayıf yönleri ve güçlü yönlerini teşhis eder.
• İyileştirme amaçlı geri dönüt mekanizması işlevi görür.
• İhtiyaç duyulan eğitim ve kursları belirlemeye yarar.
• Kurumun eğitsel, toplumsal, ekonomik ve siyasal olarak daha büyük rol ve sorumluluklar almasını
sağlar.
Tonbul (2008) ise performans değerlendirme uygulamalarının, örgütsel hedeflerin gerçekleşme
düzeyini artıracağı, kurumsal işleyişte aksayan yönlerin saptanmasını kolaylaştıracağı, örgütsel iklim ve
kurum kültürünün çalışanlar üzerindeki etkisine ilişkin özgül veriler sağlayabileceği ve bu doğrultuda
örgütsel performansın artacağını belirtmiştir. İş akışı ve organizasyonla ilgili süreçlerde, geribildirim
düzeneğini etkin ve işlevsel biçimde işe koşan örgütlerin daha başarılı ve kalıcı oldukları görülmektedir
(Latham & Pinder, 2005). Kalaycı (2009) yükseköğretimde değerlendirme yapmadan başarıyı veya
başarısızlığı yordama olasılığının düşük olduğunu; fakat akademisyenlerin öğretim performanslarının
değerlendirilmesiyle birlikte öğrenme-öğretme ortamlarının herkesçe sorgulamaya açık hale geleceğini,
bu durumun ise oldukça zorlayıcı olduğunu ifade etmiştir. Bununla ilgili olarak, Kim ve diğerleri (2016) fakülte değerlendirme sistemi araştırmaya dayalı olduğu için pek çok profesörün eğitimcilik rolüne daha düşük önem verdiğine, araştırmacı rolüne ise daha büyük öncelik verdiğine vurgu yapmışlardır.
Performans değerlendirme sadece zorunluluk ve formalite amacıyla yapılmamalıdır. Bu tehlike özellikle
devlet üniversiteleri için ihtimal dâhilindedir. Kalaycı ve Çimen (2012) “artık devlet üniversitelerinin de
“kalite süreçleri uygulamalarını” formaliteyi tamamlamak amacıyla değil, gerçekten kaliteyi yükseltmek ve
rekabette öne çıkmak amacı ile yürütmesi gerektiğini, devlet üniversitelerinin de kalite çalışmalarına
gereksinimi olduğunu” belirtmişlerdir.
21. yüzyılda üniversitelerini performans değerlendirmeye zorlayan sebepler arasında kurumsal itibar,
uluslararasılaşma ve dünya üniversite sıralamaları yer almaktadır. Kurumsal itibarı belirleyen pek çok
faktör bulunmaktadır. Higher Education Authority’nin (2013) araştırma ve öğretimle ilgili yaptığı itibar
araştırmasında; akademisyenlerin uzmanlık alanlarındaki bölümlerle yakından ilgilendikleri ve bilgi sahibi
oldukları ortaya çıkmıştır. Kurumun uluslararası ve ulusal hem öğretim elemanı hem öğrenci barındırması
kurumun global kimliğe sahip olduğu ve küresel markette rekabete hazır olduğu izlenimi verdiği ifade
edilmiştir (O'Connor ve diğerleri, 2013). Kurumun uluslararası öğretim elemanı, öğrenci bulundurması
başlı başına yeterli değildir. Bir üniversitenin kalitesi ve niteliğine ilişkin en önemli göstergelerden biri
öğretim elemanlarının performansı ve bununla doğrudan ilgili olarak verilen derslerin kalite düzeyidir.
Öğretim elemanı kalitesi, eğitimin kalitesini doğrudan etkileyen faktörlerin başında gelmekte, öğretim
elemanlarının performanslarının değerlendirilmesi kalite kontrolünün güvencesi olarak görülmektedir
(Açan ve Saydan, 2009).
1320
Gürol YOKUŞ & Tuğba YANPAR YELKEN.– Çukurova Üniversitesi Eğitim Fakültesi Dergisi, 48(2), 2019, 1299-1339
Yükseköğretim kurumları da dâhil olmak üzere kurumsal anlamda en sık kullanılan performans ölçüm
ve değerlendirme tekniklerine bakıldığında bunların «Öz Değerlendirme, Temel Performans Göstergeleri
(TPG), Göreceli Değerlendirme, Takdir Etme, Altı Sigma, Toplam Kalite Yönetimi» olduğu görülmektedir
(Çalışkan, 2006; Kalaycı, 2009; Paige, 2005). Öğretim üyelerinin bireysel değerlendirmesi kapsamında
burada belirtilen tekniklerin hepsi uygun veya uygulanabilir olmayabilir. Örneğin, performans
karşılaştırması tekniği bir bireyin aynı bağlamda öncü/örnek/lider kabul edilen bir başkasıyla
karşılaştırılarak, mevcut performansının değerlendirilmesini içermektedir. Anlaşıldığı üzere, bu teknik
mükemmeliyet arayışında en iyi örneklerin rehberlik edici yönünü kullanmak isteyen bir organizasyon için
uygun olabilir; fakat bütün personelin değerlendirilmesinde uygun değildir; çünkü her birey çalışma şekli
açısından ve kendini geliştirme yöntemi olarak birbirinden ayrılmaktadır. Bu teknikler arasında örneğin,
TPG tekniği, yükseköğretimde öğretici konumunda olanların performanslarını değerlendirmede kullanmak
için uygundur. TPG tekniğinde değerlendirilecek performans göstergelerinin işe vuruk tanımları yapılır. İşe
vuruk tanımda önemli olan, bir kavramın hangi işlemlerle tanımlandığının belirtilmesidir. Küresel düzeyde
gerçekleştirilen performans ölçümleri ve değerlendirme teknikleri her ülkenin yükseköğretim
kurumlarında birebir aynı şekilde uygulanmayabilir. Türkiye'de performans değerlendirmeye ilişkin
mevcut uygulamalara bakıldığında “öğretim üyelerinin sadece araştırma ve yayın etkinlikleri konusundaki
performansını nicel olarak ölçüldüğü ya da subjektif yargılar temelinde değerlendirme yapıldığı”
görülmektedir (Esen ve Esen, 2015). Bununla ilgili olarak Yükseköğretim Kurumu Türkiye’deki
akademisyenlerin akademik faaliyetlerini desteklemek ve motivasyonlarını artırmak amacıyla 2015 yılında
akademik teşvik uygulaması başlatmıştır. Bu yönetmelikte “Devlet yükseköğretim kurumları kadrolarında
bulunan öğretim elemanlarına yapılacak olan akademik teşvik ödeneğinin uygulanmasına yönelik olarak,
bilim alanlarının özellikleri ve öğretim elemanlarının unvanlarına göre akademik teşvik puanlarının
hesaplanmasında esas alınacak faaliyetlerin ayrıntılı özellikleri ve bu faaliyetlerin puan karşılıkları ile bu
hesaplamaları yapacak komisyonun oluşumu” hakkında detaylı değerlendirme ölçütleri yer almaktadır
(Akademik Teşvik Ödeneği Yönetmeliği, 2015). Akademik teşvik sistemi ile birlikte öğretim elemanlarının
ulusal ve/veya uluslararası yürüttükleri proje, araştırma, yayın, sergi, aldıkları patent, çalışmalarına yapılan
atıflar, almış oldukları akademik ödüller esas alınarak Yükseköğretim Kurulu tarafından performansları
değerlendirilmektedir. Bunun sonucunda yeterli çalışmayı gerçekleştiren öğretim elemanları maddi açıdan
desteklenmektedirler. Alan yazındaki öğretim elemanlarının performans değerlendirmelerinin nasıl
yapıldığına bakıldığında ise çeşitli yöntemlerin olduğu görülmektedir. Türkiye'de öğretim üyelerinin
performanslarını değerlendirmede kullanılabilecek birbirinden bağımsız çeşitli yöntemler şunlardır:
a. Sicil sistemi
b. Akademik yükseltilme ve atanma kriterleri
c. Öğretim üyesi değerlendirme anketleri
d. Yıllık sunulan faaliyet raporları
e. Akademik teşvik uygulaması
f. Öğrenci anketleri
(Esen ve Esen, 2015)
Yükseköğretimde performans değerlendirmenin yapılması, verilen hizmetlerin etkililiğini artırma
açısından oldukça önemlidir; fakat yapılacak performans değerlendirmenin hangi kriterlere göre
yapılacağı ve güvenirliği en az onun kadar önemlidir. Bu konuda, Çakıroğlu, Aydın ve Uzuntiryaki (2009)
“deneyimli öğretim üyelerinin yaptığı değerlendirmelerin güvenirliği konusundaki araştırmaların oldukça
ümit verici” olduğunu belirtmişler ve aşağıdaki kriterlerin göz önünde bulundurulması gerektiğine vurgu
yapmışlardır:
• öğretim performansına yönelik verilerin çeşitli kaynaklardan (meslektaş, öğrenci, danışman,
lisansüstü öğrencisi, mezun gibi) toplanması ve farklı formatlarda (öğrenci değerlendirme anketleri,
öğrenci görüşmeleri, gözlem sonuçları, ders materyalleri, öğrenci ürünleri vb.) olması,
1321
Gürol YOKUŞ & Tuğba YANPAR YELKEN.– Çukurova Üniversitesi Eğitim Fakültesi Dergisi, 48(2), 2019, 1299-1339
• değerlendirme kriterlerinin açıkça belirlenmesi,
• değerlendirilecek kişilere nasıl değerlendirileceğine yönelik bilgilendirme yapılması,
• değerlendiricilere nasıl değerlendirme yapacaklarına yönelik bilgilendirme yapılması,
• aday konumunda olan kişilerin değerlendirici rolü almaması,
• değerlendiricilerin kriterleri sağlayanlar arasında rastgele yöntemle seçilmesi,
• jürinin en az 3 en çok 5 üyeden oluşması.
Öğretim elemanlarının performanslarının değerlendirilmesinin temelinde üniversitelerin etkililiğini
artırma amacı yatmaktadır. Bu çalışmada eğitim fakültelerinin tercih edilmesinin sebebi özellikle “Bologna
Süreci” kapsamında YÖK’ün eğitim fakültelerinde akreditasyon çalışmaları üzerinde önemle durmasıdır.
Üniversitelerde eğitim fakültelerinde yürütülen akreditasyon çalışmalarında akademik personelin
performans değerlendirmeye yönelik beklentilerinin ve engellerin belirlenmesi amaca ulaşma bakımından
önemlidir. Türkiye’de bulunan yükseköğretim kurumları bir kalite göstergesi olarak hesap verebilirliğini
artırmayı ve mevcut durumlarını iç ve dış paydaşlarına bildirmeyi amaçlamaktadırlar. Üniversiteler bu
kapsamda misyon ve vizyonlarını gerçekleştirdiklerini kanıtlamak amacıyla öğretim elemanlarına ait
performans değerlendirme çalışmaları yürütmekte ve bunu rapor olarak halkın, öğrencilerin, ailelerin,
hükümetin, özel sektörün bilgisine arz etmektedirler. Ulusal ve küresel ölçekte üniversiteler üzerinde
kalite, verimlilik, etkililik, hesap verebilirlik gibi kavramlardan dolayı sistematik olarak performans
değerlendirmeleri yapmaya yönelik artan bir baskı bulunmaktadır. Dolayısıyla, performans değerlendirme
yükseköğretim kurumları için bu kadar önemliyken performansı değerlendirilen öğretim elemanlarının
beklentilerinin ne olduğunun belirlenmesi konusunda araştırma yapılmasına ihtiyaç bulunmaktadır.
Yükseköğretimde öğretim elemanlarının performans değerlendirme yaklaşımından beklentileri ve
performans değerlendirmenin önündeki engellere ilişkin görüşlerinin nicel ve nitel olarak incelemeyi
amaçlayan bu çalışma kapsamında, aşağıdaki sorular araştırılmıştır:
1. Eğitim Fakültesindeki öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin
beklentileri nasıldır?
1.1. Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin beklentileri çeşitli
değişkenlere göre anlamlı farklılık göstermekte midir? (akademik ünvan, akademik deneyim, teşvik alma
durumu, kurumundan memnuniyet düzeyi)
2. Eğitim Fakültesindeki öğretim elemanlarının performans değerlendirme sisteminin önündeki
engellere ilişkin görüşleri nasıldır?
2.1. Öğretim elemanlarının performans değerlendirme sisteminin önündeki engellere ilişkin algıları
çeşitli değişkenlere göre anlamlı farklılık göstermekte midir? (akademik ünvan, akademik deneyim, teşvik
alma durumu, kurumundan memnuniyet düzeyi)
3. Eğitim Fakültesindeki öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin genel
görüşleri nelerdir?
Yöntem
Bu araştırma kapsamında karma araştırma yöntemlerinden yakınsayan paralel karma desen tercih
edilmiştir. Nicel ve nitel veriler eş zamanlı toplanmış, ayrı ayrı analiz edilmiş ve bulguları karşılaştırılmıştır.
Yakınsayan paralel desende, nitel ve nicel araştırmalara eşit öncelik tanınır, analiz sırasında ayrı
çözümlemeler yapılır ve en sonunda birlikte yorumlama gerçekleşir (Creswell ve Plano Clark, 2014). Bu
araştırmada kullanılan karma desen Şekil 1‘de gösterilmiştir:
1322
Gürol YOKUŞ & Tuğba YANPAR YELKEN.– Çukurova Üniversitesi Eğitim Fakültesi Dergisi, 48(2), 2019, 1299-1339
[Şekil 1. Karma Araştırmalarda Paralel Yakınsak Desen Önerisi: nicel veri toplama ve analizi (betimsel istatistik, t-testi ve ANOVA) ile nitel veri toplama ve analizi (içerik analizi) eş zamanlı yürütülmüş, nicel ve nitel analizler birlikte yorumlanmıştır.]
Katılımcılar
Bu çalışmanın verileri, 2018 yılı içerisinde devlet üniversiteleri bünyesinde faaliyet gösteren Eğitim
Fakültelerinde görev yapmakta olan araştırma görevlisi doktor, doktor öğretim üyesi, doçent ve profesör
kadrosunda bulunan öğretim elemanlarından elde edilmiştir. Çalışma grubu Marmara Bölgesi, Karadeniz
Bölgesi, Ege Bölgesi, Akdeniz Bölgesi ve Doğu Anadolu Bölgesinde yer alan devlet üniversitelerinin Eğitim
Fakültelerinde görev yapan katılımcılardan oluşmaktadır. Ders yüklerinin yoğunluğu yüzünden ve bu
araştırma kapsamında sadece doktora eğitimini tamamlayan öğretim elemanlarından veri toplandığı için
öğretim görevlileri çalışma grubuna dâhil edilmemiştir. Nicel boyuttaki veriler toplanırken elverişli
örnekleme tekniği kullanılmış ve çalışmaya katılmayı kabul eden altı üniversiteden 104 öğretim
elemanından veri toplanmıştır. Nitel boyutta ise örneklem seçimi maksimum çeşitlilik örneklemesiyle elde
edilmiş, ve incelenen durum hakkındaki farklı görüşleri temsil eden 50 katılımcıdan veri toplanmıştır. Nicel
aşamada 25 araştırma görevlisi doktor, 35 doktor öğretim üyesi, 31 doçent, 13 profesör yer almaktadır.
Elverişli örnekleme kullanıldığı için bölüm kriterine göre örneklem alımı yapılmamıştır; fakat nihai olarak
katılımcıların yüzde 22’si Fen Eğitimi, yüzde 11’i Okul Öncesi Eğitimi, yüzde 28’i Eğitim Bilimleri, yüzde 31’i
de Sınıf Eğitimi bölümünde görev yapmaktadır. Nitel aşamada 13 araştırma görevlisi doktor, 17 doktor
öğretim üyesi, 15 doçent ve 5 profesör yer almaktadır. Nitel aşamada katılımcılar belirlenirken akademik
unvan ve bölüm değişkenine göre maksimum çeşitlilik sağlanmıştır. Katılımcıların yüzde 20’si Fen Eğitimi,
yüzde 10’u Okul Öncesi Eğitimi, yüzde 40’ı Eğitim Bilimleri ve yüzde 30’u Sınıf Eğitimi bölümünde görev
yapmaktadır.
Kullanılan Veri Toplama Araçları
Bu çalışmada veri toplamak amacıyla kişisel bilgi formu, Tonbul (2008) tarafından geliştirilen 16
maddeden oluşan 4’lü likert tipinde “Performans Değerlendirme Yaklaşımına İlişkin Beklentiler” altölçeği
ve 10 maddeden oluşan “Performans Değerlendirme Sisteminin Önündeki Engeller” altölçeği
kullanılmıştır. Ölçek geliştirilirken açımlayıcı faktör analizi ve varimax dik döndürme tekniği uygulanmıştır.
Kullanılan ölçme aracına ait Cronbach alfa güvenirlik değerlerinin, “Performans Değerlendirme
Yaklaşımına İlişkin Beklentiler” ölçeği için 0.92 olduğu, “Performans Değerlendirme Sisteminin Önündeki
Engeller” altölçeği için .87 olduğu ortaya çıkmıştır. Bu çalışmanın verileri ile tekrar güvenirlik analizi
gerçekleştirilmiş ve Cronbach alfa değeri birinci altölçek için .84, ikinci altölçek için .78 bulunmuştur. Ölçek
maddeleri arasındaki homojenliği ölçen Cronbach Alfa değeri .60 ile .80 arasında olması ölçeğin üst
düzeyde güvenirliğe sahip olduğunun bir kanıtıdır (Tonbul, 2008). Kullanılan bu ölçekte dağılım tek faktörde toplanmış ve tek faktör toplam varyansın %55,8’ini açıklamaktadır. Ayrıca, nicel verilerin nitel verilerle
desteklenmesi ve zengin çözümleme amacıyla performans değerlendirme yaklaşıma ilişkin açık uçlu
sorular sorulmuştur. Eğitim Bilimleri alanından bir profesör, Ölçme ve Değerlendirme alanından bir doçent
1323
Gürol YOKUŞ & Tuğba YANPAR YELKEN.– Çukurova Üniversitesi Eğitim Fakültesi Dergisi, 48(2), 2019, 1299-1339
ve yükseköğretim çalışmaları alanında çalışan bir uzmandan görüşleri alınmış ve gerekli düzeltmeler
yapılmıştır. Açık uçlu soruların nihai hali şu şekildedir:
2.1. Akademisyenlerin performansının verilere dayalı ve periyodik olarak ölçülüp değerlendirilmesi
hakkındaki düşünceniz nedir?
2.2. Performansa dayalı değerlendirme yapılırken, içerisinde hangi boyutların olmasını istersiniz? Bu
boyutları önem sırasına göre maddeler halinde yazınız.
2.3. Performans değerlendirmesinin akademisyenlerin performansını etkileyen olumlu ve olumsuz
yönleri nelerdir?
2.4. Yükseköğretimde akademisyenlerin performansını artırma önündeki engeller nelerdir ve bu
engellerin ortadan kalkması için önerileriniz nelerdir?
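As a quick illustration of the reliability figures reported above (Cronbach's alpha of .84 for the expectations subscale and .78 for the obstacles subscale), the following minimal Python sketch shows how Cronbach's alpha is typically computed from item-level Likert responses. The DataFrame, the madde_* column names and the random values are hypothetical stand-ins; the study's actual response data is not reproduced here.

import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# 104 hypothetical respondents x 16 items on a 4-point Likert scale (random placeholder data).
rng = np.random.default_rng(0)
responses = pd.DataFrame(rng.integers(1, 5, size=(104, 16)),
                         columns=[f"madde_{i + 1}" for i in range(16)])
print(f"Cronbach's alpha (illustrative): {cronbach_alpha(responses):.2f}")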
Veri Analizi
Nicel verilerin hangi yöntemle çözümleneceğini belirlemek için varyansların eşitliği ve verilerin
dağılımına ilişkin normallik değerine bakılmıştır. Bu amaçla çarpıklık ve basıklık katsayılarına bakılmış ve bunların (-1, +1) aralığında olduğu görülmüştür. Ayrıca örneklem sayısı 50’den büyük olduğu için (N=104) Kolmogorov-Smirnov testi yapılmış ve test sonucunda anlamlılık değerinin (p>.05) olduğu görülmüştür. Normallik
varsayımı sağlandığı için, akademik teşvik alma durumu değişkeni açısından katılımcıların verdikleri
yanıtlar arasında anlamlı bir fark olup olmadığını test etmek için “İlişkisiz Örneklemler için t-test”
yapılmıştır. Çalışma deneyimi, akademik ünvan ve kurumundan memnuniyet düzeyi değişkenleri açısından
katılımcıların ölçek maddelerine verdikleri yanıtlar arasında anlamlı bir fark olup olmadığını test etmek
amacıyla tek yönlü varyans analizi (ANOVA) yapılmıştır.
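The quantitative steps described above (skewness and kurtosis within (-1, +1), a Kolmogorov-Smirnov check because N > 50, an independent-samples t-test for the academic-incentive variable and one-way ANOVA for the remaining grouping variables) can be sketched roughly as follows with scipy. The DataFrame and its column names are hypothetical placeholders, not the study's data.

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the survey data: subscale mean scores plus grouping variables.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "beklenti": rng.normal(2.3, 0.44, 104),
    "tesvik": rng.choice(["aldim", "almadim"], size=104),
    "unvan": rng.choice(["arsgor_dr", "dr_ogr_uyesi", "docent", "prof"], size=104),
})

# Normality: skewness/kurtosis, plus a Kolmogorov-Smirnov test against a normal distribution
# parameterised by the sample mean and standard deviation.
print("skewness:", stats.skew(df["beklenti"]), "kurtosis:", stats.kurtosis(df["beklenti"]))
print(stats.kstest(df["beklenti"], "norm",
                   args=(df["beklenti"].mean(), df["beklenti"].std(ddof=1))))

# Independent-samples t-test by academic incentive status.
aldim = df.loc[df["tesvik"] == "aldim", "beklenti"]
almadim = df.loc[df["tesvik"] == "almadim", "beklenti"]
print(stats.ttest_ind(aldim, almadim))

# One-way ANOVA by academic title.
groups = [g["beklenti"].to_numpy() for _, g in df.groupby("unvan")]
print(stats.f_oneway(*groups))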
Nitel verilerin analizinde tümevarımsal içerik analizi kullanılmıştır. Açık Uçlu Anket ile toplanan
akademisyen görüşleri üzerinden kodlayıcı güvenirliği uyuşum yüzdeleri belirlenmiştir. Bu değerler
belirlenirken açık uçlu ankette yer alan akademisyen görüşleri bir araştırmacı ve bir uzman tarafından
kodlanmıştır. Bu işlem ankette yer alan her madde için tekrar edilmiştir. Uyuşum yüzdeleri, Miles ve
Huberman’ın (1994) güvenirlik formülü kullanılarak hesaplanmıştır.
Güvenirlik = Görüş Birliği / (Görüş Birliği + Görüş Ayrılığı)
Hesaplama sonucunda performans değerlendirme yaklaşımıyla ilgili görüşlere ilişkin güvenirlik 0.89
bulunmuştur. Uyuşum yüzdesinin % 80 ya da daha üstü olması yeterli görüldüğünden veri analizi açısından
güvenirliğin sağlandığı söylenebilir (Mokkink ve diğerleri, 2010). Bu araştırmada Creswell (2003)
tarafından sıralanan nitel araştırma yöntemlerinde kullanılan “Katılımcı Kontrolü”, “Uzman Kanısı”, “Zengin
Betimleme” ve “Kanıt Zinciri” geçerlik stratejilerinden yararlanılmıştır. Katılımcılara çalışma bulgularının
kendi düşüncelerini doğru yansıtıp yansıtmadığı sorulmuş, çalışma katılımcılarıyla az teması olan ve
çalışma yöntemini bilen bağımsız bir uzmana danışılmış ve doğrudan alıntılarla verinin doğasına mümkün
olduğu ölçüde sadık kalınmıştır.
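The Miles and Huberman (1994) intercoder reliability formula quoted above reduces to a one-line computation; the sketch below applies it to two hypothetical coders' code assignments (the study's actual codings are not shown).

def intercoder_agreement(coder1: list[str], coder2: list[str]) -> float:
    # Miles & Huberman (1994): reliability = agreements / (agreements + disagreements)
    agreements = sum(a == b for a, b in zip(coder1, coder2))
    disagreements = len(coder1) - agreements
    return agreements / (agreements + disagreements)

# Hypothetical code assignments by two independent coders for five responses.
coder1 = ["benimseyen", "supheli", "direnc", "benimseyen", "supheli"]
coder2 = ["benimseyen", "supheli", "benimseyen", "benimseyen", "supheli"]
print(f"intercoder agreement: {intercoder_agreement(coder1, coder2):.2f}")  # 0.80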
Bulgular
3.1 Performans Değerlendirme Yaklaşımına İlişkin Beklentiler
Araştırmada ilk olarak “Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin
beklentileri nasıldır?” sorusuna cevap aranmış ve katılımcıların ölçekten aldıkları genel puan ortalaması
Tablo 1’de sunulmuştur.
Tablo 1.
Öğretim Elemanlarının Performans Değerlendirme Yaklaşımına İlişkin Beklentilerinin Genel Ortalaması

                          N     Minimum   Maksimum   Ortalama   Standart Sapma
Beklenti Genel Ortalama   104   1,50      3,31       2,3023     ,43859
Tablo 1’de öğretim elemanlarının ölçekten elde ettiği puan ortalamasına bakıldığında (x̄=2,30),
performans değerlendirme yaklaşımıyla ilgili beklentilerinin yüksek olmadığı, orta düzeyde (kısmen
katılıyorum) olduğu dikkat çekmektedir. Öğretim elemanlarının performans değerlendirme yaklaşımına
ilişkin beklentilerinin akademik ünvan değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin
ANOVA test sonuçları Tablo 2’de yer almaktadır.
Tablo 2.
Akademik Ünvan Değişkenine Göre Performans Değerlendirme Yaklaşımı Beklentilerine İlişkin ANOVA Testi

Ünvan           N     Ortalama   Standart Sapma
Arş.Gör.Dr.     25    2,4525     ,506
Dr.Öğr.Üyesi    35    2,4875     ,251
Doç.Dr.         31    2,1754     ,441
Prof.Dr.        13    1,8173     ,162
Toplam          104   2,3023     ,438

                Kareler Toplamı   Sd    Kareler Ort.   F       p      Farkın kaynağı
Gruplar arası   5,321             3     1,774          12,24   .000   Arş.Gör.Dr. > Doç.Dr., Prof.;
Grup içi        14,492            100   ,145                          Dr.Öğr.Üyesi > Doç.Dr., Prof.;
                                                                      Doç.Dr. > Prof.
Ölçekten alınan ortalama puanların akademik ünvanlara göre aritmetik ortalama ve standart sapma
değerine bakıldığında ise performans değerlendirme yaklaşımıyla ilgili en yüksek beklentiye sahip olanların
doktor öğretim üyesi olduğu görülürken, en düşük beklentiye sahip olanların ise profesörler olduğu ortaya
çıkmaktadır. Tablo 2’de gruplar arası anlamlı farklılık olduğu ortaya çıktığı için, anlamlı farklılığın hangi
gruplar arasında olduğunu görmek amacıyla post hoc testlerine bakılmıştır. “Levene F” testine ait olan
(Sig) değeri p<.05 olduğu için varyansların eşit olmadığı görülmektedir; dolayısıyla bu durumlarda gruplar
arasında karşılaştırma yaparken tercih edilen Post Hoc testlerinden Games-Howell istatistik yöntemi
kullanılmıştır. Analiz sonucunda Araştırma Görevlileri ile doktor öğretim üyelerinin ortalama puanları
doçent ve profesörlerin puanlarından anlamlı derecede yüksektir. Araştırma görevlileri ile doktor öğretim
üyeleri arasında beklenti puanlarında anlamlı farklılık yoktur.
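Because the Levene test indicated unequal variances, the authors report Games-Howell post-hoc comparisons. One way to reproduce that step in Python is sketched below, using scipy for the Levene test and, as an assumption, the pingouin package's pairwise_gameshowell function; the data are simulated only to roughly match the group descriptives in Tablo 2, not the real scores.

import numpy as np
import pandas as pd
import pingouin as pg  # assumed available; provides pairwise_gameshowell
from scipy import stats

# Simulated scores roughly matching the group descriptives in Tablo 2 (illustrative only).
rng = np.random.default_rng(2)
descriptives = {"arsgor_dr": (25, 2.45, 0.51), "dr_ogr_uyesi": (35, 2.49, 0.25),
                "docent": (31, 2.18, 0.44), "prof": (13, 1.82, 0.16)}
df = pd.DataFrame([
    {"unvan": unvan, "beklenti": score}
    for unvan, (n, mean, sd) in descriptives.items()
    for score in rng.normal(mean, sd, n)
])

# Levene's test for homogeneity of variances; p < .05 suggests unequal variances.
groups = [g["beklenti"].to_numpy() for _, g in df.groupby("unvan")]
print(stats.levene(*groups))

# Games-Howell pairwise comparisons (does not assume equal variances).
print(pg.pairwise_gameshowell(data=df, dv="beklenti", between="unvan"))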
Ölçekte yer alan maddeler incelendiğinde ise performansa ilişkin en yüksek beklentilerin aşağıdaki
maddelerle ilgili olduğu görülmektedir:
Etkili öğretim üyesinin kriterleri konusunda görüş birliğinin oluşması sağlanır. (x̄=3,42)
Öğretim üyesinin mesleki gelişimi olumlu etkilenir. (x̄=3,27)
Öğretim üyesinin iş yükü artar. (x̄=2,40)
Kurum içi gerginliğe neden olur. (x̄=2,39)
Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin en düşük beklentileri ise
şunlardır:
Öğretim üyelerinin motivasyonu artar. (x̄=1,90)
Nitelikli bir kurum kültürünün (değerler, işe ilişkin tutum ve sorumluluk anlayışı, ilişkiler vb.) gelişmesine
katkıda bulunur. (x̄=1,76)
Öğretim üyesinin derslere daha hazırlıklı gelmesi sağlanır. (x̄=1,70)
Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin beklentilerinin akademik teşvik
alma durumu değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin analiz sonuçları Tablo
3’te yer almaktadır.
Tablo 3.
Akademik Teşvik Değişkenine Göre Performans Değerlendirme Yaklaşımı Beklentilerine İlişkin T-Testi
Sonuçları

Akademik Teşvik   N    Ortalama   SS    t      p
Aldım             52   2,43       ,38   3,22   ,002
Almadım           52   2,16       ,45
Tablo 3’te yer alan analiz sonucunda, performans değerlendirme yaklaşımına ilişkin beklentilerin
akademik teşvik alma durumuna göre anlamlı şekilde farklılaştığı görülmektedir [t(102)=3,22, p<.05].
Akademik teşvik almış olan öğretim elemanlarının performans değerlendirme yaklaşımından beklentileri,
almayanlara göre anlamlı derecede yüksektir.
Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin beklentilerinin çalışma deneyimi
değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin ANOVA test sonuçları Tablo 4’te yer
almaktadır.
Tablo 4.
Çalışma Deneyimi Değişkenine Göre Performans Değerlendirme Yaklaşımı Beklentilerine İlişkin ANOVA Testi

Çalışma Deneyimi     N     Ortalama   Standart Sapma
0-5 sene             17    2,43       ,51
6-10 sene            38    2,43       ,28
11-15 sene           14    2,51       ,39
15 seneden fazla     35    2,00       ,43
Toplam               104   2,30       ,44

                Kareler Toplamı   Sd    Kareler Ort.   F       p     Farkın kaynağı
Gruplar arası   4,67              3     1,55           10,28   ,00   0-5 sene > 15 seneden fazla;
Grup içi        15,1              100   ,151                         6-10 sene > 15 seneden fazla;
Toplam                            103                                11-15 sene > 15 seneden fazla
Tablo 4’te analiz sonucunda performans değerlendirmeyle ilgili beklentilere ilişkin en düşük puana
sahip olanların 15 seneden fazla çalışma deneyimi olanlar olduğu ortaya çıkmıştır. Diğer bütün grupların
puan ortalamaları, bu grubun puan ortalamasından anlamlı derecede yüksektir. İlk üç grubun kendi
aralarında puan ortalamaları bakımından anlamlı farklılık bulunmamaktadır.
Öğretim elemanlarının performans değerlendirme yaklaşımına ilişkin beklentilerinin kurumlarından
memnuniyet düzeyi değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin ANOVA test
sonuçları Tablo 5’te yer almaktadır.
Tablo 5.
Kurumundan Memnuniyet Değişkenine Göre Performans Değerlendirme Yaklaşımı Beklentilerine İlişkin
ANOVA Testi

Memnuniyet Düzeyi   N     Ortalama   Standart Sapma
Az                  10    2,70       ,31
Orta Düzeyde        35    2,39       ,32
Oldukça             42    2,00       ,47
Tamamıyla           17    1,80       ,11
Toplam              104   2,30       ,438

                Kareler Toplamı   Sd    Kareler Ort.   F       p     Farkın kaynağı
Gruplar arası   5,97              3     1,991          14,38   ,00   Az, Orta Düzeyde > Oldukça, Tamamıyla
Grup içi        13,08             100   ,138
Toplam                            103
Tablo 5’te ANOVA testi sonucunda gruplar arası anlamlı farklılık (p<.05) olduğu ortaya çıktığı için hangi
gruplar arasında anlamlı farklılık olduğuna bakılmıştır. “Levene F” testine ait olan (Sig) değeri p<.05 olduğu
için varyansların eşit olmadığı görülmektedir; dolayısıyla bu durumlarda gruplar arasında karşılaştırma
yaparken tercih edilen Post Hoc testlerinden Games-Howell istatistik yöntemi kullanılmıştır. Post-hoc testi
sonucunda bulunduğu kurumdan az ve orta düzeyde memnun olan öğretim elemanları, oldukça ve
tamamıyla memnun olanlara göre performans değerlendirme yaklaşımıyla ilgili anlamlı derecede daha
yüksek beklentilere sahiptir.
3.2 Performans Değerlendirme Yaklaşımının Önündeki Engeller
Araştırmada ikinci olarak “Öğretim elemanlarının performans değerlendirme yaklaşımının
önündeki engellere yönelik görüşleri nasıldır?” sorusuna cevap aranmış ve katılımcıların ölçekten aldıkları
puan ortalaması ve dağılımın standart sapması Tablo 6’da sunulmuştur.
Tablo 6.
Öğretim Elemanlarının Performans Değerlendirme Yaklaşımının Önündeki Engellere İlişkin Genel Puan
Ortalamaları

                     N     Minimum   Maksimum   Ortalama   Standart Sapma
Engeller Altölçeği   104   2,20      3,80       3,02       ,57517
Tablo 6’da öğretim elemanlarının ölçekten elde ettiği puan ortalamasına bakıldığında (x̄=3,02),
performans değerlendirme yaklaşımının önündeki engellerle ilgili, ölçekte yer alan maddelere katıldıkları
görülmektedir. Madde madde bakıldığında, öğretim elemanlarının performans değerlendirmenin
önündeki engellere ilişkin en fazla katıldıkları ifadelerin şunlar olduğu görülmektedir:
Yükseköğretim kurumlarının mevcut örgütsel işleyişi (hiyerarşik yapılanma, yetki ve sorumlulukların
dağılımı, birimlerin özerklik sınırları). (x̄=3,80)
Öğretim üyesinin iş yükü. (x̄=3,68)
Performans değerlendirme yaklaşımına ilişkin en az katıldıkları ifade ise “Kültürel yapı (olumsuzlukları
görmezden gelme, kişisel çekişmeler, aşırı hoşgörü, eleştirilme rahatsızlığı, güvensizlik, Batı
standartlarında rekabetçi bir anlayışın eksikliği vb.). (x̄=1,91)”dır.
Öğretim elemanlarının performans değerlendirme yaklaşımının önündeki engellere ilişkin görüşlerinin
akademik teşvik alma durumu değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin analiz
sonuçları Tablo 7’de yer almaktadır.
Tablo 7.
Akademik Teşvik Değişkenine Göre Performans Değerlendirme Yaklaşımının Önündeki Engellere İlişkin
T-Testi Sonuçları

Akademik Teşvik   N    Ortalama   SS    t      p
Aldım             52   2,14       ,54   5,77   ,000
Almadım           52   2,74       ,51
Tablo 7’de performans değerlendirme yaklaşımının önündeki engellere ilişkin görüşlerin akademik teşvik alma durumuna göre anlamlı şekilde farklılaştığı görülmektedir [t(102)=5,77, p<.05]. Akademik teşvik almış
olan öğretim elemanlarının, performans değerlendirme yaklaşımının önündeki engeller altölçeğinden
anlamlı derecede daha düşük puan aldıkları görülmektedir.
Öğretim elemanlarının performans değerlendirme yaklaşımının önündeki engellere ilişkin görüşlerinin
akademik ünvan değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin ANOVA test sonuçları
Tablo 8’de yer almaktadır.
Tablo 8.
Akademik Ünvana Göre Performans Değerlendirme Yaklaşımının Önündeki Engellere İlişkin ANOVA Testi

Ünvan           N     Ortalama   Standart Sapma
Arş.Gör.Dr.     25    2,98       ,30181
Dr.Öğr.Üyesi    35    3,42       ,36202
Doç.Dr.         31    3,38       ,63314
Prof.Dr.        13    2,96       ,83254
Toplam          104   3,02       ,61101

                Kareler Toplamı   Sd    Kareler Ort.   F       p      Farkın kaynağı
Gruplar arası   11,089            3     3,696          13,50   ,000   Doç.Dr. > Arş.Gör.Dr., Prof.Dr.;
Grup içi        27,365            100   ,274                          Dr.Öğr.Üyesi > Arş.Gör.Dr., Prof.Dr.
Tablo 8’de gruplar arası anlamlı farklılık olduğu ortaya çıktığı için (p<.05), anlamlı farklılığın hangi
gruplar arasında olduğunu belirlemek için post hoc testlerine bakılmıştır. “Levene F” testine ait olan (Sig)
değeri p<.05 olduğu için varyansların eşit olmadığı görülmektedir; dolayısıyla bu durumlarda gruplar
arasında karşılaştırma yaparken tercih edilen Post Hoc testlerinden Games-Howell istatistik yöntemi
kullanılmıştır. Analiz sonucunda performans değerlendirmenin önündeki engellerle ilgili en yüksek
puanların dr. öğretim üyeleri ve doçentlere ait olduğu, en düşük puanların ise araştırma görevlileri ve
profesörlere ait olduğu ortaya çıkmıştır. Araştırma görevlileri ile profesörlerin engellere ilişkin puanları
arasında istatistiksel olarak anlamlı farklılık yoktur.
Öğretim elemanlarının performans değerlendirme yaklaşımının önündeki engeller altölçeği
puanlarının çalışma deneyimi değişkenine göre anlamlı bir farklılık gösterip göstermediğine ilişkin ANOVA
test sonuçları Tablo 9’da yer almaktadır.
Tablo 9.
Çalışma Deneyimi Değişkenine Göre Performans Değerlendirme Yaklaşımının Önündeki Engellere İlişkin
ANOVA Testi

Çalışma Deneyimi     N     Ortalama   Standart Sapma
0-5 sene             17    2,72       ,51
6-10 sene            38    3,26       ,28
11-15 sene           14    3,78       ,44
15 seneden fazla     35    2,88       ,39
Toplam               104   3,02       ,54

                Kareler Toplamı   Sd    Kareler Ort.   F        p      Farkın kaynağı
Gruplar arası   21,938            3     4,67           44,276   ,000   11-15 sene, 6-10 sene > 0-5 sene, 15 seneden fazla;
Grup içi        16,516            100   1,51                           11-15 sene > 6-10 sene
Toplam                            103
Tablo 9’da yer alan post hoc analiz sonucuna göre, performans değerlendirmenin önündeki engeller
altölçeğinden en düşük puan alanların 0-5 sene çalışma deneyimi; en yüksek puan alanların ise 11-15 sene
çalışma deneyimi olanlar olduğu ortaya çıkmıştır. 11-15 sene çalışma deneyimi olanların performans
değerlendirme önündeki engellere ilişkin puanları diğer bütün gruplara göre anlamlı derecede yüksektir.
11-15 sene çalışma deneyimine sahip olan grup, çoğu şeyin performans değerlendirmeyi engellediğini
düşündükleri ve neredeyse her maddenin engel olarak adlandırıldığı bir grup olarak ortaya çıkmıştır.
Öğretim elemanlarının performans değerlendirme yaklaşımının önündeki engeller altölçeğinden aldıkları puanların, kurumlarından memnuniyet düzeyi değişkenine göre (az, orta düzeyde, oldukça ve tamamıyla) anlamlı bir farklılık gösterip göstermediğine ilişkin ANOVA test sonuçları Tablo 10’da yer
almaktadır.
Tablo 10.
Kurumdan Memnuniyete Göre Performans Değerlendirme Önündeki Engellere İlişkin ANOVA Testi

Memnuniyet Düzeyi   N     Ortalama   Standart Sapma
Az                  10    2,58       ,31
Orta Düzeyde        35    2,62       ,32
Oldukça             42    3,48       ,47
Tamamıyla           17    3,36       ,11
Toplam              104   3,02       ,43859

                Kareler Toplamı   Sd    Kareler Ort.   F        p      Farkın kaynağı
Gruplar arası   5,97              3     1,991          14,383   ,000   Az, Orta Düzeyde > Oldukça, Tamamıyla
Grup içi        13,08             100   ,138
Toplam                            103
Tablo 10’da ANOVA testi sonucunda gruplar arası anlamlı farklılık (p<.05) olduğu ortaya çıktığı için,
anlamlı farklılığın hangi gruplar arasında olduğunu belirlemek amacıyla post hoc testi yapılmıştır. “Levene
F” testine ait olan (Sig) değeri p<.05 olduğu için varyansların eşit olmadığı görülmektedir; dolayısıyla bu
durumlarda gruplar arasında karşılaştırma yaparken tercih edilen Post Hoc testlerinden Games-Howell
istatistik yöntemi kullanılmıştır. Post-hoc testi sonucunda bulunduğu kurumdan az ve orta düzeyde
memnun olan öğretim elemanları, oldukça ve tamamıyla memnun olanlara göre performans
değerlendirmenin önündeki engellerle ilgili belirtilen maddelere daha fazla katıldıkları ortaya çıkmıştır.
3.3 Performans Değerlendirmeye Yönelik Genel Yaklaşıma İlişkin Nitel Analiz
Çalışma kapsamında Eğitim Fakültesi öğretim elemanlarının performans değerlendirmeyle ilgili genel
yaklaşımlarına ilişkin nitel veriler toplanmıştır. Toplanan nitel veriler içerik analizi yöntemiyle analiz
edilmiştir. Yükseköğretimde performans değerlendirme yaklaşımına ilişkin dört açık uçlu soru sorulmuş ve
cevaplardan elde edilen nitel veriler içerik analiziyle incelenmiştir. İçerik analizi sonucunda “tutum boyutu,
akademisyenlerin öncelikleri, performans değerlendirmenin olumlu etkileri, performans
değerlendirmenin olumsuz etkileri, performans değerlendirme önündeki engeller, performans
değerlendirmeyi engelleyen faktörlere ilişkin öneriler” temaları ortaya çıkmıştır.
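The frequency columns in Tablo 11-14 are simple tallies of codes per theme. A minimal Python sketch of such a tally is shown below; the coded responses are hypothetical examples, not the study's transcripts.

from collections import Counter

# Hypothetical (theme, code) pairs produced by coding the open-ended responses.
coded_responses = [
    ("tutum boyutu", "benimseyenler"), ("tutum boyutu", "benimseyenler"),
    ("tutum boyutu", "supheyle yaklasanlar"), ("tutum boyutu", "direnc gosterenler"),
    ("oncelikler", "akademik yayinlar"), ("oncelikler", "ogretimin kalitesi"),
]

frequencies: dict[str, Counter] = {}
for theme, code in coded_responses:
    frequencies.setdefault(theme, Counter())[code] += 1

for theme, counts in frequencies.items():
    print(theme)
    for code, freq in counts.most_common():
        print(f"  {code}: {freq}")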
1. Akademisyenlerin performansının verilere dayalı ve periyodik olarak ölçülüp değerlendirilmesi
hakkındaki düşünceniz nedir?
Türkiye’deki Eğitim Fakültelerindeki öğretim elemanları arasında, bu konuya ilişkin görüş
ayrılıkları bulunmaktadır. Katılımcıların çoğunluğu verilere dayalı ve periyodik bir değerlendirmeden yana
olsa da bu yaklaşıma ilişkin olumsuz tutumları olan ya da değerlendirme yaklaşımının suistimallere açık
olduğundan şüphelenen bireyler bulunmaktadır. Buna ilişkin nitel verilerin analizi Tablo 11’de yer almaktadır.
Tablo 11.
Performans Değerlendirmenin Periyodik Yapılmasına Yönelik Verilerin İçerik Analiziyle Kodlanması

Tema: Tutum Boyutu
Tanım: Performans değerlendirme yaklaşımına ilişkin olumlu, olumsuz veya çekinik tutum içerisinde olma
Kodlar (Frekans): Benimseyenler (28), Şüpheyle yaklaşanlar (12), Direnç gösterenler (10)
Tablo 11 incelendiğinde öğretim elemanlarının çoğu performans değerlendirmenin birçok yönden
olumlu olacağını ve böyle bir değerlendirmeyi destekleyeceklerini belirtmişlerdir. Öğretim elemanlarının
vermiş oldukları cevaplar “Tutum Boyutu” teması içerisinde yer alan “benimseyenler”, “şüpheyle
yaklaşanlar”, ve “direnç gösterenler” kodları altında incelenmiştir. Kanıt zinciri göz önünde bulundurularak
bu kodlara ilişkin görüşlerden bazıları aşağıda verilmiştir:
Benimseyenler: “Yükseköğretimde kalite ve niteliği sağlamada performans değerlendirmenin iyi sonuçlar
getireceğine inanıyorum” (K6)
Şüpheyle yaklaşanlar: “Çalışmalara destek verilmesi güzel. Ancak her şey yayın mı? Değerlendirmeyi
kimlerin nasıl yapacağı bende soru işareti” (K5)
Direnç gösterenler: “Performans ölçülemez. Bireyleri karşılaştırmak anlamsızdır. Tarih boyunca denendi,
bir faydası görülmedi, tekrar denemenin anlamı yok” (K13)
2. Performansa dayalı değerlendirme yapılırken, içerisinde hangi boyutların olmasını istersiniz? Bu
boyutları önem sırasına göre maddeler halinde yazınız.
Eğitim Fakültelerindeki öğretim elemanları performans değerlendirme yaklaşımında hangi boyutların
yer alması gerektiğiyle ilgili çeşitli görüşler belirtmişlerdir. Öğretim elemanlarının hangi boyutlara ne kadar
önem verdiklerini ifade etmeleri nitel açıdan önemli veriler sağlamıştır. Buna ilişkin nitel verilerin analizi
Tablo 12’de yer almaktadır.
Tablo 12.
Performans Değerlendirmede Yer Alması Gereken Boyutlara İlişkin Verilerin İçerik Analiziyle Kodlanması

Tema: Akademisyenlerin Öncelikleri
Tanım: Performans değerlendirme yaklaşımında yer alması gereken ögeler ve bu ögelerin önem sırasına konulması
Kodlar (Frekans): Akademik yayınlar (17), Öğretimin kalitesinin değerlendirilmesi (10), Lisans ve lisansüstü danışmanlık (8), İş yükleri (ders saati vb.) (6), Jüri üyelikleri (tez, doçentlik vb.) (5), Öznel ilgi ve uğraş alanları (4)
Tablo 12 incelendiğinde öğretim elemanlarının bir performans değerlendirme kapsamında, öncelikle
akademik yayınların sayısının ve kalitesinin ölçülmesini, bundan sonra sınıf içerisinde öğretim elemanının
ders işleme biçimi, kullandığı yöntemler, içeriği sunuş kalitesi, materyal kullanımı, öğrenmeyi kalıcı hale
getirmek için yaptığı her şeyin değerlendirilmesi gerektiğini belirtmişlerdir. Öğretim elemanlarının
değerlendirme ögelerinin önemiyle ilgili vermiş oldukları cevaplar “Akademisyenlerin Öncelikleri” teması
içerisinde yer alan “akademik yayınlar”, “öğretimin kalitesinin değerlendirilmesi”, “lisans ve lisansüstü
danışmanlık”, “iş yükleri”, “jüri üyelikleri” ve “öznel ilgi ve uğraş alanları” kodları altında incelenmiştir.
Kanıt zinciri göz önünde bulundurularak bu kodlara ilişkin görüşlerden bazıları aşağıda verilmiştir:
Akademik yayınlar: “Öğretim elemanlarıyla ilgili yapılan bir performans değerlendirmenin en başlıca
üzerinde durması gereken boyut öğretim elemanlarının yayın yapması, bu yayınların kalite ve niteliğinin
ölçülmesidir.” (K6)
Öğretimin kalitesinin değerlendirilmesi: “Akademik çalışmalar kadar önemli olan başka bir boyut
öğretimdir. Sınıf içi çalışmalar, özellikle aktivite ve öğretim yöntemlerine bakılabilir”
Lisans ve lisansüstü danışmanlık: “Öğrencilere yapılan danışmanlıklar göz ardı ediliyor. Mesela tez
danışmanlığı oldukça zahmetli bir iş. Bu performansın da değerlendirmeye alınması lazım”(K22)
İş yükleri: “Derse girmekten diğer şeylere zaman kalmıyor. Bir akademisyen yayından çok girdiği
derslerle ölçülebilir. Çok derse giren hocalar çok çalışan hocalardır.” (K30)
3. Performans değerlendirmesinin akademisyenlerin performansını etkileyen olumlu ve olumsuz
yönleri nelerdir?
Eğitim Fakültelerindeki öğretim elemanları, performans değerlendirme yaklaşımının yükseköğretimde
performansı olumlu veya olumsuz etkileyebileceğini belirtmişlerdir. “Olumlu Etkileri” teması altında
“motivasyon”, “maddi destek”, “kalite arayışı”, “özeleştirinin gelişmeyi teşvik etmesi”, “dinamizmin
sürekliliğinin sağlanması” kodları ortaya çıkarken; “Olumsuz Etkileri” teması altında “kurumiçi rekabet”,
“akademik sahtekarlıklar”, “stres kaynağı”, “niceliğin niteliği gölgelemesi”, kodları ortaya çıkmıştır. Buna
ilişkin nitel verilerin analizi Tablo 13’te yer almaktadır.
Tablo 13.
Performans değerlendirme yaklaşımının yol açacağı olumlu ve olumsuz etkilere ilişkin verilerin içerik
analiziyle kodlanması

Tema: Olumlu Etkileri
Tanım: Performans değerlendirme yaklaşımının yol açacağı olumlu durumlar
Kodlar (Frekans): Motivasyon (12), Maddi destek (8), Kalite arayışı (8), Özeleştirinin gelişmeyi teşvik etmesi (4), Dinamizmin sürekliliğinin sağlanması (4)

Tema: Olumsuz Etkileri
Tanım: Performans değerlendirme yaklaşımının yol açacağı olumsuz durumlar
Kodlar (Frekans): Kurumiçi rekabet (7), Akademik sahtekarlıklar (6), Stres kaynağı (6), Niceliğin niteliği gölgelemesi (8)
Tablo 13 incelendiğinde öğretim elemanlarının performans değerlendirilme yaklaşımının yol açacağı
hem olumlu hem olumsuz durumlarla ilgili görüş belirttiği görülmektedir. Öğretim elemanları performans
değerlendirmenin olumlu etkilerine yönelik 5 kod altında 36 görüş belirtirken; olumsuz etkilerine yönelik
4 kod altında 27 görüş belirtmişlerdir. Kanıt zinciri göz önünde bulundurularak bu kodlara ilişkin
görüşlerden bazıları aşağıda verilmiştir:
Motivasyon: “Öğretim üyesini yeni çalışmalar yapmaya yönlendirir” (K24)
Kalite Arayışı: “Değerlendirmeye tabi tutulan akademisyenler bir kalite arayışı içerisine girer. Kimse
kötü hoca olarak anılmak istemez” (K9)
Dinamizmin sürekliliği: “Devlet üniversitelerinde özellikle eski hocalar kendini yenilemek konusunda
isteksizler. Bu durum da yükseköğretimin köhneleşmesine yol açıyor; çünkü bir değerlendirme ve yaptırım
yok. Değerlendirme demek aynı zamanda dinamizm anlamına gelir”(K29)
Kurumiçi rekabet: “İş birliğini engeller, kıskançlıklar olabilir, bir rekabet ortamı doğarsa bu verimi
artırmak yerine egoist davranışları artırır” (K36)
Akademik sahtekarlıklar: “Sahte verilerle yayın yapma, sonuncu isim olarak adını yazdırma gibi şeyler
olabilir”
Niceliğin, niteliği gölgelemesi: “Yayın yayın yayın nereye kadar. Şimdi herkes bir sürü yayın yapıyor ama
kaçı kaliteli bu normal değil. Birisi çok sayıda kaliteli yayın yapabilir ama kaçı böyle?”
4. Yükseköğretimde akademisyenlerin performansını artırma önündeki engeller nelerdir ve bu
engellerin ortadan kalkması için önerileriniz nelerdir?
Eğitim Fakültelerindeki öğretim elemanları, performans değerlendirmenin önündeki engeller ve
bunlara yönelik çeşitli öneriler belirtmişlerdir. “Engeller” teması altında “yoğun iş yükü (ders, danışmanlık,
idari görevler)”, “içsel motivasyon eksikliği”, “kalabalık öğrenci sayısı”, “çabaların takdir görmemesi”,
“örgütsel işleyişin hantallığı” kodları ortaya çıkarken; “Öneriler” teması altında “memur istihdamı”, “yayın
ve çalışmaların kurumca desteklenmesi”, “ders yükünü düşük tutmak”, “bireye YÖK tarafından dönemlik
bütçe tahsisi” kodları ortaya çıkmıştır. Buna ilişkin nitel verilerin analizi Tablo 14’te yer almaktadır.
Tablo 14.
Performans Artırma Önündeki Engeller ve Bu Engellere Yönelik Önerilere İlişkin Verilerin İçerik Analiziyle
Kodlanması

Tema: Engeller
Tanım: Öğretim elemanlarının performanslarını artırmasının önündeki engeller
Kodlar (Frekans): Yoğun iş yükü (ders, danışmanlık, idari görevler) (18), Çabaların takdir görmemesi (10), Örgütsel işleyişin hantallığı (8), Kalabalık öğrenci sayısı (6), İçsel motivasyon eksikliği (4)

Tema: Öneriler
Tanım: Performans artırma önündeki engellerin ortadan kalkması için öneriler
Kodlar (Frekans): Ders yükünü düşük tutmak (9), Yayın ve çalışmaların kurumca desteklenmesi (8), Ölçütlerin üniversiteler tarafından belirlenmesi (5), Memur istihdamı (4), Bireye, YÖK tarafından dönemlik bütçe tahsisi (4)
Tablo 14 incelendiğinde, öğretim elemanları engellere yönelik 5 kod altında 44 görüş belirtirken;
önerilere yönelik 4 kod altında 26 görüş belirtmişlerdir. Kanıt zinciri göz önünde bulundurularak bu kodlara
ilişkin görüşlerden bazıları aşağıda verilmiştir:
Yoğun iş yükü: “Kaliteli bir şey ortaya koymak için zaman lazım. Öğretim elemanlarının zamanı yok. Ya
ders vermekte, ya bir öğrencisiyle ilgilenmekte veya bir idari görevi var onun işleriyle uğraşmak
durumunda” (K25)
İçsel motivasyon eksikliği: “Akademide motive edici unsurlardan çok motivasyonu düşüren şeyler var.
Meslek seçiminde kişi isteyerek de başka sebeplerle akademiye girdiyse performansını iyileştirmesi gerekli
isteği olmaz”(K19)
Örgütsel işleyişin hantallığı: “Proje ve benzeri çalışmalarda çok yavaş işleyen resmi süreç, bürokrasi ve
kağıt işleri performans artışı önünde engel olur” (K8)
Yayın ve çalışmaların kurumca desteklenmesi: “Performans artışı için en büyük önerim çalışanların
çabalarının kurum tarafından desteklenmesidir. Bu yayın olur, kongre olur veya kişisel gelişim için eğitim
olur” (K7)
Memur istihdamı: “Bölümlere daha fazla memur alınırsa en azından öğretim elemanlarını uğraştıran
ve bir sürü zamanını alan evrak işlerinden kurtulmuş olurlar.” (K21)
YÖK tarafından dönemlik bütçe tahsisi: “YÖK, her öğretim elemanına dönem başında bir bütçe ayırmalı,
bütçe kullanma süreçlerini planlamalarını istemeli ve dönem sonunda bütçe-ürün karşılaştırması
yapmalıdır”(K10)
Tartışma ve Sonuç
Bu çalışmanın bulguları doğrultusunda, yükseköğretimde performans değerlendirme yaklaşımıyla ilgili
eğitim fakültelerindeki öğretim elemanlarının görüşlerinin oldukça farklılaştığı görülmektedir. 15 seneden
fazla çalışma deneyimine sahip olanların ve bulunduğu kurumdan yüksek düzeyde memnun olan öğretim
elemanlarının performans değerlendirmeye ilişkin beklentilerinin diğerlerine göre düşük olduğu
görülmektedir. Akademik ünvana göre bakıldığında, doktor araştırma görevlileri ve doktor öğretim
üyelerinin performans değerlendirme yaklaşımına olumlu baktığı, doçentlerin ve profesörlerin ise
performans değerlendirmeye düşük düzeyde olumlu baktıkları görülmektedir. Benzer şekilde,
Stonebraker ve Stone (2015) zorunlu emekliliğin kalkmasıyla birlikte öğretim elemanlarının yaş
ortalamalarında artış olduğunu, bu yaşlanmanın sınıf içerisinde üretkenlik açısından getireceği
olumsuzluklar konusunda endişeler bulunduğunu belirtmektedirler. Öğretim elemanlarının
performanslarının öğrenciler tarafından değerlendirilmesinde yaş değişkeninin olumsuz bir etkisi olduğu ve bu etkinin cinsiyet ve akademik branş bazında da görüldüğü gözlenmektedir; fakat bu olumsuz etki, öğretim elemanları kırklı yaşların ortasına ulaşıncaya kadar görülmemektedir. Bu bulgu, Esen
ve Esen’in (2015) çalışmasıyla paralellik göstermektedir. Onların çalışmasında akademik unvanlar
yükseldikçe performans değerlendirmesinin hem öğretim üyeleri için, hem de kurumlar için yaratacağı
sonuçlara ilişkin olumlu algılamanın azaldığı ortaya çıkmıştır. Bianchini, Lissoni ve Pezzoni (2013)
performans değerlendirme ile ilgili yaptıkları çalışmada öğrencilerinin profesörleri, doktor öğretim
üyelerinden daha olumsuz değerlendirdiklerini belirtmişlerdir. Genel olarak öğretim elemanlarının nitel
görüşlerine bakıldığında ise akademik camiada akademik unvan fark etmeksizin performans
değerlendirme yaklaşımıyla ilgili birtakım güvensizlik ve tereddütlerin olduğu görülmektedir.
Performans değerlendirme yaklaşımına ilişkin öğretim elemanlarının beklentilerine bakıldığında etkili
öğretim üyesinin kriterleri konusunda görüş birliğinin oluşması, öğretim üyesinin mesleki gelişiminin
olumlu etkilenmesi, öğretim üyesinin iş yükünün artması ve kurum içi gerginliğe neden olması bakımından
yüksek beklenti içerisinde oldukları ortaya çıkmıştır. Nitel bulgulara bakıldığında ise öğretim elemanları
arasında performans değerlendirme konusunda benimseyenler ve şüpheyle yaklaşanlar olmak üzere
farklılaşmanın olduğu görülmektedir. Eğitim fakültelerindeki öğretim elemanları performans
değerlendirmenin motivasyon ve kalite arayışını artırdığını; fakat bunun yanında kurum içi rekabet ve
akademik sahtekarlıklara yol açabileceğini ifade etmişlerdir. Geleneksel olarak fakültelerde performans
değerlendirmeleri araştırma göstergeleri üzerinde odaklanmaktadır (Bogt ve Scapens, 2012); bu yüzden
yükseköğretim kurumları değerlendirme yaparken devletten alınan destek, araştırma ödülleri ve araştırmada üst sıralarda olma gibi gerekçelerle sadece en iyi yayınları yapan öğretim elemanlarını destekleme eğilimi göstermektedirler (Douglas, 2013; Hopwood, 2008). Bu çalışmada öğretim elemanları, performans değerlendirmenin önündeki en önemli engelleri yoğun iş yükü ve içsel motivasyon eksikliği olarak görürken; öneriler kapsamında ise daha fazla memur istihdamını, yayın ve çalışmaların kurumca desteklenmesini belirtmişlerdir. Bu bulgular, performans değerlendirmeyle ilgili çalışma yürüten Tonbul (2008); Esen ve
Esen (2015) ve Başbuğ ve Ünsal’ın (2010) bulgularından farklılık göstermektedir. Tonbul (2008), öğretim
üyelerinin, uygulamaya konulacak bir performans değerlendirme yaklaşımına genelde olumlu yaklaştığını,
beklenti açısından ise etkili performansın önündeki engellerin saptanması ve öğretim üyesinin kendi
eksiklerini görmesi bakımından daha yüksek beklenti düzeyi içerisinde olduklarını ifade etmiştir. Esen ve
Esen (2015) ise öğretim üyeleri arasında performansların değerlendirilmesinin kurumlar ve öğretim üyeleri
için yaratacağı katkının olumlu yönde olacağına dair bir algı bulunduğunu belirtmektedirler. Beklentilerle
ilgili olarak da performans değerlendirmeye yönelik nitelikli bir kurum kültürünün gelişmesi, kurumsal
yenileşmenin süreklilik kazanması, öğretim üyelerinin mesleki gelişiminin olumlu etkilenmesi ve öğretim
üyelerinin kendi eksiklerini daha iyi görmesi boyutlarında akademisyenlerin beklenti içerisinde olduklarına
vurgu yapmışlardır.
Bu çalışmanın sonucunda performansın önündeki en önemli engellerin yükseköğretim kurumlarının
mevcut örgütsel işleyişi ve öğretim elemanlarının iş yükü olduğu, Tonbul’un (2008) çalışmasında örgütsel
olanakların yetersizliği, kurumlarda egemen olan kültür ve değerlendirme ölçütleri konusundaki belirsizlik
olduğu, Esen ve Esen’in çalışmasında ise (2015) en önemli engellerin sırasıyla kurumsal olanakların
eksikliği, yükseköğretim kurumlarının mevcut örgütsel işleyişi ve akademik yükseltme kriterleri olduğu
ortaya çıkmıştır. Başbuğ ve Ünsal (2010) akademik personelin çoğunluğunun performans
değerlendirilmesine olumlu baktığını ve performansı etkileyen en önemli engelleyici faktörün bilimsel
araştırmanın gerektirdiği fiziksel koşullardan mahrum olmak (laboratuvar, oda, araç-gereç, vb.) olduğunu
belirtmişlerdir. Özgüngör ve Duru (2014) ise ders yükü, deneyim, öğretim elemanının toplam öğrenci sayısı
arttıkça öğretim elemanına yönelik algılarda olumsuzlaşma olduğunu tespit etmiştir. Aynı çalışma, Eğitim Fakültesi
öğrencilerinin öğretim elemanlarına diğer tüm fakültelerin öğrencilerinden daha yüksek puanlar
verdiklerini, Teknik Eğitim ve Mühendislik Fakültesi öğrencilerinin ise öğretim elemanlarına diğer tüm
fakültelerin öğrencilerinden daha düşük puanlar verdiklerini göstermiştir. Ders yüküyle ilgili analizler, ders
yükü 45 saat ve daha fazla olan öğretim elemanlarının, ders yükü daha az olan tüm öğretim
elemanlarından daha olumsuz değerlendirildiklerini ortaya koymuştur. Eğitim Fakültesi için 60-100 arası
öğrencisi olan öğretim elemanları en kötü değerlendirmeleri almıştır. Arnăutu ve Panc (2015) öğrenci ve
öğretim elemanlarının farklı beklentileri olduğunu, öğrencilerin daha çok iletişimsel konular üzerinde
odaklanıp profesörlerden iyi bir ilişki kurmalarını ve kişisel dönüt vermelerini bekledikleri, profesörlerin ise
eğitsel sürecin kalitesi üzerinde (bilginin güncelliği gibi) durduklarını belirtmektedirler.
Bu çalışmada öğretim elemanlarının performans değerlendirme kapsamında öncelikle araştırma ve
akademik yayınların değerlendirilmesini, daha sonra öğretim hizmetleri ve lisansüstü danışmanlık
hizmetlerinin değerlendirilmesini istedikleri görülmektedir. Bu bulgu, Braunstein ve Benston’ın (1973)
çalışması tarafından desteklenmektedir. Onların çalışmasında araştırma ve prestijin performans değerlendirmeyle yüksek derecede ilişkili olduğu, etkili öğretimin performans değerlendirmeyle
orta derecede ilişkili olduğu ortaya çıkmaktadır. Öğretim elemanlarının öğretim hizmetinin kalitesi
öğrenciler tarafından değerlendirilmektedir; fakat Arnăutu ve Panc (2015) bu durumu eleştirmekte ve bu
değerlendirmelerde araştırma ve yayın üretkenliği, yönetim yeterlilikleri ve akademik tanınırlık göz
önünde bulundurulmadığını, dolayısıyla öğrencilerin öğretim elemanlarının fakülte içerisindeki rolleri hakkında yeterli bilgiye sahip olmadıklarını vurgulamaktadır. Öğretim elemanlarının performansının
öğrenciler tarafından değerlendirmesi konusunda çalışma yürüten Ünver (2012), öğretim elemanlarının
çoğunun öğrencilerin öğretimi objektif olarak değerlendireceğini düşünmediğini, öğrencilerin kendilerine
dair ortaya koyduğu öğretim becerilerine ilişkin görüşleri üzerinde düşünmek yerine akademik çalışmalar
yapmayı tercih ettiğini belirtmiştir. Turpen, Henderson ve Dancy (2012) yükseköğretim kurumlarının
öğretimin kalitesini değerlendirirken öğrencilerden gelen niceliksel puanlamalar üzerinde odaklandığını;
fakültelerin ise öğrencilerin test performansları ve akademik başarılarını kıstas aldıklarını belirtmektedir.
Bu açıdan, öğretim performansı değerlendirmede kullanılan ölçme araçlarının niteliği oldukça önem
kazanmaktadır. Kalaycı ve Çimen (2012), yükseköğretim kurumlarında akademisyenlerin öğretim
performansını değerlendirme sürecinde kullanılan anketleri incelemiş ve anketlerin belli bir sistematiği
temele almadan hazırlandığını, anketlerde yer alan maddelerin beşte birinin madde yazım kurallarına
uygun olmadığını, dolayısıyla öğretim elemanlarının performansını ölçmede yetersiz kaldığını ortaya
koymuştur. Bazı çalışmalarda da öğretim elemanlarının performansının öğrenciler tarafından
değerlendirilmesi konusunda öğrencilerin değerlendirmelerinin öğretimin kalitesiyle ilgili olduğu kadar
öğretimle ilişiği olmayan fiziksel çekicilik ve dersin rahatlığı gibi niteliklerle de ilgisi olabileceği ortaya
konulmuştur (Hornstein, 2017; Tan ve diğerleri, 2019). Shao, Anderson ve Newsome (2007) öğretim
hizmetinin kalitesinin değerlendirilmesi hususunda akademisyenlerin sınıf ziyaretleri, derse hazırlık,
alandaki güncel gelişmeleri takip etme durumu ve meslektaş değerlendirmelerine daha fazla yer
verilmesine ilişkin beklentilerinin olduğunu belirtmektedirler.
Bu çalışmada, öğretim elemanlarının performans değerlendirmesinin etkili öğretim üyesinin kriterleri konusunda görüş birliği oluşturacağı ve öğretim üyesinin mesleki gelişimini olumlu etkileyeceği beklentisi içinde oldukları ortaya
çıkmıştır. Bu nitelikler eğitim fakültelerinde görev yapan öğretim elemanlarının mesleki açıdan kalitelerini
artırmakta ve sürdürebilir bir mesleki gelişim süreci sağlamaktadır. Filipe, Silva, Stulting ve Golnik (2014)
performans değerlendirme sayesinde iyileşen sürdürülebilir mesleki gelişimin sadece eğitsel etkinliklerle
sınırlı olmadığını, aynı zamanda yönetim, takım çalışması, profesyonellik, kişilerarası iletişim ve hesap
verebilirlik gibi nitelikleri de geliştirdiğini vurgulamaktadırlar. Açan ve Saydan (2009) öğretim elemanlarına
yönelik akademik kalite beklentilerini belirlemeye çalışmışlar ve öğretim elemanının akademik kalite
özelliklerinin “öğretim elemanının öğretim yeteneği, öğretim elemanının ölçme-değerlendirme becerisi,
öğretim elemanının empati kurma becerisi, öğretim elemanının mesleki sorumluluğu, öğretim elemanının
derse ilgiyi özendirme becerisi, öğretim elemanının derse verdiği önem ve öğretim elemanının nezaketi”
boyutlarından oluştuğunu tespit etmişlerdir. Esen ve Esen (2015), Amerika Birleşik Devletleri’nde öğretim üyelerinin performans değerlendirmelerinin genellikle dört boyut esas alınarak yapıldığını, bu boyutların sırasıyla eğitim-öğretim, araştırma (profesyonel gelişim), topluma hizmet ve yönetime hizmet olduğunu ifade etmiştir. Bu
dört boyut arasında ise en önemli olanların eğitim-öğretim boyutu ile araştırma boyutu olduğuna vurgu
yapmışlardır. Bu boyutlara göre yapılan performans değerlendirme sonuçlarının ise öğretim üyelerinin
görev süresinin uzatılmasında, bulunduğu kadrodaki uygunluğuna karar verilmesinde ve terfisinde
kullanıldığı ifade edilmiştir.
Bu çalışmada akademik teşvik almayan öğretim elemanlarının performans değerlendirmeye ilişkin
beklentilerinin diğerlerine göre düşük olduğu görülmektedir. Kalaycı (2008), Türkiye’de performans
değerlendirme ile ilgili olarak bu konudaki çabalar ve çalışmaların, dünyadaki uygulamaların yanında henüz mayalanma aşamasında olmak bir yana, malzemelerin hazırlanma aşamasında bile olmadığını belirtmektedir. Bu
sorunun üzerinde odaklanan Yükseköğretim Kurulu, 2015 yılında “bir yükseköğretim kurumunun veya
programının iç ve dış kalite standartları ile uyumlu kalite ve performans süreçlerini tam olarak yerine
getirdiğine dair güvence sağlayabilmek için” Yükseköğretim Kalite Kurulu oluşturulmuştur. Buna paralel
olarak, yükseköğretimde çalışan akademik personelin performansını standart ve nesnel esaslara göre
değerlendirmek, bilimsel araştırmalar ve akademik çalışmaların etkililiğini artırmak ve akademiyenleri
desteklemek amacıyla Akademik Teşvik Ödeneği Yönetmeliği yürürlüğe konulmuştur. Bu çalışmada ortaya
çıkan performans değerlendirme sisteminin olumlu etkileri arasında akademik elemanların motive olması,
öğretim elemanlarının etkili öğretim üyesinin kriterleri konusunda görüş birliğinin oluşması ile ilgili
beklentilerle akademik teşvik yönetmeliğinin uyumlu olduğu ve akademik teşviğe hak kazanan öğretim
elemanlarının performans değerlendirmeyle ilgili beklentilerinin yüksek olduğu görülmektedir.
In summary, there is no consensus among the instructors of the faculty of education regarding
performance evaluation. Instructors are aware of the positive effects of performance evaluation, but
they have concerns about the reliability of the measurement, the evaluation criteria, the evaluation
process and the evaluators. Within the scope of this study, the most important criteria that
instructors believe should be included in evaluation were found to be, in order, research and
publication, the quality of teaching, and undergraduate and graduate advising. The positive effects of
the performance evaluation system were identified as motivating academic staff, providing financial
support, and encouraging the pursuit of quality. On the other hand, instructors count
intra-institutional competition and academic dishonesty among the negative effects of the evaluation
system. To resolve the problems associated with performance evaluation, instructors offered suggestions
such as reducing course loads, providing institutional support for academic efforts, allocating
research budgets to instructors through YÖK, and employing more administrative staff. Although demands
differ as to which criteria should be included in performance evaluation, establishing performance
monitoring and an effective evaluation system based on multiple types of data is considered highly
important for raising the quality of higher education and making systematic improvements.
Based on the results of this research, it is recommended that higher education institutions increase
objectivity and effectiveness in the performance evaluation process and establish human resources
services within their faculties. It is further recommended that these institutions design sustainable
and robust performance plans, use a holistic evaluation cycle, offer advisory services to instructors,
students and internal stakeholders on how performance can be improved, prepare clear and objective
guidelines for performance evaluators, and cultivate an institutional culture in which feedback is
perceived as valuable rather than judgmental.
References
Açan, B., & Saydan, R. (2009). Öğretim elemanlarının akademik kalite özelliklerinin değerlendirilmesi:
Kafkas Üniversitesi İİBF örneği. Atatürk Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 13 (2), 226-227.
Arnăutu, E., & Panc, I. (2015). Evaluation criteria for performance appraisal of faculty members. Procedia - Social and Behavioral Sciences, 203, 386-392.
Başbuğ, G., & Ünsal, P. (2010). Kurulacak bir performans değerlendirme sistemi hakkında akademik
personelin görüşleri: Bir kamu üniversitesinde yürütülen anket çalışması. İstanbul Üniversitesi Psikoloji
Çalışmaları Dergisi, 29(1), 1-24.
Batool, Z., Qureshi, R. H., & Raouf, A. (2010). Performance evaluation standards for the HEIs. Higher
Education Commission Islamabad, Pakistan. Retrieved October 12, 2019 from
https://au.edu.pk/Pages/QEC/Manual_Doc/Performance_Evaluation_Standards_for_HEIs.pdf
Bianchini, S., Lissoni, F., & Pezzoni, M. (2013) Instructor characteristics and students’ evaluation of
teaching effectiveness: Evidence from an Italian engineering school. European Journal of Engineering
Education, 38 (1),38-57.
Bogt, H. J., & R. W. Scapens. (2012). Performance management in universities: Effects of the transition to
more quantitative measurement systems. European Accounting Review, 21 (3), 451–97
Braunstein, D. N., & Benston, G. J. (1973). Student and department chairman views of the performance
of university professors. Journal of Applied Psychology, 58(2), 244.
Creswell, J.W., & Plano Clark, V.L. (2014). Designing and conducting mixed methods research. Thousand
Oaks, CA: Sage Publications.
Çakıroğlu, J., Aydın, Y., & Uzuntiryaki, E. (2009). Üniversitelerde öğretim performansının değerlendirilmesi.
Orta Doğu Teknik Üniversitesi Eğitim Fakültesi Raporu.
Çalışkan, G. (2006). Altı sigma ve toplam kalite yönetimi. Elektronik Sosyal Bilimler Dergisi, 5(17), 60-75.
Douglas, A. S. (2013). Advice from the professors in a university social sciences department on the
teaching-research nexus. Teaching in Higher Education, 18 (4), 377–88.
Elton, L. (1999). New ways of learning in higher education: managing the change. Tertiary Education and
Management, 5(3), 207-225.
Esen, M., & Esen, D. (2015). Öğretim üyelerinin performans değerlendirme sistemine yönelik tutumlarının
araştırılması. Yükseköğretim ve Bilim Dergisi, 5(1), 52-67.
Etzkowitz, H., Webster, A., Gebhardt C., & Terra., B.R.C. (2000). The future of the university and the
university of the future: evolution of ivory tower to entrepreneurial paradigm. Research Policy, 29(2),
313-330.
Filipe, H. P., Silva, E. D., Stulting, A. A., & Golnik, K. C. (2014). Continuing professional development: Best
practices. Middle East African journal of ophthalmology, 21(2), 134.
Glaser, S., Halliday, M. I., & Eliot, G. R. (2003). Üniversite mi? Çeşitlilik mi? Bilgideki önemli ilerlemeler
üniversitenin içinde mi, yoksa dışında mı gerçekleşiyor?. N. Babüroğlu (Ed.), Eğitimin Geleceği
Üniversitelerin ve Eğitimin Değişen Paradigması (ss. 167-178). İstanbul: Sabancı Üniversitesi Yayını.
Hamid, S., Leen, Y. M., Pei, S. H., & Ijab, M. T. (2008). Using e-balanced scorecard in managing the
performance and excellence of academicians. PACIS 2008 Proceedings, 256.
Higher Education Authority (2013). Towards a performance evaluation framework: Profiling Irish Higher
Education. Dublin: HEA
Hornstein, H. A. (2017). Student evaluations of teaching are an inadequate assessment tool for evaluating
faculty performance. Cogent Education, 4(1), 1304016.
Hopwood, A. G. (2008). Changing pressures on the research process: on trying to research in an age when
curiosity is not enough. European Accounting Review, 17 (1), 87–96.
Kalaycı, N. (2009). Yüksek öğretim kurumlarında akademisyenlerin öğretim performansını değerlendirme
sürecinde kullanılan yöntemler. Kuram ve Uygulamada Egitim Yönetimi Dergisi, 15(4), 625-656.
Kalaycı N., & Çimen O. (2012). Yükseköğretim kurumlarında akademisyenlerin öğretim performansını
değerlendirme sürecinde kullanılan anketlerin incelenmesi. Kuram ve Uygulamada Eğitim Bilimleri,
12(2), 1-22
Kim, H. B., Myung, S. J., Yu, H. G., Chang, J. Y., & Shin, C. S. (2016). Influences of faculty evaluating system
on educational performance of medical school faculty. Korean Journal Of Medical Education, 28(3),
289-294.
Latham, G. P., & Pinder, C. C. (2005). Work motivation theory and research at the dawn of the twenty-first
century. Annu. Rev. Psychol., 56, 485-516.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.).
California: SAGE Publications.
Mokkink, L. B., Terwee, C. B., Gibbons, E., Stratford, P. W., Alonso, J., Patrick, D. L., & de Vet, H. C. (2010).
Inter-rater agreement and reliability of the COSMIN Checklist. BMC Medical Research Methodology,
10, 82.
O'Connor, M., Patterson, V., Chantler, A., & Backert, J. (2013). Towards a performance evaluation
framework: profiling Irish higher education. NCVER's free international Tertiary Education Research.
Retrieved September 8, 2019 from http://hea.ie/assets/uploads/2017/06/Towards-a-PerformanceEvaluation-Framework-Profiling-Irish-Higher-Education.pdf.
Özgüngör, S., & Duru, E. (2014). Öğretim elemanları ve ders özelliklerinin öğretim elemanlarının
performanslarına ilişkin değerlendirmelerle ilişkileri. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi,
29 (29-2), 175-188.
Paige, R. M. (2005). Internationalization of higher education: Performance assessment and indicators.
Nagoya Journal of Higher Education, 5(8), 99-122.
Shao, L. P., Anderson, L. P., & Newsome, M. (2007). Evaluating teaching effectiveness: Where we are and
where we should be. Assessment & Evaluation in Higher Education, 32(3), 355-371.
Stonebraker, R. J., & Stone, G. S. (2015). Too old to teach? The effect of age on college and university
professors. Research in Higher Education, 56(8), 793-812.
T. C. Resmi Gazete. (2015). Akademik teşvik ödeneği yönetmeliği. Karar Sayısı: 2015/8305. Kabul tarihi:
14/12/2015. Yayımlandığı tarih: 18 Aralık 2015. Sayı: 29566.
Tan, S., Lau, E., Ting, H., Cheah, J. H., Simonetti, B., & Lip, T. H. (2019). How do students evaluate
instructors’ performance? Implication of teaching abilities, physical attractiveness and psychological
factors. Social Indicators Research, 1-16.
Tezsürücü, D., & Bursalıoğlu, S. A. (2013). Yükseköğretimde değişim: kalite arayışları. Kahramanmaraş
Sütçü İmam Üniversitesi Sosyal Bilimler Dergisi, 10 (2), 97-108.
Tonbul, Y. (2008). Öğretim üyelerinin performansının değerlendirilmesine ilişkin öğretim üyesi ve öğrenci
görüşleri. Kuram ve Uygulamada Eğitim Yönetimi, 56 (56), 633-662.
Turpen, C., Henderson, C., & Dancy, M. (2012, January). Faculty perspectives about instructor and
institutional assessments of teaching effectiveness. In AIP conference proceedings, 1413 (1), 371-374.
UNESCO (2004), Higher Education in a Globalized Society. UNESCO Education Position Paper, France
Ünver, G. (2012). Öğretim elemanlarının öğretimin öğrencilerce değerlendirilmesine önem verme
düzeyleri. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 43, 472-484.
Vidovich, L., & Slee, R. (2001). Bringing universities to account? Exploring some global and local policy
tensions. Journal of Education Policy, 16(5), 431-453.
Vincent, T. N. (2010). A constructive model for performance evaluation in higher education institutions.
Retrieved from https://ssrn.com/abstract=1877598
Journal of University Teaching & Learning Practice, Volume 18, Issue 8 (Standard Issue 4), Article 14, 2021
Preservice teachers’ perceptions of feedback: The importance of timing,
purpose, and delivery
Christina L. Wilcoxen
University of Nebraska, United States of America, cwilcoxen@unomaha.edu
Jennifer Lemke
University of Nebraska, United States of America, jenniferlemke@unomaha.edu
Recommended Citation
Wilcoxen, C. L., & Lemke, J. (2021). Preservice teachers’ perceptions of feedback: The importance of timing, purpose, and delivery. Journal of University Teaching & Learning Practice, 18(8). https://doi.org/10.53761/1.18.8.14
Abstract
If the purpose of feedback is to reduce the discrepancy between the established goal and what is
recognized, then how can this discrepancy be minimized through support and guidance? Feedback is
instrumental to a preservice teacher’s development during their teacher preparation program. This
qualitative study examines 31 first year teachers’ previous experiences with feedback during their
undergraduate practicums. The two research questions addressed: What can be learned from PSTs’
perceptions of feedback practices utilized in teacher preparation programs? and What modifications or
adaptations can be made to current feedback practices and structures in teacher preparation programs to
enhance teacher efficacy and classroom readiness? Semi structured interviews provided a comparison of
qualitative data and an opportunity for open ended questioning. Using descriptive analysis, researchers
discovered that current feedback loops and structures can inhibit pre-service teachers’ ability to make
meaning from the information and move their learning and instruction forward. As teacher preparation
programs work to establish more dialogic approaches to feedback that provide pre-service teachers with
multiple opportunities to reflect individually and collaboratively with university faculty, timing, purpose,
and delivery are important components to consider. Although this article is written based on preservice
teacher perceptions, the implications pertain to multiple fields and authors share a universal framework
for feedback.
Practitioner Notes
1. The goal of teacher preparation is simple: create teachers who are well equipped with the
knowledge and skills to positively impact PK-12 students. Field experiences are
embedded throughout teacher preparation programs to provide pre-service teachers
(PSTs) with meaningful opportunities to develop their ability and knowledge of effective
instructional practices.
2. As teacher preparation programs work to establish more dialogic approaches to feedback
that provide pre-service teachers with multiple opportunities to reflect individually and
collaboratively with university faculty, timing, purpose, and delivery are necessary
considerations.
3. What is the timing of the delivery? The timing of the delivery of feedback must be
considered. Frequency plays a large role in how PSTs view and utilize feedback.
4. Do receivers of the feedback understand the purpose? Ties to evaluation and the need for
directive solutions impact preservice teachers’ understanding of the purpose behind the
feedback. One way to support this need is to strengthen PSTs’ assessment feedback
literacy.
5. Does the delivery clarify the content and support reflection? As university faculty continue
to explore how to provide explicit feedback, delivery methods that support reflection and
pre-service teacher’s growth are important to consider. With the purpose of feedback
being to help reduce the discrepancy between the intended goal and outcome, pre-service
teachers must have easy access and retrieval of feedback.
Keywords
Preservice teaching, feedback literacy, assessment, teacher preparation
Preservice Teachers’ Perceptions of Feedback: The Importance of
Timing, Purpose, and Delivery
The goal of teacher preparation is simple: create teachers who are well equipped with the
knowledge and skills to positively impact preschool through high school students. Field
experiences are embedded throughout teacher preparation programs to provide pre-service
teachers (PSTs) with meaningful opportunities to develop their ability and knowledge of effective
instructional practices. Practicum experiences in classrooms give PSTs opportunities to practice
specific pedagogies with students and refine their abilities in real time (Cheng, et al., 2012). It is
critically important for PSTs to experience the teaching process to develop pedagogical and
reflective skills as well as teacher efficacy (Darling-Hammond, 2012; Liakopoulou, 2012;
McGlamery & Harrington, 2007). These structured experiences can bridge understanding on how
to apply feedback and make connections in the context of a school setting (Flushman, et al., 2019).
This practice builds confidence in effectively delivering instruction and managing challenges that
occur in the learning environment.
If the purpose of feedback is to reduce the discrepancy between the established goal and what is
recognized (Hattie and Timperley, 2007), then how can this discrepancy be minimized through
support and guidance? Feedback is instrumental to a PSTs development during their teacher
preparation program and learning is optimized “when they receive systematic instruction, have
multiple practice opportunities and receive feedback that is immediate, positive, corrective and
specific” (Scheeler et al., 2004, p. 405). It is important to guide PSTs to interpret their experiences
in authentic settings (Schwartz et al., 2018) and to support the development of effective teaching
practices (Hammerness et al., 2005). Constructive feedback coupled with reflective opportunities
allow the PST to distinguish effective classroom practices from those that are not (Hudson, 2014;
Pena & Almaguer, 2007). “Good quality external feedback is information that helps students
troubleshoot their own performance and self-correct: that is, it helps students take action to reduce
the discrepancy between their intentions and the resulting effects” (Nicol & Macfarlane-Dick,
2006, p. 208). For feedback to be integrated effectively, it needs to be timely, specific, and
accessible to encourage the individual to apply what they learned in future teaching opportunities
(Van Rooij et al., 2019). This is closely tied to self-efficacy.
Feedback can also be a significant source of self-efficacy in pre-service teachers (Mulholland &
Wallace, 2001; Mahmood et al., 2021; Schunk & Pajares, 2009). Though feedback can come in a
variety of formats, Rots et al. (2007) found that quality feedback and supervision provided by
university faculty correlated to higher levels of self-efficacy in pre-service teachers. Efficacy
increases when university faculty use prompts to encourage PSTs to focus on what went well and
build upon the strengths of the lesson (Nicol & Macfarlane-Dick, 2006). Timing, purpose, and
delivery play an important role in how faculty use feedback practices with pre-service teachers.
In many current teacher preparation program models, PSTs spend more time working in the field
than they do in coursework (National Council for Accreditation of Teacher Education [NCATE],
2010). With such an emphasis placed on practicum experiences (American Association of
Colleges of Teacher Education [AACTE], 2018; Lester & Lucero, 2017) and the critical role these
play in the development of pre-service teachers, one must consider if current feedback practices
and structures positively contribute to higher levels of teacher efficacy and classroom readiness.
The role of university faculty is to acknowledge and clearly articulate the strengths and
weaknesses of the lesson to promote productive behaviors that will positively contribute to student
learning (Fletcher, 2000). A gap in this research, however, is that it does not include preservice teacher
perceptions. Therefore, it is imperative to consider the perceptions of pre-service teachers regarding
their experiences with feedback, how these experiences align with high quality feedback practices,
and how they are designed for students who experience them (Smith and Lowe, 2021).
This qualitative study examines first year teachers’ previous experiences with feedback during
their undergraduate practicums. The study is expected to contribute to a deeper understanding of
what feedback practices pre-service teachers determine as beneficial and their interpretation of the
context, in addition to what action steps or modifications teacher preparation programs can take to
maximize feedback practices within practicum experiences.
The Purpose of Feedback
Feedback has often functioned as a punisher or reinforcer, a guide or rule, or served as a
discriminating or motivating stimulus for individuals (Mangiapanello & Hemmes, 2015).
Historically feedback has been a one-way transmission of information (Ajjawi & Boud, 2017), but
contemporary views on feedback recognize it as a reciprocal exchange between individuals
focused on knowledge building versus the arbitrary delivery of information (Archer 2010).
Daniels & Bailey (2014) defined performance feedback as, “information about performance that
allows a person to change his/her behavior” (p. 157). Studies show organizations that establish
strong feedback environments exhibit better outcomes in terms of employee performance
(Steelman et al., 2004). Constructive feedback, in the presence of a well-built feedback hierarchy,
builds the intrinsic motivation of employees (Cusella, 2017; The Employers Edge, 2018). With this in
mind, appropriate and meaningful feedback is essential in ensuring that good practices are
rewarded, ineffective practices corrected, and pathways to improvement and success identified
(Cleary & Walter, 2010).
A key purpose of feedback in teacher preparation programs is to enhance pre-service teachers’
knowledge and skills (AACTE, 2018). Feedback serves as one component within complex
structures and interactions to support PSTs’ development (Evans, 2013). Through feedback, PSTs
realize their strengths and weaknesses, gain understanding of instructional methods, and develop a
repertoire of strategies to enhance their performance and student learning (Nicol & Macfarlane-Dick, 2006). With this knowledge and understanding, PSTs have opportunities to act upon the
received feedback to improve their performance and enhance student learning (Carless et al.,
2011). Feedback allows PSTs to define effective teaching practices and determine what
instructional methods are valued in specific learning environments.
Feedback is also meant to stimulate PST’s self-reflection. Feedback allows the pre-service teacher
to deconstruct and reconstruct instructional methods and practices with guidance from university
faculty. Specific feedback and reflective dialogue contribute to the pre-service teacher’s ability to
critically reflect on their performance individually and use this understanding and knowledge to
regulate future teaching experiences (Tulgar, 2019). These reflective opportunities to identify
strengths and weaknesses create pathways to improvement.
Feedback can also serve as a way for university faculty to monitor, evaluate and track pre-service
teacher’s progress and performance (Price et al., 2010). Many teacher preparation programs use
feedback as a measure in evaluating PST performance during practicums or other field-based
components. This feedback, often documented through rubrics or other assessment criteria, is
useful in helping establish measurable goals and effective teaching practices across a teacher
preparation program. When the feedback or assessment tools reflect the objectives and goals of the
program, they can strengthen the connection between theory and practice, thereby increasing PST
learning (Ericsson, 2002; Grossman, et al., 2008; Vasquez, 2004). PSTs rely on experienced
individuals such as university faculty to articulate, model and provide high quality feedback
through practicums (Darling-Hammond & MacDonald, 2000). This guidance increases
connections between coursework and the classroom.
With research suggesting that pre-service teachers welcome constructive feedback and the
opportunity to learn (Chaffin & Manfredo, 2009; Chesley & Jordan, 2012), university faculty must
seek collaborative opportunities to provide effective feedback that positively contributes to the
development of PSTs. A major role of university faculty is to guide the PST in setting goals for
practicum that foster their development and growth as an educator. When university faculty
clearly articulate the strengths and weaknesses of the lesson and assist the PST in identifying their
next actions, outcomes can be achieved faster.
Components of Effective Feedback
Effective feedback provides the learner with a clear understanding of how the task is being
accomplished or performed and offers support and direction in increasing their efforts to achieve
the desired outcome (Hattie and Timperley, 2007). This model reinforces the need for feedback to
be timely, content specific, and delivered to meet the needs of the individual receiving it.
Timing
The timing of feedback plays an essential role in shaping PSTs’ understanding of effective teaching
practices and effective instructional methods. Feedback can be provided to PSTs in a variety of
structures and formats. Deferred feedback refers to notes or qualitative data collected during an
observation and shared with the teacher upon completion of the lesson (Scheeler et al., 2009). Deferred
feedback is less intrusive because it allows the teacher to deliver the lesson without disruption.
Immediate feedback refers to when university faculty stop the lesson or instructional activity being
observed to provide corrective feedback and/or modeling when a problem is noted (Scheeler et al.,
2009). Scheeler et al. (2004) found “targeted teaching behaviors were acquired faster and more
efficiently when feedback was immediate” (p. 403). Immediate feedback also reduced the
likelihood of teachers continuing ineffective teaching practices.
Explicit, Quality Feedback
Corrective feedback that identifies errors and ineffective teaching methods with targeted ways to
correct them is one of the most influential means of feedback (Chan et al., 2014; Van Houten,
1980). Studies found that desired teacher behaviors resulted from feedback that was both positive
and corrective, focused on specific teaching behaviors and practices, and provided concise
suggestions for change (Scheeler et al., 2004; Woolfolk, 1993). Feedback that is individualized
and centered on the needs of the individual yields more effective outcomes for learning (Cimen &
Cakmak, 2020; Pinger et al., 2018). When this aligns with the goals and objectives of the specific
lesson, it provides valuable insight as to where the PST is in relation to the goal (Bloomberg &
Pitchford, 2017). This type of feedback increases self-efficacy as it allows the PST to see growth
over time.
Delivery
The delivery of observational feedback may vary depending on the development and readiness of
the PST. Although the goal is for teachers to engage in self-directed reflection, some teachers may
need more support and guidance as they maneuver through the dimensions and complexities of
teaching. A variety of differentiated coaching strategies have been researched over the years
regarding instructional practice and student learning (Aguilar, 2013; Costa & Garmston, 2002;
Knight, 2016; Sweeney, 2010). These include both conversational and written feedback between
the PST and university faculty.
The New Teacher Center (2017) outlines three differentiated dialogic coaching approaches:
instructive, collaborative, and facilitative. Instructive coaching is directive and guided by the
university faculty who analyze performance and lead conversations. Collaborative coaching is less
directive and both the PST and university faculty have an equal voice in the conversation.
Facilitative coaching allows the teacher to lead the reflective conversation, while university
faculty provides feedback with probing questions to facilitate critical thinking and problem
solving. These conversations contain minimal feedback from university faculty and topics for
discussion are often directed by the teacher.
While oral feedback is a powerful tool in constructing relationships between the PST and
university faculty, written feedback is just as important as it provides pre-service teachers with
formal documentation of clearly articulated strengths and weaknesses. Written comments are far
more effective than a grade or evaluation (Black & Wiliam, 1998; Crooks, 1988) and provide both
the university faculty and the PST with a record of performance in response to learning needs
(Flushman et al., 2019). Conversation and dialogue include the thoughts and beliefs of the PST
and provide faculty an opportunity to gauge their depth of understanding. Written support
provides documentation and a reference for PSTs.
Methodology
This study looks to uncover how university faculty can effectively integrate high quality feedback
practices into practicum experiences. Specifically, what can be learned from PSTs’ perceptions of
feedback practices utilized in teacher preparation programs? What modifications or adaptations
can be made to current feedback practices and structures in teacher preparation programs to
enhance teacher efficacy and classroom readiness? In the context of this study, not only were
PSTs’ experiences with feedback considered, but also how these experiences and perceptions align
with high quality feedback practices.
Design and Participants
Researchers used semi-structured interviews to provide a comparison of qualitative data and an
opportunity for open ended questioning (Yin, 2016). The 30-minute interviews were recorded and
transcribed for analysis in Fall 2020. Participation was voluntary and researchers used purposeful
sampling (Yin, 2016) from a pool of participants in their first year of teaching. Researchers
selected beginning teachers because, as recent graduates, they are closest to their practicum
experiences. Additionally, all participants experienced the same interruptions in teaching
during March 2020. Researchers sought a range of participant perspectives; therefore, the study
consisted of 31 beginning teachers who spanned seven school districts and 24 schools within a
midwestern metropolitan environment. All teachers held a bachelor’s degree and teaching
certification from a 4-year university or college. Representation included two private institutions
and three public institutions. All participants were female apart from one male. Grade levels
spanned preschool through eighth grade with five special education perspectives spanning grades
preschool through sixth grade. The school districts are in one state and serve approximately one-third of their state’s total student population (over 100,000 students). Demographic information is
presented in Table 1.
Table 1
Characteristics of Participants

Teaching Endorsement (Teachers, N = 31)
  PreK-K: 5
  First - Third: 10
  Fourth - Sixth: 8
  Middle School: 3
  Special Education: 5

Teaching Environment, District Representation (N = 7 districts, 24 schools)
  Suburban: 51%
  Rural: 6%
  Urban: 42%
Data Collection & Analysis
Questions asked during the interviews addressed previous experiences with feedback during
practicums. Application was also addressed in reference to how it influenced teaching behaviors
and actions. More than one researcher took part in the collection, analysis, and interpretation of the
data. Both researchers were involved in the preparation of the questions and in the data analysis.
Using descriptive analysis to interpret the data obtained from the semi structured interviews,
researchers identified themes using the following process to construct theory: 1) review of the
transcribed interviews, 2) open coding, 3) identification of categories and/or themes, and 4) data
abstraction (Lawrence & Tar, 2013). Since researcher one conducted the interviews, researcher
two reviewed all the transcripts to familiarize themself with the content. Next, open coding
determined themes in participant answers. Patterns in the data showed consistency in ideas
(Eisenhardt, 1989; Orlikowski, 1993) and researchers identified overall themes amongst the
answers. Once established, researchers coded the remaining transcripts independently. Since
coding semi structured interviews involves determining the intent or meaning behind questions
answered, researchers also addressed intercoder reliability and agreement (Campbell et. al., 2013).
Both noted the same themes with only 20% discrepancy or 80% agreement. Using negotiated
agreement, researchers adjudicated the coding disagreements through negotiation for concordance.
After reconciling the initial disagreements, researchers coded the transcripts using the identified
themes. Inter-rater reliability was 97%.
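As an aside for readers unfamiliar with the agreement figures above, the short Python sketch below illustrates how simple percent agreement between two coders can be computed over a set of transcript segments. The segment codes and theme labels are hypothetical placeholders, not data from this study; the sketch only mirrors the arithmetic behind the reported 80% initial and 97% final agreement.

# Minimal sketch (hypothetical data): percent agreement between two coders
# who each assign one theme to every transcript segment.

def percent_agreement(codes_a, codes_b):
    """Share of segments to which both coders assigned the same theme."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Ten hypothetical segments coded against the three themes of this study.
coder_one = ["timing", "explicit", "timing", "delivery", "explicit",
             "timing", "delivery", "timing", "explicit", "delivery"]
coder_two = ["timing", "explicit", "delivery", "delivery", "explicit",
             "timing", "delivery", "explicit", "explicit", "delivery"]

print(f"Initial agreement: {percent_agreement(coder_one, coder_two):.0%}")  # 8/10 -> 80%

# Negotiated agreement: disputed segments are discussed and recoded, after
# which agreement is recomputed (97% in the study reported above).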
Results
Results indicated three themes. All stemmed from participant perspectives of beneficial practices:
what they found valuable or wanted more of during their PST experiences. Out of 31
participants, 29 were coded with at least one of the three themes. Participants who mentioned
more than one theme were counted as part of each theme mentioned; 11 of the 31 mentioned more
than one identified theme. See Table 2.
Table 2
Themes found in the feedback

Beneficial practice (percent of n = 44 coded mentions):

Frequency and structure of the feedback: 40%
Example Comment: This respondent reflected on the difference between a few visits and multiple. “Let me come observe you and give you tips here and there” as compared to someone providing feedback multiple times a week.

The need for explicit and quality feedback: 30%
Example Comment: This respondent reflected on how grace and time are not always the most beneficial. My institution “just gave a lot of grace and comfort and even during student teaching … I really enjoy getting told what I can improve on because there’s always room for improvement and I like the different ideas.”

The need for conversation linked to feedback: 30%
Example Comment: The respondent believed that “conversations more focused on do you think the students understood the concept? How do you feel that it went?” would help PSTs engage in daily reflective practice and goal setting.
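The respondent percentages reported in the subsections below (55% and 42%) follow directly from the counts given in the text (17, 13, and 13 of the 31 participants). A minimal Python sketch of that arithmetic is shown here for concreteness; the dictionary keys paraphrase the theme names and are not verbatim study labels.

# Minimal sketch: share of the 31 respondents who mentioned each theme,
# using the counts reported in the Results section.
n_respondents = 31
theme_counts = {
    "Frequency and structure of the feedback": 17,
    "Explicit, quality feedback": 13,
    "Conversation linked to feedback": 13,
}

for theme, count in theme_counts.items():
    print(f"{theme}: {count}/{n_respondents} respondents ({count / n_respondents:.0%})")

# 17/31 ~ 55% and 13/31 ~ 42%, matching the figures in the subsections below.
# The percentages in Table 2 are instead taken over the 44 coded mentions,
# since 11 participants contributed to more than one theme.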
Timing
Frequency was the most cited need at 40% and noted by 55% (n = 17) of respondents.
Overwhelmingly, participants referred to the feedback received as pre-service teachers as
“minimal”. Other phrases included “too spaced out”, “lumped together at the end” and “few”.
Multiple participants mentioned having been provided feedback following an observation
only once or twice. Even when the feedback provided the next steps towards improvement,
participants still felt it was too late. “It’s like … now I can’t implement that until next semester” or
“Here’s the feedback. Remember when you get a job.” Participants felt the timing of the feedback
negatively affected the implementation. They wanted more consistency with small tips in real time
throughout the experience.
Explicit, Quality Feedback
A need for explicit and quality feedback was cited next at 30% and noted by 42% (n = 13) of
respondents. “I always like it straight forward. I want all of the feedback that I can get because I
feel like that's going to help me grow”. Another noted that they wanted specific feedback on areas
to improve instead of “a lot of grace and comfort.” They additionally noted building confidence
without the skills to back it, does not lead to improvement. Another commented that university
faculty was “really really nice but the feedback was all positive like she was kind of scared to give
constructive feedback.” One commented how she thought the feedback would provide her things
to work on, but instead the feedback was “you’re doing what you’re supposed to be doing.”
Participants wanted feedback to provide more direction and insight to enhance instructional
performance. Feedback only highlighting the positive aspects or acknowledging “no room for
growth” was not useful or beneficial. One respondent noted, I “hardly ever sat down to discuss
how I was doing. It was more in passing that the feedback took place.” This led to the third theme.
Delivery
A need for conversation linked to feedback was cited next at 30% and noted by 42% (n = 13) of
responders. Tied to this conversation was the need for explicit feedback mentioned above.
Participants struggled with the broad categories on rubrics which highlight multiple behaviors. “I
feel like not all rubric feedback is accurate”. This led some to request more specific targets. They
felt this could be reached through reflective conversations. One noted the importance of the
conversation when helping PSTs reflect on practice and setting goals. The respondent believed
that “conversations more focused on do you think the students understood the concept? How do
you feel that it went?” would help PSTs engage in daily reflective practice and goal setting. Others
noted how conversations allowed for “collaboration and brainstorming” and how conversations
better support the reflection process. Dialogue can be beneficial in the moment and authentic,
although it was noted that written conversation and feedback can be just as powerful when open
ended and used as a communication tool.
Participants noted the importance of written feedback as it provided opportunities to reflect and
respond. Also, it gave participants insight and context as to what was happening while they were
teaching. “I don't realize everything good that I'm doing or what I need to improve on. So, when
university faculty take notes, it really helps me see what I'm actually doing.” Another talked about
university faculty keeping a notebook. The two used it as a communication tool for written
conversations which the participant “thought was really helpful because … I can look back and
see what she wrote, and I feel like it was a little more immediate.”
The results indicated that PSTs believe that timely and explicit feedback are beneficial in both
goal setting and enhancing their instructional performance. Results also indicated that PSTs find
both dialogue and written feedback to be useful reflective tools. As teacher preparation programs
consider feedback structures and the levels of support, these are important implications to consider
when creating meaningful practicum experiences.
Discussion
Reflection is an expectation in teacher preparation (Brookfield, 1995; Darling-Hammond, 2006;
Liu, 2013). The link between reflection and learning is not new (Dewey, 1933; Schön, 1983;
Zeichner, 1996), as studies highlight that reflection involves emotions and is a context-dependent
process impacted by social constructs. PSTs are expected to recognize when adjustments are
needed and make them to effectively meet the needs of the students they serve. A cycle of
observation, action, and reflection can help PSTs adjust their teaching. This is most effective when
the cycle is individualized, collaborative, and embeds frequent opportunities to make meaning of
the information for future use (Vartuli, et al., 2014). Current feedback loops and structures can
inhibit PSTs' ability to make meaning from the information and move their learning and
instruction forward. As teacher preparation programs work to establish more dialogic approaches
to feedback that provide PSTs with multiple opportunities to reflect individually and
collaboratively with university faculty, timing, purpose, and delivery are necessary components to
consider. See Figure 1.
Figure 1
Feedback Structure for Pre-service Teachers
What is the timing of the delivery?
When considering the results, frequency plays a large role in how PSTs view and utilize feedback.
It was clear that PSTs desire more frequent, immediate feedback to enhance their instructional
performance. Immediate feedback results in quicker acquisition of effective teacher behaviors and
greater overall accuracy in the implementation of those behaviors than when delayed feedback is
provided (Coulter & Grossen, 1997; O’Reilly et al., 1992; 1994). Though some question if
immediate feedback might interfere with the learning environment and reduce instructional
momentum, advancements in technology make the ability to provide immediate feedback both
manageable and efficient for both university faculty and pre-service teachers. Devices such as the
“bug in the ear” (BIE) have been used to provide immediate feedback in a variety of situations.
Results from various studies show these technologies effectively supported university faculty in
providing concise, immediate feedback to pre-service teachers to increase their ability to respond
to the various needs of students and alter or stop ineffective practices in the moment (Coulter &
Grossen, 1997; Scheeler et al., 2009). As teacher preparation programs consider how to increase
efforts for university faculty to provide specific, immediate feedback, technical devices have great
potential to increase desired teaching behaviors and students’ academic performance.
Do receivers of the feedback understand the purpose?
Pre-service teachers request explicit, quality feedback, but there is a clear disconnect between this
concept and PSTs’ perceptions of the purpose of the feedback provided. The ties to evaluation
and the need for directive solutions will not change, so how can mindsets shift to better understand
the purpose? One way to do this is through strengthening PSTs’ assessment feedback literacy.
PSTs need opportunities and a repertoire of skills to engage with feedback in authentic ways,
make sense of the information provided, and determine how the information can be productively
implemented in future lessons (Carless & Boud, 2018; Price et al., 2010; Smith and Lowe, 2021).
Feedback literacy can strengthen reflective capacity as students have more opportunities to
engage, interact with, and make judgments about their own practice (Carless & Boud, 2018;
Sambell, 2011; Smith and Lowe, 2021). To close the feedback loop, PSTs must acquire the ability
to process the comments and information received and then act upon the feedback for future
instruction. Students must learn to appreciate feedback and their role in the process, develop and
refine their ability to make judgements, and develop habits that strive for continuous improvement
(Boud & Molloy, 2013). Designing a program curriculum that emphasizes the importance of the
feedback process and creates opportunities for pre-service teachers to self-evaluate their practice is
crucial in building capacity for them to make sound judgments. Equally as important is creating
space for pre-service teachers to co-construct meaning of the feedback and demonstrate how they
use the information to inform or enhance future instruction (Carless & Boud, 2018; O’Donovan et
al., 2016). Building programs grounded in feedback literacy provide opportunities to critically
reflect on choices and draw clear connections between feedback and its purpose.
Does the delivery clarify the content to support reflection?
Another consideration worth noting is the need for feedback that prompts both reflection and
growth of pre-service teachers. Participants in this study indicated that feedback from university
faculty was not always useful because it could not be applied immediately. They also noted the
feedback provided did not always prompt reflection that resulted in changes or modifications to
their future instructional practices or teaching methods. While this discrepancy could be attributed
to the readiness level of the pre-service teacher, it could also be that the feedback loops and
structures designed do not create informative pathways that move students’ learning forward.
As university faculty continue to explore how to provide explicit feedback, delivery methods that
support reflection and pre-service teacher’s growth are important to consider. With the purpose of
feedback being to help reduce the discrepancy between the intended goal and outcome, pre-service
teachers must have easy access and retrieval of feedback. While we know that reflective coaching
conversations are beneficial in helping pre-service teachers reflect on their teaching practices and
to determine alternate methods of instruction that may be more effective, time and availability of
university faculty may limit these meaningful interactions from taking place. To overcome this
barrier, teacher preparation programs should consider how they might couple traditional forms of
written feedback and reflective conversations with digital tools that facilitate collaborative
discussion and grant easier access to feedback, allowing pre-service teachers space and opportunity
to engage in both collaborative and independent reflection and problem solving. Providing pre-service teachers with multiple sources of feedback can be a way to increase the visibility of
feedback for pre-service teachers and encourage them to consistently revisit the information to
make future instructional decisions and professional judgments.
Implications
Current literature highlights the gap between providing feedback and the receiver’s interpretation
(O’Connor & McCurtin, 2021). This gap creates growth limitations when the learner is not
gaining what is needed from the feedback. This is especially important in higher education as
institutions develop students for professional careers which require lifelong learning, critical
thinking and problem solving, such as education. Therefore, we propose the following framework
and action steps to support the understanding of and implementation of feedback for PSTs. We
also assert that this framework could span multiple disciplines and professional contexts.
Figure 2
Framework to Support Pre-service Teacher Capacity Building for Feedback
Limitations and Implications for Future Research
Although the results of this study provide insight into PSTs’ feedback experiences, they must be
interpreted within the limitations of the study. The first limitation is that the participants in this
study represent only five universities across three states. We recognize that this limitation in our sample
does not represent the scope of teacher preparation programs across the country but believe that
the results provide worthwhile insights into PSTs experiences with feedback in practicum
experiences. Future studies including participants across numerous states and teacher preparation
programs would allow for more diverse experiences and perspectives to be represented.
Another limitation in this study is that all participants experienced disruptions in their
undergraduate practicum experiences. These disruptions likely resulted in condensed or altered
experiences which could have impacted the opportunities and quality of feedback provided by
university faculty. Future studies that include participants whose experiences consist of traditional
structures and timelines of practicum experiences may better reflect PSTs’
experiences with feedback and the practices used by university faculty.
Conclusion
Teacher preparation institutions need to reevaluate current feedback practices with PSTs.
Participants indicated that more frequent conversations would make guidance more explicit and
support development of practice and reflection. Although this is based on a limited number of
participants in one country, the findings are generalizable to other countries. The concept of
feedback literacy needs to be taught and modeled, and PSTs need to practice it throughout their
course of study to better understand the connection between feedback and practice. By
focusing on timing, delivery, and purpose, teacher preparation institutions can take one step closer
to developing reflective practitioners who embody the knowledge and skills to positively impact
learning for every student.
References
Aguilar, E. (2013). The art of coaching: Effective strategies for school transformation. Wiley.
Ajjawi, R., & Boud, D. J. (2017). Researching feedback dialogue: an interactional analysis
approach. Assessment and Evaluation in Higher Education, 42(2), 252–265.
https://doi.org/10.1080/02602938.2015.1102863
American Association of Colleges of Teacher Education [AACTE] Clinical Practice Commission
(2018). A pivot toward clinical practice, its lexicon, and the renewal of teacher
preparation. Retrieved from https://aacte.org/resources/clinical-practice-commission#related-resources
Archer, J. C. (2010). State of the science in health professional education: Effective feedback.
Medical Education, 44(1), 101–108. https://doi.org/10.1111/j.1365-2923.2009.03546.x.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education:
Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102
Bloomberg, P., & Pitchford, B. (2017). Leading impact teams: Building a culture of efficacy.
Corwin.
Boud, D., & Molloy, E. (Eds.). (2013). Feedback in higher and professional education:
understanding it and doing it well. Routledge.
Brookfield, S. D. (1995). Becoming a critical reflective teacher. Jossey-Bass Publishers.
Campbell, J. L, Quincy, C., Osserman, J., & Pedersen, O. K. (2013). Coding in-depth semi
structured interviews: Problems of unitization and intercoder reliability and agreement.
Sociological Methods & Research, 42(3), 294-320.
https://doi.org/10.1177/0049124113500475
Carless, D., Salter, D., Yang, M., & Lam, J. (2010). Developing sustainable feedback practices.
Studies in Higher Education, 36(4), 395-407.
https://doi.org/10.1080/03075071003642449
Carless, D., & Boud, D. (2018). The development of student feedback literacy: enabling uptake of
feedback. Assessment & Evaluation in Higher Education, 43(8), 1315-1325.
https://doi.org/10.1080/02602938.2018.1463354
Chaffin C., & Manfredo J. (2009). Perceptions of preservice teachers regarding feedback and
guided reflection in an instrumental early field experience. Journal of Music Teacher
Education, 19(2), 57-72. https://doi.org/10.1177/1057083709354161
Chan, P. E., Konrad, M., Gonzalez, V., Peters, M. T., & Ressa, V. A. (2014). The critical role of
feedback in formative instructional practices. Intervention in School and Clinic, 50(2),
96-104. https://doi.org/10.1177/1053451214536044
Chesley, G. M., & Jordan, J. (2012). What’s missing from teacher prep. Educational Leadership,
69(8), 41-45.
Cheng, M. M., Tang, S. Y., & Cheng, A. Y. (2012). Practicalising theoretical knowledge in
student teachers' professional learning in initial teacher education. Teaching and Teacher
Education, 28(6), 781-790. https://doi.org/10.1016/j.tate.2012.02.008
Cimen, O., & Cakmak, M. (2020). The effect of feedback on preservice teachers’ motivation and
reflective thinking. Elementary Education Online, 19(2), 932-943. https://doi.org/10.17051/ilkonline.2020.695828
Cleary, M. L., & Walter, G. (2010). Giving feedback to learners in clinical and academic settings:
Practical considerations. The Journal of Continuing Education in Nursing, 41(4), 153-154. https://doi.org/10.3928/00220124-20100326-10
Costa, A. L., & Garmston, R. (2002). Cognitive coaching: A foundation for renaissance schools.
Christopher-Gordon Publishers.
Coulter, G. A., & Grossen, B. (1997). The effectiveness of in-class instructive feedback versus
after-class instructive feedback for teachers learning direct instruction teaching behaviors.
Effective School Practices, 16(4), 21–35.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of
Educational Research, 58(4), 438-481. https://doi.org/10.3102/00346543058004438
Cusella, L., (2017). The effects of feedback on intrinsic motivation: A propositional
extension of cognitive evaluation theory from an organizational communication
perspective. Annals of the International Communication Association, 4(1), 367-387.
https://doi.org/10.1080/23808985.1980.11923812
Daniels, A. C., & Bailey, J. S. (2014). Performance management: Changing behavior that drives
organizational effectiveness (5th ed.). Atlanta, GA: Performance Management
Publications.
Darling-Hammond, L. (2012). Powerful teacher education: Lessons from exemplary programs.
John Wiley & Sons.
Darling-Hammond, L. (2006). Powerful teacher education. San Francisco: Jossey-Bass.
Darling-Hammond, L., & MacDonald, M. (2000). Where there is learning there is hope: The
preparation of teachers at the Bank Street College of Education. In L. Darling-Hammond
(Ed.), Studies of excellence in teacher education: Preparation at the graduate level (1-95).
American Association of Colleges for Teacher Education.
Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to
the educative process. Henry Regnery.
Eisenhardt, K. M., (1989). Building theories from case study research. Academy of Management
Review, 14(4), 532-550. www.jstor.org/stable/258557
Ericsson, K. A. (2002). Attaining excellence through deliberate practice: Insights from the study
of expert performance. In M. Ferrari (Ed.), The pursuit of excellence in education (pp.
21-55). Erlbaum.
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of
Educational Research, 83(1), 70-120. https://doi.org/10.3102/0034654312474350
Fletcher, S. (2000). Mentoring in schools: A handbook of good practice. Kogan Page.
Flushman, T., Guise, M., & Hegg, S. (2019). Improving supervisor written feedback: Exploring
the what and why of feedback provided to pre-service teachers. Issues in Teacher
Education, 28(2), 46–66.
Grossman, P., Hammerness, K., & McDonald, M. (2008). Redefining teaching, re-imagining
teacher education. Teachers and Teaching: Theory and Practice, 15(2), 273-289.
https://doi.org/10.1080/13540600902875340
Hammerness, K., Darling-Hammond, L., Bransford, J., Berliner, D., Cochran-Smith, M.,
McDonald, M., & Zeichner, K. (2005). How teachers learn and develop. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What
teachers should learn and be able to do (358-389). Jossey-Bass.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77
(1), 81-112. https://doi.org/10.3102/003465430298487
Hudson, P. (2014). Feedback consistencies and inconsistencies: Eight mentors’ observations on
one preservice teacher’s lesson. European Journal of Teacher Education, 37(1), 63–73.
https://doi.org/10.1080/02619768.2013.801075
Killion, J. (2015). Attributes of an effective feedback process. In: The feedback process:
Transforming feedback for professional learning. Oxford, Ohio: Learning Forward.
Knight, J. (2016). Better conversations: Coaching ourselves and each other to be more credible,
caring, and connected. Corwin.
Lawrence, J., & Tar, U. (2013). The use of grounded theory technique as a practical tool for
qualitative data collection and analysis. The Electronic Journal of Business Research
Methods, 11(1), 29-40.
Liakopoulou, M. (2012). The role of field experience in the preparation of reflective teachers.
Australian Journal of Teacher Education, 37(6), 42-54.
https://doi.org/10.14221/ajte.2012v37n6.4
Liu, K. (2013). Critical reflection as a framework for transformative learning in teacher
education. Educational Review, 67(2), 135-157.
https://doi.org/10.1080/00131911.2013.839546
Lester, A., & Lucero, R. (2017). Clinical practice commission shares proclamations, tenets at
AACTE forum. Ed Prep Matters. http://edprepmatters.net/2017/04/clinical-practice-commission-shares-proclamations-tenets-at-aacte-forum/
Mahmood, S., Mohamed, O., Mustafa, S. M. B. S., & Noor, Z. M. (2021). The influence of
demographic factors on teacher-written feedback self-efficacy in Malaysian secondary
school teachers. Journal of Language and Linguistic Studies, 17(4).
Mangiapanello, K., & Hemmes, N. (2015). An analysis of feedback from a behavior analytic
perspective. The Behavior Analyst, 38(1), 51–75. doi:10.1007/s40614-014-0026-x.
McGlamery, S., & Harrington, J. (2007). Developing reflective practitioners: The importance of
field experience. The Delta Kappa Gamma Bulletin, 73(3), 33-45.
Mulholland, J., & Wallace, J. (2001). Teacher induction and elementary science teaching:
Enhancing self-efficacy. Teaching and Teacher Education, 17(2), 243–261.
https://doi.org/10.1016/s0742-051x(00)00054-8
National Council for Accreditation of Teacher Education. (2010). Transforming
teacher education through clinical practice: A national strategy to prepare effective
teachers. Retrieved from
http://www.ncate.org/LinkClick.aspx?fileticket=zzeiB1OoqPk%3d&tabid=715
New Teacher Center (2017). Instructional Mentoring. Retrieved from
https://newteachercenter.org/.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A
model and seven principles of good feedback practice. Studies in Higher Education,
31(2), 199-218. https://doi.org/10.1080/03075070600572090
O’Connor, A., & McCurtin, A. (2021). A feedback journey: Employing a constructivist approach to the development of feedback literacy among health professional learners. BMC Medical Education, 21, 486. https://doi.org/10.1186/s12909-021-02914-2
O’Donovan, B., Rust, C., & Price, M. (2016). A scholarly approach to solving the feedback
dilemma in practice. Assessment & Evaluation in Higher Education, 41(6), 938-949.
https://doi.org/10.1080/02602938.2015.1052774
O'Reilly, M. F., Renzaglia, A., & Lee, S. (1994). An analysis of acquisition, generalization and
maintenance of systematic instruction competencies by preservice teachers using
behavioral supervision techniques. Education and Training in Mental Retardation and
Developmental Disabilities, 29(1), 22-33. https://www.jstor.org/stable/23879183
O'Reilly, M. F., Renzaglia, A., Hutchins, M., Koterba-Buss, L., Clayton, M., Halle, J. W., & Izen,
C. (1992). Teaching systematic instruction competencies to special education student
teachers: An applied behavioral supervision model. Journal of the Association for
Persons with Severe Handicaps, 17(2), 104-111.
https://doi.org/10.1177/154079699201700205
Orlikowski, W. J. (1993). CASE tools as organizational change: Investigating incremental and
radical changes in systems development. MIS Quarterly, 17(3), 309-340.
https://doi.org/10.2307/249774
Rots, I., Aelterman, A., Vlerick, P., & Vermeulen, K. (2007). Teacher education, graduates’
teaching commitment and entrance into the teaching profession. Teaching and Teacher
Education, 23(5), 543–556. https://doi.org/10.1016/j.tate.2007.01.012
Pena, C., & Almaguer, I. (2007). Asking the right questions: online mentoring of student teachers.
International Journal of Instructions Media, 34(1), 105-113.
Pinger, P., Rakoczy, K., Besser, M. & Klieme, E. (2016). Implementation of formative assessment
– effects of quality of programme delivery on students’ mathematics achievement and
interest. Assessment in Education: Principles, Policy & Practice, 25(2), 160-182.
https://doi.org/10.1080/0969594x.2016.1170665
Price, M., Handley, K., Millar, J. & O’Donovan, B. (2010). Feedback: all that effort, but what is
the effect? Assessment & Evaluation in Higher Education, 35(3), 277-289.
https://doi.org/10.1080/02602930903541007
Sambell, K. (2011). Rethinking feedback in higher education. ESCalate.
Scheeler, M. C., Ruhl, K. L., & McAfee, J. K. (2004). Providing performance feedback to
teachers: A review. Teacher Education and Special Education: The Journal of the
Teacher Education Division of the Council for Exceptional Children, 27(4), 396-407.
Scheeler, M. C., Bruno, K., Grubb, E., & Seavey, T. L. (2009). Generalizing teaching techniques
from university to K-12 classrooms: Teaching preservice teachers to use what they learn.
Journal of Behavioral Education, 18(3), 189-210. https://doi.org/10.1007/s10864-009-9088-3
Schön, D. A. (1983). The reflective practitioner. Basic Books.
Schwartz, C., Walkowiak, T. A., Poling, L., Richardson, K., & Polly, D. (2018). The nature of
feedback given to elementary student teachers from university supervisors after
observations of mathematics lessons. Mathematics Teacher Education & Development,
20(1), 62–85.
Schunk, D., & Pajares, F. (2009). Self efficacy theory. In Handbook of Motivation at School (pp.
35–54). New York: Routledge.
Smith, M., & Lowe, C. (2021). DIY assessment feedback: Building engagement, trust and
transparency in the feedback process. Journal of University Teaching and Learning
Practice, 18(3), 9-14. https://doi.org/10.53761/1.18.3.9
Steelman, L., Levy, P., & Snell, A. (2004). The feedback environment scale: Construct definition, measurement and validation. Educational and Psychological Measurement, 64(1), 165-184.
Sweeney, D. R. (2010). Student-centered coaching: A guide for K-8 coaches and principals.
SAGE Publications.
The Employers Edge. (2018). Feedback to boost motivation. Retrieved from http://www.theemployersedge.com/providing-feedback/
Tulgar, A. (2019). Four shades of feedback: The effects of feedback in practice teaching on self-reflection and self-regulation. Alberta Journal of Educational Research, 65(3), 258-277.
Van Houten, R. (1980). Learning through feedback. Human Sciences Press.
Van Rooij, E.C.M, Fokkens-Bruinsma, M., & Goedhart, M. (2019). Preparing science
undergraduates for a teaching career: Sources of their teacher self-efficacy. The Teacher
Educator, 54(3), 270-294. https://doi.org/10.1080/08878730.2019.1606374
Vartuli, S., Bolz, C., & Wilson, C. (2014). A learning combination: Coaching with CLASS and
the project approach. Early Childhood Research & Practice, 16(1), 1.
Vasquez, C. (2004). “Very carefully managed”: Advice and suggestions in post observation
meetings. Linguistics and Education, 15(1-2), 33-58.
https://doi.org/10.1016/j.linged.2004.10.004
Woolfolk, A. (1993). Educational psychology. Allyn & Bacon.
Yang, M., & Carless, D. (2013). The feedback triangle and the enhancement of dialogic feedback
processes. Teaching in Higher Education, 18(3), 285–297.
Yin, R.K. (2016). Qualitative research from start to finish, Second Edition. The Guilford Press.
Zeichner, K. (1996). Teachers as reflective practitioners and the democratization of school reform.
In K. Zeichner, S. Melnick, & M. L. Gomez (Eds.), Currents of reform in preservice
teacher education (pp. 199-214). Teachers College Press.
University faculty’s perceptions and practices of student centered learning in Qatar: Alignment or gap?
Saed Sabah
Department of Educational Sciences, College of Education, Qatar University, Doha, Qatar and Hashemite University, Zarqa, Jordan, and
Xiangyun Du
Department of Educational Sciences, College of Education, Qatar University, Doha, Qatar and UNESCO Center for PBL, Aalborg University, Aalborg, Denmark
Received 10 November 2017; Revised 27 December 2017; Accepted 11 February 2018
Abstract
Purpose – Although student-centered learning (SCL) has been encouraged for decades in higher education, to
what level instructors are practicing SCL strategies remains in question. The purpose of this paper is to investigate
a university faculty’s understanding and perceptions of SCL, along with current instructional practices in Qatar.
Design/methodology/approach – A mixed-method research design was employed including quantitative
data from a survey of faculty reporting their current instructional practices and qualitative data on how these
instructors define SCL and perceive their current practices via interviews with 12 instructors. Participants of
the study are mainly from science, technology, engineering and mathematics (STEM) field.
Findings – Study results show that these instructors have rather inclusive definitions of SCL, which range from
lectures to student interactions via problem-based teamwork. However, a gap between the instructors’ perceptions
and their actual practices was identified. Although student activities are generally perceived as effective teaching
strategies, the interactions observed were mainly in the form of student–content or student-teacher, while
student–student interactions were limited. Prevailing assessment methods are summative, while formative
assessment is rarely practiced. Faculty attributed this lack of alignment between how SCL could and should
be practiced and the reality to external factors, including students’ lack of maturity and motivation due to the
Middle Eastern culture, and institutional constraints such as class time and size.
Research limitations/implications – The study is limited in a few ways. First, regarding methodological justification, the data collection methods chosen in this study focused mainly on the faculty’s self-reporting. Second, the limited number of participants restricts this study’s generalizability, because the survey was administered in a volunteer-based manner and the limited number of interview participants makes it difficult to establish clear patterns. Third, researching faculty members raises concerns in the given context, wherein extensive faculty assessments are regularly conducted.
Practical implications – A list of recommendations is provided here as inspiration for institutional support
and faculty development activities. First, faculty need deep understanding of SCL through experiences as learners
so that they can become true believers and implementers. Second, autonomy is needed for faculty to adopt
appropriate assessment methods that are aligned with their pedagogical objectives and delivery methods. Input
on how faculty can adapt instructional innovation to tailor it to the local context is very important for its long-term effectiveness (Hora and Ferrare, 2014). Third, an inclusive approach to faculty evaluation by encouraging
faculty from STEM backgrounds to be engaged in research on their instructional practice will not only sustain
the practice of innovative pedagogy but will also enrich the research profiles of STEM faculty and their institutes.
Journal of Applied Research in Higher Education, Vol. 10 No. 4, 2018, pp. 514-533. Emerald Publishing Limited, 2050-7003. DOI 10.1108/JARHE-11-2017-0144
© Saed Sabah and Xiangyun Du. Published by Emerald Publishing Limited. This article is published
under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute,
translate and create derivative works of this article (for both commercial and non-commercial
purposes), subject to full attribution to the original publication and authors. The full terms of this
licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
The authors would like to thank the participants in the study: the authors’ colleagues, who
supported this study.
Social implications – The faculty’s understanding and perceptions of implementing student-centered
approaches were closely linked to their prior experiences—experiencing SCL as a learner may better shape
the understanding and guide the practice of SCL as an instructor.
Originality/value – SCL is not a new topic; however, the reality of its practice is constrained to certain social
and cultural contexts. This study contributes with original and valuable insights into the gap between
ideology and reality in implementation of SCL in a Middle Eastern context.
Keywords Qatar, Assessment, Student-centered learning, Instructional practices, STEM faculty
Paper type Research paper
1. Introduction
In general, higher education (HE) faces challenges in providing students with reasoning and
critical thinking skills, problem formulation and solving skills, collaborative skills and the
competencies required to cope with the complexity and uncertainty of modern professions
(Henderson et al., 2010; Seymour and Hewitt, 1997; Martin et al., 2007; Smith et al., 2009). HE
research often reports that traditional lecture-centered education does not provide
satisfactory solutions to these challenges (Du et al., 2013; Smith et al., 2009), thereby failing
to facilitate students’ meaningful learning of their subjects (Henderson et al., 2010). In some
cases, it has resulted in a deficit of university graduates from certain fields, in particular,
science, technology, engineering and mathematics (STEM) fields (Graham et al., 2013;
Seymour and Hewitt, 1997; Watkins and Mazur, 2013). A change in instructional practices is
believed to be necessary to provide students with the requisite skills and competencies, and
could potentially serve as a retention strategy in these particular fields such as STEM
(Graham et al., 2013; Seymour and Hewitt, 1997; Watkins and Mazur, 2013). Therefore, it is
essential to innovate the pedagogical methods and practices used in these fields (American
Association for the Advancement of Science (AAAS), 2013; Henderson et al., 2010).
Instructional change has resulted in a variety of pedagogical reform initiatives that have
been encouraged in STEM classroom practices, including active learning, inquiry-based
learning, collaborative learning in teams, interactive learning, technology-enhanced learning,
and peer instruction. A substantial body of literature has reported research results regarding
how these innovative instructional strategies affect student learning (Graham et al., 2013;
Henderson et al., 2010; Watkins and Mazur, 2013). Despite a worldwide trend in instructional
change toward student-centered learning (SCL), to what extent university instructors are
implementing or practicing these strategies and how they perceive this change is still in
question. The international literature has reported that lecture remains the prevailing
instructional practice in STEM classrooms despite the waves of pedagogical innovation
encouraged at an institutional level (Hora and Ferrare, 2014; Froyd et al., 2013; Prince and
Felder, 2006; Walczyk and Ramsey, 2003). In addition, STEM faculty may discontinue its
practice of certain types of instructional innovation at certain stages of innovation diffusion
due to various reasons including institutional challenges such as a heavy work load and large
class sizes, and the lack of individual interests (Henderson and Dancy, 2009). Furthermore, the
fidelity of the implementation of SCL approaches is also in question (Borrego et al., 2013).
Therefore, this study aims to investigate how faculty who work as instructors in STEM
undergraduate programs report their instructional practices and how they perceive the
implementation of SCL instructional strategies in their situated contexts.
2. Literature review
Over the past few decades, a global movement has emerged calling for a new model of learning for the twenty-first century, and several key elements are highlighted, including solving complex problems, communication, collaboration, critical thinking, creativity, responsibility, empathy, and management, among others (NEA, 2010; Scott, 2015). Following this trend, university teaching and learning has transformed from being lecture-based and teacher-centered to focusing more on engaging and enhancing student learning (Barr and Tagg, 1995;
Kolmos et al., 2008; Slavich and Zimbardo, 2012). In the process of this transformation, SCL
has become a well-used concept. Defined as an approach that “allows students to shape their
own learning paths and places upon them the responsibility to actively participate in making
their educational process a meaningful one” (Attard et al., 2010, p. 9), SCL is focused on
providing an active-learning environment in flexible curricula with the use of learning
outcomes to understand student achievements (pp. 10-12). Rooted in a constructivist approach
that moves beyond mere knowledge transmission, such learning is conceived as a process
whereby learners search for meaning and generate meaningful knowledge based on prior
experiences (Biggs and Tang, 2011; Dewey, 1938).
In the STEM fields, instructional practices of instructors are changing from teacher-directed approaches to student-centered approaches to improve the quality of undergraduate education (Justice et al., 2009). A substantial number of studies have
reported the positive effects of a variety of approaches to student-centered pedagogy in
STEM HE, such as active learning (Felder et al., 2000; Freeman et al., 2014), small-group
learning (Felder et al., 2000; Freeman et al., 2014; Springer et al., 1999; Steinemann, 2003),
and inquiry-based pedagogy (Anderson, 2002; Curtis and Ventura-Medina, 2008; Duran
and Dökme, 2016; Ketpichainarong et al., 2010; Martin et al., 2007; Simsek and Kabapinar,
2010). Furthermore, problem- and project-based pedagogy has been well documented as
an effective way to help students not only construct subject knowledge meaningfully, but
also develop the skills necessary for many professions, including critical thinking,
problem solving, communication, management and collaboration (Bilgin et al., 2015;
Du et al., 2013; He et al., 2017; Kolmos et al., 2008; Lehmann et al., 2008; Steinemann, 2003;
Zhao et al., 2017).
Definitions of these terminologies vary and the term SCL in particular is not always used
with consistent meaning. However, a few points of agreement can be summarized (Rogers,
2002): who the learners are, the context, the learning activities and the processes. Weimer
(2002) identifies five key areas for change in the process of transformation from teacher-centered to learner-centered classrooms: the balance of power, the function of content, the
role of the teacher, the responsibility for learning, and the purpose and process of evaluation.
In relation to the practice and implementation of a student-centered approach, Brook (1999)
provides a list of guiding principles for the development of constructivist teachers who
prioritize SCL strategies in HE. These are: using problems that are relevant to students,
constructing learning around principal concepts, eliciting and appreciating students’
perspectives, adjusting the curriculum and syllabus to address students’ prior experience,
and linking assessment to learning goals and student learning.
A wide range of perspectives has been addressed in previous studies on SCL in HE. Brook
(1999), Rogers (2002) and Weimer (2002) provide a synthesis of guiding principles suggesting
three dimensions of focus: instructors (how they understand and perceive the instructional
innovation they are expected to adopt), student activity and interaction, and assessment.
The instructor represents an important and challenging aspect of instructional change,
particularly, regarding innovative pedagogy and SCL (Ejiwale, 2012; Kolmos et al., 2008;
Weimer, 2002). In a teacher-centered environment, instructors play the dominant role in
defining objectives, content, student activities and assessment. In an SCL environment, by contrast, instructors facilitate learning by providing opportunities for students to be
involved in decision-making regarding goals, content, activities and assessment.
Nevertheless, in the reality of instructional practice, instructors face the dilemma of, on
the one hand, giving students the freedom to make decisions on their own, and on the other
hand, retaining control of classroom activities (Du and Kirkebæk, 2012). In addition, how
instructors handle the changes in their relationships with students is a determining factor in
the extent to which SCL can be established. In their meta-analysis of student-teacher
relationships in a student-centered environment, Cornelius-White (2007) suggests that
positive teaching relationship variables, such as empathy, warmth, encouragement and
motivation, are more associated with learner participation, critical thinking, satisfaction,
drop-out prevention, positive motivation and social connection. In their proposal for
developing pedagogical change strategies in STEM, Henderson et al. (2010) emphasize that
the beliefs and behaviors of individual instructors should be targeted because they are
essential to any strategy for changing the classroom practices and environment. In general,
the existing literature agrees that for pedagogical change strategy development, it is
essential to work with the instructors and to understand their current instructional practices
as well as their perceptions of the change.
A student-centered approach emphasizes providing students with opportunities to
participate and engage in activities while interacting with the subject matter, the teacher
and each other. Student responsibility and ownership of their own learning is regarded as
essential in facilitating classroom interactions. Self-governance of the interactions can be
enhanced through collaborative group work when students are expected to negotiate and
reach consensus on how to work and learn together. Instead of meeting an objective set by
the instructors, students should take responsibility for organizing learning activities in
order to reach goals they themselves set (Du et al., 2013; Weimer, 2002). The function of
teaching content lies in aiding students in learning how to learn, rather than in the
transmission of factual knowledge (Du and Kirkebæk, 2012).
Student-centered instructional strategies and practices require a change of assessment
methods. Formative assessment, which refers to assessment methods that are intended to
generate feedback on learner performance to improve learning, is often used to facilitate self-regulated learning (Nicol and Macfarlane-Dick, 2006). In their review of formative assessment, Black and Wiliam (1998) summarize the effectiveness of this method in relation to different types of outcomes, educational levels and disciplines. As they emphasize, the essential aspect that defines the success of formative assessment is the quality of the feedback provided to learners, both formally and informally. Furthermore, in formative
assessment, the process of learning through feedback and dialogue between teachers and
students and among students is highly accentuated. Various formative assessment methods
have been reported as additional or alternative methods to the prevailing summative
assessment methods in STEM in order to align assessment constructively with the
implementation of SCL (Downey et al., 2006; Prince and Felder, 2006).
To plan and implement meaningful initiatives for improving undergraduate instruction,
it is important to collect data on the instructors’ instructional practices (Williams et al.,
2015). Nevertheless, the existing literature has mainly focused on students’ attitudes,
performance and feedback on SCL. A limited number of studies have examined the
outcomes of faculty development activities that encourage research-based instructional
strategies for SCL. These studies report a good level of faculty knowledge and awareness of
various alternative instructional strategies in the fields of physics education (Dancy and
Henderson, 2010; Henderson et al., 2012) and engineering and science education (Brawner
et al., 2002; Borrego et al., 2013; Froyd et al., 2013). However, instructors’ adoption of
teaching strategies varies according to individual preferences and beliefs, the contexts of
disciplines, and institutional policy (Borrego et al., 2013; Froyd et al., 2013), and their
persistence in the adoption and current use of these strategies (Hora and Ferrare, 2014;
Henderson and Dancy, 2009; Walter et al., 2016) and their fidelity (how closely the
implementation follows its original plan) (Borrego et al., 2013) are still in question.
Therefore, there is a need for additional studies addressing instructors’ understanding,
beliefs and perceptions about practicing SCL that impact their instructional design for
classroom interactions, and how they construct assessment methods to align with their
adoption of instructional strategies. Further research should examine how instructors
perceive their roles and experiences in the process of instructional change.
3. Present study
The state of Qatar has the vision of transforming itself into a knowledge-producing economy
(General Secretariat for Development Planning, 2008; Rubin, 2012). Accordingly, advancement in
the fields of science and technology is a critical goal, as is promoting pedagogical practices that
support engagement in science and technology education (Dagher and BouJaoude, 2011). Qatar
University (QU) is the country’s foremost institution of HE and aims to become a leader in
economic and social development in Qatar. In its strategic plan for 2013–2016 (Qatar University
(QU), 2012), the leadership of QU has called for instructional innovation toward SCL by developing
“the skills necessary in the 21st century such as leadership, teamwork, communications,
problem-solving, and promoting a healthy lifestyle” (QU, 2012, p. 13). It is expected that these
initiatives will be implemented at the university level, particularly in the STEM fields.
Research on general university instructional practices in Qatar remains sparse, with little
information available on current instructional practices and to what extent student-centered
teaching and learning strategies are being implemented. In a recent study, the first on
university instructional practices in Qatar, Al-Thani et al. (2016) reported that across
disciplines, instructors prioritized lecture-based and teacher-centered instructional practices. For example, most participants stressed lecture and content clarity as the most important and effective practices. In contrast, student–student interaction, the integration of technology, and instructional variety received less attention, according to the perceptions of the participants. However, little is known about either actual classroom
practices or the instructors’ perception of SCL, in particular in STEM fields.
To develop feasible change strategies that could be applied in the Qatar context with the
aim of facilitating innovation in HE in general and STEM education in particular, it is
essential to understand current instructional practices and how instructors perceive SCL, as
well as what strategies are being implemented (Henderson et al., 2010). Therefore, this study
aims to investigate STEM faculty’s perceptions and instructional practices of SCL and in
Qatar. The purpose is to generate knowledge on the research-based evaluation of STEM
faculty’s instructional practices. The study formulates the following research questions:
RQ1. What are the instructional practices of STEM faculty in Qatar?
RQ2. To what extent are instructors’ current practices student-centered?
RQ3. How do STEM faculty perceive SCL, possibilities for implementation and
challenges in classroom practice?
4. Research methods
4.1 Research design
Ideally, the study of STEM instructional practices involves the use of multiple techniques. The
methods commonly used to investigate university teaching practices include interviews with
instructors and students, portfolios written by instructors, surveys of instructors and
students, and observations in educational settings (AAAS, 2013). However, in reality, research
conditions limit the choice of data collection methods (Creswell, 2013). Although classroom
observation and portfolios are widely practiced in schools and can be a potential method for
improving university teaching and learning, these rarely occur in practice except in cases of
faculty promotion, evaluation or professional development requests (AAAS, 2013). In addition,
peer and protocol-based observations demand significant resources of human labor, materials,
equipment and physical conditions, which makes them challenging to implement on a larger
scale (Walter et al., 2016). Therefore, a mixed-methods research design combining the
strengths of quantitative and qualitative data – surveys and interviews – was employed as
the major data generation method in this study (Creswell, 2002).
4.2 Participants
An open invitation was sent to the entire faculty in the science, engineering, mathematics and
health sciences fields, asking them to consider participating on a voluntary basis. A sample of
65 faculty members (23.4 percent female and 76.4 percent male) completed the questionnaire.
4.3 Data generation methods
Survey and instruments. A self-reported questionnaire survey is one of the most efficient
ways to gain information due to its accessibility, convenience to administer and relative
time efficiency (AAAS, 2013, p. 7). Despite the common concern that the faculty may
inaccurately self-report their teaching practices, recent literature reports that some
aspects of instruction can be accurately reported by instructors (Smith et al., 2014); this
approach helps to identify instructional practices that are otherwise difficult to observe
(Walter et al., 2016).
The Postsecondary Instructional Practices Survey (PIPS) (Walter et al., 2016) is a newly
developed instrument aimed at investigating university teaching practices cross-disciplinarily
from the perspective of instructors. The PIPS was developed on the basis of a conceptual
framework constructed from a critical analysis of existing survey instruments (Walter et al.,
2015), the observation codes of the Teaching Dimensions Observational Protocol (Hora et al.,
2012), and the Reformed Teaching Observation Protocol (Piburn et al., 2000). The PIPS has
been proven to be valid and reliable while providing measurable variables, and results from
initial studies have shown that PIPS self-reported data are compatible with the results of
several Teaching Dimensions Observational Protocol codes (Walter et al., 2016).
The PIPS includes 24 statement items regarding instructional practice, together with demographic questions on items such as gender, rank and academic title. An intuitive,
proportion-based scoring convention is used to calculate the scores. Two models are used
for the supporting analysis – a two-factor or five-factor solution. Factors in the five-factor
model include: six items for student–student interactions, four items for content delivery,
four items for formative assessment, five items for student–content engagement and four
items for summative assessment. Factors in the two-factor model include: nine items for
instructor-centered practices and 13 items for student-centered practices. The responses
from participants were coded as (0) not at all descriptive of my teaching, (1) minimally
descriptive of my teaching, (2) somewhat descriptive of my teaching, (3) mostly descriptive
of my teaching and (4) very descriptive of my teaching.
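As a rough illustration of how responses coded on this 0–4 scale can be aggregated into factor scores, the following minimal Python sketch assumes simple per-factor item means; it does not reproduce the exact proportion-based scoring convention of Walter et al. (2016), the response values are hypothetical, and only the items later named in Table II are grouped here.

import numpy as np

# Hypothetical 0-4 coded answers from one respondent, keyed by PIPS item number.
responses = {1: 3, 3: 4, 5: 3, 11: 3,          # content delivery practices
             10: 2, 12: 3, 13: 2, 14: 2,       # student-student interaction
             6: 3, 8: 3, 18: 3, 20: 2,         # formative assessment
             2: 3, 7: 4, 9: 3, 16: 3, 17: 3,   # student-content engagement
             21: 3, 22: 3, 23: 3, 24: 1}       # summative assessment

# Item groupings follow the five-factor description above; the full PIPS
# instrument contains 24 practice items, so this grouping is only partial.
factors = {
    "student-student interaction": [10, 12, 13, 14],
    "content delivery practices": [1, 3, 5, 11],
    "formative assessment": [6, 8, 18, 20],
    "student-content engagement": [2, 7, 9, 16, 17],
    "summative assessment": [21, 22, 23, 24],
}

# Per-factor mean of the coded responses for this respondent; averaging such
# values across all respondents would give grand means comparable to Table II.
for name, items in factors.items():
    print(name, round(np.mean([responses[i] for i in items]), 2))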
In-depth interviews. An interview can provide opportunities to explore teaching practices
through interactions with the participants. It can also provide the space for in-depth
questions on specific teaching practices as well as perceptions, beliefs, opinions and
potentially unexpected findings (Creswell, 2013). During the interviews (interview
guidelines see Appendix), participants were asked questions about their understanding of
and past experiences with SCL, their perceptions of the effectiveness of practicing SCL in
general and in their current environments in particular, what challenges and barriers they
had experienced, and what institutional support is needed.
4.4 Procedure
The questionnaire was sent to all participants in early spring 2017 and was administered by
Qualtrics. An explanation of the goals of the survey, namely, to understand their current
practices without intention of assessment, was provided to the participants. A pilot test was
conducted with several colleagues who were not participants to ensure that the questions
were unambiguous and addressed the goals.
A sample of 65 faculty members (23.4 percent female and 76.4 percent male) completed
the questionnaire. These were from the schools of sciences, health sciences, pharmacy
Student
centered
learning
in Qatar
519
JARHE
10,4
520
and engineering. The average HE teaching experience of the participants was 14.5 years.
About 15.6 percent of participants were full professors, 39.1 percent were associate
professors, 31.3 percent were assistant professors and 14 percent were instructors or
lecturers. About 58.6 percent of the participants did not have a leadership role (e.g. head of
department, chair of curriculum committee).
In total, 12 (4 female and 8 male) of the 65 faculty members who completed the
questionnaire responded positively to the individual interview request. The interview
participants include a representative range of STEM faculty members by academic titles (three
professors, three associate professors and six assistant professors) and gender (four female
and eight male). Table I shows details of interview participants’ background information.
5. Analyses and findings
5.1 Quantitative data analysis and results
To answer the first research question, the mean and standard deviation of each item were
calculated to identify the practices that best describe STEM faculty teaching in the given context.
Table I. Interview participants’ background information (name, gender, academic rank, and previous pedagogical experiences). Note: All names are anonymous.

Abdullah (Male, Assistant Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment.
Mohammad (Male, Assistant Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment.
Burhan (Male, Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment in 4 countries.
Amin (Male, Associate Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment in 2 countries.
Ali (Male, Associate Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment and in active-learning environment.
Ibrahim (Male, Assistant Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment and in inquiry-based learning environment.
Ihab (Male, Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment and in project-based learning environment.
Alia (Female, Associate Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment and in project-based learning environment.
Faris (Male, Assistant Professor): Student experiences in lecture-based learning environment; teaching experiences in lecture-based learning environment and in problem-based learning environment.
Sara (Female, Assistant Professor): Student experiences in lecture-based and problem-based learning environment; teaching experiences in problem-based learning environment.
Iman (Female, Assistant Professor): Student experiences in lecture-based and problem-based learning environment; teaching experiences in problem-based learning environment.
Duaa (Female, Professor): Student experiences in lecture-based and problem-based learning environment; teaching experiences in lecture-based and problem-based learning environment in 3 countries.
The grand mean for each factor was also calculated. The descriptive statistics for participants’
responses to the PIPS are presented in Table II.
The participants reported that the items of factor 2 (F2), content-delivery practices, were mostly descriptive of their teaching (x̄ = 3.14). That is, the items stating that their syllabus contains the specific topics that will be covered in every class session (x̄ = 3.58), they structure the class session to give students good notes (x̄ = 3.18), and they guide students as they listen and take notes (x̄ = 2.89) were mostly descriptive of their content delivery.
The grand mean of student–content engagement (F4) was relatively high (x̄ = 3.07). This means that, for example, instructors frequently ask students to respond to questions during class time (x̄ = 3.49) and frequently structure problems so that students are able to consider multiple approaches to finding a solution.
As to the student–student interaction factor (F1), the grand mean (x̄ = 2.18) was relatively low compared to the other factors. The item means ranged from 1.9 to 2.51, with the maximum possible value being 4.
Table II. The descriptive statistics for participants’ responses to the PIPS survey – five-factor model analysis (values are Mean, SD)

Factor 1: student–student interaction
P10. I structure class so that students explore or discuss their understanding of new concepts before formal instruction (2.51, 1.03)
P12. I structure class so that students regularly talk with one another about course concepts (2.77, 1.02)
P13. I structure class so that students constructively criticize one another’s ideas (2.06, 1.07)
P14. I structure class so that students discuss the difficulties they have with this subject with other students (2.06, 1.11)
Grand mean of factor 1 (2.18, 0.82)

Factor 2: content delivery practices
P01. I guide students through major topics as they listen and take notes (2.89, 0.99)
P03. My syllabus contains the specific topics that will be covered in every class session (3.58, 0.77)
P05. I structure my course with the assumption that most of the students have little useful knowledge of the topics (2.89, 0.94)
P11. My class sessions are structured to give students a good set of notes (3.18, 0.83)
Grand mean of factor 2 (3.14, 0.55)

Factor 3: formative assessment
P06. I use student assessment results to guide the direction of my instruction during the semester (2.98, 0.86)
P08. I use student questions and comments to determine the focus and direction of classroom discussion (2.95, 0.86)
P18. I give students frequent assignments worth a small portion of their grade (2.70, 1.22)
P20. I provide feedback on student assignments without assigning a formal grade (1.82, 1.33)
Grand mean of factor 3 (2.62, 0.69)

Factor 4: student–content engagement
P02. I design activities that connect course content to my students’ lives and future work (3.11, 0.95)
P07. I frequently ask students to respond to questions during class time (3.49, 0.77)
P09. I have students use a variety of means (models, drawings, graphs, symbols, simulations, etc.) to represent phenomena (2.92, 1.04)
P16. I structure problems so that students consider multiple approaches to finding a solution (2.94, 0.88)
P17. I provide time for students to reflect about the processes they use to solve problems (2.88, 0.87)
Grand mean of factor 4 (3.07, 0.53)

Factor 5: summative assessment
P21. My test questions focus on important facts and definitions from the course (3.03, 1.00)
P22. My test questions require students to apply course concepts to unfamiliar situations (2.58, 1.16)
P23. My test questions contain well-defined problems with one correct solution (2.91, 1.13)
P24. I adjust student scores (e.g. curve) when necessary to reflect a proper distribution of grades (0.89, 1.19)
The new grand mean of factor 5, excluding P24 (2.84, 0.80)
Compared with the other items of this factor, item P13 (“I structure class so that students constructively criticize one another’s ideas”) had the lowest mean (x̄ = 1.9), which indicates that this practice is somewhat, but not mostly or very much, descriptive of instructors’ practices. The item concerning structuring the class so that students discuss the difficulties they have with the subject matter with other students also had a low mean (x̄ = 2.06).
The formative assessment factor (F3) also had a relatively low grand mean (x̄ = 2.62). The mean of item P20 was 1.82, indicating that providing feedback on student assignments without assigning a formal grade was not very descriptive of QU instructors’ practices. The means for the rest of the items ranged from 2.7 to 2.98. Using student comments and questions to determine the direction of classroom discussions (x̄ = 2.95) and using student assessment results to guide the direction of their instruction (x̄ = 2.98) were mostly descriptive of QU instructors’ practices, as reported by participants.
The summative assessment factor (F5) had a low grand mean (x̄ = 2.35). This relatively low mean was greatly impacted by item P24 (“I adjust student scores [e.g. curve] when necessary to reflect a proper distribution of the grades”). In the given context, instructors are not allowed to adjust student scores, so the result of this item reflects university policy rather than an individual instructor’s preference. An analysis excluding item P24 shows a different picture: the mean of the summative assessment factor without item P24 becomes 2.84. Thus, the student–student interaction factor and the formative assessment factor represent the lowest means in this study.
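As a quick arithmetic check (not part of the original analysis), the two summative assessment grand means follow directly from the item means reported in Table II; the short Python snippet below simply reproduces that calculation.

# Summative assessment item means as reported in Table II (P21-P24).
p21, p22, p23, p24 = 3.03, 2.58, 2.91, 0.89

print(round((p21 + p22 + p23 + p24) / 4, 2))  # 2.35, grand mean including P24
print(round((p21 + p22 + p23) / 3, 2))        # 2.84, grand mean excluding P24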
To answer the second research question, a paired samples t-test was conducted to
compare the mean of student-centered items (P02, P04, P06-10, P12-16, P18-20) with the
mean of the instructor-centered items (P01, P03, P05, P11, P17, P21-24). The mean of
the student-centered factors is 2.69 and the mean of the instructor-centered factors is 2.76.
The results of the paired samples t-test found no statistically significant difference
(α = 0.05) between the student-centered mean and the instructor-centered mean (t = −1.00, df = 64). However, when item 24 is excluded, the mean of the instructor-centered items becomes 2.99. A significant difference (α = 0.05) was found between the student-centered mean and the new (excluding item 24) instructor-centered mean (t = −4.15, df = 64).
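For readers who wish to reproduce this kind of comparison, a minimal Python sketch is given below; the two arrays are random placeholders standing in for each respondent’s student-centered and instructor-centered item means (which are not published), and scipy’s paired t-test is assumed to correspond to the analysis described above.

import numpy as np
from scipy import stats

# Placeholder per-respondent means (n = 65). In the study, each value would be
# the average of that respondent's student-centered items (P02, P04, P06-10,
# P12-16, P18-20) or instructor-centered items (P01, P03, P05, P11, P17, P21-24).
rng = np.random.default_rng(0)
student_centered = rng.normal(2.69, 0.5, 65)
instructor_centered = rng.normal(2.76, 0.5, 65)

# Paired samples t-test with df = n - 1 = 64, as in the reported analysis.
t, p = stats.ttest_rel(student_centered, instructor_centered)
print(f"t = {t:.2f}, p = {p:.3f}")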
An alignment was identified between the results of the five-factor model analysis and the
two-factor model analysis. Quantitative analysis results did not show a correlation between
instructional practices and demographic factors such as academic rank or years of teaching.
However, the results identified significant differences in using student-centered
instructional practices according to the gender of the participant. Based on the data
reported by participants, the mean of using student-centered instructional practices was
2.81 for male participants and 2.37 for female participants. A one-way ANOVA found a
statistically significant difference (α = 0.05) between the student-centered mean of male participants and that of female participants (F = 7.64, p = 0.008).
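A one-way ANOVA with two groups can be run in the same way; the sketch below uses placeholder arrays (roughly 50 male and 15 female respondents, in line with the sample description) rather than the study data, and with only two groups the F statistic is simply the square of an independent-samples t statistic.

import numpy as np
from scipy import stats

# Placeholder per-respondent student-centered means split by reported gender;
# the study reports group means of 2.81 (male) and 2.37 (female), F = 7.64, p = 0.008.
rng = np.random.default_rng(1)
male = rng.normal(2.81, 0.5, 50)
female = rng.normal(2.37, 0.5, 15)

f_stat, p = stats.f_oneway(male, female)
print(f"F = {f_stat:.2f}, p = {p:.3f}")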
5.2 Qualitative data analysis and results
The qualitative analysis provides answers to the third research question. All interviews
were transcribed before being coded and analyzed. The analysis used an integrated
approach combining guiding principles on SCL by Brook (1999), Rogers (2002) and
Weimer (2002), and Kvale and Brinkmann’s (2009) meaning condensation method. The
analysis identified emerging themes from instructors’ accounts of their opinions,
experiences and reflections.
Instructors’ definitions and perceptions of their roles in SCL. Although all interviewed
instructors believed they were using SCL strategies in their classrooms, they defined the
term SCL in various ways. Three categories of definitions were identified; these are
explained below. Interview data also found a consistency between instructors’ definitions
and their perceptions of their roles in an SCL environment:
Category 1: there were three instructors, one professor and two assistant professors, all male, who believed lecturing to be the best way of teaching and learning.
According to them, a good lecturer is keen to motivate and encourage students to be
free thinkers. When students choose to enter a university, they should be sufficiently
mature and willing to work hard enough to progress through their education.
Therefore, the university “should be student-centric by definition” (Burhan). This
definition was supported by the following remark:
I believe that in our university every instructor is doing SCL in their own way […] but instead of
standing there reading slides, I think it makes it more student-centered by providing an interesting
lecture so that when they leave the room you will hear them say, “Wow, this is inspiring and
interesting.” (Mohammad)
All three of the instructors interviewed conceived of their role as to “inspire and attract
students.” As Abdullah commented:
It is the responsibility of the instructors to find a way to bring in highly interesting lectures to make
students interested […] to do that, we should prioritize research, so we have something really
interesting to bring to the class.
Category 2: instructors in this category included one female associate professor, one
male assistant professor, two male associate professors and one male professor. They
believed that in an SCL environment, the instructor should provide activities for
students to learn hands-on skills and relate theories to certain practices, and that
students should acquire deep knowledge in the field by working together actively on
classroom activities. As Ihab commented, “[I]t is so boring to just fill the class with
me talking and lecturing. It is fun to plan some activities so students can work in a
team so that they can practice the theories; students like these [activities].” In such an
environment, the instructor should play the role of “providing” activities and
“guiding” students to learn the requested, relevant knowledge through these
activities, as most of them suggested.
Category 3: this category included two female assistant professors, one female
professor and one male assistant professor. They believed students should work in
small groups, with no more than ten people per team, on certain targets, such as
solving a problem. Students should be responsible for organizing study activities and
should make decisions on their own to prepare for the requirements of their future
professions. They should also be allowed to make mistakes and should receive help
with reflecting on these mistakes in order to improve. As Faris commented:
I did not like my own student experiences which were filled with lectures and lab work,
I appreciated my past experienced of working in a more student-centered learning environment,
which offered me tools to provide what I think as better learning environment now to my students.
These four instructors used a few different metaphors to describe their roles: “leaders” – “leading
students to work towards their targets” (Sara and Iman), “observers” – “observing students from
a distance and only interfering when they got off-track” (Faris), and “facilitators” – “having
patience when students made mistakes” (Faris), “providing rich resources to students in need of
help and redirecting students when they were in trouble” (Sara and Iman), “assisting students to
be able to make their own decisions on learning goals, what to learn and how to learn it, and
critically evaluate and reflect on their own learning” (Duaa).
The interview data did not reveal any patterns in teachers’ definitions and perceptions
according to their academic ranking or gender. However, past experiences with SCL seemed
to make a difference in their understanding and choice of strategies. For example,
participants from category 1 mainly experienced lectures as the major source of learning
and form of teaching in their past student and teaching experiences. Those from category 2
experienced different types of SCL environments due to their previous work experiences but
not during their student experiences. Two participants from category 3 experienced SCL in
the form of problem-based learning (PBL) in their past student experiences, and the other
two participants had experiences with SCL both as learners and as instructors prior to their
current jobs. A participant’s past experiences, particularly as a learner, seem to have a close
link to their current instructional practice. As Sara remarked:
Having experienced the Problem-Based Learning in my college time, I truly it is the best way to
learn. Working in team offered us great opportunities to help each other and support each other.
This means a lot in particular for us female in Arabic culture. We never went out to talk with others
before and in such a environment we learned how to interact with others and how to behave
professionally […] we increased our self-confidence and it was very empowering.
Although all three groups mentioned that students should take responsibility for their own
learning, when asked to what level students should be involved in deciding what to learn
and how to learn it, and even how to assess what they have learned, only one instructor (Ali)
said it would be ideal to involve students in these decisions. However, he had neither
experienced this himself nor had he observed any such practice in his immediate
environment. Out of the 12 interviewees, 10 believed that instructors should decide which
activities to provide, what materials to use and how to structure student activity time and
form, and should also ensure students reach “the correct” answers.
While the data are too broad to draw any strong conclusions, the majority of the
classroom activities that the interviewed instructors exemplified focused on students
working in groups to fulfill an assignment designed by the instructor or students answering
questions from the instructor in a teacher-student one-to-one form. The roles described by
all the instructors involved offering directions and structures. As most of instructors
mentioned, given the time pressure to deliver all the required content for their courses, they
had to ensure students progressed through the mandated learning checkpoints.
Assessment. The interviewed instructors agreed that assessment played an essential role
in evaluating student learning. One instructor said, an exam “is the best way to engage them
to learn because they work so hard just before it” (Ibrahim). With the exception of one
instructor, the respondents gave multiple-choice questions plus short-answer questions as the
major forms of assessment they used. However, their opinions on what should be included in
and what should be the focus of the assessment diverged. The instructors provided examples
that included: “To prepare [students] for their future profession, exams in universities should
focus on lots of hand-on skills” (Alia); “More writing skills are needed for the exam” (Amin);
and “Students need to be posed exams that can question their thinking skills” (Faris).
Two major reasons for the choice of assessment were provided. First, the assessment
committee within a college or across colleges defined the assessments as exams for some
undergraduate courses, particularly general courses. This limited the options for instructors
to design exams different from the common exams used in these classes. Second, when instructors did have the freedom to design exams for their courses, it was most convenient to
use assessment forms that can “examine the knowledge students have mastered” and are
the “least time-consuming” for grading purposes, as 8 out of the 12 interview participants
expressed. As one participant said, “It takes a few hours to grade multiple-choice question
exams. With the busy schedule we have, you don’t want to spend several days to grade and
provide feedback for a few hundred essays” (Ihab).
Two of the interviewed instructors (Faris and Duaa) expressed their views on how formative assessment should be further enhanced in order to better facilitate SCL, but only one of them had enhanced their assessments in daily practice, as Duaa commented:
Real SCL should involve students not only in deciding on what activities they take in the classroom,
but also in defining assessment methods, but I can see the students are shocked when I invite them
to give opinions on how they should be assessed […] it will take more time before more people
understand that involving students in defining assessment is to motivate them to be more
responsible instead of cheating.
Given this challenge, this instructor mainly relied in practice on asking students to identify
and structure their own projects and problems.
Challenges. The majority of the instructors believed that students are the most
challenging factor in implementing ideal SCL in the given context. A major reason cited
for this is the Arabic culture. Out of the 12 interviewed instructors, 11 believed that most
students were raised in Arabic families deeply rooted in Middle Eastern culture, where
family plays an important role in one’s daily life, meaning that most teenagers do not have
opportunities to live alone and make decisions independently. In addition, their high
school experiences did not help them become independent learners, as in that setting they are used to lectures, completing assigned tasks without asking any “why” questions, and exams that are mostly multiple-choice questions testing their memories. Students are familiar with being provided with information and
instruction and having their time arranged and they even prefer it that way. As an
instructor said, “This is how the students grow up; they are used to it and they cannot take
responsibilities on their own. They are not motivated to do things independently, no
matter how the instructor works hard to push them, they are not really ready for a true
SCL” (Alia).
Large classroom sizes were identified as another major challenge for implementing
student–student activities because the students easily slip into a chaotic and “out of control”
mode, according to some teachers. Interestingly, this was used as an argument for “offering a
really interesting lecture as an effective approach to provide SCL,” as Abdullah commented.
Finally, the busy schedule of university faculty remains a reason to limit what they do:
“if we don’t have so much teaching load, we may have more time to do what could have
been more student-centered strategies such as letting students identify problems and
learning needs on their own” (Ali). Although teaching plays an important role in the
appraisal system at QU, research products, such as publications, remain the major tool to
evaluate faculty performance. Ibrahim mentioned “when we apply for promotion, which is
particularly crucial for assistant professors, all what is to be evaluated is the publication
in one’s own field, as long as we can prove we are able to teach, it is not highly critical how
we teach.”
Support needed. Three participants expressed their desire for an institutionalized
approach to changing the assessment system, allowing for more faculty autonomy to design
assessment methods that are appropriate for their courses. Most of the suggestions for
support referred to actions focusing on faculty and students. In total, 11 participants
suggested more workshops and training sessions for faculty to gain the necessary skills to
facilitate SCL. Five participants suggested student tutoring programs to help first-year
undergraduate students learn personal responsibility and to “grow up by following
suggestions from experienced students” (Faris). One participant even suggested that
attention to pedagogy should be reduced for now because “We give too much attention to
the students, nearly like spoon-feeding, worrying too much about whether they are happy or
not in studying […] students should stand on their own feet, and sometimes they learn by
being thrown into the deep sea” (Burhan).
6. Discussion
In this section, we compare the qualitative data findings and the quantitative study results
and discuss them in relation to the three dimensions of focus in SCL previously summarized in
this paper: instructors’ perceptions and roles, student activity and interaction, and assessment.
This is followed by a discussion of STEM instructors’ views on challenges to implementation.
6.1 STEM instructors’ understanding and perceptions of SCL
Improving the quality of teaching and learning in the STEM fields necessitates exploring
the conceptions that faculty instructors hold regarding the learning environment and the
context of teaching since teaching approaches are strongly influenced by the underlying
beliefs of the teacher (Kember, 1997). The participants in this study hold different beliefs
about and attitudes toward SCL strategies. Connections can be identified between the
participants’ understandings and perceptions of SCL and their prior experiences with it.
Those who had experienced SCL as learners tended to make more of an effort to implement
the strategies effectively in their own teaching practice. This finding echoes previous
studies suggesting that in order to maximize their capability of facilitating PBL, faculty
should be provided with opportunities to experience PBL as learners (Kolmos et al., 2008).
Comparing results from the quantitative and qualitative data, this study identifies gaps
between what the instructors consider to be SCL and what they actually practice.
As suggested by Paris and Combs (2006), the broad and wide-ranging definitions of SCL
legitimize the instructors’ actual practices. This gap can serve as an alert when a large-scale
change initiative is being implemented in the given context. As Henderson et al. (2011)
note, awareness and knowledge of SCL strategies cannot guarantee their actual practice.
6.2 Student activity and interaction
This study reported that instructors have a general awareness of using student-centered
strategies. Student activities are regarded as essential in instructional practices.
Nevertheless, this study also shows that, in the given context, most classroom
interactions are in the form of student–content and student–teacher interactions, whereas
student–student interactions remain limited. In practice, a generally low level of SCL can be
concluded, according to the PIPS instrument (Walter et al., 2016) and the definition of SCL in
previous studies (Brook, 1999; Rogers, 2002; Weimer, 2002). Student interaction with the
content and instructor may be directly related to the common concept of instruction and
may reflect a lecture-centered pedagogic approach. This finding is in line with the report
from a previous study showing that instructors in Qatar tend to focus on content delivered
through lectures as an efficient way of teaching (Al-Thani et al., 2016). Previous studies
(Borrego et al., 2013; Henderson and Dancy, 2009; Walter et al., 2016) also report that the
levels of implementing instructional practices vary according to different aspects; for
example, STEM faculties reported limited use of certain strategies such as group work and
solving problems collaboratively in daily practice despite their high level of knowledge and
awareness. Instructors’ lack of professional vision regarding collaborative group work can also lead to
a lack of practice (Modell and Modell, 2017). An often-reported reason is that instructors
give priority to content delivery due to limited class time (Hora and Ferrare, 2014; Walter
et al., 2016). Another explanation may be instructors’ lack of confidence in letting students
take full responsibility for organizing their own learning activities outside of instructors’
control (Du and Kirkebæk, 2012).
Student–student interaction received relatively less attention and consideration from the
participants in this study. Previous studies have found that the length of classes and class
size were often the most important barrier for the implementation of student-centered
instructional practices (Froyd et al., 2013). In the context of this study, this may be one of
the factors limiting the possibility of using student interaction in the classroom. In the
undergraduate programs, the length of classes is 1 hour and 15 minutes, which is counted as a
two-study-hour class. This limits instructors’ confidence in their ability to deliver heavy
curriculum content while also providing opportunities to engage students with interactive
activities. Another possible reason is the bias of the instructors’ knowledge regarding SCL
strategies; some instructors believe it is sufficient to deliver SCL by simply asking students
to do something that is different from lecturing (Paris and Combs, 2006; Shu-Hui and Smith,
2008). Linking the results in the aspect of the instructors’ definition of SCL to their perceived
roles of teaching, as the participants described in interviews, the instructors also lack the
belief that interactive student activities can lead to actual learning. Participants consider it
important that instructors maintain control of classroom activities. For example, Borrego
et al. (2013) found a strong correlation between instructors’ beliefs regarding problem
solving and the time students spent on collaborative activities, such as discussing problems.
6.3 Unaligned assessment
Although the participants demonstrated an awareness of SCL in general and willingness to
implement certain SCL strategies, they reported limited critical reflection on assessment
systems in the given context. Their limited understanding and practice of formative
assessment is an impediment to the effectiveness of practicing SCL by aligning instruction
and assessment. Instructional innovation demands changes not only in classroom practices
but also, more importantly, in assessment methods. Williams et al. (2015) noted that
formative assessment is a factor that is often ignored or forgotten, even by many of the
researchers who have developed instruments to describe instructional practices. This study
similarly found that the summative-oriented prevailing assessment methods at the
university level remain unchallenged by the instructors. This may be due to their lack of
knowledge and experience of formative assessment, or due at least in part to the
convenience of using what they are asked to use as well as what they are accustomed to. Changing
teaching methods without a constructive alignment with assessment methods will limit the
effectiveness of any instructional innovation (Biggs and Tang, 2011).
6.4 Factors that make a difference
Previous studies (Dancy and Henderson, 2007, 2010; Froyd et al., 2013; Henderson
and Dancy, 2009; Henderson et al., 2012) have reported that a faculty member’s use of
student-centered strategies is often related to demographic factors such as gender, academic
rank and years of teaching. The results of this study only identified a correlation between
instructional practices and gender. In contrast to the findings of previous studies, namely,
that female instructors tend to use student-centered methods more often than male
instructors and that younger instructors tend to show more interest in adopting new
pedagogical initiatives, quantitative data of this study found that male participants reported
higher levels of employing student-centered approaches than female participants, but found
no patterns regarding academic rank and years of teaching. A major reason may be the
small number of participants in this study. A possible reason for the gender difference may
be the imbalanced gender ratio among the overall participants in this study (the proportion
of female participants was 23.4 percent). Nevertheless, qualitative data did not identify any
patterns due to gender and academic rank, but rather, identified a connection between the
instructor’s prior experience with SCL and their understanding, perception and practices, as
previously discussed.
6.5 Challenges
Two categories of instructor concerns and barriers to their sustainable use of instructional
innovation were identified. Students’ lack of maturity, motivation and responsibility was
considered the major challenge by most of the interviewed participants, except for those
who had experienced SCL as a student. Regarding students as the source of the problem and
blaming students for their own poor performance can be seen as another symptom
associated with a lecturer-centered approach.
Another major challenge is institutional constraints such as the insufficiency of
classroom time. Instructors tend to have different opinions regarding the amount of time it
takes to include interactive student–student activities. Large class size is often a barrier for
instructors hoping to use interactive student–student activities. In previous studies, female faculty
members and younger faculty members have been found to have higher rates of adopting and
continuing innovative instruction.
6.6 Recommendations
As previous studies (Froyd et al., 2013) have suggested, when an instructional strategy is
adopted at a low level, it means that it is either not mature or will never achieve full
adoption. Institutionalized faculty development and support are essential for the further
implementation of innovative instructional strategies and the persistence and continuation
of the implementation, as Dancy and Henderson (2007) pointed out, while institutional
barriers can limit instructional innovations when structures have been set up to function
well with traditional instruction. The following list of recommendations is provided as
inspiration for institutional support and faculty development activities:
First, faculty members need to develop a deep understanding of SCL through
experiences as learners so that they can become true believers and implementers.
Second, autonomy is needed for faculty to adopt appropriate assessment methods
that are aligned with their pedagogical objectives and delivery methods. Input on
how faculty can adapt instructional innovation to tailor it to the local context is very
important for its long-term effectiveness (Hora and Ferrare, 2014).
Third, an inclusive approach to faculty evaluation by encouraging faculty from
STEM backgrounds to be engaged in research on their instructional practice will not
only sustain the practice of innovative pedagogy but will also enrich the research
profiles of STEM faculty and their institutes.
7. Conclusion
This study examined university STEM instructors’ understanding and perceptions of SCL
as well as their self-reported current practices. Results of the study provide insights on how
institutional strategies of instructional change are continually practiced. The study
identified a lack of alignment between instructors’ perceptions and their actual practices of
SCL. Despite agreement on perceiving SCL as an effective teaching strategy, the instructors’
actual practices prioritize content delivery, the teachers’ role in classroom control, and
defining student learning activities as well as summative assessment. Student–student
interactions and formative assessment are limited. The participants tended to blame the
limited use of SCL on the lack of motivation and readiness among students and on
institutional constraints. Another perspective to explain this gap may be the diverse yet
inclusive definitions of SCL espoused by faculty, which tend to legitimate their practices,
reflecting a rather low level of implementation compared to the literature. This study also
suggests that faculty’s understanding and perceptions of implementing student-centered
approaches were closely linked to their prior experiences – experiencing SCL as a learner
may better shape the understanding and guide the practice of SCL as an instructor.
Thereafter, recommendations are provided for faculty development activities at an
institutional level for sustainable instructional innovation.
The study has a few limitations. First, regarding methodological justification, the data
collection methods chosen in this study relied mainly on the faculty’s self-reporting.
Although such methods are frequently employed for studying faculty beliefs, perceptions
and instructional practices (Borrego et al., 2013), data from other sources, such as
observation, may offer information from new perspectives for instructional development
(Henry et al., 2007). Second, the limited number of participants restricts this study’s
generalizability, because the survey was administered on a volunteer basis and
the limited number of interview participants makes it difficult to establish clear patterns.
Third, researching faculty members raises concerns in the given context, wherein
extensive faculty assessments are regularly conducted. Although special considerations
regarding ethical concerns were taken in this study – for example, participants were
provided with a clear explanation of the goals and consequences of the study and
were shown that it had no relation to the university’s annual faculty performance
assessment – the potential sensitivity may have caused a certain amount of reservation
among participants regarding sharing further information; this may have limited the
results of the study.
In conclusion, the results reported in this paper provide a first impression of the present
instructional practices in the STEM field in the context of Qatar. Findings of the study,
although limited to the given context, may have implications for other countries in the Gulf
Region and Arabic-speaking contexts, and potentially even broader contexts, since
instructional change toward SCL in STEM classrooms remains a general challenge
worldwide (Hora and Ferrare, 2014; Froyd et al., 2013). The results imply that more attention
should be given to faculty development programs to enhance instructor awareness,
knowledge and skills related to student–student interaction and formative assessment. This
study contributes to further instructional change implementation by introducing a roadmap
toward change on broader levels, such as strategies of institutional change for instructional
innovation, as well as toward the establishment of a research-based and evidence-based
approach to faculty development and institutional change.
References
Al-Thani, A.M., Al-Meghaissib, L.A.A.A. and Nosair, M.R.A.A. (2016), “Faculty members’ views of
effective teaching: a case study of Qatar University”, European Journal of Education Studies,
Vol. 2 No. 8, pp. 109-139.
American Association for the Advancement of Science (AAAS) (2013), “Describing and measuring
STEM teaching practices: a report from a national meeting on the measurement of
undergraduate science, technology, engineering, and mathematics (STEM) teaching”, American
Association for the Advancement of Science, Washington, DC, available at:
http://ccliconference.org/files/2013/11/Measuring-STEM-Teaching-Practices.pdf (accessed November 15, 2016).
Anderson, R.D. (2002), “Reforming science teaching: what research says about inquiry”, Journal of
Science Teacher Education, Vol. 13 No. 1, pp. 1-12.
Attard, A., Di Loio, E., Geven, K. and Santa, R. (2010), Student Centered Learning: An Insight into
Theory and Practice, Partos Timisoara, Bucharest.
Barr, R.B. and Tagg, J. (1995), “From teaching to learning: a new paradigm for undergraduate
education”, Change: The Magazine of Higher Learning, Vol. 27 No. 6, pp. 12-26.
Biggs, J.B. and Tang, C. (2011), Teaching for Quality Learning at University: What the Student Does,
McGraw-Hill Education, Berkshire.
Bilgin, I., Karakuyu, Y. and Ay, Y. (2015), “The effects of project-based learning on undergraduate
students’ achievement and self-efficacy beliefs towards science teaching”, Eurasia Journal of
Mathematics, Science & Technology Education, Vol. 11 No. 3, pp. 469-477.
Black, P. and Wiliam, D. (1998), “Assessment and classroom learning”, Assessment in Education:
Principles, Policy & Practice, Vol. 5 No. 1, pp. 7-74.
Brawner, C.E., Felder, R.M., Allen, R. and Brent, R. (2002), “A survey of faculty teaching practices and
involvement in faculty development activities”, Journal of Engineering Education, Vol. 91 No. 4,
p. 393.
Borrego, M., Froyd, J.E., Henderson, C., Cutler, S. and Prince, M. (2013), “Influence of engineering
instructors’ teaching and learning beliefs on pedagogies in engineering science courses”,
International Journal of Engineering Education, Vol. 29 No. 6, pp. 1456-1471.
Brook, J.G. (1999), In Search of Understanding: The Case for Constructivist Classrooms, Association for
Supervision & Curriculum Development, Alexandria.
Cornelius-White, J. (2007), “Learner-centered teacher-student relationships are effective: a meta-analysis”, Review of Educational Research, Vol. 77 No. 1, pp. 113-143.
Creswell, J.W. (2002), Educational Research: Planning, Conducting, and Evaluating Quantitative and
Qualitative Research, Pearson Education, Upper Saddle River, NJ.
Creswell, J.W. (2013), Qualitative Inquiry and Research Design: Choosing among Five Approaches, Sage.
Curtis, R. and Ventura-Medina, E. (2008), An Enquiry-Based Chemical Engineering Design Project for
First-Year Students, University of Manchester, Centre for Excellence in Enquiry-Based
Learning, Manchester.
Dagher, Z. and BouJaoude, S. (2011), “Science education in Arab states: bright future or status quo?”,
Studies in Science Education, Vol. 47, pp. 73-101.
Dancy, M. and Henderson, C. (2007), “Framework for articulating instructional practices and
conceptions”, Physical Review Special Topics: Physics Education Research, Vol. 3 No. 1, pp. 1-12.
Dancy, M. and Henderson, C. (2010), “Pedagogical practices and instructional change of physics
faculty”, American Journal of Physics, Vol. 78 No. 10, pp. 1056-1063.
Dewey, J. (1938), Experience and Education, Collier and Kappa Delta Phi, New York, NY.
Downey, G.L., Lucena, J.C., Moskal, B.M., Parkhurst, R., Bigley, T., Hays, C. and Lehr, J.L. (2006), “The
globally competent engineer: working effectively with people who define problems differently”,
Journal of Engineering Education, Vol. 95 No. 2, pp. 107-122.
Du, X.Y. and Kirkebæk, M.J. (2012), “Contextualizing task-based PBL”, Exploring Task-Based PBL in
Chinese Teaching and Learning, pp. 172-185.
Du, X.Y., Su, L. and Liu, J. (2013), “Developing sustainability curricula using the PBL method in a
Chinese context”, Journal of Cleaner Production, Vol. 61 No. 15, pp. 80-88.
Duran, M. and Dökme, İ. (2016), “The effect of the inquiry-based learning approach on students’ critical-thinking skills”, Eurasia Journal of Mathematics, Science & Technology Education, Vol. 12 No. 12.
Ejiwale, J.A. (2012), “Facilitating teaching and learning across STEM fields”, Journal of STEM
Education: Innovations and Research, Vol. 13 No. 3, pp. 87-94.
Felder, R.M., Woods, D.R., Stice, J.E. and Rugarcia, A. (2000), “The future of engineering education II.
Teaching methods that work”, Chemical Engineering Education, Vol. 34 No. 1, pp. 26-39.
Freeman, S., Eddy, S.L., McDonough, M., Smith, M.K., Okoroafor, N., Jordt, H. and Wenderoth, M.P.
(2014), “Active learning increases student performance in science, engineering, and
mathematics”, Proceedings of the National Academy of Sciences, Vol. 111 No. 23, pp. 8410-8415.
Froyd, J., Borrego, M., Cutler, S., Henderson, C. and Prince, M. (2013), “Estimates of use of
research-based instructional strategies in core electrical or computer engineering courses”, IEEE
Transactions on Education, Vol. 56 No. 4, pp. 393-399.
General Secretariat for Development Planning (2008), Qatar National Vision 2030, General Secretariat
for Development Planning, Doha, available at: http://qatarus.com/documents/qatar-nationalvision-2030/ (accessed November 15, 2016).
Graham, M.J., Frederick, J., Byars-Winston, A., Hunter, A.B. and Handelsman, J. (2013), “Increasing
persistence of college students in STEM”, Science, Vol. 341 No. 6153, pp. 1455-1456.
He, Y., Du, X., Toft, E., Zhang, X., Qu, B., Shi, J. and Zhang, H. (2017), “A comparison between the
effectiveness of PBL and LBL on improving problem-solving abilities of medical students using
questioning”, Innovations in Education and Teaching International, Vol. 55 No. 1, pp. 44-54,
available at: https://doi.org/10.1080/14703297.2017.1290539
Henderson, C. and Dancy, M. (2009), “The impact of physics education research on the teaching of
introductory quantitative physics in the United States”, Physical Review Special Topics: Physics
Education Research, Vol. 5 No. 2, pp. 1-15.
Henderson, C., Beach, A. and Finkelstein, N. (2011), “Facilitating change in undergraduate STEM
instructional practices: an analytic review of the literature”, Journal of Research in Science
Teaching, Vol. 48 No. 8, pp. 952-984.
Henderson, C., Dancy, M. and Niewiadomska-Bugaj, M. (2012), “The use of research-based instructional
strategies in introductory physics: where do faculty leave the innovation-decision process?”,
Physical Review Special Topics – Physics Education Research, Vol. 8 No. 2, pp. 1-9.
Henderson, C., Finkelstein, N. and Beach, A. (2010), “Beyond dissemination in college science teaching:
an introduction to four core change strategies”, Journal of College Science Teaching, Vol. 39 No. 5,
pp. 18-25.
Henry, M.A., Murray, K.S. and Phillips, K.A. (2007), Meeting the Challenge of STEM Classroom
Observation in Evaluating Teacher Development Projects: A Comparison of Two Widely Used
Instruments, Henry Consulting, St Louis, MO.
Hora, M.T. and Ferrare, J.J. (2014), “Remeasuring postsecondary teaching: how singular categories of
instruction obscure the multiple dimensions of classroom practice”, Journal of College Science
Teaching, Vol. 43 No. 3, pp. 36-41.
Hora, M.T., Oleson, A. and Ferrare, J.J. (2012), Teaching Dimensions Observation Protocol (TDOP)
User’s Manual, Wisconsin Center for Education Research, University of Wisconsin–Madison,
Madison, WI.
Justice, C., Rice, J., Roy, D., Hudspith, B. and Jenkins, H. (2009), “Inquiry-based learning in higher
education: administrators’ perspectives on integrating inquiry pedagogy into the curriculum”,
Higher Education, Vol. 58 No. 6, pp. 841-855.
Kember, D. (1997), “A reconceptualisation of the research into university academics’ conceptions of
teaching”, Learning and Instruction, Vol. 7 No. 3, pp. 255-275.
Ketpichainarong, W., Panijpan, B. and Ruenwongsa, P. (2010), “Enhanced learning of biotechnology
students by an inquiry-based cellulose laboratory”, International Journal of Environmental &
Science Education, Vol. 5 No. 2, pp. 169-187.
Kolmos, A., Du, X.Y., Dahms, M. and Qvist, P. (2008), “Staff development for change to problem-based
learning”, International Journal of Engineering Education, Vol. 24 No. 4, pp. 772-782.
Kvale, S. and Brinkmann, S. (2009), Interviews: Learning the Craft of Qualitative Research, SAGE,
Thousand Oaks, CA.
Lehmann, M., Christensen, P., Du, X. and Thrane, M. (2008), “Problem-oriented and project-based
learning (POPBL) as an innovative learning strategy for sustainable development in engineering
education”, European Journal of Engineering Education, Vol. 33 No. 3, pp. 283-295.
Martin, T., Rivale, S.D. and Diller, K.R. (2007), “Comparison of student learning in challenge-based and
traditional instruction in biomedical engineering”, Annals of Biomedical Engineering, Vol. 35
No. 8, pp. 1312-1323.
Modell, M.G. and Modell, M.G. (2017), “Instructors’ professional vision for collaborative learning
groups”, Journal of Applied Research in Higher Education, Vol. 9 No. 3, pp. 346-362.
NEA (2010), “Preparing 21st Century students for a global society: an educator’s guide to ‘the four Cs’ ”,
National Education Association, Washington, DC, available at: www.nea.org/tools/52217
(accessed December 20, 2017).
Nicol, D.J. and Macfarlane-Dick, D. (2006), “Formative assessment and self-regulated learning: a model
and seven principles of good feedback practice”, Studies in Higher Education, Vol. 31 No. 2,
pp. 199-218.
Paris, C. and Combs, B. (2006), “Lived meanings: what teachers mean when they say they are learner-centered”, Teachers & Teaching: Theory and Practice, Vol. 12 No. 5, pp. 571-592.
Piburn, M., Sawada, D., Falconer, K., Turley, J., Benford, R. and Bloom, I. (2000), Reformed Teaching
Observation Protocol (RTOP), Arizona Collaborative for Excellence in the Preparation of
Teachers, Tempe.
Prince, M.J. and Felder, R.M. (2006), “Inductive teaching and learning methods: definitions,
comparisons, and research bases”, Journal of Engineering Education, Vol. 95 No. 2, pp. 123-138.
Qatar University (QU) (2012), “Qatar university strategic plan 2013–2016”, available at: www.qu.edu.
qa/static_file/qu/About/documents/qu-strategic-plan-2013-2016-en.pdf (accessed June 10, 2017).
Rogers, A. (2002), Teaching Adults, 3rd ed., Open University Press, Philadelphia, PA.
Rubin, A. (2012), “Higher education reform in the Arab world: the model of Qatar”, available at: www.
mei.edu/content/higher-education-reform-arab-world-model-qatar (accessed December 15, 2016).
Scott, L.C. (2015), “The futures of learning 2: what kind of learning for the 21st century?”, UNESCO
Educational Research and Foresight Working Papers, available at: http://unesdoc.unesco.org/
images/0024/002429/242996E.pdf (accessed December 22, 2017).
Seymour, E. and Hewitt, N.M. (1997), Talking About Leaving: Why Undergraduates Leave the Sciences,
Westview, Boulder, CO.
Shu-Hui, H.C. and Smith, R.A. (2008), “Effectiveness of interaction in a learner-centered paradigm
distance education class based on student satisfaction”, Journal of Research on Technology in
Education, Vol. 40 No. 4, pp. 407-426.
Simsek, P. and Kabapinar, F. (2010), “The effects of inquiry-based learning on elementary students’
conceptual understanding of matter, scientific process skills and science attitudes”, Procedia - Social and Behavioral Sciences, Vol. 2 No. 2, pp. 1190-1194.
Slavich, G.M. and Zimbardo, P.G. (2012), “Transformational teaching: theoretical underpinnings, basic
principles, and core methods”, Educational Psychology Review, Vol. 24 No. 4, pp. 569-608.
Smith, K.A., Douglas, T.C. and Cox, M. (2009), “Supportive teaching and learning strategies in STEM
education”, in Baldwin, R. (Ed.), Improving the Climate for Undergraduate Teaching in STEM
Fields. New Directions for Teaching and Learning, Vol. 117, Jossey-Bass, San Francisco, CA,
pp. 19-32.
Smith, M.K., Vinson, E.L., Smith, J.A., Lewin, J.D. and Stetzer, K.R. (2014), “A campus-wide study of
STEM courses: new perspectives on teaching practices and perceptions”, CBE Life Sciences
Education, Vol. 13, pp. 624-635.
Springer, L., Stanne, M.E. and Donovan, S.S. (1999), “Effects of small-group learning on
undergraduates in science, mathematics, engineering, and technology: a meta-analysis”,
Review of Educational Research, Vol. 69 No. 1, pp. 21-51.
Steinemann, A. (2003), “Implementing sustainable development through problem-based learning:
pedagogy and practice”, Journal of Professional Issues in Engineering Education and Practice,
Vol. 129 No. 4, pp. 216-224.
Walczyk, J.J. and Ramsey, L.L. (2003), “Use of learner-centered instruction in college science and
mathematics classrooms”, Journal of Research in Science Teaching, Vol. 40 No. 6, pp. 566-584.
Walter, E.M., Beach, A.L., Henderson, C. and Williams, C.T. (2015), “Measuring postsecondary teaching
practices and departmental climate: the development of two new surveys”, in Weaver, G.C., Burgess, D.,
Childress, A.L. and Slakey, L. (Eds), Transforming Institutions: Undergraduate STEM in the 21st
Century, Purdue University Press, West Lafayette, IN, pp. 411-428.
Walter, E.M., Henderson, C.R., Beach, A.L. and Williams, C.T. (2016), “Introducing the Postsecondary
Instructional Practices Survey (PIPS): a concise, interdisciplinary, and easy-to-score survey”,
CBE – Life Sciences Education, Vol. 15 No. 4, pp. 1-11.
Watkins, J. and Mazur, E. (2013), “Retaining students in science, technology, engineering, and
mathematics (STEM) majors”, Journal of College Science Teaching, Vol. 42 No. 5, pp. 36-41.
Weimer, M. (2002), Learner-Centered Teaching: Five Key Changes to Practice, Jossey-Bass,
San Francisco, CA.
Williams, C.T., Walter, E.M., Henderson, C. and Beach, A.L. (2015), “Describing undergraduate STEM
teaching practices: a comparison of instructor self-report instruments”, International Journal of
STEM Education, Vol. 2 No. 18, pp. 1-14, doi: 10.1186/s40594-015-0031-y.
Zhao, K., Zhang, J. and Du, X. (2017), “Chinese business students’ changes in beliefs and strategy use in
a constructively aligned PBL course”, Teaching in Higher Education, Vol. 22 No. 7, pp. 785-804,
doi: 10.1080/13562517.2017.1301908.
Appendix
Interview guidelines
(1) How do you understand/define SCL? What are important characteristics of SCL in your
opinion?
(2) What are your past experiences of using SCL?
(3) How do you see the role of instructor in an SCL environment, and in which ways is this role
descriptive of your current practice?
(4) What are your preferred assessment methods within your current teaching practices and
why?
(5) What should be the ideal assessment methods in an SCL environment?
(6) What are the challenges of practicing SCL in your current environment?
(7) In your opinion, what institutional supports are needed to implement SCL in Qatar?
Corresponding author
Saed Sabah can be contacted at: ssabah@qu.edu.qa
Enhancing Quality of Teaching in the Built Environment
Higher Education, UK
Muhandiramge Kasun Samadhi Gomis
School of Architecture and Built Environment, University of Wolverhampton,
Wolverhampton, UK,
Mandeep Saini
School of Architecture and Built Environment, University of Wolverhampton,
Wolverhampton, UK,
Chaminda Pathirage
School of Architecture and Built Environment, University of Wolverhampton,
Wolverhampton, UK,
Mohammed Arif
Architecture, Technology and Engineering, University of Brighton, Brighton, UK
Abstract
Purpose – Issues in current Built Environment Higher Education (BEHE) curricula point to a
critical need to enhance the quality of teaching. This paper aims to identify the need for best
practice in teaching within BEHE curricula and to recommend a set of drivers to enhance
current teaching practices in Built Environment (BE) education. The study focused on section
one of the National Student Survey (NSS) – Teaching on my course – with a core focus on
improving student satisfaction, making the subject interesting, creating an intellectually
stimulating environment, and challenging learners.
Methodology – This study used a mixed method comprising 1) a document analysis of feedback
from undergraduate students and 2) a closed-ended questionnaire to academics in the BEHE
context. More than 375 items of student feedback were analysed to understand teaching
practices in BE and fed forward into developing the closed-ended questionnaire for 23
academics, including a Head of School, a Principal Lecturer, Subject Leads and Lecturers. The
data was collected from the Architecture, Construction Management, Civil Engineering,
Quantity Surveying, and Building Surveying disciplines representing the BE context. The data
obtained from both instruments were analysed with content analysis to develop 24 drivers to
enhance quality of teaching. These drivers were then modelled using the Interpretive
Structural Modelling (ISM) method to identify their correlation and criticality to NSS section
one themes.
Findings – The study revealed 10 independent, 11 dependent and 3 autonomous drivers,
facilitating the best teaching practices in BEHE. The study further recommends that the drivers
be implemented as illustrated in the level partitioning diagrams under each NSS section one
theme to enhance the quality of teaching in BEHE.
Practical implications – The recommended set of drivers and the level partitioning can be set
as a guideline for academics and other academic institutions to enhance quality of teaching.
This could be further used to improve student satisfaction and overall NSS results to increase
the rankings of academic institutions.
Originality/Value – New knowledge can be recognised through the ISM analysis and level
partitioning diagrams of the recommended drivers to assist academics and academic
institutions in developing quality of teaching.
Keywords – Enhancing Teaching Quality, Built Environment Higher Education, Learning in
post-COVID, National Student Survey (NSS), Teaching on my course.
Introduction
The United Kingdom’s Higher Education (HE) sector is focused on improving the
quality of teaching (Santos et al., 2020; Tsiligiris and Hill, 2019; Matthews and Kotzee, 2019).
HE providers continuously attempt to enhance learning standards by assuring teaching
developments within courses. Hence, knowledge providers make considerable efforts to
develop pedagogy within BE academia (Van Schaik et al., 2019). However, developing
teaching within a specific discipline is challenging (Ovbiagbonhia et al., 2020; McKnight et
al., 2016). Moreover, Tsiligiris and Hill (2019) and Welzant (2015) noticed an evident
knowledge gap in enhancing quality within the current HE curricula. The global COVID
pandemic has exacerbated the challenges related to teaching and learning within higher
education (Allen et al., 2020). Both the learners and academics face challenges in maintaining
quality in HE, especially within the current focus on digitised and Virtual Learning
Environment (VLE) teaching (Arora and Srinivasan, 2020; Bao, 2020). This study explores
best practices to improve the quality of teaching across the Built Environment Higher
Education (BEHE). Thus, the study investigates section one of the NSS questionnaire, namely “The
Teaching on my Course”. The main emphasis is given to the four central themes within NSS
section one, which reflect whether “the staff is good at explaining things”,
“made the subject interesting”, how “the course is intellectually stimulating”, and “how the
course has challenged students to achieve the best work”. Many contemporary learning and
teaching strategies are present in curriculum development (Tsiligiris and Hill, 2019). However,
a significant knowledge gap is present in identifying the best use of each theme under NSS
Section one and developing a best practice to enhance quality of teaching. The data obtained
by section one of NSS in 2019, 2020 and 2021 highlights the need to enhance teaching in the
BE curricula. NSS records that the satisfaction level has reduced by 6% in the average
minimum scoring criteria of “teaching in my course” in 2021 (Office for Students, 2020). It
further identifies that the average percentile of NSS section one of 2021 was 84% for all
subjects, whereas BE scored only 79%. This score provides insight into how BE performs
compared to other subjects within the UK's HE context. Issues in teaching and the COVID
pandemic may have influenced the significant reduction in NSS score (Arora and Srinivasan,
2020; Allen et al., 2020). Therefore, this study aims to identify best practices and enhance the
quality of teaching in BEHE.
1.0 Literature Review
1.1 Explaining the subject
Increasing understanding in an area of expertise is vital in providing pedagogical
education. The literature (Ferguson, 2012) suggests that teaching helps identify cognition within
human behaviour and builds insight into relevant information by relating it to exposure and
experience within the subject area across various levels of learning. These levels of learning lead
to further considerations of students’ self-directed academic development. Findings from Gollub
(2002) suggest that a better understanding of learning is facilitated around the concepts and
principles of the subject matter. Moreover, Andersson et al. (2013) highlight that the students
tend to generate more knowledge by acquiring prerequisite knowledge and utilising it to
increase their understanding of the subject. In addition, Andersson et al. (2013) suggest that
learners embrace prior learning to understand interactive learning better. However, in a
classroom context, the multi-disciplinary orientation of BE makes it challenging to address
prerequisite knowledge and provide in-depth understanding to learners (Waheed et al., 2020;
Dieh et al., 2015). Thus, BE academics need to devise module delivery aligning to the subject
area while reflecting previous knowledge in enhancing knowledge.
Moreover, Lai (2011) and McKnight et al. (2016) stressed the importance of interactive
learning within pedagogical education. These studies highlight that learners find that
knowledge is constructive when peer-reviewed; thus, providing a better environment to
embrace enhanced knowledge of BE understanding is essential. Moreover, Guo and Shi (2014)
further explain the uses of collaboration which increases understanding using active strategies.
However, Guo and Shi (2014) overlook that innovation embedded in learning effectively
brings collaboration and utilises modern approaches within the classroom context.
Furthermore, the current pandemic has encouraged active strategies such as blended learning
and digitised technologies (Allen et al., 2020) within a VLE. However, challenges were
identified in the definitive use of VLE, which did not advocate sub-teaching concepts such as
interactive learning and context-based knowledge (Waheed et al., 2020). The "silent"
classrooms are not appropriate for the transfer and sharing of technical knowledge in BEHE.
Ultimately, the prospect of better teaching hinges on innovative approaches and on the extent to
which VLEs are used to make a subject interesting, fostering interactive learning through the
co-creation of knowledge and promoting a clear explanation of a BE subject.
1.2 Making the Subject Interesting
Students do not engage in situations where they no longer see value or interest
in the content taught (Fraser, 2019). Lozano et al. (2012) state that analytical competency is
achieved using theory taught relevant to industrial capacity, creating a platform for students to
participate in learner engagement. Both Fraser (2019) and Lozano et al. (2012) suggest that
collaboration between academics and learner is significant to active learner engagement in
developing interest in subjects learnt. Therefore, engagement and collaboration are considered
the most critical challenges in an active learning environment (Hue and Li, 2008; Scott,
2020). However, a knowledge gap exists in measuring collaboration that shows competitive
learning and the cooperation of learners with the academic. The social, psychological, and
academic characteristics build learners’ perception of collaborative work (Uchiyama and
Radin, 2008). Of these, Hmelo-Silver et al. (2008) established the importance of the
social dimension of collaboration, which suggests harnessing the benefits of social support by
establishing a positive atmosphere within collaborative learning. Also, implementing a
collaborative approach to learning enhances diversity within the BEHE.
Furthermore, engagement benefits the learners' psychological aspects, reflecting on
academic performance and mental well-being (Clough and Strycharczyk 2012). It signifies
student-centric education reflecting on the psychological characteristic of developing students'
self-esteem, thus increasing interest in the subject. Secondary elements in BE teaching, such
as site visits, guest lectures and other innovative concepts, could be denoted as examples (Van
Schaik, 2019). Although Clough and Strycharczyk (2012) consider psychological
characteristics, the study does not signify the prominence of critical thinking obtained from
collaboration. Bye et al. (2007) imply that critical thinking is needed to make content more
meaningful and collaborative. However, collaborative teaching methods have been limited in
considering teaching during the COVID pandemic (Blundell et al., 2020); thus, more research
is needed to identify the means of developing learner engagement in VLEs. In addition, this
study identifies learner engagement and fostering collaboration demands stimulating learners
and making the subject interesting. However, a significant knowledge gap exists in addressing
the findings to make a subject interesting in the current BEHE context.
1.3 Intellectual stimulation of learners
Studies identify that learners become stimulated when the subject is interesting and
motivated to overcome the challenging nature of the course structure (Bolkan and Goodboy,
2010). Moreover, student motivation and intellectual stimulation increase when subject matter
reflects learner interests (Baeten et al., 2010). Furthermore, intellectual stimulation improves
when academics provide authentic, current, industry-related practices relevant to learners’
academic learning. Bolkan et al. (2011) suggested implementing active learning to enhance
learners’ intellectual effort. Thus, intellectual stimulation needs to be integrated through
problem-solving teaching methods, context-based learning, realistic case studies and setting
clear expectations and motivation for student excellence.
Chickering and Gamson (1999) suggested that summarising ideas, reviewing problems,
assessing the level of understanding and concluding on learning outcomes at the end of a
learning session stimulates learners. Furthermore, Tirrell and Quick (2012) outlined
opportunities to direct learners by contrasting fundamental theories and applying theory to real
life. However, the researchers overlook the fact that stimulation could be provided outside the
learning environment. The current practice within BE academia involves guest lectures and
arranging site visits to expand the classroom bandwidth and stimulate learners (Chen and Yang,
2019). Furthermore, the Educational Development Association (2013) highlights the influence of
Professional Standards and Regulatory Bodies (PSRBs) within BE learning. The involvement of
PSRBs helps ensure that industry-appropriate knowledge is delivered. In addition to making
the subject interesting, PSRBs would further stimulate the learner to develop academic skills
and competencies. Thus, the learners tend to foresee the industry-standard reflecting the
theories, advocating intellectual stimulation. Nonetheless, these strategies are disrupted by the
COVID pandemic’s measures for virtual module delivery (Allen et al., 2020). Thus, these
strategies had to be integrated into digitised platforms and combined with
VLE teaching methods. Accordingly, a measure of best practice is needed when considering how
VLE platforms’ strategies can address the COVID situation and support further development of
the BEHE curriculum.
The stimulation provided at the elementary level in BE learning is vital for interaction
between the learners (Jabar and Albion 2016). The collaboration between learners and
knowledge providers is vital for intellectual stimulation, and the use of concepts such as VLE
further promotes stimulation (Block, 2018; Marshalsey and Madeleine, 2018). However,
identifying fundamental digitisation approaches and innovative teaching methods such as
blended learning or flipped classroom will signify the commitment toward stimulated learning.
Stimulation through quizzes and experimental studies will improve the clarity of knowledge
provided through the VLE. In addition, stimulating students in a VLE through various digital
learning strategies can also help challenge learners. However, some views on the current
teaching practices in the COVID era denote that VLE is not the perfect solution for academic
development (Bao, 2020). Academics need to know to what extent VLE should be integrated
and how the best practice in BE teaching should be developed.
1.4 Challenging Learners
Knowledge providers who promote intellectual stimulation create a challenging
learning environment that empowers the learners and promotes cognitive and affective learning
(Bolkan and Goodboy, 2010). Kohn Rådberg et al. (2018) discuss that intellectual stimulation
depends on the intrinsic motivation to be challenged in critical learning contexts. Thus, the
learners require encouragement in identifying intellectual stimulus in acknowledging the
knowledge gained in HE curricula. Altomonte et al. (2016) explain how learners persist in their
learning process much longer in a challenging environment than in a traditional learning
environment. A plethora of more contemporary literature (Avargil et al., 2011; Chen and Yang,
2019) addresses specific learning strategies such as project-based and context-based learning,
which acts as a stimulus in developing challenging environments in the current BE learning
context.
A study carried out by Han and Ellis (2019) provides a detailed account of deep
approaches to learning and 'higher learning outcomes'. However, it fails to identify the
relationship between challenging learners and their impact on academic and cognitive learning
strategies. Learners often respond more to challenges made via competitive elements such as
quizzes, polls, and other simpler assessments in module delivery (Chen and Yang, 2019). It is
vital to understand that a challenging learning environment is not a mere self-testing method
for assessment in curricula but rather an instrument for continuous academic improvement
(Darling-Hammond et al., 2019). Further, learners will benefit from self-preparing concerning
the knowledge content discussed in the classroom. It further influences advanced knowledge
gained through research rather than knowledge transmission provided in the classroom.
Challenging learners create more opportunities to collaborate and increase intellectual
stimulation (Boud et al., 2018; Gomis et al., 2021). However, Boud et al. overlook the
counter-motivation that challenge can create in learners, which can itself result in innovation.
Furthermore, challenging students can be used to gauge stimulation and to provide informative
judgment on their academic experience. By challenging the learner, the academic can
evaluate aptitude and growth (Hamari et al., 2016). The current practice in academia during
the COVID pandemic has relied on the VLE for setting out quizzes and other evaluation
methods to stimulate and challenge learners (Block, 2018; Bao, 2020). Hence, using digitised
platforms in an active learning environment is paramount in advancing teaching in BE.
However, these VLE instruments could be further integrated with the module delivery plan to
optimise challenging learners and enhance academic development.
2.0 Methodology
2.1 Participants & Materials
‘Teaching on my course’ of the NSS questionnaire emphasises four questions related
to ‘explain things’, ‘make the subject interesting’, ‘create an intellectually stimulating
environment’, and ‘challenge the learners’. Document analysis and a questionnaire survey with
separate samples were identified as the research tools best suited to the study. Document
analysis was adopted to analyse a sample of 375 Mid-Module Reviews (MMR) from students
at levels three to six, reflecting the findings from the literature on the four
questions in NSS section one. The document data were categorised into themes in which
students identified how the teaching helped them and established the key elements that were
positive about the module. This analysis uses 375 samples, assuming a confidence level of 95%
and a margin of error of 5%.
The themes identified from the documental analysis were used to identify and develop
the survey framework and questionnaire conducted for the academics. The closed-ended
questionnaire survey refined the documental data findings and established the gap between the
existing and best practices. Departments of Architecture, Construction Management, Civil
Engineering, Quantity Surveying, and Building surveying were selected to represent the BE
discipline to obtain validated and reliable data making the survey sample 20 academics. Four
sets of academics were selected under each discipline based on their title, including a
Professor/Reader, two Senior Lecturers and a Lecturer from each BE discipline. This approach
helped to recruit four participants from each discipline in BE. Additionally, three participants,
a Head of the school, a Principal lecturer, and a Subject lead, were included, bringing the
sample size to 23 participants. A critical focus of the latter three participants was to eliminate
unconscious bias in feedback received from students and endorse validity, reliability and
transferability of the data collected and modelled through ISM analysis. The data obtained from
the questionnaire assisted in developing the drivers in enhancing the best practice of teaching
in the BEHE context.
2.2 Research Procedure
A systematic approach to data collection incorporating the literature review, document
analysis, and questionnaire survey has allowed an in-depth understanding of current BEHE
teaching and learning. The substantial data collected from documental analysis and
questionnaire survey needed to be correlated with the NSS theme establishing relationships on
improving BEHE teaching and learning. Thus, the data was modelled using the Interpretive
Structural Modelling (ISM) tool to find critical drivers and correlation of each driver to the
theme of NSS section one. The drivers identified through the data analysis were used in the
ISM analysis. Afterwards, a reachability matrix was developed from modelling the drivers
through a “Structural Self-interaction Matrix” (SSIM). A “Matrice d’Impacts Croisés Multiplication Appliquée à un Classement” (MICMAC) analysis was further developed to identify what
factors need to be emphasised in enhancing teaching strategies ascertaining the degree of the
relationships between the drivers found through SSIM. The MICMAC enabled categorising
data obtained into independent, dependent and autonomous clusters to establish a best practice
framework for teaching enhancement in BEHE. The data derived from each analysis was
factored in when developing the level partitioning of each driver. Moreover, the ISM level
partitioning illustrated a critical correlation of each driver under NSS themes and emphasised
implications in the BEHE context. Finally, this study's general conclusions are drawn from the
level partitioning and presented as the recommended strategies for developing teaching
enhancement in BEHE.
3.0 Analysis
Three hundred and seventy-five (375) MMRs (Mid-Module Reviews) were examined.
Students were asked three questions: how the module is progressing; what is good or bad; and
suggestions to improve module delivery. A subjective evaluation by academics was made of
the reviews provided, and themes were identified in the given student suggestions. This
evaluation identifies 24 drivers directly influencing the teaching practices highlighted by the
four NSS questions. The identified drivers were collated and categorised into the specific NSS
questions/themes, and an ISM analysis was carried out. A pair-wise relationship is mapped to
the Structural self-interaction matrix (SSIM) using a binary matrix based on the above data
gathered through the closed-ended questionnaire survey from the teaching staff. The binary
matrix was used to create the MICMAC graph in recognising the influential drivers that
enhance HE teaching. Furthermore, a level partitioning was carried out to find the interrelationship of each driver and recognise the sequential order of implications within the BEHE
context. Drivers with the characteristics of the independent cluster are considered
fundamental to the system and especially important for enhancing
teaching in BEHE. Drivers with the characteristics of the dependent cluster are
considered a necessity for accommodating the independent drivers. Thus, dependent drivers
directly influence planning and module development rather than being fundamental to
teaching. Drivers with the characteristics of the autonomous cluster are considered
comparatively less influential within the system.
The study reveals that critical emphasis needs to be given to promote active learning
and provide in-depth understanding when the academic explains module content. Promoting
collaboration, student engagement and focussing on student-centric approaches occurred in the
independent cluster to make the subject interesting. Promoting intellectual stimulation by
enhancing interaction between the learner and the academic was considered fundamental in
enhancing active learner stimulation. Challenging the learner by providing motivation,
promoting self-assessment for continuous improvement, challenging learning culture through
learner motivation and helping the learner develop an action plan on career progression was
illustrated in the independent cluster making the drivers deemed fundamental. Thus,
implementing these drivers would facilitate the best practice in HE teaching.
Furthermore, dependent drivers identified through the study will be beneficial in
facilitating the independent drivers mentioned above. An interim assessment opportunity and
guidance given through a formative feedback session were recognised as dependent drivers in
explaining the module content. Use of various media in explaining the subject content,
executing cognitive approaches, arranging site visits (where applicable) or site walk-throughs,
guest lecturers, augmentation in lecture material, and presenting real-world examples in
lectures were identified as dependent drivers in making the subject interesting. Intellectual
stimulation by challenging learners in problem-based learning and assessment guidance
through assessment rubrics and question-based learning were identified under the dependent
cluster. Contrary to widespread belief, revisiting previous knowledge and reflecting module
content against the pathway provided by PSRBs in explaining module content, and reflecting more
on industry-led practices in intellectually stimulating students, were in the autonomous
cluster. However, this is not because these drivers have little influence on the system, but because
they are facilitated by other (both dependent and independent) drivers.
To generalise the critical findings from the MICMAC analysis, the following Table 1
illustrates the fundamental drivers (independent), facilitating drivers (dependent), and non-influential/already accommodated drivers (autonomous) in enhancing teaching in HE. The
drivers are categorised into the four performance indicators depicted by Section 1 of the NSS
to clarify and ease interpretation. Thus, academics and academic institutions can implement
these drivers to promote teaching practices within BEHE.
Table 1: Categorisation of Drivers
Section 1: The teaching on my course

NSS Section | Drivers identified through the study | SISM Coordinates (i, j) | MICMAC Categorisation

Q1 - Staff is good at explaining things.
D1 - Promoting active learning | (10, 17) | Independent
D2 - Providing an in-depth understanding | (2, 11) | Independent
D3 - Revisiting previous knowledge | (13, 24) | Autonomous
D4 - Interim assessment opportunity | (6, 6) | Dependent
D5 - Guidance given through formative feedback session | (13, 10) | Dependent
D6 - Reflecting module content with the pathway provided by PSRB | (10, 7) | Autonomous

Q2 - Staff have made the subject interesting.
D7 - Promoting collaboration | (10, 13) | Independent
D8 - Focussing on student-centric approaches | (6, 10) | Independent
D9 - Promoting student engagement | (21, 14) | Independent
D10 - Use of a variety of media in explaining the subject content | (15, 15) | Dependent
D11 - Executing cognitive approaches | (5, 11) | Dependent
D12 - Arranging site visits (where applicable) or site walkthroughs | (18, 17) | Dependent
D13 - Guest lecturers | (4, 4) | Dependent
D14 - Augmentation in lecture material | (18, 2) | Dependent
D15 - Presenting real-world examples in lectures | (19, 4) | Dependent

Q3 - The course is intellectually stimulating.
D16 - Promoting intellectual stimulation | (9, 19) | Independent
D17 - Enhance interaction between the learner and the academic | (8, 15) | Independent
D18 - Reflecting more on industry-led practices | (11, 9) | Autonomous
D19 - Challenging learners in problem-based learning | (13, 11) | Dependent

Q4 - My course has challenged me to achieve my best work.
D20 - Promoting self-assessment for continuous improvement | (9, 13) | Independent
D21 - Challenging learning culture through learner motivation | (7, 19) | Independent
D22 - Assessment guidance through assessment rubrics | (12, 16) | Dependent
D23 - Question-based learning | (6, 11) | Dependent
D24 - Having an action plan on career progression | (5, 20) | Independent
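For readers unfamiliar with MICMAC, the snippet below gives a rough illustration of how a driver's driving power and dependence place it in the independent, dependent, autonomous, or linkage quadrant. The coordinate convention, the mid-point threshold, and the sample values are assumptions for illustration only; they are not taken from Table 1.

# Illustrative MICMAC quadrant classification (assumed convention:
# first value = driving power, second = dependence, threshold = mid-point of the scale).

def micmac_category(driving_power, dependence, threshold):
    if driving_power > threshold and dependence <= threshold:
        return "Independent"   # strong driver, weakly driven by others
    if driving_power <= threshold and dependence > threshold:
        return "Dependent"     # facilitated by other drivers
    if driving_power <= threshold and dependence <= threshold:
        return "Autonomous"    # relatively disconnected from the system
    return "Linkage"           # both high: unstable, feeds back into the system

# Hypothetical coordinates, not the study's data.
sample = {"Dx": (17, 4), "Dy": (5, 18), "Dz": (3, 3)}
for name, (power, dep) in sample.items():
    print(name, micmac_category(power, dep, threshold=12))
# Dx -> Independent, Dy -> Dependent, Dz -> Autonomous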
4.0 Discussion and Recommendations
This study recognises the significant need to enhance the quality of teaching in BEHE. Both the literature and the primary data collection produced a substantial number of suggestions for enhancing teaching practices. The strategies/drivers obtained from primary and secondary data are categorised into themes and analysed according to their influence/driving capability against the questions put forth by NSS Section 1. The outcome of the discussion is the level partitioning of the identified drivers, which illustrates the order in which they should be implemented to increase the quality of HE teaching. The sections below relate the identified drivers to the NSS themes under Section 1.
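The level partitioning referred to throughout this discussion follows the general ISM procedure: a driver is assigned to a level once its reachability set, intersected with its antecedent set, equals its reachability set. The short Python sketch below illustrates that general procedure on a small, hypothetical reachability relation; the driver names and links are placeholders, not the study's actual matrix.

# Minimal ISM level-partitioning sketch (hypothetical data, not the study's matrix).
# reachability[d] = set of drivers reachable from d (assumed reflexive and transitive).

def ism_level_partition(reachability):
    # A driver joins the current level when its remaining reachability set
    # is contained in its remaining antecedent set.
    antecedent = {d: {a for a in reachability if d in reachability[a]} for d in reachability}
    remaining = set(reachability)
    levels, level = {}, 1
    while remaining:
        current = {d for d in remaining
                   if (reachability[d] & remaining) <= (antecedent[d] & remaining)}
        levels.update({d: level for d in current})
        remaining -= current
        level += 1
    return levels

# Hypothetical example: D3 facilitates D2, which facilitates D1.
reachability = {"D1": {"D1"}, "D2": {"D2", "D1"}, "D3": {"D3", "D2", "D1"}}
print(ism_level_partition(reachability))  # {'D1': 1, 'D2': 2, 'D3': 3}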
4.1 Explaining the subject
Explaining the subject well ultimately depends on how clearly the learner can grasp the knowledge criteria. Gollub (2002), Ferguson (2012), and McKnight et al. (2016) show that active learning is highly dependent on the learner's level of understanding. Providing a higher understanding of the
subject matter, the context of knowledge transferred, revisiting the experience learnt and
promoting interactive learning are critical academic performance enhancers (McKnight et al.,
2016; Guo and Shi, 2014; Eames and Birdsall, 2019). The level partitioning developed from
the research findings, shown in Figure 1 below, identifies that revisiting knowledge (D3) and reflecting on the PSRB pathway (D6) carry the lowest priority, at Level III. Even though they sit at Level III, they aid other drivers, such as in-depth understanding (D2), to better explain module content. Both the literature (Lozano et al., 2012; Ovbiagbonhia et al., 2020) and the data state
that the module leader needs to identify how to merge academic and professional competency
gaps in providing an in-depth understanding of BE curricula. However, the research findings
highlight the importance of the availability of interim assessment guidance. The use of interim
assessment opportunities (D4) and guidance given through formative feedback (D5) should be
considered significant in developing the module. The emphasis is on module leaders and academics to develop and deliver module content that facilitates formative assessment and feedback. The study identifies promoting active learning (D1) and providing an in-depth understanding (D2) as fundamental, Level I drivers in enhancing knowledge delivery. Current studies (Allen et al., 2020) deem pedagogic theories and platforms such as the VLE, which promote active learning through quizzes and other media that engage students, to be the best strategies for enhancing active learning.
Figure 1: Level partitioning of Drivers on NSS Q1 - Staff is good at explaining the subject
4.2 Making the subject interesting
The literature establishes that the learning culture of the modern-day classroom has
evolved. Hue and Li (2008) and Hmelo-Silver et al. (2008) identified the core context of collaboration and its effect on subject engagement. It is widely held that the current pedagogical paradigm of digitised practices promotes collaboration (Siew, 2018; Hamari et al., 2016) and shapes authentic, industry-related content, especially within the BE curricula.
Moreover, the literature review identifies that BE knowledge providers promote digitised
learning concepts in HE. Findings from primary data also recognise approaches in
accommodating augmented concepts and focusing on digitised learning environments
facilitating such learning. The level partitioning developed from the research findings shown
in figure 2 below illustrates both facilitating drivers and fundamental drivers. The facilitating
drivers are: executing cognitive approaches (D11), arranging site visits or site walk-throughs (D12), guest lecturers (D13), augmentation/digitisation in lecture material (D14), and presenting real-world examples in lectures (D15). Since they are positioned at Level II, these drivers (D10 to D15) are considered to facilitate the fundamental drivers of module delivery. However, it
is identified that D13, D10 and D14 facilitate each other and help facilitate D11 and D15, which
facilitate D7 and D9, respectively. The study further strengthens the argument that promoting
student collaborations (D7), engagement (D9) and focussing on student-centric approaches
(D8) are fundamental in making the subject content interesting. It further revealed that both D7
and D9 facilitated D8 in making the subject interesting. The ISM level partitioning positioned
them at Level I due to their fundamental influence in making the subject interesting.
A critical finding from the study is that using a variety of media (D10) to explain the
subject brings innovation to the classroom. The research findings signify that digitisation must
be considered a key facilitator but not a fundamental element in pedagogic development.
Further to the evidence of earlier studies, blended learning and flipped classroom techniques
are considered paramount in carrying out collaborative knowledge in group learning (Allen et
al., 2020). Documental analysis insists on combining traditional and digitised media to deliver
module content. Findings from documental analysis reveal that students prefer traditional
module delivery aligning with digitised recordings for revisiting knowledge. Thus, digitisation
needs to be a facilitator rather than being promoted to a fundamental driver in teaching HE. It
is further applicable to the current COVID learning context, where online learning has
dominated pedagogical implementation (Bao, 2020). This study presents critical evidence that digitisation is not in itself what enhances teaching practices but rather an opportunity to facilitate the independent drivers in enhancing HE learning.
Figure 2: Level partitioning of Drivers on NSS Q2 - Staff have made the subject interesting
4.3 Intellectual stimulation of learners
Baeten et al. (2010), Bolkan et al. (2011), and Jabar and Albion (2016) identify that intellectual stimulation is critical to HE student progression. Both the literature (Baeten et al., 2010; Bolkan et al., 2011) and the research findings reveal that straightforward 'lecturing', where knowledge is pushed to the learner with little reflection and context, is considered adverse to academic progress and performance. The data and the literature (Van Schaik, 2019) do not support adopting industry-led practices (D18) as a fundamental means of delivering module content, thus positioning it at Level III; the findings reveal that this is because drivers such as site visits, guest lectures, and a focus on real-world context were already adhered to in making the subject interesting. However, these drivers are prominent in challenging learners through problem-based (D19) and industry-led contexts in learning. Tirrell and Quick (2012) and Jabar and
Albion (2016) further emphasised innovative teaching and effective teaching methods, such as
problem-based learning (D19). However, the research findings indicate that such practice is not fundamental but crucial in increasing intellectual stimulation, since it is positioned at Level II in the ISM level partitioning. Nevertheless, the analysis recognises the influence of D19 in facilitating both D16 and D17. The study emphasises that promoting intellectual stimulation (D16) in module development and enhancing learner-academic interaction (D17) are fundamental and self-facilitating in making the course intellectually stimulating. The ISM level partitioning has positioned them at
Level I, which denotes fundamental influence over intellectual stimulation. The findings
further show the benefits of utilising digitised tools or in-class activities to promote intellectual stimulation, especially during the COVID pandemic (Arora and Srinivasan, 2020) and for disciplines such as BE, where a vast body of knowledge content (e.g. architectural, engineering, surveying and management) needs to be reflected upon.
Figure 3: Level partitioning of Drivers on NSS Q3 - The course is intellectually stimulating
4.4 Challenging Learners
The literature review (Darling-Hammond et al., 2019; Boud et al., 2018) identifies that challenging students could increase the probability of academic progression. However,
Kohn Rådberg et al. (2018) stressed the deficiencies in academic progression regarding the
lack of motivation and drivers, which does not aid intellectual stimulation. The literature
provides many strategies for promoting a challenging culture within the learning environment;
however, the surplus of theories makes the implementation complicated and time-consuming
(Boud et al., 2018; Bolkan, 2010). Assessment guidance through assessment rubrics (D22) and
question-based learning (D23) are at Level II in the ISM level partitioning. Contradicting the
literature (Ellis and Hogard, 2018), the research findings illustrate that D22 and D23 were not
fundamental to challenging students but influential in facilitating D21 in enabling students to
achieve their best work. Also, this could be due to digitalisation being a prominent aspect in
enabling these drivers within the HE curriculum. This study identifies the fundamental drivers as promoting self-assessment opportunities (D20), motivating the student through a challenging culture of knowledge provision (D21), and developing an action plan on career progression/continuous improvement (D24), all positioned at Level I in the ISM analysis. It further
highlights that D21 and D24 facilitate D20, promoting continuous student improvement. Thus,
the analysis deems that the module leader/lead academic needs to consider the self-assessment
techniques, challenging learning culture, and action plan for career development in developing
the module and enhancing teaching in HE.
Figure 4: Level partitioning of Drivers on NSS Q4 - The course has challenged me to achieve my best work
6.0 Conclusions
This study establishes drivers to enhance the quality of teaching in BEHE, reflecting on the results of Section 1 of the NSS. The findings are novel in that the study discusses the drivers and illustrates their implementation to improve the quality of teaching within the four NSS themes. The main findings from the literature review indicate significant room for improvement in teaching and pedagogy to enhance student performance in BEHE. The
practical implications of this study are that the identified drivers could help academics and
students increase understanding in conjunction with the lectures that deliver in-depth
knowledge through practical sessions. As illustrated in the figures, the level partitioning will
enable academics to focus on significant pedagogical themes and enforce strategies. As the
theme refers to the NSS guidelines, the drivers developed could assist HE institutions in
obtaining better results for the NSS survey. Finally, the combined set of figures could form a
framework for enhancing quality of teaching within HE curricula.
The suggestions for student engagement, developing a stimulating learning environment, and challenging students require various collaborative online and face-to-face teaching approaches. The literature identified another critical element: providing context on module
background and content. Drivers further reinforced that promoting active learning and in-depth
understanding was fundamental in improving teaching in the BEHE context. Moreover, the study's primary data showed that teaching and learning, resources, standards, and assessments could provide a better understanding to students and could be further facilitated by the above-mentioned independent drivers.
This interpretation contrasts with the view that implementing innovative practices in knowledge transfer, such as blended learning, flipped classrooms and group learning, is in itself vital for stimulating learners. Promoting collaboration, student engagement, and focusing on student-centric
approaches were considered independent, but these drivers facilitate other drivers in making
the subject interesting. Moreover, promoting intellectual stimulation, enhancing interaction
between the learner and the academic, promoting self-assessment for continuous improvement,
challenging learning culture through learner motivation, and having an action plan on career
progression are recognised as independent drivers in advancing teaching in BEHE.
The study identified several dependent factors, such as aligning the module content
with the PSRB requirements and emphasising personal and career development benefits.
However, the current learning practices need to be integrated with the online delivery platforms
to provide knowledge and challenge learners for better learning practice. Enforcing quizzes
and real-world examples through a digital platform proves vital in helping independent drivers
for intellectual stimulation and challenging the learner for an active learning atmosphere.
Finally, a unique finding is that online delivery in the current situation (COVID 19)
brings more challenges since the lectures are either blended or delivered online. All the
independent and dependent drivers for engaging students, increasing understanding, inspiring
and challenging learners remain unchanged. The current situation also demands training for
the lecturers on various tools that can help engage, challenge, stimulate, and increase the
learners' understanding. However, the lecturers may now need to use multimedia tools to
accommodate the suggestions from this study and facilitate the independent drivers to enhance
quality of teaching in BEHE. Further research could be carried out by involving a higher
sample from different HE institutes around the globe to develop a global framework. Also,
further research is needed to reflect on how quality of teaching influences student learning
opportunities, assessment and feedback, academic support, and learning resources.
Acknowledgement
The data obtained for this paper were based on a project guided by a steering
committee within the University of Wolverhampton, chaired by Professor Mohammed Arif.
Among the committee members, credit needs to be given to Dr David Searle, Dr Alaa Hamood
and Dr Louise Gyoh for their significant input on the data collection. Furthermore, the student
and academic participants at the University of Wolverhampton need recognition for their
insightful comments.
7.0 Reference List
Allen, J., Rowan, L. and Singh, P., 2020. Teaching and teacher education in the time of COVID-19.
Asia-Pacific Journal of Teacher Education, 48(3), pp.233-236.
Altomonte, S., Logan, B., Feisst, M., Rutherford, P. and Wilson, R. (2016). Interactive and situated
learning in education for sustainability. International Journal of Sustainability in Higher
Education, 17(3), pp.417-443.
Andersson, P., Fejes, A. and Sandberg, F., 2013. Introducing research on recognition of prior learning.
International Journal of Lifelong Education, 32(4), pp.405-411.
Arora, A. and Srinivasan, R., 2020. Impact of Pandemic COVID-19 on the Teaching – Learning
Process: A Study of Higher Education Teachers. Prabandhan: Indian Journal of Management,
13(4), p.43.
Avargil, S., Herscovitz, O. and Dori, Y., 2011. Teaching Thinking Skills in Context-Based Learning:
Teachers’ Challenges and Assessment Knowledge. Journal of Science Education and
Technology, 21(2), pp.207-225.
Baeten, M., Kyndt, E., Struyven, K. and Dochy, F., 2010. Using student-centred learning environments
to stimulate deep approaches to learning: Factors encouraging or discouraging their
effectiveness. Educational Research Review, 5(3), pp.243-260.
Bao, W., 2020. COVID ‐19 and online teaching in higher education: A case study of Peking University.
Human Behavior and Emerging Technologies, 2(2), pp.113-115.
Block, B., 2018. Digitalization in engineering education research and practice. 2018 IEEE Global
Engineering Education Conference (EDUCON).
Blundell, C., Lee, K. and Nykvist, S., 2020. Moving beyond enhancing pedagogies with digital
technologies: Frames of reference, habits of mind and transformative learning. Journal of
Research on Technology in Education, 52(2), pp.178-196.
Bolkan, S. and Goodboy, A. (2010). Transformational Leadership in the Classroom: The Development
and Validation of the Student Intellectual Stimulation Scale. Communication Reports, 23(2),
pp.91-105.
Bolkan, S., Goodboy, A., and Griffin, D. (2011). Teacher Leadership and Intellectual Stimulation:
Improving Students' Approaches to Studying through Intrinsic Motivation. Communication
Research Reports, 28(4), 337-346. doi: 10.1080/08824096.2011.615958
Boud, D., Ajjawi, R., Dawson, P. and Tai, J. (2018). Developing Evaluative Judgement in Higher
Education. 1st ed. London: Routledge.
Bowen, T. (2017). Assessing visual literacy: a case study of developing a rubric for identifying and
applying criteria to undergraduate student learning. Teaching in Higher Education, 22(6),
pp.705-719.
Bye, D., Pushkar, D., and Conway, M. (2007). Motivation, Interest, and Positive Affect in Traditional
and Nontraditional Undergraduate Students. Adult Education Quarterly, 57(2), 141-158. doi:
10.1177/0741713606294235
Chen, C. and Yang, Y., 2019. Revisiting the effects of project-based learning on students’ academic
achievement: A meta-analysis investigating moderators. Educational Research Review, 26,
pp.71-81.
Chickering, A. W., & Gamson, Z. F. (1999). Development and adaptations of the seven principles for
good practice in undergraduate education. New Directions for Teaching and Learning, 80, 75–
81.
Clough, P., and Strycharczyk, D. (2012). Developing mental toughness (1st ed.). London: Kogan Page.
Darling-Hammond, L., Flook, L., Cook-Harvey, C., Barron, B. and Osher, D., 2019. Implications for
educational practice of the science of learning and development. Applied Developmental
Science, 24(2), pp.97-140.
Dieh, M., Lindgren, J. and Leffler, E., 2015. The Impact of Classification and Framing in
Entrepreneurial Education: Field Observations in Two Lower Secondary Schools. Universal
Journal of Educational Research, 3(8), pp.489-501.
Eames, C., and Birdsall, S. (2019). Teachers’ perceptions of a co-constructed tool to enhance their
pedagogical content knowledge in environmental education. Environmental Education
Research, 1-16. doi: 10.1080/13504622.2019.1645445
Ellis, R. and Hogard, E., 2018. Handbook of Quality Assurance for University Teaching, Routledge,
London.
Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of
Technology Enhanced Learning, 4(5/6), 304. doi: 10.1504/ijtel.2012.051816
Fram, S., and Margolis, E. (2011). Architectural and built environment discourses in an educational
context: the Gottscho and Schleisner Collection. Visual Studies, 26(3), 229-243. doi:
10.1080/1472586x.2011.610946
Fraser, S., 2019. Understanding innovative teaching practice in higher education: a framework for
reflection. Higher Education Research & Development, 38(7), pp.1371-1385.
French, A. and O'Leary, M. (2017). Teaching Excellence in Higher Education: Challenges, Changes
and the Teaching Excellence Framework. Bingley: Emerald Publishing Limited.
Gollub, J. (2002). Learning and understanding. Washington, DC: National Academy Press.
Gomis, K., Saini, M., Pathirage, C. and Arif, M., 2021. Enhancing learning opportunities in higher
education: best practices that reflect on the themes of the national student survey, UK. Quality
Assurance in Education, 29(2/3), pp.277-292.
Guo, F. and Shi, J. (2014). The relationship between classroom assessment and undergraduates' learning
within Chinese higher education system. Studies in Higher Education, 41(4), pp.642-663.
Hamari, J., Shernoff, D., Rowe, E., Coller, B., Asbell-Clarke, J. and Edwards, T., 2016. Challenging
games help students learn: An empirical study on engagement, flow and immersion in game-based learning. Computers in Human Behavior, 54, pp.170-179.
Han, F. and Ellis, R. (2019). Identifying consistent patterns of quality learning discussions in blended
learning. The Internet and Higher Education, 40, pp.12-19.
Hmelo-Silver, C., Chernobilsky, E. and Jordan, R., 2008. Understanding collaborative learning
processes in new learning environments. Instructional Science, 36(5-6), pp.409-430.
Hue, M., and Li, W. (2008). Classroom Management: Creating a Positive Learning Environment (Hong
Kong teacher education). Hong Kong: Hong Kong University Press, HKU.
Jabar, S. and Albion, P., 2016. Assessing the Reliability of Merging Chickering & Gamson’s Seven
Principles for Good Practice with Merrill’s Different Levels of Instructional Strategy
(DLISt7). ERIC Online Learning, 20(2).
Kohn Rådberg, K., Lundqvist, U., Malmqvist, J. and Hagvall Svensson, O. (2018). From CDIO to
challenge-based learning experiences – expanding student learning as well as societal impact?.
European Journal of Engineering Education, 45(1), pp.22-37.
Lai, K. (2011). Digital technology and the culture of teaching and learning in higher education.
Australasian Journal of Educational Technology, 27(8). doi: 10.14742/ajet.892
Lozano, J., Boni, A., Peris, J. and Hueso, A., 2012. Competencies in Higher Education: A Critical
Analysis from the Capabilities Approach. Journal of Philosophy of Education, 46(1), pp.132-147.
Marshalsey, L, and Madeleine S. (2018). “Critical Perspectives of Technology-Enhanced Learning in
Relation to Specialist Communication Design Studio Education Within the UK and Australia.”
Research in Comparative and International Education 13 (1): 92–116. doi:
10.1177/1745499918761706
Matthews, A. and Kotzee, B., 2019. The rhetoric of the UK higher education Teaching Excellence
Framework: a corpus-assisted discourse analysis of TEF2 provider statements. Educational
Review, pp.1-21.
McKnight, K., O'Malley, K., Ruzic, R., Horsley, M., Franey, J. and Bassett, K. (2016). Teaching in a
Digital Age: How Educators Use Technology to Improve Student Learning. Journal of Research
on Technology in Education, 48(3), pp.194-211.
Moore, D. and Fisher, T., 2017. Challenges of Motivating Postgraduate Built Environment Online
Teaching and Learning Practice Workgroups to Adopt Innovation. International Journal of
Construction Education and Research, 13(3), pp.225-247.
Office for Students, 2020. National Student Survey Results 2020. London, UK.
Ovbiagbonhia, A., Kollöffel, B. and Den Brok, P., 2020. Teaching for innovation competence in higher
education Built Environment engineering classrooms: teachers’ beliefs and perceptions of the
learning environment. European Journal of Engineering Education, 45(6), pp.917-936.
Santos, G., Marques, C., Justino, E. and Mendes, L., 2020. Understanding social responsibility’s
influence on service quality and student satisfaction in higher education. Journal of Cleaner
Production, 256, p.120597.
Scott, L. (2020). Engaging Students' Learning in the Built Environment Through Active Learning.
Claiming Identity Through Redefined Teaching in Construction Programs, pp.1-25.
Staff and Educational Development Association, (2013). Measuring The Impact Of The UK
Professional Standards Framework For Teaching And Supporting Learning (UKPSF). Higher
Education Academy.
Tirrell, T., and Quick, D. (2012). Chickering's Seven Principles of Good Practice: Student Attrition in
Community College Online Courses. Community College Journal of Research and Practice,
36(8), 580-590. doi: 10.1080/10668920903054907
Tsiligiris, V. and Hill, C., 2019. A prospective model for aligning educational quality and student
experience in international higher education. Studies in Higher Education, 46(2), pp.228-244.
Uchiyama, K. and Radin, J., 2008. Curriculum Mapping in Higher Education: A Vehicle for
Collaboration. Innovative Higher Education, 33(4), pp.271-280.
Van Schaik, P., Volman, M., Admiraal, W., and Schenke, W. (2019). Approaches to co-construction of
knowledge in teacher learning groups. Teaching And Teacher Education, 84, 30-43. doi:
10.1016/j.tate.2019.04.019
Waheed, H., Hassan, S., Aljohani, N., Hardman, J., Alelyani, S. and Nawaz, R., 2020. Predicting
academic performance of students from VLE big data using deep learning models. Computers
in Human Behavior, 104, p.106189.
Welzant, H., Schindler, L., Puls-Elvidge, S., & Crawford, L. (2015). Definitions of quality in higher
education: A synthesis of the literature. Higher Learning Research Communications, 5 (3).
doi:10.18870/hlrc.v5i3.244
English Language Teaching; Vol. 11, No. 1; 2018
ISSN 1916-4742
E-ISSN 1916-4750
Published by Canadian Center of Science and Education
The Affection of Student Ratings of Instruction toward EFL
Instructors
Yingling Chen
Center for General Education, Oriental Institute of Technology, New Taipei City, Taiwan
Correspondence: Yingling Chen, Center for General Education, Oriental Institute of Technology, New Taipei City, Taiwan. Tel: 886-909-301-288. E-mail: cil0226@mail.oit.edu.tw
Received: October 27, 2017    Accepted: December 3, 2017    Online Published: December 5, 2017
doi: 10.5539/elt.v11n1p52    URL: http://doi.org/10.5539/elt.v11n1p52
Abstract
Student ratings of instruction can be a valuable indicator of teaching because the quality measurement of
instruction identifies areas where improvement is needed. Student ratings of instruction are expected to evaluate
and enhance the teaching strategies. Evaluation of teaching effectiveness has been officially implemented in
Taiwanese higher education since 2005. Therefore, this research investigated Taiwanese EFL university
instructors’ perceptions toward student ratings of instruction and the impact of student ratings of instruction on
EFL instructors' classroom teaching. The data for this quantitative study were collected through a 21-item questionnaire. Thirty-two qualified participants were selected from ten universities in the northern part of Taiwan. The results indicate that EFL instructors' perceptions of and experiences with student ratings of instruction affect their approach to teaching, but EFL instructors do not prepare lessons based on the results of student ratings of instruction.
Keywords: student ratings of instruction, EFL, instruction
1. Introduction
The Ministry of Education (MOE) authorizes universities and colleges to determine whom to hire in the college
system according to the Taiwanese College Regulation 21. Moreover, the MOE (2005) concluded that
developing a system for teacher evaluation is necessary in each college and university. As a result, schools have
more power in deciding the qualification of educators. Wolfer and Johnson (2003) emphasized that one must be
clear about the purpose of a course evaluation feedback since it may determine the kind of data required.
Moreover, teacher evaluation should include the key element for not only promotion, tenure, and reward, but
also performance review and teaching improvement. In addition, student ratings of instruction become an
essential element to evaluate teachers’ success for ensuring the quality of teaching. Students’ opinions are
fundamental sources for forming the quality of instruction in higher education. Murray (2005) stated that more
than 90% of U.S. colleges and universities pay attention to student evaluation of teachers in order to assess
teaching. Besides, about 70% of college instructors recognize the need of student input for assessing their
classroom instruction (Obenchain, Abernathy, & Wiest, 2001). Teacher decision making toward curriculum
design and teacher expectancy of student achievement have a significant influence on the results of curricular
and instructional decisions. However, most of the research focuses on how to assist and improve students' learning through SRI, how to improve teaching effectiveness through SRI, issues with SRI, or student achievement in relation to SRI; few studies address how instructors use the feedback from SRI or how instructors improve teaching
through the results of SRI (Beran, Violato, Kline, & Frideres, 2005). Accordingly, instructors’ perceptions of
student ratings become valuable in presenting a better insight for improving teacher performances because
understanding how instructors are impacted by SRI is influential.
1.1 Literature
1.1.1 The Use of Student Ratings of Instruction
The implementation of SRI at colleges and universities has not only been employed for the purpose of improving teaching effectiveness but has also been used for personnel decisions such as tenure. SRI is widely practiced in
colleges and universities across Canada and the United States (Greenwald, 2002). In fact, student ratings are not a new topic in higher education. Researchers Remmers and Brandenburg published their first research studies on
student ratings at Purdue University in 1927. Also, Guthrie (1954) stated that students at the University of
Washington filled out the first student rating forms seventy-five years ago. Nevertheless, SRI is a pertinent topic
for researchers to study because students still fill out the evaluation forms which produce vital information on
teaching quality. Administrators take SRI into consideration to determine the effectiveness of instruction and
personnel promotions as well. There were 68% of American colleges reported using student ratings in Sedin’s
1983 survey. Meanwhile, there were 86 percent of American colleges reported using student rating surveys in
colleges in 1993 (Sedin, 1993a). Seldin’s (1993b) surveys reflected the growing number of use of student rating
as an instrument for teaching evaluation in higher education.
1.1.2 Student Rating of Instruction in Higher Education in Taiwan
“During the 1990s, most education systems in the English-speaking world moved towards some notion of
performance management” (West-Burnham, O’Neill, & Bradbury, 2001, p. 6). The widespread use of the
performance management concept contributes to the education system, which focuses on specific measurement
of classroom instruction delivery. The quality of teaching influences students not only academically, but also
psychologically. With regard to the value of teacher evaluation, the Taiwanese Ministry of Education has
mandated that colleges and universities monitor the quality of teaching because the quality of teachers and
instructions impact students’ academic achievement and the reputation of the school. Chang (2002) declared that
approximately 76 percent of public universities and 85 percent of private universities have implemented SRI in
Taiwan. As a result, teacher evaluation has become an instrument for examining instructors’ classroom
presentation. Liu (2011) stated that teachers’ classroom presentation is equivalent to teacher appraisal and
teacher performance. Furthermore, Liu (2011) found the following:
Since 28th December 1995, the 21st Regulation of the University Act stated that a college should formulate a
teacher evaluation system that decides on teacher promotion, and continues or terminates employment based on
college teachers' achievement in teaching, research and so forth. (p. 4) SRI has been widely accepted by
universities and colleges in Taiwan and has become a practical tool for enhancing teaching performance and
developing an effective trigger to examine factors that relate to educational improvement.
SRI stimulates organizational level effects by providing information from evaluation practice such as diagnosing
organizational problems. SRI raises environmental-level effects such as hiring, retention, and dismissal, which are
highly public acts justified through the evaluation process (Cross, Dooris, & Weinstein, 2004).
1.2 State Hypotheses and Their Correspondence to Research Design
1.2.1 Null Hypotheses
The independent variable in this study was SRI. The dependent variables were northern Taiwanese EFL
university instructors’ perception and the influence of SRI on northern Taiwanese EFL university instructors. The
null hypotheses were designed to test the associations between EFL instructors' perceptions and SRI, SRI and
the classroom instruction, and the impact of SRI and the classroom instruction. A Chi-Square was used to test the
associations of the null hypotheses. A Chi-square probability of .05 or less was used to reject the null hypotheses.
The following hypotheses addressed the research question:
1.2.2 Research Questions
1). What are Taiwanese EFL university instructors’ perceptions toward SRI?
H10: No association exists between EFL university instructors’ perceptions and SRI
(at the .05 level of significance).
2). What impact does SRI have on EFL university instructors’ classroom instructions?
H20: No association exists between the impact of SRI and classroom instruction
(at the .05 level of significance).
2. Method
2.1 Participant
All participating EFL instructors held master's or doctoral degrees from foreign or local Taiwanese universities. The subjects' ages ranged from thirty-five to seventy years old. Each participating experienced instructor had received at least three years of SRI results.
2.2 Sampling Procedures
The researcher used random sampling strategy to gain participants from 10 universities in the northern part of
Taiwan for the quantitative data. The key to random sampling is that each university in the population has an
equal probability of being selected in the sample (Teddlie & Yu, 2007). Using a random sampling strategy helped the researcher prevent biases from being introduced in the sampling process by drawing names or numbers. Thirty-two Taiwanese university EFL instructors were recruited from ten universities in the northern part of Taiwan.
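The random draw described above can be reproduced with a simple sampling step; the sketch below is only an illustration, using placeholder university names since the actual institutions are not identified in the paper.

import random

# Illustrative draw of 10 universities from a candidate pool of 28 (placeholder names).
candidates = [f"University_{i:02d}" for i in range(1, 29)]
random.seed(2017)                      # fixed seed only to make the example repeatable
selected = random.sample(candidates, k=10)
print(selected)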
2.3 Sample
The target participants for the quantitative phase were thirty-two Chinese-speaking English instructors from 10 northern universities. All participating EFL instructors held master's or doctoral degrees from foreign or local Taiwanese universities. Each participating experienced instructor had received at least three years of SRI results.
2.4 Measurement
The quantitative data were collected through a demographic survey and an EFL instructors' perception of SRI questionnaire. The questionnaire covering instructors' perceptions toward SRI and the demographic questionnaire were used to explain the results of the quantitative data.
2.5 Research Design
The researcher randomly selected ten northern universities, which offer the English or applied foreign language
major by drawing from twenty-eight schools.
2.6 Data Analysis
The first step of data analysis was analyzing the quantitative data. The researcher assigned codes to all questionnaires so that the confidentiality of the participants' information was ensured. Then, the information was transferred into the Statistical Package for the Social Sciences (SPSS 21.0). The researcher correctly entered the quantitative data into SPSS and ran a Cronbach's alpha test to create internally consistent, reliable, and valid tests and questionnaires, enhancing the accuracy of the survey. Furthermore, a Chi-Square test, a non-parametric test, was implemented for testing the hypotheses. Cooper and Schindler (2006) stated that non-parametric tests are used to test the significance of ordinal and nominal data. A Chi-Square was used to compare SRI to the dependent variables and to determine whether an association exists between SRI and EFL instructors' perceptions.
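As a rough guide to how such a test is run, the sketch below applies a Chi-Square test of independence to a small, made-up contingency table using scipy and applies the .05 decision rule stated above; the counts are illustrative and are not the study's data.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = perception groups, columns = response categories.
observed = np.array([
    [8, 12, 4],
    [3,  9, 8],
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")

# Decision rule used in the study: reject the null hypothesis when p <= .05.
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")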
3. Results
The results are reported in two main parts: (1) background information on the quantitative survey participants, and (2) a Chi-Square test comparing SRI to the dependent variables.
3.1 Gender and Age
Table 1 showed the distribution of gender and age for participants who taught in the department of English and
Applied Foreign Language in the universities. Among the 32 EFL university instructor participants, 57% (n=19) were female and 43% (n=13) were male. In addition, 3% (n=1) of
participants were between 25-29 years old, 29% of the participants (n=9) were between 30-39 years old, 37% of
the participants (n=12) were between 40-49 years old, 25% of the participants (n=8) were between 50-59 years
old, and 6% of the participants (n=2) were between 60-69 years old.
Table 1. Frequency distribution of gender and age

Gender | Frequency | Percentage
Female | 19 | 57%
Male | 13 | 43%
Total | 32 | 100%

Age | Frequency | Percentage
25-29 | 1 | 3%
30-39 | 9 | 29%
40-49 | 12 | 37%
50-59 | 8 | 25%
60-69 | 2 | 6%
70+ | 0 | 0%
Total | 32 | 100%

Note. n=32.
3.2 Years of Teaching
Table 2 reported the distribution of years of teaching for the participants who taught in the department of English
and applied foreign language in the universities under EFL settings. The years of teaching varied from participant to participant. The distribution was as follows: 1 for 1-3 years of experience, 2 for 4-6
years of experience, 7 for 7-10 years of experience, 5 for 11-15 years of experience, 6 for 16-20 years of
experience, 4 for 21-25 years of experience, 6 for 26-30 years of experience, and 1 for more than 30 years of
teaching experience.
Table 2. Frequency distribution of years of teaching

Years of Teaching | Frequency | Percentage
Less than 1 year | 0 | 0%
1-3 years | 1 | 3.1%
4-6 years | 2 | 6.2%
7-10 years | 7 | 21.9%
11-15 years | 5 | 16%
16-20 years | 6 | 19%
21-25 years | 4 | 12%
26-30 years | 6 | 19%
More than 30 years | 1 | 3.1%
Total | 32 | 100%

Note. n=32.
3.3 EFL Instructors’ Highest Level of Education
Table 3 showed the distribution of EFL instructors' highest level of education. Among the 32 participants, 24 held doctoral degrees and 8 held master's degrees. Furthermore, 26 participants earned their highest level of formal education in a foreign country and 6 earned it in Taiwan.
Table 3. Frequency distribution of the educational background

Highest Degree | Frequency | Percentage
Master's Degree | 8 | 25%
Doctoral Degree | 24 | 75%
Total | 32 | 100%

Degree Origin | Frequency | Percentage
Foreign Degree | 26 | 81.25%
Domestic Degree | 6 | 18.75%
Total | 32 | 100%

Note. n=32.
3.4 Employment Status
Table 4 showed the employment status of the 32 participants. Twelve (38%) held permanent employment with ongoing contracts without fixed end-points before the age of retirement, 10 (31%) held fixed-term contracts for a period of more than one school year, and 10 (31%) held fixed-term contracts for a period of one school year or less. Meanwhile, 12% of the participants (n=4) were part-time instructors and 88% of the participants (n=28) were full-time instructors.
Table 4. Descriptive statistics for participants' employment status

Employment status (1) | Participants/Count | Percentage
Permanent employment | 12 | 37.5%
Fixed-term contract of more than one school year | 10 | 31.25%
Fixed-term contract of one school year or less | 10 | 31.25%
Total | 32 | 100%

Employment status (2) | Participants/Count | Percentage
Part-time employment | 8 | 25%
Full-time employment | 24 | 75%
Total | 32 | 100%

Note. n=32.
3.5 Personal Development
Table 5 showed the personal development status of the 32 participants. Twenty-two percent (n=7) of the participants, who held master's degrees, were currently pursuing doctoral degrees related to their professional field in Taiwan. The remaining 78% (n=25) were holding their original degrees without pursuing further degrees.
Table 5. Descriptive statistics for personal development status

Personal development status | Participants/Count | Percentage
Pursuing a doctoral degree at present | 7 (in Education, TESL, Linguistics, and English fields) | 22%
Holding the original degree | 25 | 78%
Total | 32 | 100%

Note. n=32.
3.6 Internal Reliability
The first section comprised six Likert-scale items (items 1-6). The researcher assessed the internal reliability with a pilot
test of item analysis to obtain the Cronbach’s alpha coefficient. Cronbach’s alpha coefficient was utilized to
determine the reliability of 21 items in discovering Taiwanese EFL university instructors’ perceptions toward
student ratings of instruction. The subscales were (1) EFL instructors' perceptions toward SRI (six items, Cronbach's alpha .71); and (2) the influence of SRI on EFL instructors' classroom instruction (fifteen items,
Cronbach’s alpha .74) (see Table 6). During data collection, participants were verified as part-time and full-time
EFL university instructors. The survey packet was distributed at the office. After each participant had completed
the survey questionnaires, the researcher reviewed the packet for completeness. Fraenkel and Wallen (2003)
defined validity as the degree to which data supports any inferences that a researcher uses based on the evidence
he collects using a specific instrument. Content validity is defined as the level to which an instrument can be duplicated under the same conditions with the same participants (Sproull, 2002).
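Cronbach's alpha can be computed directly from an item-score matrix using the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); the minimal sketch below applies it to made-up Likert responses rather than the study's data.

import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = respondents, columns = items (Likert scores)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)   # variance of each item across respondents
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses from five participants on six Likert items.
demo = [
    [2, 3, 2, 3, 4, 2],
    [1, 2, 2, 3, 3, 2],
    [3, 3, 4, 4, 4, 3],
    [2, 2, 3, 3, 4, 3],
    [1, 2, 2, 2, 3, 2],
]
print(round(cronbach_alpha(demo), 2))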
Table 6. Reliability statistics of pilot SRI

Variables | N of Items | Cronbach's Alpha
EFL instructors' perceptions toward SRI | 6 | .71
The influence of SRI on EFL instructors' classroom instruction | 15 | .74
3.7 Rating of Instructions
A preliminary analysis was executed to determine Taiwanese EFL university instructors’ perceptions toward
student rating of instruction. Based on primary analysis in Table 7, item 1 reported that 25% of the participants
strongly disagreed and 69% of the participants disagreed with the positive attitude toward SRI; 6% of the
participants were neutral. In item 2, 59% of the participants disagreed with holding enthusiastic and confident
perceptions about the results of SRI. Twenty-two percent of the participants were neutral; 16% of the participants
agreed and 3% of the participants strongly agreed with having enthusiasm and confidence toward the result of
SRI. In item 3, 41% of the participants disagreed that they spend more time preparing their classes according to
SRI results. Fifty-three percent of the participants were neutral and 6% of the participants agreed that they spent
more time preparing courses based on SRI results. Additionally, in item 4, 31% of the participants disagreed that
being open to students’ opinions would help receive more positive results of SRI. Forty-four percent of the
participants were neutral; 22% of the participants agreed and 3% of the participants strongly agreed that being
open to students' opinions would help receive a more positive result of SRI. In item 5, 6% of the participants disagreed that they care about the quality of SRI, 41% were neutral, and 53% agreed that they cared about the quality of SRI. In item 6, 6% of the participants strongly disagreed and 47% of the participants disagreed that they were
always satisfied with the results of SRI. Forty-one percent of the participants were neutral and 6% of the
participants agreed that they were always satisfied with the result of SRI.
Table 7. Mean, standard deviation, and percentage of Taiwanese EFL university instructors' perceptions toward SRI

Items 1-6 | Strongly Disagree % | Disagree % | Neutral % | Agree % | Strongly Agree % | M | SD
1. I have positive attitude toward SRI. | 25 | 68.8 | 6.3 | 0 | 0 | 1.81 | .535
2. I am enthusiastic and confident about the result of SRI. | 0 | 59.4 | 21.9 | 15.6 | 3.1 | 2.68 | .871
3. I spend more time preparing my class according to SRI results. | 0 | 40.6 | 53.1 | 6.3 | 0 | 2.66 | .602
4. I think if I am more open to students' opinions, the result will be more positive. | 0 | 31.3 | 43.8 | 21.9 | 3.1 | 2.97 | .822
5. I care about the quality of SRI. | 0 | 6.3 | 40.6 | 53.1 | 0 | 3.47 | .621
6. I am always satisfied with the result of SRI. | 6.3 | 46.9 | 40.6 | 6.3 | 0 | 2.47 | .718

Note. M=Mean; SD=Standard Deviation.
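Summaries of the kind reported in Table 7 (response percentages, mean, and standard deviation per item) can be produced from the raw responses; the sketch below shows one way to do so with pandas on made-up data, not the study's responses.

import pandas as pd

# Hypothetical 1-5 Likert responses (rows = participants, columns = items).
responses = pd.DataFrame({
    "item1": [1, 2, 2, 2, 3, 2],
    "item2": [2, 2, 3, 4, 2, 3],
})

labels = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral", 4: "Agree", 5: "Strongly Agree"}
for item in responses.columns:
    pct = (responses[item].map(labels)
           .value_counts(normalize=True)
           .reindex(list(labels.values()), fill_value=0) * 100)
    print(item, pct.round(1).to_dict(),
          "M =", round(responses[item].mean(), 2),
          "SD =", round(responses[item].std(ddof=1), 2))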
3.8 Descriptive Analyses of the Influence of SRI on Taiwanese University EFL Instructors
According to the analysis in Table 8, for item 7, 6% of the instructors strongly disagreed and 43.8% disagreed that SRI was an effective instrument for improving English instructional delivery; 41% were neutral, and 9% agreed that SRI was an effective instrument for improving English instructional delivery. In item 8, 16% of the participants strongly disagreed and the majority (56%) disagreed that SRI provides authentic information for developing effective English lessons; 28% were neutral.
Furthermore, in item 9, 56% of the instructors strongly disagreed and 34% disagreed that they became more supportive in assisting student learning after receiving the result of EFL SRI; 9% were neutral. In item 10, 13% of the instructors strongly disagreed and 41% disagreed that the result of SRI provided positive encouragement for their classes; 31% were neutral, and 6% agreed that the results of SRI provided positive encouragement for their classes. Moreover, item 11 was worded in reverse: 9% of the participants strongly disagreed and 50% disagreed that criticism from the SRI did not influence their English teaching performance; 25% were neutral, and 9% agreed that criticism from the SRI did not influence their English teaching performance.
In item 12, 6% of the participants strongly disagreed and 59% disagreed that EFL SRI was an efficient communicative bridge between their students and them; 25% were neutral. Only 9% of the participants agreed that EFL SRI is an efficient communicative bridge between their students and them. In item 13, 6% of the participants disagreed that students' feedback gave them ideas for teaching students with special needs; 56% were neutral and 37% agreed that students' feedback gave them ideas for teaching students with special needs.
In item 14, 34% of the participants disagreed that students' feedback improves their English classroom management; 37.5% were neutral, and 28% agreed that students' feedback improved their English classroom management. Moreover, item 15 was worded in reverse: 3% of the participants strongly disagreed and 34% disagreed that they would not change their knowledge and understanding of English instructional practices after receiving the results of EFL SRI; 46% were neutral and 16% agreed that they would not change their knowledge and understanding of English instructional practices after receiving the result of EFL SRI.
In item 16, 13% of the participants strongly disagreed and the majority (63%) disagreed that students provided trustworthy information when evaluating the effectiveness of English classroom instruction; 22% were neutral. Only 3% of the participants agreed that students provided trustworthy information when evaluating the effectiveness of English classroom instruction. In item 17, 41% of the participants disagreed that students' academic achievements influenced the result of SRI; 31% were neutral, while 25% agreed and 3% strongly agreed that students' academic achievements influenced the result of SRI.
In item 18, 28% of the instructors disagreed that if they improved the quality of their English teaching, they received higher ratings from students; 56% were neutral and 16% agreed that if they improved the quality of their English teaching, they received higher ratings from students. In item 19, 13% of the instructors disagreed that if they received unpleasant rating scores in the past, they changed their English teaching strategies; 56% were neutral, while 25% agreed and 6% strongly agreed that they had received unpleasant rating scores in the past and so changed their English teaching strategies.
In item 20, 9% of the participants disagreed that after they changed their English teaching strategies, they received better scores on the EFL SRI; 81% were neutral and 9% agreed that after changing their English teaching strategies, they received better scores on the EFL SRI. In addition, item 21 was worded in reverse: 6% of the participants strongly disagreed and 25% disagreed that unpleasant scores on the EFL SRI would not decrease their passion toward teaching; 13% were neutral, while 25% agreed and 6% strongly agreed that unpleasant scores on the EFL SRI would not decrease their passion toward teaching.
Table 8. Mean, standard deviation and percentage of the influence of SRI on Taiwanese university EFL instructors' classroom instruction

Items 7-21 | Strongly Disagree % | Disagree % | Neutral % | Agree % | Strongly Agree % | M | SD
7. EFL SRI is an effective instrument for improving English instructional delivery. | 6.3 | 43.8 | 40.6 | 9.4 | 0 | 2.53 | .761
8. Overall, EFL SRI provides me authentic information in developing effective English lessons. | 15.6 | 56.3 | 28.1 | 0 | 0 | 2.13 | .660
9. I become more supportive in assisting student learning after receiving the result of EFL SRI. | 56.3 | 34.4 | 9.4 | 0 | 0 | 2.53 | .671
10. The result of EFL SRI provides positive encouragement for my class. | 12.5 | 40.6 | 31.3 | 15.6 | 0 | 2.50 | .916
11. Criticism from the SRI does not influence my English teaching performance. | 9.4 | 50.4 | 25.0 | 9.4 | 0 | 2.44 | .840
12. EFL SRI is an efficient communicative bridge between my students and me. | 6.3 | 59.4 | 25.0 | 9.4 | 0 | 2.38 | .751
13. Students' feedback gives me ideas for teaching students with special needs. | 0 | 6.3 | 56.3 | 37.5 | 0 | 3.31 | .592
14. Students' feedback improves my English classroom management. | 0 | 34.4 | 37.5 | 28.1 | 0 | 2.94 | .801
15. I will not change the knowledge and understanding of English instructional practices after receiving the result of EFL SRI. | 3.1 | 34.4 | 45.9 | 15.6 | 0 | 2.75 | .762
16. Students provide trustworthy information when evaluating the effectiveness of English classroom instruction. | 50 | 40.6 | 9.4 | 0 | 0 | 1.59 | .665
17. Students' academic achievements influence the result of SRI. | 0 | 28.1 | 56.3 | 15.6 | 3.1 | 2.91 | .893
18. If I improve the quality of my English teaching, I will receive higher ratings from students. | 0 | 28.1 | 46.3 | 21.9 | 3.1 | 2.88 | .660
19. I received an unpleasant rating score in the past, so I changed my English teaching strategies. | 0 | 12.5 | 56.3 | 25 | 6.3 | 3.00 | .803
20. After I changed my English teaching strategies, I received better scores of EFL SRI. | 0 | 63.4 | 50.3 | 3.4 | 9.4 | 3.47 | .761
21. Unpleasant scores of EFL SRI will not decrease my passion toward teaching. | 18.8 | 56.3 | 18.8 | 3.1 | 3.1 | 2.16 | .884

Note. M=Mean; SD=Standard Deviation.
3.9 The Frequency of Distribution of Years of Teaching Experiences in Four Groups
Table 9 presented the frequency distribution of years of teaching experience in four groups. The researcher divided the participants into four groups based on their years of teaching experience. Group 1 comprised participants who had been teaching English for 1-6 years (n=3), Group 2 for 7-15 years (n=12), Group 3 for 16-25 years (n=10), and Group 4 for more than 26 years (n=7).
Table 9. Frequency of distribution of years of teaching experiences in four groups

Groups 1-4 | Frequency | Percentage % | Valid Percentage % | Cumulative Percent %
1 (1-6 years) | 3 | 9.4 | 9.4 | 9.4
2 (7-15 years) | 12 | 37.5 | 37.5 | 46.9
3 (16-25 years) | 10 | 31.3 | 31.3 | 78.1
4 (26 years and more) | 7 | 21.9 | 21.9 | 100.0
Total | 32 | 100.0 | 100.0 |

Note. n=32.
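The grouping shown in Table 9 amounts to binning years of teaching into four intervals; the sketch below illustrates that step with pandas on hypothetical values, not the study's data.

import pandas as pd

# Hypothetical years-of-teaching values; the bins mirror the four groups described above.
years = pd.Series([2, 8, 12, 18, 24, 27, 31, 9, 16, 5])
groups = pd.cut(years, bins=[0, 6, 15, 25, float("inf")],
                labels=["1 (1-6 years)", "2 (7-15 years)", "3 (16-25 years)", "4 (26+ years)"])
print(groups.value_counts().sort_index())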
3.10 The Means of the Influences of SRI on Taiwanese EFL University Instructors Based on Their Years of
Teaching Experience
Four open-ended interview questions (Q1, Q3, Q7, and Q8) reflected the first part of the six quantitative survey items, which were designed to investigate EFL instructors' perceptions toward SRI. The survey items were: (1) in general, I have a positive attitude toward SRI; (2) I am enthusiastic and confident about
the result of SRI; (3) I spend more time preparing my class according to SRI result; (4) I think if I am more open
to students’ opinions, the results will be more positive; (5) I care about the quality of SRI; (6) I am always
satisfied with the result of SRI. Based on the analysis of participants’ interview transcripts, two themes, four
subthemes and four issues emerged in order to answer the first research question. The findings to the first
research question are structured in Table 10.
Table 10. Structure of the qualitative findings: Research Question 1

Themes:
Theme 1: The university EFL instructors' perceptions of SRI
Theme 2: The role of SRI

Subthemes:
- Experiences of receiving the results of SRI
- Opinions after receiving the result of EFL SRI
- Suggestions after receiving the result of EFL SRI
- The real situation of SRI in universities in Taiwan

Issues:
- Negative
- Objective
- Implementation of SRI in EFL classroom
- The purpose of SRI
3.11 Quantitative Findings: Null Hypotheses 1
H10: No association exists between EFL university instructors’ perceptions and SRI.
Table 11 reported that the researcher failed to reject the first null hypothesis which stated that there was not an
association between EFL university instructors’ perceptions and student rating of instructions based on a
significance level of .149 in item 4 (EFL instructors become more open to SRI receive better ratings). The
significance level of .804 in item 5 (EFL instructors care about the quality of SRI) accepted the first null
hypothesis. Besides, the first null hypothesis, which stated that there was not an association between EFL
university instructors’ perceptions and student rating of instructions was rejected based on a significance level
of .000 in item 1 (EFL instructors have positive attitude toward SRI). The significance level of .000 in item 2
(EFL instructors are confident in the results of SRI) rejected the first null hypothesis. Also, the first null
hypothesis was rejected based on the significance level of .003 in item 3 (EFL instructors prepare lessons based
on the results of SRI). The significance level of .000 in item 6 (EFL instructors are satisfied with the results of SRI)
rejected the null hypothesis. As hypothesized, Cillessen and Lafontana (2002) stated that teachers’ perceptions
affect their behavior and classroom practices. The more teachers learn about their students, the more they are
able to design effective experiences that elicit real learning. Borg (2006) noted that understanding teacher
perception is central to the process of understanding teaching. Research also indicated that teachers who are
willing to develop their teaching skills were open-minded in listening to feedback from their students (Chang,
Wang, & Yong, 2003).
Table 11. The summary of chi-square testing for Null Hypothesis 1
Items 1-6 | Sig. | Null Hypothesis 1: Accept/Reject
1. EFL instructors have a positive attitude toward SRI. | .000 | Reject
2. EFL instructors are confident in the results of SRI. | .000 | Reject
3. EFL instructors prepare lessons based on the results of SRI. | .003 | Reject
4. EFL instructors who are more open to SRI receive better ratings. | .149 | Accept
5. EFL instructors care about the quality of SRI. | .804 | Accept
6. EFL instructors are satisfied with the results of SRI. | .000 | Reject
Note. A P-value of .05 or less was used to reject the null hypotheses.
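For readers who want to reproduce this kind of decision rule, the sketch below shows a chi-square test of association with the same p ≤ .05 threshold. It is a minimal, hypothetical example: the paper does not publish its raw cross-tabulations, so the counts and category labels here are invented purely to illustrate the mechanics, using SciPy's chi2_contingency rather than whatever statistical package the authors used.

# Hypothetical illustration of the chi-square test of association and the
# p <= .05 decision rule described above. The counts below are invented;
# the paper does not report its raw cross-tabulations.
from scipy.stats import chi2_contingency

# Rows: response to a perception item (agree / disagree)
# Columns: hypothetical grouping of instructors (e.g., by rating band)
observed = [
    [12, 18, 25],  # instructors who agree with the item
    [20, 15, 10],  # instructors who disagree with the item
]

chi2, p_value, dof, expected = chi2_contingency(observed)

ALPHA = 0.05  # the threshold stated in the table note
decision = "Reject" if p_value <= ALPHA else "Fail to reject (Accept)"
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.3f} -> {decision} the null hypothesis")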
3.12 Quantitative Findings: Null Hypothesis 2
H20: No association exists between the impact of SRI and classroom instruction.
Table 12 reported the summary of the chi-square test of the second null hypothesis, which stated that there was no
association between the impact of SRI and classroom instruction. The researcher failed to reject the second null
hypothesis which stated that there was not an association between SRI and classroom instruction based on a
significance level of .080 in item 10 (The results of SRI provide positive encouragement for EFL instructors) and
a significance level of .102 in item 14 (SRI improves EFL instructors’ classroom management). The second null
hypothesis, which stated that there was not an association between SRI and classroom instruction was rejected
based on a significance level of .002 in item 7 (EFL instructors have positive attitude toward SRI), a significance
level of .016 in item 8, a significance level of .005 in item 9, a significance level of .004 in item 11, a
significance level of .000 in item 12, a significance level of .002 in item 13 (SRI gives EFL instructors ideas for
teaching students with special needs), a significance level of .002 in item 15 (EFL instructors will not change the
knowledge and understanding of instructional practices after receiving the results of SRI), a significance level
of .001 in item 16 (The results of SRI provide trustworthy information for EFL instructors), a significance level
of .021 in item 17 (Students’ achievements influence the results of SRI), a significance level of .016 in item 18
(If I improve the quality of the English instruction, I will receive higher ratings from students), a significance
level of .006 in item 19 (I received an unpleasant rating score in the past, so I changed my English teaching
strategies), a significance level of .001 in item 20 (After I changed English teaching strategies, I received better
results of SRI), and a significance level of .000 in item 21 (Unpleasant scores of SRI will not decrease my
passion toward English teaching).
The current findings concurred with the hypothesis that an association existed between the influence of SRI and
classroom instruction. Teacher evaluation provided information to faculty about teaching effectiveness (Biggs,
2003; Ramsden, 2003; Yorke, 2003) and to students about how they can improve their learning and how well
they are doing in the course (Carless et al., 2007; Gibbs, 2006). Liu (2011) stated that teachers’ classroom
presentation is equivalent to teacher appraisal and teacher performance. Furthermore, “since 28th December
1995, the 21st Regulation of the University Act states that a college should formulate a teacher evaluation system
that decides on teacher promotion, and continues or terminates employment based on college teachers’
achievement in teaching, research and so forth” (Liu, 2011, p. 4). “Universities started to formulate school
regulations based on the University Act and began executing teacher education. According to the official
documentation, 60% of the colleges stipulate that teachers have to pass the evaluation before receiving a
promotion” (Liu, 2011, p. 4). EFL instructors’ perceptions of and experiences with SRI will affect their approach
to teaching. Similarly, EFL students’ attitudes toward and experiences with assessment will also influence their way
of learning.
Table 12. The results of chi-square testing for Null Hypothesis 2
Items 7-21 | Sig. | Null Hypothesis 2: Accept/Reject
7. SRI is an effective instrument for EFL instructors to improve instructional delivery. | .002 | Reject
8. The results of SRI provide EFL instructors authentic information in developing lessons. | .016 | Reject
9. EFL instructors become more supportive in students’ learning after receiving the results of SRI. | .005 | Reject
10. The results of SRI provide positive encouragement for EFL instructors. | .080 | Accept
11. Criticism from SRI does not influence EFL instructors’ teaching performance. | .004 | Reject
12. SRI is an effective communicative bridge between EFL instructors and students. | .000 | Reject
13. SRI gives instructors ideas for teaching students with special needs. | .002 | Reject
14. SRI improves EFL instructors’ classroom management. | .102 | Accept
15. EFL instructors will not change the knowledge and understanding of instructional practices after receiving the results of SRI. | .002 | Reject
16. SRI provides trustworthy information for EFL instructors. | .001 | Reject
17. Students’ achievements influence the results of SRI. | .021 | Reject
18. If I improve the quality of the English instruction, I will receive higher ratings from students. | .016 | Reject
19. I received an unpleasant rating score in the past, so I changed my English teaching strategies. | .006 | Reject
20. After I changed English teaching strategies, I received better results of SRI. | .001 | Reject
21. Unpleasant scores of SRI will not decrease my passion toward English teaching. | .000 | Reject
Note. A P-value of .05 or less was used to reject the null hypotheses.
4. Discussion
The results revealed that EFL instructors’ teaching attitudes and motivation were diminished because teachers
overwhelmingly expressed that SRI did not provide them with useful feedback on their performance in the
classroom. EFL instructors were not willing to take risks in assigning work, administering tests, or addressing
students’ needs for learning support. EFL instructors rarely used the results of SRI to make important decisions
about improving the quality of instruction. In fact, SRI was considered an indicator of instructors’ performance
when it came time to dismiss them. The findings highlighted the northern Taiwanese EFL instructors’ perceptions
toward SRI and the influence of SRI on EFL instructors’ classroom instruction. Faculty members were more
likely to question the effectiveness of SRI and pointed out its growing problems. Broadly negative feedback,
accompanied by only a small amount of objective feedback, may indicate the different value perceptions and
influences held by northern Taiwanese EFL university instructors. The quantitative data showed that 87% of the
items in the second part of the questionnaire (the influence of SRI on EFL university instructors) had an
association between SRI and classroom instruction (13 of the 15 items in Table 12 were significant at p ≤ .05).
It was interesting to note that EFL instructors seemed to distrust the results of SRI. A possible explanation for the
negative perceptions is that EFL instructors were sensitive to the fact that the results of SRI were considered for
tenure, promotion, and employment status, which reflects Cross, Dooris, and Weinstein’s (2004) theory. SRI raises
environmental-level effects such as hiring, retention, and dismissal, which are highly public acts justified through
the evaluation process. Students’ perceptions of SRI may differ from those of faculty members because students
may not realize how administrators use the results of teacher evaluation. As a result, students may not be aware
of the consequences their ratings have for teaching. Administrators and educators need to understand the factors
that influence EFL instructors’ classroom instruction so that they can develop a reasonable environment for merit
raise, promotion, and tenure decisions.
References
American Psychological Association. (1972). Ethical standards of psychologists. Washington, DC: American
Psychological Association.
Anderson, C. A., Gentile, D. A., & Buckley, K. E. (2007). Violent video game effects on children and adolescents:
Theory, research and public policy.
Beran, T., Violato, C., Kline, D., & Frideres, J. (2005). The utility of student ratings of instruction for students,
faculty, and administrators: a consequential validity study. The Canadian Journal of Higher Education, 2,
49-70.
Biggs, J. (2003). Teaching for quality learning at university (2nd ed.). Buckingham: Society for Research into
Higher Education/Open University Press.
Borg, S. (2006). Teacher cognition and language education: Research and practice. London: Continuum.
Carless, D., Joughin, G., & Mok. M. M. C. (2007). Learning-oriented assessment: Principles and practice.
Assessment & Evaluation in Higher Education, 31, 395-398.
Chang, J. L, Wang, W. Z., & Yong, H. (2003). Measurement of Fracture Toughness of Plasma-Sprayed Al2O3
Coatings Using a Tapered Double Cantilever Beam Method. Journal of the American Ceramic Society,
86(8), 1437-1439. https://doi.org/10.1111/j.1151-2916.2003.tb03491.x
Chang, T-S. (2002). Student ratings of instruction. Taipei, Taiwan: Yung Zhi.
Cillessen, A. H. N., & Lafontana, K. M. (2002). Children’s perceptions of popular and unpopular peers: A
multimethod assessment. Developmental Psychology, 38(5), 635-647. https://doi.org/10.1037/0012-1649.38.5.635
Cooper, D., & Schindler, P. S. (2006). Business research methods (9th ed.). New York: McGraw-Hill Companies,
Inc.
Fraenkel, J. R., & Wallen, N. E. (2003). How to design and evaluate research in education (5th ed.). Boston:
McGraw-Hill.
Gibbs, G. (2006). How assessment frames student learning. In C. Bryan, & K. Clegg (Eds.), Innovative
Assessment in Higher Education (pp. 23-36). London: Routledge.
Greenwald, A. G. (2002). Constructs in student ratings of instructors. In H. I. Braun, D. N. Jackson, & D. E.
Wiley (Eds.), The role of constructs in psychological and educational measurement, 24(3), 193-202, New
York: Erlbaum.
Guthrie, E. R. (1954). The evaluation of teaching: A progress report. Seattle: University of Washington.
Liu, C-W. (2011). The implementation of teacher evaluation for professional development in primary education
in Taiwan. (Doctoral dissertation). Retrieved from Dissertation.com, Boca Raton, Florida.
Ministry of Education (Taiwan) (MOE). (2005). Ministry of Education News: college law. Retrieved October 31,
2016, from http://tece.heeact.edu.tw/main.php
Murray, H. G. (2005). Student evaluation of teaching: has it made a difference? In the Annual meeting of the
society for teaching and learning in higher education, June 2005 (pp.1-15). Charlottetown, Prince Edward
Island, Canada.
Obenchain, K. M., Abernathy, T. V., & Wiest, L. R. (2001). The reliability of students’ ratings of faculty teaching
effectiveness. College Teaching, 49(3), 100-104. https://doi.org/10.1080/87567550109595859
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge.
Seldin, P. (1993a). How colleges evaluate professors: 1983 versus 1993. AAHE Bulletin, 12, 6-8.
Seldin, P. (1993b). The use and abuse of student ratings of professors. Bolton, MA: Anker.
Sproull, J. (2002). Personal communication with authors, University of Edinburgh.
Teddlie, C., & Yu, F. (2007). Mixed methods sampling: a typology with examples. Journal of Mixed Methods
Research, 1(1), 77-100. https://doi.org/10.1177/2345678906292430
West-Burnham, J., O’Neill, J., & Bradbury, I. (Eds.) (2001). Performance management in schools: How to lead
and manage staff for school improvement. London, UK: Pearson Education.
Wolfer, T., & Johnson, M. (2003). Re-evaluating student evaluation of teaching: The Teaching Evaluation Form.
Journal of Social Work Education, 39, 111-121.
Yorke, M. (2003). Formative assessment in higher education: Moves towards theory and enhancement of
pedagogic practice. Higher Education, 45, 477-501. https://doi.org/10.1023/A:1023967026413
Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution
license (http://creativecommons.org/licenses/by/4.0/).
Advances in Engineering Education
FALL 2020 VOLUME 8 NUMBER 4
Supportive Classroom Assessment for Remote Instruction
RENEE M. CLARK
MARY BESTERFIELD-SACRE
AND
APRIL DUKES
University of Pittsburgh
Pittsburgh, PA
ABSTRACT
During summer 2020, when remote instruction became the norm for universities due to
COVID-19, expectations were set at our school of engineering for interactivity and activity within
synchronous sessions and for using technology for engaging asynchronous learning opportunities.
Instructors were asked to participate in voluntary assessment of their instructional techniques, and
this “supportive” assessment was intended to enable growth in remote teaching as well as demonstrate excellence in the School’s instruction. Preliminary results demonstrated what is possible
with voluntary assessment with a “support” focus – namely instructor willingness to participate and
encouragement in the use of desirable teaching practices.
Key words: Assessment, COVID-19, remote learning
INTRODUCTION AND BACKGROUND
For many faculty, the last five weeks of the spring 2020 semester represented a time of “persisting
through” to the end of the semester after a largely unforeseen, rapid change from ordinary campus life and learning to remote education. At the University of Pittsburgh’s Swanson School of
Engineering, there were different expectations, however, for the summer 2020 semester, as the
Associate Dean for Academic Affairs established a “new norm” for remote instruction by setting
expectations regarding interactivity and activity in synchronous classroom sessions as well as the
use of technology for creating engaging, high-quality asynchronous learning resources. These expectations were supported by multiple synchronous training sessions for faculty prior to the start
of the summer semester. In addition, instructors were asked to participate in voluntary assessment
of their ­summer instruction via interviews with and classroom observation by the School’s Assessment Director. This voluntary activity had a two-fold purpose, namely 1) to perform “supportive,”
as opposed to summative, assessment, to enable growth and development in remote online teaching, and 2) to demonstrate to others excellence in the School’s instruction. The authors believe this
voluntary program was particularly noteworthy because it was considered an assessment program;
however, a very supportive aspect was also involved, namely upfront planning assistance (via an
instructional checklist developed via faculty discussions), in-class coaching and observation, and
follow-up formative verbal and written feedback. Thus, this voluntary “assessment” program had
concomitant supportive aspects.
This supportive assessment program consisted of both 1) one-on-one instructional planning and
coaching intended to encourage participation, and 2) formative assessment and feedback. This
program was rooted in previous work by the Assessment Director (AD), in which she had used an
individualized, social-based approach involving instructional coaching to propagate active learning
within the engineering school [1]. Her previous work was based on the writings of Charles Henderson,
Dancy, and colleagues, which advanced the idea that educational change may best occur through
socially-driven and personalized practices, such as informal communication, interpersonal networks,
collegial conversations, faculty communities, and support provided during change and implementation [2–4]. The AD’s previous work was also grounded in the professional development literature
indicating that adult professional learning must be personalized, including support with upfront
planning, during classroom implementation, and via evaluation [5–7]. Classroom observation is one
such form of support during classroom implementation [6–11].
METHODS
In the two weeks prior to the start of the summer semester, synchronous training and information sessions via Zoom video conferencing were held for instructors to promote desired teaching
techniques and approaches in the remote online environment. The training and information sessions, which were one hour in length and conducted during the lunch hour, covered the following
topics: 1) Online Classroom Organization and Communication, 2) Using Zoom for Active Learning,
3) Active Learning with Classroom Assessment Techniques (CATs), 4) Inclusive Online Teaching,
and 5) Voluntary Supportive Assessment.
During the information session on voluntary assessment, the Assessment Director described the
plan shown in Table 1, which was based on the framework discussed in Introduction & Background.
Table 1. Voluntary Assessment Program.
1. Individual interview with instructor (e.g., Zoom, phone, email)
a. Review Planning and Observational Checklist
b. Discuss plans for classroom observation (if applicable and desired)
c. Discuss plans for other support or review (e.g., review of course materials) if desired
2. Observe class session if applicable
a. Provide written feedback to instructor
3. Provide other review or support as desired
a. Provide written feedback to instructor
4. Provide acknowledgment of instructor participation to Associate Dean
5. Future discussion, interview, or email communications with instructor (as follow-up)
6. Create concise written summary (e.g., table/template) whereby excellence in teaching can be demonstrated
Thus, the assessment program was socially-based and involved one-on-one discussions with each
instructor about his/her instructional plans, classroom observation using the COPUS observational
protocol [12], determination of additional types of review or support desired, provision of written
feedback to the instructor, and future follow-up communications with the instructor. The initial
interview/discussion with the instructor was guided by a customized checklist created by a faculty
team to assist the instructor with his/her planning as well as enable the Assessment Director to
document actual practices observed or otherwise determined. The various sections of the checklist
are as follows: 1) Synchronous instruction and methods for interactivity, activity, and “changing up”
of lecture, 2) Asynchronous instruction, including flipped instruction, and methods such as videos,
readings, accountability quizzes, and in-class exercises, 3) Learning Management System (LMS) use
and organization, 4) Communication methods with students, 5) Assessment of learning approaches,
submission methods, and student feedback plans, and 6) Academic integrity promotion.
Given that the program was voluntary, each instructor’s participation was acknowledged to the
Associate Dean in a weekly bulk email. This email described desirable practices witnessed during
assessment activity with the instructor that week (e.g., via classroom observation). Each instructor discussed in the email was cc’d to drive community among the participants, with the hope of
potentially creating small learning communities.
PRELIMINARY RESULTS
Of the 31 summer instructors, 16 (52%) volunteered to participate in the assessment following the
information session. We believe this participation metric was noteworthy given the program was
one of voluntary-based assessment. This “supportive” assessment proactively began immediately at
the start of the summer semester. At approximately five weeks into the summer semester, an initial
interview, classroom observation, and/or “other review” had occurred with 15 instructors and so
the assessment was formative and supportive, versus summative. A plan was made to observe the
remaining instructor later in the summer given the schedule of the course. The following examples of
desirable instructional practices, which were communicated to the Associate Dean, were observed
by the Assessment Director:
• Not only did Instructor 1 create a classroom in which the expectation was activity and
­engagement, but his flipped classroom was notable for the positive environment in which he
thanked students for their responses, randomly asked students if they would mind answering
questions, and always provided positive feedback on the responses. The classroom execution
was flawless, including circulation among 11 breakout rooms for group work.
• Instructor 2 made use of the Top Hat software and simple classroom assessment techniques
(CATs), such as the Minute Paper, to drive interactivity and engagement. He also desired to
use Zoom for this purpose (i.e., Polling or Chat window).
• Instructor 3 created an asynchronous class design using Panopto videos with embedded
accountability quizzes and reflective questions, all exceptionally laid out for students in
Canvas. She held a live Zoom Q&A session to highlight the week’s material, pose questions, and answer questions. The students responded to questions and asked their own
questions.
• Instructor 4 ran a blended classroom, in which he conducted both synchronous Zoom lecture
sessions and provided content videos via Panopto. Students took a quiz in Canvas to drive
accountability with the videos during class. There was interactive lecture, in which students
were highly responsive by asking and answering questions via chat and verbally.
These sample results demonstrate what is possible with a voluntary assessment program with
a “support” focus given strong leadership that provides learning and training opportunities for
instructors – namely instructor willingness to participate as well as support for desirable teaching
practices. An anonymous survey distributed to the instructors near the end of the semester indicated an average rating of 3.88 on a 5-point scale regarding the helpfulness and usefulness of the
classroom observation and other formative feedback offered (57% response rate). In the words of
one participant, “I got a professional review of my strategy for remote teaching, and a check on my
early implementation. Assessment provided me with a positive reinforcement that gave me assurance
and encouraged me to move forward. I was offered a broad range of helpful support that reassured
me that I could rely on opportune help when needed. I do appreciate it very much!” In the words of
another, “…Also, just the act of being evaluated makes me reflect more on my teaching methods.”
NEXT STEPS AND FUTURE PLANS
Given the relatively larger number of courses in the fall semester, this assessment program will
be continued on an “as requested” basis for instructors. It is worth noting that there was a time
commitment by the Assessment Director and that (in general), individualized coaching is time-wise
expensive [13]. However, evidence suggests that the effectiveness of professional development
for instructors, including coaching, is positively associated with the intensity of the support [14].
Thus, seeing what was possible with this supportive voluntary assessment program in the summer
suggests that committing the right resources (i.e., both in number and supportiveness) may be an
avenue to propelling remote instruction to higher levels.
REFERENCES
1. Clark, R., Dickerson, S., Bedewy, M., Chen, K., Dallal, A., Gomez, A., Hu, J., Kerestes, R., & Luangkesorn, L. (2020). Social-Driven Propagation of Active Learning and Associated Scholarship Activity in Engineering: A Case Study. International
Journal of Engineering Education, 36(5), 1–14.
2. Dancy, M., Henderson, C., & Turpen, C. (2016). How faculty learn about and implement research-based instructional
strategies: The case of peer instruction. Physical Review Physics Education Research, 12(1), 010110.
3. Dancy, M., & Henderson, C. (2010). Pedagogical practices and instructional change of physics faculty. American
Journal of Physics, 78(10), 1056–1063.
4. Foote, K., Neumeyer, X., Henderson, C., Dancy, M., & Beichner, R. (2014). Diffusion of research-based instructional
strategies: the case of SCALE-UP. International Journal of STEM Education, 1(1), 1–18.
5. Rodman, A. (2019). Personalized Professional Learning: A Job-Embedded Pathway for Elevating Teacher Voice.
Alexandria, VA: ASCD, pp. 1–9.
6. Desimone, L. M., & Pak, K. (2017). Instructional coaching as high-quality professional development. Theory Into
Practice, 56(1), 3–12.
7. Rhodes, C., Stokes, M., & Hampton, G. (2004). A practical guide to mentoring, coaching and peer-networking: Teacher
professional development in schools and colleges. London: Routledge, pp. 25, 29–30.
8. Braskamp, L., & Ory, J. (1994). Assessing Faculty Work. San Francisco: Jossey-Bass Inc., 202.
9. Keig, L., & Waggoner, M. (1994). Collaborative peer review: The role of faculty in improving college teaching.
ASHE-ERIC Higher Education Report No. 2. Washington, DC: The George Washington University, School of Education
and Human Development, 41–42.
10. Reddy, L. A., Dudek, C. M., & Lekwa, A. (2017). Classroom strategies coaching model: Integration of formative
assessment and instructional coaching. Theory Into Practice, 56(1), 46–55.
11. Gallucci, C., Van Lare, M., Yoon, I., & Boatright, B. (2010). Instructional coaching: Building theory about the role
and organizational support for professional learning. American Educational Research Journal, 47(4), 919–963.
12. Smith, M., Jones, F., Gilbert, S., & Wieman, C. (2013). The classroom observation protocol for undergraduate
STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE-Life Sci. Educ., 12(4), 618–627.
13. Connor, C. (2017). Commentary on the special issue on instructional coaching models: Common elements of
effective coaching models. Theory Into Practice, 56(1), 78–83.
14. Devine, M., Houssemand, C., & Meyers, R. (2013). Instructional coaching for teachers: A strategy to implement new
practices in the classrooms. Procedia-Social and Behavioral Sciences, 93, 1126–1130.
AUTHORS
Renee M. Clark is Research Assistant Professor of Industrial Engineering and
Director of Assessment for the Swanson School of Engineering at the University
of Pittsburgh. Dr. Clark’s research focuses on assessment of active learning and
engineering professional development initiatives. Her research has been funded
by the NSF and the University of Pittsburgh’s Office of the Provost.
Mary Besterfield-Sacre is Nickolas A. DeCecco Professor, Associate Dean for
Academic Affairs, and Director of the Engineering Education Research Center in
the Swanson School of Engineering at the University of Pittsburgh. Dr. Sacre’s
principal research is in engineering education assessment, which has been
funded by the NSF, Department of Education, Sloan Foundation, Engineering
Information Foundation, and VentureWell.
April Dukes is the Faculty and Future Faculty Program Director for the
Engineering Education Research Center in the Swanson School of Engineering
at the University of Pittsburgh. Dr. Dukes facilitates professional development
on instructional best practices for current and future STEM faculty for both
synchronous online and in-person environments.
http://wje.sciedupress.com
World Journal of Education
Vol. 11, No. 3; 2021
Timeless Principles for Effective Teaching and Learning: A Modern
Application of Historical Principles and Guidelines
R. Mark Kelley1,*, Kim Humerickhouse2, Deborah J. Gibson3 & Lori A. Gray1
1 School of Interdisciplinary Health Programs, Western Michigan University, Kalamazoo, MI, USA
2 Department of Teacher Education, MidAmerica Nazarene University, Olathe, KS, USA
3 Department of Health and Human Performance, University of Tennessee at Martin, Martin, TN, USA
*Correspondence: School of Interdisciplinary Health Programs, Western Michigan University, 1903 W. Michigan Ave., Kalamazoo, MI, 49008, USA. Tel: 1-269-387-1097. E-mail: mark.kelley@wmich.edu
Received: February 13, 2021    Accepted: May 23, 2021    Online Published: June 2, 2021
doi:10.5430/wje.v11n3p1    URL: https://doi.org/10.5430/wje.v11n3p1
Abstract
The purpose of this study is twofold: (a) to assess the perceived relevance of the Seven Timeless Principles and
guidelines posited by Gregory (1886) for current educators and educators-in-training and (b) to develop and pilot test
the instrument needed to accomplish the former. The “Rules for Teachers” Gregory attributes to each of these laws
were used as guidelines to develop an assessment instrument. Eighty-four educators and future educators across three
universities participated in an online survey using a 4-point Likert scale to evaluate the consistency of Gregory’s
guidelines with modern best-teaching practices. Responses were framed within the Timeless Principles, providing a
measure of pedagogical universality. Total mean scores for all principles and guidelines were greater than 3.0,
suggesting that Gregory had indeed identified foundational principles of teaching and learning that maintain
relevance across academic disciplines and in a variety of settings in which learning occurs.
Keywords: teaching and learning, principles of teaching, historical pedagogy, educational principles
1. Introduction
In 1886, John Milton Gregory published a book entitled The Seven Laws of Teaching that offered a set of principles
to support and strengthen teachers’ capabilities systemically and comprehensively. The primary purpose of this study
was to explore whether Gregory’s principles are consistent with faculty and student perceptions of 21st century best
teaching practices. To accomplish the primary purpose, a secondary goal of the study was to pilot and provide
evidence of reliability and validity of an instrument based on Gregory’s principles and guidelines. The study
evaluated the value and relevance of these 19th century principles to modern teachers via a researcher-developed
instrument using the guidelines established within each of Gregory’s principles and then presented results to validate
concept transferability. After examining the basic structure of The Seven Laws of Teaching in the context of modern
approaches, we suggest that these seven laws represent Timeless Principles of the science and art of teaching.
1.1 Background
Discussions of foundational principles that frame effective teaching are not unique to Gregory, and the educational
literature contains an abundance of suggested principles, strategies, and guidelines. Thorndike (1906) identified three
essential principles that included readiness, exercise, and effect. The Law of Readiness suggested that a child must be
ready to learn in order to learn most efficiently. It is the responsibility of the teacher to develop the readiness to learn
in the student. The Law of Exercise is further divided into the Law of Use and Law of Disuse. Repetition strengthens
understanding, and practice makes perfect. Conversely, if one does not “use it,” they tend to “lose it.” It is the
responsibility of the teacher to ensure practice is interesting and meaningful in order to enhance learning.
Thorndike’s Law of Effect suggests that: (a) actions that elicit feelings of pleasure and satisfaction enhance effective
learning, (b) any action met with frustration and annoyance will likely be avoided, and (c) success breeds success
and failure leads to further failure.
Rosenshine and Furst (1971) conducted what is considered the first literature review of the research addressing
principles for effective teaching. They outlined five “most important” teacher-effectiveness variables, which include:
clarity, variability, enthusiasm, task-oriented behavior, and student opportunity to learn criterion material. Almost 30
years later, Walls (1999) posited four similar criteria, including outcomes, clarity, engagement, and enthusiasm. Walls
stressed that it is important for students to understand the direction in which the teacher is guiding their
learning—and the teacher’s intentions for going there—by providing clear goals and related learning outcomes. It is
vital to build upon what students already know while making material as clear as possible.
In 1987, Chickering and Gamson posited seven principles that they argued are representative of good practice in
undergraduate education: (a) encourage contacts between students and faculty, (b) develop reciprocity and
cooperation among students, (c) use active learning techniques, (d) give prompt feedback, (e) emphasize time on task,
(f) communicate high expectations, and (g) respect diverse talents and ways of learning. These seven principles are
“intended as guidelines for faculty members, students, and administrators to improve teaching” (Chickering &
Gamson, 1987, p. 3).
Walls (1999) agreed with Thorndike (1906) that students must be engaged to learn, stressing the importance of active
learning, which encompasses aspects of Thorndike’s laws. Students must be engaged to learn, as people learn what
they practice (Law of Exercise). Both the student and the teacher should be enthusiastic about the learning (Law of
Effect); if the teacher does not enjoy the teaching, how can students be expected to enjoy the learning?
More recently, distinct approaches have offered an element of novelty but ultimately integrated pre-existing
principles. Perkins (2008) used baseball as a metaphor to depict his principles of teaching. The principles set the
stage for what Perkins further referred to as conditions and principles of transfer. The principles include: (a) play the
whole game (develop capability by utilizing holistic work); (b) make the game worth playing (engage students
through meaningful content); (c) work on the hard parts (develop durable skills through practice, feedback, and
reflection); (d) play out of town (increase transfer of knowledge with diverse application of experiences); (e) play the
hidden game (sustain active inquiry); (f) learn from the team (encourage collaborative learning); and (g) learn the
game of learning (students taking an active role in their learning).
Tomlinson’s (2017) differentiation emphasized the need for teachers to respond dynamically within a given
classroom by varying (“differentiating”) instruction to meet student needs. Conceptually, Tomlinson identified
respectful tasks, ongoing assessment and adjustment, and flexible grouping as general principles driving
differentiation while identifying the primary domains of the teacher (content, process, and product) and the student
(readiness, interests, and learning profile).
Beyond the contributions of individual approaches, the past 20 years have also seen an increase in collaborative,
research-based recommendations for educational principles that draw upon the experiences of educators, researchers,
and policymakers. Workforce entry and academic preparation for college have been the primary aspects of these
recommendations. The InTASC Model Core Teaching Standards delineated competencies based on key principles
that are intended to be mastered by the teacher (Council of Chief State School Officers, 2011). It is anticipated that
proficiency in these standards supports sufficient preparation for K-12 students to succeed in college and to obtain
the skill sets needed for a future workplace. Preparing 21st Century Students for a Global Society set forth four skills
found to be most important, including critical thinking, communication, collaboration, and creativity, and stated,
“What was considered a good education 50 years ago, however, is no longer enough for success in college, career,
and citizenship in the 21st century” (National Education Association, 2012, p. 3).
In specific academic disciplines, similar discussions and statements have been made. For example, in the field of
health education and promotion, Auld and Bishop (2015) stated that “given today’s rapid pace of change and health
challenges, we are called to identify, adapt and improve key elements that make teaching and learning about health
and health promotion successful” (p. 5). Pruitt and Epping-Jordan (2005) discussed the need to develop a new
approach to training for the 21st century global healthcare workforce. Regardless of approach or discipline, there is a
clear desire among educators to identify a universal set of principles to guide effective teaching.
1.2 Overview of the Seven Laws of Teaching
Gregory (1886) drew upon the metaphor of examining natural laws or phenomena to define the foundational
principles that govern effective teaching. In step with what is now recognized as a positivist paradigm, Gregory
believed that in order to understand such laws, one must subject the phenomenon to scientific analysis and identify
its individual components. Gregory (1886) posited that the essential elements of “any complete act of teaching” are
composed of:
Seven distinct elements or factors: (1) two personal factors—a teacher and a learner; (2) two mental factors—a
common language or medium of communication, and a lesson or truth or art to be communicated; and (3) three
functional acts or processes—that of the teacher, that of the learner, and a final or finishing process to test and
fix the result. (p. 3)
Further, he argued that regardless of whether that which to be learned is a single fact requiring a few minutes or a
complex concept requiring a lesson of many hours, all seven of these factors must be present if learning is to occur;
none can be missing. For the purposes of this article, the concept of a “law” of teaching as expressed by Gregory
(1886) has been re-termed to be a “principle.” We also embraced Gregory’s general grouping of these elements as
key dimensions of the Seven Principles (i.e., actors, mental factors, functional processes, and finishing acts).
1.2.1 The Seven Principles Stated
There are a variety of ways that these seven principles can be expressed. Gregory (1886) first stated the overarching
principles, then expressed them as direct statements for teachers to follow in their pursuits. Below are the principles
exactly as Gregory wrote them (emphasis his own):
1) The Principle of the Teacher: A teacher must be one who KNOWS the lesson or truth or art to be taught... [As
expressed to teachers:] Know thoroughly and familiarly the lesson you wish to teach,—teach from a full mind and a
clear understanding.
2) The Principle of the Learner: A learner is one who ATTENDS with interest to the lesson given.… [As expressed
to teachers:] Gain and keep the attention and interest of the pupils upon the lesson. Do not try to teach without
attention.
3) The Principle of the Language: The language used as a MEDIUM between teacher and learner must be
COMMON to both... [As expressed to teachers:] Use words understood in the same way by the pupils and
yourself—language clear and vivid to both.
4) The Principle of the Lesson: The lesson to be mastered must be explicable in terms of truth already known by the
learner—the UNKNOWN must be explained by means of the KNOWN… [As expressed to teachers:] Begin with
what is already well known to the pupil upon the subject and with what [they themselves] experienced,—and
proceed to the new material by single, easy, and natural steps, letting the known explain the unknown.
5) The Principle of the Teaching Process: Teaching is AROUSING and USING the pupil’s mind to grasp the
desired thought... [As expressed to teachers:] Stimulate the pupil’s own mind to action. Keep [their] thoughts as
much as possible ahead of your expression, placing [them] in the attitude of a discoverer, an anticipator.
6) The Principle of the Learning Process: Learning is THINKING into one’s own UNDERSTANDING a new idea
or truth… [As expressed to teachers:] Require the pupil to reproduce in thought the lesson [they are]
learning—thinking it out in its parts, proofs, connections and applications till [they] can express it in [their] own
language.
7) The Principle of Review: The test and proof of teaching done—the finishing and fastening process—must be a
REVIEWING, RETHINKING, RE-KNOWING, REPRODUCING, and APPLYING of the material that has been
taught… [As expressed to teachers:] Review, review, REVIEW, reproducing correctly the old, deepening its
impression with new thought, linking it with added meanings, finding new applications, correcting any false views,
and completing the true. (Gregory, 1886, pp. 5-7)
1.2.2 Essentials of Successful Teaching Using the Seven Principles
There are a variety of understandings that are essential for applying these Seven Principles to effective teaching. The
first understanding is that the Seven Principles are both necessary and sufficient for effective teaching. Gregory
(1886) stated that “these rules, and the laws which they outline and presuppose, underlie and govern all successful
teaching. If taken in their broadest meaning, nothing need be added to them; nothing can be safely taken away” (p. 7).
He posited that when these principles are used in conjunction with “good order,” no teacher need be concerned about
failing as a teacher, provided each principle is paired with effective behavior management. Thus, Gregory indicated
that profound understanding and consistent application of these principles forms the foundation for all successful
teaching and learning experiences.
Another understanding essential for successful teaching with the principles is the deceptiveness of their simplicity. At
first review, it is easy for the reader to conclude that these principles “seem at first simple facts, so obvious as
scarcely to require such formal statement, and so plain that no explanation can make clearer their meaning” (Gregory,
1886, p. 8). As one begins to examine the applications and effects of these principles, it becomes apparent that while
there is constancy, there is also opportunity for variation as each teacher finds their personal expression of each
principle.
The functionality of the principles is not temporally constrained; the principles are as applicable for the 21st century
teacher as they were for teachers of the 19th century. For example, while the language of the learners of the 1800s
was likely to have been substantially different from the language of the learners of the 2000s, teachers must prepare
their lesson with the language of their learners in mind regardless of the century in which they taught or are teaching.
Gregory’s (1886) principles offer a basis for modern strategies and theories of teaching and learning that is consistent
with broader philosophies of education. For this reason, we will refer to them as the Seven Timeless Principles.
The ubiquitous nature of these Seven Timeless Principles needs to be understood in order for the principles to be
applied in effective teaching. Gregory (1886) stated that the laws “cover all teaching of all subjects and in all grades,
since they are the fundamental conditions on which ideas may be made to pass from one mind to another, or on
which the unknown can become known” (p. 8). In this way, he suggested that the principles are just as applicable to
the elementary school teacher as they are to the college professor, equally important to the music teacher as to the
health teacher.
Associated with each principle were what Gregory (1886) described as “Rules for Teachers” (p. 31). These rules
herein subsequently will be referred to as guidelines. These guidelines detail the core components that shape each
principle. For example, a guideline under the Teacher Principle would be: “Prepare each lesson by fresh study. Last
year’s knowledge has necessarily faded somewhat” (Gregory, 1886, p. 20). A guideline posited for the Learner
Principle: “Adapt the length of the class exercise to the ages of the pupils: the younger the pupils the briefer the
lesson” (Gregory, 1886, p. 30).
1.3 Significance and Study Objective
Gregory’s (1886) original work has been recognized as making valuable contributions to the teaching and learning
process in some circles (Stephenson, 2014; Wilson, 2014). In a recent reprint of Gregory’s first edition text,
Stephenson (2014) provided supplemental materials that included study questions, self-assessment, and a sample
teacher observation form. In the same book, Wilson (2014) argued that one of the essential elements of effective
teaching is that teachers understand the distinction between the methods of teaching and the principles of teaching.
Wilson (2014) stated, “Methods change. They come and go. In the ancient world, students would use wax tablets to
take notes, and now they use another kind of tablet, one with microchips inside” (p. 4). Wilson suggested that a
teacher using the methods of wax or stone needed to know what was going to be said and why just as much as a
teacher using the methods of a smart board or computer in today’s classroom. The purpose of this study is twofold: (a)
to assess the perceived relevance of the Seven Timeless Principles and guidelines posited by Gregory (1886) for
current educators and educators-in-training and (b) to develop and pilot test the instrument needed to accomplish the
former. The research hypothesis of this study is that the principles and guidelines posited by Gregory are affirmed as
relevant by current and future educators. The approach is to translate Gregory’s guidelines into a survey instrument
capable of providing evidence of the value of the overarching principles.
2. Method
2.1 Research Design
This research was an exploratory study with a cross-sectional design that used a convenience sample. Research sites
were chosen because of their accessibility to the researchers. The research protocol was approved by the institutional
review boards (IRBs) of all of the institutions with which the authors are affiliated.
2.2 Sample and Participant Selection
The participants for this study consisted of current educators and educators-in-training. The current educators were
higher education professors from three universities ranging in size from small- to mid-sized: one in the South, one in
the Midwest, and one in the North. The educator-in-training participants were students enrolled in the undergraduate
teacher education programs at two of the universities. Recruitment for all participants was conducted via an email or
in-class invitation to participate in the research project by completing the survey.
Student participants were recruited from two classes: an introduction to teacher education course and a senior-level
course. Surveys were taken by students prior to participating in their student teaching experience, and bonus points
were offered for participation. Faculty participants were recruited through the faculty development process, though
participation in the process was not required to participate in the survey. All participants voluntarily completed the
survey after reading and acknowledging the informed consent form.
2.3 Data Collection and Analysis
Participant invitations and all surveys were administered in the 2018 spring and fall academic semesters using
Google Forms, from which aggregated data were downloaded. Statistics for descriptive and reliability analyses were
generated using SPSS Version 26 software. Means and standard deviations were calculated for all 43 guidelines,
including all aggregate groupings for principles and dimensions. To affirm reliability of the instrument and the
subscales, Cronbach’s alphas were computed on the total scale and on each of the principle subscales.
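As a rough illustration of the reliability computation described above (the study itself used SPSS Version 26), Cronbach's alpha can be calculated directly from a respondents-by-items matrix with a few lines of Python. The response matrix below is fabricated for demonstration only; the formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is the standard one.

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) matrix of Likert scores."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                              # number of items in the (sub)scale
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented 4-point Likert responses (5 respondents x 4 items), for illustration only.
responses = [
    [4, 3, 4, 4],
    [3, 3, 3, 4],
    [4, 4, 4, 4],
    [2, 3, 3, 3],
    [3, 4, 3, 4],
]
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")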
2.4 Institutional Approvals and Ethical Considerations
The protocol of this project was approved by the IRBs of Western Michigan University, Mid-America Nazarene
University, and University of Tennessee at Martin. Prior to completing the electronic survey, each potential
participant reviewed an IRB-approved informed consent form online. Potential participants who agreed to participate
clicked on the “proceed to survey” button, which led them to the initial questions of the survey. The informed
consent notified participants that they could discontinue participation at any time.
Participant confidentiality and anonymity were protected through the security of the Google survey management
system and the encrypted, password-protected security of the investigators’ university computers. There is limited
risk to participants in an online survey. No prior psychometric data were available for the instrument,
as one purpose of this study was to pilot its use.
2.5 Instrument Development: Assessment and Measures
The instrument used in this research was developed by the authors and is based upon Gregory’s (1886) Seven
Timeless Principles. The instrument contains two basic components. The first component of the survey was basic
demographic information, including: age, binary gender, race, level of involvement in teaching, and primary
academic discipline. No identifying information beyond the above-mentioned variables was collected.
The second component of the instrument was developed directly from the guidelines for teachers described by
Gregory (1886) to measure teacher perception of the guidelines’ modern relevance. Each guideline was used as an
item on the instrument. Evidence of face validity was obtained by a panel of education professionals who reviewed
each of the guidelines for its relevance to the principle with which it was associated. In some instances, minor
changes were made to the language of Gregory’s guidelines in order to present the content in more modern language.
Care was taken to ensure that each statement accurately reflected its original meaning.
The final instrument consisted of five demographic items and 43 items related to the guidelines for effective teaching,
creating a Timeless Principles Scale. The items (guidelines) associated with each of the seven principles were
combined into subscales comprised of n items (i.e., Principle of the Teacher [n = 6], Principle of the Learner [n = 6],
Principle of the Language [n = 6], Principle of the Lesson [n = 6], Principle of the Teaching Process [n = 9], Principle
of the Learning Process [n = 4], and Principle of Review and Application [n = 6]).
Using a 4-point Likert scale, participants affirmed or rejected the perceived relevance of each item (guideline) as it
relates to teacher best practices in 21st century educational settings (1 = strongly disagree to 4 = strongly agree).
Means and standard deviations were computed for the total scale, for each of the subscales, and for each of the 43
items. Responses and mean scores of 3.0 or greater were considered affirming of the relevance of the principle and/or
guideline for current teaching and learning.
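A minimal sketch of the scoring rule just described: item and subscale means on the 4-point scale are compared with the 3.0 cut-off used to judge whether a guideline was affirmed. The subscale grouping and the sample responses below are placeholders, not data from the study.

import numpy as np

# Hypothetical responses (respondents x items); 1 = strongly disagree ... 4 = strongly agree.
responses = np.array([
    [4, 3, 4, 3, 4, 4],
    [3, 3, 2, 3, 3, 4],
    [4, 4, 4, 4, 3, 4],
])

# Placeholder mapping of item columns to one subscale (e.g., a six-item principle).
subscales = {"Example principle (n = 6)": [0, 1, 2, 3, 4, 5]}

for name, cols in subscales.items():
    scale_scores = responses[:, cols].mean(axis=1)   # each respondent's subscale mean
    m, sd = scale_scores.mean(), scale_scores.std(ddof=1)
    affirmed = m >= 3.0                              # criterion used in the study
    print(f"{name}: M = {m:.2f}, SD = {sd:.2f}, affirmed = {affirmed}")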
3. Results
3.1 Demographics
Of the 84 educators and education students who participated in the study, 86.9% identified as White and 9.6%
identified as African American/Black, Hispanic, Asian, or Native American; 3.6% did not identify race. The majority
of participants were female (57.1%), with 39.3% of participants identifying as male and 3.6% who did not identify
gender. With regard to primary discipline, health sciences was most common (25.0%), followed by physical sciences
(15.5%), behavioral sciences (13.1%), and social sciences (11.9%). Humanities, language arts, music or fine arts, and
physical education represented 8.3%, 9.5%, 4.8%, and 2.4% of disciplines, respectively.
The majority of respondents were educators in higher education settings (70.2%). Education students represented
28.6% of participant responses, and other workforce professionals represented 1.2% of the sample. Most participants
in higher education reported employment at a full-time level (44% of total), with 4.8% reporting a part-time teaching
position. Of the 59 total teachers/professors, 50% stated that their education included training and course work in
effective teaching practices.
3.2 Total Scale
Mean and standard deviation scores for the total scale, the subscales, and for each item are presented in Table 1. The
mean total score for the Timeless Principles Scale (consisting of all 43 items on the instrument) was 3.37 with a
standard deviation of 0.348 (see Table 1). This result indicates that participants agreed overall, and were inclined to
strongly agree, with the guidelines and principles identified by Gregory’s (1886) laws. Cronbach’s alpha calculated
for the total scale was 0.954, indicating a high level of internal consistency and that the total scale is reliable.
Table 1. Mean, Standard Deviation, and Cronbach’s Alpha Scores for the Timeless Principles Scale
Timeless Principles: Total Scale (M = 3.37, SD = .348, α = .954)
Principle of the Teacher (M = 3.30, SD = .385, α = .667) - An effective teacher should:
1) Prepare each lesson by fresh study. Last year’s knowledge has necessarily faded somewhat. (M = 3.02, SD = .640)
2) Find the connection of the lesson to the lives and duties of the learners. Its practical value lies in these connections. (M = 3.48, SD = .611)
3) Keep in mind that complete mastery of a few things is better than an ineffective smattering of many. (M = 3.23, SD = .704)
4) Have a plan of study, but do not hesitate, when necessary, to study beyond the plan. (M = 3.39, SD = .560)
5) Make use of all good books and resources available to you on the subject of the lesson. (M = 3.33, SD = .627)
6) Get the help of the best scholars and thinkers on the topic at hand to solidify your own thoughts. (M = 3.30, SD = .561)
Principle of the Learner (M = 3.52, SD = .376, α = .754) - To enhance student engagement, an effective teacher should:
7) Never exhaust wholly the learner's power of attention. Stop or change activities when signs of attention fatigue appear. (M = 3.39, SD = .602)
8) Adapt the length of the class exercise to the ages of the pupils: The younger the pupils the briefer the lesson. (M = 3.43, SD = .556)
9) Appeal whenever possible to the interests of your learners. (M = 3.62, SD = .513)
10) Prepare beforehand thought-provoking questions. Be sure that these are not beyond the ages and attainments of your learners. (M = 3.64, SD = .530)
11) Make your lesson as attractive as possible, using illustrations and all legitimate devices and technologies. Do not, however, let these devices or technologies be so prominent as to become sources of distraction. (M = 3.60, SD = .518)
12) Maintain in yourself enthusiastic attention to and the most genuine interest in the lesson at hand. True enthusiasm is contagious. (M = 3.45, SD = .629)
Principle of the Language (n = 6; M = 3.31, SD = .414, α = .800) - In order to ensure a common language, an effective teacher should:
13) Secure from the learners as full a statement as possible of their knowledge of the subject, to learn both their ideas and their mode of expressing them, and to help them correct their knowledge. (M = 3.27, SD = .588)
14) Rephrase the thought in more simple language if the learner fails to understand the meaning. (M = 3.54, SD = .525)
15) Help the students understand the meanings of the words by using illustrations. (M = 3.37, SD = .533)
16) Give the idea before the word, when it is necessary to teach a new word. (M = 3.11, SD = .581)
17) Test frequently the learner's sense of the words she/he uses to make sure they attach no incorrect meaning and that they understand the true meaning. (M = 3.30, SD = .636)
18) Should not be content to have the learners listen in silence very long at a time since the acquisition of language is one of the most important objects of education. Encourage them to talk freely. (M = 3.25, SD = .641)
Principle of the Lesson (M = 3.51, SD = .415, α = .828) - In order to create an effective lesson, an effective teacher should:
19) Find out what your students know of the subject you wish to teach to them; this is your starting point. This refers not only to textbook knowledge but to all information they may possess, however acquired. (M = 3.37, SD = .655)
20) Relate each lesson as much as possible with prior lessons, and with the learner's knowledge and experience. (M = 3.65, SD = .503)
21) Arrange your lesson so that each step will lead naturally and easily to the next; the known leading to the unknown. (M = 3.61, SD = .515)
22) Find illustrations in the most common and familiar objects suitable for the purpose. (M = 3.46, SD = .525)
23) Lead the students to find fresh illustrations from their own experience. (M = 3.48, SD = .571)
24) Urge the learners to use their own knowledge to find or explain other knowledge. Teach them that knowledge is power by showing how knowledge really helps solve problems. (M = 3.51, SD = .570)

Principle of the Teaching Process (M = 3.37, SD = .407, α = .858) - To create an effective teaching process, the effective teacher should:
25) Select and/or develop lessons and problems that relate to the environment and needs of the learner. (M = 3.49, SD = .549)
26) Excite the learner's interest in the lesson when starting the lesson, by some question or statement that will awaken inquiry. Develop a hook to awaken their interest. (M = 3.57, SD = .521)
27) Place yourself frequently in the position of a learner among learners, and join in the search for some fact or principle. (M = 3.42, SD = .587)
28) Repress the impatience which cannot wait for the student to explain themselves, and which takes the words out of their mouth. They will resent it, and feel that they could have answered had you given them sufficient time. (M = 3.29, SD = .654)
29) Count it your chief duty to awaken the minds of the learners and do not rest until each learner shows their mental activity by asking questions. (M = 3.06, SD = .766)
30) Repress the desire to tell all you know or think upon the lesson or subject; and if you tell something to illustrate or explain, let it start a fresh question. (M = 3.30, SD = .555)
31) Give the learner time to think, after you are sure their mind is actively at work, and encourage them to ask questions when puzzled. (M = 3.48, SD = .548)
32) Do not answer the questions asked too promptly, but restate them, to give them greater force and breadth, and often answer with new questions to secure deeper thought. (M = 3.34, SD = .590)
33) Teach learners to ask What? Why? and How? in order to better learn the nature, cause, and method of every fact, idea, or principle observed or taught them: also, Where? When? By whom? and What of it? - the place, time, actors, and consequences. (M = 3.37, SD = .599)

The Principle of the Learning Process (M = 3.40, SD = .462, α = .684) - In order to facilitate an effective learning process, the effective teacher should:
34) Ask the learner to express, in their own words, the meaning as they understand it, and to persist until they have the whole thought. (M = 3.35, SD = .674)
35) Let the reason why be perpetually asked until the learner is brought to feel that they are expected to give a reason for their opinion. (M = 3.29, SD = .721)
36) Aim to make the learner an independent investigator - a student of nature, a seeker of truth. Cultivate in them a fixed and constant habit of seeking accurate information. (M = 3.58, SD = .542)
37) Seek constantly to develop a profound regard for truth as something noble and enduring. (M = 3.37, SD = .638)

The Principle of Review and Application (M = 3.37, SD = .348, α = .852) - To affirm the learning that has occurred and apply it, the effective teacher should:
38) Have a set time for reviews. At the beginning of each lesson take a brief review of the preceding lesson. (M = 3.25, SD = .618)
39) Glance backward, at the close of each lesson, to review the material that has been covered. Almost every good lesson closes with a summary. It is good to have the learners know that any one of them may be called upon to summarize the lesson at the end of the class. (M = 3.30, SD = .619)
40) Create all new lessons to bring into review and application, the material of former lessons. (M = 3.12, SD = .722)
41) The final review, which should never be omitted, should be searching, comprehensive, and masterful, grouping all parts of the subject learned as on a map, and giving the learner the feeling of a familiar mastery of it all. (M = 3.28, SD = .668)
42) Seek as many applications as possible for the subject studied. Every thoughtful application involves a useful and effective review. (M = 3.33, SD = .627)
43) An interesting form of review is to allow members of the class to ask questions on previous lessons. (M = 3.23, SD = .533)
Note. N = 84. Survey questions (“guidelines”) were aggregated by subscales representing Gregory’s (1886) Seven
Laws (“Principles”) of Teaching. Values were calculated from 4-point Likert scale responses (1 = strongly disagree,
2 = disagree, 3 = agree, 4 = strongly agree).
3.3 Principles and Guidelines
The mean and standard deviation for each of the principle subscales were computed as follows: Principle of the
Teacher (M = 3.30, SD = 0.385), Principle of the Learner (M = 3.52, SD = 0.376), Principle of the Language (M =
3.31, SD = 0.414), Principle of the Lesson (M = 3.51, SD = 0.415), Principle of the Teaching Process (M = 3.37, SD
= 0.407), Principle of the Learning Process (M = 3.40, SD = 0.462), and the Principle of Review and Application (M
= 3.37, SD = 0.348). This represents affirmation of each of the seven principles as relevant to current educational
settings.
Cronbach’s alpha values for each of the subscales were as follows: Principle of the Teacher (α = 0.667), Principle of the
Learner (α = 0.754), Principle of the Language (α = 0.800), Principle of the Lesson (α = 0.828), Principle of the
Teaching Process (α = 0.858), Principle of the Learning Process (α = 0.684), and Principle of Review and
Application (α = 0.852). These values affirm the internal consistency of each of the subscales.
The mean scores of each of the 43 items (guidelines) were above 3.0. The item mean scores ranged from 3.02 to 3.65
with standard deviations ranging from 0.503 to 0.766. These results reflect that each individual guideline was
affirmed as being relevant to current educational settings.
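To make the reported statistics concrete, the following sketch (not the authors' analysis code) shows one way the subscale descriptives and Cronbach's alpha could be computed in Python with pandas. The file name, the item column names (item_1 through item_43), and the choice to score each subscale as the mean of its items per respondent are illustrative assumptions only.

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (rows = respondents, columns = items)."""
    items = items.dropna()
    k = items.shape[1]                               # number of items in the (sub)scale
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical layout: one row per respondent (N = 84), one column per guideline,
# coded 1-4 (1 = strongly disagree ... 4 = strongly agree).
responses = pd.read_csv("timeless_principles_responses.csv")

subscales = {  # illustrative grouping of the 43 guidelines into the seven principles
    "Teacher": [f"item_{i}" for i in range(1, 7)],
    "Learner": [f"item_{i}" for i in range(7, 13)],
    "Language": [f"item_{i}" for i in range(13, 19)],
    "Lesson": [f"item_{i}" for i in range(19, 25)],
    "Teaching Process": [f"item_{i}" for i in range(25, 34)],
    "Learning Process": [f"item_{i}" for i in range(34, 38)],
    "Review and Application": [f"item_{i}" for i in range(38, 44)],
}

for name, cols in subscales.items():
    scores = responses[cols].mean(axis=1)            # each respondent's subscale score
    print(f"{name}: M = {scores.mean():.2f}, SD = {scores.std(ddof=1):.3f}, "
          f"alpha = {cronbach_alpha(responses[cols]):.3f}")

all_items = [c for cols in subscales.values() for c in cols]
print(f"Total scale: alpha = {cronbach_alpha(responses[all_items]):.3f}")

Whether the published subscale means and standard deviations were derived from respondent-level subscale scores (as above) or directly from item-level means is not specified in the article; the sketch assumes the former.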
4. Discussion
4.1 Implications
In this paper, we examined the relevance of the principles (laws) presented in 1886 by John Milton Gregory in The
Seven Laws of Teaching. We presented evidence that these principles may indeed represent enduring Timeless
Principles of effective teaching that, while their application in the 21st century may look different than it did in the
19th century, encapsulate the necessary elements to facilitate effective learning. The results of this exploratory study
confirm affirmation from educators and educators-in-training of the current relevance of these principles.
The results of the study also affirm the perception of applicability of the guidelines—or as Gregory (1886) described
them, rules for teachers—for faculty members of institutions of higher education as well as prospective K-12
teachers. However, neither we nor Gregory posit that the guidelines presented in the study represent a comprehensive,
exhaustive list of appropriate guidelines. For example, one could envision a guideline such as “Learn students’ names
to help them feel connected to the learning community” as an element of effective teaching. However, this statement
could easily be considered as a fit for the Principle of the Learner, as the feeling of being connected to the learning
community certainly contributes to learner engagement. It is reasonable and should be expected that other guidelines
for teachers would be consistent with one of the seven principles.
The mean score of respondents to each guideline statement was above 3.0 on a 4-point Likert scale in which a 3
represented agree and a 4 represented strongly agree (lowest M = 3.02, highest M = 3.65). In addition, the mean
scores for the subscales representing each principle ranged from 3.31 to 3.52, reflecting strong
affirmation of the current relevance of each of the Seven Timeless Principles.
The enduring nature of these Seven Principles may be a result of their consistency with research-based practices
whose impact has been shown since Gregory (1886) described his Laws for Teachers. For example, the concept of
cognitive load theory (Atkinson & Shiffrin, 1968) is consistent with both the Principle of the Learner and the
Principle of the Lesson. In addition, elements of self-determination theory (Ryan & Deci, 2000) are clearly consistent
with the guidelines in the Principle of the Teaching Process, and spaced-retrieval practice (Karpicke & Roediger,
2007) easily fits within the Principle of Review, the reviewing, rethinking, re-knowing, and reproducing of the
learning. Eyler’s (2018) description of curiosity as one of the fundamental elements of how humans learn contains
many elements that overlap with and are similar to the language used by Gregory to describe the Principle of the
Learner. In order for learning to occur, the learner must actively engage in the learning process and must demonstrate
curiosity toward that which is to be learned.
As Wilson (2014) indicated, “highly effective teachers will understand the profound differences between methods of
teaching and principles of teaching” (p. 3). For example, lesson plan development is a common method used in
teacher preparation programs to emphasize the importance of comprehensive understanding of the lesson to be taught.
The lesson plan includes objectives, a review of previous lessons, a summary of the content, and identification of
activities that will be used to facilitate the learning. These activities represent methods that are consistent with the
Principle of the Lesson. The teacher must have a clear understanding of what is to be learned in this class and how
the content to be learned builds upon previous lessons or classes.
Additionally, in the higher education arena, institutions and accreditation bodies have a variety of methods designed
to be consistent with the Principle of the Teacher. A teacher must be one who knows the lesson or truth to be taught.
Potential faculty are evaluated on the relevance of their degrees, research, and experiences to the classes to be taught,
all of which is done in an attempt to demonstrate that the instructor knows the lesson or truth to teach.
There is danger in focusing too heavily on methods rather than on the principles. For example, the actions of some
accrediting bodies in higher education imply that the only way an instructor can learn about a particular content area
is to take courses at a university or college. However, it is easy to elicit examples of respected experts who developed
their expertise outside the traditional classroom. Another example can be easily observed in the developing role of
the digital classroom. While the methods of developing and maintaining the engagement of students are likely to be
quite distinct from a face-to-face classroom versus an online or hybrid classroom, the Principle of the Learner is
equally relevant in both settings.
4.2 Limitations and Future Work
While embracing convenience sampling and incentivizing student participation increase the reliability and power
associated with sample size, they also influence who accepts the invitation to participate. This increases the potential
non-response bias of the study. Similarly, while adhering to Gregory’s (1886) language closely was a primary
component of identifying transferability, the structure of the instrument may increase desirability and acquiescence
biases. Such response biases are possible when evaluating a series of statements without embedded item controls.
While a highly controlled instrument was outside the scope of this work, future studies can leverage an in-depth
analysis of specific principles and guidelines using survey techniques designed to mitigate bias.
The sample size, while sufficient for the statistical purposes of the study, is not necessarily sufficient to make an
argument that it is representative of a national population of educators or future educators. However, we believe the
sample is strengthened by the diversity of academic disciplines that are represented in it. Additional replications of
the study with larger, more representative samples will be necessary to extrapolate the results to a larger population;
this will be a focus of continued research.
Additional efforts are needed to examine each of the Seven Timeless Principles in-depth and to provide insights into
the application in 21st century education. This includes more detailed research involving a larger and more diverse
sample, as well as the addition of mixed methods for a more comprehensive portrayal of data. Further, future efforts
will attempt to demonstrate that current-day teaching theories and methods, as well as modern policies and
regulations that are considered innovative, are founded in these Timeless Principles. In addition, there is potential to
create a framework for the teaching and learning process that assists teachers at all levels of education to clearly
associate their strategies and methods of teaching with the Timeless Principles.
References
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. Psychology
of Learning and Motivation, 2, 89-195. https://doi.org/10.1016/S0079-7421(08)60422-3
Auld, M. E., & Bishop, K. (2015). Striving for excellence in health promotion pedagogy. Pedagogy in Health
Promotion, 1(1), 5-7. https://doi.org/10.1177/2373379915568976
Chickering, A. W., & Gamson, Z. F. (1987). Seven principles for good practice in undergraduate education. AAHE
Bulletin, 39(7), 3-6. Retrieved from https://aahea.org/articles/sevenprinciples1987.htm
Council of Chief State School Officers. (2011). InTASC model core teaching standards: A resource for state dialogue.
Washington, DC: Author. Retrieved from https://ccsso.org/resource-library/intasc-model-core-teaching-standards
Eyler, J. R. (2018). How humans learn: The science and stories behind effective college teaching. Morgantown, WV:
West Virginia University Press.
Gregory, J. M. (1886). The seven laws of teaching. Boston, MA: Congregational Sunday-School and Publishing
Society.
Karpicke, J. D., & Roediger, H. L. III. (2007). Expanding retrieval practice promotes short-term retention, but
equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 33(4), 704-719. https://doi.org/10.1037/0278-7393.33.4.704
National Education Association. (2012). Preparing 21st century students for a global society: An educator’s guide to
the “four Cs.” Alexandria, VA: Author.
Perkins, D. (2008). Making learning whole: How seven principles of teaching can transform education. San
Francisco, CA: Jossey-Bass.
Pruitt, S. D., & Epping-Jordan, J. E. (2005). Preparing the 21st century global healthcare workforce. BMJ, 330, 637.
https://doi.org/10.1136/bmj.330.7492.637
Rosenshine, B., & Furst, N. (1971). Research on teacher performance criteria. In B. O. Smith (Ed.), Research in
teacher education (pp. 37-72). Englewood Cliffs, NJ: Prentice Hall.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social
development, and well-being. American Psychologist, 55(1), 68-78. https://doi.org/10.1037/0003-066X.55.1.68
Stephenson, L. (2014). Appendices. In J. M. Gregory, The seven laws of teaching (1st ed. reprint; pp. 129-144).
Moscow, ID: Canon Press.
Thorndike, E. L. (1906). The principles of teaching: Based on psychology. London, England: Routledge.
Tomlinson, C. A. (2017). How to differentiate instruction in academically diverse classrooms (3rd ed.). Alexandria,
VA: ASCD.
Walls, R. T. (1999). Psychological foundations of learning. Morgantown, WV: West Virginia University International
Center for Disability Information.
Wilson, D. (2014). Foreword: The seven disciplines of highly effective teachers. In J. M. Gregory, The seven laws of
teaching (1st ed. reprint; pp. 1-9). Moscow, ID: Canon Press.
Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution
license (http://creativecommons.org/licenses/by/4.0/).
Developing Peer Review of Instruction
in an Online Master Course Model
John Haubrick
Deena Levy
Laura Cruz
The Pennsylvania State University
Abstract
In this study we looked at how participation in a peer-review process for online Statistics courses
utilizing a master course model at a major research university affects instructor innovation and
instructor presence. We used online, anonymous surveys to collect data from instructors who
participated in the peer-review process, and we used descriptive statistics and qualitative analysis
to analyze the data. Our findings indicate that space for personal pedagogical agency and
innovation is perceived as limited because of the master course model. However, responses
indicate that participating in the process was overall appreciated for the sense of community it
helped to build. Results of the study highlight the blurred line between formative and summative
assessment when using peer review of instruction, and they also suggest that innovation and
presence are difficult to assess through short term observation and through a modified version of
a tool (i.e., the Quality Matters rubric) intended for the evaluation of an online course rather than
the instruction of that course. The findings also suggest that we may be on the cusp of a second
stage for peer review in an online master course model, whether in-person or online. Our findings
also affirm the need for creating a sense of community online for the online teaching faculty. The
experiences of our faculty suggest that peer review can serve as an integral part of fostering a
departmental culture that leads to a host of intangible benefits including trust, reciprocity,
belonging, and, indeed, respect.
Keywords: Peer review, online teaching, teaching evaluation, master course model, statistics
education, instructor presence
Haubrick, J., Levy, D., & Cruz, L., (2021). Developing peer review of instruction in an online
master course model. Online Learning, 25(3), 313-328. doi:10.24059/olj.v25i3.2428
Peer review has a long history in academia, originating in the professional societies of the
early Enlightenment. The practice first arose to address the need for an evaluative
metric of the quality of research in an era replete with amateur scientists. In this same context,
peer review also functioned as a foundation for establishing collective expertise that was not
dependent on the approval of an external body, whether political fiat or divine consecration. The
present study examines one way in which this long-standing practice of peer review has evolved
to embrace new professional modes (i.e., teaching), new modalities of instruction (i.e., online),
and new roles for instructors within the current context of higher education.
Literature Review
Peer review had long been the gold standard for academic research, but it was not until
the learning-centered revolution, begun in the 1970s, that the practice found application in
education. At first, peer review was confined largely to volunteers who were experimenting with
pedagogical changes stemming from recent developments in learning science research. As one
leading scholar writes, there was “a general sense…that teaching would benefit from the kinds of
collegial exchange and collaboration that faculty seek out as researchers” (Hutchings, 1996).
Further, contrary to the conservative bias often attributed to the peer review of research (Roy &
Ashburn, 2001), peer review of teaching (PRT) has increasingly proven to foster both personal
empowerment and teaching transformation (Chism, 2005; Hutchings, 1996; Lomas & Nicholls,
2005; Smith, 2014; Trautman, 2009). As one set of scholars state, “the value of formative peer
assessment is promoted in the exhortative literature…justified in the theoretical literature…and
supported by reports of experimental and qualitative research” (Kell & Annetts, 2009; Hyland et
al., 2018; Thomas et al., 2014).
Those early experiments led to dramatic breakthroughs in evidence-based practice in
teaching and learning and, by extension, changes in how these activities are evaluated. Since the
early 2000s, universities have responded to a growing imperative to assess teaching
effectiveness, both as a means of evaluating work performance and as a way of demonstrating
collective accountability for the student learning experience. An increasing number of studies
have linked effective instruction to desired institutional outcomes, including recruitment,
persistence, and graduation rates, upon the latter of which many funding models rest. Because
the drive towards accountability is fueled by student interests, it is perhaps not surprising that the
most common strategy for evaluating teaching is student evaluations of instruction (SETs). At
a typical U.S. university today, students are asked to complete an electronic survey at the end of
each semester comprised of a series of scaled survey items along with a handful of open-ended
questions.
Over the years, the use of SETs as a measure of teaching effectiveness has been both
affirmed and disputed (Seldin, 1993). The reliability of the practice has been strengthened
through increasing sophistication of both the design of the questions and the analysis of the
results. At the same time, however, it has also been questioned as the basis of personnel
decisions (Nilson, 2012; Nilson, 2013).
Although not definitively proven, there is a persistent perception that SETs are biased,
particularly in the case of faculty members from under-represented populations, including those
for whom English is a second language and, in some disciplines, women (Calsamiglia &
Loviglio, 2019; Zipser & Mincieli, 2019). Other scholars have called the validity of the results
into question, suggesting that students are not always capable of assessing their own learning
accurately or appropriately, leading to claims that SETs are more likely to measure popularity
rather than effectiveness (Schneider, 2013; Uttl et al., 2017). Perhaps the only safe and definitive
conclusion to draw is that the implications of the practice are complex and contested.
Higher education institutions have navigated these stormy waters in multiple ways, most
by encouraging the use of multiple forms of measurement for teaching effectiveness, often in the
form of a portfolio, or similar collection tool (Chism, 1999; Seldin et al., 2010). This practice is
supported by the research literature, which aligns the practice with the multi-faceted nature of
teaching as well as the importance of direct (e.g., not self-reported) measures of student learning.
To potentially counterbalance the limitations of SETs, practitioners have suggested the use of
PRT, which places disciplinary experts, rather than amateur students, in the driver’s seat. In this
evaluative mode, PRT typically takes the form of either peer review of instructional materials
and/or peer observation of teaching.
While PRT may appear to be a neat solution to a pervasive issue, the practice had
previously been used largely for formative purposes on a voluntary basis. The transition to
compulsory (or strongly encouraged) evaluative practice has proven to be fraught with dangers,
both philosophical and practical (Blackmore, 2005; Edström, 2019; Keig, 2006; McManus,
2002). Practically speaking, the PRT process requires a considerable investment of time, energy,
and attention, not only to conducting the reviews but also to developing shared standards and
practices. Philosophically, several scholars have predicted that several of the primary benefits of
PRT as a developmental tool might suffer when transposed into a summative context (Cavanagh,
1996; Gosling, 2002; Kell & Annetts, 2002; Morley, 2003; Peel, 2005). It has proven to be
difficult to substantiate these fears, however, as one of the downsides of utilizing summative
assessment is the challenges it presents to research.
The PRT problem is confounded by the rise of new modes of instruction, especially
online and hybrid modalities (Bennett & Barp, 2008; Jones & Gallen, 2016). Since its inception,
online education has carried with it a burden of accountability that traditional in-person
instruction has not, and the onus rests with online instructors to prove that the virtual learning
experience is of comparable quality to other modalities (Esfijani, 2018; Shelton, 2011). This has,
in turn, led to the development and refinement of shared quality standards for online courses
(notably, the Quality Matters (QM) rubric), the application and evaluation of which often rely on
the collective expertise of other online instructors, i.e., pedagogical (rather than disciplinary)
peers (Shattuck et al., 2014). The QM peer-review process, for example, designates two reviewer
roles, a subject matter expert and online pedagogy practitioner, the latter of whom undergoes a
QM-administered certification process.
The proliferation of online courses, however, has been accompanied by design and
implementation changes. Because it takes time and sustained engagement to master the
techniques and approaches needed to meet the quality standards for online courses, the role of
the instructional designer (ID) as expert in these areas has become increasingly commonplace. A
typical role for an ID might be to collaborate closely with faculty members to design and develop
online courses that effectively deliver content in a manner that meets (or exceeds) quality
standards. Once created, it is certainly possible for the same course to be taught by multiple
faculty members.
In a typical ID-faculty scenario, the faculty member often has considerable input on the
design as it evolves and provides primary instruction, but peer review of instruction is
complicated both by the medium and the role of the third party (the ID) (Drysdale, 2019). For
example, the observation protocols developed for the classroom may not apply to a virtual space,
at least not to the same degree, and a review of instructional strategies, as reflected in artifacts
such as the syllabus, may be the product of both the ID and/or the faculty member. It is perhaps
for these reasons that peer review of online instruction has tended to focus on the course rather
than the instructor. The Quality Matters rubric, for example, emphasizes attributes of course
design rather than teaching effectiveness. Yet, the need for evaluative measures of instruction
and instructor persists, perhaps even more so as trends point to a growing number of adjunct
faculty teaching online courses for whom such measures can provide both accountability and
professional development (Barnett, 2019; Taylor, 2017).
The challenge is further compounded by the emergence of instructional standards and/or
competencies for online (or hybrid) courses that are distinctive to the virtual environment, both
in form and context (Baran et al., 2011). The popular community of inquiry model, for example,
differentiates between cognitive presence (content and layout), social presence (engagement),
and teaching presence in online courses; all are facets of instruction that are less emphasized in
in-person instruction. These insights have led to the development of several exemplary protocols
specifically intended for reviewing online instruction (McGahan et al., 2015; Tobin et al., 2015).
Each of these tools is firmly grounded in an extensive body of evidence-based practice for
online teaching, but still, the handful of studies that have been conducted on the PRT process
itself have tended to be limited to case studies and/or action research (Barnard et al., 2015;
Swinglehurst et al., 2014; Sharma & Ling, 2018; Wood & Friedel, 2009). As one researcher put
it, it is simply “difficult to find quantitative evidence due to its nature and context” (Bell, 2002;
Peel, 2002).
The challenge of peer review of teaching is even further complicated by the increasing
use of the master course model (Hanbing & Mingzhuo, 2012; Knowles & Kalata, 2007). For
courses in which stakes are higher and student populations larger, such as gateway or barrier
courses, an institution may choose to adopt a master course model in which an already designed
course is provided to all instructors, thereby ensuring a consistent experience for all students
(Parscal & Riemer, 2010). In this scenario, instructors have little to no control over the content,
design, and, in many cases, delivery of the course, all of which serve as major components of
most peer review of instruction models, whether for online or in-person courses. However, even
within a master course model, instruction varies and opportunities remain to provide both
formative (for individual improvement) and summative (for performance evaluation) feedback.
Yet, the question of how to evaluate teaching within these boundaries is a subject that has
received less attention in both research and practice. Our study explores the implementation of a
peer review of teaching process for an online statistics program that uses master courses at a
large, public, research-intensive university.
Methods
Context
The Pennsylvania State University is a public research university located in the
northeastern part of the United States. The statistics program offers 24 online courses, with
approximately 1500 enrollments per semester, including those for its online graduate program
and two undergraduate service courses. Statistics courses have been identified as barrier courses
at many institutions, including this one. Therefore, the program at The Pennsylvania State
University bears the responsibility for high standards of instruction that contribute to student
success, especially persistence.
Each of the program’s 24 courses is based on a master template of objectives, content,
and assessments. The courses are delivered through two primary systems, the learning
management system (LMS) and the content management system (CMS). Each section has its
own unique LMS space for each iteration of the course. Students and instructors use the LMS for
announcements, communication/email, assessments, grading, discussion and any other
assignments or interactions. The lesson content for each course is delivered through a CMS,
which in this case has a public website whose content is classified as open educational resources
under a creative commons license. The CMS is unique to the course and is not personalized or
changed from semester to semester. Similarly, the lesson content, developed and written by
program faculty members, does not change from semester to semester, aside from minor fixes
and/or planned revisions.
Instructor agency in the LMS context varies depending on the course taught, how long
the instructor has taught it, and how many sections are offered in that semester. Instructors who
are teaching a course that has only one section have more agency to change appearance and
interactions within the LMS than instructors who are teaching a course with multiple sections. In
this statistics department, only one section of most of the online graduate courses is offered per
semester, while more than one section of undergraduate courses is typically offered. The largest
of these undergraduate courses is a high enrollment, general education requirement course that
runs 10-12 sections per semester. Courses with multiple sections use the same CMS as well as
the same master template in the LMS to maintain consistency in the student experience.
Therefore, in a single section course the instructor could modify the design of their course space
within the LMS by choosing their home page, setting the navigation, and organizing the modules
while still delivering the content and objectives as defined by the department for that course.
Such modifications are less likely to occur in multi-section courses. The following table
highlights the level of agency possessed by the instructor in both the CMS and LMS according to
the varied teaching contexts in this department.
Table 1
Levels of Instructor Agency in Various Course Types Offered
If the instructor teaches...              Content Management System (CMS)    Learning Management System (LMS)
Undergraduate, single section             Low                                High
Graduate, single section                  Low                                High
Undergraduate, multiple sections          Low                                Low
Graduate, multiple sections               Low                                Low
During the fall 2019 semester, the faculty members in the department who teach online courses
consisted of full-time teaching professors (n=13), tenure-track professors (n=6), and
adjuncts (n=10). Peer review of instruction has been practiced since the onset of the program. In
its current iteration, the process takes place annually over an approximately three-week period in
the fall semester. The primary purpose of the peer-review process is to offer formative feedback
to the instructors, but the results are shared with the assistant program director and faculty
members are permitted (though not required) to submit the results as part of their reappointment,
promotion, and tenure dossiers. For the fall 2019 semester, 27 of the 29 (93%) faculty members
participated in the peer-review process.
Peer Review of Instruction Model
In the fall of 2018, the instructional designer for these statistics courses piloted a new
peer-review rubric, which is a modification of the well-known Quality Matters Higher Ed rubric.
In this modification, 21 out of 42 review standards were determined to be applicable to the
instructors in the master course context. The rubric serves as the centerpiece of a two-part
process, in keeping with identified best practices (Eskey & Roehrich, 2013). First, the faculty
member completes a pre-observation survey and the reviewer, who is added to the course as an
instructor, evaluates the course according to each of the twenty-one standards in the rubric. The
observation is followed by a virtual, synchronous meeting with the peer-review partner. Faculty
members are paired across various teaching ranks and course levels, and the pairings are rotated
from year to year. Both the observation and the peer meeting are guided by materials created by
the instructional designer, who provides both the instructor intake form and two guiding
questions for discussion.
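The department's pairing procedure is not documented beyond the description above, but, as a purely hypothetical illustration of the rotation step, the short Python sketch below generates a new set of review pairs while avoiding any pairing used in the previous cycle. Rank and course-level balancing are omitted for brevity, and all names and the function itself are invented for this example.

import random

def make_pairs(instructors, previous_pairs, max_tries=1000):
    """Randomly pair instructors for this cycle, avoiding last cycle's pairings.

    instructors: list of names
    previous_pairs: set of frozensets, each holding two names paired last cycle
    """
    for _ in range(max_tries):
        pool = list(instructors)
        random.shuffle(pool)
        pairs = [pool[i:i + 2] for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:            # odd headcount: fold the leftover into a trio
            pairs[-1].append(pool[-1])
        if all(frozenset(p[:2]) not in previous_pairs for p in pairs):
            return pairs
    raise RuntimeError("No pairing found that avoids last cycle's partners")

# Hypothetical usage:
last_year = {frozenset(["Avery", "Blake"]), frozenset(["Casey", "Drew"])}
print(make_pairs(["Avery", "Blake", "Casey", "Drew", "Emery"], last_year))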
In keeping with evidence-based practice for online instruction, the first discussion prompt
addresses how the faculty establish social, cognitive, and teaching presence within their course.
Along with the prompt, definitions and examples of each type of presence are provided to the
instructor.
Discussion prompt 1 in the online statistics program peer-review guide:
Prompt #1: Share with your peer how you establish these three types of presence in your
course.
Notes: How does your peer establish these three types of presence in their course?
The second prompt provides an opportunity for the instructors to share changes or innovations
they have implemented within the past year.
Discussion prompt 2 in the online statistics program peer-review guide:
Prompt #2: Share with your peer if you are trying anything new this semester (or year)?
If yes, share your innovation or change you’ve made this semester (or year).
• Has the innovation or change been successful?
• What challenges have you had to work through?
• How could others benefit from what you’ve learned?
• What advice would you share with a colleague who is interested in trying
this or something similar?
Notes: What has your peer done this semester (or year) that is innovative or new for
them?
The process seeks to evaluate and promote not only quality standards through the rubric, but also
collegial discussion around innovation, risk-taking, and instructor presence.
Study Design
The IRB-approved study was originally intended to be a mixed methods study, in which
input from participating instructors, collected in the form of a survey, would be supplemented
with an analysis of the peer-review artifacts, especially the instructor intake form and the peer-review rubric (which includes the 2 discussion prompts). The instructors provided mixed
responses to the requests for use of their identifiable artifacts, which limits their inclusion in the
study, but the majority did choose to participate in the anonymous survey (14 out of 27, 54%)
which was administered in the Fall semester of 2019. The online survey, sent to instructors by a
member of the research team not associated with the statistics department, consisted of 11
questions, comprised of 1 check-all-that-apply item, 8 five-point Likert scale items, 1 yes/no item, and 3 open-ended questions.
Results
Quantitative Results
With the small sample size (n=13) we are limited to basic descriptive statistics to analyze
the results of the Likert questions. The most infrequently chosen category on the Likert scale of
this survey was “neither agree nor disagree” (n=10), while “somewhat agree” (n=37) was the
most frequently chosen. In looking at the responses to specific prompts, we note that the
statement with the highest score was The steps of the peer-review process were clear. For this
statement, 13/13 responded with somewhat agree or strongly agree (mode = “strongly agree”).
Consistent with our qualitative findings, the next highest scoring statement was The peer-review
process was collegial, where 12/13 responded with somewhat agree or strongly agree and one
responded as neither agree nor disagree (mode = “strongly agree”). The statement The peer-review process was beneficial to my teaching received the third highest rating with 10/13
respondents saying that they somewhat agree (n=7) or strongly agree (n=3) (mode = “somewhat
agree”).
We do want to note that consistent with best survey design practice, one of the statements
was purposely designed as a negative statement: The peer-review process was not worth the time
spent on doing it. For this prompt, 8/13 responded with strongly disagree or somewhat disagree,
while 3/13 somewhat agreed with that statement and 2 chose neither agree nor disagree (mode =
“strongly disagree”).
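As a hypothetical illustration of this kind of descriptive summary (this is not the study's analysis script; the file name, column names, and recoding step are assumptions), the Python sketch below tabulates frequency counts and modes for 5-point Likert items, reverse-coding the negatively worded statement so that higher values remain favorable before any aggregation.

import pandas as pd

likert_labels = {1: "strongly disagree", 2: "somewhat disagree",
                 3: "neither agree nor disagree",
                 4: "somewhat agree", 5: "strongly agree"}

# Hypothetical file: one row per respondent, one column per Likert statement, coded 1-5.
survey = pd.read_csv("peer_review_survey.csv")

# Reverse-code the negatively worded statement ("not worth the time spent").
survey["not_worth_time_recoded"] = 6 - survey["not_worth_time"]

for col in ["steps_clear", "collegial", "beneficial", "not_worth_time"]:
    counts = survey[col].value_counts().sort_index()
    mode_code = int(survey[col].mode().iat[0])
    print(f"{col}: mode = {likert_labels[mode_code]}")
    for code, n in counts.items():
        print(f"  {likert_labels[int(code)]}: {n}")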
Qualitative Results
The findings suggest that the participants operated under several constraints. When asked
how they assess student learning in the intake form, for example, the majority indicated that the
assessments are part of the master class and largely outside of their control, e.g. All… sections
have weekly graded discussion forums (might not be the same question), same HWs and same
exams. All instructors contribute for exams and HWs. Assessment of learning outcomes mainly
occur through these. This was evident both in the content and tone of their responses, with
passive voice predominating, e.g., quiz and exam questions are linked to lesson learning
objectives. The presence of constraint also came to the fore in the survey questions about
changes; for those who did make changes (6/11), these largely took the form of micro-innovations (e.g., so far just little things, small modifications), tweaks primarily focused on course policies (e.g., new late policy), enhancing instructor presence (e.g., try new introductions; I am using announcements more proactively), or fostering community (e.g., increasing discussion board posts, add netiquette statement).
Space for personal pedagogical agency and innovation is perceived as limited because of
the master course model employed in this context. As just discussed, this sentiment is evidenced by the tone of the
survey responses related to assessments. On the other hand, the instructor
intake form shows that instructors can innovate and experiment with those course components
that can be characterized broadly as relating to instructor presence, particularly regarding
communication in the course. There is a marked shift in the tone of response when asked, for
example, Please describe the nature and purpose of the communications between students and
instructors in this course. Responses to this question show agency and active involvement on the
part of the instructor in this aspect of the course:
I post announcements regularly and am in constant communication with the class. The
discussion forums have a fair bit of chatter and I have replied with video and images as
well there with positive feedback.
I respond very quickly to student correspondence. I use the course announcements
feature very often and check Canvas multiple times a day.
I would like to promote the use of the Discussion Boards more, but students still do not
use those as much as I would like them to.
In this last example, we see that the instructor is forward-looking and discusses changes that he
or she would like to make in the future. The data suggest that instructors are trying to make
space for their own unique contribution to the course and for more personalized choices in their
interactions with students. They are also eager to get feedback from their peers on practices that
fall into this space of agency:
I would appreciate any feedback on my use of course announcements. Do you feel that
they are appropriate in both content, frequency, and timing?
Our findings indicate that many of these instructors are operating within the constraints of a
master course model, as discussed earlier, and they are most enthusiastic in their responses and
innovative in their teaching when they can identify areas over which they can exert some degree
of control in the course design and delivery process.
As evidenced in the quantitative findings previously discussed, these qualitative findings
also tell us that instructors who participated in the survey appreciate the collegiality of the
process. Their open-ended responses indicate an appreciation of the collegiality and connection,
the informal learning, that the peer-review process afforded them. For example, one instructor
comments, “I have enjoyed the opportunity to discuss teaching ideas and strategies with other
online faculty. As a remote faculty member, I particularly value that interaction.” Responses
primarily indicate that participating in the process was overall appreciated for the sense of
community it helped to build. What we see emerge is another space—a space where instructors
can negotiate together the limitations for innovation that exist in this sequence of Statistics
courses, and where they can also share experiences. As one participant comments, The direct
communication with the peer is great for sharing positive and negative experiences with different
courses. As we see in our findings, faculty members clearly find value in the process, regardless
of the product. This insight suggests the presence of a lesser known third model, distinct from
either formative or evaluative formats, called collaborative PRT (Gosling, 2002; Keig &
Waggoner, 1995). In collaborative PRT, the end goal is to capture the benefits of turning
teaching from a public to a more collaborative activity (Hutchings, 1996).
Discussion
Our findings should not be overstated. This study was conducted for a single program at a
single university over the course of one semester; as such, the results may or may not be
replicable elsewhere. Replication may also be hindered by the challenges inherent in studying
peer review as a process. Because the results of peer review in this case may be used for
summative or evaluative purposes, any evidence generated is considered part of a personnel file
and, as such, subject to higher degrees of oversight in the ethical review process. The ethical
review board at The Pennsylvania State University, for example, did not classify this study as
exempt research, but rather put the proposal through full (rather than expedited or exempt) board
review, and required additional accountability measures. The evaluative nature of those
documents also contributed to low faculty participation (n=3) in the first stage of our study,
where we asked to include copies of their peer-review documents (an intake form, review rubric,
and meeting notes). There is a reason why there are comparatively few studies on peer review as
a process.
In the case of the statistics program, the primary rationale for establishing a peer review
of teaching process was intended to be formative assessment, i.e., providing feedback to
instructors so that they might improve the teaching and learning in online statistics courses. In
practice, however, the boundaries between formative and summative assessment blurred. While
instructors were not required or compelled to disclose the results of their peer review, many did
choose to include comments and/or ratings in their formal appointment portfolios, especially
when the only other evidence of teaching effectiveness (a primary criterion) available are student
evaluations of instruction (SETs). At The Pennsylvania State University, SETs are structured so
that students provide feedback on both the instructor and the course, at times separately and, at
other times, together. In a master course model, however, instructors have limited control over
many components of the course, making the results of student evaluations challenging to parse
out and potentially misleading if treated nominally or comparatively.
The distinction between formative and evaluative assessment is not the only blurred line
that arose from this study. In this case, peer review of instruction was accomplished with a
modified version of a tool (the QM rubric) intended to be used for the evaluation of an online
course. The modification of the QM rubric took the form of removing questions or sections
pertaining to course components deemed to be outside the control of the master course
instructors. In addition to the modified QM rubric, two supplemental items—open-ended
questions—were added to the review process. These items focused on presence and innovation,
which are difficult to assess through short-term observation. Our results suggest that this strategy
has led to partial success, i.e., the majority (10/13) of faculty members who responded to our
survey strongly or somewhat agreed that the process was beneficial, but its impact on teaching
practice has been limited. This may be partially a result of the limited scope of the study (one
academic year) which may or may not be an appropriate time frame for capturing changes to
teaching practice, but it may also stem from limitations in the current iteration of the peer-review
process itself.
If we look back over the history of peer review of instruction for online courses, a pattern
emerges in which first, an existing tool, developed for a different purpose or context, is
imported and adapted into a new environment. This occurred, for example, when peer
evaluation tools designed for in-person courses were adapted to suit online courses. In the next
stage, the adaptation process reveals limitations of the existing tool which, in turn, spur the
development of new instruments or processes that are specifically designed for the context in
which they are being used. The creation of the QM Rubric is a clear example of this latter step.
The findings of our study suggest that we may be on the cusp of this second stage for
peer review of teaching in online master courses, which constitutes a quite different teaching
environment than other types of courses, whether in-person or online. In the case of master
courses, there is a distinctive division of labor where, primarily, instructional designers work
with authors to develop courses, course leads manage content, and instructors serve as the
primary point of contact with students. It may be time to develop a new rubric (or similar tool)
that takes this increasingly popular configuration more into consideration.
Adoption of the master course model is fueled by the need for both efficiency and
consistency in the student learning experience, and both experience and research suggest that it
has been effective in serving these goals. That being said, like all models, it also has its
limitations. Our study suggests that one of those tradeoffs may be that the model constricts both
the space for and the drivers of change. Without being able to make changes to the master course
itself, the faculty in our study tried to find ways to make small changes, i.e., micro-improvements
in those areas over which they held agency. Larger or more long-term changes, on the other
hand, would need to come from instructional designers and program managers, who may be one
or even two steps removed from the direct student experience. Although instructors frequently
make suggestions for course improvements, large changes to courses are not frequently
implemented. In other words, the division of labor needed to support the master course model
also divides agency, and the challenge remains to find systematic ways to re-integrate that
agency in the service of continuous improvement.
The limitations on faculty agency inherent in the master course model have led some
institutions to further devalue the role, replacing faculty-led courses with lower-paid, less
recognized, and more easily interchangeable instructor roles (Barnett, 2019). Such a path would
be at odds with the culture of The Pennsylvania State University, but it does suggest the need for
faculty development, i.e., for finding ways to support and treat even part-time instructors as
valued and recognized members of the community of teaching and learning, even in conditions
where they may not be able to meet in person. It could be said that our findings affirm the
need for creating a sense of community online, both inside and outside of the courses, for the faculty
members who teach them. The experiences of our faculty members suggest that peer review can
be an integral part of departmental culture that supports faculty peer to peer engagement, leading
to a host of intangible benefits including trust, reciprocity, belonging, and, indeed, respect.
References
Baran, E., Correia, A. P., & Thompson, A. (2011). Transforming online teaching practice:
Critical analysis of the literature on the roles and competencies of online teachers. Distance
Education, 32(3), 421-439.
Barnard, A., Nash, R., McEvoy, K., Shannon, S., Waters, C., Rochester, S., & Bolt, S. (2015).
LeaD-in: a cultural change model for peer review of teaching in higher education. Higher
Education Research & Development, 34(1), 30-44.
Barnett, D. E. (2019). Full-range leadership as a predictor of extra effort in online higher
education: The mediating effect of job satisfaction. Journal of Leadership Education, 18(1).
Bennett, S., & Barp, D. (2008). Peer observation–a case for doing it online. Teaching in Higher
Education, 13(5), 559-570.
Blackmore, J. A. (2005). A critical evaluation of peer review via teaching observation within
higher education. International Journal of Educational Management, 19(3), 218-232.
Calsamiglia, C., & Loviglio, A. (2019). Grading on a curve: When having good peers is not
good. Economics of Education Review, 73(C).
Cavanagh, R. R. (1996). Formative and summative evaluation in the faculty peer review of
teaching. Innovative higher education, 20(4), 235-240.
Chism, N. V. N. (1999). Peer review of teaching. A sourcebook. Bolton, MA: Anker.
Drysdale, J. (2019). The collaborative mapping model: Relationship-centered instructional
design for higher education. Online Learning, 23(3), 56-71.
Eskey, M. T., & Roehrich, H. (2013). A faculty observation model for online instructors:
Observing faculty members in the online classroom. Online Journal of Distance Learning
Administration, 16 (2).
http://www.westga.edu/~distance/ojdla/summer162/eskey_roehrich162.html
Edström, K., Levander, S., Engström, J., & Geschwind, L. (2019). Peer review of teaching merits
in academic career systems: A comparative study. In Research in Engineering Education
Symposium.
Esfijani, A. (2018). Measuring quality in online education: A meta-synthesis. American Journal
of Distance Education, 32(1), 57-73.
Gosling, D. (2002). Models of peer observation of teaching. Report. LTSN Generic Center.
https://www.researchgate.net/profile/David_Gosling/publication/267687499_Models_of_Peer_Observation_of_Teaching/links/545b64810cf249070a7955d3.pdf
Graham, C., Cagiltay, K., Lim, B. R., Craner, J., & Duffy, T. M. (2001). Seven principles of
effective teaching: A practical lens for evaluating online courses. The Technology Source, 30(5),
50.
Hanbing, Y., & Mingzhuo, L. (2012). Research on master-teachers’ management model in online
course by integrating learning support. Journal of Distance Education, 5(10), 63-67.
Hutchings, P. (1996). Making teaching community property: A menu for peer collaboration and
peer review. AAHE Teaching Initiative.
Hutchings, P. (1996). The peer review of teaching: Progress, issues and prospects. Innovative
Higher Education, 20(4), 221-234.
Hyland, K. M., Dhaliwal, G., Goldberg, A. N., Chen, L. M., Land, K., & Wamsley, M. (2018).
Peer review of teaching: Insights from a 10-year experience. Medical Science Educator, 28(4),
675-681.
Johnson, G., Rosenberger, J., & Chow, M. (October 2014) The importance of setting the stage:
Maximizing the benefits of peer review of teaching. eLearn, 2014 (10).
https://doi.org/10.1145/2675056.2673801
Jones, M. H., & Gallen, A. M. (2016). Peer observation, feedback and reflection for development
of practice in synchronous online teaching. Innovations in Education and Teaching
International, 53(6), 616-626.
Keig, L. (2000). Formative peer review of teaching: Attitudes of faculty at liberal arts colleges
toward colleague assessment. Journal of Personnel Evaluation in Education, 14(1), 67-87.
Keig, L. W., & Waggoner, M. D. (1995). Peer review of teaching: Improving college instruction
through formative assessment. Journal on Excellence in College Teaching, 6(3), 51-83.
Kell, C., & Annetts, S. (2009). Peer review of teaching embedded practice or policy‐holding
complacency? Innovations in Education and Teaching International, 46(1), 61-70.
Knowles, E., & Kalata, K. (2007). A model for enhancing online course development. Innovate:
Journal of Online Education, 4(2).
Lomas, L., & Nicholls, G. (2005). Enhancing teaching quality through peer review of teaching.
Quality in Higher Education, 11(2), 137-149.
Mayes, R. (2011, March). Themes and strategies for transformative online instruction: A review
of literature. In Global Learn (pp. 2121-2130). Association for the Advancement of Computing
in Education (AACE).
McGahan, S. J., Jackson, C. M., & Premer, K. (2015). Online course quality assurance:
Development of a quality checklist. InSight: A Journal of Scholarly Teaching, 10, 126-140.
McManus, D. A. (2001). The two paradigms of education and the peer review of teaching.
Journal of Geoscience Education, 49(5), 423-434.
Nilson, L. B. (2012). 14: Time to raise questions about student ratings. To Improve the Academy,
31(1), 213-227.
Nilson, L. B. (2013). 17: Measuring student learning to document faculty teaching effectiveness.
To Improve the Academy, 32(1), 287-300.
Nogueira, I. C., Gonçalves, D., & Silva, C. V. (2016). Inducing supervision practices among
peers in a community of practice. Journal for Educators, Teachers and Trainers, 7, 108-119.
Parscal, T., & Riemer, D. (2010). Assuring quality in large-scale online course development.
Online Journal of Distance Learning Administration, 13(2).
Peel, D. (2005). Peer observation as a transformatory tool? Teaching in Higher Education, 10(4),
489-504.
Roy, R., & Ashburn, J. R. (2001). The perils of peer review. Nature, 414(6862), 393-394.
Schneider, G. (2013, March). Student evaluations, grade inflation and pluralistic teaching:
Moving from customer satisfaction to student learning and critical thinking. Forum for Social
Economics, 42(1), 122-135.
Seldin, P. (1993). The use and abuse of student ratings of professors. Chronicle of Higher
Education, 39(46), A40-A40.
Seldin, P., Miller, J. E., & Seldin, C. A. (2010). The teaching portfolio: A practical guide to
improved performance and promotion/tenure decisions. John Wiley & Sons.
Sharma, M., & Ling, A. (2018). Peer review of teaching: What features matter? A case study
within STEM faculties. Innovations in Education and Teaching International, 55(2), 190-200.
Shattuck, K., Zimmerman, W. A., & Adair, D. (2014). Continuous improvement of the QM
Rubric and review processes: Scholarship of integration and application. Internet Learning
Journal, 3(1).
Shelton, K. (2011). A review of paradigms for evaluating the quality of online education
programs. Online Journal of Distance Learning Administration, 4(1), 1-11.
Smith, S. L. (2014). Peer collaboration: Improving teaching through comprehensive peer review.
To Improve the Academy, 33(1), 94-112.
Swinglehurst, D., Russell, J., & Greenhalgh, T. (2008). Peer observation of teaching in the online
environment: an action research approach. Journal of Computer Assisted Learning, 24, 383-393.
Taylor, A. H. (2017). Intrinsic and extrinsic motivators that attract and retain part-time online
teaching faculty at Penn State (Doctoral dissertation, The Pennsylvania State University).
Thomas, S., Chie, Q. T., Abraham, M., Jalarajan Raj, S., & Beh, L. S. (2014). A qualitative
review of literature on peer review of teaching in higher education: An application of the SWOT
framework. Review of Educational Research, 84(1), 112-159.
Tobin, T. J., Mandernach, B. J., & Taylor, A. H. (2015). Evaluating online teaching:
Implementing best practices. San Francisco, CA: John Wiley & Sons.
Trautmann, N. M. (2009). Designing peer review for pedagogical success. Journal of College
Science Teaching, 38(4).
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching
effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies
in Educational Evaluation, 54, 22-42.
Wood, D., & Friedel, M. (2009). Peer review of online learning and teaching: Harnessing
collective intelligence to address emerging challenges. Australasian Journal of Educational
Technology, 25(1).
Zipser, N., & Mincieli, L. (2018). Administrative and structural changes in student evaluations of
teaching and their effects on overall instructor scores. Assessment & Evaluation in Higher
Education, 43(6), 995-1008.
Appendix A
Anonymous Survey Questions
Likert Questions [1-8]
Answer options: Strongly disagree / Somewhat disagree / Neither agree nor disagree / Somewhat agree / Strongly agree
1. The peer-review process was beneficial to my teaching.
2. The peer-review process was beneficial to my career development.
3. The peer-review process was not worth the time spent on doing it.
4. The peer-review process was collegial.
5. The peer-review process provided me with new insight into my teaching practice.
6. The peer-review process inspired me to try new things related to my teaching.
7. The steps of the peer-review process were clear.
8. I have little to no prior experience with peer review of online teaching.
Open-ended Questions [9-11]
9. Did you make (or do you plan to make) changes to your instruction based on your participation in this peer-review process (e.g., feedback you received, conversations with your peers, rubrics, etc.)? (Y/N)
a. If Y, please describe the change(s) you plan to make to your instruction based on the feedback you received through the peer-review process.
10. Please describe at least two insights gained from your participation in the peer-review process.
11. What changes, if any, would you suggest should be made to enhance the benefits of the peer-review process?
Appendix B
Instructor Intake Form Questions
Your information
1. What is your name?
2. What is your e-mail address?
3. Who is your assigned peer reviewer?
Your Online Course
4. What is your course name, number & section (e.g., STAT 500 001)?
5. What is the title of your course (e.g., Applied Statistics)?
6. What is the Canvas link to your course?
7. What is the link to the online notes in your course?
Context
8. How many semesters have you taught this course? Choose: (0-3) (4-6) (6 or more)
9. Does your course have multiple sections?
10. If yes, are all sections based on a single master (or another instructor’s) course?
11. If yes, roughly what percentage of the course do you change or personalize from the master?
12. How do you know if students are meeting the learning outcomes of your course?
13. Is there a specific part of the course content or design for which you would like the reviewer to
provide feedback?
14. Please describe the nature and purpose of the communications between students and instructors in
this course.
15. Are you trying anything new this semester based on prior student or peer feedback, professional
development, or your own experiences?
16. If yes, please explain.
Canvas Communication
17. Please identify other communications among students and instructors about which the
Reviewer should be aware, but which are not available for review at the sites listed above.
18. Does the course require any synchronous activities (same time, same place)?
___Yes
___No
19. If yes, please describe:
20. Is there any other information you would like to share with your peer before they review your
course?
Does peer feedback for teaching GPs improve student
evaluation of general practice attachments? A pre-post
analysis
Abstract
Objectives: The extent of university teaching in general practice is increasing and is in part realised with attachments in resident general
practices. The selection and quality management of these teaching
practices pose challenges for general practice institutes; appropriate
instruments are required. The question of the present study is whether
the student evaluation of an attachment in previously poorly evaluated
practices improves after teaching physicians have received feedback
from a colleague.
Methods: Students in study years 1, 2, 3 and 5 evaluated their experiences in general practice attachments with two 4-point items (professional supervision and recommendation to other students). Particularly poorly evaluated teaching practices were identified. A practising GP with experience in teaching and research gave these practices personal feedback on their evaluation results (peer feedback), mainly in the form of individual discussions in the practice (peer visit). After this intervention, further attachments took place in these practices. The influence of the intervention (pre/post) on student evaluations was estimated with generalised estimating equations (cluster variable: practice).
Results: Of 264 teaching practices, 83 had a suboptimal rating. Of
these, 27 practices with particularly negative ratings were selected for
the intervention, of which 24 have received the intervention so far. There were no
post-evaluations for 5 of these practices, so that data from 19 practices
(n=9 male teaching physicians, n=10 female teaching physicians) were
included in the present evaluation. The evaluations of these practices
were significantly more positive after the intervention (by n=78 students)
than before (by n=82 students): odds ratio 1.20 (95% confidence interval 1.10-1.31; p<.001).
Conclusion: The results suggest that university institutes of general
practice can improve student evaluation of their teaching practices via
individual collegial feedback.
Michael Pentzek1, Stefan Wilm1, Elisabeth Gummersbach1
1 Heinrich Heine University Düsseldorf, Medical Faculty, Centre for Health and Society (chs), Institute of General Practice (ifam), Düsseldorf, Germany
Keywords: general practice, teacher training, feedback, medical
students, undergraduate medical education, evaluation
Introduction
The German “Master Plan Medical Studies 2020”
provides for a strengthening of the role of general practice
in the curriculum [1]. One form of implementation desired
by students and teachers is attachments in practices
early and continuously in the course of studies [2]. Beyond
pure learning effects, the experiences that students have in
these attachments can help shape a professional orientation. Good experiences in attachments can increase interest in general practice as a discipline and profession
[3], [4].
In accordance with the Medical Licensing Regulations
[https://www.gesetze-im-internet.de/_appro_2002/
BJNR240500002.html], students in the Düsseldorf
medical curriculum complete attachments in general practices in academic years 1, 2, 3 and 5, lasting a total of six weeks in all [https://www.medizinstudium.hhu.de].
The requirements of the attachments build on each other
in terms of content; initially the focus is on anamnesis
and physical examination, later more complex medical
contexts and considerations for further diagnostics and
therapy are added. Under the supervision of the resident
teaching general practitioners (GPs), the students can gain
experience in doctor-patient interaction. An important
and therefore repeatedly emphasised factor for a positive
student perception of the attachments is the fact that
the students are given the opportunity to work independently with patients during the attachment in order to be
able to directly experience themselves in the provider
role [2], [5]. The attitude and qualifications of the teaching
physicians continue to play an important role in the didactic success of the attachments [3]. About 2/3 of the teaching practices are rated very good by the students, but about 1/3 are not. Due to the increasing demand for attachments in general practices since the introduction of the new curriculum, many teaching practices have been newly recruited; a feedback culture is now
being established. A first step was the possibility for
teaching practices to actively request their written evaluation results, but this was almost never taken up. The
next step of establishing a feedback strategy is reported
here: One way to improve teaching performance is to receive feedback from an experienced colleague (peer
feedback) [6]. This can generate insights that student
evaluations alone cannot achieve and is increasingly recognised as a complement to student feedback. In personal peer feedback, ideas can be exchanged, problems
discussed, strategies identified and concrete approaches
to improvement found [7]. Potential effects include increased awareness and focus of the teaching physician
on the teaching situation in practice, more information
about what constitutes good teaching, motivation to be
more interactive and student-centred, and inspiration to
use new teaching methods [8]. Pedram et al. found positive effects on teacher behaviour after peer feedback,
especially in terms of shaping the learning atmosphere
and interest in student understanding [9]. The application
of peer feedback to the setting described here has not
yet been investigated. The research question of the
present study is whether the student attachment evaluation of previously poorly rated GPs improves after peer
feedback has been conducted.
Methods
Teaching practices
The data were collected during the 4 attachments in
GP practices [https://www.uniklinik-duesseldorf.de/
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/lehre], all of which take place in
teaching practices coordinated by the Institute of General
Practice. Before taking up their teaching activity, all
teaching GPs are informed verbally and in writing about
the collection of student evaluations and a personal interview with an institute staff member in case of poor evaluation results.
Interested doctors take part in a 2-3 hour information
session led by the institute director (SW) before taking
up a teaching GP position, in which they are first informed
about the prerequisites for teaching students in their
practices. These include, among other things, the planning
of time resources for supervising students in the attachments, enthusiasm for working as a GP, acceptance of
the university’s teaching objectives in general practice
(in particular that interns are allowed to work independently with patients) and participation in at least two of
the eight didactic trainings offered annually by the institute (with the commencement of the teaching activity,
the institute assumes the acceptance of these prerequisites on the part of the teaching physician, but does not
formally check that they are met). This is followed by detailed information on the structure of the curriculum, the
position of the attachments, the contents and requirements of the individual attachments and basic didactic
aspects of 1:1 teaching. Information about the student
evaluation of the attachment is provided verbally and in
writing, combined with the offer to actively request both
an overall evaluation and the individual evaluation by email. There is no unsolicited feedback of the evaluation
results to the practices. After the information event, a
folder with corresponding written information is handed
out.
Before each attachment, the teaching physicians are sent
detailed material so that they can orient themselves once
again. This contains information on the exact course of
the attachment, on the current learning status of the
students incl. enclosure of or reference to the underlying
didactic materials, on the tasks to be worked on during
the attachment and the associated learning objectives,
on the relevance of practising on patients as well as a
note on the attitude of wanting to convey a positive image
of the GP profession to the students.
In addition, each student receives a cover letter to the
teaching physician in which the most important points
mentioned above are summarised once again.
Evaluation
Student evaluation as a regular element of teaching
evaluation [https://www.medizin.hhu.de/studium-undlehre/lehre] was carried out by independent student
groups before and after the intervention. It consisted,
among other things, of the opportunity for free-text comments, an indication of the number of patients personally
examined and the items “How satisfied were you with the
professional supervision by your teaching physician?”
and “Would you recommend this teaching practice to
other fellow students?” (both with a positively ascending
4-point scale).
Selection of practices for the intervention
Since most practices received a very good evaluation
(skewed distribution), three groups were identified as
follows. From all the institute’s teaching practices involved
in the attachments, those were first selected that had a
lower than very good evaluation (=“suboptimal”): rated
<2 at least once on at least one of the two above-mentioned items or repeatedly received negative free text
comments. From this group of suboptimal (=less than
very good) practices, those with more than two available
student evaluations, continued teaching and particularly
negative evaluations were selected: at least twice with
<2 on at least one of the two items or repeated negative
free text comments. Of the 27 practices, 24 practices
(88.9%) have so far received an intervention to improve
their teaching from a peer (n=3 not yet due to the pandemic), and 19 practices (70.4%) provided evaluation
results from post-intervention attachments (n=5 had no
attachments after the intervention). To characterise the
three groups of very well, suboptimal and poorly evaluated
(=selected) practices, an analysis of variance including
post-hoc Scheffé tests was calculated with the factor
group and the dependent variable evaluation result.
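As an illustration only, the two-step selection rule described above can be expressed over a per-evaluation table roughly as in the following Python sketch. The data layout and the column names (practice_id, supervision, recommendation, negative_comment, still_teaching) are assumptions made for this sketch, not the institute's actual data structure.

# Minimal sketch of the practice-selection rule described above.
# All column names are hypothetical; thresholds follow the text:
# "suboptimal" = at least once rated <2 on one of the two items, or repeated negative comments;
# "selected"   = suboptimal, more than two evaluations, continued teaching, and at least
#                twice <2 on an item or repeated negative comments.
import pandas as pd

def classify_practices(evals: pd.DataFrame, practices: pd.DataFrame) -> pd.DataFrame:
    # Per-evaluation flag: either 4-point item rated below 2.
    evals = evals.assign(
        below_2=(evals["supervision"] < 2) | (evals["recommendation"] < 2)
    )
    per_practice = (
        evals.groupby("practice_id")
        .agg(
            n_evals=("below_2", "size"),
            n_below_2=("below_2", "sum"),
            n_negative_comments=("negative_comment", "sum"),
        )
        .reset_index()
    )
    merged = practices.merge(per_practice, on="practice_id", how="left").fillna(
        {"n_evals": 0, "n_below_2": 0, "n_negative_comments": 0}
    )
    merged["suboptimal"] = (merged["n_below_2"] >= 1) | (
        merged["n_negative_comments"] >= 2
    )
    merged["selected"] = (
        merged["suboptimal"]
        & (merged["n_evals"] > 2)
        & merged["still_teaching"]
        & ((merged["n_below_2"] >= 2) | (merged["n_negative_comments"] >= 2))
    )
    return merged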
Intervention
Peer feedback was implemented as part of the didactic concept in particularly negatively evaluated teaching practices [https://www.uniklinik-duesseldorf.de/patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/didaktik-fortbildungen]: a GP staff member of the Institute of General Practice (EG), known to the teaching physicians and experienced in practice and teaching, reported the student evaluations back to the teaching physicians. The primary mode was a personal visit to the practice (peer visit) [10]. For organisational reasons, group discussions with several teaching physicians and written feedback occasionally had to be offered as alternatives. Peer visits and group discussions were both aimed at reflecting on one's own motivation for and problems with teaching. This was followed by a discussion of the personal evaluation in order to enter into a constructive exchange between teaching GP and university with regard to teaching and dealing with students in the practice. Peer visits and group discussions were recorded. The opening question was "Why are you a teaching doctor?", followed by questions about personal experiences: "Can you tell me about your experiences? What motivates you to be a teaching physician? Are there any problems from your point of view?" The (poor) feedback was then addressed and discussed, followed by the question "What can we do to support you?". The written feedback consisted of an uncommented return of the student evaluation results (scores and free texts).
Analyses
Due to a strong correlation between the two evaluation items (Spearman's rho=0.79), they were averaged into an overall evaluation for the present analyses. In order to determine multivariable influences on this student evaluation, a generalised estimating equation (GEE) was calculated with the cluster variable "practice"; because of the lack of a normal distribution (Kolmogorov-Smirnov test p<.001), a gamma distribution with log link was used. The following were included as potential influence variables: intervention effect (pre/post), intervention mode (peer visit vs. group/written), time of attachment (study year), and number of patients seen in person per week. In parallel to this analysis, the intervention effect on the number of personally supervised patients was examined in a second GEE.
The free texts in the student evaluations as well as the teachers' comments in the peer visits and group discussions were processed qualitatively using content analysis in order to outline the underlying problems and the teachers' reactions to the feedback in addition to the pure numbers. For this purpose, inductive category development was carried out on the material [11]. The numbers of negative student comments before and after the intervention were also compared quantitatively.
Results
Teaching practices and pre-evaluations
264 teaching practices with a total of 1648 attachments
were involved. Of these, 181 practices (68.6%) with 1036
attachments were rated very good (student evaluation
mean 3.8±standard deviation 0.2), 56 practices (21.2%)
with 453 attachments were rated suboptimal (3.3±0.4)
and 27 practices (10.2%) with 159 attachments were
rated very poor (2.8±0.4). The overall comparison of the three groups shows significant differences (F(df=2)=205.1; p<.001), with significant differences in all post-hoc comparisons (all p<.001): very good vs. suboptimal (mean difference 0.51; standard error 0.04); very good vs. poor (1.09; 0.06); suboptimal vs. poor (0.58; 0.07).
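The group comparison reported above (one-way ANOVA with Scheffé post-hoc tests on the mean evaluations of the very good, suboptimal and poor practice groups) could be reproduced on per-attachment scores roughly as in the following Python sketch. The group arrays are placeholders rather than the study data, and the Scheffé pairwise test is written out from its standard definition rather than taken from the authors' software.

# Minimal sketch: one-way ANOVA plus Scheffé post-hoc pairwise comparisons.
# Input: a dict mapping group names to arrays of evaluation scores (hypothetical data).
from itertools import combinations
import numpy as np
from scipy import stats

def anova_with_scheffe(groups):
    names = list(groups)
    samples = [np.asarray(groups[name], dtype=float) for name in names]
    f_stat, p_value = stats.f_oneway(*samples)  # omnibus test

    n_total = sum(len(s) for s in samples)
    k = len(samples)
    # Pooled within-group variance (mean square within) for the post-hoc tests.
    ms_within = sum((len(s) - 1) * s.var(ddof=1) for s in samples) / (n_total - k)

    posthoc = {}
    for i, j in combinations(range(k), 2):
        a, b = samples[i], samples[j]
        diff = a.mean() - b.mean()
        se = np.sqrt(ms_within * (1 / len(a) + 1 / len(b)))
        # Scheffé criterion: squared standardised difference divided by (k - 1),
        # referred to the F(k - 1, N - k) distribution.
        f_pair = (diff / se) ** 2 / (k - 1)
        posthoc[(names[i], names[j])] = (diff, se, stats.f.sf(f_pair, k - 1, n_total - k))
    return f_stat, p_value, posthoc

# Example call with hypothetical score arrays:
# anova_with_scheffe({"very good": vg_scores, "suboptimal": so_scores, "poor": po_scores})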
Table 1 describes the analysis sample of n=19 out of the
27 poorly rated practices in more detail.
Reasons for a poor evaluation according to free texts of
the student evaluation can be presented in five categories. For example, the lack of opportunity to practise on
patients was criticised.
“Unfortunately, I did not have the opportunity to examine many patients myself during my last patient attachment, although I requested this on several occasions.”
(about practice ID 1)
There were also comments about lack of appreciation
and difficult communication:
“The teaching doctor has little patience especially
with foreign patients who cannot understand anatomical or medical terms. She makes insulting and ironic
statements. With some patients I was left alone for
30 minutes while with others only 2 and afterwards
she got annoyed when I was not done with the examination/anamnesis.” (about practice ID 14)
Some teaching physicians were commented on with regard to their didactic competence:
“[…] as a teaching doctor, I experienced him as little
to not at all competent and also very disinterested.
He had no idea of what PA1 [Patient Attachment 1]
was supposed to teach us and even after several approaches to him on my part, he understood little of
what I was about or what I was supposed to learn
there.” (about practice ID 22)
Practice procedures and structures were mentioned
which, according to the students, made it difficult to carry
out the attachment efficiently:
Table 1: Characteristics of the analysis sample
“From 8-11 am only patients come for blood collection, fixed appointments are not scheduled during
that time. As I was not allowed to take blood or vaccinate, there was nothing for me to do during that
time.” (about practice ID 10)
In some practices with primarily non-German-speaking
patients and also staff (incl. teaching physician), the
language barrier turned out to be a problem in the evaluations.
“As the teaching doctor is [nationality XY], about 70%
of the consultations were in [language XY].” (about
practice ID 2)
Intervention
In the protocols of the peer visits and group discussions
with the teaching physicians, four categories of problems
emerge, which partly mirror the student comments mentioned above: For example, the teaching physicians reported concerns about letting students work alone with patients (the following are quotes from the protocols of the
intervening peer doctor).
“He finds it difficult to leave students alone. [...] He
thinks the patients don’t like it that way, although his
experience is actually different. Also has many patients from management. “Students are also too short
in practice.”” (reg. ID 17)
A sceptical attitude towards lower semester students in
particular was also expressed.
“Can’t do anything with the 2nd semesters, “they can’t
do anything, there's no point in letting them listen to
the heart if they don't know the clinical pictures”. [...]
“The problem is also that they are always very young
girls now.”” (reg. ID 24)
Some teaching physicians were not familiar with the didactic concepts and materials of the practical courses.
“He has no knowledge of teaching, doesn't read
through anything. Doesn’t know he is being evaluated
either.” (reg. ID 6)
In some cases, a self-image as a teaching general practitioner leads to the definition of one’s own attachment
content, neglecting or devaluing the learning objectives
set by the university.
““I’ve made a commitment to general practice and I
want to pass that on”. Explains a lot to students, but
doesn’t let them do much. “I show young people the
right way. Nobody else does it (the university certainly
doesn’t), so I do it.”” (reg. ID 4)
“However, clearly wants to show the students
everything, repeatedly mentions ultrasound, blood
sampling, does not know teaching content, makes
his own teaching content: “I show them everything of
interest””. (reg. ID 22)
At several points, the teaching physicians expressed intentions to change their behaviour, e.g. according to the
minutes, “wants to guide students more to examination”
or “says he wants to read through the handouts in future”.
The majority of the teaching physicians showed a basic
interest and commitment in supervising the students.
Most were able to reflect on the points of criticism.
Pre-post analysis
The intervention effect on the student evaluation is significant and independent of the (also significant) influence
of the number of patients (see Table 2).
The intervention effect on the number of patients personally cared for by students also persisted in a GEE (odds
ratio 1.41; 95% confidence interval 1.21-1.64; p<.001),
regardless of the type of intervention and study year
(analysis not shown).
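A model of the kind described in the Analyses section (gamma distribution, log link, observations clustered within practices, with pre/post, intervention mode, study year and patient numbers as covariates) could be fitted with statsmodels roughly as in the following Python sketch. The data frame and its column names are hypothetical; exponentiating the coefficients and their confidence limits yields multiplicative effect estimates of the kind reported above.

# Minimal sketch of a GEE comparable to the one described in the Analyses section.
# The data frame 'df' and its columns (overall_eval, post, mode_peer_visit,
# study_year, n_patients, practice_id) are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_attachment_gee(df: pd.DataFrame):
    model = sm.GEE.from_formula(
        "overall_eval ~ post + mode_peer_visit + C(study_year) + n_patients",
        groups="practice_id",  # cluster variable: teaching practice
        data=df,
        family=sm.families.Gamma(link=sm.families.links.Log()),  # gamma distribution, log link
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    result = model.fit()
    # With a log link, exp(coefficient) is a multiplicative effect on the expected
    # evaluation score; exponentiated confidence limits give the corresponding CI.
    effects = np.exp(result.params)
    conf_int = np.exp(result.conf_int())
    return result, effects, conf_int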
Table 2: Multivariable influences on the dependent variable “student evaluation of GP attachment” (generalised estimating
equation (GEE) with cluster variable practice)
Table 3: Number of students’ comments on attachments in 19 poorly evaluated GP teaching practices
The proportion of critical comments in the students' free-text comments decreases overall and in four of the five categories mentioned (see Table 3).
Discussion
In a pre-post comparison of poorly evaluated teaching
physicians who supervised students in the context of GP
attachments, peer feedback by a general practitioner had
a positive effect on student evaluation and on the number
of patients personally examined by students during the
attachment. This is reflected in the evaluation scores and
also in the fact that corresponding negative free-text
comments by the students were less frequent after the
intervention.
In line with the literature, it was crucial for student evaluation that students were given the opportunity to work
independently with patients in order to experience
themselves directly in the provider role [2], [5]. Also independent of the number of patients, student evaluation
improved after the intervention: The qualitative results
provide evidence that the teaching physicians may have
been more closely engaged with the meaning of the attachments, the learning objectives and didactic materials
after the intervention. This in turn also seemed to have
had positive effects on the exchange and relationship
between the teaching physician and the student (possibly
in the sense of an alignment of mutual expectations), which are also important elements of a positive attachment experience [3], [12]. The qualitative results on didactic competence and attitude indicate that, at least for the small
group of previously poorly evaluated teaching physicians
studied here, a more intensive consideration of their
teaching assignment and repeated interaction between
the university and the teaching practice is required in
order to internalise contents and concepts and to implement them in the attachments for students in a recognisable and consistent manner. The fact that it is precisely
the poorly evaluated teaching physicians who tend to
rarely attend the meetings at the university (offered eight
times a year in Düsseldorf) is an experience also reported
by many other locations. The formal review of the prerequisites and criteria for an appropriate teaching GP
position would involve an enormous amount of effort
given the high number of teaching practices required –
especially in a curriculum constructed along the lines of
longitudinal general practice. However, it must be weighed
up whether more resources should be invested in the
selection and qualification of practices interested in
teaching or in quality control and training of practices
already teaching.
A strength of this study is the evaluations by independent
student groups pre-post, so that biases due to repeated
exposure of students to a practice (e.g. response shift
bias, habituation, observer drift) are excluded. The
weakness associated with the pre-post design without a
control group and the focus on poorly evaluated practices
is, among other things, the phenomenon of regression
to the mean, which presumably accounts for part of the
positive intervention effect. The primary research question
of this study is formulated and answered quantitatively;
we report only limited qualitative results. These allow only
partial hypothesis-generating insights into the exact
mechanisms of peer feedback [13]. In the present study,
several modes of delivering the peer feedback were realised. Since the analyses do not indicate different effects of the personnel- and time-intensive peer visit on
the one hand and the more efficient methods of group
discussion and written feedback on the other, further
studies are necessary to differentiate before a broader
implementation. For example, Rüsseler et al. [14] found
that written peer feedback – albeit in relation to lecturers
– had positive effects on the design of the course.
Conclusions
It makes sense to further consider the effects of feedback to teaching physicians in both research and teaching. The comprehensive GMA recommendations provide a robust framework for teaching [15] and for the didactic qualification of teaching physicians [16]. Embedded in this, collegial peer feedback for poorly rated teaching physicians represents a possible tool for quality management of general practice teaching.
Competing interests
The authors declare that they have no competing interests.
References
1. Bundesministerium für Bildung und Forschung. Masterplan Medizinstudium 2020. Berlin: Bundesministerium für Bildung und Forschung; 2017. Available from: https://www.bmbf.de/files/2017-03-31_Masterplan%20Beschlusstext.pdf
2. Wiesemann A, Engeser P, Barlet J, Müller-Bühl U, Szecsenyi J. Was denken Heidelberger Studierende und Lehrärzte über frühzeitige Patientenkontakte und Aufgaben in der Hausarztpraxis? Gesundheitswesen. 2003;65(10):572-578. DOI: 10.1055/s-2003-42999
3. Grunewald D, Pilic L, Bödecker AW, Robertz J, Althaus A. Die praktische Ausbildung des medizinischen Nachwuchses - Identifizierung von Lehrpraxen-Charakteristika in der Allgemeinmedizin. Gesundheitswesen. 2020;82(07):601-606. DOI: 10.1055/a-0894-4556
4. Böhme K, Sachs P, Niebling W, Kotterer A, Maun A. Macht das Blockpraktikum Allgemeinmedizin Lust auf den Hausarztberuf? Z Allg Med. 2016;92(5):220-225. DOI: 10.3238/zfa.2016.0220-0225
5. Gündling PW. Lernziele im Blockpraktikum Allgemeinmedizin - Vergleich der Präferenzen von Studierenden und Lehrärzten. Z Allg Med. 2008;84:218-222. DOI: 10.1055/s-2008-1073148
6. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME Guide No. 8. Med Teach. 2006;28(6):497-526. DOI: 10.1080/01421590600902976
7. Garcia I, James RW, Bischof P, Baroffio A. Self-Observation and Peer Feedback as a Faculty Development Approach for Problem-Based Learning Tutors: A Program Evaluation. Teach Learn Med. 2017;29(3):313-325. DOI: 10.1080/10401334.2017.1279056
8. Gusic M, Hageman H, Zenni E. Peer review: a tool to enhance clinical teaching. Clin Teach. 2013;10(5):287-290. DOI: 10.1111/tct.12039
9. Pedram K, Brooks MN, Marcelo C, Kurbanova N, Paletta-Hobbs L, Garber AM, Wong A, Qayyum R. Peer Observations: Enhancing Bedside Clinical Teaching Behaviors. Cureus. 2020;12(2):e7076. DOI: 10.7759/cureus.7076
10. O'Brien MA, Rogers S, Jamtvedt G, Oxman AD, Odgaard-Jensen J, Kristoffersen DT, Forsetlund L, Bainbridge D, Freemantle N, Davis DA, Haynes RB, Harvey EL. Educational outreach visits: effects on professional practice and health care outcomes. Cochrane Database Syst Rev. 2007;2007(4):CD000409. DOI: 10.1002/14651858.CD000409.pub2
11. Kruse J. Qualitative Interviewforschung. 2. Aufl. Weinheim: Beltz Juventa; 2015.
12. Koné I, Paulitsch MA, Ravens-Taeuber G. Blockpraktikum Allgemeinmedizin: Welche Erfahrungen sind für Studierende relevant? Z Allg Med. 2016;92(9):357-362. DOI: 10.3238/zfa.2016.0357-0362
13. Raski B, Böhm M, Schneider M, Rotthoff T. Influence of the personality factors rigidity and uncertainty tolerance on peer-feedback. In: 5th International Conference for Research in Medical Education (RIME 2017), 15.-17. March 2017, Düsseldorf, Germany. Düsseldorf: German Medical Science GMS Publishing House; 2017. P15. DOI: 10.3205/17rime46
14. Ruesseler M, Kalozoumi-Paizi F, Schill A, Knobe M, Byhahn C, Müller MP, Marzi I, Walcher F. Impact of peer feedback on the performance of lecturers in emergency medicine: a prospective observational study. Scand J Trauma Resusc Emerg Med. 2014;22:71. DOI: 10.1186/s13049-014-0071-1
15. Huenges B, Gulich M, Böhme K, Fehr F, Streitlein-Böhme I, Rüttermann V, Baum E, Niebling WB, Rusche H. Recommendations for Undergraduate Training in the Primary Care Sector - Position Paper of the GMA-Primary Care Committee. GMS Z Med Ausbild. 2014;31(4):Doc35. DOI: 10.3205/zma000927
16. Böhme K, Streitlein-Böhme I, Baum E, Vollmar HC, Gulich M, Ehrhardt M, Fehr F, Huenges B, Woestmann B, Jendyk R. Didactic qualification of teaching staff in primary care medicine - a position paper of the Primary Care Committee of the Society for Medical Education. GMS J Med Educ. 2020;37(5):Doc53. DOI: 10.3205/zma001346
Corresponding author:
PD Dr. rer. nat. Michael Pentzek
Heinrich Heine University Düsseldorf, Medical Faculty,
Centre for Health and Society (chs), Institute of General
Practice (ifam), Moorenstr. 5, Building 17.11, D-40225
Düsseldorf, Germany, Phone: +49 (0)211/81-16818
mp@hhu.de
Please cite as
Pentzek M, Wilm S, Gummersbach E. Does peer feedback for teaching
GPs improve student evaluation of general practice attachments? A
pre-post analysis. GMS J Med Educ. 2021;38(7):Doc122.
DOI: 10.3205/zma001518, URN: urn:nbn:de:0183-zma0015182
This article is freely available from
https://www.egms.de/en/journals/zma/2021-38/zma001518.shtml
Received: 2021-03-03
Revised: 2021-08-12
Accepted: 2021-08-17
Published: 2021-11-15
Copyright
©2021 Pentzek et al. This is an Open Access article distributed under
the terms of the Creative Commons Attribution 4.0 License. See license
information at http://creativecommons.org/licenses/by/4.0/.
Feedback
OPEN ACCESS
This is the German version.
The English version starts at p. 1.
Artikel
Verbessert Peer-Feedback für Lehrärzte die studentische
Bewertung von Hausarztpraktika? Ein Prä-Post-Vergleich
Zusammenfassung
Zielsetzung: Die allgemeinmedizinische Lehre an den Universitäten
nimmt zu und wird u.a. mit Praktika bei niedergelassenen Hausärzten
realisiert. Auswahl und Qualitätsmanagement dieser Lehrpraxen stellen
die allgemeinmedizinischen Institute vor Herausforderungen; entsprechende Instrumente sind gefragt. Die Fragestellung der vorliegenden
Studie lautet, ob sich die studentische Bewertung eines Praktikums in
bislang schlecht evaluierten Hausarztpraxen verbessert, nachdem die
hausärztlichen Lehrärzte eine Rückmeldung durch eine Kollegin erhalten
haben.
Methodik: Studierende der Studienjahre 1, 2, 3 und 5 bewerteten ihre
Erfahrungen in hausärztlichen Praktika mit zwei 4-stufigen Items
(fachliche Betreuung und Empfehlung für andere Kommilitonen). Besonders schlecht evaluierte Lehrpraxen wurden identifiziert. Eine
praktisch tätige und lehr-erfahrene Hausärztin und wissenschaftliche
Mitarbeiterin führte mit diesen eine persönliche Rückmeldung der
Evaluationsergebnisse durch (Peer-Feedback), überwiegend in Form
von Einzelgesprächen in der Praxis (peer visit). Nach dieser Intervention
wurden in diesen Praxen weiter Praktika durchgeführt. Der Einfluss der
Intervention (prä/post) auf die studentischen Evaluationen wurde in
verallgemeinerten Schätzungsgleichungen (Clustervariable Praxis) berechnet.
Ergebnisse: Von insgesamt 264 Lehrpraxen hatten 83 eine suboptimale
Bewertung. Davon wurden 27 besonders negativ bewertete Praxen für
die Intervention ausgewählt, von denen in bislang 24 die Intervention
umgesetzt werden konnte. Für 5 dieser Praxen gab es keine post-Evaluationen, so dass in die vorliegende Auswertung die Daten von 19
Praxen (n=9 männliche Lehrärzte, n=10 weibliche Lehrärztinnen) eingingen. Die Evaluationen dieser Praxen waren nach der Intervention
(durch n=78 Studierende) signifikant positiver als vorher (durch n=82
Studierende): Odds Ratio 1.20 (95% Konfidenzintervall 1.10-1.31;
p<.001).
Schlussfolgerung: Die Ergebnisse deuten darauf hin, dass allgemeinmedizinische Universitätsinstitute die studentische Bewertung ihrer
Lehrpraxen über individuelle kollegiale Rückmeldungen verbessern
können.
Michael Pentzek1
Stefan Wilm1
Elisabeth
Gummersbach1
1 Heinrich-Heine-Universität
Düsseldorf, Medizinische
Fakultät, Centre for Health
and Society (chs), Institut für
Allgemeinmedizin (ifam),
Düsseldorf, Deutschland
Schlüsselwörter: Allgemeinmedizin, Ausbildung von Lehrkräften,
Feedback, Medizinstudenten, medizinische Ausbildung im
Grundstudium, Evaluation
Einleitung
Der „Masterplan Medizinstudium 2020“ sieht eine Stärkung der Rolle der Allgemeinmedizin im Curriculum vor
[1]. Eine von Studierenden und Lehrenden gewünschte
Form der Umsetzung besteht in Praktika in Hausarztpraxen bereits früh und kontinuierlich im Studienverlauf [2].
Die Erfahrungen, die Studierende in diesen Praktika machen, können –über reine Lerneffekte hinaus– eine be-
rufliche Orientierung mitformen; gute Erfahrungen in
Praktika können das Interesse an Allgemeinmedizin und
am Hausarztberuf steigern [3], [4].
Im Einklang mit der ärztlichen Approbationsordnung
[https://www.gesetze-im-internet.de/_appro_2002/
BJNR240500002.html] absolvieren die Studierenden im
Düsseldorfer Modellstudiengang in den Studienjahren 1,
2, 3 und 5 jeweils ein Praktikum in Hausarztpraxen mit
insgesamt
sechs
Wochen
Dauer
[https://
www.medizinstudium.hhu.de]. Die Anforderungen der
Praktika bauen inhaltlich aufeinander auf; zunächst liegt
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017
8/14
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ...
der Schwerpunkt auf Anamnese und körperlicher Untersuchung, später kommen komplexere medizinische Zusammenhänge und Überlegungen zu weiterführender
Diagnostik und Therapie hinzu. Unter Supervision der
niedergelassenen Lehrärzte können die Studierenden
hier Erfahrungen in der Arzt-Patienten-Interaktion sammeln. Ein wichtiger und deshalb immer wieder betonter
Faktor für eine positive studentische Wahrnehmung der
Praktika ist die Tatsache, dass den Studierenden im
Praktikum die Möglichkeit gegeben wird, selbstständig
mit Patienten zu arbeiten, um sich unmittelbar selbst in
der ärztlichen Rolle erleben zu können [2], [5]. Für den
didaktischen Erfolg der Praktika spielen weiterhin die
Haltung und Qualifikation der Lehrärzte eine wichtige
Rolle [3]. Ungefähr 2/3 der Lehrpraxen werden von den
Studierenden sehr gut bewertet, ca. 1/3 jedoch nicht.
Aufgrund des seit Installation des Modellstudiengangs
steigenden Bedarfs an Praktikumsplätzen in Hausarztpraxen wurden viele Lehrpraxen neu gewonnen; eine Feedback-Kultur wird nun aufgebaut. Ein erster Schritt bestand
in der Möglichkeit für Lehrpraxen, aktiv ihre schriftlichen
Evaluationsergebnisse einzufordern, was aber fast nie in
Anspruch genommen wurde. Über den nächsten Schritt
der Etablierung einer Feedback-Strategie wird hier berichtet: Eine Möglichkeit zur Verbesserung der Lehrperformanz ist die Rückmeldung durch einen erfahrenen Kollegen „auf Augenhöhe“ (Peer-Feedback) [6]. Dies kann
Einsichten generieren, die studentische Evaluationen allein nicht erreichen und wird zunehmend als Ergänzung
zur Studierendenrückmeldung anerkannt. Insbesondere
in persönlichen Peer-Feedbacks können Ideen ausgetauscht, Probleme diskutiert, Strategien aufgezeigt und
konkrete Verbesserungsansätze gefunden werden [7].
Zu den möglichen Effekten gehören ein größeres Bewusstsein und eine stärkere Fokussierung des Lehrarztes auf
die Lehrsituation in der Praxis, mehr Information über
das, was gutes Lehren ausmacht, die Motivation zu verstärkter Interaktivität und Studierendenzentriertheit sowie
eine Inspiration zur Anwendung neuer Lehrmethoden [8].
Pedram et al. fanden nach einem Peer-Feedback positive
Effekte auf das Verhalten der Lehrenden, insbesondere
hinsichtlich der Gestaltung der Lernatmosphäre und des
Interesses am Studierendenverständnis [9]. Die Anwendung von Peer-Feedback auf das hier beschriebene Setting wurde bislang nicht untersucht. Die Fragestellung
der vorliegenden Studie lautet, ob sich die studentische
Praktikumsevaluation bislang schlecht bewerteter Hausarztpraxen nach Durchführung eines Peer-Feedback
verbessert.
Methoden
Lehrpraxen
Die Daten wurden im Rahmen der 4 Praktika in
Hausarztpraxen [https://www.uniklinik-duesseldorf.de/
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/lehre] erhoben, die alle in vom Institut
für Allgemeinmedizin koordinierten hausärztlichen Lehrpraxen stattfinden. Vor Aufnahme der Lehrarzttätigkeit
werden alle Lehrärzte mündlich und schriftlich über die
Erhebung studentischer Evaluationen und ein persönliches Gespräch mit einem oder einer Institutsmitarbeiter/in im Falle schlechter Evaluationsergebnisse informiert.
Interessierte Ärzte nehmen vor Aufnahme einer Lehrarzttätigkeit an einer 2-3-stündigen Informationsveranstaltung
unter Leitung des Institutsdirektors (SW) teil, in der sie
zunächst über die Voraussetzungen für die Lehrarzttätigkeit informiert werden; dazu gehören u.a. die Planung
zeitlicher Ressourcen für die Betreuung der Studierenden
in den Praktika, Begeisterung für die Arbeit als Hausarzt,
die Akzeptanz des universitären allgemeinmedizinischen
Lehrzielkataloges (insbesondere dass Praktikanten
selbstständig mit Patienten arbeiten dürfen) und die
Teilnahme an mindestens zwei der acht jährlich angebotenen allgemeinmedizinisch-didaktischen Fortbildungen
des Instituts. (Mit Aufnahme der Lehrtätigkeit geht das
Institut von der Akzeptanz dieser Voraussetzungen seitens
des Lehrarztes aus, überprüft das Vorliegen jedoch nicht
formal.) Es folgen ausführliche Informationen über den
Aufbau des Curriculums, die Verortung der Praktika, die
Inhalte und Anforderungen der einzelnen Praktika und
grundlegende didaktische Aspekte des 1:1-Unterrichts.
Über die Studierendenevaluation des Praktikums wird
mündlich und schriftlich aufgeklärt, verbunden mit dem
Angebot, sowohl eine Gesamtauswertung als auch die
individuelle Evaluation per E-Mail aktiv anfordern zu
können. Eine unaufgeforderte Rückmeldung der Evaluationsergebnisse an die Praxen gibt es nicht. Nach der Informationsveranstaltung wird eine Mappe mit entsprechenden schriftlichen Informationen ausgehändigt.
Vor jedem Praktikum wird den Lehrärzten ausführliches
Material zugeschickt, damit sie sich noch einmal orientieren können. Dieses enthält Hinweise zum genauen Ablauf
des Praktikums, zum aktuellen Lernstand der Studierenden inkl. Beilage der bzw. Verweis auf die zugrundeliegenden didaktischen Materialien, zu den im Praktikum zu
bearbeitenden Aufgaben und den damit verbundenen
Lernzielen, zur Relevanz des Übens am Patienten sowie
einen Hinweis zur Haltung, den Studierenden ein positives
Bild des Hausarztberufs vermitteln zu wollen.
Außerdem erhält jeder Studierende ein Anschreiben an
den Lehrarzt, in dem die wichtigsten o.g. Punkte noch
einmal zusammengefasst sind.
Evaluation
Die studentische Praktikumsevaluation als reguläres Element
der Lehrevaluation [https://www.medizin.hhu.de/studiumund-lehre/lehre.html] wurde in den untersuchten Praxen
vor und nach der Intervention durch unabhängige Studierendengruppen durchgeführt und bestand u.a. aus der
Möglichkeit für Freitext-Kommentare, einer Angabe der
Anzahl persönlich betreuter Patienten und den Items „Wie
zufrieden waren Sie mit der fachlichen Betreuung durch
Ihre Lehrärztin/Ihren Lehrarzt?“ und „Würden Sie anderen
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017
9/14
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ...
KommilitonInnen diese Lehrpraxis empfehlen?“, beide
aufsteigend positiv 4-stufig skaliert.
Auswahl der Praxen für die Intervention
Da die meisten Praxen eine sehr gute Bewertung erhielten
(schiefe Verteilung), wurden wie folgt drei Gruppen identifiziert: Aus allen an den Praktika beteiligten Lehrpraxen
des Instituts wurden zunächst diejenigen ausgewählt, die
eine geringere als sehr gute Evaluation aufwiesen
(=„suboptimal“): mindestens einmal mit <2 auf mind.
einem der beiden o.g. Items bewertet oder wiederholt
negative Freitextkommentare. Aus dieser Gruppe der
suboptimal (=geringer als sehr gut) bewerteten Praxen
wurden nun die mit mehr als zwei vorliegenden Studierendenbewertungen, weiterhin bestehender Lehrarzttätigkeit und besonders negativen Bewertungen ausgewählt: mindestens zweimal mit <2 auf mind. einem der
beiden Items bewertet oder wiederholt negative Freitextkommentare. Von den 27 Praxen erhielten bislang 24
Praxen (88.9%) eine Intervention zur Verbesserung ihrer
Lehre von Seiten einer hausärztlich tätigen Allgemeinmedizinerin (n=3 pandemiebedingt noch nicht), und 19
Praxen (70.4%) lieferten Evaluationsergebnisse aus
Praktika nach der Intervention (n=5 hatten nach der Intervention keine Praktikanten mehr). Zur Charakterisierung der drei Gruppen der sehr gut, suboptimal und
schlecht evaluierten (=ausgewählten) Praxen wurde eine
Varianzanalyse inkl. post-hoc Scheffé-Tests mit dem
Faktor Gruppe und der abhängigen Variable Evaluationsergebnis gerechnet.
thematisiert und besprochen, gefolgt von der Frage „Was
können wir tun, um Sie zu unterstützen?“. Das schriftliche
Feedback bestand aus einer unkommentierten Rückmeldung der studentischen Evaluationsergebnisse (Scores
und Freitexte).
Analysen
Aufgrund einer starken Korrelation der beiden Evaluationsitems (Spearman’s rho=0.79) wurden diese für die
vorliegenden Analysen zu einer Gesamtbewertung gemittelt. Um multivariable Einflüsse auf diese studentische
Bewertung zu ermitteln, wurde eine verallgemeinerte
Schätzungsgleichung (GEE) mit der Clustervariable „Praxis“ gerechnet, aufgrund fehlender Normalverteilung
(Kolmogorow-Smirnow-Test p<.001) mit Gamma-Verteilung und Log-Verknüpfung. Als potenzielle Einflussvariablen flossen ein: Interventionseffekt (prä/post), Interventionsmodus (peer visit vs. Gruppe/schriftlich), Praktikumszeitpunkt (Studienjahr), Anzahl der persönlich betreuten
Patienten pro Woche. Parallel zu dieser Analyse wurde
in einer zweiten GEE der Interventionseffekt auf die Anzahl der persönlich betreuten Patienten untersucht.
Die Freitexte in den Studierendenevaluationen sowie die
Lehrarztkommentare in den peer visits und Gruppendiskussionen wurden qualitativ inhaltsanalytisch aufgearbeitet, um neben den reinen Zahlen auch die dahinterliegenden Probleme und die Lehrarztreaktionen auf das Feedback zu skizzieren. Dazu wurde eine induktive Kategorienbildung am Material vorgenommen [11]. Die Anzahlen
negativer Studierendenkommentare vor und nach der
Intervention wurden zudem quantitativ gegenübergestellt.
Intervention
Das Peer-Feedback wurde als Teil des didaktischen
Konzepts bei besonders negativ evaluierten Lehrpraxen
realisiert
[https://www.uniklinik-duesseldorf.de/
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/didaktik-fortbildungen]: Eine den
Lehrärzten bekannte und in Praxis und Lehre erfahrene
hausärztliche Mitarbeiterin des Instituts für Allgemeinmedizin (EG) meldete den Lehrärzten deren studentischen
Evaluationen zurück. Der vorrangige Modus war ein persönlicher Besuch in der Praxis (peer visit) [10]. Aus organisatorischen Gründen mussten gelegentlich Gruppendiskussionen mit mehreren Lehrärzten sowie ein schriftliches
Feedback als Ausweichlösungen angeboten werden. Peer
visit und Gruppendiskussion hatten beide eine Reflexion
der eigenen Lehrarztmotivation, der Probleme sowie eine
Diskussion der persönlichen Evaluation zum Ziel, um
darüber in einen konstruktiven Austausch zwischen
Lehrarzt und Universität in Bezug auf die Lehre und den
Umgang mit Studierenden in der Praxis zu gelangen. Peer
visits und Gruppendiskussionen wurden protokolliert. Die
Eingangsfrage lautete „Warum sind Sie Lehrarzt/Lehrärztin?“, gefolgt von Fragen zu persönlichen Erfahrungen:
„Können Sie mir über Ihre Erfahrungen berichten? Was
motiviert Sie zu der Lehrarzttätigkeit? Gibt es aus Ihrer
Sicht Probleme?“. Dann wurde das (schlechte) Feedback
Ergebnisse
Lehrpraxen und Präevaluationen
264 Lehrpraxen mit insgesamt 1648 Praktika waren beteiligt. Davon wurden 181 Praxen (68.6%) mit 1036
Praktika sehr gut bewertet (Mittelwert der Studierendenevaluation 3.8 ± Standardabweichung 0.2), 56 Praxen
(21.2%) mit 453 Praktika suboptimal (3.3±0.4) und 27
Praxen (10.2%) mit 159 Praktika sehr schlecht (2.8±0.4).
Der übergeordnete Vergleich der drei Gruppen ergibt signifikante Unterschiede (F(df=2)=205.1; p<.001), mit
jeweils signifikanten Unterschieden in allen post-hocVergleichen (alle p<.001): sehr gut vs. suboptimal (mittlere Differenz 0.51; Standardfehler 0.04); sehr gut vs.
schlecht (1,09; 0.06); suboptimal vs. schlecht (0.58;
0.07).
In Tabelle 1 ist die Analysestichprobe der n=19 aus den
27 schlecht bewerteten Praxen näher beschrieben.
Gründe für eine schlechte Bewertung laut Freitexten der
Studierendenevaluation lassen sich in fünf Kategorien
darstellen. So wurde die mangelnde Gelegenheit zum
Einüben praktischer Fertigkeiten am Patienten kritisiert.
„Leider hatte ich während meines letzten Patientenpraktikums nicht die Möglichkeit, viele Patienten ei-
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017
10/14
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ...
Tabelle 1: Merkmale der Analysestichprobe
genständig zu untersuchen, obwohl ich dies zu mehreren Gelegenheiten eingefordert habe.“ (über Praxis
ID 1)
Weiterhin gab es Kommentare über mangelnde Wertschätzung und schwierige Kommunikation:
„Die Lehrärztin hat wenig Geduld insbesondere mit
ausländischen Patienten, die anatomische oder medizinische Begriffe nicht verstehen können. Sie macht
beleidigende und ironische Aussagen. Mit einigen
Patienten wurde ich 30 Minuten lang alleine gelassen,
während mit anderen nur 2 und danach hat sie sich
darüber geärgert, wenn ich mit der Untersuchung/Anamnese noch nicht fertig war.“ (über Praxis ID 14)
Einige Lehrärzte wurden hinsichtlich ihrer didaktischen
Kompetenz kommentiert:
„[…] als Lehrarzt hab ich ihn als wenig bis gar nicht
kompetent erlebt und auch sehr desinteressiert. Er
hatte keine Ahnung von dem, das PP1 [Patientenpraktikum 1] uns lehren soll und hat auch nach mehrmaligem Herantreten an ihn meinerseits wenig verstanden, worum es mir ging bzw. was ich dort lernen sollte.“ (über Praxis ID 22)
Genannt wurden Praxisabläufe und –strukturen, die laut
Studierenden eine effiziente Praktikumsdurchführung
erschwerten:
„Von 8-11 Uhr kommen nur Patienten zur Blutentnahme, feste Termine sind in der Zeit nicht geplant. Da
ich weder Blut abnehmen noch impfen durfte, war in
der Zeit nichts für mich zu tun.“ (über Praxis ID 10)
In einigen Praxen mit primär nicht-deutschsprachigem
Patientenklientel und auch Personal (inkl. Lehrarzt)
stellte sich in den Evaluationen die Sprachbarriere als
Problem heraus.
„Da die Lehrärztin [Nationalität XY] ist, verliefen ca.
70% der Konsultationen auf [Sprache XY].“ (über
Praxis ID 2)
Intervention
In den Protokollen der peer visits und Gruppendiskussionen mit den Lehrärzten zeigen sich vier Kategorien von
Problemen, die teilweise die genannten Studierendenkommentare spiegeln: So berichteten die Lehrärzte von
Bedenken, Studierende allein mit Patienten arbeiten zu
lassen. (Im folgenden Zitate aus den Protokollen der intervenierenden Peer-Ärztin.)
„Es fällt ihm schwer, Studierende allein zu lassen. […]
Er meint, die Patienten mögen das nicht so, obwohl
seine Erfahrungen eigentlich anders sind. Hat auch
viele Patienten aus dem Management. „Die Studierenden sind auch zu kurz in der Praxis.““ (zu ID 17)
Auch eine skeptische Haltung vor allem Studierenden
niedriger Semester gegenüber wurde geäußert.
„Kann mit den 2. Semestern nichts anfangen, „die
können nichts, es hat keinen Sinn, sie das Herz abhören zu lassen, wenn sie die Krankheitsbilder nicht
kennen.“ […] „Das Problem ist auch, dass es jetzt
immer ganz junge Mädchen sind.““ (zu ID 24)
Einige Lehrärzte waren nicht vertraut mit den didaktischen
Konzepten und Materialien der Praktika.
„Er hat keine Kenntnis von der Lehre, liest sich nichts
durch. Weiß auch nicht, dass er evaluiert wird.“ (zu
ID 6)
Teils führt ein Selbstverständnis als allgemeinmedizinischer Lehrarzt zur Definition eigener Praktikumsinhalte
unter Vernachlässigung oder Abwertung der universitär
vorgegebenen Lernziele.
„„Ich habe mich zur Allgemeinmedizin bekannt und
will das weiterreichen.“ Erklärt den Studierenden viel,
lässt aber nicht viel machen. „Ich zeige jungen Menschen den rechten Weg. Sonst macht es ja keiner (die
Uni schon gar nicht), also mach ich es.““ (zu ID 4)
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017
11/14
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ...
Tabelle 2: Multivariable Einflüsse auf die abhängige Variable ‚studentische Bewertung des Praktikums‘ (verallgemeinerte
Schätzungsgleichung (GEE) mit Clustervariable Praxis)
Tabelle 3: Anzahl der Kommentare von Studierenden zu Praktika in 19 schlecht evaluierten hausärztlichen Lehrpraxen
„Möchte allerdings eindeutig den Studierenden alles
zeigen, erwähnt wiederholt Ultraschall, Blutabnahmen, kennt Lehrinhalte nicht, macht sich eigene
Lehrinhalte: „Ich zeig denen alles Interessante““. (zu
ID 22)
An mehreren Stellen äußerten die Lehrärzte Intentionen
zur Verhaltensänderung, laut Protokollen z.B. „will Studierende mehr zum Selbst-Untersuchen anleiten“ oder „sagt,
er wolle sich zukünftig die Handouts durchlesen“. Die
Mehrzahl der besuchten Lehrärzte zeigte sich im Gespräch grundsätzlich interessiert und engagiert in der
Betreuung der Studierenden. Die meisten waren in der
Lage, die Kritikpunkte zu reflektieren.
Prä-post-Analyse: Der Interventionseffekt auf die studentische Bewertung ist deutlich und unabhängig vom
(ebenfalls signifikanten) Einfluss der Patientenanzahl
(siehe Tabelle 2).
Auch der Interventionseffekt auf die Anzahl persönlich
durch die Studierenden betreuter Patienten bleibt in einer
GEE bestehen (Odds Ratio 1.41; 95% Konfidenzintervall
1.21-1.64; p<.001), unabhängig von der Art der Intervention und dem Studienjahr (Analyse nicht gezeigt).
Der Anteil kritischer Anmerkungen in den studentischen
Freitextkommentaren nimmt insgesamt und in vier der
fünf genannten Kategorien deutlich ab (siehe Tabelle 3).
Diskussion
Ein Peer-Feedback durch eine hausärztlich tätige Allgemeinmedizinerin wirkte sich in einer Stichprobe schlecht
evaluierter Lehrärzte, die im Rahmen der hausärztlichen
Praktika Studierende betreuten, im prä-post-Vergleich
positiv auf die studentische Evaluation und auf die Anzahl
der im Praktikum von Studierenden persönlich betreuten
Patienten aus. Dies zeigt sich in den Evaluationsscores
und auch darin, dass entsprechend negative Freitextkommentare der Studierenden nach der Intervention seltener
waren.
In line with the literature, it was decisive for the student rating that students were given the opportunity to work with patients independently, so that they could experience themselves directly in the physician's role [2], [5]. But the student evaluation also improved after the intervention independently of the number of patients: the qualitative results provide indications that, after the intervention, the teaching physicians may have engaged more closely with the purpose of the attachments, the learning objectives and the didactic materials. This in turn also appeared to have had positive effects on the exchange and the relationship between teaching physician and student (possibly in the sense of aligning mutual expectations), which are likewise important elements of a positive attachment experience [3], [12]. The qualitative results on didactic competence and attitude indicate that, at least for the small group of previously poorly evaluated teaching physicians examined here, a more intensive engagement with their teaching assignment and repeated interaction between the university department of general practice and the teaching practice are needed before content and concepts are internalized and implemented in the attachments in a way that is recognizable and consistent for students. That precisely the poorly evaluated teaching physicians rarely attend the meetings at the university (offered eight times per year in Düsseldorf) is an experience also reported by many other sites. Formally checking the prerequisites and criteria for appropriate teaching activity would involve enormous effort given the high number of teaching practices required, particularly in a longitudinally oriented, general-practice-based and hands-on curriculum. It must be weighed, however, whether more resources should be invested in selecting and qualifying practices interested in teaching or rather in the quality control and training of practices that already teach.
A strength of this study is that the pre and post ratings come from independent student groups, so that distortions from repeated exposure of the same students to a practice (e.g. response shift bias, habituation, observer drift) are excluded. The weakness associated with the pre-post design without a control group and with the focus on poorly evaluated practices lies, among other things, in regression to the mean, which presumably accounts for part of the positive intervention effect. The primary research question of this study is formulated and answered quantitatively; we report qualitative results only to a limited extent, and these allow only partial, hypothesis-generating insights into the precise mechanisms by which peer feedback works [13]. In the present study, several modes of delivering peer feedback were used. Since the analyses do not point to differential effects of the staff- and time-intensive peer visit on the one hand and the more efficient methods of group discussion and written feedback on the other, further studies differentiating between them are needed before broader implementation. Ruesseler et al. [14], for example, found that written peer feedback, in that case for lecturers, had positive effects on how the teaching session was designed.
Conclusions
It makes sense to continue to consider the effects of feedback to teaching physicians in both research and teaching. The comprehensive GMA recommendations offer a robust framework for teaching [15] and for the didactic qualification of teaching physicians [16]. Embedded in this framework, collegial peer feedback for poorly rated teaching physicians is a possible tool for the quality management of general practice teaching.
Competing interests
The authors declare that they have no competing interests in connection with this article.
References
1. Bundesministerium für Bildung und Forschung. Masterplan Medizinstudium 2020. Berlin: Bundesministerium für Bildung und Forschung; 2017. Available from: https://www.bmbf.de/files/2017-03-31_Masterplan%20Beschlusstext.pdf
2. Wiesemann A, Engeser P, Barlet J, Müller-Bühl U, Szecsenyi J. Was denken Heidelberger Studierende und Lehrärzte über frühzeitige Patientenkontakte und Aufgaben in der Hausarztpraxis? Gesundheitswesen. 2003;65(10):572-578. DOI: 10.1055/s-2003-42999
3. Grunewald D, Pilic L, Bödecker AW, Robertz J, Althaus A. Die praktische Ausbildung des medizinischen Nachwuchses - Identifizierung von Lehrpraxen-Charakteristika in der Allgemeinmedizin. Gesundheitswesen. 2020;82(07):601-606. DOI: 10.1055/a-0894-4556
4. Böhme K, Sachs P, Niebling W, Kotterer A, Maun A. Macht das Blockpraktikum Allgemeinmedizin Lust auf den Hausarztberuf? Z Allg Med. 2016;92(5):220-225. DOI: 10.3238/zfa.2016.0220-0225
5. Gündling PW. Lernziele im Blockpraktikum Allgemeinmedizin - Vergleich der Präferenzen von Studierenden und Lehrärzten. Z Allg Med. 2008;84:218-222. DOI: 10.1055/s-2008-1073148
6. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME Guide No. 8. Med Teach. 2006;28(6):497-526. DOI: 10.1080/01421590600902976
7. Garcia I, James RW, Bischof P, Baroffio A. Self-Observation and Peer Feedback as a Faculty Development Approach for Problem-Based Learning Tutors: A Program Evaluation. Teach Learn Med. 2017;29(3):313-325. DOI: 10.1080/10401334.2017.1279056
8. Gusic M, Hageman H, Zenni E. Peer review: a tool to enhance clinical teaching. Clin Teach. 2013;10(5):287-290. DOI: 10.1111/tct.12039
9. Pedram K, Brooks MN, Marcelo C, Kurbanova N, Paletta-Hobbs L, Garber AM, Wong A, Qayyum R. Peer Observations: Enhancing Bedside Clinical Teaching Behaviors. Cureus. 2020;12(2):e7076. DOI: 10.7759/cureus.7076
10. O'Brien MA, Rogers S, Jamtvedt G, Oxman AD, Odgaard-Jensen J, Kristoffersen DT, Forsetlund L, Bainbridge D, Freemantle N, Davis DA, Haynes RB, Harvey EL. Educational outreach visits: effects on professional practice and health care outcomes. Cochrane Database Syst Rev. 2007;2007(4):CD000409. DOI: 10.1002/14651858.CD000409.pub2
11. Kruse J. Qualitative Interviewforschung. 2. Aufl. Weinheim: Beltz Juventa; 2015.
12. Koné I, Paulitsch MA, Ravens-Taeuber G. Blockpraktikum Allgemeinmedizin: Welche Erfahrungen sind für Studierende relevant? Z Allg Med. 2016;92(9):357-362. DOI: 10.3238/zfa.2016.0357-0362
13. Raski B, Böhm M, Schneider M, Rotthoff T. Influence of the personality factors rigidity and uncertainty tolerance on peer-feedback. In: 5th International Conference for Research in Medical Education (RIME 2017), 15.-17. March 2017, Düsseldorf, Germany. Düsseldorf: German Medical Science GMS Publishing House; 2017. P15. DOI: 10.3205/17rime46
14. Ruesseler M, Kalozoumi-Paizi F, Schill A, Knobe M, Byhahn C, Müller MP, Marzi I, Walcher F. Impact of peer feedback on the performance of lecturers in emergency medicine: a prospective observational study. Scand J Trauma Resusc Emerg Med. 2014;22:71. DOI: 10.1186/s13049-014-0071-1
15. Huenges B, Gulich M, Böhme K, Fehr F, Streitlein-Böhme I, Rüttermann V, Baum E, Niebling WB, Rusche H. Recommendations for Undergraduate Training in the Primary Care Sector - Position Paper of the GMA-Primary Care Committee. GMS Z Med Ausbild. 2014;31(4):Doc35. DOI: 10.3205/zma000927
16. Böhme K, Streitlein-Böhme I, Baum E, Vollmar HC, Gulich M, Ehrhardt M, Fehr F, Huenges B, Woestmann B, Jendyk R. Didactic qualification of teaching staff in primary care medicine - a position paper of the Primary Care Committee of the Society for Medical Education. GMS J Med Educ. 2020;37(5):Doc53. DOI: 10.3205/zma001346
Corresponding author:
PD Dr. rer. nat. Michael Pentzek, Heinrich-Heine-Universität Düsseldorf, Medizinische Fakultät, Centre for Health and Society (chs), Institut für Allgemeinmedizin (ifam), Moorenstr. 5, Gebäude 17.11, 40225 Düsseldorf, Germany, Tel.: +49 (0)211/81-16818, mp@hhu.de
Please cite as
Pentzek M, Wilm S, Gummersbach E. Does peer feedback for teaching GPs improve student evaluation of general practice attachments? A pre-post analysis. GMS J Med Educ. 2021;38(7):Doc122. DOI: 10.3205/zma001518, URN: urn:nbn:de:0183-zma0015182
The article is freely available online at https://www.egms.de/en/journals/zma/2021-38/zma001518.shtml
Submitted: 03.03.2021, Revised: 12.08.2021, Accepted: 17.08.2021, Published: 15.11.2021
Copyright
©2021 Pentzek et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license details see http://creativecommons.org/licenses/by/4.0/.
Higher Learning Research Communications
2021, Volume 11, Issue 2, Pages 22–39. DOI: 10.18870/hlrc.v11i2.1244
Original Research
© The Author(s)
Students’ and Teachers’ Perceptions and Experiences
of Classroom Assessment: A Case Study of a Public
University in Afghanistan
Sayed Ahmad Javid Mussawy, PhD Candidate
University of Massachusetts Amherst, Amherst, Massachusetts, United States
https://orcid.org/0000-0001-9991-6681
Gretchen Rossman, PhD
University of Massachusetts Amherst, Amherst, Massachusetts, United States
https://orcid.org/0000-0003-1224-4494
Sayed Abdul Qahar Haqiqat, MEd
Baghlan University, Pule-khumri, Baghlan, Afghanistan
Contact: smussawy@umass.edu
Abstract
Objective: The primary goal of the study was to examine students’ perceptions of classroom assessment at a
public university in Afghanistan. Exploring current assessment practices through students' and faculty members' lived experiences was a secondary goal. The study also sought to collect evidence on whether or not the new assessment policy was effective in improving student achievement.
Method: Authors used an explanatory sequential mixed-methods design to conduct the study. Initially, we
applied the Students Perceptions of Assessment Questionnaire (SPAQ), translated into Dari/Farsi and
validated, to collect data from a random sample of 400 students from three colleges: Agriculture, Education,
and Humanities. Response rate was 88.25% (N = 353). Semi-structured interviews were used to collect data
from a purposeful sample of 18 students and 7 faculty members. Descriptive statistics, one-way ANOVA, and
t-tests were used to analyze quantitative data, and NVivo 12 was used to conduct thematic analysis on
qualitative data.
Results: The quantitative results suggest that students have positive perceptions of the current assessment
practices. However, both students and faculty members were dissatisfied with the grading policy, reinforcing
summative over formative assessment. Results support that the policy change regarding assessment has
resulted in more students passing the courses compared to in the past. The findings also suggest
improvements in faculty professional skills such as assessment and teaching and ways that they engage
students in assessment processes.
Implication for Policy and Practice: Recommendations include revisiting the grading policy at the national level to allow faculty members to balance formative and summative assessment and utilizing assessment benchmarks and rubrics to guide formative and summative assessment implementation in practice.
Acknowledgment: We would like to thank the students and teachers who participated and assisted with this study.
Keywords: assessment, classroom assessment, higher education, Afghanistan
Submitted: March 14, 2021 | Accepted: July 23, 2021 | Published: October 13, 2021
Recommended Citation
Mussawy, S. A. J., Rossman, G., & Haqiqat, S. A. Q. (2021). Students’ and teachers’ perceptions and experiences of
classroom assessment: A case study of a public university in Afghanistan. Higher Learning Research
Communications, 11(2) 22–39. DOI: 10.18870/hlrc.v11i2.1244
Introduction
Classroom assessment, an instrumental aspect of teaching and learning, refers to a systematic process of
obtaining information about learner progress, understanding, skills, and abilities towards the learning goals
(Dhindsa et al., 2007; Goodrum et al., 2001; Klenowski & Wyatt-Smith, 2012; Linn & Miller, 2005). According
to Scriven (1967) and Poskitt (2014), educational assessment surfaced in the 20th century to serve two purposes.
The first was to improve learning (formative assessment), and the second was to make judgments about student
learning (summative assessment). The current literature on assessment emphasizes establishing alignment
between educational expectations versus student learning needs (Black et al., 2003; Gulikers et al., 2006;
Mussawy, 2009). Therefore, teachers use various forms of assessment to determine where students are and
create diverse activities to help them achieve the expected outcomes (Mansell et al., 2020).
As most countries have expanded their higher education systems by embracing broader access to higher
education, the student population has also become diverse (Altbach, 2007; Salmi, 2015). The more diverse
student population suggests that conventional assessment approaches may no longer work. Therefore,
alternative assessment approaches need to put students in the center to avoid wasting “learning for drilling
students in the things that they [teachers] will be held accountable [for]” (Dhindsa et al., 2007, p. 1262).
The concept of classroom assessment has been loosely defined in the higher education sector of Afghanistan.
While students and teachers are aware of different assessment approaches, current assessment practices rely
heavily on conventional summative assessment (Mussawy, 2009; Noori et al., 2017). Previously, final exams
were the only mechanisms to assess student learning (UNESCO-IIEP, 2004). However, higher education
reform in Afghanistan in the early 2000s paved the way for introducing mid-term exams and the credit
system that replaced the conventional course structure based on the number of subjects (Babury & Hayward,
2013; Hayward, 2017). More specifically, in the traditional system, the value of final grades for each subject
was the same, irrespective of the number of hours the subject was taught per week. However, in the credit
system, the value of grades varies depending on credit hours per week. Further, due to the absence of specific
regulations on assessment approaches, faculty members enjoyed immense autonomy in assessing student
learning. Since most of the faculty members had not received any training on pedagogy and assessment, they
primarily relied on conventional open-ended summative assessment (Darmal, 2009).
In 2014, the Ministry of Higher Education (MoHE) in Afghanistan introduced a new assessment policy that
centers on (a) transparency through the establishment of assessment committees at the institution and faculty
levels and (b) the type and the number of question items in an exam (Ministry of Higher Education (MoHE),
2018). The second component, which is the focus of this study, indicates that assessment includes “evaluation
of quizzes, mid-term exams, assignments, laboratory projects, class seminars and projects, final exams, and
thesis and dissertations” (MoHE, 2018, p. 5). While mid-term and final exams constitute 20% and 60% of
students’ grades, respectively, the policy emphasizes “30—40 question items on final exams” and “a minimum
of 10 items on mid-terms” (MoHE, 2018, p. 7). The policy also recommends a combination of closed-ended
and open-ended questions with a value of “3–5 points” for descriptive and analytic items and “1 point for
multiple-choice questions” (MoHE, 2018, p. 5).
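For concreteness, the weights quoted above (20% for formative work and attendance, 20% for the mid-term, 60% for the final exam) combine into a course grade as a simple weighted sum. The short Python sketch below illustrates the arithmetic with hypothetical component scores; only the weights come from the policy.

# Illustration of the MoHE grade weighting with hypothetical component scores (0-100 scale).
WEIGHTS = {"formative_and_attendance": 0.20, "midterm": 0.20, "final": 0.60}

def course_grade(scores: dict) -> float:
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

example = {"formative_and_attendance": 90, "midterm": 75, "final": 68}
print(course_grade(example))  # 0.2*90 + 0.2*75 + 0.6*68 = 73.8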
Although the assessment policy recognizes various approaches, such as quizzes, assignments, student
projects, seminars, and mid-term and final exams, formative assessment and class attendance account for
only 20% of a student’s grade. Mid-term and final exams, on the contrary, constitute 80% of students’ grades;
this indirectly projects more value for summative over formative assessment. Therefore, perceptions of
students and faculty members can shed light on current practices and participants’ experiences of classroom
assessment.
Review of Literature
The focus of classroom assessment has gradually shifted from assessment of learning—“testing learning”
(Birenbaum & Feldman, 1998, p. 92) to assessment for learning—creating diverse opportunities for learners to
prosper (Brown, 2005; Wiliam, 2011). This is because research shows that classroom assessment significantly
affects the approach students take to learning (Pellegrino & Goldman, 2008). More specifically, new
assessment approaches encourage an increase in correspondence between student learning needs and
expectations to prosper in a changing environment (Gulikers et al., 2006). Goodrum et al. (2001) argued that,
ideally, assessment “enhances learning, provides feedback about student progress, builds self-confidence and
self-esteem, and develops skills in evaluation” (p. 2). Nonetheless, Dhindsa et al. (2007) stated that [primary
and secondary school] teachers “sacrifice learning for drilling students in the things that they will be held
accountable for” (p. 1262). This suggests that teachers use “a very narrow range of assessment strategies” to
help students prepare for high-stakes tests, while limited evidence exists to support that “teachers actually use
formative assessment to inform planning and teaching” (Goodrum et al., 2001, p. 2). Most importantly, recent
research on classroom assessment emphasizes the quality and relevance of assessment activities to help
students learn (Ibarra-Saiz et al., 2020).
Inquiring into students’ perception of assessment has been an important aspect of the literature on classroom
assessment (Koul et al., 2006; Segers et al., 2006; Struyven et al., 2005; Waldrip et al., 2009). Examining
their perceptions confirms the assumption that assessment “rewards genuine effort and in-depth learning
rather than measuring luck” (Dhindsa et al., 2007, p. 1262). For this reason, recent studies on classroom
assessment advocate for student involvement in developing assessment tools (Falchikov, 2004; Waldrip et
al., 2014) to make the learning process more valuable to students. With this in mind, Fisher et al. (2005)
developed Students Perceptions of Assessment Questionnaire (SPAQ) and confirmed its validity by applying it
to a sample consisting of 1,000 participants from 40 science classes in grades 8–10. Following that, Cavanagh
et al. (2005) modified and adapted the SPAQ as an analytic tool to study student perceptions of classroom
assessment in five specific areas: Congruence with planned learning (CPL), assessment of applied learning
(AAL), students’ consultation (SC) types, transparency in assessment (TA), and accommodation of students’
diversity (ASD) in assessment procedures. Cavanagh et al. (2005) used SPAQ to study 8th through 10th grade
student perceptions of assessment in Australian science classrooms. Their study showed that student
perceptions of assessment in science subjects varied depending on their abilities.
Other studies examining students’ perceptions of assessment reveal diverse responses. For instance, Koul et
al. (2006) modified, validated, and applied SPAQ on a 4-point Likert scale to study secondary student
perceptions of assessment in Australia. Their study shows that the difference between males’ and females’
perceptions of assessment was not statistically significant. However, they reported statistically significant
differences in student perceptions of assessment by grade level. Similarly, Dhindsa et al. (2007) used SPAQ to
examine high school student perceptions of assessment in Brunei Darussalam and learned that the Student
Consultation was rated the lowest of the scales. Their findings suggest that students perceived assessment as
transparent and as aligned with learning goals. However, they did find that teachers hardly consulted with
students regarding assessment forms.
Kwok (2008) also studied student assumptions of peer assessment and reported that, while students
perceived peer assessment as substantially important in enhancing self-efficacy, they considered themselves
unprepared relative to their teachers who brought years of experience. In another study, Segers et al. (2006)
examined college students’ understanding of assignment-based learning versus problem-based learning. Their
study showed that students in the assignment-based learning course embraced “more deep learning strategies
and less surface-learning strategies than the students in the PBL [problem-based learning] course” (Segers et
al., 2006, p. 234). They reported that students in the PBL course showed surface-level learning strategies
(Segers et al., 2006, p. 236). Although the context varied, their findings are partly consistent with those of Birenbaum and Feldman (1998), who examined 8th- through 10th-grade student attitudes towards open-ended versus closed-ended response assessment. They reported that gender and learning strategies were significantly correlated, in that female students leaned towards essay questions while male students favored closed responses. In other words, students who demonstrated the "surface study approach" preferred closed-ended question items, as opposed to those with a "deep study approach," who favored open-ended questions (Birenbaum & Feldman, 1998).
However, Beller and Gafni's (2000) study shows that although boys favored multiple-choice question items in mathematics assessment, the difference in performance based on gender was not profound. Their study focused on question format, examining whether multiple-choice versus open-ended questions accounted for gender differences. Their study "results challenge the simplistic assertion that girls perform relatively better on OE [open-ended] test items" (Beller & Gafni, 2000, p. 16). On a similar note, Van de Watering et al. (2008) found no "relationship between students' perceptions of assessment and their assessment results" (p. 657). They reported that students prefer closed-ended question formatting when attending a "New Learning Environment" (Van de Watering et al., 2008, p. 245).
Meanwhile, Struyven et al. (2005) studied the relationship between student perceptions of assessment and
their learning approaches. In general, students preferred close-ended questions; however, students with
advanced learning abilities and with low test anxiety favored essay exams. Lastly, Ounis (2017) investigated
perceptions of classroom assessment among secondary school teachers in Tunisia. The author reported that
the teachers “have highly favorable perceptions of assessment and they hold highly the motivational function
of assessment” (p. 123). According to Ounis (2017), the teachers emphasized oral assessment as a useful
approach to increase learning even though they reported some challenges to implementing the oral
assessment.
Although assessment in higher education is loosely defined relative to assessment at the primary and
secondary education levels, recent literature sheds light on introducing alternative/formative assessment
tasks such as portfolios, applied research projects, and others (Bess, 1977; Ibarra-Sáiz et al., 2020; Nicol &
Macfarlane-Dick, 2006; Struyven et al., 2005). Further, to date, research on perceptions and experiences of
undergraduate students and faculty members in Afghanistan is scarce. For instance, Noori et al. (2017) and
Darmal (2009) studied assessment practices of university lecturers in Afghanistan. However, the scope of
these research studies is limited. For instance, Darmal’s (2009) study focuses on the experiences of six faculty
members involved in the Department of Geography, and Noori et al.’s (2017) research included three lecturers
who taught English as a Foreign Language. Since the government has introduced new regulations on
assessment with a focus on types and number of questions in mid-term and final exams, exploring the
experiences of students and faculty members can shed light on the meaningfulness of classroom assessment
and create insight into the policy.
Although the existing literature provides mixed findings regarding the student perceptions of assessment
based on gender, gender equity has been underscored as a key challenge in the higher education sector of
Afghanistan (Babury & Hayward, 2014; Mussawy & Rossman, 2018). According to Babury and Hayward
(2014), female students constitute less than 20% of the student population in universities. Since females are
underrepresented in the higher education sector, examining students’ perceptions of assessment based on
gender will inform whether assessment practices serve male and female students evenly.
Study Purpose
The primary purpose of the study was to examine student perceptions of classroom assessment at a university
in Afghanistan. Exploring current assessment practices focused on student and faculty lived experiences was a
secondary purpose. The study also sought to collect evidence on whether the new Afghanistan assessment
policy was effective in improving student learning. Cavanagh et al. (2005) suggested two strategies to
understand the advantages and disadvantages of classroom assessment on student learning: (a) examining
the research on assessment forms that teachers use; and (b) inquiring into students’ perceptions of classroom
assessment. This study used both strategies. More specifically, the research questions and hypotheses guiding
the study are below.
1. What are the perceptions of students about classroom assessment? As part of this research question, gender and academic discipline differences were explored.
Hypothesis 1: There is no significant difference in student perceptions of classroom assessment
based on gender.
Hypothesis 2: There is no significant difference in student perceptions of classroom assessment
based on academic discipline.
2. What are the experiences of students and faculty members concerning classroom assessment?
Significance of the Study
This study contributes to the literature on classroom assessment. First, the study’s findings provide new
insights into how students perceive classroom assessment and whether the assessment outcomes affect their
learning. Second, the research explored student and faculty lived experiences with classroom assessment.
Specific attention was given to faculty pedagogical skills and assessment literacy. Third, teachers’ challenges
concerning the national assessment policy with a focus on grading practices are highlighted. The study also
informs the conversation regarding student involvement in assessment processes and the challenges
associated with the lack of student preparedness to pursue undergraduate degree programs.
Theoretical Framework
The study uses formative and summative assessment as an analytic lens to explore perceptions and experiences
of classroom assessment among undergraduate students and faculty. Formative and summative assessment
approaches are well explored in the literature (Scriven, 1967; Wiliam & Black, 1996; Wiliam & Thompson,
2008). Formative assessment in the United States refers to “assessments that are used to provide information on
the likely performance of students on state-mandated test—a usage that might better be described as 'early-warning summative'" (Wiliam & Thompson, 2008, p. 60). Other places use formative assessment to provide
feedback to students—informing them “which items they got correct and incorrect” (Wiliam & Thompson, 2008,
p. 60). Providing feedback to improve learning is a key component of formative assessment that benefits
students in a higher education setting to achieve desirable outcomes (Black & Wiliam, 1998; Nicol &
Macfarlane-Dick, 2006; Sadler, 1998). In other words, formative assessment allows instructors to help students
engage in their own learning by exhibiting what they know and identifying their needs to move forward (Black &
Wiliam, 1998; Mansell et al., 2020; Wiliam, 2011). Formative assessment occurs in formal and informal forms
such as quizzes, oral questioning, self-reflection, peer feedback, and think-aloud (Mansell et al., 2020; Wiggins &
McTighe, 2007). Formative assessment also influences the quality of teaching and learning while engaging
students in self-directed learning (Stiggins & Chappuis, 2005).
On the other hand, summative assessment is bound to administrative decisions (Wiliam, 2008). It occurs at the
“end of a qualification, unit, module or learning target to evaluate the learning which has taken place towards the
required outcomes” (Mansell et al., 2020, p. xxi). Summative assessment, known as assessment of learning, is
primarily used “in deciding, collecting and making judgments about evidence relating to the goals of the learning
being assessed” (Harlen, 2006, p. 103). Herrera et al. (2007, p. 13) argued that “assessment of achievement has
become increasingly standardized, norm-referenced and institutionalized,” which thus negatively affects the
quality of teaching (Firestone & Mayrowetz, 2000). For scholars like Stiggins and Chappuis (2005), student
roles vary depending on assessment forms, suggesting that summative assessment enforces a passive role while
formative assessment engages students in the process as active members.
While some studies promote formative assessment over summative assessment (Firestone & Mayrowetz,
2000; Harlen, 2006), other studies emphasize the purpose and outcome of assessment activities with a focus
on ways to utilize the information to improve the teaching and learning experience (Taras, 2008; Ussher &
Earl, 2010). Bloom (1969) also asserted that when assessment is aligned with the process of teaching and
learning, it will have “a positive effect on student learning and motivation” (cited in Wiliam, 2008, p. 58).
Assessment in general accounts for “supporting learning (formative), certifying the achievement or potential
of individuals (summative), and evaluating the quality of educational institutions or programs (evaluative)”
(Wiliam, 2008, p. 59). Black and Wiliam (2004) emphasized ways to use the outcomes of formative and
summative assessment approaches to improve student learning. Taras (2008) argued that “all assessment
begins with summative assessment (which is a judgment) and that formative assessment is, in fact,
summative assessment plus feedback which the learner uses” (p. 466). According to Taras (2008), both
formative and summative assessments require “making judgments,” which might be implicit or explicit
depending on the context (p. 468). In other words, Taras (2008) argued that assessment could not “be
uniquely formative without the summative judgment having preceded it” (p. 468). Similarly, Wiggins and
McTighe (2007) explained that formative assessment occurs during instruction rather than as a separate
activity at the end of a class or unit. The literature on assessment underscores the importance of formative
and summative assessment and ways that “assessment… feed into actions in the classroom in order to affect
learning” (Wiliam & Thompson, 2008, p. 63).
Methods
Research Site
The study was conducted at a public university in Northern Afghanistan. The university, established in 1993
and re-established in 2003, has seven colleges and 27 departments. The university has approximately 155 full-time faculty members who serve approximately 5,000 students, 20% of whom are female. The faculty–student ratio at the university is 1:35, and the staff–student ratio is 1:70. The university offers only undergraduate degrees.
Procedure and Participants
The authors used an explanatory sequential mixed-methods design to collect data from senior, junior, and
some sophomore students. We administered the 24-item SPAQ to a random sample of 400 students from the
Agriculture, Education, and Humanities colleges and received responses from 353 students (a response rate of 88.25%).
Following the administration of the SPAQ, the authors conducted document analysis (mainly policy
documents on assessment) as well as semi-structured interviews with a purposeful sample of 25 individuals (seven faculty members and 18 undergraduate students) to explore their lived experiences concerning current
assessment practices. The in-person interviews ranged from 30 to 70 minutes. The notation for this study can
be written as QUAN → QUAL (Creswell & Clark, 2017). The authors obtained approval of the Institutional
Review Board prior to conducting the study.
Instrument
We adapted the SPAQ (Cavanagh et al., 2005) to examine students’ perceptions of assessment. As a
conceptual model, SPAQ assesses students’ perceptions of assessment in the following five dimensions:
1. Congruence with planned learning (CPL)—Students affirm that assessment tasks align with the goals, objectives, and activities of the learning program;
2. [Assessment] Authenticity (AA)—Students affirm that assessment tasks feature real-life situations that are relevant to themselves as learners;
3. Student consultation (SC)—Students affirm that they are consulted and informed about the forms of assessment tasks being employed;
4. [Assessment] Transparency (AT)—The purposes and forms of assessment tasks are affirmed by the students as well-defined and are made clear; and
5. Accommodation to student diversity (ASD)—Students affirm they all have an equal chance of completing assessment tasks (Cavanagh et al., 2005, p. 3).
Since the original instrument was only used to measure science assessment, we adapted and translated it to
correspond to other disciplines such as social science, agriculture, and humanities. The Dari/Farsi translation
of SPAQ is located in Appendix A. Students’ responses to the SPAQ were recorded on a 4-point Likert scale (4
= Strongly Agree to 1 = Strongly Disagree).
For the qualitative section of the study, we used a phenomenological approach to explore student and faculty
experiences of classroom assessment (Rossman & Rallis, 2016). Using a phenomenological approach in a
qualitative study is important in “understanding meaning, for participants in the study, of the events,
situations, and actions they are involved with, and of the accounts that they give of their lives and
experiences” (Maxwell, 2012, p. 8). The authors used two semi-structured interview protocols (one for
students and one for faculty) containing 19 open-ended questions to corroborate the results of the quantitative
data. Appendices B and C contain the interview protocols for faculty and students, respectively. These
protocols centered on four important themes of classroom assessment—methods, authenticity, transparency,
and the use of assessment outcomes to improve learning—that emerged from the literature on perceptions of
assessment.
Since the SPAQ and interview protocols were developed in English, one of the authors, fluent in English and
Dari, used a forward translation approach to translate the instruments into Dari/Farsi. The English and Dari
versions were shared with three experts who were fluent in both languages, and the translated versions were
revised based on their comments and suggestions. Then, the instruments were pilot tested among senior and
junior students and faculty members. The investigators conducted the survey and interviews once the
research participants confirmed that the questionnaire and interview protocols were understandable in the
local language.
Validity and Reliability
Previous research confirmed the validity and reliability of SPAQ. For instance, Fisher et al. (2005) developed
SPAQ and confirmed its validity by applying it to a sample consisting of 1,000 participants from 40 science
classes in grades 8–10. Cavanagh et al. (2006) replicated the study and revised the instrument from 30 to 24
items. Dhindsa et al. (2007) administered the revised SPAQ with 1,028 Bruneian upper-secondary students.
They reported Cronbach’s alpha reliability (Cronbach, 1951) as “0.86” for 24 items, while it ranged from “0.64
to 0.77” for subscales (p. 1269). Similarly, Koul et al. (2006) applied the original 30-item instrument and
reported that Cronbach’s alpha reliability coefficient for SPAQ subscales ranged from 0.63 to 0.83. Lastly,
Mussawy (2009) administered the revised SPAQ at Baghlan Higher Education Institution in Afghanistan and
confirmed that the SPAQ was suitable for understanding student perceptions of assessment. Cronbach’s alpha
reliability coefficient in that study was 0.89 for all items (24), and it ranged from 0.61 to 0.76 for subscales.
Thus, validity and reliability of SPAQ have been confirmed in secondary and tertiary education settings. The
investigators used the triangulation technique to increase the study’s validity by collecting data from different
sources including the SPAQ, semi-structured interviews, and document analysis. Research methodologists,
including Maxwell (2012) and Rossman and Rallis (2016), note that by using triangulation, researchers can reduce the risk of chance associations in the data, or of capturing only one aspect of the phenomenon, that can result from relying on a single method. Further, the Cronbach's alpha reliability coefficient was
calculated to determine the extent to which items in each subscale measure the same dimension of students’
perceptions of assessment.
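For readers who want to check this kind of reliability figure themselves, the short Python sketch below implements the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The data frame, its column names, and the random 4-point responses are invented stand-ins, not the study's data.

# Minimal Cronbach's alpha sketch on invented demo data (not the study's dataset).
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 4-point Likert responses for a 5-item subscale; random answers will
# yield a low alpha, and real survey responses would be loaded here instead.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(1, 5, size=(353, 5)),
                    columns=[f"item_{i}" for i in range(1, 6)])
print(round(cronbach_alpha(demo), 2))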
Analysis
Descriptive analyses address the first research question about students’ overall perceptions about assessment
at the university. Two separate statistical analyses were performed to answer the research hypotheses testing
whether there are statistical differences in student perceptions of assessment by academic discipline and
gender. The investigators performed one-way, between-groups ANOVA to examine whether the difference
between students’ perceptions of assessment was statistically significant based on colleges/disciplines. Next,
we conducted a t-test to analyze the difference in students’ perceptions of assessment based on gender.
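As an illustration of these procedures (not the authors' code), the Python sketch below runs Levene's test, a one-way between-groups ANOVA by college, and an independent-samples t-test by gender with scipy on a synthetic data frame; the column names spaq_mean, college, and gender are invented stand-ins for the study variables.

# Illustrative inferential tests on synthetic stand-in data (not the study's dataset).
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 353
df = pd.DataFrame({
    "college": rng.choice(["Education", "Humanities", "Agriculture"], size=n, p=[0.40, 0.48, 0.12]),
    "gender": rng.choice(["male", "female"], size=n, p=[0.73, 0.27]),
    "spaq_mean": rng.normal(3.16, 0.48, size=n),   # per-student overall SPAQ score
})

college_groups = [g["spaq_mean"].to_numpy() for _, g in df.groupby("college")]
print(stats.levene(*college_groups))    # homogeneity of variance across colleges
print(stats.f_oneway(*college_groups))  # one-way between-groups ANOVA by college

male = df.loc[df["gender"] == "male", "spaq_mean"]
female = df.loc[df["gender"] == "female", "spaq_mean"]
print(stats.ttest_ind(male, female, equal_var=True))  # gender comparison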
To analyze the qualitative data, initially, the interviews were transcribed and translated into English. Next, the
authors organized the data, reviewed it for accuracy, and cross-checked the original translation to ensure the
meanings were consistent (Marshall & Rossman, 2016). Then, the authors applied accepted analysis practices
such as “immersion in the data, generating case summaries and possible categories and themes, coding the
data, offering interpretations through analytic memos, search for alternative understanding, and writing the
report” to analyze the data inductively (Marshall & Rossman, 2016, p. 217). We used NVivo 12 to code the
data, run queries, and observe overlaps/connections among themes. The process, overall, was very interactive
as the authors exchanged perspectives by writing analytic memos and reflections to draw connections between
the qualitative themes and to corroborate the quantitative results (Marshall & Rossman, 2016). In short, the
qualitative analysis focused on the meaningfulness of classroom assessment based on lived experiences
(Rossman & Rallis, 2016).
Results
Quantitative
The Cronbach alpha reliability coefficient for all items in SPAQ was α = 0.89, suggesting strong internal
consistency. Among the subscales within SPAQ, Transparency had the highest alpha reliability score of α = 0.75, and Congruence with Planned Learning had the lowest, α = 0.64. The instrument reliability for subscales is consistent with previous research (see Dhindsa et al., 2007; Koul et al., 2006; Mussawy, 2009). Given
that the alpha reliability results for the subscales of SPAQ were consistently above 0.63, according to Cortina
(1993), the use of SPAQ was considered reliable (See Table 1).
The descriptive statistics show mean scores ranging from M = 2.99 for the sub-scale Accommodation to Student Diversity to M = 3.30 for Congruence with Planned Learning on a 4-point Likert scale (4 = strongly agree to 1 = strongly disagree). The generally high mean scores suggest that students have a positive perception of classroom assessment. Table 1 provides the sub-scale mean scores, standard deviations, and Cronbach alpha reliability coefficients.
Table 1. Sub-Scale Mean, Standard Deviation, and Cronbach Alpha Reliability Coefficient for the SPAQ and its Subscales

SPAQ Scale                            Mean   St. Dev   Alpha Reliability
Congruence with planned learning      3.30   .506      .644
Assessment authenticity               3.19   .540      .694
Student consultation                  3.09   .690      .732
Assessment transparency               3.18   .652      .749
Accommodation to student diversity    2.99   .710      .698
Overall                               3.16   .484      .898
The descriptive statistics associated with students’ perceptions of classroom assessment across three colleges
are reported in Table 2. The results show that participants from the College of Humanities were associated
with the smallest mean value (M = 3.05, SD =.467); participants from the College of Education were
associated with the highest mean value (M = 3.28, SD = .499); and participants from the College of
Agriculture were in between (M = 3.19, SD = .397). A one-way, between-groups ANOVA was performed to test
the hypothesis that college was associated with perceptions of classroom assessment. The assumption of
homogeneity of variance was tested and satisfied based on Levene’s test, F(2, 350) = .59, p = .55.
Table 2. Average Scale-Item Mean, Average Item Standard Deviation, and Standard Error Results for College Level Differences in SPAQ Overall Scores

College        N     M      SD     Std. Error   95% CI Lower   95% CI Upper
Education      142   3.28   .499   .041         3.20           3.37
Humanities     171   3.05   .467   .035         2.98           3.12
Agriculture    40    3.19   .397   .062         3.06           3.31
Total          353   3.16   .484   .025         3.11           3.21
The one-way between-groups ANOVA was statistically significant, F(2, 350) = 9.45, p < .001, η² = .058. Thus, the null hypothesis of no difference between the mean scores was rejected, with college membership accounting for 5.8% of the variance. To analyze the differences between the mean scores of the three colleges,
we used Fisher’s LSD post-hoc tests. The difference between students’ perceptions from the College of
Education and the College of Humanities was statistically significant across Congruence with Planned
Learning, Assessment Authenticity, Student Consultation, and Accommodation to Student Diversity
subscales. The difference between student perceptions from the Colleges of Education and Agriculture was
only statistically significant for the Accommodation to Student Diversity subscale. Finally, the differences between students' perceptions of assessment from the Colleges of Agriculture and Humanities were not statistically significant on any of the scales. See Table 3 for further information on means and probability values; a brief computational sketch of this effect-size and post-hoc logic follows the table.
Table 3. Average Scale-Item Mean, Average Item Standard Deviation, and ANOVA Results for College Differences in SPAQ Scale Scores

Scale   Edu M   Edu SD   Hum M   Hum SD   Agr M   Agr SD   p Edu-Hum   p Edu-Agr   p Agr-Hum
CPL     3.38    .461     3.15    .559     3.41    .516     .003        .800        .030
AA      3.26    .602     3.04    .540     3.20    .571     .005        .494        .260
SC      3.32    .604     2.97    .707     3.22    .457     .000        .220        .108
AT      3.25    .686     3.11    .639     3.19    .558     .059        .816        .325
ASD     3.21    .621     2.85    .706     2.94    .572     .000        .009        .467

Note. CPL = Congruence with Planned Learning; AA = Assessment Authenticity; SC = Student Consultation; AT = Assessment Transparency; ASD = Accommodation to Student Diversity. Edu = Education, Hum = Humanities, Agr = Agriculture; p values refer to the pairwise college comparisons.
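The sketch below, again on invented stand-in data, shows the computational logic behind the effect size and post-hoc comparisons reported before the table: eta squared as the between-groups sum of squares divided by the total sum of squares from an ANOVA table, followed by unadjusted pairwise comparisons in the spirit of Fisher's LSD (approximated here with ordinary two-sample t-tests rather than the pooled ANOVA error term). It is not the authors' analysis.

# Eta squared and unadjusted pairwise comparisons on synthetic stand-in data.
from itertools import combinations
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "college": rng.choice(["Education", "Humanities", "Agriculture"], size=353),
})
df["spaq_mean"] = rng.normal(3.16, 0.48, size=353)

# Eta squared = SS_between / SS_total from the one-way ANOVA table.
fit = smf.ols("spaq_mean ~ C(college)", data=df).fit()
anova_table = sm.stats.anova_lm(fit, typ=2)
ss_between = anova_table.loc["C(college)", "sum_sq"]
eta_squared = ss_between / anova_table["sum_sq"].sum()
print("eta squared:", round(eta_squared, 3))

# Fisher's LSD amounts to unadjusted pairwise tests after a significant omnibus ANOVA;
# the classical version uses the pooled error term, approximated here with t-tests.
for a, b in combinations(df["college"].unique(), 2):
    t_stat, p_val = stats.ttest_ind(df.loc[df["college"] == a, "spaq_mean"],
                                    df.loc[df["college"] == b, "spaq_mean"])
    print(f"{a} vs {b}: p = {p_val:.3f}")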
Lastly, an independent-samples t-test was performed to determine if the mean scores between male (N = 258)
and female (N = 95) students were statistically different. The assumption of homogeneity of variances was
tested and satisfied via Levene’s test, F(351) = .551, p = .458. The independent samples t-test was not
associated with a statistically significant effect, t(351) = -1.34, p = .17. This suggests that the difference
between students’ perceptions of assessment based on gender was not statistically significant, and the null
hypothesis was retained.
Qualitative Section
The qualitative results generated insights about important aspects of classroom assessment. Both students
and faculty commented that the existing classroom assessment policies and practices favor exams, which
center on summative assessment approaches. However, most faculty members reported that they implement
both formative and summative assessment. Three themes emerged from the interviews with faculty and
students: Improvement in pedagogy and assessment; student involvement in assessment processes; and
assessment forms versus the grading policy. Findings suggest that awareness about different forms of
assessment is high among the faculty. In addition, both students and faculty reported student involvement in
assessment processes at some level. Further, all participants highlighted the restriction of the grading policy
as an important challenge for faculty members to institutionalize alternative assessment approaches in
addition to the existing high stakes assessment and for students to buy into assessment activities that are not
tied to their grades.
Improvements in Pedagogy and Assessment Skills
Most faculty indicated substantial growth in teaching and assessment competencies due to exposure to
modern pedagogies provided at the national and institutional levels. A faculty member explained that
universities in Afghanistan follow a cascade model of professional development for faculty. She added that the
university has a team of experts facilitating training sessions on "outcome-based learning" and "student-centered instruction." Another faculty member confirmed that the training sessions covered different
assessment approaches. He explained, “I feel confident facilitating student-driven lessons and developing
different assessment forms to assess my students.” Similarly, a junior faculty reported that she had learned
ways to create “individualized and collaborative assessment tasks.” For these participants, professional
development programs facilitated by the quality assurance office have increased their assessment literacy.
While the faculty participants noted improvements in their assessment skills, many students criticized them for
failing to design assessment tasks that matched individual student capabilities. “My classmates come from
different geographies where access to schools is limited. They have different learning abilities, but assignments
and exams are the same for everyone,” said a senior student from the College of Education. He added that not
everyone has the same learning style, suggesting that faculty members should pay attention to the individualized
needs of students. Nonetheless, students acknowledged assessment transparency and the recurrence of daily
assessment during instruction. A senior student described that their “exams consist of simple, medium, and
difficult questions.” Nevertheless, a few students were skeptical about merit-based assessment, noting that final
exams are sometimes politicized to promote one student group over another. While participants avoided
providing specific details, this example flags concerns about assessment ethics centered on "fairness and equity" as
teachers make judgments about student learning (Klenowski & Wyatt-Smith, 2014, p. 7).
Student Involvement in Assessment Processes
Most of the faculty members who participated in the interviews expressed reluctance to involve students in
assessment tasks, particularly when grading a student’s work. Nevertheless, they were open to the idea of
having students review their peers’ work and provide constructive feedback. One faculty member stated that
he often encouraged students to make oral comments when their peers presented their projects, but he never
asked them to provide written feedback. Other faculty members also recalled instances when they worked
with students to solve a problem or discuss applying concepts and theories in practice. For these faculty,
assessment and teaching are “inseparable.” For instance, according to one faculty who was teaching writing
courses, providing opportunities for students to ask questions and reflect on the lesson was central to her
teaching philosophy. She went on to explain, “I usually provide lengthy feedback on students’ papers by
explaining the strengths, weaknesses, and ways to improve them.” The faculty member, nonetheless,
acknowledged that she had never shared her assessment rubric with students.
Student engagement in assessment tasks only occurred in informal settings. Students explained that the
faculty usually involve students in assessment when the subject requires them to conduct fieldwork and share
their findings with the class. More precisely, a junior student said, “When we present the findings of our
fieldwork, our classmates can ask questions or make comments about the presentation.” He went on to say
that a few faculty members had specific policies, for example, choosing referees among students to make
judgments about student presentations. Another junior stated, “I felt much empowered when it was my turn
to evaluate other students’ presentations one day.” He added, “I was a little nervous but so excited to serve as
a referee.” However, a few students complained about the purpose of peer assessment when there is no
guideline from the instructor. According to a sophomore, “The faculty members should establish the
grounding rules when they let students ask questions and assess the presentations. Some students ask difficult
questions to challenge their classmates.” While many of the students highlighted the importance of student
involvement in assessment processes, the last example informs the role of faculty members in managing
assessment.
Given that classroom assessment occurs at different intervals, several faculty members complained about the
lack of student preparedness for post-secondary education. They criticized secondary schools for failing to
prepare students with adequate knowledge and skills to pursue undergraduate programs. For instance, a
faculty member who facilitated a freshman course on academic writing described her experience: “Students
barely know how to write. I had to revisit my course syllabus to meet their needs.” For this participant and
s