Feedback and Evaluation Abstract Bibliography

Interest Group Resources:

(Please note: to view the resources below, you must be logged into eCommons in a separate tab of the same browser window.)

Instructions for viewing password-protected resources:
Press CTRL+T to open a new tab.
Log into eCommons in that new tab.
Return to the Academy website tab and click the resource you wish to view.

If you are logged in as instructed, the article or resource will open. If you are not, you will receive an error message.


Baker, Keith. A Paradigm Shift in GME: Evidence and Principles from Cognitive Science Will Bring Change to How We Teach and Learn. 2009.

Baker, Keith. Determining Resident Clinical Performance: Getting Beyond the Noise. Anesthesiology. 2011. 115:1-1.

BACKGROUND: Valid and reliable (dependable) assessment of resident clinical skills is essential for learning, promotion, and remediation. Competency is defined as what a physician can do, whereas performance is what a physician does in everyday practice. There is an ongoing need for valid and reliable measures of resident clinical performance.

METHODS: Anesthesia residents were evaluated confidentially on a weekly basis by faculty members who supervised them. The electronic evaluation form had five sections, including a rating section for absolute and relative-to-peers performance under each of the six Accreditation Council for Graduate Medical Education core competencies, clinical competency committee questions, rater confidence in having the resident perform cases of increasing difficulty, and comment sections. Residents and their faculty mentors were provided with the resident's formative comments on a biweekly basis.

RESULTS: From July 2008 to June 2010, 140 faculty members returned 14,469 evaluations on 108 residents. Faculty scores were pervasively positively biased and affected by idiosyncratic score range usage. These effects were eliminated by normalizing each performance score to the unique scoring characteristics of each faculty member (Z-scores). Individual Z-scores had low amounts of performance information, but signal averaging allowed determination of reliable performance scores. Average Z-scores were stable over time, related to external measures of medical knowledge, identified residents referred to the clinical competency committee, and increased when performance improved because of an intervention.

CONCLUSIONS: This study demonstrates a reliable and valid clinical performance assessment system for residents at all levels of training.
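The normalization and averaging steps in the Baker abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes each evaluation is a hypothetical (rater, resident, score) record, standardizes every score against the mean and population standard deviation of all scores that the same rater gave, and then averages each resident's Z-scores. The data format, the function names, and the choice to skip raters whose scores never vary are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rater_z_scores(evaluations):
    """Convert raw scores to Z-scores relative to each rater's own scoring habits.

    `evaluations` is a list of (rater, resident, score) tuples (hypothetical format).
    Returns a list of (resident, z) tuples.
    """
    scores_by_rater = defaultdict(list)
    for rater, _resident, score in evaluations:
        scores_by_rater[rater].append(score)

    # A rater's mean reflects leniency; the spread reflects how much of the scale they use.
    rater_stats = {r: (mean(s), pstdev(s)) for r, s in scores_by_rater.items()}

    z_scores = []
    for rater, resident, score in evaluations:
        mu, sd = rater_stats[rater]
        if sd == 0:
            # Every score from this rater is identical, so it carries no relative information.
            continue
        z_scores.append((resident, (score - mu) / sd))
    return z_scores


def average_z_by_resident(z_scores):
    """Average each resident's Z-scores (the signal-averaging step)."""
    by_resident = defaultdict(list)
    for resident, z in z_scores:
        by_resident[resident].append(z)
    return {resident: mean(zs) for resident, zs in by_resident.items()}


# Toy data: rater A scores high, rater B scores low, but both separate r1 from r2 by 2 points.
evaluations = [
    ("A", "r1", 9), ("A", "r2", 7),
    ("B", "r1", 5), ("B", "r2", 3),
]
averages = average_z_by_resident(rater_z_scores(evaluations))
print(averages)  # {'r1': 1.0, 'r2': -1.0}
```

In this toy data, resident r1's raw scores differ by four points depending on who supervised (9 from the lenient rater, 5 from the harsh one), yet both normalize to +1.0, so the rater's idiosyncratic score level drops out before averaging.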


Bing-You RG, Paterson J. Feedback Falling on Deaf Ears: Residents' Receptivity to Feedback Tempered by Sender Credibility. Med Teach 1997;19:40-4.

We interviewed internal medicine residents to characterize their perceptions of effective feedback. These semi-structured interviews also explored aspects of the person sending the feedback which might cause residents to discount or disbelieve the information. Residents perceived well-timed, private, verbal feedback that fostered development of an action plan as effective. Sender credibility, and subsequent resident receptivity to feedback, were influenced by the method of feedback delivery, the content of the feedback, the residents' perceptions of sender characteristics, and their observation of sender behavior. These qualitative results may help to develop initial hypotheses and frame further investigations aimed at optimizing the reception of feedback by residents.


Blackwell, L., Trzesniewski, K., and Dweck, C.S. 2007. Implicit Theories of Intelligence Predict Achievement Across an Adolescent Transition: A Longitudinal Study and an Intervention. Child Development 78 (1): 246-263.

Two studies explored the role of implicit theories of intelligence in adolescents' mathematics achievement. In Study 1 with 373 7th graders, the belief that intelligence is malleable (incremental theory) predicted an upward trajectory in grades over the two years of junior high school, while a belief that intelligence is fixed (entity theory) predicted a flat trajectory. A mediational model including learning goals, positive beliefs about effort, and causal attributions and strategies was tested. In Study 2, an intervention teaching an incremental theory to 7th graders (N = 48) promoted positive change in classroom motivation, compared with a control group (N = 43). Simultaneously, students in the control group displayed a continuing downward trajectory in grades, while this decline was reversed for students in the experimental group.


Brosvic, Gary M. "Acquisition and Retention of Esperanto: The Case for Error Correction and Immediate Feedback." The Psychological Record. 2006;56.

This article combines laboratory and classroom research methods to examine the effectiveness of immediate versus delayed feedback. It is important to know, for example, whether students who leave a final exam without knowing the right answers retain misinformation. The authors delivered feedback in several ways: immediately (students were allowed to respond until they answered correctly), at the end of the test (a 30-minute review), and after a 24-hour delay; retention was tested 1 week, 3 months, and 6 months later. At every interval, retention was best after immediate feedback when students were allowed to respond repeatedly. Delayed feedback was better than no feedback.


Cavalcanti, Rodrigo; Detsky, Allan. The Education and Training of Future Physicians: Why Coaches Can't Be Judges. JAMA. 2011;306(9).

A physician must be able to diagnose and treat patients. The clinical skills required to be successful include gathering data, differentiating important from unimportant facts, making decisions about further investigations and treatments, implementing therapy, and providing follow-up, education, and counseling. These skills cannot be learned through reading or in classrooms alone; practical experience is required. The present method of exposing physicians-in-training to practical experience involves a hierarchical team approach with graded levels of responsibility whereby the decisions of the most junior members of the team are reviewed by physicians with more experience and seniority. These practical experiences impart content knowledge and also allow trainees to become comfortable with decision making and to learn the consequences of these decisions. Although there may be better ways to train future physicians, this apprenticeship method seems to work, as evidenced by the relatively low failure rate in medical schools and training programs.


Mueller, Claudia M., Dweck, Carol S. "Praise for Intelligence Can Undermine Children's Motivation and Performance." Journal of Personality and Social Psychology. 1998, Vol. 75, No. 1, 33-52.

Praise for ability is commonly considered to have beneficial effects on motivation. Contrary to this popular belief, six studies demonstrated that praise for intelligence had more negative consequences for students' achievement motivation than praise for effort. Fifth graders praised for intelligence were found to care more about performance goals relative to learning goals than children praised for effort. After failure, they also displayed less task persistence, less task enjoyment, more low-ability attributions, and worse task performance than children praised for effort. Finally, children praised for intelligence described it as a fixed trait more than children praised for hard work, who believed it to be subject to improvement. These findings have important implications for how achievement is best encouraged, as well as for more theoretical issues, such as the potential cost of performance goals and the socialization of contingent self-worth.


Fidler H, Lockyer JM, Toews J, Violato C. "Changing physicians' practices: the effect of individual feedback." Acad Med. 1999;74:14.

OBJECTIVE: To determine whether physicians who received feedback from six peers, six referring/referral physicians, six co-workers, and 25 patients about 55 aspects of their medical practices (e.g., able to reach doctor by phone after office hours) would make changes to their practices based on that feedback.

METHOD: In an earlier study, 308 physicians were given feedback about 106 aspects of their practices in the form of mean Likert-scale ratings that (1) the peers made on 26 aspects; (2) the referring/referral physicians made on 23 aspects; (3) the co-workers made on 17 aspects; and (4) the patients made on 40 aspects. Three months later 255 of these physicians responded when asked to indicate whether they had contemplated or initiated changes, or whether no change had been necessary, regarding 31 practice aspects, each of which was a summary of one or more of 55 of the original 106 aspects on which they had received ratings. These 55 were considered the aspects most amenable to change over a short period. The physicians were also asked about the educational interventions that they felt would help them make changes. Multivariate analysis of variance was used to see whether the types of changes reported for the specific aspects of practice were associated with the feedback ratings received for those aspects.

RESULTS: An examination of the responses showed that 83% of the 255 physicians reported having contemplated a change, and 66% reported having initiated a change for at least one aspect of practice. Changes were contemplated most frequently for aspects of practice associated with clinical skills and resource use. Changes were initiated most frequently for aspects of practice associated with communication with patients and support of patients. Physicians who contemplated or initiated changes had lower (i.e., more negative) mean ratings than did physicians who reported that no change was necessary, which suggests that the physicians did use their feedback ratings to decide about changes, although their qualitative comments indicated other sources as well. Printed material was chosen most often as a method of receiving continuing medical education related to making changes in the practice areas examined.


Goldman, Stuart. "The Educational Kanban: Promoting Effective Self-Directed Adult Learning in Medical Education." Academic Medicine. 2009;84(7).

The author reviews the many forces that have driven contemporary medical education approaches to evaluation and places them in an adult learning theory context. After noting their strengths and limitations, the author looks to lessons learned from manufacturing on both efficacy and efficiency and explores how these can be applied to the process of trainee assessment in medical education. Building on this, the author describes the rationale for and development of the Educational Kanban (EK) at Children's Hospital Boston, specifically how it was designed to integrate adult learning theory, Japanese manufacturing models, and educator observations into a unique form of teacher-student collaboration that allows for continuous improvement. It is a formative tool, built on the Accreditation Council for Graduate Medical Education's six core competencies, that guides educational efforts to optimize teaching and learning, promotes adult learner responsibility and efficacy, and takes advantage of the labor-intensive clinical educational setting. The author discusses how this model, which will be implemented in July 2009, will lead to training that is highly individualized, optimizes faculty and student educational efforts, and ultimately conserves faculty resources. A model EK is provided for general reference. The EK represents a novel approach to adult learning that will enhance educational effectiveness and efficiency and complement existing evaluative models. Described here in a specific graduate medical setting, it can readily be adapted and integrated into a wide range of undergraduate and graduate clinical educational environments.


Grant, H., & Dweck, C. S. (2003). Clarifying achievement goals and their impact. Journal of Personality and Social Psychology, 85, 541-553.

The study of achievement goals has illuminated basic motivational processes, though controversy surrounds their nature and impact. In 5 studies, including a longitudinal study in a difficult premed course, the authors show that the impact of learning and performance goals depends on how they are operationalized. Active learning goals predicted active coping, sustained motivation, and higher achievement in the face of challenge. Among performance goals, ability-linked goals predicted withdrawal and poorer performance in the face of challenge (but provided a "boost" to performance when students met with success); normative goals did not predict decrements in motivation or performance; and outcome goals (wanting a good grade) were in fact equally related to learning goals and ability goals. Ways in which the findings address discrepancies in the literature are discussed.


Hong, Y., Chiu, C., Dweck, C. S., Lin, D. M. and Wan, W. (1999). Implicit theories, attributions, and coping: A meaning system approach. Journal of Personality and Social Psychology, 77, 588-599.

This research sought to integrate C. S. Dweck and E. L. Leggett's (1988) model with attribution theory. Three studies tested the hypothesis that theories of intelligence—the belief that intelligence is malleable (incremental theory) versus fixed (entity theory)—would predict (and create) effort versus ability attributions, which would then mediate mastery-oriented coping. Study 1 revealed that, when given negative feedback, incremental theorists were more likely than entity theorists to attribute to effort. Studies 2 and 3 showed that incremental theorists were more likely than entity theorists to take remedial action if performance was unsatisfactory. Study 3, in which an entity or incremental theory was induced, showed that incremental theorists' remedial action was mediated by their effort attributions. These results suggest that implicit theories create the meaning framework in which attributions occur and are important for understanding motivation.


Kamins, M., & Dweck, C. S. (1999). Person vs. process praise and criticism: Implications for contingent self-worth and coping. Developmental Psychology, 35, 835–847.

Conventional wisdom suggests that praising a child as a whole or praising his or her traits is beneficial. Two studies tested the hypothesis that both criticism and praise that conveyed person or trait judgments could send a message of contingent worth and undermine subsequent coping. In Study 1, 67 children (ages 5-6 years) role-played tasks involving a setback and received 1 of 3 forms of criticism after each task: person, outcome, or process criticism. In Study 2, 64 children role-played successful tasks and received either person, outcome, or process praise. In both studies, self-assessments, affect, and persistence were measured on a subsequent task involving a setback. Results indicated that children displayed significantly more "helpless" responses (including self-blame) on all dependent measures after person criticism or praise than after process criticism or praise. Thus person feedback, even when positive, can create vulnerability and a sense of contingent self-worth.


Kluger, Avraham N., DeNisi, Angelo. "The Effects of Feedback Interventions on Performance: A Historical Review, a Meta-Analysis, and a Preliminary Feedback Intervention Theory." Psychological Bulletin. 1996, Vol. 119, No. 2, 254-284.

Since the beginning of the century, feedback interventions (FIs) produced negative, but largely ignored, effects on performance. A meta-analysis (607 effect sizes; 23,663 observations) suggests that FIs improved performance on average (d = .41) but that over one-third of the FIs decreased performance. This finding cannot be explained by sampling error, feedback sign, or existing theories. The authors proposed a preliminary FI theory (FIT) and tested it with moderator analyses. The central assumption of FIT is that FIs change the locus of attention among 3 general and hierarchically organized levels of control: task learning, task motivation, and meta-tasks (including self-related) processes. The results suggest that FI effectiveness decreases as attention moves up the hierarchy closer to the self and away from the task. These findings are further moderated by task characteristics that are still poorly understood.


Kogan, Jennifer R.; Holmboe, Eric S.; Hauer, Karen E. "Tools for Direct Observation and Assessment of Clinical Skills of Medical Trainees: A Systematic Review." JAMA. 2009;302(12):1316-1326.

CONTEXT: Direct observation of medical trainees with actual patients is important for performance-based clinical skills assessment. Multiple tools for direct observation are available, but their characteristics and outcomes have not been compared systematically.

OBJECTIVES: To identify observation tools used to assess medical trainees' clinical skills with actual patients and to summarize the evidence of their validity and outcomes.

DATA SOURCES: Electronic literature search of PubMed, ERIC, CINAHL, and Web of Science for English-language articles published between 1965 and March 2009 and review of references from article bibliographies.

STUDY SELECTION: Included studies described a tool designed for direct observation of medical trainees' clinical skills with actual patients by educational supervisors. Tools used only in simulated settings or assessing surgical/procedural skills were excluded. Of 10 672 citations, 199 articles were reviewed and 85 met inclusion criteria.

DATA EXTRACTION: Two authors independently abstracted studies using a modified Best Evidence Medical Education coding form to inform judgment of key psychometric characteristics. Differences were reconciled by consensus.

RESULTS: A total of 55 tools were identified. Twenty-one tools were studied with students and 32 with residents or fellows. Two were used across the educational continuum. Most (n = 32) were developed for formative assessment. Rater training was described for 26 tools. Only 11 tools had validity evidence based on internal structure and relationship to other variables. Trainee or observer attitudes about the tool were the most commonly measured outcomes. Self-assessed changes in trainee knowledge, skills, or attitudes (n = 9) or objectively measured change in knowledge or skills (n = 5) were infrequently reported. The strongest validity evidence has been established for the Mini Clinical Evaluation Exercise (Mini-CEX).

CONCLUSION: Although many tools are available for the direct observation of clinical skills, validity evidence and description of educational outcomes are scarce.


Mangels, J. A., Butterfield, B., Lamb, J., Good, C. D., & Dweck, C. S. (2006). Why do beliefs about intelligence influence learning success? A social cognitive neuroscience model. Social Cognitive and Affective Neuroscience (SCAN).

Students’ beliefs and goals can powerfully influence their learning success. Those who believe intelligence is a fixed entity (entity theorists) tend to emphasize ‘performance goals,’ leaving them vulnerable to negative feedback and likely to disengage from challenging learning opportunities. In contrast, students who believe intelligence is malleable (incremental theorists) tend to emphasize ‘learning goals’ and rebound better from occasional failures. Guided by cognitive neuroscience models of top–down, goal-directed behavior, we use event-related potentials (ERPs) to understand how these beliefs influence attention to information associated with successful error correction. Focusing on waveforms associated with conflict detection and error correction in a test of general knowledge, we found evidence indicating that entity theorists oriented differently toward negative performance feedback, as indicated by an enhanced anterior frontal P3 that was also positively correlated with concerns about proving ability relative to others. Yet, following negative feedback, entity theorists demonstrated less sustained memory-related activity (left temporal negativity) to corrective information, suggesting reduced effortful conceptual encoding of this material–a strategic approach that may have contributed to their reduced error correction on a subsequent surprise retest. These results suggest that beliefs can influence learning success through top–down biasing of attention and conceptual processing toward goal-congruent information.


Mann, Karen. Tensions in Informed Self-Assessment: How the Desire for Feedback and Reticence to Collect and Use It Can Conflict. Academic Medicine. 2011;86(9).

PURPOSE: Informed self-assessment describes the set of processes through which individuals use external and internal data to generate an appraisal of their own abilities. The purpose of this project was to explore the tensions described by learners and professionals when informing their self-assessments of clinical performance.

METHOD: This 2008 qualitative study was guided by principles of grounded theory. Eight programs in five countries across undergraduate, postgraduate, and continuing medical education were purposively sampled. Seventeen focus groups were held (134 participants). Detailed analyses were conducted iteratively to understand themes and relationships.

RESULTS: Participants experienced multiple tensions in informed self-assessment. Three categories of tensions emerged: within people (e.g., wanting feedback, yet fearing disconfirming feedback), between people (e.g., providing genuine feedback yet wanting to preserve relationships), and in the learning/practice environment (e.g., engaging in authentic self-assessment activities versus "playing the evaluation game"). Tensions were ongoing, contextual, and dynamic; they prevailed across participant groups, infusing all components of informed self-assessment. They also were present in varied contexts and at all levels of learners and practicing physicians.

CONCLUSIONS: Multiple tensions, requiring ongoing negotiation and renegotiation, are inherent in informed self-assessment. Tensions are both intraindividual and interindividual and they are culturally situated, reflecting both professional and institutional influences. Social learning theories (social cognitive theory) and sociocultural theories of learning (situated learning and communities of practice) may inform our understanding and interpretation of the study findings. The findings suggest that educational interventions should be directed at individual, collective, and institutional cultural levels. Implications for practice are presented.


Molden, D.C., Plaks, J.E., & Dweck, C.S. (2006). “Meaningful” social inferences: Effects of implicit theories on inferential processes. Journal of Experimental Social Psychology, 42, 738-752.

Perceivers' shared theories about the social world have long featured prominently in social inference research. Here, we investigate how fundamental differences in such theories influence basic inferential processes. Past work has typically shown that integrating multiple interpretations of behavior during social inference requires cognitive resources. However, three studies that measured or manipulated people's beliefs about the stable versus dynamic nature of human attributes (i.e., their entity vs. incremental theory, respectively) qualify these past findings. Results revealed that, when interpreting others' actions, perceivers' theories selectively facilitate the consideration of interpretations that are especially theory-relevant. While experiencing cognitive load, entity theorists continued to incorporate information about stable dispositions (but not about dynamic social situations) in their social inferences, whereas incremental theorists continued to incorporate information about dynamic social situations (but not about stable traits). Implications of these results for how perceivers meaning in behavior are discussed.


Plaks, J.E., Grant, H., & Dweck, C.S. (2005). Violations of implicit theories and the sense of prediction and control: Implications for motivated person perception. Journal of Personality and Social Psychology, 88, 245-262.

Beginning with the assumption that implicit theories of personality are crucial tools for understanding social behavior, the authors tested the hypothesis that perceivers would process person information that violated their predominant theory in a biased manner. Using an attentional probe paradigm (Experiment 1) and a recognition memory paradigm (Experiment 2), the authors presented entity theorists (who believe that human attributes are fixed) and incremental theorists (who believe that human attributes are malleable) with stereotype-relevant information about a target person that supported or violated their respective theory. Both groups of participants showed evidence of motivated, selective processing only with respect to theory-violating information. In Experiment 3, the authors found that after exposure to theory-violating information, participants felt greater anxiety and worked harder to reestablish their sense of prediction and control mastery. The authors discuss the epistemic functions of implicit theories of personality and the impact of violated assumptions.


Norcini JJ. "Peer assessment of competence." Med Educ 2003;37:539-43.

OBJECTIVE: This instalment in the series on professional assessment summarises how peers are used in the evaluation process and whether their judgements are reliable and valid.

METHOD: The nature of the judgements peers can make, the aspects of competence they can assess and the factors limiting the quality of the results are described with reference to the literature. The steps in implementation are also provided.

RESULTS: Peers are asked to make judgements about structured tasks or to provide their global impressions of colleagues. Judgements are gathered on whether certain actions were performed, the quality of those actions and/or their suitability for a particular purpose. Peers are used to assess virtually all aspects of professional competence, including technical and non-technical aspects of proficiency. Factors influencing the quality of those assessments are reliability, relationships, stakes and equivalence.

CONCLUSION: Given the broad range of ways peer evaluators can be used and the sizeable number of competencies they can be asked to judge, generalisations are difficult to derive and this form of assessment can be good or bad depending on how it is carried out.


Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. "Use of peer ratings to evaluate physician performance." JAMA 1993;269:1655-60.

OBJECTIVE: To assess the feasibility and measurement characteristics of ratings completed by professional associates to evaluate the performance of practicing physicians.

DESIGN: The clinical performance of physicians was evaluated using written questionnaires mailed to professional associates (physicians and nurses). Physician-associates were randomly selected from lists provided by both the subjects and medical supervisors, and detailed information was collected concerning the professional and social relationships between the associate and the subject. Responses were analyzed to determine factors that affect ratings and measurement characteristics of peer ratings.

SETTING AND PARTICIPANTS: Physician-subjects were selected from among practicing internists in New York, New Jersey, and Pennsylvania who received American Board of Internal Medicine certification 5 to 15 years previously.

MAIN OUTCOME MEASURE: Physician performance as assessed by peers.

RESULTS: Peer ratings are not biased substantially by the method of selection of the peers or the relationship between the rater and the subject. Factor analyses suggest a two-dimensional conceptualization of clinical skills: one factor represents cognitive and clinical management skills and the other factor represents humanistic qualities and management of psychosocial aspects of illness. Ratings from 11 peer physicians are needed to provide a reliable assessment in these two areas.

CONCLUSIONS: These findings suggest that it is feasible to obtain assessments from professional associates of practicing physicians in areas such as clinical skills, humanistic qualities, and communication skills. With a shorter version of the questionnaire used in this study, peer ratings provide a practical method to assess clinical performance in areas such as humanistic qualities and communication skills that are difficult to assess with other measures.


Walker AG, Smither JW. "A Five-Year Study of Upward Feedback: What Managers Do with Their Results Matters." Personnel Psychol. 1999;52:393-423.

We present results for 252 target managers over 5 annual administrations of an upward feedback program (i.e., twice as long as any previous study in this area). We show that managers initially rated poor or moderate showed significant improvements in upward feedback ratings over the 5-year period, and that these improvements were beyond what could be expected due to regression to the mean. We also found that (a) managers who met with direct reports to discuss their upward feedback improved more than other managers, and (b) managers improved more in years when they discussed the previous year's feedback with direct reports than in years when they did not discuss the previous year's feedback with direct reports. This is important because it is the first research evidence demonstrating that what managers do with upward feedback is related to its benefits. We use an accountability framework to discuss our results and suggest directions for future research.

Instruments (you must be logged into eCommons in a separate tab of the same browser window to view these):

Keith Baker, MD PhD

Jonathan Alpert, MD PhD: Collaborative Assessment Tool, for use in an oral exam/case presentation in the Psychiatry Clerkship that is conducted by faculty at a site other than the one to which the student is assigned.

Jonathan Alpert, MD PhD: HMS Psychiatry Mid-Rotation Feedback Form, an instrument for the mid-clerkship feedback discussion, which is increasingly emphasized by the LCME.

David Topor, PhD: A measure developed for psychiatry residents to rate their own competency in using Cognitive Behavioral Therapy.

Book Recommendations:

Holmboe, E., Hawkins, R. Practical Guide to the Evaluation of Clinical Competence. Mosby, 2008.