Resident Evaluation and Feedback
MAJ John Heflin, MC, USA
(edited for web page by CDR DL
Hufford, MC USN, 03 JUL 99)
Faculty Development Fellowship
Madigan Army Medical Center
Introduction:
The goal of residency training is to produce clinically competent graduates.
Clinical competence encompasses knowledge base, interviewing
and interpersonal skills, physical examination skills, problem-solving
abilities, technical skills, and documentation in the medical record.
Evaluation and Feedback are tools to improve the resident and the residency
program
- A comprehensive program
should evaluate the residents and the residency program and provide
continuous feedback to the residents, their teachers and the program
directors
- Numerous evaluation measures
are required to reliably assess resident performance.
- Formative and Summative
evaluation should be performed
- Trainees must be observed by
evaluators
Clearly outlined goals and objectives are required and must be understood by
the residents and faculty
Types of Evaluation
Formative Evaluation – occurs on an ongoing basis with feedback used
to shape daily behavior
- Best format for efforts to
improve a resident
Summative Evaluation – usually done at the end of an experience to
determine whether a certain standard has been achieved (e.g., to promote or
graduate a resident)
Evaluation Formats
Written Evaluation Forms – completed by the faculty monthly or upon
completion of rotations
- Questionable reliability and
validity
- Inflation of
performance and reluctance to identify performance deficits
- Halo effect where a
strong favorable impression about one skill obscures accurate evaluation
in other skills
- Only the
exceptionally weak resident is given a low rating.7
- Forms too often use global
or vague descriptions of desired performance
- Each rater has his or
her own definition of performance, so the reliability of the resulting
measure is low
- Confirmatory strategy
where information that confirms the initial impression is considered more
favorably than information that contradicts the impression
- Questionable ability to
predict performance on the American Board Certifying Exams
- Standardization of the scores
helps to improve the reliability and validity
- All scores are entered
into a database and adjusted for the mean score of the rater
- With standardized
scores, the written evaluations had > 70% predictive ability for
identifying "problem" and "superior" surgical
residents.8
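The rater adjustment described above can be sketched in a few lines. This is a minimal illustration, not the method used in the cited study: it assumes "adjusted for the mean score of the rater" means subtracting each rater's own mean from the scores that rater gave. The function names, the tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def adjust_for_rater_mean(ratings):
    """Re-express each score as a deviation from its rater's own mean.

    `ratings` is a list of (rater, resident, score) tuples.
    Assumption: the adjustment is a simple subtraction of the rater mean.
    """
    scores_by_rater = defaultdict(list)
    for rater, _resident, score in ratings:
        scores_by_rater[rater].append(score)
    rater_mean = {r: mean(s) for r, s in scores_by_rater.items()}
    return [(rater, resident, score - rater_mean[rater])
            for rater, resident, score in ratings]


def resident_averages(adjusted):
    """Average the adjusted scores each resident received across raters."""
    scores_by_resident = defaultdict(list)
    for _rater, resident, score in adjusted:
        scores_by_resident[resident].append(score)
    return {res: mean(s) for res, s in scores_by_resident.items()}


# Hypothetical data: a lenient rater and a strict rater score residents A, B, C.
ratings = [
    ("lenient", "A", 9), ("lenient", "B", 8), ("lenient", "C", 7),
    ("strict", "A", 6), ("strict", "B", 5), ("strict", "C", 4),
]
adjusted = adjust_for_rater_mean(ratings)
averages = resident_averages(adjusted)
print(averages)
```

In the sample data the raw scores from the two raters differ by three points for every resident, yet after adjustment both raters place resident A one point above their own mean and resident C one point below it, so the raters' different baselines no longer affect the ranking.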
Critical Incident Technique
A widely accepted method of job analysis from
personnel psychology adapted to medicine. Each faculty physician is interviewed
by a psychologist trained in the critical incident technique. The staff
describe incidents of effective and ineffective behavior among residents in the
past year. Effective behaviors are defined as those that faculty physicians
wished all residents would emulate and ineffective behaviors are those which
would call into question the competence of the resident. For each incident, the
interviewer probes with standard questions to obtain a description of the
context and outcome of the behavior. All of the incidents are independently
reviewed by two faculty physicians and grouped into mutually exclusive
categories of behavior. A consensus is reached on the categories of behavior
and an attempt is made to sort the incidents into these categories. A second
revision of the categories is then undertaken with assistance from the
psychologist. A final set of mutually exclusive categories of behavior with
specific definitions of the behaviors for each category is produced.
- Concretely defines
the specific behaviors and attitudes of effective and ineffective
resident performance
- Performance
dimensions are based on actual resident behaviors which are observed
- High interrater
reliability for matching a behavioral incident with a category
- The evaluation forms
are specific to the residency which developed them, e.g., one developed
for pediatrics is probably not suitable for surgical residents
- Only 20-30% of the
critical incidents tend to be cognitive (such as integration of knowledge
or technical skills).5,6
- Labor intensive to
develop
Studies using the Critical Incident Technique to
develop written evaluations
University of Iowa Obstetrics and Gynecology
Residency5
- Nine categories of
behavior each with specific definitions of effective and ineffective
performance
- Conscientiousness,
recognition of limits, confidence in skills and training, ability to
handle crisis/emergency situations, integration of knowledge with practice,
technical skills, relationships with staff, relationships with patients,
ethical actions
- High intra- and
interrater reliability for sorting the incidents into the categories
University of Iowa Pediatrics Residency6
- Seven categories of
behavior each with specific definitions of effective and ineffective
performance
- Commitment to
learning, clinical judgment, communicating medical information,
recognition of limits, professional behavior, interpersonal skills with
patients, dealing with emergency situations
- High interrater
reliability for sorting the incidents into the categories
- Although the
categories are not identical to the OB/Gyn categories, there is
considerable overlap in the defining behaviors
Resident Evaluation of the Rotations
- Content of the
rotation to the extent that the goals and objectives were fulfilled
- Subjective thoughts
with at least one suggestion on how to make the rotation a better
educational experience
- Can be submitted as a
quarterly compilation to help maintain anonymity
Standardized Multiple Choice Tests – Examinations such as the
In-Training Examination (ITE) or the American Board Certifying Examination
- Very reliable but only
evaluates knowledge base
Objective Structured Clinical Examination (OSCE) – First described in
1975 for use with medical students and subsequently adapted for residents. A
multistation examination using real or simulated patients which evaluates
clinical skills, attitudes and cognitive abilities. Each station lasts 5-6
minutes and the residents are observed evaluating patients or are queried about
a diagnosis and management. The patient examination stations evaluate the
resident’s interpersonal skills, history and physical skills, and diagnostic
skills. The diagnosis and management stations evaluate the resident’s knowledge
base and problem-solving ability. Grading is performed at each station with a
predetermined answer checklist which the teaching faculty develops. This list
contains events which should occur and events which should be avoided for competent
care to be provided. The faculty also sets the percentage of correct events
required at each station to receive a passing grade.
- Identifies information about
the resident’s clinical skills distinct from that provided by faculty
rotation evaluations or standardized multiple choice tests.1,2,3
- Demonstrated reliability and
validity for assessing residents’ clinical performance
- Predictive ability
for performance on the In-Training Examinations and the written rotation
evaluations.
- Labor and time intensive and
requires some expertise
- Canada requires satisfactory
completion of the OSCE as a licensing requirement
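The station grading described above (a checklist of events that should occur and events that should be avoided, with a faculty-set passing percentage) can be sketched as follows. This is an illustrative sketch only: the `Station` class, the `score_station` function, the equal weighting of every checklist item, and the sample checklist are assumptions, not part of any published OSCE scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    should_occur: set      # events the resident is expected to perform
    should_avoid: set      # events the resident is expected not to perform
    pass_fraction: float   # fraction of correct events required, set by faculty


def score_station(station, observed):
    """Return (fraction_correct, passed) for one resident at one station.

    An expected event counts as correct if it was observed; an event to be
    avoided counts as correct if it was NOT observed. Every item is weighted
    equally (an assumption made for this sketch).
    """
    total = len(station.should_occur) + len(station.should_avoid)
    if total == 0:
        raise ValueError("station checklist is empty")
    correct = (len(station.should_occur & observed)
               + len(station.should_avoid - observed))
    fraction = correct / total
    return fraction, fraction >= station.pass_fraction


# Hypothetical station and observations.
station = Station(
    name="chest pain history",
    should_occur={"introduces self", "asks about onset",
                  "asks about radiation", "summarizes plan"},
    should_avoid={"interrupts patient"},
    pass_fraction=0.75,
)
strong = {"introduces self", "asks about onset", "summarizes plan"}
weak = {"introduces self", "interrupts patient"}

strong_result = score_station(station, strong)
weak_result = score_station(station, weak)
print(strong_result, weak_result)
```

Here the first resident gets 4 of 5 checklist items correct and passes the 75% threshold, while the second gets 1 of 5 and fails.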
Studies of the OSCE with Residents
Kentucky surgical residents1 and
surgical interns2
- Highly reliable and
valid (content, construct, and concurrent) for assessing clinical
performance.
- The OSCE scores
correlated (r = 0.66) with the surgical In-Training Examination (ITE)
scores.
- Greatest correlation
(r = 0.8) occurred with the clinical subsection of the ITE, while a lower
correlation (r = 0.4) was seen with the basic science subsection.1
- The OSCE scores
correlated (r = 0.8) with level of training with significant differences
between levels of training (senior residents scored the highest)1
- The interns’ OSCE
scores correlated with the ITE scores (r = 0.5) and monthly written
evaluations (r = 0.48) but the written evaluations and ITE scores did not
correlate with each other.2
- Deficient
performance was identified in 8 of 22 interns by the OSCE and 9 of 22 by the
ITE, but in only one by the written evaluations.
Mayo Clinic Second Year Internal Medicine
residents.3
- High intra- and
interrater reliability for assessing clinical performance.
- The OSCE scores
correlated with the medicine In-Training Examination scores (r = 0.3) and
the clinical rotation evaluations (r = 0.4)
- No differences
attributable to gender or foreign medical graduate status
Michigan Pediatric Residents.4
- Highly reliable and
valid (content, construct, and concurrent)
- The OSCE scores
correlated with the pediatrics In-Training Examination scores (r = 0.59)
and the monthly written rotation evaluations (r = 0.39)
- Significant
differences between levels of training (senior residents score highest)
- Significant gap
between resident performance on the OSCE and faculty expectations (>
40% of the residents below the pass level)
- Higher failure rate
among senior residents (96% failed) despite their higher scores, because
faculty expectations were greater
- Critical appraisal
revealed excessive reliance on written evaluations without direct
observation of clinical skills or assessment of the attainment of
educational objectives. The residency program changed to a competency-based
curriculum with specific observable behaviors which can be tested.
Videotape or Simulated Patients – Low reliability and validity unless
graded with a scheme such as that used in the OSCE. Recommend the OSCE over this
method
Chart Audits – The charts of the residents are periodically reviewed
to evaluate thought processes and documentation skills
- Not a reliable indicator of
the quality of the exam or care provided.7,10
- Poor records do
suggest sloppy medical practice
- Can simultaneously satisfy
Quality Assurance, JCAHO or HCFA requirements
Study of outpatient physical exams in Ohio internal
medicine residents.10
(documentation of the performance of physical exam
components compared to patient interviews about the completion of these
components)
- Average agreement of
completion or non-completion was 81%
- 7.1% of the time the
patient reported completion of the exam component but it was not
documented
- 12.4% of the time
the patient denied completion of the exam component but the chart
indicated completion
- The average composite
completeness score was 64% from the patient interviews and 65% from the
chart reviews with a significant correlation (r = .75) between the two
- The correlation
increases to .89 with corrections for measurement error
- No correlation
between the completeness scores and the residents’ longitudinal written
evaluations, In-Training Exam scores, resident age, resident gender,
year of the resident, patient age, patient gender, patient race
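The agreement figures in this study come from comparing, for each physical exam component, whether the chart documented it and whether the patient reported it. A minimal sketch of that tally is shown below. The function name, the pair layout, and the sample data are hypothetical; the data are not taken from the study.

```python
def agreement_summary(pairs):
    """Summarize chart-versus-patient agreement.

    `pairs` is a list of (charted, reported) booleans, one per exam component:
    `charted` is True if the chart documents the component, `reported` is True
    if the patient says it was performed.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("no exam components to compare")
    agree = sum(1 for charted, reported in pairs if charted == reported)
    reported_not_charted = sum(1 for charted, reported in pairs
                               if reported and not charted)
    charted_not_reported = sum(1 for charted, reported in pairs
                               if charted and not reported)
    return {
        "agreement": agree / n,
        "reported_not_charted": reported_not_charted / n,
        "charted_not_reported": charted_not_reported / n,
    }


# Hypothetical data for ten exam components.
pairs = (
    [(True, True)] * 6       # documented and reported by the patient
    + [(False, False)] * 2   # neither documented nor reported
    + [(False, True)]        # patient reports it, chart is silent
    + [(True, False)]        # chart claims it, patient denies it
)
summary = agreement_summary(pairs)
print(summary)
```

The three fractions always sum to 1, which is why the study's 81% agreement and its two disagreement rates (7.1% and 12.4%) account for essentially all of the comparisons, allowing for rounding.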
Procedure Logs – Useful for documenting a resident's experience with
various procedures
- Numbers of each procedure
do not ensure competent performance
- Better measure of
resident performance occurs when staff or senior residents observe the
procedure and confirm competence utilizing a performance checklist.7,9
- Technical skills alone are
not sufficient to guarantee competence with procedures
- A written syllabus
which teaches the indications, contraindications, and interpretation of
the various procedures should be incorporated in the training. The
residents' knowledge of this material should be assessed.
- When procedural
training was emphasized for internal medicine residents in Hawaii
without the use of a syllabus, there was no improvement over 4 years in
the test scores pertaining to the indications, contraindications and
interpretation of procedures.9
Nursing and Ancillary Staff – Input is sought from the nurses, social
workers, etc. on the interpersonal skills and attitudes of the resident
- Measures of professional
competence should not be sought
- Most residents don’t
believe that nurses are qualified to assess their humanistic behavior.11
- High reliability and
validity are achievable with a standardized evaluation form.11
- Nurses’ ratings were
significantly correlated with those of the attending physicians (r = .38)
and a housestaff evaluation committee (r = .49) while the correlation
between the attendings and housestaff evaluation committee was even
higher (r = .63)
Feedback
Definition - An objective description of performance which is shared
with the person who was evaluated with the intention of guiding further
performance. It should not be judgmental and should provide suggestions on how
to improve future performance.
Giving Feedback
- Requires a trusting
relationship between the teacher and the learner
- Confirm that your motivation
is to improve future performance
- Prepare with review of the
language you will use
- Avoid using
"you" or "your" as it may be perceived that the
resident is being criticized and not the action
- Ask for the learner’s
assessment
- Establish consensus on what
the learner did well and what can be improved
- Verify that the learner
understands and agrees with the suggestions for improvement
- Negative feedback should be
coupled with concern and/or praise for the person
- "Feedback
Sandwich": start with a positive aspect of performance, then give the
negative point, and close with suggestions on how to improve
Characteristics of effective feedback7
- Timely
- The longer feedback
is delayed the less effective it will be in changing behavior
- Formative feedback
is the most effective in changing behavior
- Constructive with the
intention of improving performance
- Focused on specific
observable behavior
- Limited to one or two
points at a sitting
- Preservation of an
individual's dignity and self-esteem
- Give negative
feedback in private
- Assure
confidentiality
- Residents and faculty must
see the system as useful in contributing to growth of the resident or
improvement of the residency program
References
1. Sloan DA, Donnelly MB,
Schwartz RW, Strodel WE, "The Objective Structured Clinical
Examination – The New Gold Standard for Evaluating Postgraduate Clinical
Performance," Ann of Surgery, Vol 222(6), 1995, pp. 735-42.
2. Schwartz RW, Donnelly MB,
Sloan DA, et al., "The Relationship Between Faculty Ward Evaluations,
OSCE, and ABSITE as Measures of Surgical Intern Performance," Am J of
Surgery, Vol 169, Apr. 1995, pp. 414-7.
3. Dupras DM, Li JT, "Use
of an Objective Structured Clinical Examination to Determine Clinical
Competence," Acad Med, Vol 70(11), Nov. 1995, pp. 1029-34.
4. Joorabchi B, Devries JM,
"Evaluation of Clinical Competence: The Gap Between Expectation and
Performance," Pediatrics, Vol 97(2), Feb. 1996, pp. 179-84.
5. Altmaier EM, Johnson SR,
Tarico VS, et al., "An Empirical Specification of Residency
Performance Dimensions," Obstetrics and Gynecology, Vol 72(1), Jul.
1988, pp. 126-9.
6. Altmaier EM, McGuinness G,
Wood P, et al., "Defining Successful Performance Among Pediatric
Residents," Pediatrics, Vol 85(2), Feb. 1990, pp. 139-43.
7. Quattlebaum TG,
"Techniques for Evaluating Residents and Residency Programs,"
Pediatrics, Vol 98(6), Dec. 1996, pp. 1277-83.
8. Schueneman AL, Carley JP,
Baker WH, "Residency Evaluations – Are They Worth the Effort?"
Arch Surg, Vol 129, Oct. 1994, pp. 1067-73.
9. Bruce NC, "Evaluation
of Procedural Skills of Internal Medicine Residents," Acad Med, Vol
64, Apr. 1989, pp. 213-6.
10. Ognibene AJ, Jarjoura DG,
Illera VA, et al., "Using Chart Reviews to Assess Residents’
Performances of Components of Physical Examinations: A Pilot Study,"
Acad Med, Vol 69(7), Jul. 1994, pp. 583-7.
11. Butterfield PA, Mazzaferri
EL, "A New Rating Form for Use by Nurses in Assessing Residents’
Humanistic Behavior," J Gen Intern Med, Vol 6, Mar/Apr. 1991, pp.
155-61.
