Do Internal Medicine Programs Care About Scores? Unpacking the Impact on Patient Outcomes

Internal medicine residency programs evaluate their trainees in several ways, with the ultimate goal of producing competent and effective physicians. Central to this evaluation are various forms of “scores,” ranging from subjective milestone ratings to objective certification examination results. But do internal medicine programs care about scores, and, more importantly, how do these scores correlate with real-world physician performance, specifically patient outcomes? The question matters to medical educators, program directors, and aspiring physicians alike, because knowing which evaluation metrics are meaningful is essential for shaping training programs and ensuring high-quality patient care.

This article delves into the heart of this inquiry, examining a comprehensive study that investigated the relationship between internal medicine resident evaluations and patient outcomes. Specifically, we will explore whether milestone ratings, a relatively newer method of assessing resident competency, and the American Board of Internal Medicine (ABIM) certification examination scores, a long-standing benchmark, are truly indicative of a physician’s ability to deliver superior patient care. The findings of this research shed light on the varying degrees of importance these “scores” hold and their implications for the future of internal medicine training.

Milestone Ratings vs. Certification Exams: Two Paths to Competency Assessment

For graduates of internal medicine residency programs in the United States, competency evaluation traditionally hinges on two distinct methodologies: the Accreditation Council for Graduate Medical Education (ACGME)/American Board of Medical Specialties (ABMS) milestone assessments and the ABIM internal medicine certification examination. Understanding the nuances of each is key to appreciating their role in shaping physician training and their perceived importance by internal medicine programs.

Milestone ratings were introduced in 2013 as a structured and standardized system for assessing a resident’s progression toward competency throughout their training. This initiative arose from recognized limitations in previous evaluation methods, which often lacked clear performance expectations, relied heavily on program director subjectivity, and lacked standardized criteria to track and foster competency development during residency. The milestone framework revolves around six core competencies: patient care, medical knowledge, practice-based learning and improvement, systems-based practice, interpersonal and communication skills, and professionalism. Each competency is further broken down into subcompetencies, providing a detailed roadmap for resident development and continuous feedback throughout their training.

In contrast, the ABIM internal medicine board certification examination serves as a summative assessment of clinical judgment grounded in medical knowledge. Taken upon completion of residency, this exam diverges from milestone ratings by not incorporating direct observations of residents in patient care settings. Instead, the exam relies on meticulously crafted questions, developed by diverse panels of practicing physicians, designed to mirror real-world clinical scenarios requiring critical decision-making in patient management.

Given these fundamental differences, a crucial question arises: should scores from the two evaluations carry equal weight? More importantly, do they predict how physicians perform in practice, particularly with respect to patient outcomes? The study addresses this gap by analyzing how both milestone ratings and ABIM certification examination scores relate to real-world hospital outcomes for patients treated by newly trained internists working as hospitalists.

Investigating the Link Between Physician Evaluation and Hospital Outcomes: Methodology

To rigorously examine the association between physician competency assessments and patient outcomes, a retrospective cohort study was conducted. The study focused on a cohort of 6,898 hospitalists who completed their internal medicine residency training between 2016 and 2018. These physicians were then tracked as they cared for Medicare fee-for-service beneficiaries hospitalized between 2017 and 2019 across various US hospitals.

The researchers meticulously constructed their study sample, starting with internal medicine residents who completed their training and had both valid milestone ratings and ABIM certification examination scores linked to their National Provider Identifiers (NPIs). They then identified Medicare beneficiaries over 65 years old who were hospitalized and treated by these physicians. To ensure the focus was on post-residency performance, the study excluded hospitalizations where physicians were still in residency or had commenced fellowship training. Furthermore, to minimize bias from non-random physician assignment, elective hospitalizations and patients in hospice care at admission were excluded.

Physician attribution to hospitalizations was carefully determined. A physician was assigned to a hospitalization if they had the majority of inpatient evaluation and management encounters during the patient’s stay among generalist physicians. In cases of ties, the physician with the highest evaluation and management charges was attributed. The study further refined the sample to include hospitalizations in acute short-stay hospitals with at least 100 beds and categorized under 25 common diagnosis-related groups (DRGs), ensuring a focus on comparable patient populations and hospital settings. Finally, to ensure the care provided by the attributed physician had a significant impact on outcomes, only hospitalizations where the physician provided care within the first three days of admission were included.
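The attribution rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the tuple layout, the function name `attribute_physician`, and the reading of "majority" as "the most encounters" are all hypothetical, and the input is assumed to be already restricted to generalist physicians.

```python
from collections import defaultdict

def attribute_physician(encounters):
    """Pick the physician credited with one hospitalization.

    `encounters` is a list of (physician_id, charge) tuples, one per inpatient
    evaluation-and-management encounter (hypothetical layout).
    """
    counts = defaultdict(int)
    charges = defaultdict(float)
    for physician_id, charge in encounters:
        counts[physician_id] += 1
        charges[physician_id] += charge
    if not counts:
        return None  # no generalist encounters, nothing to attribute
    # The physician with the most encounters wins; total charges break ties.
    return max(counts, key=lambda p: (counts[p], charges[p]))

# Made-up stay: A and B each have two encounters, B has the higher charges.
stay = [("A", 120.0), ("B", 200.0), ("A", 90.0), ("B", 150.0), ("C", 300.0)]
print(attribute_physician(stay))  # B
```

Comparing the `(count, charges)` pair keeps the tie-break in a single expression, so charges only matter when encounter counts are equal.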

The primary outcomes measured were 7-day post-admission mortality and 7-day post-discharge readmission rates. This shorter timeframe was chosen to reflect the direct impact of care processes during hospitalization, minimizing the influence of pre-existing conditions or social determinants of health. Secondary outcomes included 30-day mortality and readmission rates, length of hospital stay, and the frequency of subspecialist consultations.

Physician competency was assessed using two key measures: milestone ratings and certification examination scores. Milestone ratings were based on the 22 subcompetencies of the original milestone program, evaluated at the end of residency. These ratings were converted to a numerical scale for analysis. An overall core competency rating was calculated as the mean of the six core competency mean ratings, categorized as low, medium, or high. Similarly, a knowledge core competency rating was derived and categorized. Certification examination scores were represented by yearly quartiles based on the scores of all physicians taking the ABIM internal medicine certification exam for the first time in a given year. These scores were standardized to account for variations in exam difficulty.
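As a rough sketch of the two measures, the snippet below computes an overall rating as the mean of the per-competency means and assigns exam quartiles within each year. The numeric ratings, exam scores, dictionary keys, and function names are invented for illustration. The cutoffs for the low, medium, and high categories are not given here, so the sketch stops short of that step.

```python
from statistics import mean

def overall_core_rating(subcompetency_ratings):
    """Mean of the per-competency means.

    `subcompetency_ratings` maps each core competency to the list of numeric
    ratings for its subcompetencies (hypothetical layout).
    """
    return mean(mean(r) for r in subcompetency_ratings.values())

def yearly_quartiles(scores_by_year):
    """Map (year, physician_id) -> quartile 1..4, where 1 is the bottom.

    Physicians are ranked only against others from the same exam year.
    Tied scores keep their input order, a simplification.
    """
    result = {}
    for year, scores in scores_by_year.items():
        ranked = sorted(scores, key=scores.get)
        n = len(ranked)
        for rank, physician_id in enumerate(ranked):
            result[(year, physician_id)] = rank * 4 // n + 1
    return result

# Made-up ratings for one resident across the six core competencies.
ratings = {
    "patient_care": [4.0, 4.0],
    "medical_knowledge": [3.0, 4.0],
    "practice_based_learning": [4.0],
    "systems_based_practice": [3.5, 3.5],
    "communication": [4.0, 5.0],
    "professionalism": [4.5],
}
print(overall_core_rating(ratings))  # 4.0

# Made-up exam scores for four first-time takers in one year.
quartiles = yearly_quartiles({2017: {"a": 350, "b": 420, "c": 390, "d": 510}})
print(quartiles[(2017, "b")])  # 3
```

Averaging the competency means, rather than all subcompetency ratings at once, gives each core competency equal weight regardless of how many subcompetencies it contains.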

To account for potential confounding factors, the statistical analysis incorporated hospital fixed effects, effectively comparing outcomes among physicians within the same hospital. The analysis also adjusted for patient characteristics (age, sex, race, ethnicity, comorbidities, income), physician experience, and year of hospitalization. This methodology aimed to isolate the association between physician competency assessments and patient outcomes, and so to inform which scores internal medicine programs should care about.
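To see what hospital fixed effects buy, consider the simplified sketch below. It demeans an exposure and an outcome within each hospital before fitting a slope, which is one standard way to estimate a fixed-effects association. It omits the patient and physician covariates and is not the study's actual model; the function name and the toy numbers are assumptions made for illustration.

```python
from collections import defaultdict

def within_hospital_slope(rows):
    """Least-squares slope of outcome on exposure, using only
    within-hospital variation.

    `rows` is a list of (hospital_id, exposure, outcome) tuples.
    """
    groups = defaultdict(list)
    for hospital_id, x, y in rows:
        groups[hospital_id].append((x, y))
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = sum(x for x, _ in pairs) / len(pairs)
        y_bar = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            # Subtracting hospital means removes hospital-level differences.
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("exposure does not vary within any hospital")
    return sxy / sxx

# Toy data: H2 has a higher baseline outcome and higher exposure values,
# so a pooled comparison would be misleading.
rows = [("H1", 0, 10.0), ("H1", 1, 9.0), ("H2", 1, 20.0), ("H2", 2, 19.0)]
print(within_hospital_slope(rows))  # -1.0
```

In the toy data a pooled fit would suggest that higher exposure goes with worse outcomes, simply because hospital H2 differs from H1. Comparing physicians only with colleagues at the same hospital recovers the within-hospital slope.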

![Figure 1. Cohort Development of Physicians and Hospitalizations in a Study of the Relationship Between Milestones, Board Certification Examinations, and Patient Outcomes.](http://carcodereader.store/wp-content/uploads/2025/02/jama-e245268-g001.jpg){width=780 height=1086}

Alt text: Cohort development diagram illustrating patient and physician inclusion and exclusion criteria for study analyzing the relationship between internal medicine milestone ratings, board certification exams, and patient outcomes, detailing steps from initial physician and hospitalization pools to final study sample.

Key Findings: Certification Scores Matter More for Patient Outcomes

The study’s findings revealed a striking disparity in the predictive power of milestone ratings and certification examination scores concerning patient outcomes. Notably, milestone ratings, whether overall core competency or specifically knowledge-based, showed no statistically significant association with any of the hospital outcome measures assessed. This means that patients treated by hospitalists rated as “high” in core competencies did not experience demonstrably better outcomes (mortality, readmission, length of stay) compared to those treated by hospitalists rated as “low.” In fact, the study observed a non-significant increase in 7-day mortality rates for patients treated by physicians with high overall core competency ratings compared to those with low ratings, further underscoring the lack of positive correlation.

![Figure 2. Overall Core Competency Associations: Adjusted Percentage Difference Compared With the Adjusted Low Ratings Category Outcome.](http://carcodereader.store/wp-content/uploads/2025/02/jama-e245268-g002.jpg){width=3908 height=2517}

Alt text: Bar graph depicting adjusted percentage difference in patient outcomes (7-day mortality, 7-day readmission, 30-day mortality, 30-day readmission, length of stay, consultation frequency) for medium and high overall core competency ratings compared to low ratings, showing no statistically significant associations.

![Figure 3. Knowledge Core Competency Associations: Adjusted Percentage Difference Compared With the Adjusted Low Ratings Outcome Category.](http://carcodereader.store/wp-content/uploads/2025/02/jama-e245268-g003.jpg){width=3892 height=2517}

Alt text: Bar graph illustrating adjusted percentage difference in patient outcomes for medium and high knowledge core competency ratings compared to low ratings, revealing no statistically significant associations except a minor reduction in 7-day readmissions for medium ratings.

In contrast, certification examination scores showed statistically significant associations with better patient outcomes. Patients of hospitalists in the top quartile of certification examination scores had an 8.0% lower 7-day mortality rate and a 9.3% lower 7-day readmission rate than patients of hospitalists in the bottom quartile. The association extended to 30-day mortality, with a significant 3.5% reduction for top-quartile scorers. The study also found that top-quartile scorers were associated with a 2.4% increase in subspecialist consultations; one possible reading is that higher-scoring physicians make more judicious and effective use of specialist expertise.
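The percentages above are easier to interpret with the arithmetic spelled out. The sketch below assumes they are relative differences against the bottom-quartile adjusted rate, as the figure captions ("adjusted percentage difference compared with the adjusted bottom quartile outcome") suggest. The input rates are invented and are not the study's values.

```python
def adjusted_pct_difference(rate, reference_rate):
    """Relative difference of `rate` versus `reference_rate`, in percent."""
    return (rate - reference_rate) / reference_rate * 100.0

# Made-up adjusted rates: 3.8% in the comparison group, 4.0% in the reference group.
print(round(adjusted_pct_difference(0.038, 0.040), 1))  # -5.0
```

Under this reading, a reduction of a few percent is a proportion of the reference group's rate, not a drop of that many percentage points.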

![Figure 4. Certifying Examination Quartile Associations: Adjusted Percentage Difference Compared With the Adjusted Bottom Quartile Outcome.](http://carcodereader.store/wp-content/uploads/2025/02/jama-e245268-g004.jpg){width=3892 height=3233}

Alt text: Bar graph presenting adjusted percentage difference in patient outcomes for top, second, and third quartiles of certification exam scores compared to bottom quartile, highlighting significant reductions in 7-day and 30-day mortality and 7-day readmission rates for top quartile scorers.

These findings strongly suggest that while internal medicine programs utilize both milestone ratings and certification examinations, certification examination scores appear to be a more reliable predictor of patient outcomes in the hands of newly trained hospitalists. While milestone ratings may provide valuable formative feedback during residency, they did not translate into measurable improvements in patient outcomes in this study. This raises critical questions about the relative weight and emphasis placed on these different evaluation methods within internal medicine training.

Discussion: Implications for Residency Programs and the Significance of Standardized Scores

The study’s results carry significant implications for internal medicine residency programs and the ongoing discourse about physician competency assessment. The lack of association between milestone ratings and patient outcomes, contrasted with the clear association between certification examination scores and improved outcomes, prompts a re-evaluation of how much weight internal medicine programs should give to scores from each evaluation method.

One potential explanation for these findings lies in the inherent nature of each assessment tool. Milestone ratings, while designed to be comprehensive and provide nuanced feedback, are ultimately subjective evaluations made by residency faculty. Factors such as rater bias, varying interpretations of milestone criteria, and potential social pressure to provide favorable end-of-residency ratings could contribute to their limited predictive validity regarding real-world performance. The study itself acknowledges the possibility of “social pressure to give higher terminal ratings to marginal candidates.”

On the other hand, the ABIM certification examination is a standardized, objective assessment of medical knowledge and clinical judgment. It is designed to assess a physician’s ability to apply their knowledge in clinically relevant scenarios. The study’s findings suggest that this exam, despite not directly observing clinical practice, effectively captures crucial aspects of physician competency that translate into tangible benefits for patients, such as reduced mortality and readmission rates. The increased consultation frequency observed among high-scoring physicians may further indicate a greater awareness of their knowledge boundaries and a proactive approach to seeking specialist input when necessary, potentially contributing to better patient management.

The study’s limitations, including its retrospective design and reliance on observational data, warrant consideration. While the rigorous methodology, including hospital fixed effects and adjustments for confounding factors, strengthens the findings, causality cannot be definitively established. However, the consistency of these results with previous research linking standardized examination scores to various measures of physician performance and patient care quality reinforces their validity and importance.

The conclusion that certification examination scores are more strongly associated with patient outcomes than milestone ratings does not necessarily imply that milestone ratings are without value. Instead, it suggests that internal medicine programs should perhaps place greater emphasis on standardized, objective assessments like certification exams when evaluating resident competency and predicting future physician performance. Furthermore, the study highlights a potential opportunity to enhance the milestone evaluation system by better incorporating results from standardized examinations, such as in-training exams, to potentially improve their predictive validity and ensure a more robust and reliable assessment of resident competency.

Conclusion: Re-evaluating the Role of Scores in Internal Medicine Training

In conclusion, this study provides compelling evidence that, of the two kinds of scores internal medicine programs rely on, certification examination scores are the stronger indicator of patient outcomes among newly trained hospitalists. Certification examination scores, reflecting medical knowledge and clinical judgment, are significantly associated with reduced mortality and readmission rates, which is consistent with a link between exam performance and the quality of patient care delivered. Milestone ratings, in their current implementation, did not demonstrate a similar association in this study.

These findings underscore the need for ongoing critical evaluation of resident assessment methods in internal medicine. While milestone ratings offer a framework for longitudinal feedback and competency development, their subjective nature and limited correlation with patient outcomes raise questions about their effectiveness as a primary measure of physician competency in predicting real-world performance. The demonstrated predictive power of certification examination scores highlights the continued importance of standardized, objective assessments in medical education.

Moving forward, internal medicine programs may benefit from re-examining the balance between subjective and objective evaluations, potentially exploring strategies to integrate standardized examination performance more effectively into residency program assessments and milestone evaluations. Ultimately, the goal remains to develop and refine evaluation methods that accurately identify and nurture competent and high-performing physicians who can deliver the best possible care and outcomes for their patients. Understanding how and why internal medicine programs care about scores, and which scores truly matter, is crucial for achieving this vital objective.

Educational Objective: To identify the key insights or developments described in this article.

What are the internal medicine milestones, introduced to training in 2013, intended to accomplish?
- Evaluation of 6 core competencies, including medical knowledge and professionalism, to provide comprehensive feedback throughout residency training.

What associations were observed between core competency ratings and 7-day mortality and readmission rates for hospitalized Medicare fee-for-service beneficiaries?
- No statistically significant association was observed between top vs bottom overall ratings and any hospital outcome measure.

The authors also evaluated the association between internal medicine certification examination scores and hospital outcome measures. What did they find?
- Top quartile certification examination scores were associated with an 8.0% reduction in 7-day mortality compared with bottom quartile scores.