Keywords
resident matching - stable marriage - Gale–Shapley - Nash equilibrium
The San Francisco Residency and Fellowship Match Services (SF Match)[1] and the National Resident Matching Program (NRMP)[2] are two national matching systems used to place physicians into residency and fellowship
training programs. Both use versions of the Gale–Shapley algorithm[3] to pair applicants with programs in a binding system that has been in use for over
50 years. Although these systems have not changed significantly in the past
half century, the landscape of the application process has changed substantially,
mainly because the numbers of applicants and participating programs grow each year
while the number of applicants per position remains relatively stable. The 2020 Match
was the largest to date, with 40,084 applicants applying to 37,256 positions.[4]
In the 1920s, when the first residency programs were introduced as optional postgraduate
training, only a few medical graduates participated. This inadequate supply of interns
led to fierce competition among the programs, which manifested as a race between programs
to secure binding commitments from potential graduates as early as possible.[5] This resulted in medical students receiving internship offers up to 2 years before
graduation.[6] To avoid this race between programs, the National Interassociation Committee on
Internships (NICI) was formed in 1950 to examine existing matching plans and to conduct
a trial of a centralized match system. In October 1951, 79 medical schools formed
the National Student Internship Committee and adopted a modified Boston Pool Plan[7] nationally based on the recommendations of the NICI. The National Internship Matching
Program (NIMP, now the NRMP) was incorporated in 1953 to manage and administer the
matching process, and has continued to do so for most medical residency programs and
many fellowship programs.[8] The SF Match oversees ophthalmology and plastic surgery residency programs, as well
as multiple specialty fellowship programs.
The residency matching dilemma can be described as a stable marriage problem, with
the applicants as one side of the “marriage” and the residency programs as the other.
A marriage, or match, is stable when there is no applicant who is matched to program A but
prefers program B while program B also prefers this applicant over at least one
of the candidates currently matched with it (or has an unfilled position). Gale and Shapley proved
that when each side has an equal number of participants who have each ranked every potential
partner, a stable match for all participants exists,[3] and their eponymous algorithm finds one. (The Sveriges Riksbank Prize in
Economic Sciences in Memory of Alfred Nobel for 2012 was awarded to Lloyd S. Shapley, jointly with Alvin E. Roth,
for this line of work.) The resident matching algorithms used by the NRMP and SF Match appear to be
mathematically equivalent to the Gale–Shapley algorithm. The current NRMP match algorithm
was implemented in 1995.
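To make the stability condition concrete, the sketch below checks a proposed match for blocking pairs, that is, applicant–program pairs who would both rather be matched to each other than to their current partners. It is an illustrative sketch, not NRMP or SF Match code: the function name `blocking_pairs`, the dictionary-based input format, and the toy preferences are hypothetical. It also treats a program with an unfilled position as willing to take any applicant it ranked, which is the usual convention when programs offer several positions.

```python
def blocking_pairs(matching, app_prefs, prog_prefs, capacity):
    """Return (applicant, program) pairs that would both prefer each other to the current match.

    matching:   applicant -> program (unmatched applicants are simply absent)
    app_prefs:  applicant -> list of programs, most preferred first
    prog_prefs: program -> list of applicants, most preferred first
    capacity:   program -> number of positions
    """
    rank = {p: {a: i for i, a in enumerate(lst)} for p, lst in prog_prefs.items()}
    holders = {p: [a for a, q in matching.items() if q == p] for p in prog_prefs}
    pairs = []
    for a, prefs in app_prefs.items():
        current = matching.get(a)
        # Programs that a ranks above its current match (its whole list if unmatched).
        better = prefs[:prefs.index(current)] if current in prefs else prefs
        for p in better:
            if a not in rank.get(p, {}):
                continue  # p did not rank a, so p would not accept a
            has_open_spot = len(holders[p]) < capacity[p]
            prefers_a = any(rank[p][a] < rank[p][h] for h in holders[p])
            if has_open_spot or prefers_a:
                pairs.append((a, p))
    return pairs


app_prefs = {"A": ["X", "Y"], "B": ["Y", "X"]}
prog_prefs = {"X": ["B", "A"], "Y": ["A", "B"]}
capacity = {"X": 1, "Y": 1}

print(blocking_pairs({"A": "X", "B": "Y"}, app_prefs, prog_prefs, capacity))  # []
print(blocking_pairs({"A": "Y"}, app_prefs, prog_prefs, capacity))  # [('A', 'X'), ('B', 'X')]
```

In this toy case, `{"A": "Y", "B": "X"}` also has no blocking pairs, so a matching problem can have more than one stable match; which one the algorithm returns depends on the proposing side.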
The Gale–Shapley algorithm takes the rank order lists (a ranked list of choices) from
each of the participants on both sides along with a predetermined proposing side (either
the programs or the applicants in this case). For example, if the applicants are the
proposing side, the algorithm first selects an applicant at random from the pool of
applicants. This applicant first proposes to their most preferred program. If that
program has an open position and has ranked that applicant, a tentative match is
formed between them, and the algorithm picks another applicant to start proposing to
his or her most preferred program. If the program's positions are already filled,
the algorithm checks whether the program would prefer the newly proposing applicant over
one of its currently matched applicants. If so, the program's match with its least preferred
currently matched candidate is annulled, and that candidate is returned to the applicant pool.
Otherwise (or if the program did not rank the applicant), the proposing applicant is rejected and
proposes to the next program on their rank list. The algorithm continues until every applicant
is either tentatively matched or has exhausted their rank list, at which point all tentative matches become final.
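The applicant-proposing procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production NRMP or SF Match implementation: `gale_shapley` is a hypothetical name, rank lists are plain Python lists with the most preferred entry first, and the applicant pool is processed in first-in, first-out order rather than at random. With strict preferences the processing order does not change the outcome, because applicant-proposing deferred acceptance always returns the same applicant-optimal stable match.

```python
from collections import deque


def gale_shapley(applicant_prefs, program_prefs, capacity):
    """Applicant-proposing Gale–Shapley (deferred acceptance) with program capacities.

    applicant_prefs: applicant -> list of programs, most preferred first
    program_prefs:   program -> list of applicants, most preferred first
    capacity:        program -> number of positions
    Returns {applicant: program}; unmatched applicants are omitted.
    """
    # Position of each ranked applicant on each program's list (lower is better).
    rank = {p: {a: i for i, a in enumerate(lst)} for p, lst in program_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}  # next list entry each applicant will try
    held = {p: [] for p in program_prefs}           # tentative matches per program
    pool = deque(applicant_prefs)                   # applicants who still need to propose

    while pool:
        a = pool.popleft()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue                                # list exhausted: a stays unmatched
        p = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank.get(p, {}):
            pool.append(a)                          # p did not rank a; a tries its next choice
            continue
        held[p].append(a)                           # tentative match
        if len(held[p]) > capacity[p]:
            # Program is over capacity: annul its least preferred tentative match.
            worst = max(held[p], key=rank[p].get)
            held[p].remove(worst)
            pool.append(worst)

    return {a: p for p, applicants in held.items() for a in applicants}


# Hypothetical toy instance: three applicants, two one-position programs.
applicant_prefs = {"A": ["P1", "P2"], "B": ["P1", "P2"], "C": ["P1"]}
program_prefs = {"P1": ["B", "A", "C"], "P2": ["A", "B"]}
capacity = {"P1": 1, "P2": 1}
print(gale_shapley(applicant_prefs, program_prefs, capacity))  # {'B': 'P1', 'A': 'P2'}
```

Here A is displaced from P1 by B, moves on to P2, and C exhausts its one-entry list and remains unmatched.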
Gale–Shapley requires choosing one proposing side: applicants or programs. In
addition, in the original formulation, both parties must rank all possible matches.
The algorithm then works to achieve a stable match and fill every available position.
While the algorithm favors the proposing side,[3] Roth and Peranson showed that in the case of NRMP, the algorithm produces similar
match results whether applicants or programs propose.[9]
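The effect of the choice of proposing side shows up even in a two-by-two example. The sketch below, with a hypothetical function name, toy preferences, and one position per program, runs the same one-to-one deferred acceptance procedure twice: once with applicants proposing and once with programs proposing.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one Gale–Shapley. Returns {proposer: receiver}."""
    rank = {r: {p: i for i, p in enumerate(lst)} for r, lst in receiver_prefs.items()}
    next_idx = {p: 0 for p in proposer_prefs}
    held = {}                          # receiver -> proposer it currently holds
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_idx[p] >= len(proposer_prefs[p]):
            continue                   # p exhausted its list and stays unmatched
        r = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        if p not in rank.get(r, {}):
            free.append(p)             # r did not rank p
        elif r not in held:
            held[r] = p                # r tentatively accepts its first proposal
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])       # r trades up; its previous proposer is freed
            held[r] = p
        else:
            free.append(p)             # r rejects p
    return {p: r for r, p in held.items()}


applicants = {"A": ["X", "Y"], "B": ["Y", "X"]}
programs = {"X": ["B", "A"], "Y": ["A", "B"]}

applicant_proposing = deferred_acceptance(applicants, programs)
program_proposing = deferred_acceptance(programs, applicants)
print(sorted(applicant_proposing.items()))  # [('A', 'X'), ('B', 'Y')]
print(sorted(program_proposing.items()))    # [('X', 'B'), ('Y', 'A')]
```

Both results are stable, but each side receives its first choice only when it is the proposing side.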
The Gale–Shapley assumption that participants submit complete rank lists does not hold in practice, which has practical
implications. Applicants cannot directly rank
all programs because they must first apply to programs for interviews. While applicants
can rank programs that did not interview them, programs will generally only rank applicants
whom they have interviewed. However, under the current applicant-proposing version
of Gale–Shapley, applicants cannot do worse by ranking and being ranked by more programs.
This implies that applicants should apply to as many programs as resources allow in
the hopes of being invited for more interviews and then being ranked more often. Similarly,
programs are likely induced to interview and rank as many applicants as possible
to increase the likelihood of matching all positions in their program.
In the current study, we examine recent trends in the number of applicants and available
positions, as well as the average number of applications and interviews per applicant
for ophthalmology and multiple NRMP specialties. To determine whether these numbers
are insufficient, optimal, or excessive, we simulated matches using the Gale–Shapley
algorithm, comparing present conditions with simulated matches in which the number
of entries that applicants or programs could rank was limited.
Patients and Methods
This retrospective study was conducted in accordance with the Declaration of Helsinki.
The study was exempted from approval by the institutional review board of the University
of Washington, Seattle, WA. Publicly available historical data were collected from
the NRMP and SF Match websites, and from archived versions of the websites using the
Wayback Machine.[10] Data were collected for matches between 1985 and 2020. In addition, fully anonymized
rank lists and match data for ophthalmology applicants and programs were obtained
from the SF Match for the years 2011 to 2019 with approval from the Board of Trustees
of the Association of University Professors of Ophthalmology, which oversees SF Match.
Longitudinal Trend Analysis
Using historical match statistics from the NRMP and SF Match, the total number of
applicants and positions and the average number of applications and interviews over
time were obtained for the following specialties: dermatology, otolaryngology, internal
medicine, orthopedic surgery, plastic surgery, diagnostic radiology, radiation oncology,
and ophthalmology.
To evaluate trends in the ranking behavior of residency programs, we modeled
the length of program rank lists (taken as a proxy for, and a lower bound on, the number
of interviews) as a function of the number of available positions and time in years, using a multivariable
ordinary least squares model. The regression model was fitted to the anonymized
SF Match rank list data.
We performed a cost and risk analysis of the ophthalmology residency match for applicants
to determine the economics behind residency matching. Cost estimates were based on
a financial analysis study of the ophthalmology residency match program.[11]
Capping and Truncation Analyses
We investigated the extent of universal excessive ranking, defined
as ranking more programs or applicants than necessary to ensure a match and to fill all
available positions. Using anonymized, actual SF Match rank lists as the basis for our experiment,
we first capped the length of the applicants' finalized rank lists, applying
progressively stricter limits on the maximum number of entries on
the rank lists. Next, to cap the programs, we progressively restricted the number of applicants
ranked per available position, which accounts for programs of different sizes. As a final analysis, we capped
both applicant and program rank lists. The Gale–Shapley algorithm was then applied
to the modified rank lists, and the percentage of all available ophthalmology positions filled
was computed for each capping level.
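A minimal sketch of the capping experiment follows, assuming rank lists are stored as Python lists with the most preferred entry first. The helper names (`match`, `cap_applicants`, `cap_programs`, `fill_rate`) are hypothetical. The program cap is implemented as a per-position limit (cap × number of positions) to reflect the size adjustment described above, and the three-applicant instance is invented for illustration rather than drawn from SF Match data.

```python
def match(app_prefs, prog_prefs, capacity):
    """Applicant-proposing Gale–Shapley with capacities; returns {applicant: program}."""
    rank = {p: {a: i for i, a in enumerate(lst)} for p, lst in prog_prefs.items()}
    nxt = {a: 0 for a in app_prefs}
    held = {p: [] for p in prog_prefs}
    free = list(app_prefs)
    while free:
        a = free.pop()
        if nxt[a] >= len(app_prefs[a]):
            continue                                   # list exhausted: a stays unmatched
        p = app_prefs[a][nxt[a]]
        nxt[a] += 1
        if a not in rank.get(p, {}):
            free.append(a)                             # p did not rank a
            continue
        held[p].append(a)
        if len(held[p]) > capacity[p]:                 # over capacity: drop least preferred
            worst = max(held[p], key=rank[p].get)
            held[p].remove(worst)
            free.append(worst)
    return {a: p for p, lst in held.items() for a in lst}


def cap_applicants(app_prefs, k):
    """Keep only each applicant's top k programs."""
    return {a: lst[:k] for a, lst in app_prefs.items()}


def cap_programs(prog_prefs, capacity, per_spot):
    """Keep at most per_spot ranked applicants per available position."""
    return {p: lst[:per_spot * capacity[p]] for p, lst in prog_prefs.items()}


def fill_rate(app_prefs, prog_prefs, capacity):
    """Fraction of all available positions filled by the simulated match."""
    return len(match(app_prefs, prog_prefs, capacity)) / sum(capacity.values())


# Hypothetical toy instance (not SF Match data): three applicants, two one-position programs.
app_prefs = {"A": ["P1", "P2"], "B": ["P1", "P2"], "C": ["P1", "P2"]}
prog_prefs = {"P1": ["A", "B", "C"], "P2": ["A", "B", "C"]}
capacity = {"P1": 1, "P2": 1}

for k in (2, 1):        # fill rates: 1.0, then 0.5
    print("applicant cap", k, fill_rate(cap_applicants(app_prefs, k), prog_prefs, capacity))
for s in (3, 2, 1):     # fill rates: 1.0, 1.0, then 0.5
    print("program cap", s, fill_rate(app_prefs, cap_programs(prog_prefs, capacity, s), capacity))
```

Capping both sides at once is `fill_rate(cap_applicants(app_prefs, k), cap_programs(prog_prefs, capacity, s), capacity)`. In this toy case, only the most severe caps (one entry per applicant, or one ranked applicant per position) leave positions empty.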
To understand the pressures behind the universal excessive ranking behavior, we performed
individual truncation experiments where the rank list of each applicant or program
was successively truncated while all other rank lists were unchanged and the Gale–Shapley
algorithm was rerun. For applicants, we recorded whether the applicant's status changed from matched
to unmatched. For programs, we measured the percentage of positions filled
as a function of rank list truncation.
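The individual truncation experiments can be sketched in the same way: one participant's list is shortened from the bottom while every other list is held fixed, and the match is rerun after each cut. The function names and the toy instance are hypothetical, and the matcher is a minimal applicant-proposing Gale–Shapley implementation rather than the SF Match software.

```python
def match(app_prefs, prog_prefs, capacity):
    """Applicant-proposing Gale–Shapley with capacities; returns {applicant: program}."""
    rank = {p: {a: i for i, a in enumerate(lst)} for p, lst in prog_prefs.items()}
    nxt = {a: 0 for a in app_prefs}
    held = {p: [] for p in prog_prefs}
    free = list(app_prefs)
    while free:
        a = free.pop()
        if nxt[a] >= len(app_prefs[a]):
            continue                                   # list exhausted: a stays unmatched
        p = app_prefs[a][nxt[a]]
        nxt[a] += 1
        if a not in rank.get(p, {}):
            free.append(a)                             # p did not rank a
            continue
        held[p].append(a)
        if len(held[p]) > capacity[p]:                 # over capacity: drop least preferred
            worst = max(held[p], key=rank[p].get)
            held[p].remove(worst)
            free.append(worst)
    return {a: p for p, lst in held.items() for a in lst}


def applicant_truncation(app_prefs, prog_prefs, capacity, applicant):
    """Drop entries from the bottom of one applicant's list, one at a time, holding all
    other lists fixed; element i says whether the applicant still matches with i removed."""
    full = app_prefs[applicant]
    outcomes = []
    for removed in range(len(full)):                   # stop once one entry is left
        trial = dict(app_prefs)
        trial[applicant] = full[:len(full) - removed]
        outcomes.append(applicant in match(trial, prog_prefs, capacity))
    return outcomes


def program_truncation(app_prefs, prog_prefs, capacity, program):
    """Same idea for one program; element i is the fraction of its positions filled."""
    full = prog_prefs[program]
    fills = []
    for removed in range(len(full)):
        trial = dict(prog_prefs)
        trial[program] = full[:len(full) - removed]
        matched = match(app_prefs, trial, capacity)
        fills.append(sum(p == program for p in matched.values()) / capacity[program])
    return fills


# Hypothetical toy instance (not SF Match data).
app_prefs = {"A": ["P1", "P2"], "B": ["P1", "P2"], "C": ["P1", "P2"]}
prog_prefs = {"P1": ["A", "B", "C"], "P2": ["A", "B", "C"]}
capacity = {"P1": 1, "P2": 1}

print(applicant_truncation(app_prefs, prog_prefs, capacity, "A"))  # [True, True]
print(applicant_truncation(app_prefs, prog_prefs, capacity, "B"))  # [True, False]
print(program_truncation(app_prefs, prog_prefs, capacity, "P2"))   # [1.0, 1.0, 0.0]
```

Applicant B, who matches to the last program on its list, becomes unmatched after a single truncation, while A, who matches to its first choice, is unaffected. Program P2 stays filled until its truncated list no longer includes B.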
Results
The burden of applications and interviews is increasing
The number of applicants relative to the number of available residency positions in
ophthalmology has been steady at approximately 1.40 applicants per available position
every year (95% confidence interval: 1.28–1.54) since 2000 ([Fig. 1A]). In contrast, the average number of applications submitted and the average number
of interviews per applicant have been rising continuously in ophthalmology. The average
number of applications submitted per applicant increased from 24 in 1985 to 77 in 2020.
Linear regression of these data since 2000 indicates an increase of 2.07 applications
per applicant per year (Applications = 2.07 × year + 32.44, with year counted from 2000;
r² = 0.98, p = 5.7 × 10⁻¹⁷). Although interview data were not available for 2000 to 2010, over a longer timescale
the average number of interviews per applicant increased
56% (from 5.7 to 8.9) between 1985 and 2020 ([Fig. 1B]), and this number appears to have stabilized over the past 5 years.
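The percentage changes and the regression line above can be checked directly. This sketch assumes that the `year` variable in the reported fit counts years since 2000; that base year is inferred from the fitted values rather than stated explicitly, and the function names are hypothetical.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


def predicted_applications(year, slope=2.07, intercept=32.44, base_year=2000):
    """Reported linear trend; assumes the fitted 'year' counts years since 2000."""
    return slope * (year - base_year) + intercept


print(round(pct_change(5.7, 8.9)))              # 56  (interviews per applicant, 1985 to 2020)
print(round(pct_change(24, 77)))                # 221 (applications per applicant, 1985 to 2020)
print(round(predicted_applications(2020), 2))   # 73.84
```

Under this assumption, the fitted line gives about 73.8 applications per applicant for 2020, close to the observed 77.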
Fig. 1 Longitudinal trends in ophthalmology match. (A) Total number of matched and unmatched applicants by year for SF Match. (B) Average number of applications and interviews by year for SF Match. Data for the
average number of interviews were not available for the years in the gray box. (C) Comparison of the number of applications as a percentage of all programs in 2019
for ophthalmology and the National Resident Matching Program specialties of internal medicine,
radiology, orthopedic surgery, otolaryngology (ear, nose, and throat), dermatology,
radiation oncology, and plastic surgery. SF, San Francisco Residency and Fellowship
Match Services.
Similar trends were found for NRMP-matched specialties ([Supplementary Fig. S1] [available in the online version]). The median number of applications per candidate
increased from 27.6 in 2008 to 39 in 2019 (41.3%) across all NRMP specialties.
For the seven selected NRMP specialties, the median number of applications increased
38.5%. Similarly, the median number of interviews increased by 19.3% and 5.2% over
this time across all NRMP specialties and for the seven selected specialties, respectively
([Supplementary Fig. S1] [available in the online version]). For comparison, over the same time frame (2008–2019),
the average number of applications in ophthalmology increased 56.3% (from 48 to 75). In addition,
between 2011 and 2019, the years for which data were available, the average number of interviews
in ophthalmology increased by 7.5% (from 8.24 to 8.86).
The median number of applications as a percentage of all programs for the seven selected NRMP-matched
specialties and ophthalmology is shown in [Fig. 1C]. In specialties such as radiation oncology and plastic surgery,
applicants typically apply to over 90% of all programs. In ophthalmology, applicants apply to
65% (75/116) of all programs on average.
In 2019, the average ophthalmology applicant ranked 8.86 ± 5.53 programs, and the
average program ranked 11.54 ± 4.26 applicants per available position. On the program
side, the length of the rank lists has increased over time. By linear regression modeling,
programs ranked 8.34 candidates per open spot in 2011 and have been ranking 1.83 (95%
CI: 1.53–2.14) more candidates per available position every subsequent year.
Moderate Capping of Match List Length for Applicants and Programs Has Minimal Effect
on Overall Match Success
It is possible that the current match list lengths have increased over time to ensure
a complete match (i.e., to fill nearly every available position). To estimate the
impact of shortened rank lists on overall match success for both applicants and programs,
we re-simulated the match for each year from 2011 to 2019 while capping the maximum
number of entries on either applicant or program rank lists by progressive degrees.
The total percentage of positions filled after capping applicant rank lists, program rank lists, and both
is shown in [Fig. 2A–C], respectively. When we compared the number of applicants who matched without rank
list limitations with the number matched under successively shorter maximum
rank list lengths, only capping applicant rank lists below three positions changed this number
by more than 5% ([Fig. 2A]). Notably, capping program rank lists up to the analysis limit of 15 positions
did not affect the overall success of the match.
Fig. 2 Relationship between the percentage of total ophthalmology residency positions filled
before capping at different maximum rank list lengths for (A) applicants, (B) programs, and (C) both applicants and programs. For each experiment, the Gale–Shapley algorithm was
rerun to simulate the match.
Individual Ranking Behavior of Applicants and Programs through Truncation Experiments
From these data, it appears that both programs and applicants are ranking more counterparts
than are necessary for a stable match. To understand the pressures behind the over-ranking,
we performed unilateral and individual truncation experiments, where we systematically
removed the last entry of each applicant's rank list while not changing any other
applicant or program rank lists, and reran Gale–Shapley to observe changes in their
match outcome. We repeated this until each applicant's rank list was reduced
to a single entry ([Fig. 3]). We found that for applicants who ranked up to 10 programs, removing
even a single entry from their rank lists changed the match outcome, whereas applicants
who ranked 13 or more programs could remove the bottom two entries on their rank lists without
a change in match outcome. These results show that, at an individual level, applicants
benefit from submitting long match lists.
Fig. 3 Effect of the individual truncation of applicant rank list. Applicant outcomes (%
matched) grouped by the number of programs the applicants ranked (rank list lengths
are shown in gray), where each applicant's rank list is truncated by different amounts
while no other applicant and program rank lists are modified.
We also analyzed the percentage of positions filled after truncating the number
of applicants ranked by each program while holding all other programs' and all applicants'
rank lists unchanged, stratified by program size ([Fig. 4]). The number of applicants ranked per spot
decreased stepwise with increasing program size ([Fig. 4A]). Smaller programs rank more applicants per spot than larger programs;
a program with four available positions ranks a median of 11.6 applicants per spot,
while a program with eight available positions ranks a median of 7.3 applicants per
spot. When the rank lists of individual programs were truncated, a negative effect
appeared earlier for smaller programs than for larger programs ([Fig. 4B]). For instance, two-position programs
would lower their fill rate to 90% by truncating just three to four entries in their rank lists, whereas
four-position programs could truncate 19 entries before reaching a similar rate. Thus, individual programs, particularly smaller
programs, benefit from increasing the length of their match list.
Fig. 4 Effect of individual truncation of program rank list. (A) The number of applicants ranked per spot was grouped by program size. Smaller programs
rank more applicants than larger programs. (B) Total percentage of positions matched grouped by the number of program spots, where
every program truncated its rank list while the rank lists of all other programs and
all applicants remained unchanged, with the match simulated under Gale–Shapley.
Discussion
The results of the current analysis demonstrate that for both ophthalmology and other
“competitive” specialties (i.e., where applicants significantly outnumber positions),
(1) the number of applications per applicant and number of interviews per program
have increased substantially over the past 20 years; (2) the current numbers of ophthalmology
applications and interviews are in excess of those necessary to ensure a near-complete
match; and (3) individual truncation of match list length by either applicant or program
negatively impacts the likelihood of a successful match for the individual.
The increases seen in the numbers of applications and interviews are driven by a Nash
equilibrium.[12] A Nash equilibrium is a state in which no player of a game can improve their payoff by changing
their strategy while all the other players keep their strategies unchanged. In this
“game,” the players are the applicants, who have realized that they will do no worse
by applying to more programs because, under the current applicant-proposing Gale–Shapley
stable marriage residency algorithm, applicants always match to their most preferred
program if that program also prefers them over its other candidates. Consequently, if
applicants shorten their rank lists by even a single entry, they risk becoming
unmatched ([Fig. 3]). The only deterrent to applying to more programs is the increased cost.[13] [14] [15]
The average cost of submitting applications in ophthalmology has risen by $805 in less
than 10 years, from $930 in 2011 to $1,735 in 2019. It could rise by another $35 × (116 − 75) = $1,435
if the average applicant applied to all programs, the saturation point. The extra
$1,435 is 0.39% of the average ophthalmologist's annual income of $366,000 in 2019,
or 0.013% per year when spread over a 30-year career.[16] Thus, the financial burden of applying (and indeed of interviewing), although
substantial for the student, is trivial relative to the cost of not matching.
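The application-cost figures above follow from simple arithmetic, reproduced below as a sketch. The $35 marginal cost per application is read from the expression in the text; the variable names are illustrative.

```python
cost_2011, cost_2019 = 930, 1735     # average application cost per applicant (USD)
per_application = 35                  # cost of one additional application (USD)
n_programs, avg_applied = 116, 75     # ophthalmology programs; average applications, 2019
income = 366_000                      # average annual ophthalmologist income, 2019 (USD)
career_years = 30

rise_since_2011 = cost_2019 - cost_2011
extra_to_saturation = per_application * (n_programs - avg_applied)
pct_of_income = 100 * extra_to_saturation / income
pct_per_career_year = pct_of_income / career_years

print(rise_since_2011, extra_to_saturation)                    # 805 1435
print(round(pct_of_income, 2), round(pct_per_career_year, 3))  # 0.39 0.013
```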
The Nash equilibrium of Gale–Shapley increases costs not only for applicants but
also for programs. In 2011, 621 ophthalmology applicants applied to an average of
53 programs, while in 2019, 648 applicants applied to an average of 75 programs each.
During this time, the number of programs increased from 113 to 116. Using an estimated
5-minute initial review time per application,[17] a program director would spend on average more than 10 additional hours reviewing applications
(5 × 648 × 75/116/60 = 34.9 hours in 2019 vs. 5 × 621 × 53/113/60 = 24.3 hours in
2011). The application review process will likely become more time-consuming in the
future, as the USMLE Step 1 exam, cited by 94% of program directors as an important
factor in extending interview offers, will become pass/fail in 2022.[18] Programs also bear a substantial financial burden from interviewing candidates.[15] [19]
Ophthalmology programs spend approximately $3,736 per interviewed candidate when
application screening time and lost clinical revenue for interview time are accounted
for.[11] Ophthalmology interview costs are already large, as an average of 8.34 candidates
are interviewed per available position, and costs will rise further as the average
increases by 1.83 additional candidates per position every year. This burden is particularly
challenging for smaller programs, which must rank more applicants per position to
ensure filling ([Fig. 4]).
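The review-time estimate can be reproduced by evaluating the two expressions in the text. This sketch, with a hypothetical function name, assumes, as the text does, a fixed 5 minutes per application and applications spread evenly across programs. Evaluated directly, the 2019 and 2011 expressions differ by roughly 10.6 hours.

```python
def review_hours(applicants, avg_applications, n_programs, minutes_per_application=5):
    """Average initial-review hours per program, assuming applications are spread
    evenly across programs and each takes a fixed number of minutes to screen."""
    apps_per_program = applicants * avg_applications / n_programs
    return minutes_per_application * apps_per_program / 60


h_2019 = review_hours(648, 75, 116)
h_2011 = review_hours(621, 53, 113)
print(round(h_2019, 1), round(h_2011, 1), round(h_2019 - h_2011, 1))  # 34.9 24.3 10.6
```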
The Nash equilibrium challenge could be addressed by an agreement among all programs
and applicants to cap the number of interviews per available position at eight ([Fig. 2], bottom), with applicants ranking only the programs at which they interviewed
(i.e., no more than eight). For example, in 2019, adopting this policy would have
resulted in a 29.6% reduction in the total number of rank list entries submitted by programs and a 29.1% decrease
in the number of programs ranked by applicants. Despite these reductions, approximately
95% of candidates would still have matched with a stable-marriage result. Overall,
such an agreement would have reduced the total number of interviews system-wide in
2019 from 5,856 to 4,190 (28.5%). Given a per-interview cost of $404 for candidates
and $3,736 for programs, this would have resulted in net savings of $6,897,240 for
the system, at the cost of 22 candidates needing to enter the scramble to secure a position.
However, such a change would not reduce the burden of screening initial applications and might
be viewed as restricting applicant choice. A “cascaded match,” in which applicants
first “match” to interviews (perhaps through remotely conducted preliminary interviews),
with those interviews limited to eight programs, might be a reasonable way of
implementing such a system.
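The system-wide savings estimate can be reproduced the same way. The variable names are illustrative, and all inputs are the figures given in the text.

```python
interviews_actual, interviews_capped = 5856, 4190   # system-wide interviews, 2019
cost_per_interview_candidate = 404                  # USD
cost_per_interview_program = 3736                   # USD

avoided = interviews_actual - interviews_capped
reduction_pct = 100 * avoided / interviews_actual
net_savings = avoided * (cost_per_interview_candidate + cost_per_interview_program)

print(avoided, round(reduction_pct, 2), f"${net_savings:,}")  # 1666 28.45 $6,897,240
```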
Several limitations of the present study should be noted. This was an observational
longitudinal study, and only certain specialties were examined. The consequences of
Gale–Shapley algorithms might differ, for example, in specialties where available positions
exceed the number of qualified candidates. However, the trend toward increased
applications was remarkably similar across multiple competitive specialties. In addition,
the analysis of applicant and program behavior based on rank list and match data
was limited to ophthalmology because of data availability. Nevertheless, we believe
that the concerns we highlight apply equally to other specialties, since
the NRMP uses the same match algorithm. We also did not have access to which programs
each applicant actually interviewed at and, as such, do not know the applicant-program
pairs in which both sides decided not to rank each other. Moreover, the study does
not examine the effect of the strict ordinal rankings used in the Gale–Shapley algorithm
on participant behavior. Ordinal rankings cannot express the strength of preferences and may
not be representative of either applicant or program preferences.[20] It is possible that weighted match list ranking (in which candidates and programs
can “weight” their preferences in a nonlinear fashion) might show different behavior
under truncation. Finally, our capping and truncation analyses assume that the final rankings
applicants and programs submit after interviewing would not change under the simulated restrictions.
The Nash equilibrium that drives applicants toward applying to all
programs nationally, and that induces programs to interview increasing numbers of applicants,
is a direct consequence of the current structure of Gale–Shapley-based match systems.
Our analysis of rank list truncation demonstrates that rank lists are currently longer
than necessary for a successful match. However, mandatory capping of rank list length is likely
not feasible, as it would be viewed as constraining choice. An alternative to capping
the number of interviews would be to use non-Gale–Shapley algorithms, which might
have different Nash equilibrium behavior. For instance, providing a budget of rank
weightings to programs and candidates (as opposed to the current ordinal ranking)
might intrinsically reduce application numbers while improving satisfaction with the
match by allowing candidates to better express their preferences. Ideally, an improved
algorithm would optimize the preferences of the entire match and better incorporate
the relative preferences of participants, while achieving major cost savings for all
participants.