Objective: The personal statement is often an underutilized aspect of pediatric otolaryngology fellowship applications. In this pilot study, we use deep learning language models to cluster personal statements and elucidate their relationship to applicant rank position and postfellowship research output.
Study design: Retrospective cohort.
Setting: Single pediatric tertiary care center.
Methods: Data and personal statements from 115 applicants to our fellowship program were retrieved from San Francisco Match. BERT (Bidirectional Encoder Representations From Transformers) was used to generate document embeddings for clustering. Regression and machine learning models were used to assess the relationship between personal statements and the number of postfellowship publications per year, controlling for publications, board scores, Alpha Omega Alpha status, gender, and residency.
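The clustering step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the vectors below are synthetic stand-ins for BERT document embeddings (real embeddings are much higher dimensional), the group structure is invented for illustration, and the farthest-first initialization is a simplification chosen to keep the toy example deterministic. The names `kmeans`, `farthest_first_init`, `embeddings`, and `true_groups` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors, k, rng):
    """Pick one random vector, then repeatedly add the vector that is
    farthest from its nearest already-chosen centroid (an assumption of
    this sketch, not a detail taken from the study)."""
    centroids = [list(rng.choice(vectors))]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain K-means: alternate assignment and centroid update until the
    assignments stop changing or n_iter is reached."""
    rng = random.Random(seed)
    centroids = farthest_first_init(vectors, k, rng)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-ins for document embeddings: 4 well-separated groups
# of 30 vectors each in 4 dimensions (hypothetical data).
data_rng = random.Random(42)
DIM = 4
embeddings, true_groups = [], []
for g in range(4):
    center = [10.0 if d == g else 0.0 for d in range(DIM)]
    for _ in range(30):
        embeddings.append([c + data_rng.gauss(0.0, 0.5) for c in center])
        true_groups.append(g)

labels, centroids = kmeans(embeddings, k=4)
print({j: labels.count(j) for j in sorted(set(labels))})
```

In practice each personal statement would first be passed through a pretrained language model to obtain its embedding, and the number of clusters would be chosen from the data rather than fixed in advance.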
Results: K-means clustering grouped the document embeddings of personal statements into 4 distinct clusters: 2 focused on "training/research" and 2 on "personal/patient anecdotes." Training clusters 1 and 2 were associated with applicant-organization fit at a single pediatric otolaryngology fellowship program on univariate but not multivariate analysis. Models using document embeddings alone did not differ significantly in predicting applicant-organization fit from models using applicant characteristics and personal statement clusters alone (receiver operating characteristic areas under the curve, 0.763 and 0.750 vs 0.419; P values >.05). All predictive models were poor predictors of postfellowship publications per year.
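For readers less familiar with the metric reported above, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A value of 0.5 corresponds to chance-level ranking. The sketch below computes it directly from that definition. The labels and scores are made-up illustrative values, not study data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of every
    positive score against every negative score (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = applicant judged a fit, 0 = not a fit.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formula gives the same value more efficiently on larger data sets.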
Conclusion: We demonstrate the ability of document embeddings to capture meaningful information in personal statements from pediatric otolaryngology fellowship applicants. A larger study could further differentiate personal statement clusters and assess the predictive potential of document embeddings.
Keywords: deep learning; fellowship application; pediatric otolaryngology; personal statement.