Large Language Models in Ophthalmology: A Bibliographic Analysis
Volume 56, Issue 2, p. 119-130, April 2026


Turk J Ophthalmol 2026;56(2):119-130
1. Tufts Medical Center, Clinic of Ophthalmology, Boston, MA, USA
2. Wilmer Eye Institute, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
Received Date: 16.04.2025
Accepted Date: 01.01.2026
Online Date: 27.04.2026
Publish Date: 27.04.2026

Abstract

This study evaluated the distribution of research on the use of large language models (LLMs) in ophthalmology through a bibliographic analysis of articles retrieved from PubMed through November 2024. Studies were categorized into four main areas of LLM application: clinical decision-making (further divided according to subspecialties), education, patient interactions, and miscellaneous applications. Descriptive statistics were used to analyze the distribution of studies by ophthalmic subspecialty, geographical region, journal quality, and author characteristics, including gender and scholarly impact (h-index and i10-index). The findings revealed that clinical decision-making was the most common application (43.7%), with the majority of studies in this subgroup focusing on the retina (39.5%). Geographically, most of the research originated from North America (48.3%), followed by Asia (29.9%) and Europe (20.7%). Most studies were published in high-impact journals (Q1 journals: 74.7%), particularly for those related to clinical decision-making in retina (80.0%), glaucoma (100%), and multiple subspecialties (87.5%). Gender disparities were evident across all author roles, with female authors accounting for only 29.9% of first authors, 25.3% of last authors, and 26.4% of corresponding authors. The results suggest a need for greater diversity in terms of gender and geographic representation in LLM research in ophthalmology to promote inclusive progress in the field.

Keywords:
Large language models, ophthalmology, bibliographical analysis

Introduction

In recent years, artificial intelligence (AI), particularly in the form of large language models (LLMs), has revolutionized many aspects of science and become an integral part of research.1 Models such as ChatGPT, BERT, and LLaMA contain millions to billions of parameters and are trained on diverse data sources, including books, articles, and other text-based materials, to generate human-like text responses.2 These LLMs differ architecturally: ChatGPT uses autoregressive transformers focused on generative language tasks; BERT employs bidirectional transformers for contextual understanding; and LLaMA has a standard decoder-only transformer architecture with minor adaptations, avoiding mixture-of-experts models to maximize training stability.3, 4, 5 Despite these differences, all are commonly used for generating responses.

While AI has been utilized in medicine for decades, LLMs are now transforming diagnostics through image analysis and enhancing clinical decision-making by efficiently processing vast amounts of data, clinical information, and patient records, paving the way for precision medicine.6 Over the past few years, LLMs have become increasingly adopted across various medical specialties, including radiology, internal medicine, pediatrics, cardiovascular medicine, and many others.7, 8, 9 Research related to LLMs has increased significantly since 2023, reflecting a growing interest in their capabilities and applications. Ophthalmology, in particular, has been at the forefront of AI-related research, with growing interest in leveraging LLMs to advance patient care.10

However, LLM-related research also faces several limitations and challenges. First, LLMs are predominantly trained on English-language data, which may affect the reliability of their responses when applied to non-English languages. As a result, research is often concentrated in regions where English is widely spoken. Regional disparities are evident across all AI studies, with the majority of publications and grants dominated by the US and China, which together have been shown to account for approximately 50% of all publications from 2014 to 2023.11 Additionally, the AI Index Report (2024) showed that novel AI models developed in 2023 primarily originated in the US, followed by China and Europe, further highlighting the geographical gap in AI research.12 This unequal distribution in research output is accompanied by other disparities within the field, including gender representation. The latest Global Gender Gap Report (2024) by the World Economic Forum revealed that only 22% of professionals in AI are women, a gender gap which is reflected in AI-related research.13

Given these known disparities in AI-related research, this study aimed to provide a closer look at LLM-related studies in ophthalmology. We performed a bibliographic analysis to evaluate the distribution of research on LLMs in ophthalmology, focusing on gender, academic impact, and the geographical origin of the studies.

Methods

A comprehensive literature search was conducted on PubMed to identify studies involving the use of LLMs in ophthalmology published up until November 2024. The search terms used were “large language models AND ophthalmology OR retina OR glaucoma OR cornea OR uvea OR pediatric ophthalmology OR neuro-ophthalmology”. The search yielded a total of 194 studies. Only original investigations were included in the final analysis. Studies not related to LLMs or ophthalmology, review articles, meta-analyses, and commentaries were excluded.
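For readers reproducing the search, the Boolean query above can be assembled programmatically. The sketch below is an illustrative helper (not part of the study's actual workflow); note that PubMed evaluates Boolean operators left to right in the absence of parentheses, so "A AND B OR C" is parsed as "(A AND B) OR C".

```python
# Assemble the PubMed search string reported in the Methods.
# PubMed processes Boolean operators left to right unless parentheses
# are used, so the AND binds to the first OR-linked term only.

def build_query(primary: str, subtopics: list[str]) -> str:
    """Join a primary term with a list of OR-linked subtopic terms."""
    return f"{primary} AND " + " OR ".join(subtopics)

query = build_query(
    "large language models",
    ["ophthalmology", "retina", "glaucoma", "cornea", "uvea",
     "pediatric ophthalmology", "neuro-ophthalmology"],
)
print(query)
```

The resulting string matches the search terms reported above and can be pasted directly into the PubMed search box.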

Included studies were categorized into four groups based on their primary focus and application:

• Clinical Decision-Making Applications: Studies that investigated the use of LLMs in clinical decision-making and assessed LLM responses in diagnosing, managing, or providing clinical support for ophthalmic conditions. This category was further divided into ophthalmic subspecialties including retina, glaucoma, cornea/anterior segment, uveitis, neuro-ophthalmology, and pediatrics. Studies that encompassed more than one ophthalmic subspecialty or general ophthalmology were categorized as “multiple subspecialties.”

• Educational Applications: Studies that explored the use of LLMs in educational contexts, including their application in answering board-style questions and developing educational materials for both patients and clinicians.

• Patient Interaction Applications: Studies that focused on how LLMs are used to improve patient communication and respond to frequently asked questions about ocular health, such as in the format of a chatbot.

• Miscellaneous Applications: Studies that explored other applications of LLMs in ophthalmology that do not neatly fit into other categories, particularly those that are highly technical in nature.

The initial categorization of studies was conducted by the first author and reviewed by the senior author. Discrepancies were resolved by discussion and consensus.

Descriptive statistics were used to analyze the distribution of studies by application type, ophthalmic subspecialty, geographical region, journal quality, and author characteristics. The geographic regions were determined based on the United Nations World Population Prospects, which divides the world into six continental regions: Africa, Asia, Europe, Latin America and the Caribbean, North America, and Oceania.14 Geographical region was assigned according to the first author’s affiliated institution. Journal quality was classified into quartiles Q1, Q2, Q3, and Q4 as retrieved from SCImago. Author characteristics included gender and scholarly impact. Gender information was obtained from institutional websites or professional profiles. Scholarly impact was assessed using the h-index and i10-index, both retrieved from Google Scholar. The h-index is a metric that measures the scientific impact of an author’s publications, defined as the maximum value of h for which the author has published at least h papers, each of which has been cited no fewer than h times.15 In contrast, the i10-index is a related metric used by Google Scholar that counts the number of publications with at least 10 citations.16
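Both impact metrics have simple definitions that can be computed directly from a list of per-paper citation counts. As an illustration only (this is not part of the study's methodology, which retrieved both indices from Google Scholar), a minimal sketch:

```python
# Compute the h-index and i10-index from per-paper citation counts.

def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers each have >= rank citations
        else:
            break
    return h

def i10_index(citations: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)

example = [45, 23, 12, 9, 6, 3, 1]
print(h_index(example))   # 5: five papers each have >= 5 citations
print(i10_index(example)) # 3: three papers have >= 10 citations
```

For the example list, the five most-cited papers (45, 23, 12, 9, 6 citations) each have at least 5 citations, giving h=5, while only three papers clear the 10-citation bar for the i10-index.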

Results

Overall Results

A total of 87 original research studies were included in the final evaluation. In terms of application type, the most common was clinical decision-making (n=38, 43.7%), followed by educational applications (n=22, 25.3%), patient interaction applications (n=18, 20.7%), and miscellaneous applications (n=9, 10.3%) (Table 1). Within the clinical decision-making category, studies related to the retina were most prevalent, accounting for 15 studies (39.5%), while pediatrics had the least, with only 1 study (2.6%). Notably, research encompassing multiple subspecialties accounted for 21.1% of the studies in this group (n=8). There were moderate contributions from glaucoma, cornea/anterior segment, uveitis, and neuro-ophthalmology. The number of studies in each category is presented in Table 2.

The geographic distribution of studies based on first author’s affiliated institution revealed that the majority of research originated from North America (n=42, 48.3%), followed by Asia (n=26, 29.9%) and Europe (n=18, 20.7%). Oceania was represented by a single study (1.1%) from Australia, while no research related to LLMs in ophthalmology was reported from Latin America and the Caribbean or Africa. The distribution of studies by region is presented in Table 3 and Figure 1.

The majority of studies were published in high-impact journals, with 65 studies (74.7%) in Q1 journals, 10 studies (11.5%) in Q2 journals, and 9 studies (10.3%) in Q3 journals. In addition, 4 studies (4.6%) were not assigned a quartile, of which 3 were pre-prints in clinical decision-making. Most studies were published in 2024 (n=71, 81.6%), followed by 15 studies (17.2%) in 2023 and 1 study (1.1%) in 2022.

Gender disparity was evident for all author roles, with women accounting for only 29.9% of first authors, 25.3% of last authors, and 26.4% of corresponding authors (Figure 2). As expected, last and corresponding authors generally had greater scholarly impact than first authors. The average h-indices for last and corresponding authors were 39.7±37.7 and 28.7±33.7, respectively, compared to 8.9±6.2 for first authors (p<0.001). Similarly, the average i10-indices for last and corresponding authors were 147.1±254.2 and 94.4±222.2, compared to 12.0±14.9 for first authors (p=0.005) (Table 4).

We also evaluated the publications according to each application category, and the results of this in-depth analysis are as follows:

Clinical Decision-Making

Studies within this category were published from a diverse range of countries across several continents, although the majority still originated from the USA (n=17, 44.7%). Most were published in 2024 (n=32, 84.2%). Gender disparities were evident across all subgroups, most notably in the cornea/anterior segment subspecialty, where there were no women among first, last, or corresponding authors (Table 5).

Most studies were published in high-impact journals, particularly within the retina, glaucoma, and multiple subspecialties subgroups, with 80.0%, 100%, and 87.5% appearing in Q1 journals, respectively. The cornea/anterior segment and uveitis subgroups had a mix of Q1, Q2, and/or Q3 journals. Notably, no studies were published in Q4 journals. A detailed list of the studies in this category is provided in the supplementary material (Supplementary Table 1).17-54

Educational Applications

Of the 22 studies, North America contributed the largest share (45.5%), with 8 studies from the USA and 2 from Canada. Europe accounted for 31.8% of the studies, with 6 studies from the UK and 1 study from Germany. Asia contributed 5 studies (22.7%), with 2 from China and 1 each from Türkiye, Israel, and Japan. Most of these studies were published in Q1 journals (72.7%), followed by Q3 journals (18.2%) and Q2 journals (9.1%), with the majority published in 2024 (n=18, 81.8%). Gender disparity was also evident in this group, with male dominance in all author roles. Women accounted for 4 first authors (18.2%), 1 last author (4.5%), and 3 corresponding authors (13.6%).

Patient Interaction Applications

Studies in this group were primarily from Asia (44.4%), followed by North America (38.9%), Europe (11.1%), and Oceania (5.6%). The majority of the studies were published in Q1 journals (66.7%), with smaller portions in Q2 and Q3 journals. Female authors were relatively better represented in this category among first authors (33.3%) and corresponding authors (44.4%), though they accounted for only 5.6% of last authors.

Miscellaneous Applications

Of the 9 studies, North America made the largest contribution, accounting for 66.7% of the studies, with 6 studies originating from the USA. Türkiye, India, and Finland contributed the remaining studies, each providing one. The distribution of author roles was nearly balanced for first and corresponding authors, with 44.4% of first authors being men and 55.6% women, while 55.6% of corresponding authors were men and 44.4% were women. However, last authors were still predominantly male (66.7%).

The distribution of gender and geographical regions for first and last authors across educational applications, patient interaction applications, and miscellaneous applications is detailed in Table 6. A comprehensive list of studies in the other categories and overall journal metrics are provided in the supplementary material (Supplementary Tables 2 and 3).55-103

Discussion

LLMs are generative AI systems that perform natural language processing in order to understand human text and speech.104, 105 ChatGPT, BERT, and LLaMA are commonly used LLMs that employ deep learning (DL) techniques to engage in meaningful conversations with users.104 These applications provide access to vast amounts of knowledge and may also help patients access medical information relevant to their specific conditions, assess the urgency of their symptoms, or be directed to the appropriate subspecialty.106

As AI applications continue to grow in medicine, there has also been a rising interest in their use within ophthalmology. Over the past year, ophthalmology has seen an increase in the diverse applications of LLMs, ranging from clinical decision-making to enhancing patient interactions. The purpose of our study was to explore the current state of LLM research in ophthalmology through a bibliographic analysis.

We observed that most of the LLM studies in ophthalmology focused on clinical decision-making, with the majority aimed at retinal applications. This interest in clinical decision-making applications is not surprising, as LLMs have shown promise in assisting with complex diagnostic and treatment decisions. However, there is still significant uncertainty regarding how LLMs could be integrated into actual patient care from a regulatory standpoint.107 While the reasons for the high number of retina-related LLM publications are not entirely clear, we speculate that it may be due to the history of DL applications in ophthalmology. DL-based image analyses in ophthalmology have been spearheaded by the retina subspecialty. Therefore, the same group of researchers who focused on DL-based analysis of retinal images may have been early adopters of LLM-related research.108

Similarly, our research team has recently focused on LLMs. For example, we published a meta-analysis evaluating the accuracy of LLMs in answering board-style questions.55 In addition, we further examined the impact of integrating retrieval-augmented generation in enhancing LLM performance on both text-based and image-based board-style questions.109, 110 Another area of interest has been the difference in performance between different LLMs in answering questions related to social determinants of health in ophthalmology.111

Furthermore, our current study demonstrates that LLM research in ophthalmology carries significant scientific merit, as evidenced by its increasing presence in top-tier Q1 journals in recent years. We observed that glaucoma-related applications exhibited the greatest scientific impact in clinical decision-making, as reflected by their high h-index and i10-index values, along with their strong presence in Q1 journals. Overall, however, the most scientifically impactful applications were those involving algorithms trained for educational purposes, such as evaluating their performance in clinical knowledge exams compared to professionals or developing educational materials. These studies demonstrated higher average h-index and i10-index values, with the majority published in Q1 journals. The high impact of educational applications in particular may be attributed to their adaptability across medical disciplines beyond ophthalmology, their potential to enhance training outcomes, and their effectiveness in addressing educational gaps in medical training.

Despite the increasing prominence of LLMs, the underrepresentation of female authors remains an ongoing challenge. A recent study showed an increase in the proportion of women in research since 2018, from 33% to 37% of first authors and from 27% to 30% of last authors.112, 113 However, gender disparity continues to be a significant issue, particularly in the field of AI. Similar to the literature, our study revealed a prominent gender gap in all author roles, with women accounting for only 29.9% of first authors, 25.3% of last authors, and 26.4% of corresponding authors. This distribution suggests that despite recent efforts to increase female representation in academic publishing, significant barriers still exist for women in reaching key author roles, underscoring the need for continued efforts to promote gender equity. The notable underrepresentation of female authors likely reflects broader systemic issues, such as gender disparities in science, technology, engineering, and mathematics (STEM) education and leadership roles, limited mentorship opportunities for women in AI-driven medical research, and potential institutional biases.114

A similar disparity can be observed in the geographical distribution of studies. Research has shown that English-speaking countries, particularly the US (48.2%), dominate authorship in leading medical journals. In contrast, authors from developing countries remain underrepresented, although there has been an increase in geographical diversity in recent years.115 Another recent study also highlighted a similar bias in the rejection of research submitted to journals. Compared to authors from institutions in non-Western countries, those from Western countries are 5.7% more likely to have their manuscript accepted after rejection. Additionally, authors from Western countries tend to publish 23 days faster, revise 5.9% less often, change co-authors 12.0% less frequently, and ultimately publish in journals with impact factors that are 0.8% higher.11, 116

Consistent with the literature, our study revealed a similar disparity in the geographical origin of research based on first authorship. Overall, English-speaking countries dominated the research output, with the USA contributing 43.7% of the studies and the UK contributing 10.3%. However, China ranked a strong second after the USA, contributing 13.8% of the studies, underscoring its significant contribution to the field. This suggests that while the USA and China are leading research in LLMs in ophthalmology, there is still a strong need for contributions from other regions, particularly in the development of applications tailored to the specific cultural and societal needs of those regions. This imbalance may be partly explained by the predominance of English-language data used for training LLMs.117 Furthermore, it may also be attributed to the immense investment into fundamental AI infrastructure in the US and China, funding availability, and access to advanced technologies, all of which play a critical role in shaping the global distribution of LLM-related research.118

Study Limitations

This study also has limitations, particularly the use of PubMed as the sole platform for the literature search. While other databases such as arXiv and IEEE Xplore host a significant number of AI-related studies, we chose to include only those indexed in PubMed due to its high standards for peer-reviewed content, thereby ensuring the scientific rigor of the included articles. Additionally, we recognize that categorizing studies based on AI-related infrastructure or funding opportunities could offer a different perspective on the distribution of LLM-related research and should be explored in future studies.

Conclusion

The use of LLMs in ophthalmology is rapidly gaining interest, with the majority of studies published recently (in 2024) and in top-tier (Q1) journals. North America leads in publications, followed by growing contributions from Asia and Europe, while other regions, including Oceania, Latin America and the Caribbean, and Africa, remain underrepresented. A similar imbalance is observed in the gender distribution in authorship, with women being severely underrepresented across all key author roles. These geographic and gender inequities highlight significant gaps in global and demographic representation within LLM research in ophthalmology and demonstrate the need for further progress.

Authorship Contributions

Concept: N.D.K., T.Y.A.L., Design: N.D.K., T.Y.A.L., Data Collection or Processing: N.D.K., T.Y.A.L., Analysis or Interpretation: N.D.K., T.Y.A.L., Literature Search: N.D.K., T.Y.A.L., Writing: N.D.K., T.Y.A.L.
Conflict of Interest: No conflict of interest was declared by the authors.
Financial Disclosure: The authors declared that this study received no financial support.

References

1. Amisha, Malik P, Pathania M, Rathaur VK. Overview of artificial intelligence in medicine. J Family Med Prim Care. 2019;8:2328-2331.
2. Ray P. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems. 2023;3:121-154.
3. Meta AI. Introducing Llama 3.1: our most capable models to date. 2024. https://ai.meta.com/blog/meta-llama-3-1/
4. Kocoń J, Cichecki I, Kaszyca O, Kochanek M, Szydło D, Baran J, Bielaniewicz J, Gruza M, Janz A, Kanclerz K, Kocoń A, Koptyra B, Mieleszczenko-Kowszewicz W, Miłkowski P, Oleksy M, Piasecki M, Radliński Ł, Wojtasik K, Woźniak S, Kazienko P. ChatGPT: Jack of all trades, master of none. Information Fusion. 2023;99:101861.
5. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2018.
6. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, Aldairem A, Alrashed M, Bin Saleh K, Badreldin HA, Al Yami MS, Al Harbi S, Albekairy AM. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689.
7. Nakaura T, Ito R, Ueda D, Nozaki T, Fushimi Y, Matsui Y, Yanagawa M, Yamada A, Tsuboyama T, Fujima N, Tatsugami F, Hirata K, Fujita S, Kamagata K, Fujioka T, Kawamura M, Naganawa S. The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI. Jpn J Radiol. 2024;42:685-696.
8. Wyatt KD, Alexander N, Hills GD, Liang WH, Kadauke S, Volchenboum SL, Mian A, Phillips CA. Making sense of artificial intelligence and large language models-including ChatGPT-in pediatric hematology/oncology. Pediatr Blood Cancer. 2024;71:e31143.
9. Quer G, Topol EJ. The potential for large language models to transform cardiovascular medicine. Lancet Digit Health. 2024;6:e767-e771.
10. Tan TF, Thirunavukarasu AJ, Campbell JP, Keane PA, Pasquale LR, Abramoff MD, Kalpathy-Cramer J, Lum F, Kim JE, Baxter SL, Ting DSW. Generative artificial intelligence through ChatGPT and other large language models in ophthalmology: clinical applications and challenges. Ophthalmol Sci. 2023;3:100394.
11. Draux H. Research on artificial intelligence – the global divides. January 4, 2024. https://www.digital-science.com/blog/research-on-artificial-intelligence-the-global-divides/
12. Stanford Institute for Human-Centered AI (HAI). Artificial Intelligence Index Report 2024. https://aiindex.stanford.edu/report/
13. Pal S, Lazzaroni RM, Mendoza P. AI’s missing link: the gender gap in the talent pool. October 10, 2024. https://www.stiftung-nv.de/publications/ai-gender-gap/
14. United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects 2024. https://population.un.org/wpp/definition-of-regions
15. Hirsch JE. An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A. 2005;102:16569-16572.
16. Cornell University Library. Measuring your research impact: i10-Index. https://guides.library.cornell.edu/c.php?g=32272&p=203393
17. Ferro Desideri L, Roth J, Zinkernagel M, Anguita R. Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration. Int J Retina Vitreous. 2023;9:71.
18. Li J, Guan Z, Wang J, Cheung CY, Zheng Y, Lim LL, Lim CC, Ruamviboonsuk P, Raman R, Corsino L, Echouffo-Tcheugui JB, Luk AOY, Chen LJ, Sun X, Hamzah H, Wu Q, Wang X, Liu R, Wang YX, Chen T, Zhang X, Yang X, Yin J, Wan J, Du W, Quek TC, Goh JHL, Yang D, Hu X, Nguyen TX, Szeto SKH, Chotcomwongse P, Malek R, Normatova N, Ibragimova N, Srinivasan R, Zhong P, Huang W, Deng C, Ruan L, Zhang C, Zhang C, Zhou Y, Wu C, Dai R, Koh SWC, Abdullah A, Hee NKY, Tan HC, Liew ZH, Tien CS, Kao SL, Lim AYL, Mok SF, Sun L, Gu J, Wu L, Li T, Cheng D, Wang Z, Qin Y, Dai L, Meng Z, Shu J, Lu Y, Jiang N, Hu T, Huang S, Huang G, Yu S, Liu D, Ma W, Guo M, Guan X, Yang X, Bascaran C, Cleland CR, Bao Y, Ekinci EI, Jenkins A, Chan JCN, Bee YM, Sivaprasad S, Shaw JE, Simó R, Keane PA, Cheng CY, Tan GSW, Jia W, Tham YC, Li H, Sheng B, Wong TY. Integrated image-based deep learning and language models for primary diabetes care. Nat Med. 2024;30:2886-2896.
19. Anguita R, Makuloluwa A, Hind J, Wickham L. Large language models in vitreoretinal surgery. Eye (Lond). 2024;38:809-810.
20. Anguita R, Downie C, Ferro Desideri L, Sagoo MS. Assessing large language models’ accuracy in providing patient support for choroidal melanoma. Eye (Lond). 2024;38:3113-3117.
21. Antaki F, Chopra R, Keane PA. Vision-language models for feature detection of macular diseases on optical coherence tomography. JAMA Ophthalmol. 2024;142:573-576.
22. Chen X, Zhang W, Xu P, Zhao Z, Zheng Y, Shi D, He M. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer. NPJ Digit Med. 2024;7:111.
23. Carlà MM, Gambini G, Baldascino A, Giannuzzi F, Boselli F, Crincoli E, D’Onofrio NC, Rizzo S. Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases. Br J Ophthalmol. 2024;108:1457-1469.
24. Mohammadi SS, Nguyen QD. A user-friendly approach for the diagnosis of diabetic retinopathy using ChatGPT and automated machine learning. Ophthalmol Sci. 2024;4:100495.
25. Ghalibafan S, Taylor Gonzalez DJ, Cai LZ, Graham Chou B, Panneerselvam S, Conrad Barrett S, Djulbegovic MB, Yannuzzi NA. Applications of multimodal generative artificial intelligence in a real-world retina clinic setting. Retina. 2024;44:1732-1740.
26. Chen X, Xu P, Li Y, Zhang W, Song F, He M, Shi D. ChatFFA: an ophthalmic chat system for unified vision-language understanding and question answering for fundus fluorescein angiography. iScience. 2024;27:110021.
27. Chen X, Zhang W, Zhao Z, Xu P, Zheng Y, Shi D, He M. ICGA-GPT: report generation and question answering for indocyanine green angiography images. Br J Ophthalmol. 2024;108:1450-1456.
28. Gopalakrishnan N, Joshi A, Chhablani J, Yadav NK, Reddy NG, Rani PK, Pulipaka RS, Shetty R, Sinha S, Prabhu V, Venkatesh R. Recommendations for initial diabetic retinopathy screening of diabetic patients using large language model-based artificial intelligence in real-life case scenarios. Int J Retina Vitreous. 2024;10:11.
29. Balas M, Mandelcorn ED, Yan P, Ing EB, Crawford SA, Arjmand P. ChatGPT and retinal disease: a cross-sectional study on AI comprehension of clinical guidelines. Can J Ophthalmol. 2024;60:e117-e123.
30. Liu X, Wu J, Shao A, Shen W, Ye P, Wang Y, Ye J, Jin K, Yang J. Uncovering language disparity of ChatGPT on retinal vascular disease classification: cross-sectional study. J Med Internet Res. 2024;26:e51926.
31. Tailor PD, Dalvin LA, Chen JJ, Iezzi R, Olsen TW, Scruggs BA, Barkmeier AJ, Bakri SJ, Ryan EH, Tang PH, Parke DW 3rd, Belin PJ, Sridhar J, Xu D, Kuriyan AE, Yonekawa Y, Starr MR. A comparative study of responses to retina questions from either experts, expert-edited large language models, or expert-edited large language models alone. Ophthalmol Sci. 2024;4:100485.
32. Carlà MM, Gambini G, Baldascino A, Boselli F, Giannuzzi F, Margollicci F, Rizzo S. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefes Arch Clin Exp Ophthalmol. 2024;262:2945-2959.
33. Delsoz M, Raja H, Madadi Y, Tang AA, Wirostko BM, Kahook MY, Yousefi S. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol Ther. 2023;12:3121-3132.
34. Huang X, Raja H, Madadi Y, Delsoz M, Poursoroush A, Kahook MY, Yousefi S. Predicting glaucoma before onset using a large language model chatbot. Am J Ophthalmol. 2024;266:289-299.
35. Xue X, Zhang D, Sun C, Shi Y, Wang R, Tan T, Gao P, Fan S, Zhai G, Hu M, Wu Y. Xiaoqing: a Q&A model for glaucoma based on LLMs. Comput Biol Med. 2024;174:108399.
36. Raja H, Huang X, Delsoz M, Madadi Y, Poursoroush A, Munawar A, Kahook MY, Yousefi S. Diagnosing glaucoma based on the ocular hypertension treatment study dataset using chat generative pre-trained transformer as a large language model. Ophthalmol Sci. 2025;5:100599.
37. AlRyalat SA, Musleh AM, Kahook MY. Evaluating the strengths and limitations of multimodal ChatGPT-4 in detecting glaucoma using fundus images. Front Ophthalmol (Lausanne). 2024;4:1387190.
38. Ćirković A, Katz T. Exploring the potential of ChatGPT-4 in predicting refractive surgery categorizations: comparative study. JMIR Form Res. 2023;7:e51798.
39. Delsoz M, Madadi Y, Raja H, Munir WM, Tamm B, Mehravaran S, Soleimani M, Djalilian A, Yousefi S. Performance of ChatGPT in diagnosis of corneal eye diseases. Cornea. 2024;43:664-670.
40. Tuttle JJ, Moshirfar M, Garcia J, Altaf AW, Omidvarnia S, Hoopes PC. Learning the Randleman criteria in refractive surgery: utilizing ChatGPT-3.5 versus internet search engine. Cureus. 2024;16:e64768.
41. Rojas-Carabali W, Sen A, Agarwal A, Tan G, Cheung CY, Rousselot A, Agrawal R, Liu R, Cifuentes-González C, Elze T, Kempen JH, Sobrin L, Nguyen QD, de-la-Torre A, Lee B, Gupta V, Agrawal R. Chatbots vs. human experts: evaluating diagnostic performance of chatbots in uveitis and the perspectives on AI adoption in ophthalmology. Ocul Immunol Inflamm. 2024;32:1591-1598.
42. Schumacher I, Bühler VMM, Jaggi D, Roth J. Artificial intelligence derived large language model in decision-making process in uveitis. Int J Retina Vitreous. 2024;10:63.
43. Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the accuracy and completeness of an artificial intelligence large language model about uveitis: an evaluation of ChatGPT. Ocul Immunol Inflamm. 2024;32:2052-2055.
44. Madadi Y, Delsoz M, Lao PA, Fong JW, Hollingsworth TJ, Kahook MY, Yousefi S. ChatGPT assisting diagnosis of neuro-ophthalmology diseases based on case reports. medRxiv [Preprint]. 2023.
45. Tailor PD, Dalvin LA, Starr MR, Tajfirouz DA, Chodnicki KD, Brodsky MC, Mansukhani SA, Moss HE, Lai KE, Ko MW, Mackay DD, Di Nome MA, Dumitrascu OM, Pless ML, Eggenberger ER, Chen JJ. A comparative study of large language models, human experts, and expert-edited large language models to neuro-ophthalmology questions. J Neuroophthalmol. 2025;45:71-77.
46. Upadhyaya DP, Shaikh AG, Cakir GB, Prantzalos K, Golnari P, Ghasia FF, Sahoo SS. A 360° view for large language models: early detection of amblyopia in children using multi-view eye movement recordings. medRxiv [Preprint]. 2024.
47. Luo MJ, Pang J, Bi S, Lai Y, Zhao J, Shang Y, Cui T, Yang Y, Lin Z, Zhao L, Wu X, Lin D, Chen J, Lin H. Development and evaluation of a retrieval-augmented large language model framework for ophthalmology. JAMA Ophthalmol. 2024;142:798-805.
48. Zheng C, Ye H, Guo J, Yang J, Fei P, Yuan Y, Huang D, Huang Y, Peng J, Xie X, Xie M, Zhao P, Chen L, Zhang M. Development and evaluation of a large language model of ophthalmology in Chinese. Br J Ophthalmol. 2024;108:1390-1397.
49. Huang AS, Hirabayashi K, Barna L, Parikh D, Pasquale LR. Assessment of a large language model’s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmol. 2024;142:371-375.
50. Deng Z, Gao W, Chen C, Niu Z, Gong Z, Zhang R, Cao Z, Li F, Ma Z, Wei W, Ma L. OphGLM: an ophthalmology large language-and-vision assistant. Artif Intell Med. 2024;157:103001.
51. Haghighi T, Gholami S, Sokol JT, Kishnani E, Ahsaniyan A, Rahmanian H, Hedayati F, Leng T, Alam MN. EYE-LLaMA, an in-domain large language model for ophthalmology. bioRxiv [Preprint]. 2025.
52. Chen JS, Reddy AJ, Al-Sharif E, Shoji MK, Kalaw FGP, Eslani M, Lang PZ, Arya M, Koretz ZA, Bolo KA, Arnett JJ, Roginiel AC, Do JL, Robbins SL, Camp AS, Scott NL, Rudell JC, Weinreb RN, Baxter SL, Granet DB. Analysis of ChatGPT responses to ophthalmic cases: can ChatGPT think like an ophthalmologist? Ophthalmol Sci. 2024;5:100600.
53. Milad D, Antaki F, Milad J, Farah A, Khairy T, Mikhail D, Giguère CÉ, Touma S, Bernstein A, Szigiato AA, Nayman T, Mullie GA, Duval R. Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases. Br J Ophthalmol. 2024;108:1398-1405.
54
Hu X, Ran AR, Nguyen TX, Szeto S, Yam JC, Chan CKM, Cheung CY. What can GPT-4 do for diagnosing rare eye diseases? A pilot study. Ophthalmol Ther. 2023;12:3395-3402.
55
Wu JH, Nishida T, Liu TYA. Accuracy of large language models in answering ophthalmology board-style questions: a meta-analysis. Asia Pac J Ophthalmol (Phila). 2024;13:100106.
56
Raimondi R, Tzoumas N, Salisbury T, Di Simplicio S, Romano MR; North East Trainee Research in Ophthalmology Network (NETRiON). Comparative analysis of large language models in the Royal College of Ophthalmologists fellowship exams. Eye (Lond). 2023;37:3530-3533.
57
Sakai D, Maeda T, Ozaki A, Kanda GN, Kurimoto Y, Takahashi M. Performance of ChatGPT in board examinations for specialists in the Japanese Ophthalmology Society. Cureus. 2023;15:e49903.
58
Fowler T, Pullen S, Birkett L. Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions. Br J Ophthalmol. 2024;108:1379-1383.
59
Yaïci R, Cieplucha M, Bock R, Moayed F, Bechrakis NE, Berens P, Feltgen N, Friedburg D, Gräf M, Guthoff R, Hoffmann EM, Hoerauf H, Hintschich C, Kohnen T, Messmer EM, Nentwich MM, Pleyer U, Schaudig U, Seitz B, Geerling G, Roth M. ChatGPT und die deutsche Facharztprüfung für Augenheilkunde: eine Evaluierung [ChatGPT and the German board examination for ophthalmology: an evaluation]. Ophthalmologie. 2024;121:554-564.
60
Ming S, Guo Q, Cheng W, Lei B. Influence of model evolution and system roles on ChatGPT’s performance in Chinese medical licensing exams: comparative study. JMIR Med Educ. 2024;10:e52784.
61
Bahir D, Zur O, Attal L, Nujeidat Z, Knaanie A, Pikkel J, Mimouni M, Plopsky G. Gemini AI vs. ChatGPT: a comprehensive examination alongside ophthalmology residents in medical knowledge. Graefes Arch Clin Exp Ophthalmol. 2025;263:527-536.
62
Gill GS, Tsai J, Moxam J, Sanghvi HA, Gupta S. Comparison of Gemini advanced and ChatGPT 4.0’s performances on the ophthalmology resident ophthalmic knowledge assessment program (OKAP) examination review question banks. Cureus. 2024;16:e69612.
63
Antaki F, Milad D, Chia MA, Giguère CÉ, Touma S, El-Khoury J, Keane PA, Duval R. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. Br J Ophthalmol. 2024;108:1371-1378.
64
Antaki F, Touma S, Milad D, El-Khoury J, Duval R. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. 2023;3:100324.
65
Kianian R, Sun D, Crowell EL, Tsui E. The use of large language models to generate education materials about uveitis. Ophthalmol Retina. 2024;8:195-201.
66
Dihan Q, Chauhan MZ, Eleiwa TK, Brown AD, Hassan AK, Khodeiry MM, Elsheikh RH, Oke I, Nihalani BR, VanderVeen DK, Sallam AB, Elhusseiny AM. Large language models: a new frontier in paediatric cataract patient education. Br J Ophthalmol. 2024;108:1470-1476.
67
Dihan Q, Chauhan MZ, Eleiwa TK, Hassan AK, Sallam AB, Khouri AS, Chang TC, Elhusseiny AM. Using large language models to generate educational materials on childhood glaucoma. Am J Ophthalmol. 2024;265:28-38.
68
Dihan QA, Brown AD, Zaldivar AT, Chauhan MZ, Eleiwa TK, Hassan AK, Solyman O, Gise R, Phillips PH, Sallam AB, Elhusseiny AM. Advancing patient education in idiopathic intracranial hypertension: the promise of large language models. Neurol Clin Pract. 2025;15:e200366.
69
Singer MB, Fu JJ, Chow J, Teng CC. Development and evaluation of aeyeconsult: a novel ophthalmology chatbot leveraging verified textbook knowledge and GPT-4. J Surg Educ. 2024;81:438-443.
70
Waisberg E, Ong J, Masalkhi M, Lee AG. Large language model (LLM)-driven chatbots for neuro-ophthalmic medical education. Eye (Lond). 2024;38:639-641.
71
Durmaz Engin C, Karatas E, Ozturk T. Exploring the Role of ChatGPT-4, BingAI, and Gemini as virtual consultants to educate families about retinopathy of prematurity. Children (Basel). 2024;11:750.
72
Jung H, Oh J, Stephenson KAJ, Joe AW, Mammo ZN. Prompt engineering with ChatGPT3.5 and GPT4 to improve patient education on retinal diseases. Can J Ophthalmol. 2025;60:e375-e381.
73
Cai LZ, Shaheen A, Jin A, Fukui R, Yi JS, Yannuzzi N, Alabiad C. Performance of generative large language models on ophthalmology board-style questions. Am J Ophthalmol. 2023;254:141-149.
74
Thirunavukarasu AJ, Mahmood S, Malem A, Foster WP, Sanghera R, Hassan R, Zhou S, Wong SW, Wong YL, Chong YJ, Shakeel A, Chang YH, Tan BKJ, Jain N, Tan TF, Rauz S, Ting DSW, Ting DSJ. Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: a head-to-head cross-sectional study. PLOS Digit Health. 2024;3:e0000341.
75
Sevgi M, Antaki F, Keane PA. Medical education with large language models in ophthalmology: custom instructions and enhanced retrieval capabilities. Br J Ophthalmol. 2024;108:1354-1361.
76
Chen X, Zhao Z, Zhang W, Xu P, Wu Y, Xu M, Gao L, Li Y, Shang X, Shi D, He M. EyeGPT for patient inquiries and medical education: development and validation of an ophthalmology large language model. J Med Internet Res. 2024;26:e60063.
77
Baxter SL, Longhurst CA, Millen M, Sitapati AM, Tai-Seale M. Generative artificial intelligence responses to patient messages in the electronic health record: early lessons learned. JAMIA Open. 2024;7:ooae028.
78
Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: exploring the use of artificial intelligence in ophthalmology by comparing the accuracy, safety, and readability of responses to frequently asked patient questions regarding cataracts and cataract surgery. Semin Ophthalmol. 2024;39:472-479.
79
Muntean GA, Marginean A, Groza A, Damian I, Roman SA, Hapca MC, Sere AM, Mănoiu RM, Muntean MV, Nicoară SD. A qualitative evaluation of ChatGPT4 and PaLM2’s response to patient’s questions regarding age-related macular degeneration. Diagnostics (Basel). 2024;14:1468.
80
Strzalkowski P, Strzalkowska A, Chhablani J, Pfau K, Errera MH, Roth M, Schaub F, Bechrakis NE, Hoerauf H, Reiter C, Schuster AK, Geerling G, Guthoff R. Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study. Int J Retina Vitreous. 2024;10:61.
81
Kayabaşı M, Köksaldı S, Durmaz Engin C. Evaluating the reliability of the responses of large language models to keratoconus-related questions. Clin Exp Optom. 2025;108:784-791.
82
Shi R, Liu S, Xu X, Ye Z, Yang J, Le Q, Qiu J, Tian L, Wei A, Shan K, Zhao C, Sun X, Zhou X, Hong J. Benchmarking four large language models’ performance of addressing Chinese patients’ inquiries about dry eye disease: A two-phase study. Heliyon. 2024;10:e34391.
83
Tan DNH, Tham YC, Koh V, Loon SC, Aquino MC, Lun K, Cheng CY, Ngiam KY, Tan M. Evaluating Chatbot responses to patient questions in the field of glaucoma. Front Med (Lausanne). 2024;11:1359073.
84
Ichhpujani P, Parmar UPS, Kumar S. Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma. Rom J Ophthalmol. 2024;68:243-248.
85
Pushpanathan K, Lim ZW, Er Yew SM, Chen DZ, Hui’En Lin HA, Lin Goh JH, Wong WM, Wang X, Jin Tan MC, Chang Koh VT, Tham YC. Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience. 2023;26:108163.
86
Tailor PD, Xu TT, Fortes BH, Iezzi R, Olsen TW, Starr MR, Bakri SJ, Scruggs BA, Barkmeier AJ, Patel SV, Baratz KH, Bernhisel AA, Wagner LH, Tooley AA, Roddy GW, Sit AJ, Wu KY, Bothun ED, Mansukhani SA, Mohney BG, Chen JJ, Brodsky MC, Tajfirouz DA, Chodnicki KD, Smith WM, Dalvin LA. Appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model. Mayo Clin Proc Digit Health. 2024;2:119-128.
87
Alqudah AA, Aleshawi AJ, Baker M, Alnajjar Z, Ayasrah I, Ta’ani Y, Al Salkhadi M, Aljawarneh S. Evaluating accuracy and reproducibility of ChatGPT responses to patient-based questions in Ophthalmology: an observational study. Medicine (Baltimore). 2024;103:e39120.
88
Zandi R, Fahey JD, Drakopoulos M, Bryan JM, Dong S, Bryar PJ, Bidwell AE, Bowen RC, Lavine JA, Mirza RG. Exploring diagnostic precision and triage proficiency: a comparative study of GPT-4 and bard in addressing common ophthalmic complaints. Bioengineering (Basel). 2024;11:120.
89
Bernstein IA, Zhang YV, Govil D, Majid I, Chang RT, Sun Y, Shue A, Chou JC, Schehlein E, Christopher KL, Groth SL, Ludwig C, Wang SY. Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions. JAMA Netw Open. 2023;6:e2330320.
90
Wang H, Masselos K, Tong J, Connor HRM, Scully J, Zhang S, Rafla D, Posarelli M, Tan JCK, Agar A, Kalloniatis M, Phu J. ChatGPT for addressing patient-centered frequently asked questions in glaucoma clinical practice. Ophthalmol Glaucoma. 2024;8:157.
91
Wu G, Zhao W, Wong A, Lee DA. Patients with floaters: answers from virtual assistants and large language models. Digit Health. 2024;10:20552076241229933.
92
Cappellani F, Card KR, Shields CL, Pulido JS, Haller JA. Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients. Eye (Lond). 2024;38:1368-1373.
93
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine. 2023;95:104770.
94
Reyhan AH, Mutaf Ç, Uzun İ, Yüksekyayla F. A performance evaluation of large language models in keratoconus: a comparative study of ChatGPT-3.5, ChatGPT-4.0, Gemini, Copilot, Chatsonic, and Perplexity. J Clin Med. 2024;13:6512.
95
Raja H, Munawar A, Mylonas N, Delsoz M, Madadi Y, Elahi M, Hassan A, Abu Serhan H, Inam O, Hernandez L, Chen H, Tran S, Munir W, Abd-Alrazaq A, Yousefi S. Automated category and trend analysis of scientific articles on ophthalmology using large language models: development and usability study. JMIR Form Res. 2024;8:e52462.
96
Deiner MS, Deiner NA, Hristidis V, McLeod SD, Doan T, Lietman TM, Porco TC. Use of large language models to assess the likelihood of epidemics from the content of tweets: infodemiology study. J Med Internet Res. 2024;26:e49139.
97
Wang SY, Huang J, Hwang H, Hu W, Tao S, Hernandez-Boussard T. Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam. Int J Med Inform. 2022;167:104864.
98
Jaskari J, Sahlsten J, Summanen P, Moilanen J, Lehtola E, Aho M, Säpyskä E, Hietala K, Kaski K. DR-GPT: a large language model for medical report analysis of diabetic retinopathy patients. PLoS One. 2024;19:e0297706.
99
Shaheen A, Afflitto GG, Swaminathan SS. ChatGPT-assisted classification of postoperative bleeding following microinvasive glaucoma surgery using electronic health record data. Ophthalmol Sci. 2025;5:100602.
100
Aykut A, Sezenoz AS. Exploring the potential of code-free custom GPTs in ophthalmology: an early analysis of GPT store and user-creator guidance. Ophthalmol Ther. 2024;13:2697-2713.
101
Wu JH, Nishida T, Moghimi S, Weinreb RN. Effects of prompt engineering on large language model performance in response to questions on common ophthalmic conditions. Taiwan J Ophthalmol. 2024;14:454-457.
102
Singh S, Djalilian A, Ali MJ. ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol. 2023;38:503-507.
103
Marshall R, Xu H, Dalvin LA, Mishra K, Edalat C, Kirupaharan N, Francis JH, Berkenstock M. Accuracy and completeness of large language models about antibody-drug conjugates and associated ocular adverse effects. Cornea. 2024;44:851-855.
104
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29:1930-1940.
105
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: an overview of the growing field of large language models and their use in ophthalmology. Eye (Lond). 2024;38:1252-1261.
106
Clusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt JN, Laleh NG, Löffler CML, Schwarzkopf SC, Unger M, Veldhuizen GP, Wagner SJ, Kather JN. The future landscape of large language models in medicine. Commun Med (Lond). 2023;3:141.
107
U.S. Food and Drug Administration. 24 Hour Summary of the Digital Health Advisory Committee November 20-21, 2024, Avaible from: https:// www.fda.gov/media/184078/download
108
Goutam B, Hashmi MF, Geem ZW, Bokde ND. A comprehensive review of deep learning strategies in retinal disease diagnosis using fundus images. IEEE Access. 2022;10:57796-57823.
109
Song S, Peng K, Wang E, Liu TYA. Enhancing large language model performance on ophthalmology board-style questions with retrieval-augmented generation. Invest Ophthalmol Vis Sci. 2025;66:3930.
110
Peng K, Wang E, Song S, Liu TYA. Leveraging retrieval-augmented generation with large language models in answering image-based board-style ophthalmology questions. Invest Ophthalmol Vis Sci. 2025;66:3931.
111
Wang E, Song S, Peng K, Liu TYA. Performance of large language models in answering questions regarding social determinants of health in ophthalmology. Invest Ophthalmol Vis Sci. 2025;66:3936.
112
Meyer A, Streichert T. Twenty-five years of progress-lessons learned from JMIR publications to address gender parity in digital health authorships: bibliometric analysis. J Med Internet Res. 2024;26:e58950.
113
Holman L, Stuart-Fox D, Hauser CE. The gender gap in science: How long until women are equally represented? PLoS Biol. 2018;16:e2004956.
114
Shah SS. Gender bias in artificial intelligence: empowering women through digital literacy. Premier Journal of Artificial Intelligence. 2024;1:1000088.
115
Brück O. A bibliometric analysis of geographic disparities in the authorship of leading medical journals. Commun Med (Lond). 2023;3:178.
116
Chen H, Rider CI, Jurgens D, Teplitskiy M. Geographical disparities in navigating rejection in science drive disparities in its file drawer. Journal of Criminal Justice Education. 2024:45.
117
Navigli R, Conia S, Ross B. Biases in large language models: origins, inventory, and discussion. J Data Inf Qual. 2023;15:1-21.
118
Rahkovsky I, Toney A, Boyack KW, Klavans R, Murdick DA. AI research funding portfolios and extreme growth. Front Res Metr Anal. 2021;6:630124.

Supplementary Materials