Large language models and their performance for the diagnosis of histoplasmosis

Summary

Researchers tested whether artificial intelligence chatbots such as ChatGPT and Microsoft Copilot could help doctors diagnose histoplasmosis, a serious fungal infection that affects people with HIV/AIDS and is often missed. They presented 20 real patient case descriptions to different AI systems and found that Microsoft Copilot performed best, correctly identifying histoplasmosis in 90% of cases, a result roughly comparable to laboratory tests. While the AI showed promise as a tool for suggesting this neglected disease during diagnosis, doctors would still need to confirm the findings with actual laboratory testing.

Background

Progressive disseminated histoplasmosis remains a major AIDS-defining opportunistic infection, yet awareness is low and rapid diagnostic tools are lacking in endemic areas, causing potentially fatal delays in treatment. Previous studies showed poor performance of ChatGPT 3.5 in identifying HIV-associated histoplasmosis cases. This study evaluates whether more recent language models can better assist in diagnosing this neglected disease.

Objective

To evaluate the performance of several large language models (ChatGPT 3.5, ChatGPT 4.0, Microsoft Copilot, Google Gemini, and Deepseek) in identifying histoplasmosis from clinical vignettes of HIV-associated cases. The study aimed to determine whether performance had improved since previous assessments and whether these models could serve as point-of-care diagnostic aids.

Results

Microsoft Copilot achieved the highest sensitivity (90%), listing histoplasmosis or an invasive fungal infection for 18-19 of the 20 vignettes. ChatGPT 4.0 identified histoplasmosis in 14-16 of 20 cases depending on the geographic context, ChatGPT 3.5 in 15 cases, while Gemini and Deepseek each identified it in only 3 of 20 cases. Performance improved when the stated location was changed to Indianapolis, a known histoplasmosis hotspot.
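For context, sensitivity is the proportion of true cases that a test (or here, a model) correctly flags. A minimal worked example of the arithmetic, assuming the 90% figure corresponds to 18 correct identifications out of the 20 vignettes:

Sensitivity = true positives / (true positives + false negatives) = 18 / (18 + 2) = 0.90, i.e. 90%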

Conclusion

Large language models, particularly Microsoft Copilot, show significant potential as diagnostic aids for histoplasmosis, with performance comparable to antigen detection tests. While concerns about their black-box nature and tendency to confabulate remain valid, the rapid improvement in LLM capabilities suggests they could be integrated into clinical practice as supplementary tools for identifying neglected diseases, although high-quality clinical documentation remains crucial.