Publication:
Large language model ability to translate CT and MRI free-text radiology reports into multiple languages

dc.contributor.coauthorMeddeb A., Lüken S., Busch F., Adams L., Ugga L., Koltsakis E., Tzortzakakis A., Jelassi S., Dkhil I., Klontzas M.E., Triantafyllou M., Kocak B., Zhang L., Hu B., Andreychenko A., Yurievich E.A., Logunova T., Morakote W., Angkurawaranon S., Makowski M.R., Wattjes M.P., Cuocolo R., Bressem K.
dc.contributor.departmentKUH (Koç University Hospital)
dc.contributor.kuauthorYüzkan, Sabahattin
dc.contributor.schoolcollegeinstituteKUH (KOÇ UNIVERSITY HOSPITAL)
dc.date.accessioned2025-03-06T21:00:23Z
dc.date.issued2024
dc.description.abstractBackground: High-quality translations of radiology reports are essential for optimal patient care. Because of limited availability of human translators with medical expertise, large language models (LLMs) are a promising solution, but their ability to translate radiology reports remains largely unexplored. Purpose: To evaluate the accuracy and quality of various LLMs in translating radiology reports across high-resource languages (English, Italian, French, German, and Chinese) and low-resource languages (Swedish, Turkish, Russian, Greek, and Thai). Materials and Methods: A dataset of 100 synthetic free-text radiology reports from CT and MRI scans was translated by 18 radiologists between January 14 and May 2, 2024, into nine target languages. Ten LLMs, including GPT-4 (OpenAI), Llama 3 (Meta), and Mixtral models (Mistral AI), were used for automated translation. Translation accuracy and quality were assessed with use of BiLingual Evaluation Understudy (BLEU) score, translation error rate (TER), and CHaRacter-level F-score (chrF++) metrics. Statistical significance was evaluated with use of paired t tests with Holm-Bonferroni corrections. Radiologists also conducted a qualitative evaluation of translations with use of a standardized questionnaire. Results: GPT-4 demonstrated the best overall translation quality, particularly from English to German (BLEU score: 35.0 ± 16.3 [SD]; TER: 61.7 ± 21.2; chrF++: 70.6 ± 9.4), to Greek (BLEU: 32.6 ± 10.1; TER: 52.4 ± 10.6; chrF++: 62.8 ± 6.4), to Thai (BLEU: 53.2 ± 7.3; TER: 74.3 ± 5.2; chrF++: 48.4 ± 6.6), and to Turkish (BLEU: 35.5 ± 6.6; TER: 52.7 ± 7.4; chrF++: 70.7 ± 3.7). GPT-3.5 showed the highest accuracy in translations from English to French, and Qwen1.5 excelled in English-to-Chinese translations, whereas Mixtral 8x22B performed best in Italian-to-English translations. The qualitative evaluation revealed that LLMs excelled in clarity, readability, and consistency with the original meaning but showed moderate medical terminology accuracy. Conclusion: LLMs showed high accuracy and quality for translating radiology reports, although results varied by model and language pair.
dc.description.indexedbyWOS
dc.description.indexedbyScopus
dc.description.indexedbyPubMed
dc.description.publisherscopeInternational
dc.description.sponsoredbyTubitakEuN/A
dc.identifier.doi10.1148/radiol.241736
dc.identifier.issn0033-8419
dc.identifier.issue3
dc.identifier.quartileQ1
dc.identifier.scopus2-s2.0-85212905277
dc.identifier.urihttps://doi.org/10.1148/radiol.241736
dc.identifier.urihttps://hdl.handle.net/20.500.14288/27869
dc.identifier.volume313
dc.identifier.wos1384890400004
dc.keywordsLarge language models
dc.keywordsCT reports
dc.keywordsMRI reports
dc.keywordsFree-text translation
dc.keywordsMedical translation
dc.keywordsRadiology
dc.keywordsMultilingual AI
dc.keywordsNatural language processing
dc.keywordsMedical AI
dc.keywordsAutomated translation
dc.language.isoeng
dc.publisherRadiological Society of North America Inc.
dc.relation.ispartofRadiology
dc.subjectComputer engineering
dc.titleLarge language model ability to translate CT and MRI free-text radiology reports into multiple languages
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.kuauthorYüzkan, Sabahattin
local.publication.orgunit1KUH (KOÇ UNIVERSITY HOSPITAL)
local.publication.orgunit2KUH (Koç University Hospital)
relation.isOrgUnitOfPublicationf91d21f0-6b13-46ce-939a-db68e4c8d2ab
relation.isOrgUnitOfPublication.latestForDiscoveryf91d21f0-6b13-46ce-939a-db68e4c8d2ab
relation.isParentOrgUnitOfPublication055775c9-9efe-43ec-814f-f6d771fa6dee
relation.isParentOrgUnitOfPublication.latestForDiscovery055775c9-9efe-43ec-814f-f6d771fa6dee
