Publication:
Benchmarking AI chatbots for maternal lactation support: a cross-platform evaluation of quality, readability, and clinical accuracy

dc.contributor.coauthorAslan, Ilke Ozer
dc.contributor.departmentKUH (Koç University Hospital)
dc.contributor.kuauthorDoctor, Aslan, Mustafa Törehan
dc.contributor.schoolcollegeinstituteKUH (KOÇ UNIVERSITY HOSPITAL)
dc.date.accessioned2025-09-10T04:55:47Z
dc.date.available2025-09-09
dc.date.issued2025
dc.description.abstractBackground and Objective: Large language model (LLM)-based chatbots are increasingly utilized by postpartum individuals seeking guidance on breastfeeding. However, the chatbots' content quality, readability, and alignment with clinical guidelines remain uncertain. This study was conducted to evaluate and compare the quality, readability, and factual accuracy of responses generated by three publicly accessible AI chatbots (ChatGPT-4o Pro, Gemini 2.5 Pro, and Copilot Pro) when prompted with common maternal questions related to breast-milk supply. Methods: Twenty frequently asked breastfeeding-related questions were submitted to each chatbot in separate sessions. The responses were paraphrased to enable standardized scoring and were then evaluated using three validated tools: Ensuring Quality Information for Patients (EQIP), the Simple Measure of Gobbledygook (SMOG), and the Global Quality Scale (GQS). Factual accuracy was benchmarked against WHO, ACOG, CDC, and NICE guidelines using a three-point rubric. Additional user experience metrics included response time, character count, content density, and structural formatting. Statistical comparisons were performed using the Kruskal-Wallis and Wilcoxon rank-sum tests with Bonferroni correction. Results: ChatGPT-4o Pro achieved the highest overall performance across all primary outcomes: EQIP score (85.7 ± 2.4%), SMOG score (9.78 ± 0.22), and GQS rating (4.55 ± 0.50), followed by Gemini 2.5 Pro and Copilot Pro (p < 0.001 for all comparisons). ChatGPT-4o Pro also demonstrated the highest factual alignment with clinical guidelines (95%), while Copilot showed more frequent omissions or simplifications. Differences in response time and formatting quality were statistically significant, although not always clinically meaningful. Conclusions: ChatGPT-4o Pro outperforms other chatbots in delivering structured, readable, and guideline-concordant breastfeeding information. However, substantial variability persists across the platforms, and none should be considered a substitute for professional guidance. Importantly, the phenomenon of AI hallucinations, where chatbots may generate factually incorrect or fabricated information, remains a critical risk that must be addressed to ensure safe integration into maternal health communication. Future efforts should focus on improving the transparency, accuracy, and multilingual reliability of AI chatbots to ensure their safe integration into maternal health communications.
dc.description.fulltextYes
dc.description.harvestedfromManual
dc.description.indexedbyWOS
dc.description.indexedbyScopus
dc.description.indexedbyPubMed
dc.description.openaccessGold OA
dc.description.publisherscopeInternational
dc.description.readpublishN/A
dc.description.sponsoredbyTubitakEuN/A
dc.description.versionPublished Version
dc.description.volume13
dc.identifier.doi10.3390/healthcare13141756
dc.identifier.eissn2227-9032
dc.identifier.embargoNo
dc.identifier.filenameinventorynoIR06367
dc.identifier.issue14
dc.identifier.quartileQ2
dc.identifier.scopus2-s2.0-105011646514
dc.identifier.urihttps://doi.org/10.3390/healthcare13141756
dc.identifier.urihttps://hdl.handle.net/20.500.14288/30105
dc.identifier.wos001536667600001
dc.keywordsArtificial intelligence
dc.keywordsBreastfeeding
dc.keywordsChatbot
dc.keywordsClinical accuracy
dc.keywordsPatient education
dc.keywordsLactation support
dc.keywordsLarge language models
dc.language.isoeng
dc.publisherMDPI
dc.relation.affiliationKoç University
dc.relation.collectionKoç University Institutional Repository
dc.relation.ispartofHealthcare
dc.relation.openaccessYes
dc.rightsCC BY (Attribution)
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subjectHealth care sciences and services
dc.subjectHealth policy services
dc.titleBenchmarking AI chatbots for maternal lactation support: a cross-platform evaluation of quality, readability, and clinical accuracy
dc.typeJournal Article
dspace.entity.typePublication
relation.isOrgUnitOfPublicationf91d21f0-6b13-46ce-939a-db68e4c8d2ab
relation.isOrgUnitOfPublication.latestForDiscoveryf91d21f0-6b13-46ce-939a-db68e4c8d2ab
relation.isParentOrgUnitOfPublication055775c9-9efe-43ec-814f-f6d771fa6dee
relation.isParentOrgUnitOfPublication.latestForDiscovery055775c9-9efe-43ec-814f-f6d771fa6dee

Files

Original bundle

Name: IR06367.pdf
Size: 386.28 KB
Format: Adobe Portable Document Format