Publication:
KU_ai at MEDIQA 2019: domain-specific pre-training and transfer learning for medical NLI

dc.contributor.department: Department of Computer Engineering
dc.contributor.department: N/A
dc.contributor.department: N/A
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.kuauthor: Sert, Ulaş
dc.contributor.kuauthor: Cengiz, Cemil
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Master Student
dc.contributor.kuprofile: Master Student
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.date.accessioned: 2024-11-09T23:39:29Z
dc.date.issued: 2019
dc.description.abstract: In this paper, we describe our system and the results we submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task (Ben Abacha et al., 2019). As the KU_ai team, we used BERT (Devlin et al., 2018) as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve performance. We show that pre-training the language model on rich biomedical corpora substantially helps it learn domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps it learn task-specific reasoning. Finally, we ensembled our highest-performing models and achieved 84.7% accuracy on the unseen test dataset, ranking 10th out of 17 teams in the official results.
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.identifier.doi: N/A
dc.identifier.isbn: 978-1-950737-28-4
dc.identifier.link: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85107900736&partnerID=40&md5=8dcdc859764782655c63af36da23bf04
dc.identifier.scopus: 2-s2.0-85107900736
dc.identifier.uri: https://hdl.handle.net/20.500.14288/13128
dc.keywords: Computational linguistics
dc.keywords: Large dataset
dc.keywords: Learning systems
dc.keywords: Natural language processing systems
dc.keywords: Statistical tests
dc.keywords: Baseline models
dc.keywords: De-identification
dc.keywords: Domain specific
dc.keywords: Language inference
dc.keywords: Language model
dc.keywords: Learning approach
dc.keywords: Natural languages
dc.keywords: Performance
dc.keywords: Pre-training
dc.keywords: Transfer learning
dc.keywords: Problem oriented languages
dc.language: English
dc.publisher: Association for Computational Linguistics (ACL)
dc.source: BioNLP 2019 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 18th BioNLP Workshop and Shared Task
dc.subject: Computer Science
dc.title: KU_ai at MEDIQA 2019: domain-specific pre-training and transfer learning for medical NLI
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Yüret, Deniz
local.contributor.kuauthor: Sert, Ulaş
local.contributor.kuauthor: Cengiz, Cemil
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
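
Note: The abstract above outlines a two-stage transfer-learning recipe: fine-tune a (biomedically pre-trained) BERT on large general-domain NLI corpora such as MultiNLI, then continue fine-tuning on MedNLI. The sketch below only illustrates that recipe; it is not the authors' code. The checkpoint name, hyperparameters, and the HuggingFace Trainer-based setup are illustrative assumptions, and MedNLI itself is gated behind PhysioNet credentialing, so its loader call is left commented out.

    # Illustrative sketch only -- not the authors' implementation.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)
    from datasets import load_dataset

    # The paper starts from BERT; a biomedically pre-trained checkpoint
    # would be substituted here for the domain-specific variant.
    CHECKPOINT = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=3)  # entailment / neutral / contradiction

    def encode(batch):
        # NLI examples are (premise, hypothesis) sentence pairs.
        return tokenizer(batch["premise"], batch["hypothesis"],
                         truncation=True, max_length=128, padding="max_length")

    def finetune(dataset, output_dir):
        # Hyperparameters are placeholders, not the paper's settings.
        args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                                 per_device_train_batch_size=16)
        data = dataset.map(encode, batched=True)
        Trainer(model=model, args=args, train_dataset=data).train()

    # Stage 1: learn task-specific reasoning from a large general NLI corpus.
    finetune(load_dataset("multi_nli", split="train"), "out/mnli")
    # Stage 2: adapt to clinical text on MedNLI. The dataset is distributed
    # via PhysioNet under a data-use agreement, so this call is hypothetical:
    # finetune(load_dataset("mednli", split="train"), "out/mednli")

Because the same model object is passed through both calls, stage 2 continues from the stage-1 weights, which is the transfer-learning behavior the abstract describes.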
