Publication:
Cross-lingual visual pre-training for multimodal machine translation

dc.contributor.coauthor: Caglayan, Ozan
dc.contributor.coauthor: Kuyu, Menekse
dc.contributor.coauthor: Amac, Mustafa Sercan
dc.contributor.coauthor: Madhyastha, Pranava
dc.contributor.coauthor: Erdem, Erkut
dc.contributor.coauthor: Specia, Lucia
dc.contributor.department: Department of Computer Engineering
dc.contributor.facultymember: Yes
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.date.accessioned: 2024-11-09T22:50:20Z
dc.date.issued: 2021
dc.description.abstract: Pre-trained language models have been shown to improve performance in many natural language tasks substantially. Although the early focus of such models was single language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend the translation language modelling (Lample and Conneau, 2019) with masked region classification and perform pre-training with three-way parallel vision & language corpora. We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance. We also provide qualitative insights into the usefulness of the learned grounded representations.
dc.description.fulltext: No
dc.description.harvestedfrom: Manual
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: NO
dc.description.peerreviewstatus: N/A
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: EU - TÜBİTAK
dc.description.sponsorship: This work was supported in part by the TÜBA GEBİP fellowship awarded to Erkut Erdem; the MMVC project, funded by TÜBİTAK [219E054] and by the British Council through the Newton Fund Institutional Links grant programme [352343575]; the MultiMT project (H2020 ERC Starting Grant No. 678017), which supported Lucia Specia, Pranava Madhyastha, and Ozan Caglayan; and the Air Force Office of Scientific Research [FA8655-20-1-7006], which additionally supported Lucia Specia.
dc.description.version: N/A
dc.identifier.embargo: N/A
dc.identifier.endpage: 1324
dc.identifier.grantno: 678017
dc.identifier.grantno: 219E054
dc.identifier.isbn: 9781954085022
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85107296187
dc.identifier.startpage: 1317
dc.identifier.uri: https://hdl.handle.net/20.500.14288/6658
dc.identifier.wos: 000863557001034
dc.keywords: Cross-lingual visual pre-training
dc.keywords: Multimodal machine translation
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: 16th Conference of the European Chapter of the Association for Computational Linguistics
dc.relation.openaccess: N/A
dc.rights: N/A
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Linguistics
dc.title: Cross-lingual visual pre-training for multimodal machine translation
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Erdem, Aykut
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
