Publication:
Cross-lingual visual pre-training for multimodal machine translation

dc.contributor.coauthor: Çağlayan, O.
dc.contributor.coauthor: Kuyu, M.
dc.contributor.coauthor: Amaç, M. S.
dc.contributor.coauthor: Madhyastha, P.
dc.contributor.coauthor: Erdem, E.
dc.contributor.coauthor: Specia, L.
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.date.accessioned: 2024-11-09T12:01:17Z
dc.date.issued: 2021
dc.description.abstract: Pre-trained language models have been shown to substantially improve performance on many natural language tasks. Although the early focus of such models was single-language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend the translation language modelling objective (Lample and Conneau, 2019) with masked region classification and perform pre-training with three-way parallel vision & language corpora. We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance. We also provide qualitative insights into the usefulness of the learned grounded representations.
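
Note on the method: the abstract describes two masking objectives applied jointly during pre-training, translation language modelling (TLM) over a concatenated source-target sentence pair and masked region classification (MRC) over detected image regions. The Python sketch below illustrates how such a combined loss could look; the class and function names, dimensions, 15% masking rate, and the single shared Transformer encoder are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch (PyTorch) of a TLM + masked region classification loss,
# assuming one shared encoder over text tokens and image-region features.
# All sizes, names, and the masking rate are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, N_CLASSES, D = 10000, 1600, 512  # toy vocabulary, detector classes, model width
MASK_ID = 0                             # reserved [MASK] token id (assumption)

class VisualTLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, D)
        self.reg_proj = nn.Linear(2048, D)  # project detector features to model width
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.tok_head = nn.Linear(D, VOCAB)      # TLM: predict masked tokens
        self.reg_head = nn.Linear(D, N_CLASSES)  # MRC: predict masked regions' object classes

    def forward(self, tokens, regions):
        # tokens: (B, T) ids of the concatenated source+target sentence pair
        # regions: (B, R, 2048) pooled features of R detected image regions
        x = torch.cat([self.tok_emb(tokens), self.reg_proj(regions)], dim=1)
        h = self.encoder(x)                      # joint attention over words and regions
        T = tokens.size(1)
        return self.tok_head(h[:, :T]), self.reg_head(h[:, T:])

def pretrain_loss(model, tokens, regions, region_labels, p=0.15):
    # Mask ~p of the tokens and regions, then score only the masked positions.
    tok_mask = torch.rand_like(tokens, dtype=torch.float32) < p
    reg_mask = torch.rand_like(regions[..., 0]) < p
    inp_tok = tokens.masked_fill(tok_mask, MASK_ID)
    inp_reg = regions.masked_fill(reg_mask.unsqueeze(-1), 0.0)  # zero out masked regions
    tok_logits, reg_logits = model(inp_tok, inp_reg)
    tlm = nn.functional.cross_entropy(tok_logits[tok_mask], tokens[tok_mask])
    mrc = nn.functional.cross_entropy(reg_logits[reg_mask], region_labels[reg_mask])
    return tlm + mrc

# Toy usage: 2 sentence pairs of 20 tokens, 36 regions each.
model = VisualTLM()
tokens = torch.randint(1, VOCAB, (2, 20))
regions = torch.randn(2, 36, 2048)
labels = torch.randint(0, N_CLASSES, (2, 36))
loss = pretrain_loss(model, tokens, regions, labels)

The point mirrored here is that both objectives run through one encoder, so recovering a masked word can draw on the image regions and the other language, which is what makes the learned representations visually grounded and cross-lingual.
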
dc.description.fulltext: YES
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: EU - TÜBİTAK
dc.description.sponsorship: MMVC Project
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.sponsorship: British Council Newton Fund Institutional Links Grant Programme
dc.description.sponsorship: European Union (EU)
dc.description.sponsorship: Horizon 2020
dc.description.sponsorship: MultiMT Project
dc.description.sponsorship: ERC Starting Grant
dc.description.sponsorship: Air Force Office of Scientific Research
dc.description.sponsorship: TUBA GEBIP Fellowship
dc.description.version: Publisher version
dc.identifier.doi: 10.18653/v1/2021.eacl-main.112
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR02976
dc.identifier.isbn: 978-195408502-2
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85107296187
dc.identifier.uri: https://doi.org/10.18653/v1/2021.eacl-main.112
dc.keywords: Cross-lingual
dc.keywords: Improve performance
dc.keywords: Language model
dc.keywords: Machine translations
dc.keywords: Natural languages
dc.keywords: Parallel vision
dc.keywords: Region classifications
dc.keywords: State-of-the-art performance
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.grantno: 219E054
dc.relation.grantno: 352343575
dc.relation.grantno: 678017
dc.relation.grantno: FA8655-20-1-7006
dc.relation.ispartof: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9624
dc.subject: Visual languages
dc.title: Cross-lingual visual pre-training for multimodal machine translation
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Erdem, Aykut
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Computer Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 9624.pdf
Size: 406.27 KB
Format: Adobe Portable Document Format