Publication:
MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish

dc.contributor.coauthor: Çıtamak, Begüm
dc.contributor.coauthor: Çağlayan, Ozan
dc.contributor.coauthor: Kuyu, Menekşe
dc.contributor.coauthor: Erdem, Erkut
dc.contributor.coauthor: Madhyastha, Pranava
dc.contributor.coauthor: Specia, Lucia
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: 20331
dc.date.accessioned: 2024-11-09T11:22:34Z
dc.date.issued: 2021
dc.description.abstract: Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for such languages. In this paper we target Turkish, a morphologically rich and agglutinative language that has very different properties compared to English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research in video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphology rich and agglutinative languages.
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.issue: 2
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsoredbyTubitakEu: EU
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.sponsorship: British Council
dc.description.sponsorship: Newton Fund Institutional Links Grant Programme
dc.description.sponsorship: MMVC Project
dc.description.sponsorship: European Union (EU)
dc.description.sponsorship: Horizon 2020
dc.description.sponsorship: ERC Starting Grant
dc.description.sponsorship: MultiMT
dc.description.sponsorship: Turkish Academy of Sciences GEBIP 2018 Award
dc.description.sponsorship: Science Academy BAGEP 2021 Award
dc.description.version: Author's final manuscript
dc.description.volume: 35
dc.format: pdf
dc.identifier.doi: 10.1007/s10590-021-09276-y
dc.identifier.eissn: 1573-0573
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03088
dc.identifier.issn: 0922-6567
dc.identifier.link: https://doi.org/10.1007/s10590-021-09276-y
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85111946889
dc.identifier.uri: https://hdl.handle.net/20.500.14288/26
dc.identifier.wos: 668842800001
dc.keywords: Video description dataset
dc.keywords: Turkish
dc.keywords: Video captioning
dc.keywords: Video understanding
dc.keywords: Neural machine translation
dc.keywords: Multimodal machine translation
dc.language: English
dc.publisher: Springer
dc.relation.grantno: 219E054
dc.relation.grantno: 352343575
dc.relation.grantno: 678017
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9746
dc.source: Machine Translation
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.title: MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: 0000-0002-6280-8422
local.contributor.kuauthor: Erdem, Aykut
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files

Original bundle

Name: 9746.pdf
Size: 3.24 MB
Format: Adobe Portable Document Format