MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish

dc.contributor.coauthor	Çıtamak, Begüm
dc.contributor.coauthor	Çağlayan, Ozan
dc.contributor.coauthor	Kuyu, Menekşe
dc.contributor.coauthor	Erdem, Erkut
dc.contributor.coauthor	Madhyastha, Pranava
dc.contributor.coauthor	Specia, Lucia
dc.contributor.department	Department of Computer Engineering
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Erdem, Aykut
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.date.accessioned	2024-11-09T11:22:34Z
dc.date.issued	2021
dc.description.abstract	Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for such languages. In this paper, we target Turkish, a morphologically rich and agglutinative language that has very different properties compared to English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research in video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.
dc.description.fulltext	YES
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.issue	2
dc.description.openaccess	YES
dc.description.publisherscope	International
dc.description.sponsoredbyTubitakEu	EU - TÜBİTAK
dc.description.sponsorship	Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.sponsorship	British Council
dc.description.sponsorship	Newton Fund Institutional Links Grant Programme
dc.description.sponsorship	MMVC Project
dc.description.sponsorship	European Union (EU)
dc.description.sponsorship	Horizon 2020
dc.description.sponsorship	ERC Starting Grant
dc.description.sponsorship	MultiMT
dc.description.sponsorship	Turkish Academy of Sciences GEBIP 2018 Award
dc.description.sponsorship	Science Academy BAGEP 2021 Award
dc.description.studentonlypublication	No
dc.description.studentpublication	No
dc.description.version	Author's final manuscript
dc.description.volume	35
dc.identifier.WoSQuartile	N/A
dc.identifier.doi	10.1007/s10590-021-09276-y
dc.identifier.eissn	1573-0573
dc.identifier.embargo	NO
dc.identifier.filenameinventoryno	IR03088
dc.identifier.issn	0922-6567
dc.identifier.scopus	2-s2.0-85111946889
dc.identifier.uri	https://doi.org/10.1007/s10590-021-09276-y
dc.identifier.wos	668842800001
dc.keywords	Video description dataset
dc.keywords	Turkish
dc.keywords	Video captioning
dc.keywords	Video understanding
dc.keywords	Neural machine translation
dc.keywords	Multimodal machine translation
dc.language.iso	eng
dc.publisher	Springer
dc.relation.grantno	219E054
dc.relation.grantno	352343575
dc.relation.grantno	678017
dc.relation.ispartof	Machine Translation
dc.relation.uri	http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9746
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.title	MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish
dc.type	Journal Article
dspace.entity.type	Publication
local.contributor.kuauthor	Erdem, Aykut
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 9746.pdf
Size: 3.24 MB
Format: Adobe Portable Document Format

Collections

Publications with Fulltext
