MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish


Files

9746.pdf (3.24 MB)

Departments

Department of Computer Engineering

School / College / Institute

College of Engineering

KU-Authors

Erdem, Aykut

Co-Authors

Çıtamak, Begüm

Çağlayan, Ozan

Kuyu, Menekşe

Erdem, Erkut

Madhyastha, Pranava

Specia, Lucia

Date

2021

Type

Journal Article

Embargo Status

NO

Abstract

Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of a video and produce a natural language sentence describing the objects and actions in the scene. This challenging integrated vision and language problem, however, has been addressed predominantly for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for those languages. In this paper, we target Turkish, a morphologically rich and agglutinative language whose properties differ greatly from those of English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research on video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.

Publisher

Springer

Subject

Computer science, Artificial intelligence

Source

Machine Translation

DOI

10.1007/s10590-021-09276-y

URI

https://doi.org/10.1007/s10590-021-09276-y

Collections

Publications with Fulltext
