Publication: MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish
dc.contributor.coauthor | Çıtamak, Begüm | |
dc.contributor.coauthor | Çağlayan, Ozan | |
dc.contributor.coauthor | Kuyu, Menekşe | |
dc.contributor.coauthor | Erdem, Erkut | |
dc.contributor.coauthor | Madhyastha, Pranava | |
dc.contributor.coauthor | Specia, Lucia | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Erdem, Aykut | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | 20331 | |
dc.date.accessioned | 2024-11-09T11:22:34Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for such languages. In this paper, we target Turkish, a morphologically rich and agglutinative language that has very different properties compared to English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research in video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages. | |
dc.description.fulltext | YES | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.issue | 2 | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | TÜBİTAK | |
dc.description.sponsoredbyTubitakEu | EU | |
dc.description.sponsorship | Scientific and Technological Research Council of Turkey (TÜBİTAK) | |
dc.description.sponsorship | British Council | |
dc.description.sponsorship | Newton Fund Institutional Links Grant Programme | |
dc.description.sponsorship | MMVC Project | |
dc.description.sponsorship | European Union (EU) | |
dc.description.sponsorship | Horizon 2020 | |
dc.description.sponsorship | ERC Starting Grant | |
dc.description.sponsorship | MultiMT | |
dc.description.sponsorship | Turkish Academy of Sciences GEBIP 2018 Award | |
dc.description.sponsorship | Science Academy BAGEP 2021 Award | |
dc.description.version | Author's final manuscript | |
dc.description.volume | 35 | |
dc.identifier.doi | 10.1007/s10590-021-09276-y | |
dc.identifier.eissn | 1573-0573 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR03088 | |
dc.identifier.issn | 0922-6567 | |
dc.identifier.link | https://doi.org/10.1007/s10590-021-09276-y | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85111946889 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/26 | |
dc.identifier.wos | 668842800001 | |
dc.keywords | Video description dataset | |
dc.keywords | Turkish | |
dc.keywords | Video captioning | |
dc.keywords | Video understanding | |
dc.keywords | Neural machine translation | |
dc.keywords | Multimodal machine translation | |
dc.language | English | |
dc.publisher | Springer | |
dc.relation.grantno | 219E054 | |
dc.relation.grantno | 352343575 | |
dc.relation.grantno | 678017 | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9746 | |
dc.source | Machine Translation | |
dc.subject | Computer science | |
dc.subject | Artificial intelligence | |
dc.title | MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-6280-8422 | |
local.contributor.kuauthor | Erdem, Aykut | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |