Publication: Leveraging frequency based salient spatial sound localization to improve 360 degrees video saliency prediction
dc.contributor.coauthor | Çökelek, Mert | |
dc.contributor.coauthor | İmamoğlu, Nevrez | |
dc.contributor.coauthor | Özçınar, Çağrı | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Erdem, Aykut | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | 20331 | |
dc.date.accessioned | 2024-11-09T13:48:23Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Virtual and augmented reality (VR/AR) systems have dramatically gained in popularity in application areas such as gaming, social media, and communication. It is therefore crucial to have the know-how to efficiently utilize, store, or deliver 360° videos to end-users. Toward this aim, researchers have been developing deep neural network models for 360° multimedia processing and computer vision. An important research direction in this line of work is to build models that can learn and predict observers' attention on 360° videos, computationally producing so-called saliency maps. Although a few saliency models have been proposed for this purpose, they generally consider only visual cues in video frames and neglect audio cues from sound sources. In this study, an unsupervised frequency-based saliency model is presented for predicting the strength and location of saliency in spatial audio. The predicted salient audio cues are then used as an audio bias on the video saliency predictions of state-of-the-art models. Our experiments yield promising results and show that integrating the proposed spatial audio bias into existing video saliency models consistently improves their performance. | |
dc.description.fulltext | YES | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | TÜBİTAK | |
dc.description.sponsorship | Scientific and Technological Research Council of Turkey (TÜBİTAK) 1001 Program Award | |
dc.description.sponsorship | Turkish Academy of Sciences GEBIP 2018 Award | |
dc.description.sponsorship | Science Academy BAGEP 2021 Award | |
dc.description.version | Author's final manuscript | |
dc.format | ||
dc.identifier.doi | 10.23919/MVA51890.2021.9511406 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR03218 | |
dc.identifier.isbn | 978-4-901122-20-8 | |
dc.identifier.link | https://doi.org/10.23919/MVA51890.2021.9511406 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85113999621 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/3818 | |
dc.identifier.wos | 000733621500061 | |
dc.keywords | Location awareness | |
dc.keywords | Visualization | |
dc.keywords | Social networking (online) | |
dc.keywords | Computational modeling | |
dc.keywords | Predictive models | |
dc.keywords | Streaming media | |
dc.keywords | Observers | |
dc.language | English | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | |
dc.relation.grantno | N/A | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9980 | |
dc.source | Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications | |
dc.subject | Computer science | |
dc.subject | Engineering | |
dc.title | Leveraging frequency based salient spatial sound localization to improve 360 degrees video saliency prediction | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-6280-8422 | |
local.contributor.kuauthor | Erdem, Aykut | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |