Publication:
Leveraging frequency based salient spatial sound localization to improve 360 degrees video saliency prediction

dc.contributor.coauthor: Çökelek, Mert
dc.contributor.coauthor: İmamoğlu, Nevrez
dc.contributor.coauthor: Özçınar, Çağrı
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: 20331
dc.date.accessioned: 2024-11-09T13:48:23Z
dc.date.issued: 2021
dc.description.abstract: Virtual and augmented reality (VR/AR) systems have gained dramatically in popularity across application areas such as gaming, social media, and communication. It is therefore crucial to know how to efficiently utilize, store, and deliver 360° videos to end-users. Toward this aim, researchers have been developing deep neural network models for 360° multimedia processing and computer vision. An important direction in this line of work is building models that can learn and predict observers' attention on 360° videos, computationally producing so-called saliency maps. Although a few saliency models have been proposed for this purpose, they generally consider only visual cues in video frames, neglecting audio cues from sound sources. In this study, an unsupervised frequency-based saliency model is presented for predicting the strength and location of saliency in spatial audio. The predicted salient audio cues are then used as an audio bias on the video saliency predictions of state-of-the-art models. Our experiments yield promising results and show that integrating the proposed spatial audio bias into existing video saliency models consistently improves their performance.
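The abstract describes combining an audio-derived bias with a visual saliency map. The exact fusion used in the paper is not specified here; the sketch below assumes a simple convex combination of normalized maps, with the weight `alpha` and both input maps purely illustrative.

```python
# Minimal sketch: fusing a visual saliency map with a spatial-audio bias map.
# The paper's actual fusion rule is not given in the abstract; a convex
# combination of min-max normalized maps is assumed here for illustration.

def normalize(m):
    """Min-max scale a 2D map to [0, 1]; constant maps become all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]

def fuse(visual, audio_bias, alpha=0.8):
    """Weighted blend of visual saliency and audio bias (alpha is a guess)."""
    v, a = normalize(visual), normalize(audio_bias)
    return [[alpha * vv + (1 - alpha) * aa for vv, aa in zip(vr, ar)]
            for vr, ar in zip(v, a)]

# Toy 2x2 maps standing in for per-frame saliency predictions.
visual = [[0.1, 0.9], [0.4, 0.2]]
audio = [[0.0, 1.0], [1.0, 0.0]]
fused = fuse(visual, audio)
```

With `alpha = 0.8` the visual prediction dominates and the audio map only nudges scores toward sound-source locations, which matches the abstract's framing of audio as a bias on existing video saliency models rather than a replacement for them.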
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK) 1001 Program Award
dc.description.sponsorship: Turkish Academy of Sciences GEBIP 2018 Award
dc.description.sponsorship: Science Academy BAGEP 2021 Award
dc.description.version: Author's final manuscript
dc.format: pdf
dc.identifier.doi: 10.23919/MVA51890.2021.9511406
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03218
dc.identifier.isbn: 9.7849E+12
dc.identifier.link: https://doi.org/10.23919/MVA51890.2021.9511406
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85113999621
dc.identifier.uri: https://hdl.handle.net/20.500.14288/3818
dc.identifier.wos: 733621500061
dc.keywords: Location awareness
dc.keywords: Visualization
dc.keywords: Social networking (online)
dc.keywords: Computational modeling
dc.keywords: Predictive models
dc.keywords: Streaming media
dc.keywords: Observers
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.grantno: 1.79769313486232E+308
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/9980
dc.source: Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
dc.subject: Computer science
dc.subject: Engineering
dc.title: Leveraging frequency based salient spatial sound localization to improve 360 degrees video saliency prediction
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: 0000-0002-6280-8422
local.contributor.kuauthor: Erdem, Aykut
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files

Original bundle

Name: 9980.pdf
Size: 4.68 MB
Format: Adobe Portable Document Format