Publication:
Self-supervised object-centric learning for videos

dc.contributor.coauthor: Xie, Weidi
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Aydemir, Görkay
dc.contributor.kuauthor: Güney, Fatma
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.date.accessioned: 2024-12-29T09:41:32Z
dc.date.issued: 2023
dc.description.abstract: Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the segmentation in video sequences. However, the performance improvements observed in synthetic sequences, which rely on the robustness of an additional cue, do not translate to more challenging real-world scenarios. In this paper, we propose the first fully unsupervised method for segmenting multiple objects in real-world sequences. Our object-centric learning framework spatially binds objects to slots on each frame and then relates these slots across frames. From these temporally-aware slots, the training objective is to reconstruct the middle frame in a high-level semantic feature space. We propose a masking strategy by dropping a significant portion of tokens in the feature space for efficiency and regularization. Additionally, we address over-clustering by merging slots based on similarity. Our method can successfully segment multiple instances of complex and high-variety classes in YouTube videos.
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.publisherscope: International
dc.description.sponsors: Weidi Xie would like to acknowledge the National Key R&D Program of China (No. 2022ZD0161400).
dc.identifier.issn: 1049-5258
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85180812822
dc.identifier.uri: https://hdl.handle.net/20.500.14288/23679
dc.identifier.wos: 1230083400031
dc.keywords: Computer science
dc.language: en
dc.publisher: Neural information processing systems foundation
dc.source: Advances in Neural Information Processing Systems
dc.subject: Computer science, artificial intelligence
dc.subject: Computer science, information systems
dc.title: Self-supervised object-centric learning for videos
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Aydemir, Görkay
local.contributor.kuauthor: Güney, Fatma
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae