Self-supervised object-centric learning for videos

dc.contributor.coauthor	Xie, Weidi
dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Aydemir, Görkay
dc.contributor.kuauthor	Güney, Fatma
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	Graduate School of Sciences and Engineering
dc.date.accessioned	2024-12-29T09:41:32Z
dc.date.issued	2023
dc.description.abstract	Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the segmentation in video sequences. However, the performance improvements observed in synthetic sequences, which rely on the robustness of an additional cue, do not translate to more challenging real-world scenarios. In this paper, we propose the first fully unsupervised method for segmenting multiple objects in real-world sequences. Our object-centric learning framework spatially binds objects to slots on each frame and then relates these slots across frames. From these temporally-aware slots, the training objective is to reconstruct the middle frame in a high-level semantic feature space. We propose a masking strategy by dropping a significant portion of tokens in the feature space for efficiency and regularization. Additionally, we address over-clustering by merging slots based on similarity. Our method can successfully segment multiple instances of complex and high-variety classes in YouTube videos.
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.publisherscope	International
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	Weidi Xie would like to acknowledge the National Key R&D Program of China (No. 2022ZD0161400).
dc.description.studentonlypublication	No
dc.description.studentpublication	Yes
dc.identifier.WoSQuartile	N/A
dc.identifier.issn	1049-5258
dc.identifier.scopus	2-s2.0-85180812822
dc.identifier.uri	https://hdl.handle.net/20.500.14288/23679
dc.identifier.wos	1230083400031
dc.keywords	Computer science
dc.language.iso	eng
dc.publisher	Neural Information Processing Systems Foundation
dc.relation.ispartof	Advances in Neural Information Processing Systems
dc.subject	Computer science, artificial intelligence
dc.subject	Computer science, information systems
dc.title	Self-supervised object-centric learning for videos
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Aydemir, Görkay
local.contributor.kuauthor	Güney, Fatma
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: IR04737.pdf
Size: 5.35 MB
Format: Adobe Portable Document Format

Collections

Publications with Fulltext
