Modulating bottom-up and top-down visual processing via language-conditional filters

dc.contributor.coauthor	Erdem, Erkut
dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.department	KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.kuauthor	Can, Ozan Arkan
dc.contributor.kuauthor	Erdem, Aykut
dc.contributor.kuauthor	Kesen, İlker
dc.contributor.kuauthor	Yüret, Deniz
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.contributor.schoolcollegeinstitute	Research Center
dc.date.accessioned	2024-11-09T23:28:32Z
dc.date.issued	2022
dc.description.abstract	How to best integrate linguistic and perceptual processing in multi-modal tasks that involve language and vision is an important open problem. In this work, we argue that the common practice of using language in a top-down manner, to direct visual attention over high-level visual features, may not be optimal. We hypothesize that the use of language to also condition the bottom-up processing from pixels to high-level features can provide benefits to the overall performance. To support our claim, we propose a U-Net-based model and perform experiments on two language-vision dense-prediction tasks: referring expression segmentation and language-guided image colorization. We compare results where either one or both of the top-down and bottom-up visual branches are conditioned on language. Our experiments reveal that using language to control the filters for bottom-up visual processing in addition to top-down attention leads to better results on both tasks and achieves competitive performance. Our linguistic analysis suggests that bottom-up conditioning improves segmentation of objects especially when input text refers to low-level visual concepts. Code is available at https://github.com/ilkerkesen/bvpr.
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.openaccess	YES
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	Turkish Academy of Sciences. This work was supported in part by an AI Fellowship to I. Kesen provided by the KUIS AI Center, the GEBIP 2018 Award of the Turkish Academy of Sciences to E. Erdem, and the BAGEP 2021 Award of the Science Academy to A. Erdem.
dc.identifier.doi	10.1109/CVPRW56347.2022.00507
dc.identifier.isbn	978-1-6654-8739-9
dc.identifier.scopus	2-s2.0-85137780572
dc.identifier.uri	https://doi.org/10.1109/CVPRW56347.2022.00507
dc.identifier.uri	https://hdl.handle.net/20.500.14288/11902
dc.identifier.wos	861612704072
dc.keywords	Words
dc.keywords	Vision, Attention
dc.keywords	Object
dc.language.iso	eng
dc.publisher	IEEE
dc.relation.ispartof	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2022)
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.title	Modulating bottom-up and top-down visual processing via language-conditional filters
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Kesen, İlker
local.contributor.kuauthor	Can, Ozan Arkan
local.contributor.kuauthor	Erdem, Aykut
local.contributor.kuauthor	Yüret, Deniz
local.publication.orgunit1	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1	College of Engineering
local.publication.orgunit1	Research Center
local.publication.orgunit2	Department of Computer Engineering
local.publication.orgunit2	KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
local.publication.orgunit2	Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication	77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication	d437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Collections

Publications without Fulltext
