Publication:
CLIP-guided StyleGAN inversion for text-driven real image editing

dc.contributor.coauthor: Ceylan, Duygu
dc.contributor.coauthor: Erdem, Erkut
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.kuauthor: Baykal, Ahmet Canberk
dc.contributor.kuauthor: Anees, Abdul Basit
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2025-01-19T10:29:09Z
dc.date.issued: 2023
dc.description.abstract: Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.issue: 5
dc.description.openaccess: Bronze, Green Submitted
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: This work has been partially supported by AI Fellowships to A. C. Baykal and A. Basit Anees provided by the KUIS AI Center, by BAGEP 2021 Award of the Science Academy to A. Erdem, and by an Adobe research gift.
dc.description.volume: 42
dc.identifier.doi: 10.1145/3610287
dc.identifier.eissn: 1557-7368
dc.identifier.issn: 0730-0301
dc.identifier.quartile: Q1
dc.identifier.scopus: 2-s2.0-85174729821
dc.identifier.uri: https://doi.org/10.1145/3610287
dc.identifier.uri: https://hdl.handle.net/20.500.14288/25840
dc.identifier.wos: 1086833300011
dc.keywords: Generative adversarial networks
dc.keywords: Image-to-image translation
dc.keywords: Image editing
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.relation.grantno: AI Fellowships; KUIS AI Center; BAGEP 2021 Award of the Science Academy
dc.relation.ispartof: ACM Transactions on Graphics
dc.subject: Computer science
dc.subject: Software engineering
dc.title: CLIP-guided StyleGAN inversion for text-driven real image editing
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.kuauthor: Baykal, Ahmet Canberk
local.contributor.kuauthor: Anees, Abdul Basit
local.contributor.kuauthor: Erdem, Aykut
local.contributor.kuauthor: Yüret, Deniz
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
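The abstract describes lightweight text-conditioned adapter layers that modulate a pretrained GAN-inversion encoder with the CLIP embedding of the target description. A minimal sketch of one plausible form of such conditioning, FiLM-style scale-and-shift modulation, is shown below; this is an illustrative assumption, not the paper's actual implementation, and all dimensions, layer shapes, and names are hypothetical (stdlib-only Python stands in for a real deep-learning framework and a real CLIP text encoder):

```python
import math
import random

random.seed(0)

def linear(x, w, b):
    # Plain dense layer: y = W x + b, on lists of floats.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def make_layer(n_in, n_out):
    # Random Gaussian weights, zero bias (so zero input -> zero output).
    w = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def adapter(features, text_emb, scale_layer, shift_layer):
    # FiLM-style conditioning: the text embedding predicts a per-channel
    # scale and shift that modulate the encoder's intermediate features.
    # With a zero text embedding the adapter reduces to the identity,
    # leaving the pretrained inversion pathway untouched.
    scale = linear(text_emb, *scale_layer)
    shift = linear(text_emb, *shift_layer)
    return [f * (1.0 + s) + t for f, s, t in zip(features, scale, shift)]

# Toy dimensions: an 8-channel feature slice, a 4-dim "CLIP" text embedding.
feat_dim, txt_dim = 8, 4
scale_layer = make_layer(txt_dim, feat_dim)
shift_layer = make_layer(txt_dim, feat_dim)

features = [1.0] * feat_dim            # stand-in for encoder features
text_emb = [0.5, -0.2, 0.1, 0.3]       # stand-in for a CLIP text embedding
modulated = adapter(features, text_emb, scale_layer, shift_layer)
print(len(modulated))  # 8 modulated channels
```

The zero-initialized bias is a common design choice for adapters: at initialization (or with an empty prompt) the modulation is the identity, so the pretrained inversion network's behavior is preserved and the text-driven edit directions are learned as residual changes on top of it.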

Files

Original bundle

Name: IR05144.pdf
Size: 30.15 MB
Format: Adobe Portable Document Format