Publication:
CLIP-guided StyleGAN inversion for text-driven real image editing

dc.contributor.coauthor: Ceylan, Duygu
dc.contributor.coauthor: Erdem, Erkut
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Erdem, Aykut
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.kuauthor: Baykal, Ahmet Canberk
dc.contributor.kuauthor: Anees, Abdul Basit
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2025-01-19T10:29:09Z
dc.date.issued: 2023
dc.description.abstract: Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.issue: 5
dc.description.openaccess: Bronze, Green Submitted
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: This work has been partially supported by AI Fellowships to A. C. Baykal and A. Basit Anees provided by the KUIS AI Center, by BAGEP 2021 Award of the Science Academy to A. Erdem, and by an Adobe research gift.
dc.description.volume: 42
dc.identifier.doi: 10.1145/3610287
dc.identifier.eissn: 1557-7368
dc.identifier.issn: 0730-0301
dc.identifier.quartile: Q1
dc.identifier.scopus: 2-s2.0-85174729821
dc.identifier.uri: https://doi.org/10.1145/3610287
dc.identifier.uri: https://hdl.handle.net/20.500.14288/25840
dc.identifier.wos: 1086833300011
dc.keywords: Generative adversarial networks
dc.keywords: Image-to-image translation
dc.keywords: Image editing
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.relation.grantno: AI Fellowships; KUIS AI Center; BAGEP 2021 Award of the Science Academy
dc.relation.ispartof: ACM Transactions on Graphics
dc.subject: Computer science
dc.subject: Software engineering
dc.title: CLIP-guided StyleGAN inversion for text-driven real image editing
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.kuauthor: Baykal, Ahmet Canberk
local.contributor.kuauthor: Anees, Abdul Basit
local.contributor.kuauthor: Erdem, Aykut
local.contributor.kuauthor: Yüret, Deniz
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
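The abstract describes lightweight text-conditioned adapter layers that modulate a pretrained GAN-inversion encoder with the CLIP embedding of the target description. A minimal sketch of one plausible form of such conditioning, FiLM-style scale-and-shift modulation, is shown below; this is an illustrative assumption, not the paper's actual implementation, and all dimensions, layer shapes, and names are hypothetical (stdlib-only Python stands in for a real deep-learning framework and a real CLIP text encoder):

```python
import math
import random

random.seed(0)

def linear(x, w, b):
    # Plain dense layer: y = W x + b, on lists of floats.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def make_layer(n_in, n_out):
    # Random Gaussian weights, zero bias (so zero input -> zero output).
    w = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def adapter(features, text_emb, scale_layer, shift_layer):
    # FiLM-style conditioning: the text embedding predicts a per-channel
    # scale and shift that modulate the encoder's intermediate features.
    # With a zero text embedding the adapter reduces to the identity,
    # leaving the pretrained inversion pathway untouched.
    scale = linear(text_emb, *scale_layer)
    shift = linear(text_emb, *shift_layer)
    return [f * (1.0 + s) + t for f, s, t in zip(features, scale, shift)]

# Toy dimensions: an 8-channel feature slice, a 4-dim "CLIP" text embedding.
feat_dim, txt_dim = 8, 4
scale_layer = make_layer(txt_dim, feat_dim)
shift_layer = make_layer(txt_dim, feat_dim)

features = [1.0] * feat_dim            # stand-in for encoder features
text_emb = [0.5, -0.2, 0.1, 0.3]       # stand-in for a CLIP text embedding
modulated = adapter(features, text_emb, scale_layer, shift_layer)
print(len(modulated))  # 8 modulated channels
```

The zero-initialized bias is a common design choice for adapters: at initialization (or with an empty prompt) the modulation is the identity, so the pretrained inversion network's behavior is preserved and the text-driven edit directions are learned as residual changes on top of it.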

Files

Original bundle

Name: IR05144.pdf
Size: 30.15 MB
Format: Adobe Portable Document Format