Publication:
Deep learning for image/video restoration and super-resolution

dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.date.accessioned: 2024-11-09T13:12:37Z
dc.date.issued: 2022
dc.description.abstract: Recent advances in neural signal processing have led to significant improvements in the performance of learned image/video restoration and super-resolution (SR). An important benefit of data-driven deep learning approaches to image processing is that neural models can be optimized for any differentiable loss function, including perceptual loss functions, leading to perceptual image/video restoration and SR, which cannot be easily handled by traditional model-based methods. We start with a brief problem statement and a short discussion on traditional vs. data-driven solutions. We next review recent advances in neural architectures, such as residual blocks, dense connections, residual-in-residual dense blocks, residual blocks with generative neurons, self-attention and visual transformers. We then discuss loss functions and evaluation (assessment) criteria for image/video restoration and SR, including fidelity (distortion) and perceptual criteria, and the relation between them, where we briefly review the perception vs. distortion trade-off. We can consider learned image/video restoration and SR as learning either a nonlinear regressive mapping from degraded to ideal images based on the universal approximation theorem, or a generative model that captures the probability distribution of ideal images. We first review regressive inference via residual and/or dense convolutional networks (ConvNets). We also show that using a new architecture with residual blocks based on a generative neuron model can outperform classical residual ConvNets in peak-signal-to-noise ratio (PSNR). We next discuss generative inference based on adversarial training, such as SRGAN and ESRGAN, which can reproduce realistic textures, or based on normalizing flow, such as SRFlow, by optimizing log-likelihood.
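The fidelity criterion named in the abstract, peak-signal-to-noise ratio (PSNR), is derived directly from the mean-square error. As a minimal illustrative sketch (not code from the monograph itself), assuming 8-bit images with a peak value of 255:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: zero error
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a constant 8-bit image degraded by a uniform offset of 16 gray levels.
ref = np.full((8, 8), 128, dtype=np.uint8)
deg = np.full((8, 8), 144, dtype=np.uint8)
print(round(psnr(ref, deg), 2))  # prints 24.05
```

Because PSNR is a monotone function of the mean-square error, maximizing it is equivalent to minimizing MSE, which is why, as the abstract notes, PSNR-optimal restorations can look over-smoothed: this is the distortion side of the perception vs. distortion trade-off.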
We then discuss problems in applying supervised training to real-life restoration and SR, including overfitting image priors and overfitting the degradation model seen in the training set. We introduce multiple-model SR and real-world SR (from unpaired training data) formulations to overcome these problems. Integration of traditional model-based methods and deep learning for non-blind restoration/SR is introduced as another solution to model overfitting in supervised learning. In learned video restoration and SR (VSR), we first discuss how to best exploit temporal correlations in video, including sliding temporal window vs. recurrent architectures for propagation, and aligning frames in the pixel domain using optical flow vs. in the feature space using deformable convolutions. We next introduce early fusion with feature-space alignment, employed by the EDVR network, which obtains excellent PSNR performance. However, it is well known that videos with the highest PSNR may not be the most appealing to humans, since minimizing the mean-square error may result in blurring of details. We then address perceptual optimization of VSR models to obtain natural texture and motion. Although the perception-distortion trade-off has been well studied for images, few works address perceptual VSR. In addition to using perceptual losses, such as MS-SSIM, LPIPS, and/or adversarial training, we also discuss explicit loss functions/criteria to enforce and evaluate temporal consistency. We conclude with a discussion of open problems.
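The idea of an explicit temporal-consistency criterion mentioned above can be illustrated with a deliberately simplified sketch (a hypothetical measure, not one of the specific criteria surveyed in the monograph): compare the frame-to-frame changes of the restored video against those of the reference, so that flicker introduced by per-frame processing shows up even when each individual frame looks good.

```python
import numpy as np

def temporal_consistency_error(restored: np.ndarray, reference: np.ndarray) -> float:
    """Mean absolute mismatch between frame-to-frame differences of a restored
    video and those of its reference (frames stacked on axis 0). A restoration
    that reproduces the reference's temporal evolution scores 0."""
    d_restored = np.diff(restored.astype(np.float64), axis=0)
    d_reference = np.diff(reference.astype(np.float64), axis=0)
    return float(np.mean(np.abs(d_restored - d_reference)))

# Toy clip: three 4x4 frames brightening by 10 gray levels per frame.
ref = np.stack([np.full((4, 4), 100 + 10 * t) for t in range(3)])
good = ref.copy()                                      # same motion -> error 0
flicker = ref + np.array([0, 5, 0]).reshape(3, 1, 1)   # middle frame flickers
print(temporal_consistency_error(good, ref),
      temporal_consistency_error(flicker, ref))        # prints 0.0 5.0
```

Real temporal-consistency losses typically warp the previous frame toward the current one with estimated optical flow before differencing, so that genuine motion is not penalized; the raw frame difference above is only meaningful for static or pre-aligned content.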
dc.description.fulltext: YES
dc.description.indexedby: Scopus
dc.description.issue: 1
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.sponsorship: TUBITAK 2247-A
dc.description.version: Author's final manuscript
dc.description.volume: 13
dc.identifier.doi: 10.1561/0600000100
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03706
dc.identifier.issn: 1572-2740
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85130244443
dc.identifier.uri: https://doi.org/10.1561/0600000100
dc.keywords: Backpropagation
dc.keywords: Convolution
dc.keywords: Convolutional neural networks
dc.keywords: Economic and social effects
dc.keywords: Image reconstruction
dc.language.iso: eng
dc.publisher: Now Publishers
dc.relation.grantno: 120C156
dc.relation.ispartof: Foundations and Trends in Computer Graphics and Vision
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10566
dc.subject: Engineering
dc.title: Deep learning for image/video restoration and super-resolution
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.kuauthor: Tekalp, Ahmet Murat
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Electrical and Electronics Engineering
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 10566.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format