Publications with Fulltext
Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6
Search Results: 8 results
Publication (Open Access): End-to-end rate-distortion optimized learned hierarchical bi-directional video compression (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Tekalp, Ahmet Murat; Yılmaz, Mustafa Akın (Department of Electrical and Electronics Engineering; College of Engineering)

Conventional video compression (VC) methods are based on motion-compensated transform coding, where the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually because of the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction.
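As an illustrative sketch (not taken from the paper), the frame ordering that hierarchical bi-directional coding relies on can be computed as follows: within a group of pictures, the endpoint frames are coded first, and each remaining frame is coded as a B-frame referencing the nearest already-coded past and future frames, recursing on midpoints.

```python
# Illustrative sketch, not the LHBDC implementation: coding order for
# hierarchical bi-directional prediction in one group of pictures (GOP).
# Endpoints (frames 0 and gop_size) are assumed already coded; every other
# frame is a B-frame that references one past and one future coded frame.

def hierarchical_coding_order(gop_size):
    """Return (frame_index, past_ref, future_ref) triples in coding order
    for the interior frames of a GOP spanning indices 0..gop_size."""
    order = []

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, lo, hi))  # B-frame referencing both endpoints
        split(lo, mid)               # then refine the left half...
        split(mid, hi)               # ...and the right half

    split(0, gop_size)
    return order

print(hierarchical_coding_order(8))
# → [(4, 0, 8), (2, 0, 4), (1, 0, 2), (3, 2, 4), (6, 4, 8), (5, 4, 6), (7, 6, 8)]
```

The recursion makes the availability of both reference frames explicit: frame 4 is coded from the endpoints, frames 2 and 6 from the resulting half-GOPs, and so on.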
The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/.

Publication (Open Access): Emotion-dependent domain adaptation for speech-driven affective facial feature synthesis (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Erzin, Engin; Sadiq, Rizwan (Department of Electrical and Electronics Engineering; Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering)

Although speech-driven facial animation has been studied extensively in the literature, works focusing on the affective content of speech are limited, mostly due to the scarcity of affective audio-visual data. In this article, we improve affective facial animation by using domain adaptation to partially alleviate this data scarcity. We first define a domain adaptation that maps affective and neutral speech representations to a common latent space in which the cross-domain bias is smaller. The domain adaptation is then used to augment affective representations for each emotion category (angry, disgust, fear, happy, sad, surprise, and neutral) so that we can better train emotion-dependent deep audio-to-visual (A2V) mapping models. Based on the emotion-dependent deep A2V models, the proposed affective facial synthesis system operates in two stages: first, speech emotion recognition extracts soft emotion category likelihoods for the utterances; then, a soft fusion of the emotion-dependent A2V mapping outputs forms the affective facial synthesis. Experimental evaluations are performed on the SAVEE audio-visual dataset, and the proposed models are assessed with objective and subjective evaluations. The proposed affective A2V system achieves significant MSE loss improvements over the recent literature.
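The two-stage soft fusion described above can be sketched as follows. This is an assumption about the fusion rule, not the authors' code: each emotion category has its own A2V model, and the final facial features are the likelihood-weighted sum of the per-emotion predictions.

```python
# Illustrative sketch (assumption, not the paper's implementation): soft
# fusion of emotion-dependent audio-to-visual (A2V) mapping outputs, weighted
# by soft emotion likelihoods from a speech emotion recognizer.

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def soft_fusion(a2v_outputs, likelihoods):
    """a2v_outputs: dict emotion -> list of predicted facial feature values;
    likelihoods: dict emotion -> soft probability (assumed to sum to 1).
    Returns the likelihood-weighted sum of the per-emotion predictions."""
    dim = len(next(iter(a2v_outputs.values())))
    fused = [0.0] * dim
    for emo in EMOTIONS:
        weight = likelihoods[emo]
        for i, value in enumerate(a2v_outputs[emo]):
            fused[i] += weight * value
    return fused
```

When the recognizer is confident (one likelihood near 1), the fusion reduces to that emotion's A2V output; ambiguous utterances blend several models smoothly.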
Furthermore, the resulting facial animations of the proposed system are preferred over the baseline animations in the subjective evaluations.

Publication (Open Access): Federated dropout learning for hybrid beamforming with spatial path index modulation in multi-user mmWave-MIMO systems (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Mishra, Kumar Vijay; Ergen, Sinem Çöleri; Elbir, Ahmet Musab (Department of Electrical and Electronics Engineering; College of Engineering)

Millimeter-wave multiple-input multiple-output (mmWave-MIMO) systems with a small number of radio-frequency (RF) chains have limited multiplexing gain. Spatial path index modulation (SPIM) helps improve this gain by utilizing additional signal bits modulated by the indices of spatial paths. In this paper, we introduce model-based and model-free frameworks for beamformer design in multi-user SPIM-MIMO systems. We first design the beamformers via a model-based manifold optimization algorithm. Then, we leverage federated learning (FL) with dropout learning (DL) to train a learning model on the local datasets of users, who estimate the beamformers by feeding the model with their channel data. DL randomly selects a different set of model parameters during training, thereby further reducing the transmission overhead compared to conventional FL. Numerical experiments show that the proposed framework exhibits higher spectral efficiency than state-of-the-art SPIM-MIMO methods and mmWave-MIMO, which relies on the strongest propagation path.
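The overhead reduction from dropout in federated training can be illustrated with a minimal sketch. This is an assumption about the general mechanism, not the paper's algorithm: each user transmits only a random subset of its parameter updates per round, and the server averages whatever it receives per parameter.

```python
import random

# Illustrative sketch (assumption, not the paper's method): one round of
# federated averaging in which each user uploads only a random fraction of
# its model parameters, cutting transmission overhead versus full uploads.

def federated_dropout_round(global_model, local_updates, keep_ratio, rng):
    """global_model: list of current parameter values;
    local_updates: per-user lists of locally trained parameter values;
    keep_ratio: fraction of parameters each user transmits (0 < ratio <= 1)."""
    n = len(global_model)
    sums = [0.0] * n
    counts = [0] * n
    for update in local_updates:
        # Each user independently picks which parameters to transmit.
        kept = rng.sample(range(n), max(1, int(keep_ratio * n)))
        for i in kept:
            sums[i] += update[i]
            counts[i] += 1
    # Average received values; keep the old value where nothing arrived.
    return [sums[i] / counts[i] if counts[i] else global_model[i]
            for i in range(n)]
```

With `keep_ratio=1.0` this reduces to plain federated averaging; smaller ratios trade per-round accuracy for proportionally lower uplink traffic.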
Furthermore, the proposed FL approach incurs at least 10 times lower transmission overhead than centralized learning techniques.

Publication (Open Access): Tri-op redactable blockchains with block modification, removal, and insertion (TÜBİTAK, 2022)
Dousti, Mohammad Sadeq; Küpçü, Alptekin (Department of Computer Engineering; College of Engineering)

In distributed computations and cryptography, it is desirable to record events on a public ledger such that later alterations are computationally infeasible. An implementation of this idea is called a blockchain: a distributed protocol that allows the creation of an immutable ledger. While this idea is very appealing, the ledger may be contaminated with incorrect, illegal, or even dangerous data, and anyone running the blockchain protocol has no option but to store and propagate the unwanted data. The ledger also bloats over time, redundant information cannot be removed, and missing data cannot be inserted later. Redactable blockchains were invented to allow the ledger to be mutated in a controlled manner. To date, redactable blockchains support at most two types of redactions: block modification and removal. The next logical step is to support block insertion. However, we show that this seemingly innocuous enhancement renders all previous constructions insecure. We put forward a model for blockchains supporting all three redaction operations and construct a blockchain that is provably secure under this formal definition.

Publication (Open Access): AfriKI: machine-in-the-loop Afrikaans poetry generation (Association for Computational Linguistics (ACL), 2021)
Baş, Anıl; van Heerden, Imke (Department of Comparative Literature; College of Social Sciences and Humanities)

This paper proposes a generative language model called AfriKI.
Our approach is based on an LSTM architecture trained on a small corpus of contemporary fiction. With the aim of promoting human creativity, we use the model as an authoring tool to explore machine-in-the-loop Afrikaans poetry generation. To our knowledge, this is the first study to attempt creative text generation in Afrikaans.

Publication (Open Access): Training socially engaging robots: modeling backchannel behaviors with batch reinforcement learning (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Hussain, Nusrah; Erzin, Engin; Sezgin, Tevfik Metin; Yemez, Yücel (Department of Computer Engineering; Department of Electrical and Electronics Engineering; College of Engineering; Graduate School of Sciences and Engineering)

A key aspect of social human-robot interaction is natural non-verbal communication. In this work, we train an agent with batch reinforcement learning to generate nods and smiles as backchannels in order to increase the naturalness of the interaction and to engage humans. We introduce the Sequential Random Deep Q-Network (SRDQN) method to learn a backchannel generation policy that explicitly maximizes user engagement. The proposed SRDQN method outperforms existing vanilla Q-learning methods when evaluated using off-policy policy evaluation techniques. Furthermore, to verify the effectiveness of SRDQN, a human-robot experiment was designed and conducted with an expressive 3D robot head. The experiment is based on a story-shaping game designed to create an interactive social activity with the robot. The engagement of the participants during the interaction is computed from the users' social signals, such as backchannels, mutual gaze, and adjacency pairs.
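The batch setting named above can be illustrated with a minimal tabular sketch. This is a generic batch Q-learning loop over a fixed interaction log, not the SRDQN architecture: the agent replays recorded (state, action, reward, next state) tuples, where actions are hypothetical backchannel choices and the reward stands in for an engagement signal, with no new environment interaction.

```python
# Illustrative sketch (assumption, not SRDQN): tabular batch Q-learning on a
# fixed log of interactions. States, the backchannel actions, and the
# engagement reward below are hypothetical placeholders.

def batch_q_learning(transitions, actions, gamma=0.9, alpha=0.1, sweeps=100):
    """transitions: list of (state, action, reward, next_state) tuples.
    Repeatedly sweeps the fixed batch, applying the Q-learning update."""
    q = {}  # (state, action) -> estimated value
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            best_next = max(q.get((s_next, b), 0.0) for b in actions)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q

def greedy_policy(q, state, actions):
    """Pick the backchannel with the highest learned value in this state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

After training on a log where nodding during listening yielded higher engagement, the greedy policy prefers the nod in that state.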
The subjective feedback from participants and the engagement values strongly indicate that our framework is a step toward the autonomous learning of socially acceptable backchanneling behavior.

Publication (Open Access): BlockSim-Net: a network-based blockchain simulator (TÜBİTAK, 2022)
Ramachandran, Prashanthi; Agrawal, Nandini; Biçer, Osman; Küpçü, Alptekin (Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering)

Since its proposal by Eyal and Sirer (CACM '13), selfish mining attacks on proof-of-work blockchains have been studied extensively. The main body of this research aims both at measuring the extent of their impact and at defending against them. Yet, before any practical defense is deployed in a real-world blockchain system, it needs to be tested for security and dependability. However, real blockchain systems are too complex to serve as testbeds for benchmarking newly developed protocols. Instead, several simulation environments have been proposed recently, such as BlockSim (Maher et al., SIGMETRICS Perform. Eval. Rev. '19), a modular and easy-to-use blockchain simulator. However, BlockSim's structure is insufficient to capture the essence of a real blockchain network, as the simulation of an entire network runs on a single CPU. Such a lack of decentralization can cause network effects, such as propagation delays, to be simulated in an unrealistic manner.
In this work, we propose BlockSim-Net, a modular, efficient, high-performance, distributed, network-based blockchain simulator that is parallelized to better reflect reality in a blockchain simulation environment.

Publication (Open Access): MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish (Springer, 2021)
Çıtamak, Begüm; Çağlayan, Ozan; Kuyu, Menekşe; Erdem, Erkut; Madhyastha, Pranava; Specia, Lucia; Erdem, Aykut (Department of Computer Engineering; College of Engineering)

Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of a video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has predominantly been addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for those languages. In this paper, we target Turkish, a morphologically rich and agglutinative language with very different properties from English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research on video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effects of different word segmentation approaches and different neural architectures to better address the properties of Turkish.
We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.