Publications without Fulltext
Permanent URI for this collection: https://hdl.handle.net/20.500.14288/3
74 results
Search Results
Publication Metadata only
Multi-scale deformable alignment and content-adaptive inference for flexible-rate bi-directional video compression
(IEEE Computer Society, 2023) Department of Electrical and Electronics Engineering; Yılmaz, Mustafa Akın; Ulaş, Ökkeş Uğur; Tekalp, Ahmet Murat; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering
The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine tuning corresponding models for truly flexible-rate learned video coding.
Experimental results demonstrate state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.

Publication Metadata only
Correlative information maximization: a biologically plausible approach to supervised deep neural networks without weight symmetry
(Neural Information Processing Systems 36, 2023) Pehlevan, Cengiz; Department of Electrical and Electronics Engineering; Bozkurt, Barışcan; Erdoğan, Alper Tunga; Department of Electrical and Electronics Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/ Koç University İş Bank Artificial Intelligence Center (KUIS AI); Graduate School of Sciences and Engineering; College of Engineering
The backpropagation algorithm has experienced remarkable success in training large-scale artificial neural networks; however, its biological plausibility has been strongly criticized, and it remains an open question whether the brain employs supervised learning mechanisms akin to it. Here, we propose correlative information maximization between layer activations as an alternative normative approach to describe the signal propagation in biological neural networks in both forward and backward directions. This new framework addresses many concerns about the biological plausibility of conventional artificial neural networks and the backpropagation algorithm. The coordinate descent-based optimization of the corresponding objective, combined with the mean square error loss function for fitting labeled supervision data, gives rise to a neural network structure that emulates a more biologically realistic network of multi-compartment pyramidal neurons with dendritic processing and lateral inhibitory neurons. Furthermore, our approach provides a natural resolution to the weight symmetry problem between forward and backward signal propagation paths, a significant critique against the plausibility of the conventional backpropagation algorithm.
This is achieved by leveraging two alternative, yet equivalent forms of the correlative mutual information objective. These alternatives intrinsically lead to forward and backward prediction networks without weight symmetry issues, providing a compelling solution to this long-standing challenge.

Publication Metadata only
Investigating the effect of body composition differences on seismocardiogram characteristics
(IEEE Computer Soc, 2023) Tokmak, Fadime; Department of Electrical and Electronics Engineering; Gürsoy, Beren Semiz; Department of Electrical and Electronics Engineering; College of Engineering
In seismocardiogram (SCG) analysis, inter-subject variability is observed because the medium between the heart and the accelerometer consists of different tissues made of bone, muscle, fat and skin cells, whose combination varies across different people. Anatomically, a similar pattern is present in the speech production system, where the vocal cord and vocal tract are considered as the source and medium, respectively. For observing the change of the vocal tract filter while voicing different sounds, linear predictive analysis has been used for years. Thus, it was hypothesized that the medium characteristics of the human thorax would also have a filtering effect on the SCG signals and the differences in the filtering effects would be observed in the respiration (<1 Hz), vibration (1-20 Hz) and acoustic (>20 Hz) characteristics of the SCG signals. To that aim, three different binary classification tasks representing the body composition differences were defined: (i) whether the metabolic age of the subject is more than the real age of the subject, (ii) whether the BMI of the subject is greater than 25, and (iii) whether the subject is male or female. To understand the metabolism-induced changes in the respiration, vibration and acoustic components, classification experiments were conducted using different frequency bands of the SCG signal.
In each case, linear predictive coefficients were extracted and used to train individual classification models for the aforementioned scenarios. With the vibration components (1-20 Hz), all of the tasks resulted in high performance (0.86, 0.93, 0.93) for age, BMI and gender classification tasks, respectively. This study reveals that the vibration components of SCG make a stable and informative contribution to selected classification tasks, and due to its high generalizability, it is suitable for various practical applications.

Publication Metadata only
Predicting path loss distributions of a wireless communication system for multiple base station altitudes from satellite images
(IEEE, 2022) Güntürk, Bahadır K.; Ateş, Hasan F.; Baykaş, Tunçer; Department of Electrical and Electronics Engineering; Shoer, İbrahim; Department of Electrical and Electronics Engineering; College of Engineering
It is expected that unmanned aerial vehicles (UAVs) will play a vital role in future communication systems. Optimum positioning of UAVs, serving as base stations, can be done through extensive field measurements or ray tracing simulations when the 3D model of the region of interest is available. In this paper, we present an alternative approach to optimize UAV base station altitude for a region. The approach is based on deep learning; specifically, a 2D satellite image of the target region is input to a deep neural network to predict path loss distributions for different UAV altitudes.
The neural network is designed and trained to produce multiple path loss distributions in a single inference; thus, it is not necessary to train a separate network for each altitude.

Publication Metadata only
Performance measures for video object segmentation and tracking
(IEEE-Inst Electrical Electronics Engineers Inc, 2004) Erdem, Çiğdem Eroğlu; Sankur, Bülent; Department of Electrical and Electronics Engineering; Tekalp, Ahmet Murat; Faculty Member; Department of Electrical and Electronics Engineering; College of Engineering; 26207
We propose measures to evaluate quantitatively the performance of video object segmentation and tracking methods without ground-truth (GT) segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its predecessors. They can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad; and/or they can be combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without GT has been demonstrated by canonical correlation analysis with another set of measures with GT on a set of sequences (where GT information is available). Experimental results are presented to evaluate the segmentation maps obtained from various sequences using different segmentation approaches.

Publication Metadata only
An audio-driven dancing avatar
(Springer, 2008) Balci, Koray; Kizoglu, Idil; Akarun, Lale; Canton-Ferrer, Cristian; Tilmanne, Joelle; Bozkurt, Elif; Erdem, A.
Tanju; Department of Computer Engineering; N/A; N/A; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Yemez, Yücel; Ofli, Ferda; Demir, Yasemin; Erzin, Engin; Tekalp, Ahmet Murat; Faculty Member; PhD Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; College of Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; 107907; N/A; N/A; 34503; 26207
We present a framework for training and synthesis of an audio-driven dancing avatar. The avatar is trained for a given musical genre using the multicamera video recordings of a dance performance. The video is analyzed to capture the time-varying posture of the dancer's body whereas the musical audio signal is processed to extract the beat information. We consider two different marker-based schemes for the motion capture problem. The first scheme uses 3D joint positions to represent the body motion whereas the second uses joint angles. Body movements of the dancer are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner with a set of HMM (Hidden Markov Model) structures and the associated beat frequency. In the synthesis phase, an audio signal of unknown musical type is first classified, within a time interval, into one of the genres that have been learnt in the analysis phase, based on mel frequency cepstral coefficients (MFCC). The motion parameters of the corresponding dance figures are then synthesized via the trained HMM structures in synchrony with the audio signal based on the estimated tempo information. Finally, the generated motion parameters, either the joint angles or the 3D joint positions of the body, are animated along with the musical audio using two different animation tools that we have developed.
Experimental results demonstrate the effectiveness of the proposed framework.

Publication Metadata only
On the convergence of ICA algorithms with symmetric orthogonalization
(IEEE, 2008) Department of Electrical and Electronics Engineering; Erdoğan, Alper Tunga; Faculty Member; Department of Electrical and Electronics Engineering; College of Engineering; 41624
We study the convergence behavior of Independent Component Analysis (ICA) algorithms that are based on contrast function maximization and that employ the symmetric orthogonalization method to guarantee the orthogonality property of the search matrix. In particular, the characterization of the critical points of the corresponding optimization problem and the stationary points of the conventional gradient ascent and fixed point algorithms are obtained. As an interesting and useful feature of the symmetric orthogonalization method, we show that the use of symmetric orthogonalization enables monotonic convergence for the fixed point ICA algorithms that are based on convex contrast functions.

Publication Metadata only
Multicamera audio-visual analysis of dance figures
(IEEE, 2007) N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Ofli, Ferda; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat; PhD Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 34503; 107907; 26207
We present an automated system for multicamera motion capture and audio-visual analysis of dance figures. The multiview video of a dancing actor is acquired using 8 synchronized cameras.
The motion capture technique is based on 3D tracking of the markers attached to the person's body in the scene, using stereo color information without need for an explicit 3D model. The resulting set of 3D points is then used to extract the body motion features as 3D displacement vectors whereas MFC coefficients serve as the audio features. In the first stage of multimodal analysis, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the audio and body motion features, separately, to determine the recurrent elementary audio and body motion patterns. Then in the second stage, we investigate the correlation of body motion patterns with audio patterns, which can be used for estimation and synthesis of realistic audio-driven body animation.

Publication Metadata only
Guest editorial special issue on toward securing Internet of Connected Vehicles (IoV) from virtual vehicle hijacking
(Institute of Electrical and Electronics Engineers (IEEE), 2019) Cao, Yue; Kaiwartya, Omprakash; Song, Houbing; Lloret, Jaime; Ahmad, Naveed; Department of Electrical and Electronics Engineering; Ergen, Sinem Çöleri; Faculty Member; Department of Electrical and Electronics Engineering; College of Engineering; 7211
N/A

Publication Metadata only
Lossless watermarking for image authentication: a new framework and an implementation
(IEEE-Inst Electrical Electronics Engineers Inc, 2006) Çelik, Mehmet Utku; Sharma, Gaurav; Department of Electrical and Electronics Engineering; Tekalp, Ahmet Murat; Faculty Member; Department of Electrical and Electronics Engineering; College of Engineering; 26207
We present a novel framework for lossless (invertible) authentication watermarking, which enables zero-distortion reconstruction of the un-watermarked images upon verification. As opposed to earlier
lossless authentication methods that required reconstruction of the original image prior to validation, the new framework allows validation of the watermarked images before recovery of the original image. This reduces computational requirements in situations when either the verification step fails or the zero-distortion reconstruction is not needed. For verified images, integrity of the reconstructed image is ensured by the uniqueness of the reconstruction procedure. The framework also enables public(-key) authentication without granting access to the perfect original and allows for efficient tamper localization. Effectiveness of the framework is demonstrated by implementing the framework using hierarchical image authentication along with lossless generalized-least significant bit data embedding.