Publications without Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/3

Now showing 1 - 10 of 56
  • Enhancement of throat microphone recordings by learning phone-dependent mappings of speech spectra
    (Institute of Electrical and Electronics Engineers (IEEE), 2013) N/A; Department of Computer Engineering; Turan, Mehmet Ali Tuğtekin; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
    We investigate the spectral envelope mapping problem through joint analysis of throat- and acoustic-microphone recordings to enhance throat-microphone speech. A new phone-dependent GMM-based spectral envelope mapping scheme, which performs minimum mean square error (MMSE) estimation of the acoustic-microphone spectral envelope, is proposed. Experimental evaluations compare the proposed mapping scheme to the state-of-the-art GMM-based estimator using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wideband perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with the A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed phone-dependent mapping consistently improves performance over the state-of-the-art GMM estimator.
  • Fundamental frequency estimation for heterophonical Turkish music by using VMD
    (Institute of Electrical and Electronics Engineers (IEEE), 2016) Simsek, Berrak Ozturk; Akan, Aydin; Department of Computer Engineering; Bozkurt, Barış; Faculty Member; Department of Computer Engineering; College of Engineering; N/A
    In this study, a new method is presented for the fundamental frequency estimation of heterophonical Turkish makam music recordings that include percussive instruments, using Variational Mode Decomposition (VMD). VMD decomposes an input signal into an ensemble of sub-signals (modes); it is entirely non-recursive, determines the relevant bands adaptively, and estimates the corresponding modes concurrently. To decompose a given signal optimally, motivated by the narrow-band properties corresponding to the Intrinsic Mode Function definition used in Empirical Mode Decomposition (EMD), we seek an ensemble of modes. Simulation results on fundamental frequency estimation of real music data show comparable performance to other common decomposition methods for music signals, such as YIN- and MELODIA-based methods.
  • Source and filter estimation for throat-microphone speech enhancement
    (IEEE-Inst Electrical Electronics Engineers Inc, 2016) N/A; Department of Computer Engineering; Turan, Mehmet Ali Tuğtekin; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
    In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore, enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings.
  • PPAD: privacy preserving group-based advertising in online social networks
    (IEEE, 2018) N/A; Department of Computer Engineering; Department of Computer Engineering; Boshrooyeh, Sanaz Taheri; Küpçü, Alptekin; Özkasap, Öznur; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 168060; 113507
    Services provided for free by Online Social Networks (OSN) come with privacy concerns. Users' information kept by OSN providers is vulnerable to the risk of being sold to advertising firms. To protect user privacy, existing proposals utilize data encryption, which prevents the providers from monetizing users' information. Therefore, the providers would not be financially motivated to establish secure OSN designs based on users' data encryption. Addressing these problems, we propose the first Privacy Preserving Group-Based Advertising (PPAD) system that gives OSN providers the ability to monetize. PPAD performs profile and advertisement matching without requiring the users or advertisers to be online, and is shown to be secure in the presence of honest-but-curious servers that are allowed to create fake users or advertisers. We also present advertisement accuracy metrics under various system parameters, providing a range of security-accuracy trade-offs.
  • Equilibrium analysis for linear and nonlinear aggregation in network models: applied to mental model aggregation in multilevel organisational learning
    (Taylor & Francis Ltd, 2022) Treur, Jan; Department of Computer Engineering; Canbaloğlu, Gülay; Undergraduate Student; Department of Computer Engineering; College of Engineering; N/A
    In this paper, equilibrium analysis for network models is addressed and applied in particular to a network model of multilevel organisational learning. The equilibrium analysis addresses properties of aggregation characteristics and connectivity characteristics of a network. For aggregation characteristics, it is shown how certain classes of nonlinear functions enable equilibrium analysis of the emerging dynamics within the network, as linear functions do. For connectivity characteristics, by using a form of stratification for the network's strongly connected components, it is shown how equilibrium analysis results can be obtained relating equilibrium values in any component to equilibrium values in (independent) components without incoming connections. In addition, concerning aggregation characteristics, two specific types of nonlinear functions for aggregation in networks (weighted Euclidean functions and weighted geometric functions) are analysed. It is illustrated in detail how, by using certain function transformations, methods for equilibrium analysis based on a symbolic linear equation solver can also be applied to make predictions about their equilibrium values. All these results are applied to a network model for organisational learning. Finally, it is analysed in some depth how the function transformations applied can be described by the more general notion of function conjugate relation, also often used for coordinate transformations.
  • On optimal selection of lip-motion features for speaker identification
    (IEEE, 2004) N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Çetingül, Hasan Ertan; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat; Master Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 34503; 107907; 26207
    This paper addresses the selection of the best lip-motion features for biometric open-set speaker identification. The best features are those that result in the highest discrimination of individual speakers in a population. We first detect the face region in each video frame. The lip region for each frame is then segmented following registration of successive face regions by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients by using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. The resulting discriminative feature vector with reduced dimension is expected to maximize the identification performance. Experimental results are also included to demonstrate the performance.
  • Framework for traffic proportional energy efficiency in software defined networks
    (IEEE, 2018) N/A; Department of Computer Engineering; Assefa, Beakal Gizachew; Özkasap, Öznur; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 113507
    Software Defined Networking (SDN) achieves programmability of network elements by separating the control and the forwarding planes, and provides efficiency through optimized routing and flexibility in network management. As energy costs contribute largely to the overall costs in networks, energy efficiency is a significant design requirement for modern networking mechanisms. However, designing energy-efficient solutions is complicated since there is a trade-off between energy efficiency and network performance. In this paper, we propose a traffic-proportional energy-efficient framework for SDN and a heuristic algorithm that maintains the trade-off between efficiency and performance. We also present an IP formulation for the traffic-proportional energy efficiency problem. Comprehensive experiments conducted on the Mininet emulator and PDX controller using the Abilene, Atlanta, and Nobel-Germany real-world topologies and traffic traces show that our approach saves up to 50% energy while achieving performance close to that of algorithms prioritizing performance.
  • Robust lip-motion features for speaker identification
    (IEEE, 2005) N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Çetingül, Hasan Ertan; Yemez, Yücel; Erzin, Engin; Master Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 107907; 34503; 26207
    This paper addresses the selection of robust lip-motion features for the audio-visual open-set speaker identification problem. We consider two alternatives for initial lip-motion representation. In the first alternative, the feature vector is composed of the 2D-DCT coefficients of the motion vectors estimated within the detected rectangular mouth region, whereas in the second, lip boundaries are tracked over the video frames and only the motion vectors around the lip contour are taken into account, along with the shape of the lip boundary. Experimental results of the HMM-based identification system are included for performance comparison of the two lip-motion representation alternatives.
  • Using synthetic data for person tracking under adverse weather conditions
    (Elsevier, 2021) Kerim, Abdulrahman; Çelikcan, Ufuk; Erdem, Erkut; Department of Computer Engineering; Erdem, Aykut; Faculty Member; Department of Computer Engineering; College of Engineering; 20331
    Robust visual tracking plays a vital role in many areas such as autonomous cars, surveillance and robotics. Recent trackers were shown to achieve adequate results under normal tracking scenarios with clear weather conditions, standard camera setups and lighting conditions. Yet, the performance of these trackers, whether they are correlation filter-based or learning-based, degrades under adverse weather conditions. The lack of videos with such weather conditions in the available visual object tracking datasets is the prime issue behind the low performance of the learning-based tracking algorithms. In this work, we provide a new person tracking dataset of real-world sequences (PTAW172Real) captured under foggy, rainy and snowy weather conditions to assess the performance of the current trackers. We also introduce a novel person tracking dataset of synthetic sequences (PTAW217Synth) procedurally generated by our NOVA framework, spanning the same weather conditions in varying severity, to mitigate the problem of data scarcity. Our experimental results demonstrate that the performance of state-of-the-art deep trackers under adverse weather conditions can be boosted when the available real training sequences are complemented with our synthetically generated dataset during training. (c) 2021 Elsevier B.V. All rights reserved.
  • Ransac-based training data selection for speaker state recognition
    (Isca-Int Speech Communication Assoc, 2011) Erdem, Çiğdem Eroğlu; Erdem, A. Tanju; N/A; Department of Computer Engineering; Bozkurt, Elif; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
    We present a Random Sampling Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested with the INTERSPEECH 2011 Speaker State Challenge corpora, which include the Intoxication and the Sleepiness Sub-challenges, where each sub-challenge defines a two-class classification task. We perform RANSAC-based training data selection coupled with Support Vector Machine (SVM) based classification to prune possible outliers in the training data. Our experimental evaluations indicate that RANSAC-based training data selection provides 66.32 % and 65.38 % unweighted average (UA) recall rates on the development and test sets for the Sleepiness Sub-challenge, respectively, and a slight improvement on the Intoxication Sub-challenge performance.
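
The last abstract above reports its results as unweighted average (UA) recall. As a rough illustration of how that metric is commonly computed (the mean of the per-class recalls, so each class counts equally regardless of its size), here is a minimal sketch. The function name, the class labels, and the toy data are hypothetical and are not taken from the paper or the challenge corpora.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1

    # Recall of a class = correct predictions / samples of that class.
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced two-class toy example (not real challenge data).
y_true = ["sleepy"] * 2 + ["alert"] * 8
y_pred = ["sleepy", "alert"] + ["alert"] * 8

ua = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA recall = {ua:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the minority class is recognised only half of the time, so UA recall comes out lower than plain accuracy. This is why UA recall is often preferred for imbalanced two-class tasks.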