Publications without Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/3

Now showing 1 - 10 of 59
  • Publication
    Multimodal analysis of speech prosody and upper body gestures using hidden semi-Markov models
    (Institute of Electrical and Electronics Engineers (IEEE), 2013) N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Bozkurt, Elif; Asta, Shahriar; Özkul, Serkan; Yemez, Yücel; Erzin, Engin; PhD Student; PhD Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; N/A; 107907; 34503
    Gesticulation is an essential component of face-to-face communication, and it contributes significantly to the natural and affective perception of human-to-human communication. In this work we investigate a new multimodal analysis framework to model relationships between intonational and gesture phrases using hidden semi-Markov models (HSMMs). The HSMM framework effectively associates longer-duration gesture phrases with shorter-duration prosody clusters, while maintaining realistic gesture phrase duration statistics. We evaluate the multimodal analysis framework by generating speech-prosody-driven gesture animation and employing both subjective and objective metrics.
  • Publication
    SecVLC: secure visible light communication for military vehicular networks
    (Association for Computing Machinery (ACM), 2016) Tsonev, Dobroslav; Burchardt, Harald; N/A; Department of Electrical and Electronics Engineering; Department of Computer Engineering; Uçar, Seyhan; Ergen, Sinem Çöleri; Özkasap, Öznur; PhD Student; Faculty Member; Faculty Member; Department of Electrical and Electronics Engineering; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 7211; 113507
    Technology coined as the vehicular ad hoc network (VANET) is harmonizing with Intelligent Transportation System (ITS) and Intelligent Traffic System (ITF). An application scenario of VANET is the military communication where vehicles move as a convoy on roadways, requiring secure and reliable communication. However, utilization of radio frequency (RF) communication in VANET limits its usage in military applications, due to the scarce frequency band and its vulnerability to security attacks. Visible Light Communication (VLC) has been recently introduced as a more secure alternative, limiting the reception of neighboring nodes with its directional transmission. However, secure vehicular VLC that ensures confidential data transfer among the participating vehicles, is an open problem. In this paper, we propose a secure military light communication protocol (SecVLC) for enabling efficient and secure data sharing. We use the directionality property of VLC to ensure that only target vehicles participate in the communication. Vehicles use full-duplex communication where infra-red (IR) is utilized to share a secret key and VLC is used to receive encrypted data. We experimentally demonstrate the suitability of SecVLC in outdoor scenarios at varying inter-vehicular distances with key metrics of interest, including the security, data packet delivery ratio and delay.
  • Publication
    Artificial bandwidth extension of speech excitation
    (IEEE, 2015) Department of Computer Engineering; N/A; Erzin, Engin; Turan, Mehmet Ali Tuğtekin; Faculty Member; PhD Student; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering; 34503; N/A
    In this paper, a new approach that extends narrowband excitation signals to synthesize wideband speech is proposed. The bandwidth extension problem is analyzed using a source-filter separation framework in which a speech signal is decomposed into two independent components. For spectral envelope extension, our former work based on a hidden Markov model is used. For excitation signal extension, the proposed method moves the spectrum based on correlation analysis, so that the distance between the harmonics and the structure of the excitation signal are preserved in the high bands. In experimental studies, we also apply two other well-known extension techniques for excitation signals for comparison and evaluate the overall performance of the proposed system using the PESQ metric. Our findings indicate that the proposed extension method outperforms the other two techniques. © 2015 IEEE. / Abstract (from the Turkish): In this study, we propose a new approach that synthesizes wideband speech by extending the bandwidth of narrowband excitation signals. The bandwidth extension problem is handled separately on two independent components with the help of source-filter analysis. While we improve the spectral envelope that shapes the filter structure using our earlier work based on a hidden Markov model, we propose a new method based on spectral copying for extending the narrowband excitation signal. In this new method, we extend the harmonic structure of the narrowband excitation signal in the high-frequency components through correlation analysis and synthesize a wideband excitation signal. To measure the performance of the proposed improvement, two other extension methods frequently used in the literature are also evaluated comparatively. In the experimental studies, the objective performance of the proposed extension is demonstrated with the PESQ metric.
  • Publication
    E_coach
    (IEEE, 2004) Department of Electrical and Electronics Engineering; Department of Computer Engineering; Civanlar, Mehmet Reha; Baykan, Eda; Faculty Member; Undergraduate Student; Department of Electrical and Electronics Engineering; Department of Computer Engineering; College of Engineering; College of Engineering; 16372; N/A
    We developed the necessary software to control the playback speed of exercise videos playing on a personal computer, using the heart rate of an individual performing the recorded exercise routine. Moderate exercise, at an appropriate heart rate, is widely regarded today as an excellent way to improve one's health when performed on a regular and frequent basis. One popular form of an indoor exercise program is to use a video "workout" program of aerobic exercise and/or weight training exercises. The "off-the-shelf" exercise videos, while they may target various fitness levels (such as "beginner", "regular", and "advanced"), cannot offer precise adjustments to address each user's current fitness level. The software developed allows the playback of an exercise video to be adjusted to accommodate the fitness level of the individual user through a closed-loop feedback mechanism. The project is being improved for logging and analyzing the performance of an individual who uses the system regularly and for exercise planning. The closed-loop feedback mechanism, which models the relationship between the heart rate and exercise level, is being improved with experiments in which subjects include fit people as well as sedentary ones. © 2004 IEEE.
  • Publication
    A new statistical excitation mapping for enhancement of throat microphone recordings
    (International Speech and Communication Association, 2013) N/A; Department of Computer Engineering; Turan, Mehmet Ali Tuğtekin; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
    In this paper we investigate a new statistical excitation mapping technique to enhance throat-microphone speech using joint analysis of throat- and acoustic-microphone recordings. In a recent study we employed source-filter decomposition to enhance the spectral envelope of the throat-microphone recordings. In the source-filter decomposition framework we observed that the spectral envelope difference of the excitation signals of throat- and acoustic-microphone recordings is an important source of the degradation in the throat-microphone voice quality. In this study we model the spectral envelope difference of the excitation signals as a spectral tilt vector, and we propose a new phone-dependent GMM-based spectral tilt mapping scheme to enhance the throat excitation signal. Experiments are performed to evaluate the proposed excitation mapping scheme in comparison with the state-of-the-art throat-microphone speech enhancement techniques using both objective and subjective evaluations. Objective evaluations are performed with the wideband perceptual evaluation of speech quality (ITU-PESQ) metric. Subjective evaluations are performed with the A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed statistical excitation mapping consistently delivers higher improvements than the statistical mapping of the spectral envelope to enhance the throat-microphone recordings.
  • Publication
    Coarse-to-fine surface reconstruction from silhouettes and range data using mesh deformation
    (Academic Press Inc Elsevier Science, 2010) N/A; Department of Computer Engineering; Sahillioğlu, Yusuf; Yemez, Yücel; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; 215195; 107907
    We present a coarse-to-fine surface reconstruction method based on mesh deformation to build watertight surface models of complex objects from their silhouettes and range data. The deformable mesh, which initially represents the object visual hull, is iteratively displaced towards the triangulated range surface using the line-of-sight information. Each iteration of the deformation algorithm involves smoothing and restructuring operations to regularize the surface evolution process. We define a non-shrinking and easy-to-compute smoothing operator that fairs the surface separately along its tangential and normal directions. The mesh restructuring operator, which is based on edge split, collapse and flip operations, enables the deformable mesh to adapt its shape to the object geometry without suffering from any geometrical distortions. By imposing appropriate minimum and maximum edge length constraints, the deformable mesh, hence the object surface, can be represented at increasing levels of detail. This coarse-to-fine strategy, which allows high resolution reconstructions even with deficient and irregularly sampled range data, not only provides robustness, but also significantly improves the computational efficiency of the deformation process. We demonstrate the performance of the proposed method on several real objects.
  • Publication
    Subspace methods for retrieval of general 3D models
    (Academic Press Inc Elsevier Science, 2010) Dutagaci, Helin; Sankur, Buelent; Department of Computer Engineering; Yemez, Yücel; Faculty Member; Department of Computer Engineering; College of Engineering; 107907
    In statistical shape analysis, subspace methods such as PCA, ICA and NMF are commonplace, whereas they have not been adequately investigated for indexing and retrieval of generic 3D models. The main roadblock to the wider employment of these methods seems to be their sensitivity to alignment, itself an ambiguous task in the absence of common natural landmarks. We present a retrieval scheme that compares three subspaces, PCA, ICA and NMF, extracted from the volumetric representations of 3D models. We find that the most propitious 3D distance transform leading to discriminative subspace features is the inverse distance transform. We mitigate the ambiguity of pose normalization with continuous PCA coupled with the use of all feasible axis labelings and reflections. The performance of the subspace-based retrieval methods on the Princeton Shape Benchmark is on a par with the state-of-the-art methods.
  • Publication
    RGB-D object recognition using deep convolutional neural networks
    (IEEE, 2017) N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Computer Engineering; Zia, Saman; Yüksel, Buket; Yüret, Deniz; Yemez, Yücel; Master Student; Teaching Faculty; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 326941; 179996; 107907
    We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn features from RGB-D images. There exists currently no large scale dataset available comprising depth information as compared to those for RGB data. Hence transfer learning from 2D source data is key to be able to train deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and can then be trained over a relatively small RGB-D dataset. We conduct experiments on the Washington dataset involving RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with the features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.
  • Publication
    Semantic segmentation of RGBD videos with recurrent fully convolutional neural networks
    (IEEE, 2017) N/A; Department of Computer Engineering; Yurdakul, Ekrem Emre; Yemez, Yücel; Master Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 107907
    Semantic segmentation of videos using neural networks is currently a popular task; however, most of the work done in this field is on RGB videos. The main reason for this is the lack of large RGBD video datasets annotated with ground truth information at the pixel level. In this work, we use a synthetic RGBD video dataset to investigate the contribution of depth and temporal information to the video segmentation task using convolutional and recurrent neural network architectures. Our experiments show that the addition of depth information improves semantic segmentation results and that exploiting temporal information results in higher quality output segmentations.
  • Publication
    Optimizing instance selection for statistical machine translation with feature decay algorithms
    (IEEE-Inst Electrical Electronics Engineers Inc, 2015) N/A; Department of Computer Engineering; Yüret, Deniz; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 179996
    We introduce FDA5 for efficient parameterization, optimization, and implementation of feature decay algorithms (FDA), a class of instance selection algorithms that use feature decay. FDA increase the diversity of the selected training set by devaluing features (i.e., n-grams) that have already been included. FDA5 decides which instances to select based on three functions used for initializing and decaying feature values and scaling sentence scores controlled with five parameters. We present optimization techniques that allow FDA5 to adapt these functions to in-domain and out-of-domain translation tasks for different language pairs. In a transductive learning setting, selection of training instances relevant to the test set can improve the final translation quality. In machine translation experiments performed on the 2 million sentence English-German section of the Europarl corpus, we show that a subset of the training set selected by FDA5 can gain up to 3.22 BLEU points compared to a randomly selected subset of the same size, can gain up to 0.41 BLEU points compared to using all of the available training data using only 15% of it, and can reach within 0.5 BLEU points to the full training set result by using only 2.7% of the full training data. FDA5 peaks at around 8M words or 15% of the full training set. In an active learning setting, FDA5 minimizes the human effort by identifying the most informative sentences for translation and FDA gains up to 0.45 BLEU points using 3/5 of the available training data compared to using all of it and 1.12 BLEU points compared to random training set. In translation tasks involving English and Turkish, a morphologically rich language, FDA5 can gain up to 11.52 BLEU points compared to a randomly selected subset of the same size, can achieve the same BLEU score using as little as 4% of the data compared to random instance selection, and can exceed the full dataset result by 0.78 BLEU points. FDA5 is able to reduce the time to build a statistical machine translation system to about half with 1M words using only 3% of the space for the phrase table and 8% of the overall space when compared with a baseline system using all of the training data available yet still obtain only 0.58 BLEU points difference with the baseline system in out-of-domain translation.
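
The feature decay idea in the last abstract above (devaluing n-grams that have already been selected so that later picks add diversity) can be illustrated with a small sketch. This is not the authors' implementation: the function names `ngrams` and `fda_select` are hypothetical, and the three functions the abstract mentions are filled in with assumed simple choices (every feature starts at 1.0, each selection multiplies its value by a fixed `decay`, and sentence scores are divided by sentence length).

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return the set of all n-grams (as tuples) of length 1..max_n."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def fda_select(pool, test_sentences, k, decay=0.5, max_n=2):
    """Greedily pick up to k sentences from pool that cover test-set n-grams.

    Assumed choices (not from the source): initial feature value 1.0,
    exponential decay per selection, linear length normalization.
    """
    # Features are the n-grams that occur in the test set.
    features = set()
    for sent in test_sentences:
        features |= ngrams(sent.split(), max_n)

    used = Counter()  # how often each feature has been selected so far

    def score(sent):
        tokens = sent.split()
        if not tokens:
            return 0.0
        feats = ngrams(tokens, max_n) & features
        # A feature selected c times is worth decay**c.
        return sum(decay ** used[f] for f in feats) / len(tokens)

    selected, remaining = [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        # Devalue the features that the chosen sentence just covered.
        for f in ngrams(best.split(), max_n) & features:
            used[f] += 1
    return selected


pool = ["the dog sat", "the dog sat", "a cat ran"]
picked = fda_select(pool, ["the dog sat", "a cat ran"], k=2)
print(picked)
```

In this toy pool the duplicate sentence loses half of its score once the first copy has been chosen, so the second pick goes to the sentence with uncovered n-grams. A practical version would avoid rescoring the whole pool on every step, for example with a priority queue.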