Researcher:
Kanak, Alper

Job Title

Master Student

First Name

Alper

Last Name

Kanak

Name

Name Variants

Kanak, Alper

Search Results

  • Publication
    Multimodal speaker identification with audio-video processing
    (IEEE, 2003) Department of Computer Engineering; N/A; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Yemez, Yücel; Kanak, Alper; Erzin, Engin; Tekalp, Ahmet Murat; Faculty Member; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; College of Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; 107907; N/A; 34503; 26207
    In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system decomposes the information existing in a video stream into three components: speech, face texture and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are also included to demonstrate the system performance.
  • Publication
    Transport protocol mechanisms for wireless networking: a review and comparative simulation study
    (Springer-Verlag Berlin, 2003) N/A; N/A; Department of Computer Engineering; Kanak, Alper; Özkasap, Öznur; Master Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 113507
    The increasing popularity of wireless services has triggered the need for efficient wireless transport mechanisms. TCP, the reliable transport-level protocol widely used in the wired network world, was not designed with heterogeneity in mind. The difficulty in adapting TCP to evolving wireless settings stems from its assumption that packet loss and unusual delays are mainly caused by congestion. TCP originally assumes that packet loss is very small. On the other hand, wireless links often suffer from high bit error rates and broken connectivity due to handoffs. A range of schemes, namely end-to-end, split-connection and link-layer protocols, has been proposed to improve the performance of transport mechanisms, in particular TCP, in wireless settings. In this study, we examine these mechanisms for wireless transport, and discuss our comparative simulation results of end-to-end TCP versions (Tahoe, Reno, NewReno and SACK) in various network settings including wireless LANs and wired-cum-wireless scenarios.
  • Publication
    Joint audio-video processing for biometric speaker identification
    (IEEE, 2003) N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Kanak, Alper; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat; Master Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 34503; 107907; 26207
    We present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with Mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system performance.
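
Both speaker identification abstracts above describe interpolating visual features to the rate of the speech signal and fusing them with MFCC vectors. The following is a minimal sketch of that step only, assuming linear interpolation and simple concatenation; the abstracts do not state which interpolation method the authors used, and the function names `interpolate_features` and `fuse` are hypothetical, not taken from the papers.

```python
from typing import List

Vector = List[float]


def interpolate_features(frames: List[Vector], target_len: int) -> List[Vector]:
    """Linearly resample a sequence of feature vectors to target_len steps.

    Assumption: linear interpolation; the abstracts do not name a method.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if len(frames) == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first vector.
        return [list(frames[0]) for _ in range(target_len)]

    # Map each output index onto a fractional position in the input sequence.
    scale = (len(frames) - 1) / (target_len - 1)
    out: List[Vector] = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(frames) - 2)  # index of the left neighbour
        w = pos - lo                         # weight of the right neighbour
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def fuse(audio: List[Vector], visual: List[Vector]) -> List[Vector]:
    """Build joint vectors: each audio vector followed by the resampled visual vector."""
    resampled = interpolate_features(visual, len(audio))
    return [a + v for a, v in zip(audio, resampled)]


# Toy data: 3 audio steps (1-dim stand-ins for MFCC vectors) and 2 video frames.
audio = [[1.0], [1.0], [1.0]]
visual = [[0.0], [2.0]]
joint = fuse(audio, visual)
```

In this toy case the two visual frames are stretched to three time steps, so every audio vector gets a visual counterpart before the joint vectors would be passed to a classifier such as an HMM.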