Publication:
Source and filter estimation for throat-microphone speech enhancement

dc.contributor.departmentN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.kuauthorTuran, Mehmet Ali Tuğtekin
dc.contributor.kuauthorErzin, Engin
dc.contributor.kuprofilePhD Student
dc.contributor.kuprofileFaculty Member
dc.contributor.otherDepartment of Computer Engineering
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokidN/A
dc.contributor.yokid34503
dc.date.accessioned2024-11-10T00:11:32Z
dc.date.issued2016
dc.description.abstractIn this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to the limited bandwidth of these sensors, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore, enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings.
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.issue2
dc.description.openaccessYES
dc.description.volume24
dc.identifier.doi10.1109/TASLP.2015.2499040
dc.identifier.eissn2329-9304
dc.identifier.issn2329-9290
dc.identifier.urihttp://dx.doi.org/10.1109/TASLP.2015.2499040
dc.identifier.urihttps://hdl.handle.net/20.500.14288/17493
dc.identifier.wos367950900001
dc.keywordsSpeech enhancement
dc.keywordsThroat microphone
dc.keywordsGaussian mixture model
dc.keywordsStatistical mapping
dc.keywordsArtificial bandwidth extension
dc.keywordsMaximum-likelihood
dc.keywordsRecognition
dc.languageEnglish
dc.publisherIEEE-Inst Electrical Electronics Engineers Inc
dc.sourceIEEE/ACM Transactions on Audio, Speech, and Language Processing
dc.subjectAcoustics
dc.subjectElectrical electronic engineering
dc.titleSource and filter estimation for throat-microphone speech enhancement
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.authorid0000-0002-3822-235X
local.contributor.authorid0000-0002-2715-2368
local.contributor.kuauthorTuran, Mehmet Ali Tuğtekin
local.contributor.kuauthorErzin, Engin
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery89352e43-bf09-4ef4-82f6-6f9d0174ebae
