Publication: Improving phoneme recognition of throat microphone speech recordings using transfer learning
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Turan, Mehmet Ali Tuğtekin | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 34503 | |
dc.date.accessioned | 2024-11-09T23:59:19Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Throat microphones (TM) are a type of skin-attached non-acoustic sensor that is robust to environmental noise but has a narrower signal bandwidth than traditional close-talk microphones (CM). Attaining high-performance phoneme recognition is a challenging task when the training data from a degraded channel, such as TM, are limited. In this paper, we address this challenge for TM speech recordings using a transfer learning approach based on stacked denoising auto-encoders (SDA). The proposed transfer learning approach defines an SDA-based domain adaptation framework that maps the source domain CM representations and the target domain TM representations into a common latent space, where the mismatch between TM and CM is eliminated to better train an acoustic model and to improve TM phoneme recognition. For the phoneme recognition task, we use a hybrid system based on the convolutional neural network (CNN) and the hidden Markov model (HMM), which delivers better acoustic modeling performance than the conventional Gaussian mixture model (GMM) based models. In the experimental evaluations, the proposed transfer learning approach yields more than a 12% relative phoneme error rate (PER) improvement for the TM recordings over the baseline performances. | |
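Note: The following is a minimal illustrative sketch of the kind of denoising auto-encoder domain adaptation the abstract describes, not the paper's actual model. It assumes paired TM/CM feature frames, a single-hidden-layer encoder, PyTorch, and illustrative layer sizes and noise levels; none of these specifics come from the publication.

```python
# Hypothetical sketch: denoising auto-encoder mapping TM features toward CM features
# in a shared latent space. Feature dimension, hidden size, and noise level are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, feat_dim=39, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, feat_dim)

    def forward(self, x):
        z = self.encoder(x)           # shared latent representation
        return self.decoder(z), z     # reconstruction and latent code

def train_step(model, optimizer, tm_batch, cm_batch, noise_std=0.1):
    """One update: corrupt the TM input, reconstruct the paired CM target."""
    noisy_tm = tm_batch + noise_std * torch.randn_like(tm_batch)
    recon, _ = model(noisy_tm)
    loss = nn.functional.mse_loss(recon, cm_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage with random stand-in feature frames.
model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tm_frames = torch.randn(32, 39)   # throat-microphone features (placeholder)
cm_frames = torch.randn(32, 39)   # paired close-talk features (placeholder)
loss = train_step(model, optimizer, tm_frames, cm_frames)
```

The latent code `z` would then feed the downstream CNN/HMM acoustic model; stacking several such layers gives the "stacked" variant referenced in the abstract.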
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | NO | |
dc.description.sponsorship | This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 217E107. | |
dc.description.volume | 129 | |
dc.identifier.doi | 10.1016/j.specom.2021.02.004 | |
dc.identifier.eissn | 1872-7182 | |
dc.identifier.issn | 0167-6393 | |
dc.identifier.scopus | 2-s2.0-85102974819 | |
dc.identifier.uri | http://dx.doi.org/10.1016/j.specom.2021.02.004 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/15621 | |
dc.identifier.wos | 639454800004 | |
dc.keywords | Phoneme recognition | |
dc.keywords | Feature augmentation | |
dc.keywords | Transfer learning | |
dc.keywords | Throat microphone | |
dc.keywords | Denoising auto-encoder | |
dc.keywords | Convolutional neural networks | |
dc.language | English | |
dc.publisher | Elsevier | |
dc.source | Speech Communication | |
dc.subject | Acoustics | |
dc.subject | Computer Science | |
dc.subject | Artificial intelligence | |
dc.subject | Electrical and electronic engineering | |
dc.subject | Telecommunications | |
dc.title | Improving phoneme recognition of throat microphone speech recordings using transfer learning | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-3822-235X | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.kuauthor | Turan, Mehmet Ali Tuğtekin | |
local.contributor.kuauthor | Erzin, Engin | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |