Publication: Improving phoneme recognition of throat microphone speech recordings using transfer learning
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Turan, Mehmet Ali Tuğtekin | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 34503 | |
dc.date.accessioned | 2024-11-09T23:59:19Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Throat microphones (TM) are a type of skin-attached non-acoustic sensor that is robust to environmental noise but has a narrower signal bandwidth than traditional close-talk microphones (CM). Attaining high-performance phoneme recognition is a challenging task when the training data from a degraded channel, such as TM, are limited. In this paper, we address this challenge for TM speech recordings using a transfer learning approach based on stacked denoising auto-encoders (SDA). The proposed transfer learning approach defines an SDA-based domain adaptation framework that maps the source domain CM representations and the target domain TM representations into a common latent space, where the mismatch between TM and CM is eliminated to better train an acoustic model and to improve TM phoneme recognition. For the phoneme recognition task, we use a hybrid system based on the convolutional neural network (CNN) and the hidden Markov model (HMM), which delivers better acoustic modeling performance than the conventional Gaussian mixture model (GMM) based models. In the experimental evaluations, the proposed transfer learning approach yields more than a 12% relative phoneme error rate (PER) improvement for the TM recordings over the baseline performances. | |
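Note: The following is a minimal illustrative sketch of the kind of denoising auto-encoder domain adaptation the abstract describes, not the paper's actual model. It assumes paired TM/CM feature frames, a single-hidden-layer encoder, PyTorch, and illustrative layer sizes and noise levels; none of these specifics come from the publication.

```python
# Hypothetical sketch: denoising auto-encoder mapping TM features toward CM features
# in a shared latent space. Feature dimension, hidden size, and noise level are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, feat_dim=39, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, feat_dim)

    def forward(self, x):
        z = self.encoder(x)           # shared latent representation
        return self.decoder(z), z     # reconstruction and latent code

def train_step(model, optimizer, tm_batch, cm_batch, noise_std=0.1):
    """One update: corrupt the TM input, reconstruct the paired CM target."""
    noisy_tm = tm_batch + noise_std * torch.randn_like(tm_batch)
    recon, _ = model(noisy_tm)
    loss = nn.functional.mse_loss(recon, cm_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage with random stand-in feature frames.
model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tm_frames = torch.randn(32, 39)   # throat-microphone features (placeholder)
cm_frames = torch.randn(32, 39)   # paired close-talk features (placeholder)
loss = train_step(model, optimizer, tm_frames, cm_frames)
```

The latent code `z` would then feed the downstream CNN/HMM acoustic model; stacking several such layers gives the "stacked" variant referenced in the abstract.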
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | NO | |
dc.description.sponsorship | This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 217E107. | |
dc.description.volume | 129 | |
dc.identifier.doi | 10.1016/j.specom.2021.02.004 | |
dc.identifier.eissn | 1872-7182 | |
dc.identifier.issn | 0167-6393 | |
dc.identifier.scopus | 2-s2.0-85102974819 | |
dc.identifier.uri | http://dx.doi.org/10.1016/j.specom.2021.02.004 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/15621 | |
dc.identifier.wos | 639454800004 | |
dc.keywords | Phoneme recognition | |
dc.keywords | Feature augmentation | |
dc.keywords | Transfer learning | |
dc.keywords | Throat microphone | |
dc.keywords | Denoising auto-encoder | |
dc.keywords | Convolutional neural networks | |
dc.language | English | |
dc.publisher | Elsevier | |
dc.source | Speech Communication | |
dc.subject | Acoustics | |
dc.subject | Computer Science | |
dc.subject | Artificial intelligence | |
dc.subject | Electrical and electronic engineering | |
dc.subject | Telecommunications | |
dc.title | Improving phoneme recognition of throat microphone speech recordings using transfer learning | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-3822-235X | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.kuauthor | Turan, Mehmet Ali Tuğtekin | |
local.contributor.kuauthor | Erzin, Engin | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |