Publication:
Exploring modulation spectrum features for speech-based depression level classification

dc.contributor.coauthorToledo-Ronen, Orith
dc.contributor.coauthorSorin, Alexander
dc.contributor.departmentN/A
dc.contributor.kuauthorBozkurt, Elif
dc.contributor.kuprofilePhD Student
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.yokidN/A
dc.date.accessioned2024-11-10T00:04:50Z
dc.date.issued2014
dc.description.abstractIn this paper, we propose a manageable Modulation Spectrum-based feature set for the detection of depressed speech. The Modulation Spectrum (MS) is obtained from the conventional speech spectrogram by spectral analysis along the temporal trajectories of the acoustic frequency bins. While the MS representation of speech provides rich, high-dimensional joint frequency information, extracting discriminative features from it remains an open question. We propose a lower-dimensional representation that first applies a Mel-frequency filterbank in the acoustic frequency domain and a Discrete Cosine Transform in the modulation frequency domain, and then applies feature selection in both domains. We compare and fuse the proposed feature set with other complementary prosodic and spectral features at the feature and decision levels. In our experiments, we use Support Vector Machines to discriminate depressed speech in a speaker-independent fashion. Feature-level fusion of the proposed MS-based features with other prosodic and spectral features after dimension reduction provides up to ~9% improvement over the baseline results and also correlates most strongly with clinical ratings of patients' depression level.
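A minimal sketch of the feature pipeline outlined in the abstract, written in Python with NumPy: power spectrogram, Mel filterbank in the acoustic frequency domain, magnitude spectrum along each band's temporal trajectory (the modulation spectrum), then a DCT in the modulation frequency domain. Everything beyond the abstract is an assumption: the frame length and hop, 20 HTK-style Mel bands, log compression of band energies, per-band mean removal, and keeping 8 DCT coefficients per band are illustrative choices, and the feature-selection step is omitted. All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii(x):
    """Orthonormal DCT-II along the last axis of a 2-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def power_spectrogram(signal, n_fft=512, hop=160):
    """Hann-windowed power spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def ms_features(signal, sr=16000, n_fft=512, hop=160, n_mels=20, n_dct=8):
    """Hypothetical MS feature vector of length n_mels * n_dct."""
    spec = power_spectrogram(signal, n_fft, hop)             # (T, K)
    mel = spec @ mel_filterbank(n_mels, n_fft, sr).T         # (T, n_mels)
    log_mel = np.log(mel + 1e-10)                            # log compression (assumption)
    traj = log_mel - log_mel.mean(axis=0, keepdims=True)     # remove each band's DC level
    ms = np.abs(np.fft.rfft(traj, axis=0)).T                 # modulation spectrum (n_mels, T // 2 + 1)
    coeffs = dct_ii(ms)[:, :n_dct]                           # DCT over modulation frequency
    return coeffs.ravel()
```

With these defaults a 1 s, 16 kHz signal yields 97 frames and a 160-dimensional vector (20 bands × 8 coefficients). In the setup the abstract describes, such vectors would then be reduced by feature selection and, for feature-level fusion, concatenated with prosodic and spectral features before SVM classification; those steps are not shown.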
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.openaccessYES
dc.description.publisherscopeInternational
dc.description.sponsorshipAmazon
dc.description.sponsorshipBaidu
dc.description.sponsorshipet al.
dc.description.sponsorshipGoogle
dc.description.sponsorshipTemasek Laboratories at Nanyang Technological University (TL at NTU)
dc.description.sponsorshipWeChat
dc.identifier.doiN/A
dc.identifier.issn2308-457X
dc.identifier.linkhttps://www.scopus.com/inward/record.uri?eid=2-s2.0-84910070415&partnerID=40&md5=5bfebfff94737517e4b29deea7566531
dc.identifier.scopus2-s2.0-84910070415
dc.identifier.uriN/A
dc.identifier.urihttps://hdl.handle.net/20.500.14288/16342
dc.identifier.wos395050100253
dc.keywordsDecision fusion
dc.keywordsDepression assessment
dc.keywordsFeature fusion
dc.keywordsModulation spectrum
dc.keywordsProsody
dc.keywordsDiscrete cosine transforms
dc.keywordsFeature extraction
dc.keywordsFrequency domain analysis
dc.keywordsModulation
dc.keywordsSpectrum analysis
dc.keywordsSpeech
dc.keywordsSpeech communication
dc.keywordsSpeech recognition
dc.languageEnglish
dc.publisherInternational Speech Communication Association
dc.sourceProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.subjectComputer science
dc.subjectArtificial intelligence
dc.subjectEngineering
dc.subjectElectrical and electronic engineering
dc.titleExploring modulation spectrum features for speech-based depression level classification
dc.typeConference proceeding
dspace.entity.typePublication
local.contributor.authoridN/A
local.contributor.kuauthorBozkurt, Elif
