Unsupervised affective state learning from speech

dc.contributor.advisor	Erzin, Engin
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.kuauthor	Kuşçu, Gökhan
dc.contributor.program	Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstitute	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.coverage.spatial	İstanbul
dc.date.accessioned	2025-06-30T04:36:29Z
dc.date.available	2025-04-08
dc.date.issued	2024
dc.description.abstract	Estimating continuous emotional states from speech is conventionally treated as a regression problem over time. This thesis introduces a methodology that recasts the task as classification by learning clusters of affect contours of uniform length. The approach is a joint clustering and classification scheme in which each iteration clusters affect contours into independent classes. The aim is to classify these classes and thereby identify distinct clusters whose samples show intra-class similarity. The classification branch consists of an audio feature extractor based on a Wav2Vec 2.0 model, followed by a convolutional neural network (CNN). Concurrently, the clustering branch processes a segment of the affect contour, using a convolutional network for dimensionality reduction and then applying k-means clustering. The classification network predicts the generated clusters, and the cumulative loss is propagated back to the neural networks for weight updates. Empirical findings show that the obtained clusters exhibit distinctive and insightful characteristics. In addition, attaching a regression head to the trained classification network yields competitive audio-only continuous emotion recognition (CER) performance on the RECOLA and USC CreativeIT datasets. The CER results are compared against baselines and the existing literature. They show that the approach, although fundamentally a classification framework, achieves competitive CER performance while reframing the well-studied continuous regression problem as classification.
dc.description.abstract	The conventional paradigm for estimating continuous emotional states from speech has been interpreted as a regression problem over time and has been widely accepted. This thesis presents a new methodology that transfers this challenge to the classification domain by forming clusters of affect contours of uniform length. The approach includes a new joint clustering and classification scheme in which, at each iteration, affect contours are clustered into independent classes. The aim is to classify these classes by identifying distinct clusters with observed intra-class sample similarities. The classification structure comprises an audio feature extractor based on the Wav2Vec 2.0 model followed by a convolutional neural network (CNN). Concurrently, the clustering component processes an equal-length unit of the affect contour, using a convolutional network to reduce dimensionality and then applying k-means clustering. The clusters produced by the clustering component are predicted by the classification network. The cumulative loss is then propagated to the neural network for weight updates. Empirical findings show that the obtained clusters exhibit characteristics that distinguish them from one another. At the same time, adding a regression head to the trained classification network yields performance that matches results in the literature on the RECOLA and USC CreativeIT datasets when only audio is used. The effectiveness of the approach is shown by comparing these results with regression baseline performance and the existing literature. The approach, fundamentally a classification framework, achieves competitive continuous emotion recognition performance while transforming the nature of the well-studied continuous regression problem.
dc.description.fulltext	Yes
dc.format.extent	xii, 49 leaves : tables ; 30 cm.
dc.identifier.embargo	No
dc.identifier.endpage	61
dc.identifier.filenameinventoryno	T_2024_007_GSSE
dc.identifier.uri	https://hdl.handle.net/20.500.14288/29834
dc.identifier.yoktezid	879055
dc.identifier.yoktezlink	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=1pwTzRXnomYf6jwqVORfUQkRBAbLFKVTPY5zcJ8IUUbVFRVdJAodYEe7PBea6pS1
dc.language.iso	eng
dc.publisher	Koç University
dc.relation.collection	KU Theses and Dissertations
dc.rights	restrictedAccess
dc.rights.copyrightsnote	© All Rights Reserved. Accessible to Koç University Affiliated Users Only!
dc.subject	Machine learning, Graphic methods
dc.subject	Machine learning
dc.subject	Artificial intelligence
dc.subject	Supervised learning (Machine learning)
dc.title	Unsupervised affective state learning from speech
dc.title.alternative	Konuşmadan gözetimsiz duygusal durum öğrenme
dc.type	Thesis
dspace.entity.type	Publication
local.contributor.kuauthor	Kuşçu, Gökhan
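
The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the thesis applies k-means after a convolutional network has reduced the contour's dimensionality and trains a Wav2Vec 2.0 based classifier on the resulting labels, and both of those parts are omitted here. The function names (segment_contour, init_centroids, kmeans), the toy contour, the farthest-point initialization, and the choice of k = 2 are all assumptions made for illustration.

```python
def segment_contour(contour, length):
    """Split a contour (list of floats) into non-overlapping windows of uniform length."""
    return [contour[i:i + length] for i in range(0, len(contour) - length + 1, length)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(segments, k):
    """Deterministic farthest-point initialization (an assumption, not from the thesis)."""
    centroids = [list(segments[0])]
    while len(centroids) < k:
        # Pick the segment whose nearest existing centroid is farthest away.
        far = max(segments, key=lambda s: min(sq_dist(s, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(segments, k, iters=10):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per segment."""
    centroids = init_centroids(segments, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each segment joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(s, centroids[c])) for s in segments]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(segments, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy contour (hypothetical values): three rising windows, then three flat ones.
contour = [0.0, 0.1, 0.2, 0.3] * 3 + [0.9, 0.9, 0.9, 0.9] * 3
segments = segment_contour(contour, 4)
labels, centroids = kmeans(segments, k=2)
print(labels)
```

In a joint scheme of the kind the abstract describes, cluster labels like these would serve as the classification targets for the audio network, and would be regenerated as the clustering network's weights change.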