Researcher:
Sadiq, Rizwan

Job Title: PhD Student
First Name: Rizwan
Last Name: Sadiq
Name Variants: Sadiq, Rizwan

Search Results

Now showing 1 - 6 of 6
  • Publication: Emotion dependent facial animation from affective speech
    (IEEE, 2020) Sadiq, Rizwan (PhD Student); Asadiabadi, Sasan (PhD Student); Erzin, Engin (Faculty Member); Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    In human-to-computer interaction, facial animation in synchrony with affective speech can deliver more naturalistic conversational agents. In this paper, we present a two-stage deep learning approach for affective speech-driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories. In the second stage, we train separate deep estimators within each emotion category to synthesize facial shape from the affective speech. Objective and subjective evaluations are performed over the SAVEE dataset. The proposed emotion-dependent facial shape model performs better, in terms of mean squared error (MSE) loss and in the generated landmark animations, than a universal model trained regardless of emotion.
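The two-stage idea in this abstract (an emotion classifier followed by emotion-specific facial shape estimators) can be illustrated with a minimal sketch. This is not the paper's architecture: the layer sizes, 40-dimensional speech feature, and 68-landmark output below are assumptions chosen only to make the example runnable.

```python
# Illustrative sketch of a two-stage pipeline: classify the emotion of the
# speech, then run an emotion-specific estimator to predict facial landmarks.
# All dimensions and layer choices are placeholders, not the published model.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

class SpeechEmotionClassifier(nn.Module):
    def __init__(self, feat_dim=40, n_emotions=len(EMOTIONS)):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_emotions))
    def forward(self, x):            # x: (batch, feat_dim) speech features
        return self.net(x)           # emotion logits

class FacialShapeEstimator(nn.Module):
    def __init__(self, feat_dim=40, n_landmarks=68):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_landmarks * 2))
    def forward(self, x):
        return self.net(x)           # (batch, 136) landmark x/y coordinates

classifier = SpeechEmotionClassifier()
estimators = nn.ModuleDict({e: FacialShapeEstimator() for e in EMOTIONS})

def animate(speech_feats):
    """Stage 1: pick an emotion category; stage 2: run that emotion's estimator."""
    with torch.no_grad():
        emo_idx = int(classifier(speech_feats).argmax(dim=-1)[0])
        return estimators[EMOTIONS[emo_idx]](speech_feats)

landmarks = animate(torch.randn(1, 40))   # dummy speech feature vector
print(landmarks.shape)                    # torch.Size([1, 136])
```

In the paper the per-emotion estimators are trained on SAVEE; here a random feature vector stands in for real speech features.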
  • Publication: Affect recognition from lip articulations
    (Institute of Electrical and Electronics Engineers (IEEE), 2017) Sadiq, Rizwan (PhD Student); Erzin, Engin (Faculty Member); Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Lips deliver visually active cues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formulated over discrete activation, valence and dominance attributes. We use the symmetric Kullback-Leibler divergence (KLD) to rate phonetic classes with larger discrimination across different affective states. We perform experimental evaluations using the IEMOCAP database. Our results demonstrate that lip articulations over a set of discriminative phonetic classes improve affect recognition performance, attaining 3-class recognition rates for the activation, valence and dominance (AVD) attributes of 72.16%, 46.44% and 64.92%, respectively.
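As a concrete illustration of the ranking criterion mentioned in this abstract, the symmetric Kullback-Leibler divergence between two discrete distributions can be computed as below. The feature histograms are made-up placeholders, not IEMOCAP data; only the symmetric KLD formula itself is taken as given.

```python
# Minimal sketch (not the paper's code): score a phonetic class by how well it
# separates two affective states, via the symmetric Kullback-Leibler divergence
# between the class's feature histograms under the two states.
import numpy as np

def symmetric_kld(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Hypothetical lip-motion feature histograms for one phonetic class,
# under two affective states (e.g. high vs. low activation).
hist_high = [0.10, 0.30, 0.40, 0.20]
hist_low  = [0.35, 0.35, 0.20, 0.10]
print(symmetric_kld(hist_high, hist_low))  # larger value -> more discriminative class
```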
  • Publication: Multimodal speech driven facial shape animation using deep neural networks
    (IEEE, 2018) Asadiabadi, Sasan (PhD Student); Sadiq, Rizwan (PhD Student); Erzin, Engin (Faculty Member); Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    In this paper, we present a deep learning multimodal approach for speech-driven generation of face animations. Training a speaker-independent model capable of generating the speaker's different emotions is crucial for realistic animations. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. A phoneme-based model enables generation of speaker-independent animation, whereas an acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data, but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures.
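A minimal sketch of the multimodal idea: acoustic features and one-hot phoneme labels are combined as a joint input to a single landmark regressor trained with an MSE loss. The dimensions, network depth, and dummy batch below are assumptions for illustration, not the network evaluated in the paper.

```python
# Illustrative multimodal fusion by concatenation; all sizes are placeholders.
import torch
import torch.nn as nn

ACOUSTIC_DIM, N_PHONEMES, N_LANDMARKS = 40, 40, 68

regressor = nn.Sequential(
    nn.Linear(ACOUSTIC_DIM + N_PHONEMES, 256), nn.ReLU(),
    nn.Linear(256, N_LANDMARKS * 2),
)
optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Dummy batch: acoustic frame features, phoneme one-hots, target landmarks.
acoustic = torch.randn(8, ACOUSTIC_DIM)
phoneme = torch.eye(N_PHONEMES)[torch.randint(N_PHONEMES, (8,))]
target = torch.randn(8, N_LANDMARKS * 2)

pred = regressor(torch.cat([acoustic, phoneme], dim=-1))  # fuse the two modalities
loss = mse(pred, target)                                  # MSE over landmark coordinates
loss.backward()
optimizer.step()
print(float(loss))
```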
  • Publication (Open Access): Emotion dependent domain adaptation for speech driven affective facial feature synthesis
    (Institute of Electrical and Electronics Engineers (IEEE), 2022) Erzin, Engin (Faculty Member); Sadiq, Rizwan; Department of Electrical and Electronics Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI) / Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering
    Although speech-driven facial animation has been studied extensively in the literature, works focusing on the affective content of the speech are limited, mostly due to the scarcity of affective audio-visual data. In this article, we improve affective facial animation by using domain adaptation to partially reduce the data scarcity. We first define a domain adaptation that maps affective and neutral speech representations to a common latent space in which the cross-domain bias is smaller. The domain adaptation is then used to augment affective representations for each emotion category, including angry, disgust, fear, happy, sad, surprise, and neutral, so that we can better train emotion-dependent deep audio-to-visual (A2V) mapping models. Based on the emotion-dependent deep A2V models, the proposed affective facial synthesis system is realized in two stages: first, speech emotion recognition extracts soft emotion category likelihoods for the utterances; then a soft fusion of the emotion-dependent A2V mapping outputs forms the affective facial synthesis. Experimental evaluations are performed on the SAVEE audio-visual dataset, and the proposed models are assessed with objective and subjective evaluations. The proposed affective A2V system achieves significant MSE loss improvements in comparison to the recent literature. Furthermore, the resulting facial animations of the proposed system are preferred over the baseline animations in the subjective evaluations.
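The soft-fusion stage described in this abstract can be sketched as a likelihood-weighted blend of per-emotion audio-to-visual outputs. The linear A2V models, feature sizes, and random inputs below are placeholders for illustration; only the weighted-sum fusion of emotion-dependent outputs follows the description above.

```python
# Illustrative soft fusion: each emotion-dependent A2V model maps speech features
# to facial features, and the outputs are blended with soft emotion likelihoods.
# Models and dimensions are placeholders, not the published system.
import torch
import torch.nn as nn

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
FEAT_DIM, FACE_DIM = 40, 136

a2v_models = nn.ModuleDict({e: nn.Linear(FEAT_DIM, FACE_DIM) for e in EMOTIONS})

def soft_fusion(speech_feats, emotion_probs):
    """Blend emotion-dependent A2V outputs using soft emotion likelihoods."""
    outputs = torch.stack([a2v_models[e](speech_feats) for e in EMOTIONS], dim=0)
    weights = emotion_probs.view(len(EMOTIONS), 1, 1)   # (7, 1, 1)
    return (weights * outputs).sum(dim=0)               # weighted sum over emotions

probs = torch.softmax(torch.randn(len(EMOTIONS)), dim=0)  # stand-in for SER likelihoods
face = soft_fusion(torch.randn(1, FEAT_DIM), probs)
print(face.shape)  # torch.Size([1, 136])
```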
  • Publication (Open Access): Multimodal speech driven facial shape animation using deep neural networks
    (Institute of Electrical and Electronics Engineers (IEEE), 2018) Erzin, Engin (Faculty Member); Sadiq, Rizwan; Asadiabadi, Sasan; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering
    In this paper, we present a deep learning multimodal approach for speech-driven generation of face animations. Training a speaker-independent model capable of generating the speaker's different emotions is crucial for realistic animations. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. A phoneme-based model enables generation of speaker-independent animation, whereas an acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data, but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures.
  • Publication (Open Access): Emotion dependent facial animation from affective speech
    (Institute of Electrical and Electronics Engineers (IEEE), 2020) Sadiq, Rizwan; Asadiabadi, Sasan; Erzin, Engin (Faculty Member); Department of Electrical and Electronics Engineering; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    In human-to-computer interaction, facial animation in synchrony with affective speech can deliver more naturalistic conversational agents. In this paper, we present a two-stage deep learning approach for affective speech-driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories. In the second stage, we train separate deep estimators within each emotion category to synthesize facial shape from the affective speech. Objective and subjective evaluations are performed over the SAVEE dataset. The proposed emotion-dependent facial shape model performs better, in terms of mean squared error (MSE) loss and in the generated landmark animations, than a universal model trained regardless of emotion.