Researcher:
Asadiabadi, Sasan

Job Title: PhD Student
First Name: Sasan
Last Name: Asadiabadi
Name Variants: Asadiabadi, Sasan

Search Results

Now showing 1 - 9 of 9
  • Publication
    Emotion dependent facial animation from affective speech
    (IEEE, 2020) Sadiq, Rizwan; Asadiabadi, Sasan; Erzin, Engin. Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    In human-to-computer interaction, facial animation in synchrony with affective speech can deliver more naturalistic conversational agents. In this paper, we present a two-stage deep learning approach for affective speech-driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories. In the second stage, we train separate deep estimators within each emotion category to synthesize facial shape from the affective speech. Objective and subjective evaluations are performed over the SAVEE dataset. The proposed emotion-dependent facial shape model performs better, in terms of Mean Squared Error (MSE) loss and in generating the landmark animations, than a universal model trained without regard to emotion.
    (A minimal, illustrative sketch of this two-stage pipeline is given after the search results below.)
  • Publication
    A deep learning approach for data driven vocal tract area function estimation
    (IEEE, 2018) Asadiabadi, Sasan; Erzin, Engin. Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    In this paper we present data-driven vocal tract area function (VTAF) estimation using Deep Neural Networks (DNN). We approach the VTAF estimation problem with sequence-to-sequence learning networks, where regression over a sliding window is used to learn an arbitrary non-linear one-to-many mapping from the input feature sequence to the target articulatory sequence. We propose two schemes for efficient estimation of the VTAF: (1) direct estimation of the area function values and (2) indirect estimation via prediction of the vocal tract boundaries. We consider acoustic speech and the phone sequence as two possible input modalities for the DNN estimators. Experimental evaluations are performed over a large dataset comprising acoustic and phonetic features with parallel articulatory information from the USC-TIMIT database. Our results show that the proposed direct and indirect schemes perform VTAF estimation with mean absolute error (MAE) rates lower than 1.65 mm, and the direct estimation scheme is observed to perform better than the indirect scheme.
    (A minimal, illustrative sketch of the sliding-window regression idea is given after the search results below.)
  • Publication
    Multimodal speech driven facial shape animation using deep neural networks
    (IEEE, 2018) Asadiabadi, Sasan; Sadiq, Rizwan; Erzin, Engin. Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    In this paper we present a multimodal deep learning approach for speech-driven generation of facial animations. Training a speaker-independent model capable of expressing the speaker's different emotions is crucial for realistic animation. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. The phoneme-based model enables speaker-independent animation, whereas the acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures.
    (A minimal, illustrative sketch of a two-branch acoustic/phoneme fusion network is given after the search results below.)
  • Publication
    Vocal tract airway tissue boundary tracking for rtMRI using shape and appearance priors
    (ISCA - International Speech Communication Association, 2017) Asadiabadi, Sasan; Erzin, Engin. Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    Knowledge about the dynamic shape of the vocal tract is the basis of many speech production applications such as articulatory analysis, modeling, and synthesis. Vocal tract airway tissue boundary segmentation in the mid-sagittal plane is a necessary initial step for extraction of the cross-sectional area function. This segmentation problem is, however, challenging due to the poor resolution of real-time speech MRI, grainy noise, and the rapidly varying vocal tract shape. We present a novel approach to vocal tract airway tissue boundary tracking by training a statistical shape and appearance model for the human vocal tract. We manually segment a set of vocal tract profiles and use a statistical approach to train a shape and appearance model for the tract. An active contour approach is employed to segment the airway tissue boundaries of the vocal tract while restricting the curve movement to the trained shape and appearance model. The contours in subsequent frames are then tracked using dense motion estimation methods. Experimental evaluations over the mean square error metric indicate significant improvements compared to the state of the art.
    (A minimal, illustrative sketch of a PCA shape-model constraint is given after the search results below.)
  • Publication (Open Access)
    A deep learning approach for data driven vocal tract area function estimation
    (Institute of Electrical and Electronics Engineers (IEEE), 2018) Erzin, Engin; Asadiabadi, Sasan. Department of Computer Engineering; Department of Electrical and Electronics Engineering; College of Sciences; Graduate School of Sciences and Engineering.
    In this paper we present data-driven vocal tract area function (VTAF) estimation using Deep Neural Networks (DNN). We approach the VTAF estimation problem with sequence-to-sequence learning networks, where regression over a sliding window is used to learn an arbitrary non-linear one-to-many mapping from the input feature sequence to the target articulatory sequence. We propose two schemes for efficient estimation of the VTAF: (1) direct estimation of the area function values and (2) indirect estimation via prediction of the vocal tract boundaries. We consider acoustic speech and the phone sequence as two possible input modalities for the DNN estimators. Experimental evaluations are performed over a large dataset comprising acoustic and phonetic features with parallel articulatory information from the USC-TIMIT database. Our results show that the proposed direct and indirect schemes perform VTAF estimation with mean absolute error (MAE) rates lower than 1.65 mm, and the direct estimation scheme is observed to perform better than the indirect scheme.
  • Publication (Open Access)
    Vocal tract contour tracking in rtMRI using deep temporal regression network
    (Institute of Electrical and Electronics Engineers (IEEE), 2020) Asadiabadi, Sasan; Erzin, Engin. Department of Electrical and Electronics Engineering; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    Recent advances in real-time Magnetic Resonance Imaging (rtMRI) provide an invaluable tool for studying speech articulation. In this paper, we present an effective deep learning approach for supervised detection and tracking of vocal tract contours in a sequence of rtMRI frames. We train a single-input multiple-output deep temporal regression network (DTRN) to detect the vocal tract (VT) contour and the separation boundary between different articulators. The DTRN learns the non-linear mapping from an overlapping fixed-length sequence of rtMRI frames to the corresponding articulatory movements, where a blend of the overlapping contour estimates defines the detected VT contour. The detected contour is refined in a post-processing stage using an appearance model to further improve the accuracy of VT contour detection. The proposed VT contour tracking model is trained and evaluated on the USC-TIMIT dataset. Performance is evaluated with three objective metrics covering separating-landmark detection, contour tracking, and temporal stability of the contour landmarks, in comparison with three baseline approaches from the recent literature. Results indicate significant improvements with the proposed method over the state-of-the-art baselines.
    (A minimal, illustrative sketch of overlapping-window regression with blended estimates is given after the search results below.)
  • Publication (Open Access)
    Automatic vocal tract landmark tracking in rtMRI using fully convolutional networks and Kalman filter
    (Institute of Electrical and Electronics Engineers (IEEE), 2020) Erzin, Engin; Asadiabadi, Sasan. Department of Electrical and Electronics Engineering; College of Engineering; Graduate School of Sciences and Engineering.
    Vocal tract (VT) contour detection in real-time MRI is a pre-stage to many speech-production-related applications such as articulatory analysis and synthesis. In this work, we present an algorithm for robust detection of keypoints on the vocal tract in rtMRI sequences using fully convolutional networks (FCN) via a heatmap regression approach. We also introduce a spatio-temporal stabilization scheme based on a combination of Principal Component Analysis (PCA) and a Kalman filter (KF) to extract landmarks that are stable in space and time. The proposed VT landmark detection algorithm generalizes well across subjects and demonstrates significant improvement over state-of-the-art baselines in terms of spatial and temporal errors.
    (A minimal, illustrative sketch of FCN heatmap regression with soft-argmax decoding is given after the search results below.)
  • Publication (Open Access)
    Multimodal speech driven facial shape animation using deep neural networks
    (Institute of Electrical and Electronics Engineers (IEEE), 2018) Erzin, Engin; Sadiq, Rizwan; Asadiabadi, Sasan. Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering.
    In this paper we present a multimodal deep learning approach for speech-driven generation of facial animations. Training a speaker-independent model capable of expressing the speaker's different emotions is crucial for realistic animation. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. The phoneme-based model enables speaker-independent animation, whereas the acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures.
  • Publication (Open Access)
    Emotion dependent facial animation from affective speech
    (Institute of Electrical and Electronics Engineers (IEEE), 2020) Sadiq, Rizwan; Asadiabadi, Sasan; Erzin, Engin. Department of Electrical and Electronics Engineering; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering.
    In human-to-computer interaction, facial animation in synchrony with affective speech can deliver more naturalistic conversational agents. In this paper, we present a two-stage deep learning approach for affective speech-driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories. In the second stage, we train separate deep estimators within each emotion category to synthesize facial shape from the affective speech. Objective and subjective evaluations are performed over the SAVEE dataset. The proposed emotion-dependent facial shape model performs better, in terms of Mean Squared Error (MSE) loss and in generating the landmark animations, than a universal model trained without regard to emotion.
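
Illustrative sketches

The sketch below mirrors the two-stage pipeline described in "Emotion dependent facial animation from affective speech": a classifier first assigns one of seven emotion categories to the speech features, and a separate emotion-specific regressor then maps the same features to facial landmark coordinates. This is a minimal PyTorch sketch, not the authors' implementation; the feature dimension, landmark count, and layer sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 7       # seven emotion categories, as in the abstract
ACOUSTIC_DIM = 40      # assumed per-frame acoustic feature dimension
LANDMARK_DIM = 2 * 68  # assumed number of 2-D facial landmark coordinates

class EmotionClassifier(nn.Module):
    """Stage 1: classify affective speech features into an emotion category."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM, 128), nn.ReLU(),
            nn.Linear(128, NUM_EMOTIONS),
        )

    def forward(self, x):
        return self.net(x)  # unnormalized logits over the 7 emotions

class FacialShapeEstimator(nn.Module):
    """Stage 2: one regressor per emotion, mapping speech features to landmarks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM, 256), nn.ReLU(),
            nn.Linear(256, LANDMARK_DIM),
        )

    def forward(self, x):
        return self.net(x)

classifier = EmotionClassifier()
estimators = nn.ModuleList(FacialShapeEstimator() for _ in range(NUM_EMOTIONS))

def animate(features: torch.Tensor) -> torch.Tensor:
    """Route each input frame to the estimator of its predicted emotion."""
    emotions = classifier(features).argmax(dim=-1)          # (batch,)
    shapes = [estimators[e](f) for e, f in zip(emotions.tolist(), features)]
    return torch.stack(shapes)                              # (batch, LANDMARK_DIM)

if __name__ == "__main__":
    print(animate(torch.randn(4, ACOUSTIC_DIM)).shape)      # torch.Size([4, 136])
```

Training each estimator only on data of its own emotion category is what distinguishes this scheme from the single universal model the abstract compares against.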
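
The next sketch illustrates the sliding-window regression idea from "A deep learning approach for data driven vocal tract area function estimation" in its direct scheme: a window of feature frames is mapped to the area function of its center frame, and MAE (the metric reported in the paper) is computed. Window length, feature dimension, number of area samples, and the network body are assumptions; PyTorch is used for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

WIN = 11       # assumed sliding-window length in frames, centered on the target frame
FEAT_DIM = 40  # assumed acoustic (or phonetic) feature dimension per frame
AREA_DIM = 44  # assumed number of cross-sectional area samples along the vocal tract

class WindowedVTAFRegressor(nn.Module):
    """Direct scheme: regress the area function of the center frame from a feature window."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                # (B, WIN, FEAT_DIM) -> (B, WIN*FEAT_DIM)
            nn.Linear(WIN * FEAT_DIM, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, AREA_DIM),                    # one area value per tract section
        )

    def forward(self, x):
        return self.net(x)

def sliding_windows(frames: torch.Tensor, win: int = WIN) -> torch.Tensor:
    """Turn a (T, FEAT_DIM) utterance into T overlapping, centered windows of length win."""
    pad = win // 2
    padded = F.pad(frames, (0, 0, pad, pad))             # zero-pad the time axis at both ends
    return padded.unfold(0, win, 1).transpose(1, 2)      # (T, win, FEAT_DIM)

if __name__ == "__main__":
    model = WindowedVTAFRegressor()
    utterance = torch.randn(100, FEAT_DIM)               # 100 feature frames
    areas = model(sliding_windows(utterance))            # (100, AREA_DIM)
    mae = F.l1_loss(areas, torch.randn(100, AREA_DIM))   # MAE, the metric reported (in mm)
    print(areas.shape, float(mae))
```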
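
This sketch corresponds to the two input modalities in "Multimodal speech driven facial shape animation using deep neural networks": acoustic features and phoneme labels are encoded by separate branches and fused before regressing lip landmarks, with MSE as the reported metric. All dimensions and the network layout are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ACOUSTIC_DIM = 40  # assumed per-frame acoustic feature dimension
NUM_PHONEMES = 40  # assumed phoneme inventory size (one-hot phoneme labels)
LIP_DIM = 2 * 20   # assumed number of 2-D lip landmark coordinates

class MultimodalLipAnimator(nn.Module):
    """Two-branch network: acoustic and phoneme features are encoded separately, then fused."""
    def __init__(self):
        super().__init__()
        self.acoustic_branch = nn.Sequential(nn.Linear(ACOUSTIC_DIM, 128), nn.ReLU())
        self.phoneme_branch = nn.Sequential(nn.Linear(NUM_PHONEMES, 64), nn.ReLU())
        self.fusion = nn.Sequential(
            nn.Linear(128 + 64, 128), nn.ReLU(),
            nn.Linear(128, LIP_DIM),
        )

    def forward(self, acoustic, phoneme_onehot):
        fused = torch.cat([self.acoustic_branch(acoustic),
                           self.phoneme_branch(phoneme_onehot)], dim=-1)
        return self.fusion(fused)

if __name__ == "__main__":
    model = MultimodalLipAnimator()
    acoustic = torch.randn(8, ACOUSTIC_DIM)                      # 8 frames of speech features
    phonemes = F.one_hot(torch.randint(0, NUM_PHONEMES, (8,)),
                         NUM_PHONEMES).float()                   # aligned phoneme labels
    landmarks = model(acoustic, phonemes)                        # (8, LIP_DIM)
    mse = F.mse_loss(landmarks, torch.randn(8, LIP_DIM))         # MSE, the metric reported
    print(landmarks.shape, float(mse))
```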
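
For "Vocal tract airway tissue boundary tracking for rtMRI using shape and appearance priors", the sketch below shows one ingredient only: a PCA shape model trained on segmented contours, and the projection step that restricts a candidate contour to plausible shapes. The appearance model, active contour evolution, and dense motion tracking from the paper are omitted; all sizes and data are synthetic placeholders.

```python
import numpy as np

# Contours are flattened landmark vectors (x1, y1, ..., xP, yP); sizes are illustrative.

def train_shape_model(contours: np.ndarray, n_modes: int = 8):
    """Fit a PCA shape model: mean shape plus the principal modes of shape variation."""
    mean = contours.mean(axis=0)
    _, s, vt = np.linalg.svd(contours - mean, full_matrices=False)
    modes = vt[:n_modes]                                  # (n_modes, 2P) variation directions
    variances = (s[:n_modes] ** 2) / (len(contours) - 1)  # variance captured by each mode
    return mean, modes, variances

def constrain_to_shape_model(contour, mean, modes, variances, k: float = 3.0):
    """Project a candidate contour onto the shape subspace, clipping each mode
    coefficient to +/- k standard deviations so the curve stays a plausible tract shape."""
    b = modes @ (contour - mean)          # shape coefficients of the candidate contour
    limit = k * np.sqrt(variances)
    b = np.clip(b, -limit, limit)
    return mean + modes.T @ b             # reconstructed, shape-constrained contour

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    training_contours = rng.normal(size=(200, 100))       # 200 contours of 50 points each
    mean, modes, variances = train_shape_model(training_contours)
    noisy_contour = rng.normal(size=100)                  # e.g. an active-contour iterate
    constrained = constrain_to_shape_model(noisy_contour, mean, modes, variances)
    print(constrained.shape)                              # (100,)
```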
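
This sketch follows the overlapping-window idea in "Vocal tract contour tracking in rtMRI using deep temporal regression network": a regressor predicts one contour per frame of a fixed-length window, and overlapping estimates are blended by averaging to form the tracked contour sequence. The network body, frame size, window length, and contour dimensionality are assumptions, and the appearance-based post-processing stage is not shown.

```python
import torch
import torch.nn as nn

WIN = 5                # assumed window length (consecutive rtMRI frames)
H, W = 68, 68          # assumed rtMRI frame size in pixels
CONTOUR_DIM = 2 * 170  # assumed number of 2-D contour point coordinates per frame

class TemporalRegressor(nn.Module):
    """Map a window of frames to one contour estimate for every frame in the window."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                   # (B, WIN, H, W) -> (B, WIN*H*W)
            nn.Linear(WIN * H * W, 1024), nn.ReLU(),
            nn.Linear(1024, WIN * CONTOUR_DIM),
        )

    def forward(self, frames):
        return self.net(frames).view(-1, WIN, CONTOUR_DIM)

def track(frames: torch.Tensor, model: nn.Module) -> torch.Tensor:
    """Slide the window over the sequence and blend (average) overlapping contour estimates."""
    T = frames.shape[0]
    total = torch.zeros(T, CONTOUR_DIM)
    count = torch.zeros(T, 1)
    with torch.no_grad():
        for t in range(T - WIN + 1):
            est = model(frames[t:t + WIN].unsqueeze(0)).squeeze(0)  # (WIN, CONTOUR_DIM)
            total[t:t + WIN] += est
            count[t:t + WIN] += 1
    return total / count                                    # blended contours, (T, CONTOUR_DIM)

if __name__ == "__main__":
    video = torch.randn(20, H, W)                           # 20 rtMRI frames
    contours = track(video, TemporalRegressor())
    print(contours.shape)                                   # torch.Size([20, 340])
```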
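
Finally, a sketch of heatmap-based keypoint regression in the spirit of "Automatic vocal tract landmark tracking in rtMRI using fully convolutional networks and Kalman filter": a small fully convolutional network outputs one heatmap per landmark, and a soft-argmax converts each heatmap to an (x, y) coordinate. The PCA plus Kalman-filter stabilization stage described in the abstract is omitted; network depth, landmark count, and frame size are assumptions.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 10  # assumed number of vocal tract keypoints
H, W = 68, 68       # assumed rtMRI frame size in pixels

class HeatmapFCN(nn.Module):
    """Fully convolutional network producing one heatmap per landmark."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, NUM_LANDMARKS, kernel_size=1),   # one heatmap channel per landmark
        )

    def forward(self, x):                                  # (B, 1, H, W)
        return self.net(x)                                 # (B, NUM_LANDMARKS, H, W)

def heatmaps_to_points(heatmaps: torch.Tensor) -> torch.Tensor:
    """Soft-argmax: convert each heatmap into an expected (x, y) landmark coordinate."""
    b, k, h, w = heatmaps.shape
    probs = heatmaps.view(b, k, -1).softmax(dim=-1).view(b, k, h, w)
    ys = torch.arange(h, dtype=torch.float32)
    xs = torch.arange(w, dtype=torch.float32)
    y = (probs.sum(dim=3) * ys).sum(dim=2)                 # expected row index per landmark
    x = (probs.sum(dim=2) * xs).sum(dim=2)                 # expected column index per landmark
    return torch.stack([x, y], dim=-1)                     # (B, NUM_LANDMARKS, 2)

if __name__ == "__main__":
    frame = torch.randn(1, 1, H, W)
    points = heatmaps_to_points(HeatmapFCN()(frame))
    print(points.shape)                                    # torch.Size([1, 10, 2])
```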