Publications with Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6


Search Results

Now showing 1 - 10 of 21
  • Publication (Open Access)
    Discovering Black Lives Matter events in the United States: Shared Task 3, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Giorgi, Salvatore; Zavarella, Vanni; Tanev, Hristo; Stefanovitch, Nicolas; Hwang, Sy; Hettiarachchi, Hansi; Ranasinghe, Tharindu; Kalyan, Vivek; Tan, Paul; Tan, Shaun; Andrews, Martin; Hu, Tiancheng; Stoehr, Niklas; Re, Francesco Ignazio; Vegh, Daniel; Atzenhofer, Dennis; Curtis, Brenda; Department of Sociology; Hürriyetoğlu, Ali; Teaching Faculty; College of Social Sciences and Humanities
    State-of-the-art event detection systems are rarely evaluated on how well they determine the spatio-temporal distribution of events on the ground. Yet the ability to both (1) extract events "in the wild" from text and (2) properly evaluate event detection systems has the potential to support a wide variety of tasks, such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. We therefore study the performance of the best event detection systems on detecting Black Lives Matter (BLM) events in tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide, and the BLM movement, once mostly confined to the United States, was now seeing activity globally. This shared task asks participants to identify BLM-related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system's ability to track the evolution of protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 (Spearman) and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall (at most 5.08), confirming the high impact of media sourcing on the modelling of protest movements.
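    As a rough illustration of the temporal part of this evaluation, the sketch below scores a hypothetical system's daily protest counts against gold counts with Spearman and Pearson correlations; the counts and variable names are invented, and the real task additionally evaluates the spatial dimension.

```python
# Hypothetical scoring of predicted vs. gold daily protest counts
# (illustrative data only; not the shared task's official scorer).
from scipy.stats import spearmanr, pearsonr

gold_daily_counts = [12, 30, 45, 28, 19, 22, 40]  # gold events per day
pred_daily_counts = [10, 25, 50, 30, 15, 20, 35]  # one system's predictions

rho, _ = spearmanr(gold_daily_counts, pred_daily_counts)
r, _ = pearsonr(gold_daily_counts, pred_daily_counts)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```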
  • Publication (Open Access)
    A deep learning approach for data driven vocal tract area function estimation
    (Institute of Electrical and Electronics Engineers (IEEE), 2018) Department of Computer Engineering; Department of Electrical and Electronics Engineering; Erzin, Engin; Asadiabadi, Sasan; Faculty Member; College of Sciences; Graduate School of Sciences and Engineering; 34503; N/A
    In this paper, we present data-driven vocal tract area function (VTAF) estimation using deep neural networks (DNNs). We approach the VTAF estimation problem with sequence-to-sequence learning neural networks, where regression over a sliding window is used to learn an arbitrary non-linear one-to-many mapping from the input feature sequence to the target articulatory sequence. We propose two schemes for efficient estimation of the VTAF: (1) direct estimation of the area function values and (2) indirect estimation via predicting the vocal tract boundaries. We consider acoustic speech and phone sequences as two possible input modalities for the DNN estimators. Experimental evaluations are performed over a large dataset comprising acoustic and phonetic features with parallel articulatory information from the USC-TIMIT database. Our results show that the proposed direct and indirect schemes perform VTAF estimation with mean absolute error (MAE) below 1.65 mm, with the direct estimation scheme observed to perform better than the indirect scheme.
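    A minimal sketch of the sliding-window regression idea described above, assuming invented dimensions (the window length, feature size, and area-function resolution below are placeholders, not the paper's settings):

```python
# Sliding-window regression: a window of acoustic feature frames is
# mapped to the vocal tract area function of the centre frame.
import torch
import torch.nn as nn

WIN, FEAT_DIM, AREA_DIM = 11, 40, 32      # hypothetical sizes

model = nn.Sequential(
    nn.Flatten(),                         # (batch, WIN, FEAT_DIM) -> (batch, WIN*FEAT_DIM)
    nn.Linear(WIN * FEAT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, AREA_DIM),             # direct estimation of area function values
)

x = torch.randn(8, WIN, FEAT_DIM)         # a batch of acoustic windows
pred = model(x)                           # predicted VTAF, shape (8, AREA_DIM)
mae = nn.L1Loss()(pred, torch.randn(8, AREA_DIM))  # MAE, the paper's error metric
```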
  • Publication (Open Access)
    End-to-end rate-distortion optimized learned hierarchical bi-directional video compression
    (Institute of Electrical and Electronics Engineers (IEEE), 2022) Department of Electrical and Electronics Engineering; Tekalp, Ahmet Murat; Yılmaz, Mustafa Akın; Faculty Member; College of Engineering; 26207; N/A
    Conventional video compression (VC) methods are based on motion-compensated transform coding, where the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/.
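    The generic rate-distortion objective behind such end-to-end training can be sketched as follows; this is a simplified form (rate plus lambda times an MSE distortion), and the paper's actual loss, entropy model, and trade-off values may differ:

```python
# Simplified end-to-end R-D loss for a learned codec: rate (bits per
# pixel, from an entropy model) plus lambda times distortion (MSE).
import torch

def rd_loss(bits, frame, recon, lam=0.01):
    mse = torch.mean((frame - recon) ** 2)            # distortion (PSNR-oriented)
    bpp = bits / (frame.shape[-1] * frame.shape[-2])  # rate per pixel
    return bpp + lam * mse

frame = torch.rand(1, 3, 64, 64)                      # toy frame
recon = frame + 0.05 * torch.randn_like(frame)        # toy reconstruction
print(rd_loss(torch.tensor(2000.0), frame, recon))    # placeholder bit estimate
```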
  • Publication (Open Access)
    PROTEST-ER: retraining BERT for protest event extraction
    (Association for Computational Linguistics (ACL), 2021) Caselli, Tommaso; Basile, Angelo; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Mutlu, Osman; Teaching Faculty; Researcher; College of Social Sciences and Humanities; College of Engineering
    We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting even data of the same text genre (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT on out-of-domain data by 8.1 points. Our best-performing models reach 51.91 and 46.39 F1 across the two domains.
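    Further pre-training of this kind can be reproduced in spirit with standard masked language modelling tooling; the sketch below uses Hugging Face Transformers with a placeholder corpus file and hyperparameters, not the authors' released configuration:

```python
# Domain-adaptive further pre-training of BERT with masked language
# modelling; "protest_corpus.txt" is a placeholder domain corpus.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

ds = load_dataset("text", data_files={"train": "protest_corpus.txt"})
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="protest-er", num_train_epochs=3),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()   # the adapted encoder is then fine-tuned for event extraction
```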
  • Publication (Open Access)
    Emotion dependent domain adaptation for speech driven affective facial feature synthesis
    (Institute of Electrical and Electronics Engineers (IEEE), 2022) Department of Electrical and Electronics Engineering; Erzin, Engin; Sadiq, Rizwan; Faculty Member; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/ Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering; 34503; N/A
    Although speech-driven facial animation has been studied extensively in the literature, works focusing on the affective content of the speech are limited, mostly due to the scarcity of affective audio-visual data. In this article, we improve affective facial animation by using domain adaptation to partially alleviate this data scarcity. We first define a domain adaptation that maps affective and neutral speech representations to a common latent space in which the cross-domain bias is smaller. The domain adaptation is then used to augment affective representations for each emotion category, including angry, disgust, fear, happy, sad, surprise, and neutral, so that we can better train emotion-dependent deep audio-to-visual (A2V) mapping models. Based on the emotion-dependent deep A2V models, the proposed affective facial synthesis system is realized in two stages: first, speech emotion recognition extracts soft emotion category likelihoods for the utterances; then a soft fusion of the emotion-dependent A2V mapping outputs forms the affective facial synthesis. Experimental evaluations are performed on the SAVEE audio-visual dataset, and the proposed models are assessed with objective and subjective evaluations. The proposed affective A2V system achieves significant MSE loss improvements in comparison to the recent literature, and the resulting facial animations of the proposed system are preferred over the baseline animations in the subjective evaluations.
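    The soft-fusion stage can be sketched as a likelihood-weighted sum over per-emotion model outputs; the shapes, likelihoods, and model outputs below are invented stand-ins:

```python
# Soft fusion: speech-emotion-recognition posteriors weight the outputs
# of emotion-dependent A2V models into one facial feature trajectory.
import torch

emotions = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
T, FACE_DIM = 100, 68                      # frames, facial feature dimension

a2v_outputs = {e: torch.randn(T, FACE_DIM) for e in emotions}   # stand-ins
likelihoods = torch.softmax(torch.randn(len(emotions)), dim=0)  # SER posteriors

fused = sum(p * a2v_outputs[e] for p, e in zip(likelihoods, emotions))
print(fused.shape)   # torch.Size([100, 68])
```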
  • Publication (Open Access)
    On training sketch recognizers for new domains
    (Institute of Electrical and Electronics Engineers (IEEE), 2021) Yeşilbek, Kemal Tuğrul; Department of Computer Engineering; Sezgin, Tevfik Metin; Faculty Member; College of Engineering; 18632
    Sketch recognition algorithms are engineered and evaluated using publicly available datasets contributed by the sketch recognition community over the years. While existing datasets contain sketches of a limited set of generic objects, each new domain inevitably requires collecting new data for training domain-specific recognizers. This gives rise to two fundamental concerns: First, will the data collection protocol yield ecologically valid data? Second, will the amount of collected data suffice to train sufficiently accurate classifiers? In this paper, we draw attention to these two concerns. We show that the ecological validity of the data collection protocol and the ability to accommodate small datasets are significant factors impacting recognizer accuracy in realistic scenarios. More specifically, using sketch-based gaming as a use case, we show that deep learning methods, as well as more traditional methods, suffer significantly from dataset shift. Furthermore, we demonstrate that in realistic scenarios where data is scarce and expensive, standard measures taken for adapting deep learners to small datasets fall short of comparing favorably with alternatives. Although transfer learning and extensive data augmentation help deep learners, they still perform significantly worse than standard setups (e.g., SVMs and GBMs with standard feature representations). We pose learning from small datasets as a key problem for the deep sketch recognition field, one which has been ignored in the bulk of the existing literature.
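    A sketch of the kind of "standard setup" baseline the paper finds competitive on small datasets: an SVM over fixed feature vectors. The data, dimensionality, and class count below are invented, and the domain-specific feature extraction is omitted:

```python
# SVM over precomputed sketch feature vectors, evaluated with
# cross-validation; random data stands in for real sketch features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 720))    # 60 sketches, 720-d feature vectors
y = rng.integers(0, 4, size=60)   # 4 symbol classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
print(cross_val_score(clf, X, y, cv=5).mean())
```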
  • Publication (Open Access)
    Federated dropout learning for hybrid beamforming with spatial path index modulation in multi-user mmWave-MIMO systems
    (Institute of Electrical and Electronics Engineers (IEEE), 2021) Mishra, Kumar Vijay; Department of Electrical and Electronics Engineering; Ergen, Sinem Çöleri; Elbir, Ahmet Musab; Faculty Member; College of Engineering; 7211; N/A
    Millimeter-wave multiple-input multiple-output (mmWave-MIMO) systems with a small number of radio-frequency (RF) chains have limited multiplexing gain. Spatial path index modulation (SPIM) helps improve this gain by utilizing additional signal bits modulated by the indices of spatial paths. In this paper, we introduce model-based and model-free frameworks for beamformer design in multi-user SPIM-MIMO systems. We first design the beamformers via a model-based manifold optimization algorithm. Then, we leverage federated learning (FL) with dropout learning (DL) to train a learning model on the local datasets of users, who estimate the beamformers by feeding the model with their channel data. DL randomly selects a different set of model parameters during training, thereby further reducing the transmission overhead compared to conventional FL. Numerical experiments show that the proposed framework exhibits higher spectral efficiency than state-of-the-art SPIM-MIMO methods and mmWave-MIMO, which relies on the strongest propagation path. Furthermore, the proposed FL approach provides at least 10 times lower transmission overhead than centralized learning techniques.
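    The transmission-saving idea can be illustrated with a toy federated round in which each user uploads only a random subset of its locally updated parameters; everything below, from the stand-in model to the keep fraction, is invented for illustration and is not the paper's architecture:

```python
# Toy federated round with dropout-style parameter selection: users
# train locally, then transmit only a random fraction of parameters.
import copy
import torch
import torch.nn as nn

global_model = nn.Linear(16, 8)    # stand-in for the beamformer network
KEEP_FRAC = 0.5                    # fraction of parameters transmitted

def client_round(model, x, y):
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=0.01)
    loss = nn.functional.mse_loss(local(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    # Keep a random subset of updated parameters; fall back to the
    # global value elsewhere, so only the kept entries need uploading.
    return {name: torch.where(torch.rand_like(p) < KEEP_FRAC, p,
                              model.state_dict()[name])
            for name, p in local.state_dict().items()}

updates = [client_round(global_model, torch.randn(32, 16), torch.randn(32, 8))
           for _ in range(4)]       # four users
avg = {k: torch.stack([u[k] for u in updates]).mean(0) for k in updates[0]}
global_model.load_state_dict(avg)   # server-side aggregation
```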
  • Publication (Open Access)
    Tri-op redactable blockchains with block modification, removal, and insertion
    (TÜBİTAK, 2022) Dousti, Mohammad Sadeq; Department of Computer Engineering; Küpçü, Alptekin; Faculty Member; College of Engineering; 168060
    In distributed computations and cryptography, it is desirable to record events on a public ledger such that later alterations are computationally infeasible. An implementation of this idea is called a blockchain: a distributed protocol that allows the creation of an immutable ledger. While this idea is very appealing, the ledger may be contaminated with incorrect, illegal, or even dangerous data, and anyone running the blockchain protocol has no option but to store and propagate the unwanted data. The ledger bloats over time, redundant information cannot be removed, and missing data cannot be inserted later. Redactable blockchains were invented to allow the ledger to be mutated in a controlled manner. To date, redactable blockchains support at most two types of redactions: block modification and removal. The next logical step is to support block insertions. However, we show that this seemingly innocuous enhancement renders all previous constructs insecure. We put forward a model for blockchains supporting all three redaction operations, and we construct a blockchain that is provably secure under this formal definition.
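    A toy hash chain shows why naive insertion is destructive: each block commits to its predecessor's hash, so inserting a block mid-chain breaks every later link. This is a deliberately simplified illustration; redactable designs rely on trapdoor-style hashing to relink in a controlled way, and the paper's construction and security model go well beyond this sketch:

```python
# Toy hash chain; inserting a block in the middle invalidates the chain.
import hashlib

def block_hash(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

chain, prev = [], "0" * 64                    # blocks: (prev_hash, payload, hash)
for payload in ["tx-a", "tx-b", "tx-c"]:
    h = block_hash(prev, payload)
    chain.append((prev, payload, h))
    prev = h

def valid(c):
    return all(block_hash(p, d) == h for p, d, h in c) and \
           all(c[i][2] == c[i + 1][0] for i in range(len(c) - 1))

print(valid(chain))                                            # True
new = (chain[0][2], "tx-new", block_hash(chain[0][2], "tx-new"))
chain.insert(1, new)
print(valid(chain))   # False: the old second block no longer links to 'new'
```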
  • Publication (Open Access)
    AfriKI: machine-in-the-loop Afrikaans poetry generation
    (Association for Computational Linguistics (ACL), 2021) Baş, Anıl; Department of Comparative Literature; van Heerden, Imke; Other; College of Social Sciences and Humanities; 318142
    This paper proposes a generative language model called AfriKI. Our approach is based on an LSTM architecture trained on a small corpus of contemporary fiction. With the aim of promoting human creativity, we use the model as an authoring tool to explore machine-in-the-loop Afrikaans poetry generation. To our knowledge, this is the first study to attempt creative text generation in Afrikaans.
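    A minimal word-level LSTM language model of the kind described, with placeholder vocabulary and layer sizes (not AfriKI's actual architecture or training data):

```python
# Word-level LSTM language model producing next-token logits.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 128, 256   # hypothetical sizes

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)                     # logits per position

model = LSTMLanguageModel()
logits = model(torch.randint(0, VOCAB, (1, 12)))   # shape (1, 12, VOCAB)
```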
  • Publication (Open Access)
    Multilingual protest news detection - shared task 1, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A
    Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is carried out in the scope of the shared task Socio-political and Crisis Events Detection at the CASE @ ACL-IJCNLP 2021 workshop. Socio-political event data is utilized for national and international policy- and decision-making, so the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts addressing the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish training and evaluation data; Hindi data is available only for the evaluation of subtask 1. The majority of the 238 submissions use multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL-2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 across all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed multilingual models in a few evaluation scenarios with relatively large amounts of training data.
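    For reference, the F1-macro figures above are unweighted averages of per-class F1 scores; a minimal example with invented labels:

```python
# Macro-averaged F1: unweighted mean of per-class F1 scores.
from sklearn.metrics import f1_score

gold = ["protest", "none", "protest", "none", "protest"]
pred = ["protest", "protest", "protest", "none", "none"]
print(f1_score(gold, pred, average="macro"))
```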