Publications with Fulltext
Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6
10 results
Search Results
Publication Open Access
Discovering Black Lives Matter events in the United States: Shared Task 3, CASE 2021
(Association for Computational Linguistics (ACL), 2021) Giorgi, Salvatore; Zavarella, Vanni; Tanev, Hristo; Stefanovitch, Nicolas; Hwang, Sy; Hettiarachchi, Hansi; Ranasinghe, Tharindu; Kalyan, Vivek; Tan, Paul; Tan, Shaun; Andrews, Martin; Hu, Tiancheng; Stoehr, Niklas; Re, Francesco Ignazio; Vegh, Daniel; Atzenhofer, Dennis; Curtis, Brenda; Department of Sociology; Hürriyetoğlu, Ali; Teaching Faculty; Department of Sociology; College of Social Sciences and Humanities
Evaluating state-of-the-art event detection systems on determining the spatio-temporal distribution of events on the ground is performed infrequently. However, the ability to both (1) extract events "in the wild" from text and (2) properly evaluate event detection systems has the potential to support a wide variety of tasks such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. Therefore, we study the performance of the best event detection systems on detecting Black Lives Matter (BLM) events from tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide and the BLM movement, which was once mostly relegated to the United States, was now seeing activity globally. This shared task asks participants to identify BLM-related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system's ability to capture the evolution of protest events both temporally and spatially.
Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 (Spearman) and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall (max. 5.08), confirming the high impact of media sourcing in the modelling of protest movements.

Publication Open Access
Does time extend asymmetrically into the past and the future? a multitask crosscultural study
(Cambridge University Press (CUP), 2022) Callizo-Romero, Carmen; Tutnjevic, Slavica; Pandza, Maja; Ouellet, Marc; Kranjec, Alexander; Ilic, Sladjana; Gu, Yan; Chahboun, Sobh; Casasanto, Daniel; Santiago, Julio; Department of Psychology; Göksun, Tilbe; Faculty Member; Department of Psychology; College of Social Sciences and Humanities; 47278
Does temporal thought extend asymmetrically into the past and the future? Do asymmetries depend on cultural differences in temporal focus? Some studies suggest that people in Western (arguably future-focused) cultures perceive the future as being closer, more valued, and deeper than the past (a future asymmetry), while the opposite is shown in East Asian (arguably past-focused) cultures. The proposed explanations of these findings predict a negative relationship between past and future: the more we delve into the future, the less we delve into the past. Here, we report findings that pose a significant challenge to this view. We presented several tasks previously used to measure temporal asymmetry (self-continuity, time discounting, temporal distance, and temporal depth) and two measures of temporal focus to American, Spanish, Serbian, Bosniak, Croatian, Moroccan, Turkish, and Chinese participants (total N = 1,075). There was an overall future asymmetry in all tasks except for temporal distance, but the asymmetry only varied with cultural temporal focus in time discounting.
Past and future held a positive (instead of negative) relation in the mind: the more we delve into the future, the more we delve into the past. Finally, the findings suggest that temporal thought has a complex underlying structure.

Publication Open Access
PROTEST-ER: retraining BERT for protest event extraction
(Association for Computational Linguistics (ACL), 2021) Caselli, Tommaso; Basile, Angelo; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Mutlu, Osman; Teaching Faculty; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; College of Engineering
We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting data on the same text genres (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT on out-of-domain data by 8.1 points. Our best performing models reach 51.91-46.39 F1 across both domains.

Publication Open Access
Linguistic and nonlinguistic evaluation of motion events in a path-focused language
(Cambridge University Press (CUP), 2022) Aktan Erciyes, Aslı; Department of Psychology; Akbuğa, Yiğitcan Emir; Dik, Feyza Nur; Göksun, Tilbe; Faculty Member; Department of Psychology; Graduate School of Social Sciences and Humanities; College of Social Sciences and Humanities; N/A; N/A; 47278
This study examines how properties of path (the trajectory of motion) and manner (how an action is performed) components of motion events are reflected in linguistic and nonlinguistic motion event conceptualization in a path-focused language, Turkish.
In two experiments, we investigated how path and manner differed in salience (i.e., prominence) and ease of expression (EoE, i.e., effort of describing), and how these factors were related to lexicalization and similarity judgments of motion events. In Experiment 1, participants rated motion events based on path and manner salience and EoE and expressed path and manner in a written format. Results indicated that manner was rated as more salient and path as easier to express. Path salience and EoE were related to both types (i.e., number of different expressions) and the total number of paths and manners used. However, manner EoE but not salience was associated with only types and the total number of manners used. In Experiment 2, participants rated the similarity of motion event pairs created using the ratings in Experiment 1. We found that higher manner salience and EoE difference were associated with lower similarity ratings. These findings suggest that salience and EoE of path and manner are related to both linguistic and nonlinguistic aspects of motion event conceptualization.

Publication Open Access
Do typological differences in the expression of causality influence preschool children's causal event construal?
(Cambridge University Press (CUP), 2022) Ger, Ebru; Stoll, Sabine; Daum, Moritz M.; Department of Psychology; Küntay, Aylin C.; Göksun, Tilbe; Faculty Member; Department of Psychology; College of Social Sciences and Humanities; 178879; 47278
This study investigated whether cross-linguistic differences in causal expressions influence the mapping of causal language on causal events in three- to four-year-old Swiss-German learners and Turkish learners. In Swiss-German, causality is mainly expressed syntactically with lexical causatives (e.g., asse 'to eat' vs. fuettere 'to feed'). In Turkish, causality is expressed both syntactically and morphologically - with a verbal suffix (e.g., yemek 'to eat' vs. yeDIRmek 'to feed').
Moreover, unlike Swiss-German, Turkish allows argument ellipsis (e.g., 'The mother feeds ∅'). Here, we used pseudo-verbs to test whether and how well Swiss-German-learning children inferred a causal meaning from lexical causatives compared to Turkish-learning children tested in three conditions: lexical causatives, morphological causatives, and morphological causatives with object ellipsis. Swiss-German-learning children and Turkish-learning children in all three conditions reliably inferred causal meanings, and did so to a similar extent. The findings suggest that, as young as age 3, children learning two different languages similarly make use of language-specific causality cues (syntactic and morphological alike) to infer causal meanings.

Publication Open Access
Multilingual protest news detection - shared task 1, CASE 2021
(Association for Computational Linguistics (ACL), 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A
Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3).
Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi is available only for the evaluation of subtask 1. The majority of the submissions, which are 238 in total, are created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results.
Monolingual models outperformed the multilingual models in a few evaluation scenarios in which a relatively large amount of training data is available.

Publication Open Access
Challenges and applications of automated extraction of socio-political events from text (CASE 2021): workshop and shared task report
(Association for Computational Linguistics (ACL), 2021) Tanev, Hristo; Zavarella, Vanni; Piskorski, Jakub; Yeniterzi, Reyyan; Villavicencio, Aline; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yüret, Deniz; Teaching Faculty; Faculty Member; Researcher; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; N/A; 28982; N/A; 179996
This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams have registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events.
The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.

Publication Open Access
Early parental causal language input predicts children's later causal verb understanding
(Cambridge University Press (CUP), 2021) Aktan Erciyes, Aslı; Department of Psychology; Göksun, Tilbe; Faculty Member; Department of Psychology; College of Social Sciences and Humanities; 47278
How does parental causal input relate to children's later comprehension of causal verbs? Causal constructions in verbs differ across languages. Turkish has both lexical and morphological causatives. We asked whether (1) parental causal language input varied for different types of play (guided vs. free play), (2) early parental causal language input predicted children's causal verb understanding. Twenty-nine infants participated at three timepoints. Parents used lexical causatives more than morphological ones for guided-play for both timepoints, but for free-play, the same difference was only found at Time 2. For Time 3, children were tested on a verb comprehension and a vocabulary task. Morphological causative input, but not lexical causative input, during free-play predicted children's causal verb comprehension. For guided-play, the same relation did not hold. Findings suggest a role of specific types of causal input on children's understanding of causal verbs that are received in certain play contexts.

Publication Open Access
Artificial bandwidth extension of spectral envelope along a Viterbi path
(Elsevier, 2013) Department of Computer Engineering; Yağlı, Can; Turan, Mehmet Ali Tuğtekin; Erzin, Engin; Master Student; Faculty Member; Department of Computer Engineering; College of Engineering; N/A; N/A; 34503
In this paper, we propose a hidden Markov model (HMM)-based wideband spectral envelope estimation method for the artificial bandwidth extension problem.
The proposed HMM-based estimator decodes an optimal Viterbi path based on the temporal contour of the narrowband spectral envelope and then performs the minimum mean square error (MMSE) estimation of the wideband spectral envelope on this path. Experimental evaluations are performed to compare the proposed estimator to the state-of-the-art HMM and Gaussian mixture model based estimators using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wideband perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with the A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed wideband spectral envelope estimator consistently improves performance over the state-of-the-art estimators. (C) 2012 Elsevier B.V. All rights reserved.

Publication Open Access
Mukayese: Turkish NLP strikes back
(Association for Computational Linguistics (ACL), 2022) Kurtuluş, Emirhan; Göktoğan, Arda; Department of Computer Engineering; Yüret, Deniz; Safaya, Ali; Faculty Member; Department of Computer Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/ Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering; Graduate School of Sciences and Engineering; 179996; N/A
Having sufficient resources for language X lifts it from the under-resourced languages class, but not necessarily from the under-researched class. In this paper, we address the problem of the absence of organized benchmarks in the Turkish language. We demonstrate that languages such as Turkish are left behind the state of the art in NLP applications. As a solution, we present MUKAYESE, a set of NLP benchmarks for the Turkish language that contains several NLP tasks. We work on one or more datasets for each benchmark and present two or more baselines.
Moreover, we present four new benchmarking datasets in Turkish for language modeling, sentence segmentation, and spell checking.