Publications without Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/3

Search Results

Now showing 1 - 5 of 5
  • Publication
    Challenges and applications of automated extraction of socio-political events from text (case 2021): workshop and shared task report
    (Association for Computational Linguistics (ACL), 2021) Tanev, Hristo; Zavarella, Vanni; Piskorski, Jakub; Yeniterzi, Reyyan; Villavicencio, Aline; Department of Sociology; Department of Sociology; N/A; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yüret, Deniz; Teaching Faculty; Faculty Member; PhD Student; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; N/A; 28982; N/A; 179996
    This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year's workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynotes and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
  • Publication
    The causal news corpus: annotating causal relations in event sentences from news
    (EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, 2022) Tan, Fiona Anting; Caselli, Tommaso; Oostdijk, Nelleke; Nomoto, Tadashi; Hettiarachchi, Hansi; Ameer, Iqra; Uca, Onur; Liza, Farhana Ferdousi; Hu, Tiancheng; Department of Sociology; Hürriyetoğlu, Ali; Teaching Faculty; Department of Sociology; College of Social Sciences and Humanities; N/A
    Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
  • Publication
    Multilingual protest news detection - shared task 1, CASE 2021
    (Assoc Computational Linguistics-Acl, 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; N/A; Department of Sociology; Hürriyetoğlu, Ali; Mutlu, Osman; Yörük, Erdem; Teaching Faculty; PhD Student; Faculty Member; Department of Sociology; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Social Sciences and Humanities; N/A; N/A; 28982
    Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Hindi data is available only for the evaluation of subtask 1. The majority of the 238 submissions in total were created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which a relatively large amount of training data is available.
  • Publication
    Discovering black lives matter events in the United States: shared task 3, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Giorgi, Salvatore; Zavarella, Vanni; Tanev, Hristo; Stefanovitch, Nicolas; Hwang, Sy; Hettiarachchi, Hansi; Ranasinghe, Tharindu; Kalyan, Vivek; Tan, Paul; Tan, Shaun; Andrews, Martin; Hu, Tiancheng; Stoehr, Niklas; Re, Francesco Ignazio; Vegh, Daniel; Atzenhofer, Dennis; Curtis, Brenda; Department of Sociology; Hürriyetoğlu, Ali; Teaching Faculty; Department of Sociology; College of Social Sciences and Humanities; N/A
    Evaluation of state-of-the-art event detection systems on determining the spatio-temporal distribution of events on the ground is performed infrequently. However, the ability to both (1) extract events "in the wild" from text and (2) properly evaluate event detection systems has the potential to support a wide variety of tasks such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. Therefore, we study the performance of the best event detection systems on detecting Black Lives Matter (BLM) events from tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide and the BLM movement, which was once mostly relegated to the United States, was now seeing activity globally. This shared task asks participants to identify BLM-related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system's ability to capture the evolution of protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 (Spearman) and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall (max. 5.08), confirming the high impact of media sourcing in the modelling of protest movements.
  • Publication
    PROTEST-ER: retraining BERT for protest event extraction
    (Assoc Computational Linguistics-Acl, 2021) Caselli, Tommaso; Basile, Angelo; N/A; Department of Sociology; Mutlu, Osman; Hürriyetoğlu, Ali; PhD Student; Teaching Faculty; Department of Sociology; Graduate School of Sciences and Engineering; College of Social Sciences and Humanities; N/A; N/A
    We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops even on data from the same text genre (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT by 8.1 points on out-of-domain data. Our best performing models reach 51.91-46.39 F1 across both domains.