Researcher: Duruşan, Fırat
Name Variants
Duruşan, Fırat
Publication Open Access
Cross-context news corpus for protest event-related knowledge base construction (Massachusetts Institute of Technology (MIT) Press, 2021)
Department of Sociology; N/A; Department of Computer Engineering; Yörük, Erdem; Hürriyetoğlu, Ali; Gürel, Burak; Duruşan, Fırat; Yoltar, Çağrı; Mutlu, Osman; Yüret, Deniz; Faculty Member; Teaching Faculty; Faculty Member; Researcher; Researcher; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; 28982; N/A; 219277; N/A; N/A; N/A; 179996
We describe a gold standard corpus of protest events that comprises various local and international English language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, constructing knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus possesses the variety and quality that are necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. This corpus and the reported results will establish a common foundation in automated protest event collection studies, which is currently lacking in the literature.

Publication Open Access
Random sampling in corpus design: cross-context generalizability in automated multicountry protest event collection (Sage, 2021)
Department of Sociology; Yörük, Erdem; Hürriyetoğlu, Ali; Duruşan, Fırat; Yoltar, Çağrı; Faculty Member; Teaching Faculty; Researcher; Department of Sociology; College of Social Sciences and Humanities; 28982; N/A; N/A; N/A
What is the optimal way of creating a gold standard corpus for training a machine learning system that is designed for automatically collecting protest information in a cross-country context? We show that creating a gold standard corpus for training and testing machine learning models on the basis of randomly chosen news articles from news archives yields better performance than selecting news articles on the basis of keyword filtering, which is the most prevalent method currently used in automated event coding. We advance this new bottom-up approach to ensure generalizability and reliability in cross-country comparative protest event collection from international and local news in different countries, languages, sources and time periods, which entails a large variety of event types, actors, and targets. We present the results of comparing our random-sample approach with keyword filtering. We show that the machine learning algorithms, and particularly state-of-the-art deep learning tools, perform much better when they are trained with the gold standard corpus from a randomly selected set of news articles from China, India, and South Africa. Finally, we also present our approach to overcome the major ethical issues that are intrinsic to protest event coding.

Publication Open Access
A task set proposal for automatic protest information collection across multiple countries (Springer, 2019)
Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Yoltar, Çağrı; Yüret, Deniz; Gürel, Burak; Duruşan, Fırat; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; Faculty Member; Faculty Member; Researcher; Department of Sociology; Department of Computer Engineering; Graduate School of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A; 179996; 219277; N/A; N/A
We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Having tools for collecting event information from data produced in multiple countries enables comparative sociology and politics studies. We have annotated news articles in English from a source and a target country in order to be able to measure the performance of the tools developed using data from one country on data from a different country. Our preliminary experiments have shown that the performance of the tools developed using English texts from India drops to a level that is not usable when they are applied to English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text and will accelerate progress toward developing generalizable NLP systems.