Researcher:
Mutlu, Osman

Job Title

PhD Student

First Name

Osman

Last Name

Mutlu

Name Variants

Mutlu, Osman

Search Results

  • Publication
    Challenges and applications of automated extraction of socio-political events from text (CASE 2021): workshop and shared task report
    (Association for Computational Linguistics (ACL), 2021) Tanev, Hristo; Zavarella, Vanni; Piskorski, Jakub; Yeniterzi, Reyyan; Villavicencio, Aline; Department of Sociology; Department of Sociology; N/A; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yüret, Deniz; Teaching Faculty; Faculty Member; PhD Student; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; N/A; 28982; N/A; 179996
    This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynotes and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
  • Publication
    Team Howard Beale at SemEval-2019 task 4: hyperpartisan news detection with BERT
    (Association for Computational Linguistics (ACL), 2019) Dayanık, Erenay; Mutlu, Osman; Can, Ozan Arkan; PhD Student; PhD Student; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering
    This paper describes our system for SemEval-2019 Task 4: Hyperpartisan News Detection (Kiesel et al., 2019). We use the pretrained BERT (Devlin et al., 2018) architecture and investigate the effect of different fine-tuning regimes on the final classification task. We show that additional pretraining on the news domain improves performance on the Hyperpartisan News Detection task. Our system ranked 8th out of 42 teams with 78.3% accuracy on the held-out test dataset.
  • Publication
    Multilingual protest news detection - shared task 1, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; N/A; Department of Sociology; Hürriyetoğlu, Ali; Mutlu, Osman; Yörük, Erdem; Teaching Faculty; PhD Student; Faculty Member; Department of Sociology; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Social Sciences and Humanities; N/A; N/A; 28982
    Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi is available only for the evaluation of subtask 1. The majority of the submissions, which are 238 in total, are created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which there is a relatively large amount of training data.
  • Publication
    PROTEST-ER: retraining BERT for protest event extraction
    (Association for Computational Linguistics (ACL), 2021) Caselli, Tommaso; Basile, Angelo; N/A; Department of Sociology; Mutlu, Osman; Hürriyetoğlu, Ali; PhD Student; Teaching Faculty; Department of Sociology; Graduate School of Sciences and Engineering; College of Social Sciences and Humanities; N/A; N/A
    We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting data on the same text genres (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT by 8.1 points on out-of-domain data. Our best performing models reach 51.91-46.39 F1 across both domains.
  • Publication (Open Access)
    PROTEST-ER: retraining BERT for protest event extraction
    (Association for Computational Linguistics (ACL), 2021) Caselli, Tommaso; Basile, Angelo; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Mutlu, Osman; Teaching Faculty; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; College of Engineering
    We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting data on the same text genres (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT by 8.1 points on out-of-domain data. Our best performing models reach 51.91-46.39 F1 across both domains.
  • Publication (Open Access)
    Team Howard Beale at SemEval-2019 task 4: hyperpartisan news detection with BERT
    (Association for Computational Linguistics (ACL), 2019) Dayanık, Erenay; Department of Computer Engineering; Mutlu, Osman; Can, Ozan Arkan; Researcher; Department of Computer Engineering; Graduate School of Sciences and Engineering
    This paper describes our system for SemEval-2019 Task 4: Hyperpartisan News Detection (Kiesel et al., 2019). We use the pretrained BERT (Devlin et al., 2018) architecture and investigate the effect of different fine-tuning regimes on the final classification task. We show that additional pretraining on the news domain improves performance on the Hyperpartisan News Detection task. Our system ranked 8th out of 42 teams with 78.3% accuracy on the held-out test dataset.
  • Publication (Open Access)
    Cross-context news corpus for protest event-related knowledge base construction
    (Massachusetts Institute of Technology (MIT) Press, 2021) Department of Sociology; N/A; Department of Computer Engineering; Yörük, Erdem; Hürriyetoğlu, Ali; Gürel, Burak; Duruşan, Fırat; Yoltar, Çağrı; Mutlu, Osman; Yüret, Deniz; Faculty Member; Teaching Faculty; Faculty Member; Researcher; Researcher; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; 28982; N/A; 219277; N/A; N/A; N/A; 179996
    We describe a gold standard corpus of protest events that comprises various local and international English-language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, constructing knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus possesses the variety and quality that are necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. This corpus and the reported results will establish a common foundation in automated protest event collection studies, which is currently lacking in the literature.
  • Publication (Open Access)
    Multilingual protest news detection - shared task 1, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A
    Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi is available only for the evaluation of subtask 1. The majority of the submissions, which are 238 in total, are created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which there is a relatively large amount of training data.
  • Publication (Open Access)
    Challenges and applications of automated extraction of socio-political events from text (CASE 2021): workshop and shared task report
    (Association for Computational Linguistics (ACL), 2021) Tanev, Hristo; Zavarella, Vanni; Piskorski, Jakub; Yeniterzi, Reyyan; Villavicencio, Aline; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yüret, Deniz; Teaching Faculty; Faculty Member; Researcher; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; N/A; 28982; N/A; 179996
    This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynotes and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
  • Publication (Open Access)
    A task set proposal for automatic protest information collection across multiple countries
    (Springer, 2019) Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Yoltar, Çağrı; Yüret, Deniz; Gürel, Burak; Duruşan, Fırat; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; Faculty Member; Faculty Member; Researcher; Department of Sociology; Department of Computer Engineering; Graduate School of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A; 179996; 219277; N/A; N/A
    We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Having tools for collecting event information from data produced in multiple countries enables comparative sociology and politics studies. We have annotated news articles in English from a source and a target country in order to be able to measure the performance of the tools developed using data from one country on data from a different country. Our preliminary experiments have shown that the performance of the tools developed using English texts from India drops to a level that is not usable when they are applied to English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text and will accelerate progress toward developing generalizable NLP systems.