Multilingual protest news detection - shared task 1, CASE 2021

dc.contributor.coauthor	Liza, Farhana Ferdousi
dc.contributor.coauthor	Kumar, Ritesh
dc.contributor.coauthor	Ratan, Shyam
dc.contributor.department	Department of Sociology
dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Hürriyetoğlu, Ali
dc.contributor.kuauthor	Mutlu, Osman
dc.contributor.kuauthor	Yörük, Erdem
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	College of Social Sciences and Humanities
dc.contributor.schoolcollegeinstitute	Graduate School of Sciences and Engineering
dc.date.accessioned	2024-11-09T12:27:09Z
dc.date.issued	2021
dc.description.abstract	Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi is available only for the evaluation of subtask 1. The majority of the 238 submissions were created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which a relatively large amount of training data is available.
dc.description.fulltext	YES
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.openaccess	YES
dc.description.publisherscope	International
dc.description.sponsoredbyTubitakEu	EU
dc.description.sponsorship	European Union (EU)
dc.description.sponsorship	Horizon 2020
dc.description.sponsorship	European Research Council (ERC)
dc.description.sponsorship	European Commission (EC)
dc.description.sponsorship	Business and Local Government Data Research Centre
dc.description.sponsorship	Economic and Social Research Council (ESRC)
dc.description.studentonlypublication	No
dc.description.studentpublication	Yes
dc.description.version	Publisher version
dc.identifier.WoSQuartile	N/A
dc.identifier.doi	10.18653/v1/2021.case-1.11
dc.identifier.embargo	NO
dc.identifier.filenameinventoryno	IR03277
dc.identifier.isbn	978-1-954085-79-4
dc.identifier.scopus	2-s2.0-85110579899
dc.identifier.uri	https://doi.org/10.18653/v1/2021.case-1.11
dc.identifier.wos	694853100011
dc.keywords	Embedding
dc.keywords	Named entity recognition
dc.keywords	Entailment
dc.language.iso	eng
dc.publisher	Association for Computational Linguistics (ACL)
dc.relation.grantno	714868
dc.relation.grantno	ES/S007156/1
dc.relation.ispartof	Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
dc.relation.uri	http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10061
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.subject	Interdisciplinary applications
dc.subject	Linguistics
dc.title	Multilingual protest news detection - shared task 1, CASE 2021
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Hürriyetoğlu, Ali
local.contributor.kuauthor	Yörük, Erdem
local.contributor.kuauthor	Mutlu, Osman
relation.isGoalOfPublication	0e554614-34c1-41f1-b6c4-0096c1d59305
relation.isGoalOfPublication.latestForDiscovery	0e554614-34c1-41f1-b6c4-0096c1d59305
relation.isOrgUnitOfPublication	10f5be47-fab1-42a1-af66-1642ba4aff8e
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	10f5be47-fab1-42a1-af66-1642ba4aff8e
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	3f7621e3-0d26-42c2-af64-58a329522794
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 10061.pdf
Size: 302.75 KB
Format: Adobe Portable Document Format

Collections

Publications with Fulltext