Publication:
Multilingual protest news detection - shared task 1, CASE 2021

dc.contributor.coauthor: Liza, Farhana Ferdousi
dc.contributor.coauthor: Kumar, Ritesh
dc.contributor.coauthor: Ratan, Shyam
dc.contributor.department: Department of Sociology
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Hürriyetoğlu, Ali
dc.contributor.kuauthor: Mutlu, Osman
dc.contributor.kuauthor: Yörük, Erdem
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Social Sciences and Humanities
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2024-11-09T12:27:09Z
dc.date.issued: 2021
dc.description.abstract: The shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021 benchmarks state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection. Socio-political event data is used for national and international policy- and decision-making; therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish data for both training and evaluation. Hindi data is available only for the evaluation of subtask 1. The majority of the 238 submissions were created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which a relatively large amount of training data is available.
dc.description.fulltext: YES
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: EU
dc.description.sponsorship: European Union (EU)
dc.description.sponsorship: Horizon 2020
dc.description.sponsorship: European Research Council (ERC)
dc.description.sponsorship: European Commission (EC)
dc.description.sponsorship: Business and Local Government Data Research Centre
dc.description.sponsorship: Economic and Social Research Council (ESRC)
dc.description.version: Publisher version
dc.identifier.doi: 10.18653/v1/2021.case-1.11
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03277
dc.identifier.isbn: 978-1-954085-79-4
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85110579899
dc.identifier.uri: https://hdl.handle.net/20.500.14288/1734
dc.identifier.wos: 694853100011
dc.keywords: Embedding
dc.keywords: Named entity recognition
dc.keywords: Entailment
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.grantno: 714868
dc.relation.grantno: ES/S007156/1
dc.relation.ispartof: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10061
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Interdisciplinary applications
dc.subject: Linguistics
dc.title: Multilingual protest news detection - shared task 1, CASE 2021
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Hürriyetoğlu, Ali
local.contributor.kuauthor: Yörük, Erdem
local.contributor.kuauthor: Mutlu, Osman
local.publication.orgunit1: College of Social Sciences and Humanities
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Sociology
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 10f5be47-fab1-42a1-af66-1642ba4aff8e
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 10f5be47-fab1-42a1-af66-1642ba4aff8e
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 3f7621e3-0d26-42c2-af64-58a329522794
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 10061.pdf
Size: 302.75 KB
Format: Adobe Portable Document Format