Publication: The causal news corpus: annotating causal relations in event sentences from news
Co-Authors
Tan, Fiona Anting
Caselli, Tommaso
Oostdijk, Nelleke
Nomoto, Tadashi
Hettiarachchi, Hansi
Ameer, Iqra
Uca, Onur
Liza, Farhana Ferdousi
Hu, Tiancheng
Abstract
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether they contain causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on the test set, and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
Publisher
EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
Subject
Computer Science, Interdisciplinary applications, Linguistics
Source
LREC 2022: Thirteenth International Conference on Language Resources and Evaluation
