Publication: The causal news corpus: annotating causal relations in event sentences from news
Program
KU-Authors
Co-Authors
Tan, Fiona Anting
Caselli, Tommaso
Oostdijk, Nelleke
Nomoto, Tadashi
Hettiarachchi, Hansi
Ameer, Iqra
Uca, Onur
Liza, Farhana Ferdousi
Hu, Tiancheng
Publication Date
Language
Embargo Status
Journal Title
Journal ISSN
Volume Title
Alternative Title
Abstract
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
Source
Publisher
EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
Subject
Computer Science, Interdisciplinary applications, Linguistics
Citation
Has Part
Source
LREC 2022: Thirteenth International Conference on Language Resources and Evaluation