Publication: The causal news corpus: annotating causal relations in event sentences from news
Program
KU-Authors
Co-Authors
Tan, Fiona Anting
Caselli, Tommaso
Oostdijk, Nelleke
Nomoto, Tadashi
Hettiarachchi, Hansi
Ameer, Iqra
Uca, Onur
Liza, Farhana Ferdousi
Hu, Tiancheng
Advisor
Publication Date
2022
Language
English
Type
Conference proceeding
Abstract
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.
Description
Source:
LREC 2022: Thirteenth International Conference on Language Resources and Evaluation
Publisher:
EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
Keywords:
Subject
Computer Science, Interdisciplinary Applications, Linguistics