“EUDETECTOR: Leveraging Language Model to Identify EU-Related News”

Paper presentation by project coordinator L3S at The Web Conference!

Authors: Koustav Rudra, Danny Tran, Miroslav Shaltev

Abstract: News media reflects the present state of a country or region to its audiences. Thousands of news articles are posted and consumed by a large, diverse audience across the world. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. In this paper, we try to overcome that limitation, i.e., instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.

For more information: https://msnews.github.io/program.html and https://www2021.thewebconf.org/program/workshops/