ALLDATA 2016, The Second International Conference on Big Data, Small Data, Linked Data and Open Data


Link Detection Based on Named Entity Keywords in Turkish News Corpus

Authors:
Hamid Ahmadlouei
Hayri Sever
Erhan Mengusoglu

Keywords: story link detection, topic detection and tracking, vector space model, information retrieval, named entity

Abstract:
In this study, we investigate the influence of Named Entities (NEs) on the task of Story Link Detection (SLD), which is one of the important subtasks of Topic Detection and Tracking (TDT). TDT aims at developing algorithms for either clustering documents, e.g., online news, and then tracking new ones with respect to a predetermined topic, or detecting a new topic. SLD, in turn, focuses on determining whether two stories are about the same topic. The Vector Space Model (VSM) was used as the base method in this work. The performance of VSM is reported on all words and on NEs separately. Named entity intersection checking between two news stories is also studied as a way of determining whether the two stories are linked. Additionally, we investigated the effect of controlled entity intersection on the performance of the previous VSM-based methods. Combining these methods improves the estimation of whether two stories are linked. Experimental results reported in the literature indicate that the NEs inside news stories play an important role in VSM performance. NE intersection checks between news stories, combined with VSM, have a substantial effect on the performance of VSM in the SLD task. Since this study focuses on the impact of combining NEs when determining similarity between news stories, we extracted NEs manually from the dataset instead of using the automatic NE extraction methods that are common in the literature. All of the experiments in this work are based on the Turkish news corpus named BIL-COL2005, which is compatible with TDT standards.
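
The combination described in the abstract, a VSM similarity score gated by a named entity intersection check, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the raw term-frequency weighting, the default thresholds, and the rule "link only if enough entities are shared and the cosine similarity is high enough" are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term frequencies.

    Assumption: plain term counts stand in for whatever term weighting
    the paper actually uses.
    """
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both stories contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def stories_linked(words_a, words_b, entities_a, entities_b,
                   sim_threshold=0.2, min_shared_entities=1):
    """Hypothetical link decision combining both signals.

    The thresholds are illustrative values, not figures from the paper.
    """
    shared = set(entities_a) & set(entities_b)
    # Named entity intersection check: reject pairs with too few shared NEs.
    if len(shared) < min_shared_entities:
        return False
    # VSM check on all words of the two stories.
    return cosine_similarity(words_a, words_b) >= sim_threshold
```

In this sketch the entity lists are passed in directly, mirroring the manual NE extraction mentioned in the abstract; an automatic extractor could supply them instead.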

Pages: 71 to 75

Copyright: Copyright (c) IARIA, 2016

Publication date: February 21, 2016

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-61208-457-2

Location: Lisbon, Portugal

Dates: from February 21, 2016 to February 25, 2016