SEMAPRO 2011, The Fifth International Conference on Advances in Semantic Processing


Local theme detection and annotation with keywords for narrow and wide domain short text collections

Authors:
Svetlana Vladimirovna Popova
Ivan Alexandrovich Khodyrev

Keywords: narrow domain short text clustering; automatic annotation; hierarchical clustering; Pearson correlation

Abstract:
This paper presents a clustering approach for text collections, together with automatic detection of topics and keywords for the resulting clusters. The research focuses on narrow-domain short texts, such as short news items and abstracts of scientific papers. We propose a term selection method that significantly improves the quality of hierarchical clustering, and an automatic algorithm that annotates clusters with keywords and topic names. The clustering results compare well with those of other approaches, and the algorithm also extracts keywords for each cluster using information about the size of the cluster and word frequencies in documents.
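
The abstract does not give the algorithm itself, so the following is only a rough illustrative sketch of the general idea, not the authors' method. It assumes whitespace tokenization, raw term-frequency vectors, Pearson correlation as the document similarity, average-linkage agglomerative clustering, and a hypothetical `min_share` threshold that keeps a word as a cluster keyword when it occurs in at least that share of the cluster's documents. All function names and the toy texts are made up for illustration, and the paper's term selection step is not reproduced.

```python
from collections import Counter
from math import sqrt


def term_vectors(docs):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster(vectors, k):
    """Average-linkage agglomerative clustering; returns k lists of document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(a, b):
        # Mean pairwise Pearson correlation between the two clusters.
        total = sum(pearson(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the most similar pair of clusters.
        _, p, q = max(
            (sim(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters


def keywords(docs, members, min_share=0.5):
    """Words that occur in at least min_share of the cluster's documents."""
    doc_freq = Counter()
    for i in members:
        doc_freq.update(set(docs[i].lower().split()))
    return sorted(w for w, n in doc_freq.items() if n / len(members) >= min_share)


# Toy collection (invented for this sketch).
docs = [
    "neural network training",
    "neural network layers",
    "protein folding simulation",
    "protein folding energy",
]
vocab, vectors = term_vectors(docs)
for members in cluster(vectors, 2):
    print(members, keywords(docs, members, min_share=1.0))
```

On real short texts a threshold like this would also let common function words through, which is one reason a dedicated term selection step, as the abstract proposes, matters before clustering.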

Pages: 49 to 55

Copyright: Copyright (c) IARIA, 2011

Publication date: November 20, 2011

Published in: conference

ISSN: 2308-4510

ISBN: 978-1-61208-175-5

Location: Lisbon, Portugal

Dates: from November 20, 2011 to November 25, 2011