SEMAPRO 2011, The Fifth International Conference on Advances in Semantic Processing
Local theme detection and annotation with keywords for narrow and wide domain short text collections
Authors:
Svetlana Vladimirovna Popova
Ivan Alexandrovich Khodyrev
Keywords: narrow domain short text clustering; automatic annotation; hierarchical clustering; Pearson correlation
Abstract:
This paper presents a clustering approach for text collections, together with automatic detection of topics and keywords for the resulting clusters. The present research focuses on narrow domain short texts, such as short news items and scientific paper abstracts. We propose a term selection method that helps significantly improve hierarchical clustering quality, and an automatic algorithm that annotates clusters with keywords and topic names. The clustering results compare well with those of other approaches, and our algorithm also extracts keywords for each cluster using information about the cluster's size and the word frequencies in its documents.
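The abstract does not specify the term selection rule, the similarity measure, or the keyword score, so the following is only a generic sketch under stated assumptions, not the authors' algorithm. It assumes a document-frequency threshold for term selection (hypothetical), Pearson correlation between term-frequency vectors as the document similarity (Pearson correlation appears in the paper's keywords, but its exact role is not described here), average-linkage agglomerative clustering, and a hypothetical keyword score that normalizes term occurrence by cluster size. The function names `select_terms`, `cluster`, and `keywords` are illustrative.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it on runs of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def select_terms(docs_tokens, min_df=2):
    """Hypothetical term selection: keep terms that occur in >= min_df documents."""
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return sorted(term for term, n in df.items() if n >= min_df)


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster(vectors, k):
    """Average-linkage agglomerative clustering with Pearson correlation as similarity."""
    clusters = [[i] for i in range(len(vectors))]
    sim = {(i, j): pearson(vectors[i], vectors[j])
           for i, j in combinations(range(len(vectors)), 2)}

    def link(a, b):
        # Mean pairwise similarity between the documents of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average correlation (a < b).
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def keywords(docs_tokens, members, vocab, top=3):
    """Hypothetical score: share of the cluster's documents containing a term
    minus the share of the remaining documents containing it."""
    inside = [set(docs_tokens[i]) for i in members]
    outside = [set(t) for i, t in enumerate(docs_tokens) if i not in members]

    def share(sets, term):
        return sum(term in s for s in sets) / len(sets) if sets else 0.0

    ranked = sorted(vocab, key=lambda t: (-(share(inside, t) - share(outside, t)), t))
    return ranked[:top]


if __name__ == "__main__":
    docs = [
        "neural network training with gradient descent",
        "gradient descent for neural network optimization",
        "protein folding and protein structure",
        "protein structure prediction from sequence",
    ]
    tokens = [tokenize(d) for d in docs]
    vocab = select_terms(tokens, min_df=2)
    counts = [Counter(t) for t in tokens]
    vectors = [[c[term] for term in vocab] for c in counts]
    for members in cluster(vectors, k=2):
        print(members, keywords(tokens, members, vocab, top=2))
```

On this toy collection the script prints `[0, 1] ['descent', 'gradient']` and `[2, 3] ['protein', 'structure']`. Dividing by the cluster size keeps the keyword score comparable across clusters of different sizes; the actual weighting used in the paper may differ.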
Pages: 49 to 55
Copyright: Copyright (c) IARIA, 2011
Publication date: November 20, 2011
Published in: conference
ISSN: 2308-4510
ISBN: 978-1-61208-175-5
Location: Lisbon, Portugal
Dates: from November 20, 2011 to November 25, 2011