eKNOW 2014, The Sixth International Conference on Information, Process, and Knowledge Management
Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation
Authors:
Toshiaki Funatsu
Yoichi Tomiura
Emi Ishita
Kosuke Furusawa
Keywords: LDA; topic analysis; Gibbs sampling
Abstract:
Identifying the topics of a document helps readers understand its content efficiently. Latent Dirichlet Allocation (LDA) is a method for topic analysis. In LDA, a topic is treated as a latent (unobservable) variable that defines a probability distribution over words, and a topic is interpreted through the list of words that have high probability under it. This approach works well when topics are estimated from many documents with varied content. However, when the documents are a set of article abstracts retrieved by a keyword search, their contents are narrow and similar to one another, and the topics are difficult to interpret using conventional LDA alone. We propose a method that estimates representative words for each topic from an LDA result. Experimental results show that our method provides better information for interpreting a topic than LDA does.
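The conventional way of reading an LDA topic described in the abstract, listing its highest-probability words, can be sketched as follows. This is a minimal illustration and not the method proposed in the paper: the helper `top_words` and the two toy topic-word distributions are hypothetical, with made-up probabilities chosen only to show how the top-word lists overlap when documents are similar.

```python
from typing import Dict, List, Tuple


def top_words(topic_word_probs: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n highest-probability words of one topic (hypothetical helper)."""
    # Sort by descending probability; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(topic_word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Toy topic-word distributions. The numbers are invented for illustration
# and are not taken from the paper; each distribution sums to 1.
topics = {
    "topic_0": {"model": 0.30, "topic": 0.25, "word": 0.20, "search": 0.15, "library": 0.10},
    "topic_1": {"model": 0.28, "topic": 0.22, "search": 0.20, "library": 0.18, "word": 0.12},
}

for name, dist in topics.items():
    # Print only the words, dropping the probabilities.
    print(name, [word for word, _ in top_words(dist)])
```

In this toy case both topics share their two most probable words, which is the kind of overlap that makes a plain high-probability list hard to interpret for a narrow document set.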
Pages: 112 to 117
Copyright: Copyright (c) IARIA, 2014
Publication date: March 23, 2014
Published in: conference
ISSN: 2308-4375
ISBN: 978-1-61208-329-2
Location: Barcelona, Spain
Dates: from March 23, 2014 to March 27, 2014