ALLDATA 2021, The Seventh International Conference on Big Data, Small Data, Linked Data and Open Data


Drifting and Popularity: A Study of Time Series Analysis of Topics

Authors:
Muhammad Haseeb UR Rehman Khan
Kei Wakabayashi

Keywords: DTM; LDA; Topic Modeling; Time Series Analysis.

Abstract:
Topic modeling is extensively used for Natural Language Processing (NLP) problems such as summarizing, organizing, and understanding large document datasets. Latent Dirichlet Allocation (LDA) is widely used to extract topics from a collection, whereas the Dynamic Topic Model (DTM) is well known for time-series topic analysis. However, by estimating the number of occurrences of each topic in each time slice, time-series topic popularity can also be obtained with standard LDA. If this information can be extracted with LDA, why do we need DTM, which has a very high computational cost? The purpose of this research is to determine whether time-series topic information can be extracted from LDA or whether DTM is needed. Topic drifting and topic popularity are two fundamental aspects of time-series topic analysis. We conducted experiments on multiple datasets to check the reliability of the information extracted from both models. We used Jensen-Shannon (JS) similarity-based analysis to check for information overlap. We constructed time-series topic popularity graphs for both models from the document-topic distributions and compared the results. Our results show that DTM topics carry notable drifting information in some cases and little or only vague drifting in others. The topic drifting embedded in DTM topics makes this model less favorable for topic popularity analysis. On the other hand, LDA topics, which carry no time transition information, provided concrete results for topic popularity.
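
The two measurements named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `topic_popularity` and `js_similarity` are hypothetical, popularity is assumed here to be the mean document-topic weight within each time slice, and JS similarity is assumed to be one minus the base-2 Jensen-Shannon divergence. The paper may define either quantity differently.

```python
import numpy as np


def topic_popularity(doc_topic, time_slices, n_slices):
    """Mean document-topic weight per time slice (assumed popularity measure).

    doc_topic:   (n_docs, n_topics) array whose rows sum to 1.
    time_slices: (n_docs,) integer slice index of each document.
    Returns an (n_slices, n_topics) array; empty slices stay all zero.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    time_slices = np.asarray(time_slices)
    popularity = np.zeros((n_slices, doc_topic.shape[1]))
    for t in range(n_slices):
        docs_in_slice = doc_topic[time_slices == t]
        if len(docs_in_slice) > 0:
            popularity[t] = docs_in_slice.mean(axis=0)
    return popularity


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence in base 2 (assumed similarity measure).

    p and q are probability vectors of equal length; the result is in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m))


# Toy data: three documents, two topics, two time slices.
toy_doc_topic = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
toy_slices = [0, 0, 1]
toy_popularity = topic_popularity(toy_doc_topic, toy_slices, n_slices=2)
```

Plotting each column of `toy_popularity` against the slice index would give one popularity curve per topic, and `js_similarity` can be applied to pairs of topic-word distributions from the two models to look for overlap.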

Pages: 16 to 22

Copyright: Copyright (c) IARIA, 2021

Publication date: April 18, 2021

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-61208-842-6

Location: Porto, Portugal

Dates: from April 18, 2021 to April 22, 2021