ALLDATA 2019, The Fifth International Conference on Big Data, Small Data, Linked Data and Open Data


Research of Topics Discovery and Tech Evolution Based on Text Preprocessed Latent Dirichlet Allocation Model

Authors:
Li Wang
Xiang Shen
Xiwen Liu

Keywords: LDA; automatic term identification; preprocessed text; visualization

Abstract:
Computational science and data science are inspiring intelligent analysis and information services today, and machine-learning-based text analysis is changing traditional analysis methods. This article discusses the benefits of unsupervised learning approaches in patent text mining. Patent data from the GaN industry were preprocessed with a filter model based on NLTK to identify technical terms, and were then clustered with a Latent Dirichlet Allocation (LDA) model to find latent topics, which were visualized. Using a group operation, the newly emerging terms of each year were ranked by TF-IDF to reveal how the research and development (R&D) focus evolved. This research demonstrates the proposed method on 26,854 GaN patents. The results identify 20 R&D topics in the GaN industry, each characterized by technical terms, and trace the evolution of the R&D focus through each year's newly emerging terms, which provides clues for more detailed analyses later. Our results show an efficient way to trace the evolution of technology focus in large-scale text data.
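The abstract does not say which LDA implementation or hyperparameters were used. As a dependency-free illustration only, the sketch below fits LDA by collapsed Gibbs sampling over documents that are already lists of technical terms. The function name `lda_gibbs` and the values of `alpha`, `beta` and `iterations` are hypothetical choices, not the authors' settings.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists, e.g. the technical terms of each patent.
    Returns (doc_topic, topic_word): per-document topic proportions and,
    for each topic, a dict mapping every vocabulary term to its probability.
    alpha/beta/iterations are assumed defaults, not values from the paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []  # current topic assignment of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimates of theta (doc-topic) and phi (topic-word).
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][word_id[w]] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return doc_topic, topic_word
```

The top terms of topic `k` can then be listed with `sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:10]`, which is one way to label topics such as the 20 reported in the abstract. For tens of thousands of patents, an optimized library implementation would be the practical choice; this loop only shows the sampling logic.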
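The abstract states that newly emerging terms were grouped by year and ranked by TF-IDF, but it does not define when a term counts as "new" or what serves as a TF-IDF document. A minimal sketch, assuming that a term is new in the first year it appears and that each year's pooled terms form one document; the function name `emerging_terms_by_year`, the `log(N/df) + 1` idf variant, and the toy patent terms are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def emerging_terms_by_year(records, top_n=10):
    """Rank newly emerging terms per year by TF-IDF (illustrative sketch).

    records: iterable of (year, terms) pairs, one per patent, where terms
    are technical terms already extracted during preprocessing.
    Assumptions: each year's pooled terms form one TF-IDF "document", and a
    term is "new" only in the first year in which it appears.
    """
    # Group operation: pool the terms of all patents filed in the same year.
    by_year = defaultdict(Counter)
    for year, terms in records:
        by_year[year].update(terms)

    years = sorted(by_year)
    n_docs = len(years)
    # Document frequency = number of years in which each term occurs.
    df = Counter(term for y in years for term in by_year[y])

    seen = set()
    result = {}
    for y in years:
        counts = by_year[y]
        total = sum(counts.values())
        scores = {}
        for term, count in counts.items():
            if term in seen:
                continue  # appeared in an earlier year, so not new
            tf = count / total
            idf = math.log(n_docs / df[term]) + 1.0  # "+1" keeps idf positive
            scores[term] = tf * idf
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[y] = ranked[:top_n]
        seen.update(counts)
    return result


# Hypothetical toy data, not taken from the paper's patent set.
toy = [
    (2016, ["gan hemt", "substrate", "substrate"]),
    (2017, ["substrate", "micro led", "micro led", "gan hemt"]),
    (2017, ["vertical transistor"]),
    (2018, ["micro led", "vertical transistor", "uv led"]),
]
for year, ranked in emerging_terms_by_year(toy).items():
    print(year, [term for term, _ in ranked])
# 2016 ['substrate', 'gan hemt']
# 2017 ['micro led', 'vertical transistor']
# 2018 ['uv led']
```

Pooling by year makes the idf term reward words that are concentrated in few years, which suits a focus-evolution view. Scoring individual patents and then aggregating per year would be an equally plausible reading of the abstract.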

Pages: 1 to 4

Copyright: Copyright (c) IARIA, 2019

Publication date: March 24, 2019

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-61208-700-9

Location: Valencia, Spain

Dates: from March 24, 2019 to March 28, 2019