ALLDATA 2019, The Fifth International Conference on Big Data, Small Data, Linked Data and Open Data
Authors:
Li Wang
Xiang Shen
Xiwen Liu
Keywords: LDA; automatic term identification; preprocessed text; visualization
Abstract:
Computational science and data science are driving intelligent analysis and information services today, and machine-learning text analysis is changing traditional analysis methods. This article discusses the benefits of unsupervised learning approaches in patent text mining. Patent data from the GaN industry were preprocessed with a filter model based on the NLTK toolkit to identify technical terms, and then clustered with a Latent Dirichlet Allocation (LDA) model to find latent topics, which were visualized. Based on a group operation, the newly emerging terms of each year, ranked by TF-IDF, were used to reveal how the research and development focus evolved. This research demonstrates the proposed method on 26,854 GaN patents. The results show 20 research and development topics with technical terms in the GaN industry and present an evolution of the research and development focus based on the newly emerging terms of each year, which provides a clue for more detailed analysis later. Our results show an efficient way to find technology focus evolution from large-scale text data.
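The last step of the pipeline described in the abstract, grouping by year and ranking newly emerging terms by TF-IDF, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a term is "newly emerging" in a year if it appears in no earlier year, that IDF is computed over the whole corpus with a +1 smoothing constant, and that a term's yearly score is the sum of its per-patent TF-IDF values. The function name `emerging_terms_by_year`, the parameter `top_k`, and the toy data are all hypothetical; the NLTK filtering and LDA steps are not shown.

```python
import math
from collections import Counter, defaultdict


def emerging_terms_by_year(docs, top_k=3):
    """Rank first-time terms per year by summed TF-IDF.

    docs: list of (year, [term, ...]) pairs, assumed already preprocessed.
    Returns {year: [term, ...]} with at most top_k terms per year.
    """
    n_docs = len(docs)

    # Document frequency of each term over the whole corpus.
    df = Counter()
    for _, terms in docs:
        df.update(set(terms))
    # Smoothed IDF (assumption: +1 so corpus-wide terms keep a nonzero weight).
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in df}

    # Group by year: sum each term's TF-IDF over that year's documents.
    scores = defaultdict(Counter)
    for year, terms in docs:
        if not terms:
            continue
        for term, count in Counter(terms).items():
            scores[year][term] += (count / len(terms)) * idf[term]

    # Walk the years in order, keeping only terms not seen in earlier years.
    seen = set()
    result = {}
    for year in sorted(scores):
        new_terms = [(t, s) for t, s in scores[year].items() if t not in seen]
        # Highest score first; ties broken alphabetically for determinism.
        new_terms.sort(key=lambda pair: (-pair[1], pair[0]))
        result[year] = [t for t, _ in new_terms[:top_k]]
        seen.update(scores[year])
    return result


# Hypothetical toy corpus: (filing year, extracted technical terms).
toy_docs = [
    (2015, ["gan", "substrate", "led"]),
    (2015, ["gan", "led", "epitaxy"]),
    (2016, ["gan", "hemt", "gate", "hemt"]),
    (2017, ["gan", "hemt", "charger", "charger"]),
]
emerging = emerging_terms_by_year(toy_docs)
```

In this toy corpus, `gan` appears in every year and so never counts as emerging after its first year, while `hemt` and `charger` surface in the years they first appear. On real patent data the definition of "new" (for example, first appearance versus first appearance above a frequency threshold) would need to be chosen to match the study's design.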
Pages: 1 to 4
Copyright: Copyright (c) IARIA, 2019
Publication date: March 24, 2019
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-61208-700-9
Location: Valencia, Spain
Dates: from March 24, 2019 to March 28, 2019