SOTICS 2019, The Ninth International Conference on Social Media Technologies, Communication, and Informatics
Identifying Obstacles in Data Sharing by Automatic Extraction of Problematic Points in Documents
Authors:
Yan Wan
Yalu Wang
Guanhao Chen
Jinping Gao
Keywords: open data; feature extraction; data sharing obstacles.
Abstract:
This paper proposes an automatic viewpoint extraction method, using the extraction of open data obstacles as an example. Open data (data sharing) is important because it reduces duplicated work and increases the productivity and openness of work. However, open data in China is not as well developed as we would wish. It is hindered by various problems, such as a lack of willingness to share and incompatible data formats. To identify the different problems and then allocate them to the relevant parties to tackle, we adopt an automatic extraction algorithm based on natural language processing techniques to identify problematic points (obstacles) of data sharing in the relevant literature. In this paper, we first construct a vocabulary for “obstacles”, so that machines can find “obstacles” in the literature more accurately. Then, an extraction algorithm that combines word2vec and Pointwise Mutual Information (PMI) is proposed to automatically find the sentences in documents that discuss “obstacles” to open data. An experiment with this method is carried out and analyzed. It shows that the proposed method can be an effective tool for similar tasks that need to find viewpoints in a large number of documents but cannot be done by simple keyword searches.
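The abstract does not spell out the algorithm, so the following is only a minimal sketch of the PMI part of such a pipeline, under stated assumptions: co-occurrence is counted at sentence level, the seed word, the threshold, and the toy sentences are invented for illustration, and the function names `pmi_table`, `pmi`, and `expand_vocabulary` are hypothetical. The word2vec step is omitted because it needs a trained embedding model.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Sentence-level PMI for every word pair that co-occurs at least once."""
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))
    table = {}
    for (a, b), c in pair_counts.items():
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), probabilities over sentences
        table[(a, b)] = math.log2(c * n / (word_counts[a] * word_counts[b]))
    return table


def pmi(table, a, b):
    """Look up PMI regardless of argument order; -inf if the pair never co-occurs."""
    key = (a, b) if a < b else (b, a)
    return table.get(key, float("-inf"))


def expand_vocabulary(table, seeds, threshold):
    """Add every word whose PMI with some seed word reaches the threshold."""
    expanded = set(seeds)
    for (a, b), score in table.items():
        if score >= threshold:
            if a in seeds:
                expanded.add(b)
            if b in seeds:
                expanded.add(a)
    return expanded


# Toy corpus (assumption: already tokenized and lowercased).
sentences = [
    ["agencies", "lack", "willingness", "to", "share", "data"],
    ["incompatible", "formats", "hinder", "data", "sharing"],
    ["open", "data", "increases", "productivity"],
    ["formats", "hinder", "reuse"],
]
table = pmi_table(sentences)
expanded = expand_vocabulary(table, seeds={"hinder"}, threshold=1.0)
# Sentences that mention any word in the expanded "obstacle" vocabulary.
flagged = [s for s in sentences if expanded & set(s)]
```

In this sketch a frequent word such as "data" gets a negative PMI with the seed and stays out of the vocabulary, which illustrates why an association measure can do better than a plain keyword match.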
Pages: 21 to 24
Copyright: Copyright (c) IARIA, 2019
Publication date: November 24, 2019
Published in: conference
ISSN: 2326-9294
ISBN: 978-1-61208-757-3
Location: Valencia, Spain
Dates: from November 24, 2019 to November 28, 2019