ALLDATA 2020, The Sixth International Conference on Big Data, Small Data, Linked Data and Open Data
Authors:
Samah Salem
Fouzia Benchikha
Keywords: linked data; quality assessment; semantic relations; synonym predicates; profiling statistics; DBpedia.
Abstract:
Over the past few years, an increasing number of datasets have been published as part of the Web of Data, reaching more than 1,200 datasets in 2019. However, many of these datasets, which together account for a large number of RDF triples, lack an ontology or have an incomplete one. As a result, they increasingly suffer from quality problems. Assessing the fitness for use of linked data is therefore an open research problem. In this paper, we propose a novel approach for assessing quality across RDF triples without requiring schema information. It assesses the quality of datasets by detecting errors and eventually measuring the error rate, using synonym-predicate techniques, profiling statistics, and quality verification cases. Promising results were obtained on the DBpedia dataset, where several data quality issues were detected, such as inaccurate values, redundant predicates, and redundant triples.
Pages: 8 to 13
Copyright: Copyright (c) IARIA, 2020
Publication date: February 23, 2020
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-61208-775-7
Location: Lisbon, Portugal
Dates: from February 23, 2020 to February 27, 2020