SEMAPRO 2016, The Tenth International Conference on Advances in Semantic Processing
Ontologies-based Optical Character Recognition-error Correction Method for Bar Graphs
Authors:
Sarunya Kanjanawattana
Masaomi Kimura
Keywords: OCR-error correction; dependency parsing; ontology; edit distance; two-dimensional bar graphs.
Abstract:
Graphs offer a concise way to present significant information in academic literature, and readers can benefit from automatic extraction of that information. The conventional extraction technique is optical character recognition (OCR). However, OCR results can be imperfect because OCR performance depends on factors such as image quality. This is a critical problem: misrecognized text gives readers incorrect information and leads to miscommunication. Numerous recent publications have addressed OCR performance improvement and OCR result correction, but only a few have used semantics to solve this problem. In this study, we propose a novel OCR-error correction method that combines several techniques, including ontologies, natural language processing, and edit distance. The input consists of bar graphs and their associated information, such as captions and the paragraphs that cite them. We implemented five conditions, intended to cover all possible situations, for selecting the most similar words as substitutes for incorrect OCR results. We also used DBpedia and WordNet to find word categories and part-of-speech tags. We evaluated our method by comparing its accuracy and precision with those of our previous method, which uses only the edit distance technique. Our method outperformed the previous method: its overall accuracy reached 81%, whereas the previous method reached 54%. Based on this evidence, we conclude that our solution to the OCR problem is effective.
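As an illustration of the edit distance technique mentioned in the abstract, the following is a minimal sketch of picking the closest candidate word for a misrecognized OCR token. It shows only a plain Levenshtein-distance baseline; it is not the authors' implementation and does not model the five conditions or the ontology lookups, which the abstract does not detail. The function names `edit_distance` and `closest_word` and the sample words are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def closest_word(ocr_token: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to the OCR token.

    Ties are resolved in favor of the earliest candidate in the list.
    """
    return min(candidates,
               key=lambda w: edit_distance(ocr_token.lower(), w.lower()))


# Hypothetical data: a misread axis label and words taken from a caption.
caption_words = ["temperature", "pressure", "time", "sample"]
print(closest_word("ternperature", caption_words))
```

In this sketch the candidate list stands in for words drawn from a graph's caption or citing paragraph; a distance-only choice like this ignores word meaning, which is the gap the semantic techniques in the abstract are meant to address.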
Pages: 1 to 8
Copyright: Copyright (c) IARIA, 2016
Publication date: October 9, 2016
Published in: conference
ISSN: 2308-4510
ISBN: 978-1-61208-507-4
Location: Venice, Italy
Dates: from October 9, 2016 to October 13, 2016