CONTENT 2012, The Fourth International Conference on Creative Content Technologies
A Document Analysis System for Linking Cross-document Entities
Authors:
Manabu Ohta
Atsuhiro Takasu
Keywords: digital library, information extraction, CRF
Abstract:
This paper proposes an entity extraction and matching system for digital documents. Digital documents usually contain many links to relevant information, but these explicit links do not cover every relationship a document has. Entity extraction and matching systems are used to detect the remaining, implicit links. Such systems usually consist of several steps, such as parsing, dictionary matching, and classification. Some of these steps, however, inevitably introduce errors, which must be managed properly so that they do not degrade the subsequent steps. We have therefore been developing an entity extraction and matching system that focuses on managing the errors incurred at each step. This paper gives an overview of the system and explains some of the techniques we have developed to improve the quality of entity extraction and matching. Such a system can be a key solution to content management for institutional repositories and academic societies as well as digital libraries.
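The abstract describes the pipeline only at a high level. As an illustration of the idea of managing errors between steps, here is a minimal Python sketch of a parse, dictionary-match, classify chain in which the matching step keeps several scored candidates instead of committing to one, and the final step declines to link a mention when confidence is low rather than passing a likely error downstream. This is not the authors' system: the reference-string format, the `difflib` similarity measure, the 0.8 threshold, and every name (`parse_authors`, `match_candidates`, `resolve`, `link_authors`, the `a1`/`a2` IDs) are hypothetical, and the toy string-splitting parser merely stands in for whatever model the paper uses (its keywords mention CRF).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Candidate:
    entity_id: str  # key of a dictionary entry (hypothetical ID scheme)
    name: str       # canonical name stored in the dictionary
    score: float    # string similarity in [0, 1]


def parse_authors(reference: str) -> List[str]:
    """Step 1 (toy parser, an assumption): the author list precedes the first
    period, and authors are separated by ',' or ' and '."""
    author_part = reference.split(".", 1)[0]
    names = author_part.replace(" and ", ",").split(",")
    return [n.strip() for n in names if n.strip()]


def match_candidates(name: str, dictionary: Dict[str, str], k: int = 3) -> List[Candidate]:
    """Step 2: score every dictionary entry and keep the k best rather than only
    the top hit, so a later step can still see the alternatives."""
    scored = [
        Candidate(eid, canon, SequenceMatcher(None, name.lower(), canon.lower()).ratio())
        for eid, canon in dictionary.items()
    ]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[:k]


def resolve(candidates: List[Candidate], threshold: float = 0.8) -> Optional[Candidate]:
    """Step 3: accept the best candidate only if it is confident enough;
    otherwise return None so the mention stays unlinked (or goes to review)
    instead of being linked to a wrong entity."""
    if candidates and candidates[0].score >= threshold:
        return candidates[0]
    return None


def link_authors(reference: str, dictionary: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Run the three steps and map each parsed name to an entity ID or None."""
    links: Dict[str, Optional[str]] = {}
    for name in parse_authors(reference):
        best = resolve(match_candidates(name, dictionary))
        links[name] = best.entity_id if best else None
    return links


if __name__ == "__main__":
    authors = {"a1": "Manabu Ohta", "a2": "Atsuhiro Takasu"}
    ref = "Manabu Ota and Jane Doe. A Document Analysis System for Linking Cross-document Entities."
    print(link_authors(ref, authors))
```

Run as a script, this prints `{'Manabu Ota': 'a1', 'Jane Doe': None}`: the misspelled name still matches its dictionary entry (similarity about 0.95), while the unknown name is left unlinked instead of being forced onto the closest entry. The fixed threshold is the simplest possible stand-in for a classification step; keeping the top k candidates is what would let a richer classifier or a human curator revisit a rejected match.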
Pages: 14 to 20
Copyright: (c) IARIA, 2012
Publication date: July 22, 2012
Published in: conference
ISSN: 2308-4162
ISBN: 978-1-61208-220-2
Location: Nice, France
Dates: from July 22, 2012 to July 27, 2012