CONTENT 2012, The Fourth International Conference on Creative Content Technologies


A Document Analysis System for Linking Cross-document Entities

Authors:
Manabu Ohta
Atsuhiro Takasu

Keywords: digital library, information extraction, CRF

Abstract:
This paper proposes an entity extraction and matching system for digital documents. Digital documents usually contain many explicit links to relevant information, but these links do not cover all of the relevant connections. Entity extraction and matching systems are used to detect such implicit links. They usually consist of several steps, such as parsing, dictionary matching, and classification. Some of these steps, however, inevitably introduce errors, which must be managed properly so that the subsequent steps are not degraded. We have therefore been developing an entity extraction and matching system that focuses on managing the errors incurred at each step. This paper gives an overview of the system and explains some of the techniques we have developed to improve the quality of entity extraction and matching. Such a system can be a key solution to content management for institutional repositories and academic societies as well as digital libraries.

Pages: 14 to 20

Copyright: Copyright (c) IARIA, 2012

Publication date: July 22, 2012

Published in: conference

ISSN: 2308-4162

ISBN: 978-1-61208-220-2

Location: Nice, France

Dates: from July 22, 2012 to July 27, 2012