International Journal On Advances in Life Sciences, volume 1, numbers 2 and 3, 2009
An Ontology Learning Framework Using Focused Crawler and Text Mining
Authors:
Hiep Phuc Luong
Susan Gauch
Qiang Wang
Anne Maglia
Keywords: ontology learning; focused crawler; SVM; text mining; amphibian ontology
Abstract:
Manual ontology construction is costly, time-consuming, error-prone, and inflexible to change. To address these problems, researchers hope that an automated process will result in faster and better ontology construction and enrichment. Ontology learning has recently become a major area of research whose goal is to facilitate the construction of ontologies by decreasing the effort required to produce an ontology for a new domain. However, most current approaches deal with specific tasks or a part of the ontology learning process rather than providing complete support to users. Few studies attempt to automate the entire ontology learning process, from collecting domain-specific literature and filtering out documents irrelevant to the domain to text mining that builds new ontologies or enriches existing ones. In this paper, we present a complete framework for ontology learning that retrieves documents from the Web using focused crawling, uses an SVM (Support Vector Machine) classifier to identify domain-specific documents, and performs text mining to extract useful information for the ontology enrichment process. Our experimental results in the amphibian morphology domain support our belief that SVM and text mining approaches can improve the identification of documents and relevant words suitable for ontology enrichment. This paper reports on the overall system architecture and our initial experiments for all phases of the framework, i.e., focused document crawling, document classification, and information extraction using text mining techniques to enrich the domain ontology.
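The classify-then-mine portion of the pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the crawl step is replaced by a hard-coded list of already fetched pages, the classifier is a small linear SVM trained by subgradient descent on the hinge loss over bag-of-words counts, and the text mining step simply ranks frequent words that are not yet in a hypothetical ontology vocabulary. The names `train_linear_svm`, `is_relevant`, and `candidate_terms`, along with all example texts, are invented here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_linear_svm(docs, labels, epochs=20, eta=0.1, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.
    Labels are +1 (in-domain) or -1 (off-domain). Hyperparameters are illustrative."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            x = Counter(tokenize(text))
            margin = y * (sum(w.get(t, 0.0) * c for t, c in x.items()) + b)
            for t in w:                  # regularization step: shrink every weight
                w[t] *= 1.0 - eta * lam
            if margin < 1.0:             # hinge loss active: move toward the example
                for t, c in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * c
                b += eta * y
    return w, b

def is_relevant(text, w, b):
    """Classify a document as in-domain when its SVM score is positive."""
    x = Counter(tokenize(text))
    return sum(w.get(t, 0.0) * c for t, c in x.items()) + b > 0

def candidate_terms(docs, known_terms, stopwords, top_n=5):
    """Rank words from relevant documents that the ontology does not contain yet."""
    counts = Counter(
        t for d in docs for t in tokenize(d)
        if t not in known_terms and t not in stopwords
    )
    return [t for t, _ in counts.most_common(top_n)]

# Hypothetical labeled training documents (+1 in-domain, -1 off-domain).
train_docs = [
    "frog skull bone morphology",
    "stock market prices fell",
    "tadpole limb bone cartilage",
    "football match score today",
]
train_labels = [1, -1, 1, -1]
w, b = train_linear_svm(train_docs, train_labels)

# Stand-in for the output of a focused crawler: pages already fetched.
crawled_pages = [
    "The frog skull shows a fused frontoparietal bone and frontoparietal ridge",
    "Stock market prices and football score",
    "Tadpole limb cartilage and frontoparietal bone morphology",
]
relevant_pages = [p for p in crawled_pages if is_relevant(p, w, b)]

# Hypothetical existing ontology vocabulary and stopword list.
known_terms = {"frog", "skull", "bone", "limb", "tadpole", "cartilage", "morphology"}
stopwords = {"the", "a", "and", "shows"}
new_terms = candidate_terms(relevant_pages, known_terms, stopwords)
print(relevant_pages)
print(new_terms)
```

In practice the hand-written trainer would likely be replaced by an established SVM library and the word counts by a weighting scheme such as TF-IDF; the sketch only shows how the classification output feeds the term extraction step.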
Pages: 99 to 109
Copyright: Copyright (c) to authors, 2009. Used with permission.
Publication date: December 1, 2009
Published in: journal
ISSN: 1942-2660