INTENSIVE 2011, The Third International Conference on Resource Intensive Applications and Services


Crawlzilla - A Toolkit for Deploying Cluster Search Engine Quickly and Easily

Authors:
Yang Shun-Fa
Chen Wa-Ue
Kuo Wen-Chieh

Keywords: Search Engine, Nutch, Hadoop, Java Open Source

Abstract:
Nutch is one of the most well-known and best search engine projects for crawling enterprise or personal internal web sites, but many system administrators find it difficult to set up and use because of its complicated operation process. In this paper, we present Crawlzilla, an open source search engine toolkit built on top of Hadoop and Nutch. Crawlzilla integrates related packages to reduce the number of installation and setup steps, helps system administrators quickly deploy their own private search engine for their internal web sites, and provides a cluster feature for building a distributed search engine environment. It also provides two friendly interfaces for system administrators: one runs in a terminal window and is used to manage the system environment, and the other is web-based and helps system administrators or users create their own search engines.

Pages: 16 to 21

Copyright: Copyright (c) IARIA, 2011

Publication date: May 22, 2011

Published in: conference

ISBN: 978-1-61208-135-9

Location: Venice/Mestre, Italy

Dates: from May 22, 2011 to May 27, 2011