ICIW 2014, The Ninth International Conference on Internet and Web Applications and Services


Scalable Web Content Understanding Framework

Authors:
Yang Sun
Hyungsik Shin
Sayandev Mukherjee
Ronald Sujithan
Hongfeng Yin
Yoshikazu Akinaga
Pero Subasic

Keywords: contextual tagging; advertising; content understanding engine.

Abstract:
The contextualization of an unknown web page is a fundamental need in many online applications. We propose a new framework known as the Content Understanding Engine (CUE) that allows computational stages to be composed with different technologies to contextualize an unknown URL. We describe how this computation pipeline interfaces with our Big Data infrastructure and how this approach simplifies deployment to private or public cloud environments. The implementation details of this framework are provided along with a use case to demonstrate the value of the CUE. We provide the results from our evaluation of this pipelined architecture on a wide range of URLs covering different topics.

Pages: 105 to 110

Copyright: Copyright (c) IARIA, 2014

Publication date: July 20, 2014

Published in: conference

ISSN: 2308-3972

ISBN: 978-1-61208-361-2

Location: Paris, France

Dates: from July 20, 2014 to July 24, 2014