IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management


Document Retrieval in Big Data

Authors:
Feifei Pan

Keywords: Document Retrieval; Locality Sensitive Hashing; Big Data

Abstract:
Nearest Neighbor Search for similar document retrieval becomes inefficient when scaled to large datasets. In this paper, we introduce an unsupervised approach based on Locality Sensitive Hashing to reduce the search complexity. The advantage of our proposed approach is that it does not need to scan all the documents to retrieve the top-K Nearest Neighbors; instead, a number of hash table lookup operations are conducted to retrieve the top-K candidates. Experiments on two massive datasets, one of news and one of tweets, demonstrate that our approach achieves a speedup of over an order of magnitude compared with the traditional Information Retrieval method while maintaining reasonable precision.
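
The lookup idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which hash family, how many tables, or which similarity measure is used. The sketch assumes random-hyperplane hashing with cosine similarity, and the names `HyperplaneLSH`, `cosine`, `num_tables`, and `num_bits` are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Toy LSH index; random hyperplanes are an assumed hash family."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per table; each hyperplane yields one key bit.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, planes, vec):
        # The sign of the dot product with each hyperplane gives one bit of the bucket key.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(doc_id)

    def query(self, vec, k):
        # Gather candidates from one bucket per table instead of scanning every document.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        # Re-rank only the candidates by exact cosine similarity.
        ranked = sorted(
            candidates, key=lambda d: cosine(vec, self.vectors[d]), reverse=True
        )
        return ranked[:k]


# Hypothetical term-frequency vectors standing in for documents.
index = HyperplaneLSH(dim=4)
index.add("a", [3, 0, 1, 0])
index.add("b", [0, 2, 0, 4])
index.add("c", [1, 1, 1, 1])
top = index.query([3, 0, 1, 0], k=1)
```

In this sketch, more tables raise the chance that a true neighbor shares at least one bucket with the query, while more bits per key shrink the buckets and so the number of candidates to re-rank. That is the usual precision versus speed trade-off in LSH, and these settings are illustrative rather than taken from the paper.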

Pages: 79 to 82

Copyright: Copyright (c) IARIA, 2014

Publication date: July 20, 2014

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-364-3

Location: Paris, France

Dates: from July 20, 2014 to July 24, 2014