IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management
Document Retrieval in Big Data
Authors:
Feifei Pan
Keywords: Document Retrieval; Locality Sensitive Hashing; Big Data
Abstract:
Nearest Neighbor Search for similar-document retrieval becomes inefficient when scaled to large datasets. In this paper, we introduce an unsupervised approach based on Locality Sensitive Hashing to reduce its search complexity. The advantage of our proposed approach is that it does not need to scan all the documents to retrieve the top-K Nearest Neighbors; instead, a number of hash table lookups are performed to retrieve the top-K candidates. Experiments on two massive news and tweet datasets demonstrate that our approach achieves a speedup of more than an order of magnitude over the traditional Information Retrieval method while maintaining reasonable precision.
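As an illustration of the lookup-instead-of-scan idea described in the abstract, the following is a minimal sketch, not the paper's implementation. The abstract does not state which hash family, document representation, or parameters were used, so this sketch assumes random-hyperplane hashing over dense document vectors with cosine similarity. The class name `HyperplaneLSH`, its parameters (`num_tables`, `num_bits`, `seed`), and the toy vectors are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical index: several hash tables keyed by sign bits.

    Assumption: random-hyperplane hashing; the paper's hash family
    is not given in the abstract.
    """

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    @staticmethod
    def _key(planes, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(doc_id)

    def query(self, vec, k):
        # Gather candidates from one bucket per table (no full scan),
        # then rank only those candidates by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), ()))
        ranked = sorted(
            candidates, key=lambda d: cosine(vec, self.vectors[d]), reverse=True
        )
        return ranked[:k]


# Toy usage with made-up 3-dimensional "document" vectors.
index = HyperplaneLSH(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.9, 0.1, 0.0])
index.add("c", [0.0, 0.0, 1.0])
top = index.query([1.0, 0.0, 0.0], k=2)
```

In this sketch, raising `num_bits` makes buckets smaller (fewer candidates to rank, lower recall), while raising `num_tables` adds lookups and recovers recall. This is the general trade-off between speed and precision that the abstract alludes to, though the settings used in the paper are not known from this page.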
Pages: 79 to 82
Copyright: Copyright (c) IARIA, 2014
Publication date: July 20, 2014
Published in: conference
ISSN: 2326-9332
ISBN: 978-1-61208-364-3
Location: Paris, France
Dates: from July 20, 2014 to July 24, 2014