Big & Deep Data Analytics using Statistical Significance: An Introductory Survey

Dutta, Sourav

DATA ANALYTICS 2015, The Fourth International Conference on Data Analytics

Authors:
Sourav Dutta

Keywords: statistical big data analytics; chi-2 significance; text and graph mining; clustering; survey.

Abstract:
The explosion of diverse and rich information sources across the World Wide Web has created a need for extremely efficient approaches to the storage, management, and retrieval of enormous volumes of data, on the order of hundreds of petabytes. Scalable data mining, that is, the extraction of interesting summaries, patterns, and association rules from such huge text and sequence data stores, serves a multitude of applications, including search engines, financial modeling, climate monitoring, computational biology, text analysis, and social graph mining. This need has driven recent research directions in big data analytics and deep learning. Statistical significance determines whether the occurrence of an event can be attributed to chance alone or indicates the presence of an interesting phenomenon. Such techniques detect anomalies or deviations from the expected distribution, and thereby enable faster and highly accurate approximate data mining or retrieval by quantizing observations into “normal” or “significant” sub-classes. This paper provides a brief survey of interesting recent works, and of possible future exploratory directions, that incorporate statistical significance for sub-text mining (in blog analysis, spell checks, etc.), outlier detection, and graph mining in the context of big data analytics.
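
As a rough illustration of the quantization into “normal” or “significant” sub-classes described above, the following minimal sketch computes a Pearson chi-square statistic for the symbol counts of a sequence against an assumed null model and compares it with a standard critical value. It is not taken from the surveyed works: the function names, the fair-coin null model, and the choice of a 0.05 significance level are all assumptions made for this example.

```python
from collections import Counter

# Critical value of the chi-square distribution with 1 degree of freedom
# at the 0.05 significance level (standard table value). Using this
# threshold is an assumption of this sketch.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def chi_square_statistic(sequence, expected_probs):
    """Pearson chi-square statistic of observed symbol counts vs. a null model.

    Symbols that do not appear in expected_probs are ignored.
    """
    n = len(sequence)
    observed = Counter(sequence)
    stat = 0.0
    for symbol, p in expected_probs.items():
        expected = n * p  # expected count under the null model
        stat += (observed.get(symbol, 0) - expected) ** 2 / expected
    return stat


def classify(sequence, expected_probs, critical=CHI2_CRITICAL_DF1_ALPHA05):
    """Label a sequence 'significant' if it deviates from the null model."""
    stat = chi_square_statistic(sequence, expected_probs)
    return "significant" if stat > critical else "normal"


# Hypothetical null model: a fair two-symbol source.
fair_coin = {"H": 0.5, "T": 0.5}

balanced = "HTHTHHTTHT"  # 5 H, 5 T
skewed = "HHHHHHHHHT"    # 9 H, 1 T

print(classify(balanced, fair_coin))
print(classify(skewed, fair_coin))
```

With a two-symbol alphabet the statistic has one degree of freedom, which is why a single fixed critical value suffices here; larger alphabets would need the critical value for the corresponding degrees of freedom.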

Pages: 106 to 111

Copyright: Copyright (c) IARIA, 2015

Publication date: July 19, 2015

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-423-7

Location: Nice, France

Dates: from July 19, 2015 to July 24, 2015