DATA ANALYTICS 2015, The Fourth International Conference on Data Analytics
Big & Deep Data Analytics using Statistical Significance: An Introductory Survey
Authors:
Sourav Dutta
Keywords: statistical big data analytics; chi-square (χ²) significance; text and graph mining; clustering; survey.
Abstract:
The explosion of diverse and rich information sources across the World Wide Web has fostered the need for extremely efficient approaches to the storage, management, and retrieval of such enormous data, on the order of hundreds of petabytes. Scalable data mining, that is, the extraction of interesting summaries, patterns, and association rules from such huge text and sequence data stores, serves a multitude of applications, such as search engines, financial modeling, climate monitoring, computational biology, text analysis, and social graph mining. This necessity has driven recent research directions in big data analytics and deep learning. Statistical significance testing determines whether the occurrence of an event can be attributed to chance alone or instead indicates the presence of an interesting phenomenon. Such techniques enable the detection of anomalies or deviations from the expected distribution, and thereby faster and highly accurate approximate data mining or retrieval, by quantizing observations into “normal” or “significant” sub-classes. This paper provides a brief survey of interesting recent works and possible future research directions that incorporate statistical significance for sub-text mining (in blog analysis, spell checking, etc.), outlier detection, and graph mining in the context of big data analytics.
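The abstract's central idea is to score an observation by how far it deviates from what chance alone would produce. Pearson's chi-square statistic can illustrate this for sub-text mining. The sketch below is a generic illustration, not the method of any surveyed work. It assumes that characters are drawn independently from a known background distribution `probs`, which is a hypothetical parameter. The helper names `chi_square` and `most_significant_substring` are also hypothetical. The scan is brute force and quadratic in the text length, so it is meant for clarity rather than for big-data scale.

```python
"""Minimal sketch: Pearson's chi-square statistic for substrings.

Assumption (illustrative only): every character is drawn independently
from a known background distribution `probs` (symbol -> probability).
Symbols absent from `probs` are ignored.
"""
from collections import Counter


def _pearson(counts: Counter, n: int, probs: dict[str, float]) -> float:
    """X^2 = sum over symbols of (O_i - E_i)^2 / E_i, with E_i = n * p_i."""
    total = 0.0
    for symbol, p in probs.items():
        expected = n * p                 # count expected under "chance alone"
        observed = counts[symbol]        # Counter returns 0 for unseen symbols
        total += (observed - expected) ** 2 / expected
    return total


def chi_square(substring: str, probs: dict[str, float]) -> float:
    """Chi-square score of one non-empty substring (hypothetical helper)."""
    return _pearson(Counter(substring), len(substring), probs)


def most_significant_substring(text: str, probs: dict[str, float]) -> tuple[str, float]:
    """Brute-force scan for the substring with the largest X^2 (hypothetical helper).

    Counts are extended one character at a time from each start index,
    so each substring is scored without recounting from scratch.
    """
    best_sub, best_score = "", float("-inf")
    for i in range(len(text)):
        counts: Counter = Counter()
        for j in range(i, len(text)):
            counts[text[j]] += 1
            score = _pearson(counts, j - i + 1, probs)
            if score > best_score:       # keep the first maximum found
                best_sub, best_score = text[i:j + 1], score
    return best_sub, best_score


if __name__ == "__main__":
    fair = {"a": 0.5, "b": 0.5}
    print(chi_square("aaaa", fair))                        # 4.0
    print(most_significant_substring("abbbbbba", fair))    # ('bbbbbb', 6.0)
```

For a fair two-symbol alphabet, a substring with counts x and y scores (x − y)²/(x + y), so a long run of one symbol stands out from the mixed context around it. For a single substring chosen in advance, a score above about 3.84 would be significant at the 5% level (chi-square, one degree of freedom) under the stated independence assumption. Scanning every substring and reporting the maximum is a multiple-testing problem, so that threshold does not transfer directly.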
Pages: 106 to 111
Copyright: Copyright (c) IARIA, 2015
Publication date: July 19, 2015
Published in: conference
ISSN: 2308-4464
ISBN: 978-1-61208-423-7
Location: Nice, France
Dates: from July 19, 2015 to July 24, 2015