eKNOW 2012, The Fourth International Conference on Information, Process, and Knowledge Management


Critical Dimension in Data Mining

Authors:
Divya Suryakumar
Andrew H. Sung
Qingzhong Liu

Keywords: selection; critical dimension; machine learning.

Abstract:
Data mining is an increasingly important means of knowledge acquisition in diverse fields such as biology, medicine, management, and engineering. When a large-scale problem involves a multitude of potentially relevant factors but lacks the precise formulation or mathematical characterization that formal solution approaches require, the data collected for the application can often be mined to extract knowledge about the problem. Feature ranking and selection are therefore immediate issues to consider when preparing to perform data mining, and the literature contains numerous theoretical and empirical feature selection methods for a variety of problems. This work-in-progress paper concerns the related question of critical dimension: for a specific data mining task, is there a minimum number of features that a specific learning machine requires to achieve satisfactory performance? As a first step toward answering this question, a simple ad hoc method is used in experiments, and the results show that the phenomenon of critical dimension does exist for several of the datasets studied. This implies that each of these datasets contains irrelevant features (input attributes) that can be eliminated to achieve higher accuracy when building models with learning machines.
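The abstract does not spell out the authors' ad hoc method, so the following is only a minimal sketch of how the notion of critical dimension might be made concrete, not the paper's procedure. It assumes that features have already been ranked by some method and that "satisfactory performance" means reaching a fixed score threshold. The names `critical_dimension` and `toy_score` are hypothetical; in real use, `evaluate` would train one chosen learning machine on only the given features and return, for example, its cross-validated accuracy.

```python
from typing import Callable, Optional, Sequence


def critical_dimension(
    ranked_features: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
    target: float,
) -> Optional[int]:
    """Smallest k such that the top-k ranked features score at least `target`.

    ranked_features: feature indices ordered from most to least relevant
        (produced by any ranking method; the ranking is an input here).
    evaluate: hypothetical user-supplied scorer, e.g. the cross-validated
        accuracy of one chosen learning machine trained only on those features.
    target: the score regarded as "satisfactory" (an assumed threshold).
    Returns None when no prefix of the ranking reaches the target.
    """
    for k in range(1, len(ranked_features) + 1):
        # Score the learning machine on the k highest-ranked features only.
        if evaluate(ranked_features[:k]) >= target:
            return k
    return None


def toy_score(features: Sequence[int]) -> float:
    """Hypothetical stand-in for accuracy: the fraction of the informative
    features {0, 2} present in the subset (no model is trained here)."""
    informative = {0, 2}
    return len(informative & set(features)) / len(informative)


if __name__ == "__main__":
    ranking = [2, 5, 0, 1, 3, 4]  # hypothetical ranking of six features
    print(critical_dimension(ranking, toy_score, target=1.0))  # prints 3
```

Because accuracy need not increase monotonically with the number of features, this sketch simply returns the first prefix that reaches the threshold; an alternative, equally hypothetical definition would scan every k and take the smallest one whose score lies within a tolerance of the best score observed.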

Pages: 97 to 100

Copyright: Copyright (c) IARIA, 2012

Publication date: January 30, 2012

Published in: conference

ISSN: 2308-4375

ISBN: 978-1-61208-181-6

Location: Valencia, Spain

Dates: from January 30, 2012 to February 4, 2012