eKNOW 2014, The Sixth International Conference on Information, Process, and Knowledge Management
The Critical Dimension Problem: No Compromise Feature Selection
Authors:
Divya Suryakumar
Andrew Sung
Qingzhong Liu
Keywords: machine learning; ranking; feature reduction; Critical Dimension; large datasets.
Abstract:
The important problem of feature selection has been studied extensively, and a variety of algorithms have been proposed for data analysis and mining tasks in diverse applications. As the era of “big data” arrives, effective techniques for identifying important features or attributes in very large datasets will be highly valuable in dealing with many of the challenges that come with it. This paper describes work in progress on a related general problem: for a given dataset, is there a “Critical Dimension”, that is, a minimum number of features necessary for achieving good results? In other words, for a dataset with many features, how many are truly relevant and must be included in, say, machine learning or data mining tasks to ensure that acceptable performance is achieved? Moreover, if a Critical Dimension does exist, how can the features that need to be included be identified? The problem is first analyzed formally and shown to be intractable. An ad hoc method is then designed for obtaining an approximate solution. Finally, experiments are performed on a selection of datasets of varying sizes to demonstrate that, for many datasets, a Critical Dimension does indeed exist. The significance of the existence of a Critical Dimension in a dataset, or the lack thereof, is explained.
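The idea of a Critical Dimension can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not spell out. It assumes that the features have already been ranked by some importance measure and that a hypothetical `evaluate` callback returns a performance score (for example, accuracy) for a given feature subset. The function name `critical_dimension`, the threshold, and the toy scores are all assumptions made for illustration. Because the abstract states that the exact problem is intractable, the sketch only scans prefixes of a single ranking, which is a heuristic approximation.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def critical_dimension(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    threshold: float,
) -> Optional[Tuple[int, List[str]]]:
    """Return (k, top-k features) for the smallest k whose score meets the threshold.

    ranked_features: feature names, most important first (the ranking method
                     is assumed, not taken from the paper).
    evaluate:        hypothetical callback that trains and tests a model on a
                     feature subset and returns a score such as accuracy.
    threshold:       minimum score regarded as acceptable performance.
    """
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        if evaluate(subset) >= threshold:
            return k, subset
    return None  # no prefix of the ranking reaches the threshold


# Made-up scores for illustration only; a real run would train a learner
# on each subset instead of looking up a table.
toy_scores = {1: 0.62, 2: 0.81, 3: 0.93, 4: 0.94}
ranking = ["f3", "f1", "f4", "f2"]
result = critical_dimension(ranking, lambda subset: toy_scores[len(subset)], 0.90)
print(result)
```

With these toy scores, the first prefix to reach the 0.90 threshold has three features, so the sketch reports an approximate Critical Dimension of 3. A return value of `None` would correspond to a dataset for which no prefix of the chosen ranking reaches the threshold.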
Pages: 145 to 151
Copyright: Copyright (c) IARIA, 2014
Publication date: March 23, 2014
Published in: conference
ISSN: 2308-4375
ISBN: 978-1-61208-329-2
Location: Barcelona, Spain
Dates: from March 23, 2014 to March 27, 2014