A Novel Feature Selection Method Based on a Clustering Algorithm

Mata-Torres, Jonathan A.; Tello-Leal, Edgar; Ramirez-Alcocer, Ulises M.; Romero-Galvan, Gerardo

IMMM 2019, The Ninth International Conference on Advances in Information Mining and Management

Keywords: feature selection; mean shift; clustering; data mining; J48

Abstract:
Academia, industry, and government share a strong interest in extracting potentially useful information from high-dimensional data to build prediction models, which has become one of the most important challenges in data mining and machine learning. Feature selection is the process of selecting the most useful features for building models in tasks such as classification, regression, or clustering, in order to reduce dimensionality and facilitate the visualization and understanding of the data. In this paper, we propose a feature selection method based on the mean shift clustering algorithm and the Pearson correlation coefficient, intended to help address some of the challenges of real-time execution in data analytics systems. We compare the mean shift method with the well-known Recursive Feature Elimination (RFE) method, as well as with a feature selection made by a human expert in the domain. The data subsets generated from the attributes selected by each method are evaluated with the J48 decision tree classification algorithm, using a historical public safety data set. The proposed clustering method has a clear advantage over the other methods in the computing time required to recommend a group of selected attributes.
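
The abstract does not spell out how mean shift and the Pearson correlation coefficient are combined, so the following Python sketch shows only one plausible reading and should not be taken as the authors' procedure. It assumes that each feature is described by its absolute Pearson correlations with all features, that features are grouped with a flat-kernel mean shift, and that one representative per cluster is kept (the feature most correlated with the target). The function names, the bandwidth value, and the toy data are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per point."""
    modes = []
    for p in points:
        mode = list(p)
        for _ in range(max_iter):
            # Neighbours inside the window around the current mode estimate.
            near = [q for q in points if math.dist(mode, q) <= bandwidth]
            if not near:
                break
            new = [sum(c) / len(near) for c in zip(*near)]
            moved = math.dist(mode, new)
            mode = new
            if moved < tol:
                break
        modes.append(mode)
    # Points whose modes lie within one bandwidth share a cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def select_features(columns, target, bandwidth):
    """Keep one feature per mean shift cluster (assumed selection rule)."""
    names = list(columns)
    # Assumption: a feature's profile is its |r| with every feature.
    profiles = [[abs(pearson(columns[f], columns[g])) for g in names]
                for f in names]
    labels = mean_shift(profiles, bandwidth)
    best = {}
    for name, label in zip(names, labels):
        score = abs(pearson(columns[name], target))
        if label not in best or score > best[label][0]:
            best[label] = (score, name)
    return [best[label][1] for label in sorted(best)]

# Hypothetical toy data: "a"/"b" are nearly redundant, as are "c"/"d".
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10, 12, 14, 17],
    "c": [1, -1, -1, 1, 1, -1, -1, 1],
    "d": [3, -2, -2, 2, 2, -2, -2, 3],
}
target = [4, -1, 0, 7, 8, 3, 4, 11]
selected = select_features(data, target, bandwidth=0.5)
print(selected)
```

On this toy input the four features fall into two groups of strongly correlated attributes, and one attribute from each group is returned. The bandwidth controls how aggressively redundant features are merged and would need tuning on real data.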

Pages: 32 to 36

Copyright: Copyright (c) IARIA, 2019

Publication date: July 28, 2019

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-731-3

Location: Nice, France

Dates: from July 28, 2019 to August 2, 2019