IMMM 2015, The Fifth International Conference on Advances in Information Mining and Management


Automatic KDD Data Preparation Using Multi-criteria Features

Authors:
Youssef Hmamouche
Christian Ernst
Alain Casali

Keywords: Data Mining; Data Preparation; Outliers; Discretization Methods

Abstract:
We present a new approach to automatic data preparation that is applicable to most Knowledge Discovery and Data Mining systems and relies on statistical features of the studied database. First, we detect outliers using an approach that depends on whether the data distribution is normal or not. We further point out that, when choosing the most appropriate discretization method, what matters is not the law followed by a column but the shape of its density function. We therefore propose an automatic procedure for selecting the best discretization method based on a multi-criteria (Entropy, Variance, Stability) analysis. Experimental evaluations validate our approach: no single discretization method is always the most appropriate.
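
To make the two steps in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state which normality test, outlier rules, or scoring formulas the paper uses, so everything below is an assumption chosen for illustration. The sketch assumes a crude skewness/kurtosis check for normality, a 3-sigma rule for normal-looking columns, the 1.5 x IQR rule otherwise, and it compares only two candidate discretizations (equal width and equal frequency) on entropy alone, leaving out the variance and stability criteria. All function names and thresholds are hypothetical.

```python
import math
import statistics
from collections import Counter


def looks_normal(values, tol=1.0):
    """Crude normality check (assumption): skewness and excess kurtosis near 0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return False
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return abs(skew) < tol and abs(kurt) < tol


def find_outliers(values):
    """Pick the outlier rule according to the shape of the distribution."""
    if looks_normal(values):
        # Assumed rule for normal-looking data: more than 3 standard deviations away.
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        return [v for v in values if abs(v - mean) > 3 * sd]
    # Assumed rule otherwise: outside the 1.5 * IQR fences.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


def equal_width(values, k):
    """Assign each value to one of k bins of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero on constant columns
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Assign each value to one of k bins holding roughly the same count."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def entropy(bins):
    """Shannon entropy (in bits) of the bin occupancy."""
    n = len(bins)
    return -sum(c / n * math.log2(c / n) for c in Counter(bins).values())


def best_method(values, k):
    """Score each candidate by entropy only and return the highest-scoring one."""
    methods = {"equal_width": equal_width, "equal_frequency": equal_frequency}
    scores = {name: entropy(fn(values, k)) for name, fn in methods.items()}
    return max(scores, key=scores.get), scores


column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(find_outliers(column))
print(best_method(column, 3))
```

On this strongly skewed sample column, equal-width binning puts almost every value in a single bin, so the entropy score favors equal-frequency binning. A column with a different density shape can favor a different method, which is the point the abstract makes.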

Pages: 33 to 38

Copyright: Copyright (c) IARIA, 2015

Publication date: June 21, 2015

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-415-2

Location: Brussels, Belgium

Dates: from June 21, 2015 to June 26, 2015