DATA ANALYTICS 2016, The Fifth International Conference on Data Analytics
The 100-fold Cross Validation for Small Sample Method
Authors:
Shuichi Shinmura
Keywords: Fisher’s LDF; logistic regression; two SVMs; three Optimal LDFs (OLDFs); Best Model; LOO.
Abstract:
We establish a new theory of discriminant analysis based on mathematical programming (MP) and develop three MP-based optimal linear discriminant functions (Optimal LDFs): Revised IP-OLDF, based on a minimum number of misclassifications (minimum NM, MNM) criterion solved by integer programming (IP); Revised LP-OLDF, solved by linear programming (LP); and Revised IPLP-OLDF, a mixture model of Revised LP-OLDF and Revised IP-OLDF. We evaluate these LDFs against two support vector machines (SVMs), Fisher’s LDF, and logistic regression. Although we could compare these LDFs on six different small samples, we could not validate them on validation samples. We therefore developed the “100-fold cross validation for small sample” method (the Method), a combination of k-fold cross validation and resampling. With this breakthrough, we can validate seven LDFs using the 95% confidence intervals (CIs) of the error rates and the discriminant coefficients in the training and validation samples. In particular, we can select the best model by the minimum mean error rate in the validation sample (M2) instead of the leave-one-out (LOO) procedure. We compared the seven LDFs on six different datasets and showed that the best models of Revised IP-OLDF outperform the other six best models under the Method.
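The resampling scheme described in the abstract can be sketched in code. This is a minimal illustration, not the authors' implementation: a plain logistic regression stands in for the seven LDFs compared in the paper, the 50/50 train/validation split and the toy dataset are assumptions, and the 95% CI is taken empirically from the 100 resampled folds. It shows how the mean validation error rate (M2) and its CI are obtained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class sample standing in for one of the paper's small datasets
# (an assumption; the paper uses six real datasets).
n, d = 40, 2
X = np.vstack([rng.normal(0, 1, (n // 2, d)),
               rng.normal(2, 1, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

def fit_logistic(Xtr, ytr, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (no external deps)."""
    Xb = np.hstack([Xtr, np.ones((len(Xtr), 1))])  # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - ytr) / len(ytr)      # gradient of the NLL
    return w

def error_rate(w, Xs, ys):
    Xb = np.hstack([Xs, np.ones((len(Xs), 1))])
    return np.mean((Xb @ w > 0).astype(int) != ys)

train_err, valid_err = [], []
for _ in range(100):                       # the 100 resampled folds
    idx = rng.permutation(n)
    tr, va = idx[: n // 2], idx[n // 2:]   # assumed 50/50 split
    w = fit_logistic(X[tr], y[tr])
    train_err.append(error_rate(w, X[tr], y[tr]))
    valid_err.append(error_rate(w, X[va], y[va]))

m2 = np.mean(valid_err)                           # mean validation error (M2)
lo, hi = np.percentile(valid_err, [2.5, 97.5])    # empirical 95% CI
print(f"M2 = {m2:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

In the paper's model selection, this loop would be run once per candidate model, and the model with the smallest M2 would be chosen in place of the LOO procedure.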
Pages: 29 to 36
Copyright: Copyright (c) IARIA, 2016
Publication date: October 9, 2016
Published in: conference
ISSN: 2308-4464
ISBN: 978-1-61208-510-4
Location: Venice, Italy
Dates: from October 9, 2016 to October 13, 2016