On the selection of the globally optimal prototype subset for NN classification

E. Carrizosa, B. Martín-Barragán, F. Plastria, D. Romero Morales

The Nearest Neighbor classifier has shown to be a powerful tool for multiclass classification. In this note we explore both theoretical properties and empirical behavior of a variant of such method, in which the Nearest Neighbor rule is applied after selecting a set of so-called prototypes, whose cardinality is fixed in advance, by minimizing the empirical misclassification cost. With this we alleviate the two serious drawbacks of the Nearest Neighbor method: high storage requirements and time-consuming queries.

The problem is shown to be NP-Hard. Mixed Integer Programming (MIP) programs are formulated, theoretically compared and solved by a standard MIP solver for problem instances of small size. Large sized problem instances are solved by a metaheuristic yielding good classification rules in reasonable time.

Keywords: Classification, Optimal Prototype Subset, Nearest Neighbor, Dissimilarities, Integer Programming, Variable Neighborhood Search.

go to main page