Identifying Critical Values for Supervised Learning Involving Transactional Data

D. Romero Morales, J. Wang

For large and noisy transactional learning tasks, such as the cancellation rate forecasting problem in revenue management, multivariate insights are difficult to obtain. In this paper, we focus on explaining the influence of individual variables on learning tasks. Combining ideas from discretization and feature selection, we propose an algorithm that identifies the values of the predictor variables that are critical to a learning problem and ranks them by importance. Our algorithm is more selective than existing algorithms from the discretization literature, thanks to its robust score for measuring the goodness of critical values and its parallel structure for managing the number of critical values assigned to each variable. On eleven publicly available datasets, we show that the "critical values" identified by our algorithm preserve more useful information than those identified by existing algorithms. On two large real-world hotel and airline datasets, our algorithm provides interesting insights into the cancellation behavior of customers, e.g., recent bookings and bookings from small agents are more likely to be canceled, which helps managers better understand the drivers of cancellation.
