Representative Region Selection: Formulation, Methods and Consistency
Abstract: We present a new type of unsupervised learning problem: finding a small set of representative regions that approximates a larger data set. These regions can be presented to a practitioner, along with additional information, to help the practitioner explore the data set. An advantage of this approach is that it does not rely on the data having cluster structure. We formally define a class of methods for this problem and propose two new methods within it. We provide convergence results for a general class of methods and show that these results apply to several specific methods, including the two proposed in this paper. We also give an example of how representative regions can be used to explore a data set.