Coinciding with the rise of large-scale statistical learning within the computer vision community has been a dramatic improvement in automated methods for human biometrics, object recognition, and scene parsing, among many other applications. Despite this progress, a tremendous gap exists between the performance of algorithms in the laboratory and the performance of those same methods in the real world. A major contributing factor is the way machine learning algorithms are typically evaluated: with no expectation that a class unseen during training will be encountered during operational deployment.
Both recognition and classification are common terms in the computer vision literature. What is the difference? In classification, one assumes a given set of classes between which the algorithm must discriminate. In recognition, one assumes there are some classes that can be recognized within a much larger space of things that cannot. A motivating question for this tutorial is: What is the general recognition problem? This question, of course, is a central theme in most applications involving visual recognition. How one should approach multi-class recognition is still an open issue. Should it be performed as a series of binary classifications, or by detection, where a search is performed for each of the possible classes? What happens when some classes are ill-sampled, not sampled at all, or undefined?
For some problems, we do not need, and often cannot have, knowledge of the entire set of possible classes. For instance, in a recognition application for biologists, a single species of fish might be of interest. However, the classifier must consider the set of all other possible objects in relevant settings as potential negatives. Similarly, verification problems for security-oriented face matching constrain the target of interest to a single claimed identity, while considering the set of all other possible people as potential impostors. In addressing general object recognition, there is a finite set of known objects amid a myriad of unknown objects, combinations, and configurations; labeling something as new, novel, or unknown should always be a valid outcome. This leads to what is sometimes called "open set" recognition, in comparison to systems that make closed world assumptions or use "closed set" evaluation.
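The difference from closed set classification can be made concrete with a minimal sketch: a recognizer that may return "unknown" when no known class matches strongly enough. The function name, scores, and threshold below are hypothetical illustrations, not a method from the tutorial itself.

```python
def open_set_predict(scores, class_names, threshold=0.5):
    """Return the best-matching known class, or 'unknown' when no
    class score clears the rejection threshold.

    A closed set classifier would always return argmax; the open set
    variant treats rejection as a valid outcome.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return "unknown"
    return class_names[best]

# A probe that matches no known class strongly is rejected.
print(open_set_predict([0.2, 0.3, 0.1], ["cat", "dog", "fish"]))  # unknown
print(open_set_predict([0.2, 0.9, 0.1], ["cat", "dog", "fish"]))  # dog
```

A fixed threshold is the simplest possible rejection rule; calibrating it in a principled, probabilistic way is exactly where the tools covered later in the tutorial come in.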
The purpose of this tutorial is to introduce the CVPR audience to this difficult problem in statistical learning, specifically in the context of important vision applications. A number of different topics will be introduced, including: recent formalizations of the open set recognition problem; statistical extreme value theory for visual recognition, which facilitates generalization in probabilistic decision models; and new supervised learning algorithms that minimize the risk of the unknown. Original case studies will be covered for applications related to the analysis of faces, objects, and scenes. The tutorial is composed of three parts, each lasting approximately one hour. A complete outline follows.
Part 1: An introduction to the open set recognition problem
Part 2: Statistical extreme value theory for visual recognition
Part 3: Algorithms that minimize the risk of the unknown
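As a taste of the extreme value theory perspective in Part 2, one common pattern in the literature is to fit a Weibull distribution to the extreme tail of non-match scores and use its CDF to turn a raw score into a calibrated confidence. The data, tail size, and function below are illustrative assumptions, not the tutorial's specific method.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)

# Hypothetical non-match similarity scores; in practice these come
# from comparing probes against known negative examples.
non_match = rng.normal(loc=0.3, scale=0.05, size=1000)

# Extreme value theory motivates modeling only the tail of the score
# distribution: keep the largest non-match scores and fit a Weibull.
tail = np.sort(non_match)[-50:]
c, loc, scale = weibull_min.fit(tail, floc=0)

def confidence(score):
    """CDF of the fitted Weibull at `score`: values near 1 mean the
    score exceeds essentially all non-match extremes, so it is
    unlikely to have come from the non-match distribution."""
    return weibull_min.cdf(score, c, loc=loc, scale=scale)

print(confidence(0.9))   # near 1: likely a genuine match
print(confidence(0.35))  # modest: plausibly a non-match extreme
```

The appeal of this construction is that the tail model is justified by extreme value theory regardless of the overall score distribution, which is what makes the resulting probabilities meaningful for deciding when to reject a probe as unknown.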