Budget Amount |
¥2,800,000 (Direct Cost: ¥2,800,000)
Fiscal Year 2002: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 2001: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2000: ¥1,400,000 (Direct Cost: ¥1,400,000)
|
Research Abstract |
In many recognition problems, the basic procedure is matching the input against a data set. This procedure can be characterized as a nearest neighbor problem in a high-dimensional vector space. When the data reside in a high-dimensional space, nearest neighbor search becomes increasingly difficult as the dimensionality grows. For example, the well-known kd-tree is ineffective for search in high-dimensional vector spaces. In this report, we first compare search algorithms, including the brute-force method, the kd-tree, and LSH (locality-sensitive hashing). We show that LSH can be very effective, although it gives only an approximate solution. To realize fast queries in a high-dimensional vector space, it is important to reduce the dimensionality and/or the size of the target data set. LSH can be seen as one method for reducing the portion of the data set that must be examined. Based on these preliminary results, we first propose an image query system based on the Gaussian mixture model and PCA (principal component analysis). We then show that if the distribution of the data set can be described as "clusters", fast queries become possible. If the data set is made up of several clusters, identifying the appropriate cluster for each input vector is the key to fast and reliable search. For that purpose, we have to learn classifiers from examples. Since we cannot expect a classifier to be 100% accurate, it is desirable to obtain classifiers that produce multiple hypotheses. In our report, we describe a method that extends DAG-SVM to produce multiple hypotheses.
|
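The cluster-based search idea in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the synthetic data, the function names (`make_clustered_data`, `build_clusters`, `cluster_nn`), and the `n_hypotheses` parameter are all assumptions made for illustration, and a simple nearest-centroid ranking stands in for the learned classifier (the abstract uses an extended DAG-SVM). The sketch only shows how keeping several cluster hypotheses reduces the number of vectors examined while guarding against a wrong first guess.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_nn(data, query):
    """Exact nearest neighbor: scan every vector in the data set."""
    return min(range(len(data)), key=lambda i: sq_dist(data[i], query))


def make_clustered_data(n_clusters=4, per_cluster=50, dim=16, seed=0):
    """Synthetic, well-separated Gaussian clusters (illustrative data only)."""
    rng = random.Random(seed)
    data, labels = [], []
    for c in range(n_clusters):
        for _ in range(per_cluster):
            # Cluster c is centred at (10c, ..., 10c) with unit-variance noise.
            data.append([10.0 * c + rng.gauss(0.0, 1.0) for _ in range(dim)])
            labels.append(c)
    return data, labels


def build_clusters(data, labels, n_clusters):
    """Group point indices by cluster label and compute each centroid.

    Assumes every cluster has at least one member.
    """
    members = [[] for _ in range(n_clusters)]
    for i, label in enumerate(labels):
        members[label].append(i)
    dim = len(data[0])
    centroids = [
        [sum(data[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        for idxs in members
    ]
    return members, centroids


def cluster_nn(data, members, centroids, query, n_hypotheses=2):
    """Search only the n_hypotheses clusters ranked most likely for the query.

    The nearest-centroid ranking is a stand-in for a learned classifier.
    Returns (index of the best candidate, number of vectors examined).
    """
    ranked = sorted(range(len(centroids)),
                    key=lambda c: sq_dist(centroids[c], query))
    candidates = [i for c in ranked[:n_hypotheses] for i in members[c]]
    best = min(candidates, key=lambda i: sq_dist(data[i], query))
    return best, len(candidates)


# Usage: compare the exact scan with the cluster-restricted search.
data, labels = make_clustered_data()
members, centroids = build_clusters(data, labels, n_clusters=4)
query = [20.0] * 16  # a point at the centre of cluster 2
exact = brute_force_nn(data, query)
approx, examined = cluster_nn(data, members, centroids, query, n_hypotheses=2)
print("exact index:", exact, "cluster-search index:", approx)
print("vectors examined:", examined, "of", len(data))
```

With `n_hypotheses` equal to the number of clusters, the search degenerates to the brute-force scan. Smaller values trade a risk of missing the true neighbor for fewer distance computations, which is why a classifier that returns multiple hypotheses is attractive when it cannot be fully accurate.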