Publisher: The Institute of Image Information and Television Engineers
Abstract: There are two important issues for accurate concept detection in videos. The first is training a concept detector with a large number of training examples. The second is extracting the feature representation of a shot from descriptors that are densely sampled in both the spatial and temporal dimensions. This paper describes two fast and exact methods based on matrix operations, in which a large amount of data is processed in a batch without any approximation. The first method trains a concept detector by computing similarities among many training examples in a batch. The second method extracts the feature representation of a shot by computing the probability densities of many descriptors in a batch. The experimental results validate the efficiency and effectiveness of our methods. In particular, the concept detection result obtained by our methods ranked first in TRECVID 2012 Semantic Indexing (light), an annual worldwide competition.
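The following sketch illustrates the general idea of exact batch computation through matrix operations. It is a hedged illustration under stated assumptions, not the paper's implementation. It assumes an RBF kernel as the similarity between training examples and a diagonal-covariance Gaussian mixture for descriptor densities. Neither choice is confirmed by the abstract, and the function names `rbf_kernel_matrix` and `gmm_log_densities` are hypothetical. In both cases, expanding a squared distance as ||x||^2 + ||y||^2 - 2 x.y turns per-pair loops into a few matrix products that yield the same values, up to floating-point rounding.

```python
import numpy as np


def rbf_kernel_matrix(X, Y, gamma):
    """Hypothetical helper: exact RBF kernel K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2).

    Assumption: an RBF kernel stands in for the paper's similarity measure.
    The cross term of the expanded squared distance is one product X @ Y.T,
    so every pair is handled in a batch with no approximation.
    """
    sq_x = np.sum(X * X, axis=1)[:, None]   # (n, 1) squared norms of X rows
    sq_y = np.sum(Y * Y, axis=1)[None, :]   # (1, m) squared norms of Y rows
    d2 = sq_x + sq_y - 2.0 * (X @ Y.T)      # (n, m) squared distances
    np.maximum(d2, 0.0, out=d2)             # clip tiny negatives caused by rounding
    return np.exp(-gamma * d2)


def gmm_log_densities(D, means, variances, weights):
    """Hypothetical helper: log(w_k * N(d | mu_k, diag(var_k))) for all descriptors d and components k.

    Assumption: a diagonal-covariance GMM models the descriptors.
    The quadratic term sum_j (d_j - mu_kj)^2 / var_kj expands into
    (D**2) @ (1/var).T - 2 * D @ (mu/var).T + sum_j mu_kj^2 / var_kj,
    so the full (N, K) array comes from two matrix products.
    """
    inv_var = 1.0 / variances                                  # (K, dim)
    quad = ((D * D) @ inv_var.T                                # (N, K)
            - 2.0 * (D @ (means * inv_var).T)
            + np.sum(means * means * inv_var, axis=1)[None, :])
    dim = D.shape[1]
    log_norm = -0.5 * (dim * np.log(2.0 * np.pi)
                       + np.sum(np.log(variances), axis=1))    # (K,)
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * quad
```

The same expansion applies to any squared-distance-based score. The speedup comes from delegating the dominant cost to optimized matrix-multiplication routines rather than from approximating the result.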