Sequence-based Kernels for Online Concept Detection in Video

Publication from Digital

Bailer W.

Scottsdale, AZ, USA AIEMPro '11: Proceedings of the 4th international workshop on Automated information extraction in media production, pp. 1-6, 11/2011


Kernel methods, e.g. Support Vector Machines, have been successfully applied to classification problems such as concept detection in video. In order to capture concepts and events with longer temporal extent, kernels for sequences of feature vectors have been proposed, e.g. based on temporal pyramid matching or sequence alignment. However, all these approaches need a temporal segmentation of the video, as the kernel is applied to the feature vectors of a segment. In (semi-)supervised
 training, this is not a problem, as the ground truth is annotated on a temporal segment. When performing online concept detection on a live video stream, (i) no segmentation exists and (ii) the latency must be kept as low as possible. Re-evaluating the kernel for each temporal position of a sliding window is prohibitive due to the computational effort. We thus propose variants of the temporal pyramid matching, all subsequences and longest common subsequence kernels, which can be efficiently calculated for a temporal sliding window. An arbitrary kernel function can be plugged in to determine the similarity of feature vectors of individual samples. We evaluate the proposed kernels on the TRECVID 2007 High-level Feature Extraction data set and show
 that the sliding window variants for online detection perform equally well or better than the segment-based ones, while the runtime is reduced by at least 30%.