Abstract: In this paper, we describe how information obtained from multiple views using a network of cameras can be effectively combined to yield a reliable and fast human activity recognition system. First, we present a score-based fusion technique for combining information from multiple cameras that can handle the arbitrary orientation of the subject with respect to the cameras and that does not rely on a symmetric deployment of the cameras. Second, we describe how longer, variable-duration, interleaved action sequences can be recognized in real time from continuously streaming multi-camera data. Our framework does not depend on any particular feature extraction technique, and as a result, the proposed system can easily be integrated on top of existing implementations of view-specific classifiers and feature descriptors. For implementation and testing of the proposed system, we have used computationally simple locality-specific motion information extracted from the spatio-temporal shape of a human silhouette as our feature descriptor. This lends itself to an efficient distributed implementation while maintaining a high frame capture rate. We demonstrate the robustness of our algorithms by implementing them on a portable multi-camera video sensor network testbed and evaluating system performance under different camera network configurations.
Keywords: multimedia sensor network; information fusion; camera network; multi-view; score-fusion; activity recognition