Publisher: The Institute of Image Information and Television Engineers
Abstract: A method is described for detecting “laughter” scenes in Consumer Generated Videos (CGVs). The number of CGVs is growing, so a video skimming method that allows viewers to quickly find “enjoyable” scenes would be useful. An experiment showed that viewers tend to find CGVs that include a “laughter” scene “enjoyable”. On that basis, we developed a method that detects “laughter” scenes by using prosodic information, such as pitch, energy, power spectral densities, and their differential components, to estimate the probability of laughter in the soundtrack. In this method, a generalized state space model, which consists of acoustic models and a state-transition model, calculates the probability of laughter, and this probability is used to detect “laughter” scenes. An experiment showed that the precision of the method was about 83%, which suggests that the method enables effective video skimming.
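The abstract does not give the form of the acoustic models or the state-transition model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two hidden states (non-laughter and laughter), a single scalar feature per audio frame, and one Gaussian acoustic model per state. The function names, model parameters, transition probabilities, and feature values are all hypothetical and chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Likelihood of a scalar feature under a Gaussian acoustic model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def laughter_posteriors(features, models, transition, prior):
    """Forward filtering over two states: 0 = non-laughter, 1 = laughter.

    Returns, for each frame, the probability of the laughter state given
    all features observed up to and including that frame.
    """
    belief = list(prior)
    posteriors = []
    for x in features:
        # Prediction step: propagate the belief through the
        # state-transition model.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(2))
            for j in range(2)
        ]
        # Update step: weight each state by its acoustic likelihood,
        # then normalize so the two probabilities sum to one.
        weighted = [
            predicted[j] * gaussian_pdf(x, *models[j]) for j in range(2)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        posteriors.append(belief[1])
    return posteriors


# Hypothetical parameters (assumptions, not values from the paper).
MODELS = [(0.0, 1.0), (3.0, 1.0)]        # (mean, std) for each state
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]    # TRANSITION[i][j] = P(j | i)
PRIOR = [0.5, 0.5]

# Hypothetical per-frame feature values, e.g. a normalized energy measure.
features = [0.1, -0.2, 2.9, 3.2, 3.1, 0.0]
probs = laughter_posteriors(features, MODELS, TRANSITION, PRIOR)

# Frames whose laughter probability exceeds a hypothetical 0.5 threshold.
laughter_frames = [t for t, p in enumerate(probs) if p > 0.5]
```

In this sketch a “laughter” scene would correspond to a run of consecutive frames in `laughter_frames`. A real system would use multidimensional features (pitch, energy, power spectral densities, and their differentials) and acoustic models trained on labeled data.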