The VSD benchmark is a collection of ground-truth files based on the extraction of violent events in movies and web videos, together with high-level audio and video concepts. It is intended for assessing the quality of methods for detecting violent scenes and/or recognizing high-level, violence-related concepts in movies and web videos.

The data was produced by Technicolor for the 2012 subset and by Fudan University and the Ho Chi Minh University of Science for the 2013 and 2014 subsets. It has been described in several publications. A detailed description of the benchmark can be found on our Data Description page. The license conditions are given on the download page.

This dataset was used in the multimodal benchmark MediaEval, for the 2011, 2012, 2013 and 2014 Affect Task – Violent Scenes Detection.



We would like to thank the MediaEval benchmark and its organizers for their support in the creation of this dataset. We would also like to thank the co-organizers who contributed over the past years:

and of course our annotators. 

The creation of this benchmark has also been supported, in part, by:

  • the Quaero Program (
  • China’s National 973 Program (#2010CB327900)
  • China’s NSF Projects (#61201387 and #61228205)
  • the VNU-HCM Project (#B2013-26-01)
  • Academy of Finland funding grants no. 255745 and 251170.
  • UEFISCDI SCOUTER (under grant no. 28DPST/30-08-2013).
  • Austrian Science Fund (FWF): P25655
  • EU FP7-ICT-2011-9: project no. 601166 ("PHENICX")



If you make use of the VSD dataset, or refer to its results, please use the following citations:

C.H. Demarty, C. Penet, M. Soleymani, G. Gravier. VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. In Multimedia Tools and Applications, May 2014. (pdf)

C.H. Demarty, B. Ionescu, Y.G. Jiang, and C. Penet. Benchmarking Violent Scenes Detection in movies. In Proceedings of the 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), 2014. (pdf)

M. Sjöberg, B. Ionescu, Y.G. Jiang, V.L. Quang, M. Schedl and C.H. Demarty. The MediaEval 2014 Affect Task: Violent Scenes Detection. In Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain (2014). (pdf)

C.H. Demarty, C. Penet, G. Gravier and M. Soleymani. A benchmarking campaign for the multimodal detection of violent scenes in movies. In Proceedings of the 12th international conference on Computer Vision – Volume Part III (ECCV’12), Andrea Fusiello, Vittorio Murino, and Rita Cucchiara (Eds), Vol. Part III. Springer Verlag, Berlin. (pdf)



The ground truth was created from a collection of 32 movies of different genres (from extremely violent movies to non-violent movies).


Among the 31 movies used for the benchmark in 2014, 24 are dedicated to the training step and 7 to the test step. High-level concept annotations are only provided for the first 18 movies of the training set.