Open Access Proceedings Article

Violent Scenes Detection Using Mid-Level Violence Clustering

Shinichi Goto and one co-author
pp. 283-296
Abstract
This work proposes a novel system for Violent Scenes Detection that combines visual and audio features with machine learning at the segment level. Multiple Kernel Learning is applied so that the multimodal nature of videos can be exploited as fully as possible. In particular, Mid-level Violence Clustering is proposed so that mid-level concepts are learned implicitly, without manually tagged annotations. Finally, a violence score is calculated for each shot. The whole system is trained on a dataset from the MediaEval 2013 Affect Task and evaluated with the task's official metric. The obtained results outperformed the best score achieved in that task.
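The abstract names the stages of the pipeline but not their internals. The Python sketch below shows one minimal way they could fit together. Every detail is an assumption and none of it is the authors' method. Each segment is a list of per-modality feature vectors (e.g. visual, audio). Modalities are combined through a weighted sum of per-modality RBF kernels, which is the form Multiple Kernel Learning optimizes, but the weights here are fixed rather than learned. Violent training segments are split by k-means into unlabeled sub-classes as a stand-in for Mid-level Violence Clustering. Each sub-class gets a simple kernel mean-difference scorer in place of a trained classifier such as an SVM. A shot's score is the maximum over its segments. All function names are hypothetical.

```python
import math
import random

def rbf(a, b, gamma):
    # Gaussian (RBF) kernel between two feature vectors of one modality
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_kernel(seg_a, seg_b, weights, gamma=0.5):
    # K(a, b) = sum_m w_m * K_m(a_m, b_m), one kernel per modality.
    # MKL would learn the weights w_m; here they are fixed inputs (assumption).
    return sum(w * rbf(seg_a[m], seg_b[m], gamma) for m, w in enumerate(weights))

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means with random initial centers; returns one label per point.
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_mid_level(violent, nonviolent, k, weights):
    # Cluster violent segments (modalities concatenated) into k unlabeled sub-classes.
    flat = [[v for part in seg for v in part] for seg in violent]
    labels = kmeans(flat, k)
    clusters = [[s for s, lab in zip(violent, labels) if lab == j] for j in range(k)]
    return {"clusters": [c for c in clusters if c], "negatives": nonviolent, "weights": weights}

def segment_score(seg, model):
    # Mean kernel similarity to one violent sub-class minus mean similarity to the
    # non-violent examples; the segment keeps its best sub-class score.
    w = model["weights"]
    neg = sum(combined_kernel(seg, n, w) for n in model["negatives"]) / len(model["negatives"])
    return max(sum(combined_kernel(seg, p, w) for p in c) / len(c) - neg
               for c in model["clusters"])

def shot_score(segments, model):
    # Shot-level violence score: the highest-scoring segment in the shot (assumption).
    return max(segment_score(s, model) for s in segments)
```

Taking the maximum over sub-classes lets a segment score highly when it resembles any one kind of violence. Without it, the segment would be averaged against violent examples that look nothing like it.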



Citations
Journal Article

Violent Interaction Detection in Video Based on Deep Learning

TL;DR: A new input modality, the image acceleration field, is proposed to better extract motion attributes, and experimental results demonstrate that the proposed model for violent interaction detection achieves higher accuracy and better robustness.
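The entry does not define the image acceleration field beyond its name. One plausible reading, assumed here, is the temporal derivative of the dense optical-flow (velocity) field, approximated by differencing consecutive flow fields. The Python sketch below takes precomputed flow as input, since computing optical flow itself is out of scope. The function names are hypothetical.

```python
import math

def acceleration_field(flows):
    """flows[t][y][x] = (u, v), the dense optical flow from frame t to t + 1.
    Returns flows[t + 1] - flows[t] per pixel, a finite-difference
    approximation of acceleration (an assumed definition)."""
    return [[[(c[0] - p[0], c[1] - p[1]) for p, c in zip(row_prev, row_curr)]
             for row_prev, row_curr in zip(prev, curr)]
            for prev, curr in zip(flows, flows[1:])]

def acceleration_magnitude(acc):
    # Per-pixel Euclidean norm of each acceleration vector
    return [[[math.hypot(ax, ay) for ax, ay in row] for row in frame] for frame in acc]
```

Under this reading, steady motion such as a constant camera pan yields zero acceleration, while abrupt changes in motion produce large magnitudes.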
Journal Article

Fast fight detection

TL;DR: This work proposes a novel method to detect violence sequences. Although it is outperformed in accuracy by the state of the art, it has a significantly faster computation time, making it amenable to real-time applications.
Book Chapter

Violence Detection in Video by Using 3D Convolutional Neural Networks

TL;DR: A novel 3D ConvNets model for violence detection in video without using any prior knowledge is developed and results show that the method achieves superior performance without relying on handcrafted features.
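The core operation behind 3D ConvNets is a convolution whose kernel spans time as well as height and width. This lets a single filter respond to motion without a hand-designed motion feature. The dependency-free Python sketch below shows only that operation (single channel, no padding or stride) plus a ReLU. It is not the cited model's architecture, and the function names are hypothetical.

```python
def conv3d_valid(volume, kernel):
    """Single-channel 'valid' 3D cross-correlation (what deep-learning libraries
    call convolution). volume[t][y][x] and kernel[dt][dy][dx] are nested lists."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                  for dt in range(kt) for dy in range(kh) for dx in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]

def relu(volume):
    # Element-wise max(0, v), the usual nonlinearity after a convolution layer
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]
```

For example, a kernel of shape 2 x 1 x 1 with weights -1 and +1 reduces the filter to a frame difference, the simplest motion detector. A trained network instead learns many such filters from data rather than fixing them by hand.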
Journal Article

Affect in Multimedia: Benchmarking Violent Scenes Detection

TL;DR: The authors report on the creation of a publicly available, common evaluation framework for violent scenes detection in Hollywood and YouTube videos. They propose a robust dataset, VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot level, segment level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multimodal descriptors, and over 230 system output results as baselines.
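Benchmarks of this kind typically rank shots by a system's violence score and compare runs using average precision. The Python sketch below computes average precision over the top-k ranked shots and its mean across videos. Normalizing by min(k, number of violent shots) is one common convention and is an assumption here, so the official evaluation tool may compute the score differently. The function names are hypothetical.

```python
def average_precision_at_k(scores, labels, k=100):
    """scores: violence score per shot; labels: 1 if the shot is annotated violent.
    Ranks shots by descending score and averages the precision at each violent hit
    within the top k, normalized by min(k, total violent shots) (one convention)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, sum(labels))
    return precision_sum / denom if denom else 0.0

def mean_average_precision(runs, k=100):
    # runs: list of (scores, labels) pairs, e.g. one pair per test video
    return sum(average_precision_at_k(s, l, k) for s, l in runs) / len(runs)
```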
Journal Article

Breaking down violence detection

TL;DR: A solution is presented that uses audio-visual features (MFCC-based audio and advanced motion features) and models violence by means of multiple (sub)concepts. The potential of the approach is demonstrated on the standardized datasets of the latest editions of the MediaEval Affect in Multimedia: Violent Scenes Detection task.
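MFCC features follow a standard recipe. A short audio frame is windowed, its power spectrum is pooled through triangular filters spaced on the mel scale, the band energies are log-compressed, and a DCT decorrelates the result. The dependency-free Python sketch below processes a single frame. The filter count, coefficient count, and naive DFT are illustrative choices, not the settings of the cited work, and the function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with centers equally spaced on the mel scale,
    # defined over the n_fft // 2 + 1 non-negative-frequency bins
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for b in range(left, center):    # rising edge
            filt[b] = (b - left) / (center - left)
        for b in range(center, right):   # falling edge
            filt[b] = (right - b) / (right - center)
        bank.append(filt)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one audio frame given as a list of samples."""
    n = len(frame)
    # Hamming window, then power spectrum via a direct O(n^2) DFT
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    power = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                     for i, x in enumerate(windowed))) ** 2 / n
             for k in range(n // 2 + 1)]
    # Log mel-band energies; the floor avoids log(0) on silent frames
    energies = [math.log(max(sum(f * p for f, p in zip(filt, power)), 1e-10))
                for filt in mel_filterbank(n_filters, n, sample_rate)]
    # DCT-II decorrelates the log energies; keep the first n_coeffs
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(energies))
            for c in range(n_coeffs)]
```

A practical pipeline would apply this to overlapping frames across a clip and would use an FFT routine instead of the quadratic DFT.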
References
Proceedings Article

k-means++: the advantages of careful seeding

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
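The seeding rule picks the first center uniformly at random. Each later center is drawn with probability proportional to D(x)², the squared distance from x to its nearest already-chosen center. Standard Lloyd iterations then start from these seeds. A minimal Python sketch of the seeding step follows (the function name is hypothetical):

```python
import random

def kmeans_pp_seeds(points, k, seed=0):
    """k-means++ seeding: first center uniform at random, each further center
    sampled with probability proportional to D(x)^2, the squared distance to
    the nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    d2 = [sum((a - b) ** 2 for a, b in zip(p, centers[0])) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:   # every point already coincides with a center
            break
        r = rng.random() * total
        acc = 0.0
        for idx, w in enumerate(d2):
            acc += w
            if w > 0 and acc >= r:   # never re-pick an existing center
                break
        centers.append(points[idx])
        # Update each point's squared distance to its nearest center
        d2 = [min(old, sum((a - b) ** 2 for a, b in zip(p, points[idx])))
              for old, p in zip(d2, points)]
    return centers
```

Sampling proportionally to D(x)² favors points far from existing centers, so well-separated groups tend to receive their own seed.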
Proceedings Article

Visual categorization with bags of keypoints

TL;DR: This bag-of-keypoints method is based on vector quantization of affine-invariant descriptors of image patches; the authors show that it is simple, computationally efficient, and intrinsically invariant.
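The vector-quantization step fits in a few lines of Python. Each local descriptor is assigned to its nearest codeword, and the image is represented by the normalized histogram of those assignments. The codebook is assumed to be given; it would typically come from clustering training descriptors, for instance with k-means. The function name is hypothetical.

```python
def bag_of_keypoints(descriptors, codebook):
    """Vector-quantize each local descriptor to its nearest codeword (squared
    Euclidean distance) and return the L1-normalized codeword histogram."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])))
        hist[nearest] += 1
    n = len(descriptors)
    return [h / n for h in hist] if n else [0.0] * len(codebook)
```

Normalizing by the descriptor count makes histograms comparable across images that yield different numbers of keypoints.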
Proceedings Article

Linear spatial pyramid matching using sparse coding for image classification

TL;DR: An extension of the spatial pyramid matching (SPM) method is developed by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling. A linear SPM kernel based on SIFT sparse codes is proposed, leading to state-of-the-art performance on several benchmarks with a single type of descriptor.
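The two stages described above can be sketched in dependency-free Python: each descriptor is sparse-coded, then the codes are max-pooled over spatial pyramid cells. The lasso problem is solved here with ISTA (iterative soft-thresholding), chosen for brevity and not necessarily the solver used in the cited work. Descriptor positions are assumed to be normalized to [0, 1), and the function names are hypothetical.

```python
import math

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink toward zero by t
    return math.copysign(max(abs(v) - t, 0.0), v)

def sparse_code(x, atoms, lam=0.1, step=None, iters=200):
    """ISTA for min_a 0.5 * ||x - sum_j a_j * atoms[j]||^2 + lam * ||a||_1.
    The step must not exceed 1/L, with L the largest eigenvalue of D^T D; the
    default uses 1 / trace(D^T D), a safe bound because trace >= L."""
    if step is None:
        step = 1.0 / sum(sum(v * v for v in atom) for atom in atoms)
    a = [0.0] * len(atoms)
    for _ in range(iters):
        recon = [sum(a[j] * atoms[j][i] for j in range(len(atoms))) for i in range(len(x))]
        resid = [r - xi for r, xi in zip(recon, x)]
        grad = [sum(g * v for g, v in zip(resid, atom)) for atom in atoms]
        a = [soft_threshold(aj - step * gj, step * lam) for aj, gj in zip(a, grad)]
    return a

def spatial_pyramid_max_pool(codes, positions, levels=(1, 2)):
    """codes[i]: sparse code of descriptor i; positions[i]: (x, y) in [0, 1).
    At each level L the image is split into L x L cells; each cell keeps the
    max of |code| per atom over its descriptors. All cells are concatenated."""
    k = len(codes[0])
    feature = []
    for L in levels:
        cells = [[0.0] * k for _ in range(L * L)]
        for code, (x, y) in zip(codes, positions):
            cell = cells[min(int(y * L), L - 1) * L + min(int(x * L), L - 1)]
            for j, c in enumerate(code):
                cell[j] = max(cell[j], abs(c))
        for cell in cells:
            feature.extend(cell)
    return feature
```

The pooled vector keeps the strongest response of each dictionary atom per cell. The cited work pairs this kind of representation with a linear kernel.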
Proceedings Article

Action recognition by dense trajectories

TL;DR: This work introduces a novel descriptor based on motion boundary histograms, which is robust to camera motion and consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos.
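Motion boundary histograms treat the horizontal and vertical optical-flow components separately. Each component's spatial gradient is computed, and the gradient orientations are histogrammed with magnitude weighting. Uniform flow, such as that produced by a camera pan, has zero gradient and contributes nothing, which explains the robustness to camera motion. The Python sketch below computes one histogram per flow component over a single patch. The full descriptor also aggregates over cells arranged along each trajectory, which is omitted here, and the function names are hypothetical.

```python
import math

def spatial_gradient(field):
    """Central differences of a 2D grid (one-sided at the borders); returns (gx, gy)."""
    h, w = len(field), len(field[0])

    def diff(y, x, dy, dx):
        y0, y1 = max(y - dy, 0), min(y + dy, h - 1)
        x0, x1 = max(x - dx, 0), min(x + dx, w - 1)
        span = (y1 - y0) + (x1 - x0)
        return (field[y1][x1] - field[y0][x0]) / span if span else 0.0

    gx = [[diff(y, x, 0, 1) for x in range(w)] for y in range(h)]
    gy = [[diff(y, x, 1, 0) for x in range(w)] for y in range(h)]
    return gx, gy

def orientation_histogram(gx, gy, bins=8):
    # Magnitude-weighted histogram of gradient orientations over the grid
    hist = [0.0] * bins
    for row_x, row_y in zip(gx, gy):
        for a, b in zip(row_x, row_y):
            mag = math.hypot(a, b)
            if mag > 0:
                angle = math.atan2(b, a) % (2 * math.pi)
                hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist

def mbh(u, v, bins=8):
    """Orientation histograms of the spatial gradients of each flow component
    (MBHx from u, MBHy from v). Constant flow has zero gradient."""
    return orientation_histogram(*spatial_gradient(u), bins), orientation_histogram(*spatial_gradient(v), bins)
```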
Proceedings Article

Space-time interest points

Laptev and one co-author
TL;DR: This work builds on the idea of the Harris and Förstner interest point operators and detects local structures in space-time where image values vary significantly in both space and time, in order to detect spatio-temporal events.
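The space-time extension of the Harris operator builds a 3 x 3 second-moment matrix μ from the gradients (L_x, L_y, L_t) accumulated over a local window. Each point is scored with H = det(μ) - k trace³(μ), and large positive values mark points whose values vary strongly in both space and time. The dependency-free Python sketch below makes several simplifying assumptions: no Gaussian scale-space smoothing, a box window instead of a Gaussian one, and k = 0.005, a value commonly used with this criterion. The function name is hypothetical.

```python
import itertools

def harris3d(video, k=0.005, r=1):
    """Space-time Harris response for a grayscale video given as video[t][y][x].
    mu is the sum of outer products g g^T of the gradients g = (Lx, Ly, Lt) over
    a (2r+1)^3 box (assumed window); H = det(mu) - k * trace(mu)**3 per point."""
    n_t, n_y, n_x = len(video), len(video[0]), len(video[0][0])

    def at(t, y, x):
        # Clamp coordinates so border gradients use replicated edge values
        return video[min(max(t, 0), n_t - 1)][min(max(y, 0), n_y - 1)][min(max(x, 0), n_x - 1)]

    points = list(itertools.product(range(n_t), range(n_y), range(n_x)))
    # Central-difference gradients (Lx, Ly, Lt) at every point
    grads = {(t, y, x): ((at(t, y, x + 1) - at(t, y, x - 1)) / 2,
                         (at(t, y + 1, x) - at(t, y - 1, x)) / 2,
                         (at(t + 1, y, x) - at(t - 1, y, x)) / 2)
             for t, y, x in points}
    offsets = list(itertools.product(range(-r, r + 1), repeat=3))
    response = [[[0.0] * n_x for _ in range(n_y)] for _ in range(n_t)]
    for t, y, x in points:
        mu = [[0.0] * 3 for _ in range(3)]
        for dt, dy, dx in offsets:
            g = grads.get((t + dt, y + dy, x + dx))
            if g is None:   # window position falls outside the video
                continue
            for i in range(3):
                for j in range(3):
                    mu[i][j] += g[i] * g[j]
        det = (mu[0][0] * (mu[1][1] * mu[2][2] - mu[1][2] * mu[2][1])
               - mu[0][1] * (mu[1][0] * mu[2][2] - mu[1][2] * mu[2][0])
               + mu[0][2] * (mu[1][0] * mu[2][1] - mu[1][1] * mu[2][0]))
        trace = mu[0][0] + mu[1][1] + mu[2][2]
        response[t][y][x] = det - k * trace ** 3
    return response
```

Summing rather than averaging over the window scales det(μ) and trace³(μ) by the same factor, so the locations of the maxima are unaffected.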