A Novel Learning-Based Framework for Detecting Interesting Events in Soccer Videos
Citations
Affection arousal based highlight extraction for soccer video
Soccer Video Event Annotation by Synchronization of Attack–Defense Clips and Match Reports With Coarse-Grained Time Information
Generic architecture for event detection in broadcast sports video
A hybrid framework for event detection using multi-modal features
A Semi-Automatic Soccer Video Annotation System based on Ontology Paradigm
References
Robust Real-Time Face Detection
An Introduction to Conditional Random Fields for Relational Learning
Video abstraction: A systematic review and classification
Multiscale conditional random fields for image labeling
Frequently Asked Questions (12)
Q2. What is the effect of the CRF?
The CRF exploits the conditional dependencies between mid-level features, which allows it to parse the video into interesting and non-interesting events.
Q3. What is the first step in detecting interesting events?
The first step in detecting interesting events is to abstract the raw video data into more semantically meaningful streams of information.
Q4. What is the conditional distribution over the hidden states?
The conditional distribution over the hidden states is written as

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \phi_c(x_c, y_c),$$

where $Z(x) = \sum_{y} \prod_{c \in C} \phi_c(x_c, y_c)$ is the normalizing partition function.
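The normalization in this formula can be checked by brute force on a toy problem. The sketch below assumes a linear chain of three frames with binary labels. The chain structure, the potential values, and all function names are illustrative assumptions, not taken from the paper.

```python
from itertools import product
from math import isclose

# Hypothetical toy setup: each frame is labeled 0 (non-interesting) or 1 (interesting).
LABELS = (0, 1)

def node_potential(x_t, y_t):
    # Illustrative values only: an observation that agrees with the label scores higher.
    return 2.0 if x_t == y_t else 0.5

def edge_potential(y_prev, y_next):
    # Illustrative values only: neighboring frames prefer to share a label.
    return 1.5 if y_prev == y_next else 1.0

def unnormalized_score(x, y):
    """Product of all clique potentials phi_c(x_c, y_c) for a linear chain."""
    score = 1.0
    for t in range(len(x)):
        score *= node_potential(x[t], y[t])
    for t in range(1, len(y)):
        score *= edge_potential(y[t - 1], y[t])
    return score

def partition(x):
    """Z(x): sum of the unnormalized scores over every possible labeling y."""
    return sum(unnormalized_score(x, y) for y in product(LABELS, repeat=len(x)))

def conditional(x, y):
    """p(y | x) = unnormalized score of y divided by Z(x)."""
    return unnormalized_score(x, y) / partition(x)

x = (1, 1, 0)  # made-up observations for three frames
total = sum(conditional(x, y) for y in product(LABELS, repeat=len(x)))
```

Enumerating every labeling is only feasible for very short sequences. It is used here because it mirrors the definition of Z(x) term by term.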
Q5. What is the effect of gray thresholding?
Gray thresholding leaves out the whiter portions. A morphological close operation is then applied, followed by erosion with a line structuring element (first vertical, then horizontal).
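The sequence of operations can be sketched in pure Python on a tiny synthetic image. This is a minimal illustration under several assumptions that do not come from the paper: the binary mask marks pixels brighter than the threshold, closing uses a 3x3 square, the line elements are three pixels long, and pixels outside the image count as background.

```python
def threshold(gray, level):
    """Binary mask: 1 where the pixel is brighter than `level`, else 0 (assumed polarity)."""
    return [[1 if p > level else 0 for p in row] for row in gray]

def _morph(mask, offsets, combine):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Pixels outside the image are treated as background (0).
            vals = [mask[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
                    for dr, dc in offsets]
            out[r][c] = combine(vals)
    return out

def erode(mask, offsets):
    """A pixel survives only if every pixel under the structuring element is set."""
    return _morph(mask, offsets, min)

def dilate(mask, offsets):
    """A pixel is set if any pixel under the structuring element is set."""
    return _morph(mask, offsets, max)

def close(mask, offsets):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, offsets), offsets)

# Structuring elements as (row, column) offsets; sizes are illustrative.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
VERTICAL_LINE = [(-1, 0), (0, 0), (1, 0)]
HORIZONTAL_LINE = [(0, -1), (0, 0), (0, 1)]

# Synthetic 9x11 image: a bright 3-pixel-wide vertical bar with a dark hole,
# plus one isolated bright speck.
gray = [[30] * 11 for _ in range(9)]
for r in range(1, 8):
    for c in range(6, 9):
        gray[r][c] = 240
gray[4][7] = 30   # dark hole inside the bright bar
gray[4][1] = 240  # isolated bright speck

mask = threshold(gray, 200)
closed = close(mask, SQUARE_3X3)
final = erode(erode(closed, VERTICAL_LINE), HORIZONTAL_LINE)
```

In this toy case the closing fills the hole in the bar, and the two line erosions remove the speck while thinning the bar to its one-pixel-wide core.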
Q6. How many frames did the SVM classifier correctly classify?
Out of 132 frames, the authors' SVM-based classifier correctly classified 127 as field view or non-field view, giving an accuracy of 96.21%.
Q7. What is the function that extracts the vector of features from the variable values?
Without loss of generality, the potentials $\phi_c(x_c, y_c)$ are described by log-linear combinations of feature functions $f_c(\cdot)$, i.e.,

$$\phi_c(x_c, y_c) = \exp\left(w_c^{T} f_c(x_c, y_c)\right),$$

where $w_c^{T}$ is the transpose of a weight vector $w_c$ and $f_c(x_c, y_c)$ is a function that extracts a vector of features from the variable values.
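A log-linear potential of this form reduces to a dot product followed by an exponential. The sketch below is a minimal illustration. The feature definitions, the dictionary keys, and the weight values are hypothetical and are not the features or learned weights from the paper.

```python
from math import exp, isclose

def features(x_c, y_c):
    """Hypothetical feature function f_c: returns a feature vector for one clique.

    x_c is a dict of detections; y_c is 1 for "interesting" and 0 otherwise.
    """
    return [
        1.0 if (x_c["zoom_on_goal_post"] and y_c == 1) else 0.0,
        1.0 if (x_c["crowd_with_goal_post"] and y_c == 1) else 0.0,
        1.0 if y_c == 0 else 0.0,  # bias feature for the non-interesting label
    ]

def potential(weights, x_c, y_c):
    """phi_c(x_c, y_c) = exp(w_c^T f_c(x_c, y_c))."""
    return exp(sum(w * f for w, f in zip(weights, features(x_c, y_c))))

weights = [1.2, 0.8, 0.5]  # illustrative values, not learned from data
x_c = {"zoom_on_goal_post": True, "crowd_with_goal_post": False}

phi_interesting = potential(weights, x_c, 1)      # exp(1.2)
phi_non_interesting = potential(weights, x_c, 0)  # exp(0.5)
```

Because the exponential is always positive, every potential is a valid non-negative factor regardless of the sign of the weights.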
Q8. How many times did the system mark a Zoom In correctly?
For 50 new non-training examples given to the system, it marked a Zoom In correctly in 47 of them, representing an accuracy of 94%.
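Both reported accuracies (127 of 132 frames for field view classification and 47 of 50 examples for Zoom In) follow directly from the ratio of correct to total cases. The helper name below is an illustrative choice.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to two decimal places."""
    return round(100.0 * correct / total, 2)

field_view_accuracy = accuracy_percent(127, 132)  # 96.21
zoom_in_accuracy = accuracy_percent(47, 50)       # 94.0
```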
Q9. What are the features that are used to classify an interesting event?
The low-level features (zoom, goal post, crowd, and face detection) are combined according to their dependencies to deduce three mid-level features: a zoom in on the goal post, crowd detection together with goal post detection, and goal post detection followed by face detection.
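One way to picture this combination step is as a function from per-frame detector outputs to mid-level flags. The sketch below assumes boolean detector outputs per frame, treats "together with" as co-occurrence in the same frame, and treats "followed by" as a goal post seen within a short window of preceding frames. The class, field names, and window size are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FrameDetections:
    """Hypothetical per-frame outputs of the low-level detectors."""
    zoom_in: bool
    goal_post: bool
    crowd: bool
    face: bool

def mid_level_features(frames, window=2):
    """Derive mid-level flags for each frame from the low-level detections.

    `window` is an assumed number of preceding frames in which a goal post
    must have been seen for a face detection to count as "followed by".
    """
    results = []
    for i, f in enumerate(frames):
        recent = frames[max(0, i - window):i]
        results.append({
            "zoom_on_goal_post": f.zoom_in and f.goal_post,
            "crowd_with_goal_post": f.crowd and f.goal_post,
            "goal_post_then_face": f.face and any(p.goal_post for p in recent),
        })
    return results

frames = [
    FrameDetections(zoom_in=False, goal_post=True, crowd=False, face=False),
    FrameDetections(zoom_in=True, goal_post=True, crowd=False, face=False),
    FrameDetections(zoom_in=False, goal_post=False, crowd=False, face=True),
]
mid = mid_level_features(frames)
```

Here the second frame yields a zoom in on the goal post, and the third yields goal post detection followed by face detection.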
Q10. What is the main limitation of the work of Duan et al.?
The same limitation applies to the work of Duan et al. [5], who propose a multi-level hierarchy but use a rule-based system for event detection.
Q11. What is the probability of a sequence of frames being a goal?
A field view with a goal post combined with a zoom in on the goal post gives a higher probability that the sequence of frames is a goal than a field view with a goal post alone.
Q12. What are the three main sections of the model?
For their experiments with soccer videos, the authors choose the following low-level features, and outline their detection process in detail in the subsequent sections: Zoom In, Goal Post Detection, Crowd Detection, Field View Detection, and Face Detection.