High-level feature detection from video in TRECVid: a 5-year retrospective of achievements
Citations
Multimodal fusion for multimedia analysis: a survey
A Survey on Visual Content-Based Video Indexing and Retrieval
Domain Adaptation under Target and Conditional Shift
Concept-Based Video Retrieval
Lifelogging: Personal Big Data
References
Content-based image retrieval at the end of the early years
The challenge problem for automated detection of 101 semantic concepts in multimedia
Large-scale concept ontology for multimedia
Estimating average precision with incomplete and imperfect judgments
Semantic concept-based query expansion and re-ranking for multimedia retrieval
Frequently Asked Questions (13)
Q2. What future work is suggested in "High-level feature detection from video in TRECVid: a 5-year retrospective of achievements"?
The targeted effort to design a concept ontology for broadcast news, LSCOM [5], has also been very influential, since it made it possible to use the semantic relations between concepts for the search task. Future experiments should focus more on quantifying the robustness of the technology, on how well detectors can be applied in different domains, and on better comparability of experiments across sites and across collections in order to answer community-wide, high-level research questions. Now that approaches are consolidating, it may become more attractive to control more factors in the experimental setting so as to make submissions more comparable across sites. The authors mention one open question in particular: what are the limits on the generalizability of detectors, i.e., how reusable are they, and how can this be measured in an affordable way given the further constraint that changing data sets is expensive?
Q3. How can search in video archives be made more efficient?
A promising approach to make search in video archives more efficient and effective is to develop automatic indexing techniques that produce descriptions at a higher semantic level that is better attuned to matching information needs.
Q4. What was the main focus of the TRECVid training data in the latter years?
In the latter years this training data consisted of manually annotated shots provided as part of large-scale community-based video annotation activities, an aspect of TRECVid which really allowed the benchmark to focus on system approaches rather than data availability.
Q5. How do you build a high-level feature detector?
High-level feature detectors are usually built by training a classifier (often a support vector machine) on labeled training data.
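As a hedged illustration of that recipe (not the paper's own implementation), the sketch below trains an SVM concept detector on precomputed keyframe descriptors with scikit-learn; the feature vectors, labels, and dimensionality are all invented stand-ins.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 1000 keyframes described by 128-dimensional visual
# descriptors, with binary labels for one concept (e.g. "car").
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))
y = rng.integers(0, 2, size=1000)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# probability=True yields confidence scores, so shots can later be
# ranked by how likely the detector thinks the concept is present.
detector = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
confidences = detector.predict_proba(X_val)[:, 1]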
Q6. What was the main aspect of the feature detection task?
One interesting aspect of the feature detection task was the provision of development data which could be used by participating groups to train their feature detection systems.
Q7. How many hours of training data were used for the evaluation in 2003?
A total of 60 hours (32,318 shots) were used for the evaluation, a big step-up in size; 10 groups submitted a total of 60 runs, which were pooled and only partially assessed because of the large ramp-up in submissions and data volume relative to 2002.
Q8. What is the way to facilitate meta-analysis of experiment results across sites?
One way to facilitate meta-analysis of experiment results across sites is to classify systems based on an ontology of experimental choices that has been constructed for the design of a concept detector architecture.
Q9. What is the TRECVid standard for correctness in annotation of feature training data and judging of system output?
The TRECVid standard for correctness in annotation of feature training data and judging of system output is that of a human – so that examples which are very difficult for systems due to small size, occlusion, etc., are included in the training data and systems that can detect these examples get credit for them – as should be the case in a real system.
Q10. How are the results of feature detection assessed?
When assessing the results of feature detection the authors employ the widely used trec_eval software to calculate standard information retrieval measures.
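For concreteness, the snippet below computes the non-interpolated average precision that trec_eval reports for a single feature; the ranked shot list and relevance judgments are toy values, not taken from the evaluation.

def average_precision(ranked_shots, relevant):
    """AP of a ranked shot list, given the set of shots judged relevant."""
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant) if relevant else 0.0

# Toy run: four shots ranked by detector confidence, two judged relevant.
print(average_precision(["s3", "s1", "s7", "s2"], {"s1", "s2"}))  # 0.5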
Q11. At what granularity should the presence of a feature be annotated?
In the broadcast news domain shots are fairly short; for longer shots, it might make sense to annotate the presence of a feature at the frame level.
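As a toy illustration of the two granularities (the field names are invented, not the TRECVid annotation format):

# Shot-level: the feature occurs somewhere in the shot.
shot_level = {"shot_id": "s42", "feature": "car", "present": True}

# Frame-level: the feature is localized to individual frames,
# which is more informative for long shots.
frame_level = [
    {"shot_id": "s42", "frame": 1210, "feature": "car", "present": True},
    {"shot_id": "s42", "frame": 1300, "feature": "car", "present": False},
]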
Q12. What is the main reason why high-level features are called features?
In turn (and this is the main reason why they are called features), the high-level features can be used as features by a higher level interpretation module, combining different high-level features in a compositional fashion, e.g. ‘car AND fire’.
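A minimal sketch of that compositional use, assuming per-shot confidence scores from two detectors and a fuzzy AND implemented as a minimum (one common choice; the scores here are invented):

car_scores = {"shot1": 0.9, "shot2": 0.2, "shot3": 0.7}
fire_scores = {"shot1": 0.8, "shot2": 0.9, "shot3": 0.1}

# 'car AND fire': a shot scores only as high as its weaker evidence.
combined = {s: min(car_scores[s], fire_scores[s]) for s in car_scores}
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)  # ['shot1', 'shot2', 'shot3']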
Q13. What did the participants learn from the previous iterations of the feature detection task?
Throughout the previous iterations of the feature detection task most groups had come to depend on the keyframe as the shot representative and had applied their feature detection techniques to the keyframe rather than the whole shot.
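The sketch below illustrates that keyframe strategy under simple assumptions: the shot is represented by its middle frame, read with OpenCV; the file name and shot boundaries are hypothetical.

import cv2

def middle_keyframe(video_path, shot_start, shot_end):
    """Return the middle frame of a shot given frame-number boundaries."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, (shot_start + shot_end) // 2)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None

# The feature detector is then applied to this single frame
# rather than to every frame of the shot.
keyframe = middle_keyframe("news_broadcast.mpg", shot_start=1200, shot_end=1350)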