Author

Marc Ritter

Bio: Marc Ritter is an academic researcher from Hochschule Mittweida. The author has contributed to research in topics such as Computer science & TRECVID. The author has an h-index of 7 and has co-authored 63 publications receiving 318 citations. Previous affiliations of Marc Ritter include Chemnitz University of Technology.


Papers
14 Nov 2016
TL;DR: TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking George Awad, Jonathan Fiscus, David Joy, Martial Michel, Alan Smeaton, Wessel Kraaij, Maria Eskevich, Robin Aly, Roeland Ordelman, Marc Ritter, et al.

116 citations

01 Jan 2017
TL;DR: A method for large-scale bird sound classification in the context of the LifeCLEF 2017 bird identification task is summarized, using a variety of convolutional neural networks to generate features from visual representations of field recordings.
Abstract: Identifying bird species in audio recordings is a challenging field of research. In this paper, we summarize a method for large-scale bird sound classification in the context of the LifeCLEF 2017 bird identification task. We used a variety of convolutional neural networks to generate features extracted from visual representations of field recordings. The BirdCLEF 2017 training dataset consists of 36,496 audio recordings containing 1,500 different bird species. Our approach achieved a mean average precision of 0.605 (official score) and 0.687 considering only foreground species.
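
The pipeline summarized above, converting field recordings into spectrogram-style images and feeding them to convolutional networks, can be sketched roughly as follows. This is a minimal illustration assuming librosa and PyTorch; the mel resolution, sample rate, and toy network are placeholders and do not reproduce the submission's actual architectures or preprocessing.

# Minimal sketch: turn a field recording into a log-mel spectrogram "image"
# and score it with a small CNN. All parameters here are illustrative.
import librosa
import numpy as np
import torch
import torch.nn as nn

def audio_to_logmel(path, sr=22050, n_mels=128):
    """Load a recording and convert it to a log-scaled mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class TinyBirdCNN(nn.Module):
    """Toy stand-in for the convolutional feature extractors used on spectrograms."""
    def __init__(self, n_classes=1500):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# logmel = audio_to_logmel("recording.wav")          # (n_mels, frames)
# x = torch.from_numpy(logmel)[None, None].float()   # (1, 1, n_mels, frames)
# scores = TinyBirdCNN()(x)                          # (1, 1500) species scores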

57 citations

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A holistic framework is developed that supports most aspects of a media provider's real workflows, such as production, distribution, content description, archiving, and re-use of video items, and addresses issues such as a lack of human resources, the necessity of parallel media distribution, and the retrieval of previously archived content by editors or consumers.
Abstract: Supporting most aspects of a media provider's real workflows, such as production, distribution, content description, archiving, and re-use of video items, we developed a holistic framework to address issues such as a lack of human resources, the necessity of parallel media distribution, and the retrieval of previously archived content by editors or consumers.

15 citations

Proceedings ArticleDOI
06 Dec 2011
TL;DR: Three different fusion techniques are proposed to combine the advantages of two vision sensors, a far-infrared (FIR) and a visible-light camera, and are compared with respect to the results of the pedestrian classification.
Abstract: Pedestrian detection is an important field in computer vision with applications in surveillance, robotics, and driver assistance systems. The quality of such systems can be improved by the simultaneous use of different sensors. This paper proposes three different fusion techniques to combine the advantages of two vision sensors: a far-infrared (FIR) and a visible-light camera. Fusion methods drawn from different levels of information representation are briefly described and finally compared with respect to the results of the pedestrian classification.
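
To illustrate the idea of fusing the two sensors at different levels of information representation, the sketch below contrasts a pixel-level (early) fusion with a decision-level (late) fusion. It is a simplified, assumption-laden example: the classifiers, weighting, and image registration are placeholders and do not reproduce the paper's three techniques.

# Simplified sketch of two fusion levels for FIR + visible-light pedestrian
# classification. Classifiers and weights are placeholders.
import numpy as np

def early_fusion(fir_patch: np.ndarray, vis_patch: np.ndarray) -> np.ndarray:
    """Pixel-level fusion: stack both modalities into one multi-channel patch."""
    return np.concatenate([fir_patch[..., None], vis_patch[..., None]], axis=-1)

def late_fusion(score_fir: float, score_vis: float, w_fir: float = 0.5) -> float:
    """Decision-level fusion: weighted combination of the per-sensor scores."""
    return w_fir * score_fir + (1.0 - w_fir) * score_vis

# fused_patch = early_fusion(fir, vis)          # feed to a single classifier
# p = late_fusion(clf_fir(fir), clf_vis(vis))   # combine two classifier outputs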

15 citations

Journal ArticleDOI
13 Apr 2018-PLOS ONE
TL;DR: A Matlab-based software package that allows for the simulation of camera-based smFRET videos, yielding standardized data sets suitable for benchmarking video processing algorithms, as well as for pre-optimizing and evaluating spot detection algorithms using the authors' simulated video test sets.
Abstract: Single-molecule microscopy has become a widely used technique in (bio)physics and (bio)chemistry. A popular implementation is single-molecule Förster Resonance Energy Transfer (smFRET), for which total internal reflection fluorescence microscopy is frequently combined with camera-based detection of surface-immobilized molecules. Camera-based smFRET experiments generate large and complex datasets, and several methods for video processing and analysis have been reported. As these algorithms often address similar aspects in video analysis, there is a growing need for standardized comparison. Here, we present a Matlab-based software package (MASH-FRET) that allows for the simulation of camera-based smFRET videos, yielding standardized data sets suitable for benchmarking video processing algorithms. The software permits varying parameters that are relevant in camera-based smFRET, such as video quality and the properties of the system under study. Experimental noise is modeled taking into account photon statistics and camera noise. Finally, we survey how video test sets should be designed to evaluate currently available data analysis strategies in camera-based single-molecule fluorescence experiments. We complement our study by pre-optimizing and evaluating spot detection algorithms using our simulated video test sets.
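
As a rough illustration of modeling experimental noise from photon statistics and camera noise, the sketch below adds Poisson shot noise, a camera gain, and Gaussian read noise to an ideal photon-rate image. The gain and noise values are assumptions, and the model is far simpler than MASH-FRET's actual camera noise models.

# Minimal sketch: simulate a noisy camera frame from an ideal photon-rate
# image using Poisson shot noise plus Gaussian read noise. Values are made up.
import numpy as np

def simulate_frame(photon_rate, gain=2.0, read_noise_sd=1.5, offset=100.0, rng=None):
    """Convert expected photon counts per pixel into a simulated camera frame."""
    rng = np.random.default_rng() if rng is None else rng
    photons = rng.poisson(photon_rate)        # photon (shot) noise
    counts = gain * photons + offset          # camera gain and baseline offset
    return counts + rng.normal(0.0, read_noise_sd, photon_rate.shape)  # read noise

# yy, xx = np.indices((64, 64))
# ideal = 50.0 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 20.0)  # one fake spot
# frame = simulate_frame(ideal)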

13 citations


Cited by

Proceedings ArticleDOI
05 Mar 2017
TL;DR: In this paper, the authors used various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
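
The last point, that embeddings from these classifiers outperform raw features on the downstream AED task, follows a common transfer-learning pattern sketched below. The small embedding network and downstream classifier are placeholders, not the paper's actual embedding model.

# Sketch of the "embeddings beat raw features" pattern: a frozen, pre-trained
# audio CNN provides clip embeddings that feed a simple downstream classifier.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

class AudioEmbedder(nn.Module):
    """Placeholder conv net whose pooled activations serve as clip embeddings."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, embed_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, log_mel):                  # (N, 1, mels, frames)
        return self.conv(log_mel).flatten(1)     # (N, embed_dim)

# with torch.no_grad():
#     X = AudioEmbedder()(log_mel_clips).numpy()                # clip embeddings
# clf = LogisticRegression(max_iter=1000).fit(X, event_labels)  # downstream AED model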

1,470 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of instance retrieval over the last decade that presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT-based and CNN-based methods.
Abstract: In the early days, content-based image retrieval (CBIR) was studied with global features. Since 2003, image retrieval based on local descriptors ( de facto SIFT) has been extensively studied for over a decade due to the advantage of SIFT in dealing with image transformations. Recently, image representations based on the convolutional neural network (CNN) have attracted increasing interest in the community and demonstrated impressive performance. Given this time of rapid evolution, this article provides a comprehensive survey of instance retrieval over the last decade. Two broad categories, SIFT-based and CNN-based methods, are presented. For the former, according to the codebook size, we organize the literature into using large/medium-sized/small codebooks. For the latter, we discuss three lines of methods, i.e., using pre-trained or fine-tuned CNN models, and hybrid methods. The first two perform a single-pass of an image to the network, while the last category employs a patch-based feature extraction scheme. This survey presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT and CNN-based methods. After analyzing and comparing retrieval performance of different categories on several datasets, we discuss promising directions towards generic and specialized instance retrieval.

554 citations

Proceedings ArticleDOI
Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke
01 Jul 2017
TL;DR: A new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB), which consists of approximately 380,000 video segments automatically selected to feature objects in natural settings without editing or post-processing.
Abstract: We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments, each about 19 seconds long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the COCO [32] label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding boxes. Finally, we train and evaluate well-known deep network architectures and report baseline figures for per-frame classification and localization. We also demonstrate how the temporal contiguity of video can potentially be used to improve such inferences. The data set can be found at https://research.google.com/youtube-bb. We hope the availability of such a large curated corpus will spur new advances in video object detection and tracking.

501 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: A new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video and outperforms other baselines with comparable base architectures on HMDB51, UCF101, and Charades video classification benchmarks.
Abstract: In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream networks [42] with learnable spatio-temporal feature aggregation [6]. The resulting architecture is end-to-end trainable for whole-video classification. We investigate different strategies for pooling across space and time and combining signals from the different streams. We find that: (i) it is important to pool jointly across space and time, but (ii) appearance and motion streams are best aggregated into their own separate representations. Finally, we show that our representation outperforms the two-stream base architecture by a large margin (13% relative) as well as outperforms other baselines with comparable base architectures on HMDB51, UCF101, and Charades video classification benchmarks.
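
The two findings above, pooling jointly across space and time while keeping appearance and motion in separate aggregated representations, can be pictured with the simplified sketch below. Plain average pooling stands in for the paper's learnable spatio-temporal aggregation, and the shapes are illustrative.

# Simplified sketch: aggregate local conv features jointly over space AND time,
# but keep the appearance (RGB) and motion (flow) streams separate until the end.
import numpy as np

def pool_space_time(features: np.ndarray) -> np.ndarray:
    """features: (T, H, W, C) local conv features -> (C,) video-level descriptor."""
    return features.mean(axis=(0, 1, 2))

# rgb_desc  = pool_space_time(rgb_conv_features)      # appearance stream
# flow_desc = pool_space_time(flow_conv_features)     # motion stream
# video_repr = np.concatenate([rgb_desc, flow_desc])  # streams aggregated separately, then combined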

410 citations