Author

Ralph Ewerth

Bio: Ralph Ewerth is an academic researcher from Leibniz University of Hanover. The author has contributed to research in topics: Computer science & TRECVID. The author has an h-index of 15, has co-authored 124 publications, and has received 986 citations. Previous affiliations of Ralph Ewerth include German National Library of Science and Technology & Information Technology University.


Papers
Proceedings ArticleDOI
23 Aug 2004
TL;DR: A robust text localization approach is presented that can automatically detect horizontally aligned text with different sizes, fonts, colors, and languages; its detection performance is demonstrated by experimental results for a set of video frames taken from the MPEG-7 video test set.
Abstract: Text localization and recognition in images is important for searching information in digital photo archives, video databases, and Web sites. However, since text is often printed against a complex background, it is often difficult to detect. In this paper, a robust text localization approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors, and languages. First, a wavelet transform is applied to the image, and the distribution of high-frequency wavelet coefficients is considered to statistically characterize text and non-text areas. Then, the k-means algorithm is used to classify text areas in the image. The detected text areas undergo a projection analysis in order to refine their localization. Finally, a binary segmented text image is generated, to be used as input to an OCR engine. The detection performance of our approach is demonstrated by presenting experimental results for a set of video frames taken from the MPEG-7 video test set.

146 citations
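The abstract above outlines a pipeline of wavelet-based feature extraction, k-means classification, and projection analysis. The following Python sketch illustrates that general idea only; the block size, the feature choice (mean and standard deviation of high-frequency energy), and all function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a wavelet + k-means text localization step; names and
# parameters are illustrative, not the published method.
import numpy as np
import pywt
from sklearn.cluster import KMeans

def localize_text_regions(gray, block=16):
    """Label image blocks as text/non-text via high-frequency wavelet energy."""
    # Single-level 2D Haar transform; keep the detail (high-frequency) subbands.
    _, (lh, hl, hh) = pywt.dwt2(gray.astype(np.float32), "haar")
    energy = lh**2 + hl**2 + hh**2          # per-coefficient high-frequency energy

    # Aggregate energy statistics over non-overlapping blocks.
    h, w = energy.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = energy[y:y + block, x:x + block]
            feats.append([patch.mean(), patch.std()])
            coords.append((y, x))
    feats = np.array(feats)

    # Two-class k-means: the cluster with higher mean energy is taken as "text".
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    text_cluster = int(np.argmax([feats[labels == c, 0].mean() for c in (0, 1)]))

    mask = np.zeros((h, w), dtype=bool)
    for (y, x), lab in zip(coords, labels):
        if lab == text_cluster:
            mask[y:y + block, x:x + block] = True
    return mask  # candidate text mask (in half-resolution subband coordinates)
```

A projection analysis over the returned mask would then refine the candidate regions into text boxes, as described in the abstract.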

Proceedings ArticleDOI
18 Sep 2003
TL;DR: An efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented.
Abstract: Text detection in images or videos is an important step to achieve multimedia content retrieval. In this paper, an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The output of the algorithm is a set of text boxes with a simplified background, ready to be fed into an OCR engine for subsequent character recognition. Our proposal is robust with respect to different font sizes, font colors, languages and background complexities. The performance of the approach is demonstrated by presenting promising experimental results for a set of images taken from different types of video sequences.

92 citations
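As a rough illustration of the color reduction, edge detection, and projection-profile steps described above, the sketch below extracts candidate text lines from row and column edge densities. The quantization level, Canny thresholds, and geometric constraints are placeholder assumptions rather than the published parameters.

```python
# Illustrative edge/projection-profile text detection sketch; thresholds are
# assumptions, not the paper's settings.
import cv2
import numpy as np

def detect_text_boxes(bgr, edge_thresh=0.2, min_height=8):
    # Color reduction: quantize to a coarse palette to suppress background texture.
    reduced = (bgr // 64) * 64

    # Edge detection on the luminance channel.
    gray = cv2.cvtColor(reduced, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)

    # Horizontal projection profile: rows dense in edges are text-line candidates.
    row_profile = edges.sum(axis=1) / 255.0
    rows = row_profile > edge_thresh * edges.shape[1]

    boxes = []
    y = 0
    while y < len(rows):
        if rows[y]:
            y0 = y
            while y < len(rows) and rows[y]:
                y += 1
            if y - y0 >= min_height:                      # geometric constraint
                band = edges[y0:y, :]
                col_profile = band.sum(axis=0) / 255.0
                xs = np.where(col_profile > 0)[0]
                if xs.size:                               # vertical refinement
                    boxes.append((int(xs[0]), y0, int(xs[-1]), y))  # x0, y0, x1, y1
        y += 1
    return boxes
```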

Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes to exploit hierarchical knowledge of multiple partitionings and to additionally extract and take the photo's scene content into account, so that contextual information at different spatial resolutions as well as more specific features for various environmental settings are incorporated in the learning process of the convolutional neural network.
Abstract: While the successful estimation of a photo’s geolocation enables a number of interesting applications, it is also a very challenging task. Due to the complexity of the problem, most existing approaches are restricted to specific areas, imagery, or worldwide landmarks. Only a few proposals predict GPS coordinates without any limitations. In this paper, we introduce several deep learning methods, which pursue the latter approach and treat geolocalization as a classification problem where the earth is subdivided into geographical cells. We propose to exploit hierarchical knowledge of multiple partitionings and to additionally extract and take into account the photo’s scene content, i.e., indoor, natural, or urban setting. As a result, contextual information at different spatial resolutions as well as more specific features for various environmental settings are incorporated in the learning process of the convolutional neural network. Experimental results on two benchmarks demonstrate the effectiveness of our approach, which outperforms the state of the art while using a significantly lower number of training images and without relying on retrieval methods that require an appropriate reference dataset.

62 citations
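The geolocation-as-classification idea above, with multiple geographic partitionings and scene awareness, could be prototyped roughly as a multi-head CNN. The PyTorch sketch below is only a structural illustration; the backbone, the cell counts per partitioning, and the scene-head design are assumptions, not the paper's architecture.

```python
# Minimal multi-head geolocation classifier sketch; sizes and the scene
# conditioning scheme are illustrative assumptions.
import torch.nn as nn
from torchvision import models

class MultiPartitioningGeoClassifier(nn.Module):
    def __init__(self, cells_per_partitioning=(3000, 7000, 12000), num_scenes=3):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()            # shared CNN feature extractor
        self.backbone = backbone
        # One classification head per spatial partitioning (coarse to fine).
        self.geo_heads = nn.ModuleList(
            [nn.Linear(feat_dim, n_cells) for n_cells in cells_per_partitioning]
        )
        # Auxiliary head for the scene type (e.g. indoor / natural / urban).
        self.scene_head = nn.Linear(feat_dim, num_scenes)

    def forward(self, images):
        feats = self.backbone(images)
        return [head(feats) for head in self.geo_heads], self.scene_head(feats)
```

Training would sum cross-entropy losses over all partitionings (plus the scene loss), so that coarse cells provide context for the finer-grained predictions.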

Proceedings ArticleDOI
23 Aug 2004
TL;DR: An outlier removal algorithm is incorporated into the approach to solve the problem of camera motion estimation in MPEG videos, and the minimum number of motion vectors required to obtain satisfactory results is investigated.
Abstract: Several algorithms have been proposed to solve the problem of camera motion estimation in digital videos. However, the distinction between translation along the x-axis (y-axis) and rotation around the y-axis (x-axis) has only rarely been considered, and no approach of this kind is known to us for the MPEG domain. In this paper, we present such an algorithm for camera motion estimation in MPEG videos. For performance reasons it is reasonable to extract motion vectors directly from the compressed stream. However, since motion vectors are optimal with respect to compression, they often do not model real motion adequately and can thus be considered as "outliers" with respect to camera motion estimation. Consequently, an outlier removal algorithm is incorporated into our approach to solve this problem. Furthermore, we have investigated the minimum number of motion vectors required to obtain satisfactory results. Comprehensive experiments with 32 video clips demonstrate the performance of the proposed approach.

48 citations
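A hedged sketch of the general idea of fitting a global camera-motion model to compressed-domain motion vectors with iterative outlier removal follows. It uses a simple four-parameter similarity model and a fixed residual threshold as placeholders; the paper's actual formulation, which separates translation along the x-axis (y-axis) from rotation around the y-axis (x-axis), is more involved.

```python
# Robust camera-motion fit from block motion vectors; the model and threshold
# are simplifications, not the paper's method.
import numpy as np

def fit_camera_motion(positions, vectors, n_iter=5, thresh=2.0):
    """positions: (N, 2) block centers; vectors: (N, 2) MPEG motion vectors."""
    inliers = np.ones(len(positions), dtype=bool)
    params = np.zeros(4)
    for _ in range(n_iter):
        x, y = positions[inliers, 0], positions[inliers, 1]
        u, v = vectors[inliers, 0], vectors[inliers, 1]
        # Model: u = tx + a*x - b*y,  v = ty + b*x + a*y  (pan/tilt/zoom/rotate).
        A = np.zeros((2 * x.size, 4))
        A[0::2, 0] = 1; A[0::2, 2] = x; A[0::2, 3] = -y
        A[1::2, 1] = 1; A[1::2, 2] = y; A[1::2, 3] = x
        b = np.empty(2 * x.size)
        b[0::2], b[1::2] = u, v
        params, *_ = np.linalg.lstsq(A, b, rcond=None)

        # Outlier removal: discard vectors that disagree with the fitted model,
        # since compression-optimal vectors need not reflect real motion.
        tx, ty, a, rb = params
        pred_u = tx + a * positions[:, 0] - rb * positions[:, 1]
        pred_v = ty + rb * positions[:, 0] + a * positions[:, 1]
        resid = np.hypot(vectors[:, 0] - pred_u, vectors[:, 1] - pred_v)
        inliers = resid < thresh
        if inliers.sum() < 4:
            break
    return params, inliers
```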

Journal ArticleDOI
TL;DR: A small diamond search is adapted to the programming model of modern GPUs to exploit their available parallel computing power and memory bandwidth; it achieves a significant reduction of computation time and a competitive encoding quality compared to a CPU UMHexagonS implementation, while enabling the CPU to process other encoding tasks in parallel.
Abstract: The video coding standard H.264 supports video compression with a higher coding efficiency than previous standards. However, this comes at the expense of an increased encoding complexity, in particular for motion estimation, which becomes a very time consuming task even for today's central processing units (CPUs). On the other hand, modern graphics hardware includes a powerful graphics processing unit (GPU) whose computing power remains idle most of the time. In this paper, we present a GPU-based approach to motion estimation for the purpose of H.264 video encoding. A small diamond search is adapted to the programming model of modern GPUs to exploit their available parallel computing power and memory bandwidth. Experimental results demonstrate a significant reduction of computation time and a competitive encoding quality compared to a CPU UMHexagonS implementation while enabling the CPU to process other encoding tasks in parallel.

41 citations
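To make the search strategy concrete, here is a plain-Python sketch of a small diamond search for a single macroblock. In the GPU setting described above, each such per-block search would run in parallel; the SAD cost, block size, and iteration limit below are illustrative choices, not the paper's implementation.

```python
# CPU sketch of a small diamond search for block motion estimation.
import numpy as np

SMALL_DIAMOND = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def sad(block, ref, y, x):
    h, w = block.shape
    if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
        return np.inf
    return np.abs(block - ref[y:y + h, x:x + w]).sum()

def small_diamond_search(cur, ref, by, bx, block=16, max_steps=16):
    """Estimate the motion vector of the block at (by, bx) in `cur`."""
    cur = cur.astype(np.int32)
    ref = ref.astype(np.int32)
    blk = cur[by:by + block, bx:bx + block]
    cy, cx = by, bx                        # search center in the reference frame
    for _ in range(max_steps):
        costs = [sad(blk, ref, cy + dy, cx + dx) for dy, dx in SMALL_DIAMOND]
        best = int(np.argmin(costs))
        if best == 0:                      # center is best: search has converged
            break
        cy += SMALL_DIAMOND[best][0]
        cx += SMALL_DIAMOND[best][1]
    return cy - by, cx - bx                # motion vector (dy, dx)
```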


Cited by

Journal Article
TL;DR: AspectJ, as mentioned in this paper, is a simple and practical aspect-oriented extension to Java; with just a few new constructs, AspectJ provides support for modular implementation of a range of crosscutting concerns.
Abstract: AspectJ is a simple and practical aspect-oriented extension to Java. With just a few new constructs, AspectJ provides support for modular implementation of a range of crosscutting concerns. In AspectJ's dynamic join point model, join points are well-defined points in the execution of the program; pointcuts are collections of join points; advice are special method-like constructs that can be attached to pointcuts; and aspects are modular units of crosscutting implementation, comprising pointcuts, advice, and ordinary Java member declarations. AspectJ code is compiled into standard Java bytecode. Simple extensions to existing Java development environments make it possible to browse the crosscutting structure of aspects in the same kind of way as one browses the inheritance structure of classes. Several examples show that AspectJ is powerful, and that programs written using it are easy to understand.

2,947 citations
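AspectJ itself is a Java extension, so its pointcut and advice syntax cannot be shown directly in the Python used for the other sketches here. As a loose analogy only, the decorator below plays the role of "around" advice woven onto a function-call join point, keeping a cross-cutting concern (timing) out of the business logic; all names are illustrative and this is not AspectJ code.

```python
# Loose Python analogy of "around" advice at a function-call join point.
import functools
import time

def around_advice(func):
    """Cross-cutting concern (timing) wrapped around the join point."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)       # proceed with the original call
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

@around_advice
def transfer(amount):
    return amount * 0.99    # business logic stays free of the timing concern

transfer(100)
```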

Journal Article
TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance of the farthest data point.
Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality.

1,992 citations
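The dimensionality effect described above is easy to reproduce empirically. The short script below, using arbitrary sample sizes and i.i.d. uniform data (one of the simple settings the paper generalizes beyond), prints the nearest-to-farthest distance ratio as the dimension grows; it approaches 1, meaning the nearest neighbor is barely closer than the farthest point.

```python
# Empirical illustration of distance concentration in high dimensions.
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000

for dim in (2, 10, 15, 100, 1000):
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    # A ratio near 1 means "nearest" is barely closer than "farthest".
    print(f"dim={dim:5d}  nearest/farthest distance ratio = "
          f"{dists.min() / dists.max():.3f}")
```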

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.

1,531 citations
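A heavily simplified sketch of the per-pixel stroke-width idea described above: from each edge pixel, follow the gradient direction until another edge pixel is hit and record the ray length as the stroke width. The full operator also verifies that the opposing edge gradient is roughly anti-parallel and handles ray bookkeeping more carefully; those checks are omitted here and the thresholds are arbitrary.

```python
# Simplified stroke-width sketch (gradient-opposition check omitted).
import cv2
import numpy as np

def stroke_width_transform(gray, max_width=50):
    """gray: single-channel uint8 image; returns per-pixel stroke width (inf if none)."""
    edges = cv2.Canny(gray, 100, 200) > 0
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy) + 1e-6
    dx, dy = gx / mag, gy / mag                      # unit gradient directions

    h, w = gray.shape
    swt = np.full((h, w), np.inf, dtype=np.float32)
    ys, xs = np.nonzero(edges)
    for y0, x0 in zip(ys, xs):
        ray = [(y0, x0)]
        for step in range(1, max_width):
            x = int(round(x0 + dx[y0, x0] * step))
            y = int(round(y0 + dy[y0, x0] * step))
            if not (0 <= y < h and 0 <= x < w):
                break
            ray.append((y, x))
            if edges[y, x]:                          # reached the opposite edge
                width = float(np.hypot(y - y0, x - x0))
                for ry, rx in ray:                   # keep the smallest width seen
                    swt[ry, rx] = min(swt[ry, rx], width)
                break
    return swt
```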