Proceedings Article

A new pixel-based fusion framework to enhance object detection in automotive applications

07 Oct 2014, pp. 1-8
TL;DR: An early fusion technique combines the advantages of two or more vision sensors into a single composite image that is more comprehensive for further computer vision tasks, e.g. object detection.
Abstract: Modern cooperative Advanced Driver Assistance Systems (ADASs) require efficient algorithms and methods for the real-time processing of all sensor data. In particular, systems for the combination of different sensors are gaining increasing importance. This paper proposes an early fusion technique to combine the advantages of two or more vision sensors. It creates a single composite image that will be more comprehensive for further computer vision tasks, e.g. object detection. After image registration, the presented pixel-based fusion framework transforms the registered sensor images into a common representational format by a multiscale decomposition. A denoising and a multiscale edge detection are applied on the transformed data. Only data with a high Activity Level are considered for the fusion process, which is based on a probabilistic approach. Finally, the fused image can be the input for a subsequent feature extraction task. The proposed fusion technique is examined on a Pedestrian Detection System based on an infrared and a visible light camera.
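The abstract outlines the pipeline but not the fusion rule itself. The following is a minimal sketch of pixel-based fusion in the wavelet domain, assuming a simple choose-max rule in which coefficient magnitude stands in for the paper's Activity Level; the paper's denoising, multiscale edge detection and probabilistic weighting are omitted.

```python
# Minimal sketch of pixel-based multiscale image fusion (choose-max rule).
# The paper's framework is more elaborate; here the "activity level" is
# simply the coefficient magnitude. Requires numpy and PyWavelets.
import numpy as np
import pywt

def fuse_images(ir: np.ndarray, vis: np.ndarray, wavelet="db2", levels=3):
    """Fuse two registered grayscale images in the wavelet domain."""
    c_ir = pywt.wavedec2(ir.astype(float), wavelet, level=levels)
    c_vis = pywt.wavedec2(vis.astype(float), wavelet, level=levels)

    fused = [(c_ir[0] + c_vis[0]) / 2.0]          # average the approximation band
    for d_ir, d_vis in zip(c_ir[1:], c_vis[1:]):
        bands = []
        for b_ir, b_vis in zip(d_ir, d_vis):      # H, V, D detail sub-bands
            mask = np.abs(b_ir) >= np.abs(b_vis)  # higher "activity" wins
            bands.append(np.where(mask, b_ir, b_vis))
        fused.append(tuple(bands))
    return pywt.waverec2(fused, wavelet)
```

Averaging the coarse approximation band preserves overall brightness, while choose-max on the detail bands keeps the strongest edges from either sensor.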
Citations
Journal ArticleDOI
28 Aug 2019, Sensors
TL;DR: A learning-based method is proposed for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue, or RGB) images while introducing new informative details in pedestrian regions.
Abstract: Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become more easily accessible to a larger number of researchers. This has resulted in numerous studies which confirmed the benefits of the thermal cameras in limited visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that would be more informative than a regular RGB camera to a human driver in challenging visibility conditions. The main novelty of this paper is the idea to rely on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve natural image appearance; and an auxiliary pedestrian detection error to help define relevant features of the human appearance and blend them into the output. We train a convolutional neural network using image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and creates more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved, especially in dark regions and at night. Compared to existing methods, we can better learn context and define fusion rules that focus on the pedestrian appearance, while that is not guaranteed with methods that focus on low-level image quality metrics.
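As a rough illustration of the two-term objective described above, the sketch below combines a similarity loss against the RGB input with an auxiliary detection loss on the fused output. `fusion_net`, `detector` and its `loss` method are hypothetical placeholders, not the paper's actual architecture.

```python
# Hedged sketch of a two-term fusion objective: stay visually close to the
# RGB input while keeping pedestrians detectable in the fused output.
import torch
import torch.nn.functional as F

def fusion_loss(fusion_net, detector, rgb, thermal, det_targets, alpha=0.5):
    # fuse the stacked modalities into a 3-channel image (B, 3, H, W)
    fused = fusion_net(torch.cat([rgb, thermal], dim=1))
    sim_loss = F.l1_loss(fused, rgb)              # natural RGB-like appearance
    det_loss = detector.loss(fused, det_targets)  # hypothetical detector API
    return sim_loss + alpha * det_loss, fused
```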

34 citations

Patent
28 Mar 2017
TL;DR: In this article, a sequence of images obtained by a camera mounted on a vehicle is processed in order to generate Optical Flow data including a list of Motion Vectors being associated with respective features in the sequence.
Abstract: A sequence of images obtained by a camera mounted on a vehicle is processed in order to generate Optical Flow data including a list of Motion Vectors associated with respective features in the sequence of images. The Optical Flow data is analyzed to calculate a Vanishing Point as the mean point of all intersections of straight lines passing through motion vectors lying in the road. A Horizontal Filter subset is determined, taking into account the Vanishing Point and a Bound Box list from the previous frame, in order to filter the horizontal motion vectors from the Optical Flow. The subset of the Optical Flow is clustered to generate the Bound Box list retrieving the moving objects in the scene. The Bound Box list is sent to an Alert Generation device, and an output video shows the input scene where the detected moving objects are surrounded by a Bounding Box.
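The vanishing-point step lends itself to a compact illustration: extend each motion vector to a line, intersect all pairs, and average the intersection points. The sketch below assumes vectors given as (x, y, dx, dy) tuples; the names are illustrative, not the patent's interfaces.

```python
# Vanishing point as the mean of all pairwise intersections of the lines
# through the motion vectors. Near-parallel pairs are skipped.
import numpy as np

def vanishing_point(vectors, eps=1e-6):
    pts = []
    for i in range(len(vectors)):
        x1, y1, dx1, dy1 = vectors[i]
        for j in range(i + 1, len(vectors)):
            x2, y2, dx2, dy2 = vectors[j]
            denom = dx1 * dy2 - dy1 * dx2      # cross product of directions
            if abs(denom) < eps:               # parallel lines: no intersection
                continue
            t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
            pts.append((x1 + t * dx1, y1 + t * dy1))
    return np.mean(pts, axis=0) if pts else None
```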

3 citations

Patent
Strandemar Katrin
25 Jan 2018
TL;DR: In this article, a set of target fusion parameters is determined by modifying, according to a scaling factor, baseline fusion parameters associated with a baseline resolution, and a fused image is generated by fusing the IR image and the VL image according to those parameters.
Abstract: Systems and methods disclosed herein, in accordance with one or more embodiments, provide for the generation of a fused image optimized for a target resolution, such as by receiving an infrared (IR) image captured by an IR imaging sensor, receiving a visible light (VL) image captured by a VL imaging sensor, determining a scaling factor based on a difference between the target resolution and a baseline resolution, and determining a set of target fusion parameters at least by modifying, according to the scaling factor, a set of baseline fusion parameters associated with the baseline resolution. A fused image is generated having the target resolution at least by fusing the IR image and the VL image according to the set of target fusion parameters.
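A minimal sketch of the scaling idea, assuming hypothetical parameter names: spatially dependent fusion parameters are rescaled by the ratio of target to baseline resolution, while resolution-independent ones pass through unchanged.

```python
# Illustrative sketch only; the parameter names are placeholders, not the
# patent's actual fusion parameters.
def scale_fusion_parameters(baseline_params: dict,
                            baseline_res: int,
                            target_res: int) -> dict:
    factor = target_res / baseline_res      # e.g. 1920 / 640 = 3.0
    return {
        # spatial parameters (filter radii, edge widths) grow with resolution
        "blur_radius": baseline_params["blur_radius"] * factor,
        "edge_width":  baseline_params["edge_width"] * factor,
        # intensity blending weights are resolution-independent
        "ir_weight":   baseline_params["ir_weight"],
    }
```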

3 citations

02 Jun 2018
TL;DR: This document describes an advanced system and methodology for Cross Traffic Alert (CTA), able to detect vehicles that move into the vehicle's driving path from the left or right side; it extends previous work on a clustering system that only works with fixed cameras.
Abstract: This document describes an advanced system and methodology for Cross Traffic Alert (CTA), able to detect vehicles that move into the vehicle's driving path from the left or right side. The camera may be mounted not only on a stationary vehicle, e.g. at a traffic light or at an intersection, but also on a slowly moving one, e.g. in a car park. In all of the aforementioned conditions, a driver's short loss of concentration or distraction can easily lead to a serious accident, and the proposed system is a valid support to avoid these kinds of car crashes. It is an extension of our previous work, related to a clustering system which only works on fixed cameras. Only a vanishing point calculation and simple optical flow filtering, to eliminate motion vectors due to the car's own movement, are performed, letting the system achieve high performance across different scenarios, cameras and resolutions. The proposed system uses only the optical flow as input, which is implemented in hardware on the proposed platform; since the whole processing pipeline is fast and power-efficient, it is embedded directly in the camera framework, allowing all processing to be executed in real time. Keywords—Clustering, cross traffic alert, optical flow, real time, vanishing point.
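The ego-motion filtering step can be sketched as follows, reusing the vanishing point from the earlier snippet: vectors aligned with the ray from the vanishing point are attributed to the car's own movement and dropped, while roughly horizontal vectors are kept as cross-traffic candidates. Thresholds are arbitrary placeholders.

```python
# Illustrative sketch of optical-flow filtering for cross-traffic alert.
import numpy as np

def filter_cross_traffic(vectors, vp, radial_tol=0.95, horiz_tol=0.7):
    kept = []
    for x, y, dx, dy in vectors:
        d = np.array([dx, dy], dtype=float)
        norm = np.linalg.norm(d)
        if norm == 0:
            continue
        d /= norm
        r = np.array([x - vp[0], y - vp[1]], dtype=float)
        r /= np.linalg.norm(r) + 1e-9
        if abs(np.dot(d, r)) > radial_tol:   # radial flow: ego-motion, drop it
            continue
        if abs(d[0]) > horiz_tol:            # mostly horizontal: keep it
            kept.append((x, y, dx, dy))
    return kept
```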

Cites background from "A new pixel-based fusion framework ..."

  • ...approaches have been proposed: infrared and visible light image sensors [9], object detection sensor (radar or camera) and in-vehicle sensor (for steering wheel and speedometer) [10], and so on....

    [...]

References
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
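For reference, the stages the abstract singles out map directly onto the default parameters of a standard HOG implementation; the following is a minimal sketch using scikit-image with the Dalal-Triggs settings.

```python
# HOG descriptor extraction with the standard Dalal-Triggs parameters.
import numpy as np
from skimage.feature import hog

def hog_descriptor(window: np.ndarray) -> np.ndarray:
    """Compute a HOG descriptor for a 64x128 grayscale detection window."""
    return hog(
        window,
        orientations=9,            # fine orientation binning
        pixels_per_cell=(8, 8),    # relatively coarse spatial binning
        cells_per_block=(2, 2),    # overlapping descriptor blocks
        block_norm="L2-Hys",       # high-quality local contrast normalization
    )
```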

31,952 citations

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection capable of processing images extremely rapidly while achieving high detection rates, built on a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
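The integral-image contribution is easy to make concrete: after one cumulative-sum pass, the sum of any rectangle costs four array lookups, which is what lets Haar-like features be evaluated in constant time. A minimal sketch:

```python
# Integral image with O(1) rectangle sums.
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # pad with a zero row/column so rectangle sums need no boundary checks
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    """Sum of pixels in img[top:top+h, left:left+w] via four lookups."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])
```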

18,620 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coefficients, and achieved performance within a factor of log² n of the ideal performance of piecewise polynomial and variable-knot spline methods.
Abstract: With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle offers dramatic advantages over traditional linear estimation by nonadaptive kernels; however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatially-adaptive estimation: selective wavelet reconstruction. We show that variable-knot spline fits and piecewise-polynomial fits, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coefficients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality in multivariate normal decision theory which we call the oracle inequality shows that attained performance differs from ideal performance by at most a factor of approximately 2 log n, where n is the sample size. Moreover no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor of log² n of the performance of piecewise polynomial and variable-knot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.
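A minimal sketch of wavelet shrinkage denoising in the spirit of the paper: for simplicity it applies the universal threshold σ·sqrt(2 log n) of the companion VisuShrink rule rather than RiskShrink's minimax threshold, with the noise level estimated from the finest-scale coefficients.

```python
# Wavelet shrinkage denoising (universal threshold, soft shrinkage).
# Requires numpy and PyWavelets.
import numpy as np
import pywt

def wavelet_denoise(signal: np.ndarray, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # estimate the noise level from the finest-scale coefficients (MAD rule)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(signal.size))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)
```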

8,153 citations

Yoav Freund, Robert E. Schapire
01 Jan 1999
TL;DR: This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines. Some examples of recent applications of boosting are also described.
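The overview's reweighting idea fits in a few lines. The sketch below assumes weak learners with a scikit-learn-style `fit`/`predict` interface (e.g. decision stumps) and labels in {-1, +1}.

```python
# Compact AdaBoost loop: fit a weak learner on reweighted data each round,
# then increase the weight of the examples it misclassified.
import numpy as np

def adaboost(X, y, weak_learner_factory, rounds=50):
    """y must be in {-1, +1}. Returns (learners, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(rounds):
        h = weak_learner_factory().fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = np.sum(w[pred != y])
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)      # up-weight the mistakes
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```

The final classifier is the sign of the alpha-weighted sum of the weak learners' predictions.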

3,212 citations

Book
11 Aug 2011
TL;DR: The authors describe an algorithm that reconstructs a close approximation of 1-D and 2-D signals from their multiscale edges and show that the evolution of wavelet local maxima across scales characterizes the local shape of irregular structures.
Abstract: A multiscale Canny edge detection is equivalent to finding the local maxima of a wavelet transform. The authors study the properties of multiscale edges through the wavelet theory. For pattern recognition, one often needs to discriminate different types of edges. They show that the evolution of wavelet local maxima across scales characterize the local shape of irregular structures. Numerical descriptors of edge types are derived. The completeness of a multiscale edge representation is also studied. The authors describe an algorithm that reconstructs a close approximation of 1-D and 2-D signals from their multiscale edges. For images, the reconstruction errors are below visual sensitivity. As an application, a compact image coding algorithm that selects important edges and compresses the image data by factors over 30 has been implemented.
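The paper's core equivalence, multiscale Canny edges as wavelet modulus maxima, can be sketched in 1-D with Gaussian derivatives standing in for the paper's specific wavelet: at each smoothing scale, edges are the local maxima of the derivative's magnitude.

```python
# Multiscale edge detection sketch: local maxima of |d/dx (G_s * f)| per scale.
# Gaussian derivatives are used here as a stand-in wavelet. Requires scipy.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def multiscale_edges_1d(signal: np.ndarray, scales=(1, 2, 4, 8)):
    """Return, per scale, the indices of local maxima of the gradient modulus."""
    edges = {}
    for s in scales:
        deriv = gaussian_filter1d(signal.astype(float), sigma=s, order=1)
        mod = np.abs(deriv)
        # strict local maxima of the modulus along x
        is_max = (mod[1:-1] > mod[:-2]) & (mod[1:-1] >= mod[2:])
        edges[s] = np.nonzero(is_max)[0] + 1
    return edges
```

Tracking how these maxima persist and decay across scales is what the paper uses to characterize the local shape of irregular structures.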

3,187 citations