
Showing papers on "Image sensor published in 2020"


Journal ArticleDOI
04 Mar 2020-Nature
TL;DR: It is demonstrated that an image sensor can itself constitute an artificial neural network (ANN) that simultaneously senses and processes optical images without latency, and the sensor is trained to classify and encode images with high throughput.
Abstract: Machine vision technology has taken huge leaps in recent years, and is now becoming an integral part of various intelligent systems, including autonomous vehicles and robotics. Usually, visual information is captured by a frame-based camera, converted into a digital format and processed afterwards using a machine-learning algorithm such as an artificial neural network (ANN)1. The large amount of (mostly redundant) data passed through the entire signal chain, however, results in low frame rates and high power consumption. Various visual data preprocessing techniques have thus been developed2-7 to increase the efficiency of the subsequent signal processing in an ANN. Here we demonstrate that an image sensor can itself constitute an ANN that can simultaneously sense and process optical images without latency. Our device is based on a reconfigurable two-dimensional (2D) semiconductor8,9 photodiode10-12 array, and the synaptic weights of the network are stored in a continuously tunable photoresponsivity matrix. We demonstrate both supervised and unsupervised learning and train the sensor to classify and encode images that are optically projected onto the chip with a throughput of 20 million bins per second.
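The computation here is, at heart, an analog matrix-vector product: each pixel's programmable photoresponsivity plays the role of a synaptic weight, and the summed photocurrents along the output lines are the network activations. A minimal numerical sketch of that idea (dimensions, values, and names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Illustrative photocurrent-domain classifier: each of N pixels has one
# programmable responsivity per output neuron (the "synaptic weight").
rng = np.random.default_rng(0)
n_pixels, n_classes = 27, 3                               # assumed sizes, for illustration only
R = rng.normal(scale=1e-3, size=(n_classes, n_pixels))    # responsivity matrix [A/W]

def sensor_forward(optical_power):
    """Summed photocurrent per output line = R @ P (done by the device itself in hardware)."""
    return R @ optical_power                               # currents [A]

P = rng.uniform(0.0, 1e-3, size=n_pixels)                  # incident optical power per pixel [W]
currents = sensor_forward(P)
predicted_class = int(np.argmax(currents))
print(currents, predicted_class)
```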

436 citations


Journal ArticleDOI
TL;DR: Flat optics for direct image differentiation are demonstrated, allowing the required optical system size to be shrunk significantly and reducing the complexity of conventional optical systems.
Abstract: Image processing has become a critical technology in a variety of science and engineering disciplines. Although most image processing is performed digitally, optical analog processing has the advantages of being low-power and high-speed, but it requires a large volume. Here, we demonstrate flat optics for direct image differentiation, allowing us to significantly shrink the required optical system size. We first demonstrate how the differentiator can be combined with traditional imaging systems such as a commercial optical microscope and camera sensor for edge detection with a numerical aperture up to 0.32. We next demonstrate how the entire processing system can be realized as a monolithic compound flat optic by integrating the differentiator with a metalens. The compound nanophotonic system manifests the advantage of a thin form factor as well as the ability to implement complex transfer functions, and could open new opportunities in applications such as biological imaging and computer vision. Vertical integration with a metalens thus realizes compound nanophotonic systems for optical analog image processing, significantly reducing the size and complexity of conventional optical systems.
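In digital terms, a second-order optical differentiator applies a transfer function proportional to the squared spatial frequency. The sketch below reproduces the equivalent Fourier-domain Laplacian filtering purely to illustrate the operation the flat optic performs analogically; it is not the authors' design code, and the test image is made up:

```python
import numpy as np

def laplacian_edge(image):
    """Digital analogue of a 2D second-order optical differentiator:
    multiply the spectrum by -(fx^2 + fy^2) and transform back."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    H = -(fx**2 + fy**2)                       # isotropic Laplacian transfer function
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

img = np.zeros((128, 128)); img[32:96, 32:96] = 1.0   # a bright square
edges = laplacian_edge(img)                            # response concentrated at the boundary
```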

256 citations


Book ChapterDOI
27 Apr 2020
TL;DR: This paper proposes 3D-CVF, which combines camera and LiDAR features using a cross-view spatial feature fusion strategy and achieves state-of-the-art performance on the KITTI benchmark.
Abstract: In this paper, we propose a new deep architecture for fusing camera and LiDAR sensors for 3D object detection. Because the camera and LiDAR sensor signals have different characteristics and distributions, fusing these two modalities is expected to improve both the accuracy and robustness of 3D object detection. One of the challenges presented by the fusion of cameras and LiDAR is that the spatial feature maps obtained from each modality are represented by significantly different views in the camera and world coordinates; hence, it is not an easy task to combine two heterogeneous feature maps without loss of information. To address this problem, we propose a method called 3D-CVF that combines the camera and LiDAR features using the cross-view spatial feature fusion strategy. First, the method employs auto-calibrated projection, to transform the 2D camera features to a smooth spatial feature map with the highest correspondence to the LiDAR features in the bird’s eye view (BEV) domain. Then, a gated feature fusion network is applied to use the spatial attention maps to mix the camera and LiDAR features appropriately according to the region. Next, camera-LiDAR feature fusion is also achieved in the subsequent proposal refinement stage. The low-level LiDAR features and camera features are separately pooled using region of interest (RoI)-based feature pooling and fused with the joint camera-LiDAR features for enhanced proposal refinement. Our evaluation, conducted on the KITTI and nuScenes 3D object detection datasets, demonstrates that the camera-LiDAR fusion offers significant performance gain over the LiDAR-only baseline and that the proposed 3D-CVF achieves state-of-the-art performance in the KITTI benchmark.

231 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The HDR-to-LDR image formation pipeline is modeled as dynamic range clipping, non-linear mapping from a camera response function, and quantization, and three specialized CNNs are learned to reverse these steps.
Abstract: Recovering a high dynamic range (HDR) image from a single low dynamic range (LDR) input image is challenging due to missing details in under-/over-exposed regions caused by quantization and saturation of camera sensors. In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model. We model the HDR-to-LDR image formation pipeline as the (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization. We then propose to learn three specialized CNNs to reverse these steps. By decomposing the problem into specific sub-tasks, we impose effective physical constraints to facilitate the training of individual sub-networks. Finally, we jointly fine-tune the entire model end-to-end to reduce error accumulation. With extensive quantitative and qualitative experiments on diverse image datasets, we demonstrate that the proposed method performs favorably against state-of-the-art single-image HDR reconstruction algorithms.
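The three modeled stages are easy to state concretely. Below is a hedged sketch of the forward HDR-to-LDR simulator implied by the abstract; the gamma-style response curve and 8-bit quantization are assumptions for illustration, while the paper's contribution is learning CNNs that invert each stage:

```python
import numpy as np

def hdr_to_ldr(hdr, gamma=1/2.2, bits=8):
    """Toy forward model: (1) dynamic range clipping, (2) camera response
    function (here an assumed gamma curve), (3) quantization."""
    clipped = np.clip(hdr, 0.0, 1.0)                 # (1) sensor saturation
    mapped = clipped ** gamma                        # (2) assumed non-linear CRF
    levels = 2 ** bits - 1
    return np.round(mapped * levels) / levels        # (3) quantization to 8 bits

hdr = np.linspace(0, 4, 5)        # radiance values, some above the sensor's range
print(hdr_to_ldr(hdr))
```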

167 citations


Proceedings ArticleDOI
13 Feb 2020
TL;DR: PyNET is presented, a novel pyramidal CNN architecture designed for fine-grained image restoration that implicitly learns to perform all ISP steps such as image demosaicing, denoising, white balancing, color and contrast correction, etc.
Abstract: As the popularity of mobile photography is growing constantly, lots of efforts are being invested now into building complex hand-crafted camera ISP solutions. In this work, we demonstrate that even the most sophisticated ISP pipelines can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device. For this, we present PyNET, a novel pyramidal CNN architecture designed for fine-grained image restoration that implicitly learns to perform all ISP steps such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The model is trained to convert RAW Bayer data obtained directly from mobile camera sensor into photos captured with a professional high-end DSLR camera, making the solution independent of any particular mobile ISP implementation. To validate the proposed approach on the real data, we collected a large-scale dataset consisting of 10 thousand full-resolution RAW-RGB image pairs captured in the wild with the Huawei P20 cameraphone (12.3 MP Sony Exmor IMX380 sensor) and Canon 5D Mark IV DSLR. The experiments demonstrate that the proposed solution can easily get to the level of the embedded P20's ISP pipeline that, unlike our approach, is combining the data from two (RGB + B/W) camera sensors. The dataset, pretrained models and codes used in this paper are available on the project website: https://people.ee.ethz.ch/~ihnatova/pynet.html.

156 citations


Journal ArticleDOI
20 Apr 2020
TL;DR: In this paper, a 1 Mpixel single-photon avalanche diode camera with 3.8 ns time gating and 24 kfps frame rate is presented, fabricated in 180 nm CMOS image sensor technology.
Abstract: We present a 1 Mpixel single-photon avalanche diode camera featuring 3.8 ns time gating and 24 kfps frame rate, fabricated in 180 nm CMOS image sensor technology. We designed two pixels with a pitch of 9.4 µm in 7T and 5.75T configurations respectively, achieving a maximum fill factor of 13.4%. The maximum photon detection probability is 27%, median dark count rate is 2.0 cps, variation in gating length is 120 ps, position skew is 410 ps, and rise/fall time is <550 ps, all FWHM at 3.3 V excess bias. The sensor was used to capture 2D/3D scenes over 2 m with resolution (least significant bit) of 5.4 mm and precision better than 7.8 mm (rms). We demonstrate extended dynamic range in dual exposure operation mode and show spatially overlapped multi-object detection in single-photon time-gated time-of-flight experiments.
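The quoted 5.4 mm least-significant bit maps directly to a time-bin width through the round-trip relation d = c*t/2. A quick arithmetic check (only the LSB value is taken from the abstract):

```python
c = 3.0e8                       # speed of light [m/s]
lsb_depth = 5.4e-3              # quoted depth LSB [m]
bin_width = 2 * lsb_depth / c   # time-bin width implied by the LSB
print(f"{bin_width*1e12:.0f} ps per bin")   # ~36 ps
```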

156 citations


Journal ArticleDOI
TL;DR: The curved neuromorphic image sensor array integrated with a plano-convex lens derives a pre-processed image from a set of noisy optical inputs without redundant data storage, processing, and communications as well as without complex optics.
Abstract: Conventional imaging and recognition systems require an extensive amount of data storage, pre-processing, and chip-to-chip communications as well as aberration-proof light focusing with multiple lenses for recognizing an object from massive optical inputs. This is because separate chips (i.e., flat image sensor array, memory device, and CPU) in conjunction with complicated optics should capture, store, and process massive image information independently. In contrast, human vision employs a highly efficient imaging and recognition process. Here, inspired by the human visual recognition system, we present a novel imaging device for efficient image acquisition and data pre-processing by conferring the neuromorphic data processing function on a curved image sensor array. The curved neuromorphic image sensor array is based on a heterostructure of MoS2 and poly(1,3,5-trimethyl-1,3,5-trivinyl cyclotrisiloxane). The curved neuromorphic image sensor array features photon-triggered synaptic plasticity owing to its quasi-linear time-dependent photocurrent generation and prolonged photocurrent decay, originating from charge trapping in the MoS2-organic vertical stack. The curved neuromorphic image sensor array integrated with a plano-convex lens derives a pre-processed image from a set of noisy optical inputs without redundant data storage, processing, and communications as well as without complex optics. The proposed imaging device can substantially improve the efficiency of the image acquisition and recognition process, a step toward next-generation machine vision. Designing an efficient bio-inspired visual recognition system remains a challenge. Here the authors present a curved neuromorphic image sensor array based on a heterostructure of MoS2 and pV3D3 integrated with a plano-convex lens for efficient image acquisition and data pre-processing.

118 citations


Proceedings ArticleDOI
TL;DR: CenterFusion first uses a center point detection network to detect objects by identifying their center points on the image, and then solves the key data association problem using a novel frustum-based method that associates radar detections with their corresponding object's center point.
Abstract: The perception system in autonomous vehicles is responsible for detecting and tracking the surrounding objects. This is usually done by taking advantage of several sensing modalities to increase robustness and accuracy, which makes sensor fusion a crucial part of the perception system. In this paper, we focus on the problem of radar and camera sensor fusion and propose a middle-fusion approach to exploit both radar and camera data for 3D object detection. Our approach, called CenterFusion, first uses a center point detection network to detect objects by identifying their center points on the image. It then solves the key data association problem using a novel frustum-based method to associate the radar detections to their corresponding object's center point. The associated radar detections are used to generate radar-based feature maps to complement the image features, and regress to object properties such as depth, rotation and velocity. We evaluate CenterFusion on the challenging nuScenes dataset, where it improves the overall nuScenes Detection Score (NDS) of the state-of-the-art camera-based algorithm by more than 12%. We further show that CenterFusion significantly improves the velocity estimation accuracy without using any additional temporal information. The code is available at this https URL .
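The frustum association step can be pictured as a geometric test: a radar detection becomes a candidate for an object if, projected through the camera intrinsics, it lands inside the object's 2D bounding box. A hedged numpy sketch of such a test (intrinsics and coordinates are invented; this simplified 2D-box check is not the authors' implementation):

```python
import numpy as np

def in_frustum(radar_xyz, box_2d, K):
    """Return True if a radar point (camera coordinates, metres) projects
    inside the 2D box [x1, y1, x2, y2] of an image-space detection."""
    x, y, z = radar_xyz
    if z <= 0:                      # behind the camera
        return False
    u, v, w = K @ np.array([x, y, z])
    u, v = u / w, v / w             # perspective projection to pixel coordinates
    x1, y1, x2, y2 = box_2d
    return x1 <= u <= x2 and y1 <= v <= y2

K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # assumed intrinsics
print(in_frustum(np.array([1.0, 0.2, 20.0]), (600, 300, 700, 420), K))
```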

111 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: A novel neural network architecture for video reconstruction from events is proposed that is smaller (38k vs. 10M parameters) and faster than the state of the art, with minimal impact on performance.
Abstract: Event cameras are powerful new sensors able to capture high dynamic range with microsecond temporal resolution and no motion blur. Their strength is detecting brightness changes (called events) rather than capturing direct brightness images; however, algorithms can be used to convert events into usable image representations for applications such as classification. Previous works rely on hand-crafted spatial and temporal smoothing techniques to reconstruct images from events. State-of-the-art video reconstruction has recently been achieved using neural networks that are large (10M parameters) and computationally expensive, requiring 30ms for a forward-pass at 640 × 480 resolution on a modern GPU. We propose a novel neural network architecture for video reconstruction from events that is smaller (38k vs. 10M parameters) and faster (10ms vs. 30ms) than state-of-the-art with minimal impact to performance.
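Before any such network can run, the asynchronous event stream has to be binned into a dense tensor. A minimal sketch of the common voxel-grid encoding (a standard representation assumed here for illustration, not necessarily the paper's exact input format):

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, n_bins, height, width):
    """Accumulate polarity-signed events into n_bins temporal slices."""
    grid = np.zeros((n_bins, height, width), dtype=np.float32)
    t_norm = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9)   # map timestamps to [0, 1]
    bins = np.minimum((t_norm * n_bins).astype(int), n_bins - 1)
    np.add.at(grid, (bins, ys, xs), np.where(ps > 0, 1.0, -1.0))
    return grid

xs = np.array([10, 11, 12]); ys = np.array([5, 5, 6])
ts = np.array([0.0, 0.5, 1.0]); ps = np.array([1, -1, 1])
print(events_to_voxel_grid(xs, ys, ts, ps, n_bins=5, height=32, width=32).shape)
```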

111 citations


Journal ArticleDOI
TL;DR: In this article, the authors derive transfer function models that account for the main physical effects limiting a lens-free on-chip digital holographic microscopy (LFOCDHM) system and analyze how these effects jointly constrain its imaging resolution.
Abstract: Lens-free on-chip digital holographic microscopy (LFOCDHM) is a modern imaging technique whereby the sample is placed directly onto or very close to the digital sensor, and illuminated by a partially coherent source located far above it. The scattered object wave interferes with the reference (unscattered) wave at the plane where the digital sensor is situated, producing a digital hologram that can be processed in several ways to extract and numerically reconstruct an in-focus image using the back-propagation algorithm. Without requiring any lenses or other intermediate optical components, LFOCDHM has the unique advantage of offering a large effective numerical aperture (NA) close to unity across the native wide field-of-view (FOV) of the imaging sensor in a cost-effective and compact design. However, unlike conventional coherent diffraction-limited imaging systems, where the limiting aperture is used to define the system performance, typical lens-free microscopes only produce a compromised imaging resolution that falls far below the ideal coherent diffraction limit. At least five major factors may contribute to this limitation, namely, the sample-to-sensor distance, the spatial and temporal coherence of the illumination, the finite size of the equally spaced sensor pixels, and the finite extent of the image sub-FOV used for the reconstruction, which have not been systematically and rigorously explored until now. In this article, we derive five transfer function models that account for all these physical effects and analyze how they jointly limit the imaging resolution of LFOCDHM. We also examine how our theoretical models can be utilized to optimize the optical design or predict the theoretical resolution limit of a given LFOCDHM system. We present a series of simulations and experiments to confirm the validity of our theoretical models.
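The back-propagation step mentioned above is typically implemented with the angular-spectrum method: the recorded hologram's spectrum is multiplied by the free-space transfer function for the sample-to-sensor distance and transformed back. A hedged sketch with placeholder wavelength, pixel pitch, and propagation distance:

```python
import numpy as np

def angular_spectrum_backprop(hologram, wavelength, pixel_pitch, z):
    """Numerically propagate a recorded hologram back by distance z (metres)."""
    ny, nx = hologram.shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch)[None, :]
    fy = np.fft.fftfreq(ny, d=pixel_pitch)[:, None]
    arg = 1.0 - (wavelength * fx) ** 2 - (wavelength * fy) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))   # evanescent cut-off
    H = np.exp(-1j * kz * z)                                      # back-propagation kernel
    return np.fft.ifft2(np.fft.fft2(hologram) * H)

holo = np.random.rand(256, 256)                     # stand-in for a recorded hologram
field = angular_spectrum_backprop(holo, 532e-9, 1.67e-6, 600e-6)
```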

103 citations


Journal ArticleDOI
TL;DR: In this article, the authors demonstrate the first large-scale coherent detector array, consisting of 512 pixels, and its operation in a 3D imaging system, achieving an accuracy of 3.1 mm at a distance of 75 m using only 4 mW of light.
Abstract: Accurate 3D imaging is essential for machines to map and interact with the physical world. While numerous 3D imaging technologies exist, each addressing niche applications with varying degrees of success, none have achieved the breadth of applicability and impact that digital image sensors have achieved in the 2D imaging world. A large-scale two-dimensional array of coherent detector pixels operating as a light detection and ranging (LiDAR) system could serve as a universal 3D imaging platform. Such a system would offer high depth accuracy and immunity to interference from sunlight, as well as the ability to directly measure the velocity of moving objects. However, due to difficulties in providing electrical and photonic connections to every pixel, previous systems have been restricted to fewer than 20 pixels. Here, we demonstrate the first large-scale coherent detector array consisting of 512 (32 × 16) pixels, and its operation in a 3D imaging system. Leveraging recent advances in the monolithic integration of photonic and electronic circuits, a dense array of optical heterodyne detectors is combined with an integrated electronic readout architecture, enabling straightforward scaling to arbitrarily large arrays. Meanwhile, two-axis solid-state beam steering eliminates any tradeoff between field of view and range. Operating at the quantum noise limit, our system achieves an accuracy of 3.1 mm at a distance of 75 metres using only 4 mW of light, an order of magnitude more accurate than existing solid-state systems at such ranges. Future reductions of pixel size using state-of-the-art components could yield resolutions in excess of 20 megapixels for arrays the size of a consumer camera sensor. This result paves the way for the development and proliferation of low cost, compact, and high performance 3D imaging cameras.
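If the coherent pixels perform frequency-modulated continuous-wave (FMCW) ranging — a common choice for heterodyne LiDAR, though the abstract does not specify the modulation — the range follows from the measured beat frequency. A worked-arithmetic sketch with assumed chirp parameters:

```python
c = 3.0e8          # speed of light [m/s]
B = 1.0e9          # assumed chirp bandwidth [Hz]
T = 10e-6          # assumed chirp duration [s]
f_beat = 50e6      # example measured beat frequency [Hz]

range_resolution = c / (2 * B)            # ~0.15 m for a 1 GHz chirp
target_range = c * f_beat * T / (2 * B)   # ~75 m for this beat frequency
print(range_resolution, target_range)
```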

Journal ArticleDOI
TL;DR: A decisive advance in 2D integrated circuits is reported, where the device integration scale is increased by tenfold and the functional complexity of 2D electronics is propelled to an unprecedented level.
Abstract: 2D semiconductors, especially transition metal dichalcogenide (TMD) monolayers, are extensively studied for electronic and optoelectronic applications. Beyond intensive studies on single transistors and photodetectors, the recent advent of large-area synthesis of these atomically thin layers has paved the way for 2D integrated circuits, such as digital logic circuits and image sensors, achieving an integration level of ≈100 devices thus far. Here, a decisive advance in 2D integrated circuits is reported, where the device integration scale is increased by tenfold and the functional complexity of 2D electronics is propelled to an unprecedented level. Concretely, an analog optoelectronic processor inspired by biological vision is developed, where 32 × 32 = 1024 MoS2 photosensitive field-effect transistors manifesting persistent photoconductivity (PPC) effects are arranged in a crossbar array. This optoelectronic processor with PPC memory mimics two core functions of human vision: it captures and stores an optical image into electrical data, like the eye and optic nerve chain, and then recognizes this electrical form of the captured image, like the brain, by executing analog in-memory neural net computing. In the highlight demonstration, the MoS2 FET crossbar array optically images 1000 handwritten digits and electrically recognizes these imaged data with 94% accuracy.
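Analog in-memory inference on a crossbar reduces to Kirchhoff summation: each column current is the dot product of the row inputs with that column's stored states. A schematic numpy illustration of this principle (the array size matches the 32 × 32 = 1024 photo-FETs, while the 10 digit classes, conductance range, and voltages are assumptions rather than a device model from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols = 1024, 10                             # 32x32 crosspoints, assumed 10 digit classes
G = rng.uniform(1e-6, 1e-4, size=(n_rows, n_cols))    # stored conductance states [S]
V = rng.uniform(0.0, 0.5, size=n_rows)                # read voltages encoding the captured image

I_col = V @ G                 # Kirchhoff summation: one analog multiply-accumulate per crosspoint
predicted_digit = int(np.argmax(I_col))
print(predicted_digit)
```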

20 Mar 2020
TL;DR: Spatially overlapped multi-object detection is experimentally demonstrated in single-photon time-gated ToF for the first time and extended dynamic range is demonstrated in dual exposure operation mode.
Abstract: We present the first 1 Mpixel SPAD camera ever reported. The camera features 3.8 ns time gating and 24 kfps frame rate; it was fabricated in 180 nm CIS technology. Two pixels have been designed with a pitch of 9.4 µm in 7T and 5.75T configurations, respectively, achieving a maximum fill factor of 13.4%. The maximum PDP is 27%, median DCR 2.0 cps, variation in gating length 120 ps, position skew 410 ps, and rise/fall time <550 ps, all FWHM at 3.3 V of excess bias. The sensor was used to capture 2D/3D scenes over 2 m with an LSB of 5.4 mm and a precision better than 7.8 mm. Extended dynamic range is demonstrated in dual exposure operation mode. Spatially overlapped multi-object detection is experimentally demonstrated in single-photon time-gated ToF for the first time.

Journal ArticleDOI
TL;DR: This paper proposes a novel, compact, and inexpensive computational camera for snapshot hyperspectral imaging that consists of a repeated spectral filter array placed directly on the image sensor and a diffuser placed close to the sensor.
Abstract: Hyperspectral imaging is useful for applications ranging from medical diagnostics to agricultural crop monitoring; however, traditional scanning hyperspectral imagers are prohibitively slow and expensive for widespread adoption. Snapshot techniques exist but are often confined to bulky benchtop setups or have low spatio-spectral resolution. In this paper, we propose a novel, compact, and inexpensive computational camera for snapshot hyperspectral imaging. Our system consists of a tiled spectral filter array placed directly on the image sensor and a diffuser placed close to the sensor. Each point in the world maps to a unique pseudorandom pattern on the spectral filter array, which encodes multiplexed spatio-spectral information. By solving a sparsity-constrained inverse problem, we recover the hyperspectral volume with sub-super-pixel resolution. Our hyperspectral imaging framework is flexible and can be designed with contiguous or non-contiguous spectral filters that can be chosen for a given application. We provide theory for system design, demonstrate a prototype device, and present experimental results with high spatio-spectral resolution.
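The sparsity-constrained inverse problem is the usual l1-regularized least squares; a few iterations of ISTA against a known forward model already convey the idea. The sketch below uses a random stand-in matrix and a synthetic sparse signal, not the paper's calibrated diffuser/filter model:

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=200):
    """Minimise 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 256))             # stand-in for the multiplexed forward model
x_true = np.zeros(256); x_true[[3, 80, 200]] = [1.0, -0.5, 0.8]
x_rec = ista(A, A @ x_true)
```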

Journal ArticleDOI
01 Sep 2020
TL;DR: In this article, an aquatic-vision-inspired camera that consists of a single monocentric lens and a hemispherical silicon nanorod photodiode array is presented.
Abstract: Conventional wide-field-of-view cameras consist of multi-lens optics and flat image sensor arrays, which makes them bulky and heavy. As a result, they are poorly suited to advanced mobile applications such as drones and autonomous vehicles. In nature, the eyes of aquatic animals consist of a single spherical lens and a highly sensitive hemispherical retina, an approach that could be beneficial in the development of synthetic wide-field-of-view imaging systems. Here, we report an aquatic-vision-inspired camera that consists of a single monocentric lens and a hemispherical silicon nanorod photodiode array. The imaging system features a wide field of view, miniaturized design, low optical aberration, deep depth of field and simple visual accommodation. Furthermore, under vignetting, the photodiode array enables high-quality panoramic imaging due to the enhanced photodetection properties of the silicon nanorod photodiodes. By integrating a single monocentric lens with a hemispherical silicon nanorod photodiode array, a wide-field-of-view camera is created that offers low optical aberration, deep depth of field and simple visual accommodation.

Journal ArticleDOI
20 Apr 2020
TL;DR: In this article, 3D dielectric elements are designed to be placed on top of the pixels of image sensors, that sort and focus light based on its color and polarization with efficiency significantly surpassing 2D absorptive and diffractive filters.
Abstract: Three-dimensional elements, with refractive index distribution structured at subwavelength scale, provide an expansive optical design space that can be harnessed for demonstrating multifunctional free-space optical devices. Here we present 3D dielectric elements, designed to be placed on top of the pixels of image sensors, that sort and focus light based on its color and polarization with efficiency significantly surpassing 2D absorptive and diffractive filters. The devices are designed via iterative gradient-based optimization to account for multiple target functions while ensuring compatibility with existing nanofabrication processes, and they are experimentally validated using a scaled device that operates at microwave frequencies. This approach combines arbitrary functions into a single compact element, even where there is no known equivalent in bulk optics, enabling novel integrated photonic applications.

Journal ArticleDOI
TL;DR: This work introduces neural sensors as a methodology to optimize per-pixel shutter functions jointly with a differentiable image processing method, such as a neural network, in an end-to-end fashion and demonstrates how to leverage emerging programmable and re-configurable sensor–processors to implement the optimized exposure functions directly on the sensor.
Abstract: Camera sensors rely on global or rolling shutter functions to expose an image. This fixed function approach severely limits the sensors’ ability to capture high-dynamic-range (HDR) scenes and resolve high-speed dynamics. Spatially varying pixel exposures have been introduced as a powerful computational photography approach to optically encode irradiance on a sensor and computationally recover additional information of a scene, but existing approaches rely on heuristic coding schemes and bulky spatial light modulators to optically implement these exposure functions. Here, we introduce neural sensors as a methodology to optimize per-pixel shutter functions jointly with a differentiable image processing method, such as a neural network, in an end-to-end fashion. Moreover, we demonstrate how to leverage emerging programmable and re-configurable sensor–processors to implement the optimized exposure functions directly on the sensor. Our system takes specific limitations of the sensor into account to optimize physically feasible optical codes and we evaluate its performance for snapshot HDR and high-speed compressive imaging both in simulation and experimentally with real scenes.
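The optical encoding being optimized can be emulated as a per-pixel shutter mask applied across the sub-frames of a fast scene and summed into one coded snapshot; a decoder is then trained through this differentiable forward model. A hedged sketch of that forward model only (the learned codes and the reconstruction network are the paper's contribution and are not reproduced here):

```python
import numpy as np

def coded_snapshot(video, shutter_mask):
    """video: (T, H, W) irradiance sub-frames; shutter_mask: (T, H, W) in [0, 1].
    Returns the single coded exposure the sensor would read out."""
    return (video * shutter_mask).sum(axis=0)

T, H, W = 8, 64, 64
rng = np.random.default_rng(0)
video = rng.uniform(size=(T, H, W))
mask = (rng.uniform(size=(T, H, W)) > 0.5).astype(float)   # illustrative binary per-pixel code
measurement = coded_snapshot(video, mask)
```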

Journal ArticleDOI
TL;DR: This article introduces a highly reliable and low-complexity image compression scheme using neighborhood correlation sequence (NCS) algorithm that increases the compression performance and decreases the energy utilization of the sensor nodes with high fidelity.
Abstract: Recently, advancements in the field of wireless technologies and micro-electro-mechanical systems have led to the development of potential applications in wireless sensor networks (WSNs). The visual sensors in a WSN have a significant impact on computer vision based applications such as pattern recognition and image restoration, and they generate a massive quantity of multimedia data. Since transmission of images consumes considerable computational resources, various image compression techniques have been proposed. However, most existing image compression techniques are not applicable to sensor nodes due to their limitations on energy, bandwidth, memory, and processing capabilities. In this article, we introduce a highly reliable and low-complexity image compression scheme using the neighborhood correlation sequence (NCS) algorithm. The NCS algorithm performs a bit reduction operation, and the result is then encoded by a codec (such as PPM, Deflate, or the Lempel-Ziv-Markov chain algorithm) to further compress the image. The proposed NCS algorithm increases the compression performance and decreases the energy utilization of the sensor nodes with high fidelity. Moreover, it achieved a minimum end-to-end delay of 1074.46 ms at an average bit rate of 4.40 bpp and a peak signal-to-noise ratio of 48.06 on the applied test images. Compared with state-of-the-art methods, the proposed method maintains a better tradeoff between compression efficiency and reconstructed image quality.

Journal ArticleDOI
20 Oct 2020
TL;DR: A high-speed 3D imaging system enabled by a state-of-the-art SPAD sensor used in a hybrid imaging mode that can perform multi-event histogramming and guided upscaling of depth data from a native resolution of 64×32 to 256×128 is reported.
Abstract: Imaging systems with temporal resolution play a vital role in a diverse range of scientific, industrial, and consumer applications, e.g., fluorescent lifetime imaging in microscopy and time-of-flight (ToF) depth sensing in autonomous vehicles. In recent years, single-photon avalanche diode (SPAD) arrays with picosecond timing capabilities have emerged as a key technology driving these systems forward. Here we report a high-speed 3D imaging system enabled by a state-of-the-art SPAD sensor used in a hybrid imaging mode that can perform multi-event histogramming. The hybrid imaging modality alternates between photon counting and timing frames at rates exceeding 1000 frames per second, enabling guided upscaling of depth data from a native resolution of 64×32 to 256×128. The combination of hardware and processing allows us to demonstrate high-speed ToF 3D imaging in outdoor conditions and with low latency. The results indicate potential in a range of applications where real-time, high throughput data are necessary. One such example is improving the accuracy and speed of situational awareness in autonomous systems and robotics.

Journal ArticleDOI
TL;DR: A novel fusion framework for multimodal neurological images, which is able to capture small-scale details of input images with original structural details and is superior to several other approaches as it produces better visually fused images with improved computational measures.
Abstract: Multimodal medical image sensor fusion (MMISF) plays a significant role in better visualization of the diagnostic statistics computed by integrating the vital information taken from input source images acquired using multimodal imaging sensors. MMISF also helps medical professionals in the precise diagnosis of several critical diseases and their treatment. Often, images taken from different imaging sensors are degraded by noise interference during acquisition or data transmission, which leads to the false perception of noise as a useful feature of the image. This paper presents a novel fusion framework for multimodal neurological images, which is able to capture small-scale details of input images together with their original structural details. In its procedural steps, the source images are first decomposed by the nonsubsampled shearlet transform (NSST) into a low-frequency (lf) component and several high-frequency (hf) components to separate out the two basic characteristics of the source image, i.e., principal information and edge details. The lf layers are fused with a sparse representation-based model, and the hf components are merged by a guided filtering-based approach. Finally, fused images are reconstructed by employing the inverse NSST. The superiority of the proposed MMISF approach is confirmed by extensive analytical experimentation on different real magnetic resonance-single-photon emission computed tomography, magnetic resonance-positron emission tomography, and computed tomography-magnetic resonance neurological image datasets. Based on all these experimental results, it is stated that the proposed MMISF approach is superior to several other approaches, as it produces better visually fused images with improved computational measures.

Proceedings ArticleDOI
01 May 2020
TL;DR: This work presents an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot, capable of computing the camera extrinsics from a single frame, thus opening the possibility of on-line calibration.
Abstract: We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and joint configuration of the robot manipulator are known. Unlike classic hand-eye calibration systems, our method does not require an off-line calibration step. Rather, it is capable of computing the camera extrinsics from a single frame, thus opening the possibility of on-line calibration. We show experimental results for three different robots and camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is comparable to that of classic off-line hand-eye calibration using multiple frames. With additional frames from a static pose, accuracy improves even further. Code, datasets, and pretrained models for three widely-used robot manipulators are made available.
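Given the network's 2D keypoint detections and the corresponding 3D keypoint positions from the robot's forward kinematics, the extrinsics recovery is a standard Perspective-n-Point solve. A minimal OpenCV sketch with placeholder correspondences and intrinsics (all numeric values are illustrative, not from the paper):

```python
import numpy as np
import cv2

# 3D keypoints in the robot base frame (e.g. from forward kinematics) and their
# detected 2D projections in the image; the values below are placeholders.
object_points = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.1], [0.3, 0.2, 0.4],
                          [0.0, 0.2, 0.6], [0.1, 0.1, 0.8], [0.2, 0.0, 0.9]], dtype=np.float64)
image_points = np.array([[320, 400], [410, 390], [430, 300],
                         [330, 250], [360, 200], [395, 180]], dtype=np.float64)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # assumed known intrinsics
dist = np.zeros(5)                                            # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)        # rotation (camera from robot base); tvec is the translation
```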

Journal ArticleDOI
TL;DR: This ultrathin arrayed camera provides a novel and practical direction for diverse mobile, surveillance or medical applications and demonstrates that the multilayered pinhole of the MOE allows high-contrast imaging by eliminating the optical crosstalk between microlenses.
Abstract: Compound eyes found in insects provide intriguing sources of biological inspiration for miniaturised imaging systems. Here, we report an ultrathin arrayed camera inspired by insect eye structures for high-contrast and super-resolution imaging. The ultrathin camera features micro-optical elements (MOEs), i.e., inverted microlenses, multilayered pinhole arrays, and gap spacers on an image sensor. The MOE was fabricated by using repeated photolithography and thermal reflow. The fully packaged camera shows a total track length of 740 μm and a field-of-view (FOV) of 73°. The experimental results demonstrate that the multilayered pinhole of the MOE allows high-contrast imaging by eliminating the optical crosstalk between microlenses. The integral image reconstructed from array images clearly increases the modulation transfer function (MTF) by ~1.57 times compared to that of a single channel image in the ultrathin camera. This ultrathin arrayed camera provides a novel and practical direction for diverse mobile, surveillance or medical applications.

Journal ArticleDOI
TL;DR: This paper proposes a controller that makes use of the image information from an un-calibrated perspective camera mounted on the follower robot, without relative position measurement or any communication among the robots.
Abstract: Generally, vision-based controls use various camera sensors and require camera calibration, while the control performance degrades due to inaccurate calibration. Therefore, in this paper, the proposed controller only makes use of the image information from an un-calibrated perspective camera mounted on the follower robot, without relative position measurement or any communication among the robots. First, the nominal visual formation kinematic model is developed using the camera models. Then it is re-described as a quadratic programming (QP) problem with the specified constraints. A neurodynamic optimization based on a primal-dual neural network is utilized to ensure that the QP converges to the exact optimal values. Through two-time-scale neurodynamic optimization, the gain scheduling of the ancillary state feedback can be realized so that the state variables are constrained within an invariant designed tube. The experimental results verify the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: The underlying technology, hardware, and algorithms of the SR300, as well as its calibration procedure, are described, and some use cases are outlined, which will provide a full case study of a mass-produced depth sensing product and technology.
Abstract: Intel® RealSense™ SR300 is a depth camera capable of providing a VGA-size depth map at 60 fps and 0.125mm depth resolution. In addition, it outputs an infrared VGA-resolution image and a 1080p color texture image at 30 fps. SR300 form-factor enables it to be integrated into small consumer products and as a front facing camera in laptops and Ultrabooks™. The SR300 depth camera is based on a coded-light technology where triangulation between projected patterns and images captured by a dedicated sensor is used to produce the depth map. Each projected line is coded by a special temporal optical code, that enables a dense depth map reconstruction from its reflection. The solid mechanical assembly of the camera allows it to stay calibrated throughout temperature and pressure changes, drops, and hits. In addition, active dynamic control maintains a calibrated depth output. An extended API LibRS released with the camera allows developers to integrate the camera in various applications. Algorithms for 3D scanning, facial analysis, hand gesture recognition, and tracking are within reach for applications using the SR300. In this paper, we describe the underlying technology, hardware, and algorithms of the SR300, as well as its calibration procedure, and outline some use cases. We believe that this paper will provide a full case study of a mass-produced depth sensing product and technology.

PatentDOI
TL;DR: This study suggests that the design principle of WISH, which combines optical modulators and computational algorithms to sense high-resolution optical fields, enables improved capabilities in many existing applications while revealing entirely new, hitherto unexplored application areas.
Abstract: A system for a wavefront imaging sensor with high resolution (WISH) comprises a spatial light modulator (SLM), a plurality of image sensors and a processor. The system further includes the SLM and a computational post-processing algorithm for recovering an incident wavefront with a high spatial resolution and a fine phase estimation. In addition, the image sensors work both in a visible electromagnetic (EM) spectrum and outside the visible EM spectrum.

Journal ArticleDOI
TL;DR: An on-chip, widefield fluorescence microscope is presented, which consists of a diffuser placed a few millimeters away from a traditional image sensor, enabling refocusability in post-processing and three-dimensional imaging of sparse samples from a single acquisition.
Abstract: We present an on-chip, widefield fluorescence microscope, which consists of a diffuser placed a few millimeters away from a traditional image sensor. The diffuser replaces the optics of a microscope, resulting in a compact and easy-to-assemble system with a practical working distance of over 1.5 mm. Furthermore, the diffuser encodes volumetric information, enabling refocusability in post-processing and three-dimensional (3D) imaging of sparse samples from a single acquisition. Reconstruction of images from the raw data requires a precise model of the system, so we introduce a practical calibration scheme and a physics-based forward model to efficiently account for the spatially-varying point spread function (PSF). To improve performance in low-light, we propose a random microlens diffuser, which consists of many small lenslets randomly placed on the mask surface and yields PSFs that are robust to noise. We build an experimental prototype and demonstrate our system on both planar and 3D samples.

Journal ArticleDOI
12 Sep 2020-Sensors
TL;DR: A Monte Carlo simulator developed in MATLAB® for the analysis of a Single Photon Avalanche Diode (SPAD)-based Complementary Metal-Oxide Semiconductor (CMOS) flash Light Detection and Ranging (LIDAR) system is presented.
Abstract: We present a Monte Carlo simulator developed in MATLAB® for the analysis of a Single Photon Avalanche Diode (SPAD)-based Complementary Metal-Oxide Semiconductor (CMOS) flash Light Detection and Ranging (LIDAR) system. The simulation environment has been developed to accurately model the components of a flash LIDAR system, such as the illumination source, the optics, and the architecture of the designated SPAD-based CMOS image sensor. Together with the modeling of the background noise and target topology, all of the fundamental factors involved in a typical LIDAR acquisition system have been included in order to predict the achievable system performance, and the predictions have been verified with an existing sensor.
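The heart of such a simulator is drawing photon detections per time bin from Poisson statistics — laser return plus background and dark counts — and histogramming them over many laser cycles. A stripped-down Python sketch of that idea with assumed parameters (the actual simulator is in MATLAB and models much more, e.g., the optics and the sensor architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 3.0e8
n_bins, bin_width = 200, 500e-12           # 500 ps bins -> 15 m unambiguous window
target_range = 12.0                        # assumed target distance [m]
signal_bin = int(2 * target_range / c / bin_width)

signal_rate = np.zeros(n_bins); signal_rate[signal_bin] = 0.5    # mean signal photons/bin/cycle
background_rate = np.full(n_bins, 0.02)                          # ambient + dark counts/bin/cycle

n_cycles = 1000
histogram = rng.poisson((signal_rate + background_rate) * n_cycles)   # accumulated TCSPC histogram
estimated_range = np.argmax(histogram) * bin_width * c / 2
print(estimated_range)
```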

Journal ArticleDOI
TL;DR: A method to eliminate background activity is proposed, along with performance indices for evaluating filter performance: noise in real (NIR) and real in noise (RIN).
Abstract: The dynamic vision sensor (DVS) is a new type of image sensor with application prospects in the fields of automobiles and robotics. Dynamic vision sensors are very different from traditional image sensors in terms of pixel principle and output data. Background activity (BA) in the data degrades image quality, but there is currently no unified indicator to evaluate the image quality of event streams. This paper proposes a method to eliminate background activity, along with an evaluation procedure and performance indices for filters: noise in real (NIR) and real in noise (RIN). The lower these values, the better the filter. This evaluation method does not require fixed-pattern generation equipment and can also evaluate filter performance on natural scenes. Comparative experiments with three filters show that the proposed method achieves the best overall performance. The method reduces the bandwidth required for DVS data transmission, lowers the computational cost of target extraction, and opens up the application of DVS in more fields.
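A common baseline for background-activity removal — and the kind of filter the NIR/RIN indices are meant to score — is the spatiotemporal nearest-neighbour test: keep an event only if a neighbouring pixel fired within a recent time window. A hedged sketch of that classic filter (not necessarily the filter proposed in the paper):

```python
import numpy as np

def nn_background_filter(events, height, width, dt=5000):
    """events: list of (t_us, x, y, polarity). Keep an event only if a pixel in its
    3x3 neighbourhood produced an event within the last dt microseconds."""
    last_ts = np.full((height, width), -np.inf)
    kept = []
    for t, x, y, p in events:
        y0, y1 = max(0, y - 1), min(height, y + 2)
        x0, x1 = max(0, x - 1), min(width, x + 2)
        if (t - last_ts[y0:y1, x0:x1]).min() <= dt:   # a recent neighbour exists
            kept.append((t, x, y, p))
        last_ts[y, x] = t
    return kept

events = [(0, 10, 10, 1), (1000, 11, 10, 1), (50000, 200, 5, -1)]
print(nn_background_filter(events, 260, 346))   # isolated events are dropped
```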

Journal ArticleDOI
TL;DR: This work investigates a simple, low-cost, and compact optical coding camera design that supports high-resolution image reconstructions from raw measurements with low pixel counts, and uses an end-to-end framework to simultaneously optimize the optical design and a reconstruction network for obtaining super-resolved images from raw measures.
Abstract: Single Photon Avalanche Photodiodes (SPADs) have recently received a lot of attention in imaging and vision applications due to their excellent performance in low-light conditions, as well as their ultra-high temporal resolution. Unfortunately, like many evolving sensor technologies, image sensors built around SPAD technology currently suffer from a low pixel count. In this work, we investigate a simple, low-cost, and compact optical coding camera design that supports high-resolution image reconstructions from raw measurements with low pixel counts. We demonstrate this approach for regular intensity imaging, depth imaging, as well as transient imaging. Our method uses an end-to-end framework to simultaneously optimize the optical design and a reconstruction network for obtaining super-resolved images from raw measurements. The optical design space is that of an engineered point spread function (implemented with diffractive optics), which can be considered an optimized anti-aliasing filter to preserve as much high-resolution information as possible despite imaging with a low pixel count, low fill-factor SPAD array. We further investigate a deep network for reconstruction. The effectiveness of this joint design and reconstruction approach is demonstrated for a range of different applications, including high-speed imaging, time-of-flight depth imaging, and transient imaging. While our work specifically focuses on low-resolution SPAD sensors, similar approaches should prove effective for other emerging image sensor technologies with low pixel counts and low fill-factors.

Journal ArticleDOI
Leonard C. Kogos, Yunzhe Li, Jianing Liu, Yuyu Li, Lei Tian, Roberto Paiella
TL;DR: A flat, lensless plasmonic image sensor array is developed that enables high-quality wide-angle imaging without curved geometry.
Abstract: The vision system of arthropods such as insects and crustaceans is based on the compound-eye architecture, consisting of a dense array of individual imaging elements (ommatidia) pointing along different directions. This arrangement is particularly attractive for imaging applications requiring extreme size miniaturization, wide-angle fields of view, and high sensitivity to motion. However, the implementation of cameras directly mimicking the eyes of common arthropods is complicated by their curved geometry. Here, we describe a lensless planar architecture, where each pixel of a standard image-sensor array is coated with an ensemble of metallic plasmonic nanostructures that only transmits light incident along a small geometrically-tunable distribution of angles. A set of near-infrared devices providing directional photodetection peaked at different angles is designed, fabricated, and tested. Computational imaging techniques are then employed to demonstrate the ability of these devices to reconstruct high-quality images of relatively complex objects. The compound eyes of arthropods provide a visual advantage by seeing a wide range of angles all at once, but cameras that mimic them are usually curved and bulky. Here, the authors develop a flat, plasmonic image sensor array that enables high-quality wide-angle vision without lenses or curvature.