
Showing papers on "Zoom" published in 2019


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, the authors use real sensor data to train a deep network with a novel contextual bilateral loss that is robust to mild misalignment between input and output images, achieving state-of-the-art performance in 4X and 8X computational zoom.
Abstract: This paper shows that when applying machine learning to digital zoom, it is beneficial to operate on real, RAW sensor data. Existing learning-based super-resolution methods do not use real sensor data, instead operating on processed RGB images. We show that these approaches forfeit detail and accuracy that can be gained by operating on raw data, particularly when zooming in on distant objects. The key barrier to using real sensor data for training is that ground-truth high-resolution imagery is missing. We show how to obtain such ground-truth data via optical zoom and contribute a dataset, SR-RAW, for real-world computational zoom. We use SR-RAW to train a deep network with a novel contextual bilateral loss that is robust to mild misalignment between input and output images. The trained network achieves state-of-the-art performance in 4X and 8X computational zoom. We also show that synthesizing sensor data by resampling high-resolution RGB images is an oversimplified approximation of real sensor data and noise, resulting in worse image quality.
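For readers unfamiliar with the idea, the sketch below illustrates in plain NumPy how a contextual-bilateral-style loss can tolerate mild misalignment: each source patch is matched to its best target patch under a combined feature-plus-spatial distance rather than to the patch at the same pixel location. The feature extraction, the weight `w_spatial`, and the cosine-distance formulation are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a contextual-bilateral-style loss, assuming feature maps
# have already been extracted and flattened to (N, C) / (M, C) patch features
# with (N, 2) / (M, 2) normalized spatial coordinates.
import numpy as np

def cobi_style_loss(feat_p, feat_q, pos_p, pos_q, w_spatial=0.1, eps=1e-5):
    # Cosine distance between every source/target feature pair.
    p = feat_p / (np.linalg.norm(feat_p, axis=1, keepdims=True) + eps)
    q = feat_q / (np.linalg.norm(feat_q, axis=1, keepdims=True) + eps)
    d_feat = 1.0 - p @ q.T                      # (N, M)

    # Squared spatial distance between patch locations.
    diff = pos_p[:, None, :] - pos_q[None, :, :]
    d_spatial = np.sum(diff ** 2, axis=-1)      # (N, M)

    # Bilateral distance: feature similarity plus a spatial prior that
    # tolerates mild misalignment instead of demanding exact alignment.
    d = d_feat + w_spatial * d_spatial

    # Each source patch is scored against its best-matching target patch.
    return float(np.mean(np.min(d, axis=1)))
```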

219 citations


Journal ArticleDOI
TL;DR: In this paper, a semantic-aware neural network is used to estimate the scene depth from a single image, which is then combined with a segmentation-based depth adjustment process to synthesize the 3D Ken Burns effect.
Abstract: The Ken Burns effect allows animating still images with a virtual camera scan and zoom. Adding parallax, which results in the 3D Ken Burns effect, enables significantly more compelling results. Creating such effects manually is time-consuming and demands sophisticated editing skills. Existing automatic methods, however, require multiple input images from varying viewpoints. In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. To address the limitations of existing depth estimation methods such as geometric distortions, semantic distortions, and inaccurate depth boundaries, we develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud. Experiments with a wide variety of image content show that our method enables realistic synthesis results. Our study demonstrates that our system allows users to achieve better results while requiring little effort compared to existing solutions for the 3D Ken Burns effect creation.
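As a rough illustration of the geometry involved, the snippet below shows the standard pinhole back-projection that turns an RGB image and a per-pixel depth estimate into a colored point cloud; the intrinsics (fx, fy, cx, cy) are assumed known. The paper's depth refinement, rendering, and inpainting stages are not reproduced here.

```python
# Minimal sketch: lift an (H, W, 3) image and an (H, W) depth map to 3D points.
import numpy as np

def image_to_point_cloud(rgb, depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)    # one color per 3D point
    return points, colors
```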

186 citations


Proceedings ArticleDOI
23 Mar 2019
TL;DR: This work focuses on improving digital map navigation in AR with mid-air hand gestures, using a horizontal intangible map display and introduces input-mapping transitions that reduce perceived arm fatigue with limited impact on performance.
Abstract: Freehand gesture interaction has long been proposed as a ‘natural’ input method for Augmented Reality (AR) applications, yet has been little explored for intensive applications like multiscale navigation. In multiscale navigation, such as digital map navigation, pan and zoom are the predominant interactions. A position-based input mapping (e.g. grabbing metaphor) is intuitive for such interactions, but is prone to arm fatigue. This work focuses on improving digital map navigation in AR with mid-air hand gestures, using a horizontal intangible map display. First, we conducted a user study to explore the effects of handedness (unimanual and bimanual) and input mapping (position-based and rate-based). From these findings we designed DiveZoom and TerraceZoom, two novel hybrid techniques that smoothly transition between position- and rate-based mappings. A second user study evaluated these designs. Our results indicate that the introduced input-mapping transitions can reduce perceived arm fatigue with limited impact on performance.
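To make the position/rate distinction concrete, here is a minimal sketch of one zoom-update step that blends the two mappings. The blend rule and constants are hypothetical and are not the DiveZoom or TerraceZoom designs, which transition between the mappings in more considered ways.

```python
def hybrid_zoom_step(zoom, hand_offset, prev_offset, dt,
                     k_pos=2.0, k_rate=1.5, blend=0.5):
    """One per-frame zoom update.
    hand_offset: current hand displacement from a clutch point (metres).
    blend: 0 = purely position-based, 1 = purely rate-based."""
    # Position-based (grabbing) term: zoom change follows the hand's motion.
    d_position = k_pos * (hand_offset - prev_offset)
    # Rate-based (joystick) term: the offset itself sets a zoom velocity.
    d_rate = k_rate * hand_offset * dt
    return zoom + (1.0 - blend) * d_position + blend * d_rate
```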

47 citations


Journal ArticleDOI
29 Mar 2019
TL;DR: The proposed system Ipanel employs the acoustic signals generated by fingers sliding on the table for tracking, and is able to support not only commonly used gesture recognition, but also handwriting recognition at high accuracies.

Abstract: This paper explores the possibility of extending input and interaction beyond the small screen of the mobile device onto ad hoc adjacent surfaces, e.g., a wooden tabletop, using acoustic signals. While existing finger-tracking approaches employ an active acoustic signal with a fixed frequency, our proposed system Ipanel employs the acoustic signals generated by fingers sliding on the table for tracking. Different from active signal tracking, the frequency of the finger-table generated acoustic signals keeps changing, making accurate tracking much more challenging than in traditional approaches with a fixed-frequency signal from the speaker. Unique features are extracted by exploiting the spatio-temporal and frequency-domain properties of the generated acoustic signals. The features are transformed into images, and we then employ a convolutional neural network (CNN) to recognize the finger movement on the table. Ipanel is able to support not only commonly used gesture (click, flip, scroll, zoom, etc.) recognition, but also handwriting (10 digits and 26 letters) recognition at high accuracies. We implement Ipanel on smartphones, and conduct extensive real-environment experiments to evaluate its performance. The results validate the robustness of Ipanel, and show that it maintains high accuracies across different users with varying input behaviours (e.g., input strength, speed and region). Further, Ipanel's performance is robust against different levels of ambient noise and varying surface materials.
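A minimal sketch of the feature step described above: turn a short clip of the finger-on-table signal into a log-power time-frequency image that a CNN could then classify. The sampling rate and window parameters are illustrative, not Ipanel's actual settings.

```python
import numpy as np
from scipy.signal import spectrogram

def acoustic_to_image(samples, fs=48000, nperseg=1024, noverlap=768):
    # Short-time Fourier analysis of the finger-generated acoustic signal.
    freqs, times, sxx = spectrogram(samples, fs=fs,
                                    nperseg=nperseg, noverlap=noverlap)
    sxx = 10.0 * np.log10(sxx + 1e-12)                      # log power
    sxx = (sxx - sxx.min()) / (sxx.max() - sxx.min() + 1e-12)  # scale to [0, 1]
    return sxx                                               # 2D "image" for a CNN
```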

42 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: The results show that spatial device gestures can outperform both touch-based techniques and hand gestures in terms of task completion times and user preference.
Abstract: In this paper, we investigate mobile devices as interactive controllers to support the exploration of 3D data spaces in head-mounted Augmented Reality (AR). In future mobile contexts, applications such as immersive analysis or ubiquitous information retrieval will involve large 3D data sets, which must be visualized in limited physical space. This necessitates efficient interaction techniques for 3D panning and zooming. Smartphones as additional input devices are promising because they are familiar and widely available in mobile usage contexts. They also allow more casual and discreet interaction compared to free-hand gestures or voice input. We introduce smartphone-based pan & zoom techniques for 3D data spaces and present a user study comparing five techniques. Our results show that spatial device gestures can outperform both touch-based techniques and hand gestures in terms of task completion times and user preference. We discuss our findings in detail and suggest suitable techniques for specific AR navigation tasks.

40 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proposes a light-weight video frame interpolation algorithm that allows information to be learned from the high-resolution version of similar objects, with an instance-level supervision that corrects details of object shape and boundaries.
Abstract: We propose a light-weight video frame interpolation algorithm. Our key innovation is an instance-level supervision that allows information to be learned from the high-resolution version of similar objects. Our experiment shows that the proposed method can generate state-of-the-art results across different datasets, with fractional computation resources (time and memory) of competing methods. Given two image frames, a cascade network creates an intermediate frame with 1) a flow-warping module that computes coarse bi-directional optical flow and creates an interpolated image via flow-based warping, followed by 2) an image synthesis module to make fine-scale corrections. In the learning stage, object detection proposals are generated on the interpolated image. Lower-resolution objects are zoomed into, and an adversarial loss trained on high-resolution objects guides the system toward instance-level refinement that corrects details of object shape and boundaries.
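The flow-warping module referenced above boils down to backward warping: each pixel of the frame being synthesized looks up where it came from in a source frame according to the estimated flow. A bare-bones OpenCV version is sketched below; the cascade's synthesis network and the instance-level refinement are not shown.

```python
import numpy as np
import cv2

def backward_warp(src, flow):
    """Warp `src` into the reference view. `flow[y, x]` gives the (dx, dy)
    offset from reference pixel (x, y) to its correspondence in `src`."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR)
```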

37 citations


Journal ArticleDOI
TL;DR: The result was a multiplatform 3D visualization capable of displaying 3D models in LOD3, as well as providing user interfaces for exploring the scene using “on the ground” and “from the air” first-person view interactions.
Abstract: Developers have long used game engines for visualizing virtual worlds for players to explore. However, using real-world data in a game engine is always a challenging task, since most game engines have very little support for geospatial data. This paper presents our findings from exploring the Unity3D game engine for visualizing large-scale topographic data from mixed sources of terrestrial laser scanner models and topographic map data. Level of detail (LOD) 3 models of two buildings on the Universitas Gadjah Mada campus were obtained using a terrestrial laser scanner and converted into the FBX format. Mapbox for Unity was used to provide georeferencing support for the 3D model. Unity3D also used road and place name layers via Mapbox for Unity based on OpenStreetMap (OSM) data. LOD1 buildings were modeled from topographic map data using Mapbox, and 3D models from the terrestrial laser scanner replaced two of these buildings. Building information and attributes, as well as visual appearances, were added to 3D features. The Unity3D game engine provides a rich set of libraries and assets for user interactions, and custom C# scripts were used to provide a bird’s-eye-view mode with 3D zoom, pan, and orbital display. In addition to basic 3D navigation tools, a first-person view of the scene was utilized to enable users to gain a walk-through experience while virtually inspecting the objects on the ground. For a fly-through experience, a drone view was offered to help users inspect objects from the air. The result was a multiplatform 3D visualization capable of displaying 3D models in LOD3, as well as providing user interfaces for exploring the scene using “on the ground” and “from the air” first-person view interactions. Using the Unity3D game engine to visualize mixed sources of topographic data creates many opportunities to optimize large-scale topographic data use.

34 citations


Posted Content
TL;DR: This work proposes a video based capsule network, CapsuleVOS, which can segment several frames at once conditioned on a reference frame and segmentation mask, and addresses two challenging issues in video object segmentation: segmentation of small objects and occlusion of objects across time.
Abstract: In this work we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames which can be difficult to compute. To this end, we propose a video based capsule network, CapsuleVOS, which can segment several frames at once conditioned on a reference frame and segmentation mask. This conditioning is performed through a novel routing algorithm for attention-based efficient capsule selection. We address two challenging issues in video object segmentation: 1) segmentation of small objects and 2) occlusion of objects across time. The issue of segmenting small objects is addressed with a zooming module which allows the network to process small spatial regions of the video. Apart from this, the framework utilizes a novel memory module based on recurrent networks which helps in tracking objects when they move out of frame or are occluded. The network is trained end-to-end and we demonstrate its effectiveness on two benchmark video object segmentation datasets; it outperforms current offline approaches on the Youtube-VOS dataset while having a run-time that is almost twice as fast as competing methods. The code is publicly available at this https URL.

29 citations


Journal ArticleDOI
TL;DR: An AR system with an optical zoom function as well as an image registration function, realized via two LC lenses, is demonstrated in order to help people see better by magnifying the virtual image and adjusting its location.
Abstract: An optical-see-through augmented reality (AR) system assists our daily work by augmenting our senses with computer-generated information. Two of the optical challenges of AR are image registration and vision correction, owing to the fixed optical properties of the optical elements of AR systems. In this paper, we demonstrate an AR system with an optical zoom function as well as an image registration function, realized via two liquid crystal (LC) lenses, in order to help people see better by magnifying the virtual image and adjusting its location. The operating principles are introduced, and experiments are performed. The concept demonstrated in this paper could be further extended to other electro-optical devices as long as the devices exhibit the capability of phase modulation.

23 citations


Journal ArticleDOI
TL;DR: Kyrix is an integrated system that provides the developer with a concise and expressive declarative language along with backend support for performance optimization of large-scale data, and is expressive and flexible in that it can support the developer in creating a wide range of customized visualizations across different application domains and data types.
Abstract: Pan and zoom are basic yet powerful interaction techniques for exploring large datasets. However, existing zoomable UI toolkits such as Pad++ and ZVTM do not provide the backend database support and data-driven primitives that are necessary for creating large-scale visualizations. This limitation in existing general-purpose toolkits has led to many purpose-built solutions (e.g. Google Maps and ForeCache) that address the issue of scalability but cannot be easily extended to support visualizations beyond their intended data types and usage scenarios. In this paper, we introduce Kyrix to ease the process of creating general and large-scale web-based pan/zoom visualizations. Kyrix is an integrated system that provides the developer with a concise and expressive declarative language along with backend support for performance optimization of large-scale data. To evaluate the scalability of Kyrix, we conducted a set of benchmarked experiments and show that Kyrix can support high interactivity (with an average latency of 100 ms or below) on pan/zoom visualizations of 100 million data points. We further demonstrate the accessibility of Kyrix through an observational study with 8 developers. Results indicate that developers can quickly learn Kyrix's underlying declarative model to create scalable pan/zoom visualizations. Finally, we provide a gallery of visualizations and show that Kyrix is expressive and flexible in that it can support the developer in creating a wide range of customized visualizations across different application domains and data types.

22 citations


Proceedings ArticleDOI
02 May 2019
TL;DR: It is found that visual context should be limited for low-distance navigation, but added for far- distance navigation; that timelines should be oriented along the longer axis, especially on mobile; and that, as compared to default techniques, double click, hold, and rub zoom appear to scale worse with task difficulty, whereas brush and especially ortho zoom seem to scale better.
Abstract: Pan and zoom timelines and sliders help us navigate large time series data. However, designing efficient interactions can be difficult. We study pan and zoom methods via crowd-sourced experiments on mobile and computer devices, asking which designs and interactions provide faster target acquisition. We find that visual context should be limited for low-distance navigation, but added for far-distance navigation; that timelines should be oriented along the longer axis, especially on mobile; and that, as compared to default techniques, double click, hold, and rub zoom appear to scale worse with task difficulty, whereas brush and especially ortho zoom seem to scale better. Software and data used in this research are available as open source.

Proceedings ArticleDOI
15 Oct 2019
TL;DR: This work describes audiovisual zooming as a generalized eigenvalue problem and proposes an algorithm for efficient computation on mobile platforms, whereby an auditory FOV is formed to match the visual.
Abstract: When capturing videos on a mobile platform, the target of interest is often contaminated by the surrounding environment. To alleviate the visual irrelevance, camera panning and zooming provide the means to isolate a desired field of view (FOV). However, the captured audio is still contaminated by signals outside the FOV. This effect is unnatural: for human perception, visual and auditory cues must go hand-in-hand. We present the concept of Audiovisual Zooming, whereby an auditory FOV is formed to match the visual one. Our framework is built around the classic idea of beamforming, a computational approach to enhancing sound from a single direction using a microphone array. Yet, beamforming on its own cannot incorporate the auditory FOV, as the FOV may include an arbitrary number of directional sources. We formulate audiovisual zooming as a generalized eigenvalue problem and propose an algorithm for efficient computation on mobile platforms. To inform the algorithmic and physical implementation, we offer a theoretical analysis of our algorithmic components as well as numerical studies for understanding various design choices of microphone arrays. Finally, we demonstrate audiovisual zooming on two different mobile platforms: a mobile smartphone and a 360° spherical imaging system for video conference settings.
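To make the formulation concrete, the following sketch builds two spatial covariance-like matrices (energy arriving from inside versus outside the visual FOV, for a far-field linear microphone array) and solves the resulting generalized eigenvalue problem with SciPy. The array model and regularization are simplifying assumptions, not the paper's mobile implementation.

```python
import numpy as np
from scipy.linalg import eigh

def fov_beamformer(mic_positions, wavelength, fov_angles, out_angles):
    """mic_positions: 1D array of microphone coordinates along the array (m)."""
    def steering(theta):
        # Far-field plane-wave steering vector for arrival angle theta (rad).
        return np.exp(-2j * np.pi * mic_positions * np.sin(theta) / wavelength)

    r_in = sum(np.outer(steering(t), steering(t).conj()) for t in fov_angles)
    r_out = sum(np.outer(steering(t), steering(t).conj()) for t in out_angles)
    r_out = r_out + 1e-3 * np.eye(len(mic_positions))   # keep R_out invertible

    # Maximize (w^H R_in w) / (w^H R_out w): a generalized eigenvalue problem.
    eigvals, eigvecs = eigh(r_in, r_out)
    return eigvecs[:, -1]            # weights for the largest eigenvalue
```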

Journal ArticleDOI
TL;DR: An efficient and robust approach based on the particle swarm optimization (PSO) algorithm is proposed to retrieve an optimal first-order design of a double-sided telecentric zoom lens, showing the great potential of the method in retrieving proper initial designs of complex optical systems.
Abstract: In this paper, we propose an efficient and robust approach to retrieve an optimal first-order design of a double-sided telecentric zoom lens based on the particle swarm optimization (PSO) algorithm. In this method, the design problem is transformed into realizing a zoom system with fixed positions of both the front focal point and the rear focal point during zooming. Equations are derived for the paraxial design of the basic parameters of a three-component zoom lens in the framework of geometrical optics. We implement the PSO algorithm in MATLAB to design some test cases to verify the feasibility. As the computational work is completed by the optimization algorithm instead of the traditional trial-and-error method, our proposed method is efficient and has a low barrier to use. Simulation results verify that the described method is stable and effective in finding a proper initial configuration of a zoom lens with two fixed foci as well as a required zoom ratio. Furthermore, a compact initial design of a three-component 2X zoom system with two fixed foci is proposed. Based on the initial design data, a double-sided telecentric zoom system is developed. The result shows the great potential of our proposed method in retrieving proper initial designs of complex optical systems.
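For orientation, here is a generic particle-swarm loop of the kind such first-order searches rely on. The bounds and the placeholder merit function, which only targets a total optical power under a thin, in-contact approximation, are illustrative and stand in for the paper's telecentric-zoom merit function and constraints.

```python
import numpy as np

def pso_minimize(merit, lower, upper, n_particles=40, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                                      # velocities
    pbest = x.copy()
    pbest_val = np.array([merit(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([merit(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

# Placeholder usage: search three component focal lengths whose combined
# power (thin lenses, in contact) hits a 50 mm system focal length.
best_f, best_merit = pso_minimize(
    lambda f: abs((1.0 / f).sum() - 1.0 / 50.0),
    lower=np.array([5.0, -100.0, 5.0]),
    upper=np.array([100.0, -5.0, 100.0]))
```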

Journal ArticleDOI
TL;DR: A system, used to support maintenance procedures through AR, which tries to address the validation problem and can effectively help the user in detecting and avoiding errors during the maintenance process.
Abstract: Maintenance has been one of the most important domains for augmented reality (AR) since its inception. AR applications enable technicians to receive visual and audio computer-generated aids while performing different activities, such as assembly, repair, or maintenance procedures. These procedures are usually organized as a sequence of steps, each one involving an elementary action to be performed by the user. However, since it is not possible to automatically validate the user's actions, they might incorrectly execute or miss some steps. Thus, a relevant open problem is to provide users with some sort of automated verification tool. This paper presents a system, used to support maintenance procedures through AR, which tries to address the validation problem. The novel technology consists of a computer vision algorithm able to evaluate, at each step of a maintenance procedure, whether the user correctly completed the assigned task. The validation occurs by comparing an image of the final status of the machinery, after the user has performed the task, against a virtual 3D representation of the expected final status. Moreover, in order to avoid false positives, the system can identify both motions in the scene and changes in the camera's zoom and/or position, thus enhancing the robustness of the validation phase. Tests demonstrate that the proposed system can effectively help the user in detecting and avoiding errors during the maintenance process.

Proceedings Article
01 Jan 2019
TL;DR: Kyrix is presented, an end-to-end system for developing scalable details-on-demand data exploration applications that provides developers with a declarative model for easy specification of general visualizations and a novel dynamic fetching scheme adopted by Kyrix outperforms tile-based fetching used in earlier systems.
Abstract: Scalable interactive visual data exploration is crucial in many domains due to increasingly large datasets generated at rapid rates. Details-on-demand provides a useful interaction paradigm for exploring large datasets, where users start at an overview, find regions of interest, zoom in to see detailed views, zoom out and then repeat. This paradigm is the primary user interaction mode of widely-used systems such as Google Maps, Aperture Tiles and ForeCache. These earlier systems, however, are highly customized with hardcoded visual representations and optimizations. A more general framework is needed to facilitate the development of visual data exploration systems at scale. In this paper, we present Kyrix, an end-to-end system for developing scalable details-on-demand data exploration applications. Kyrix provides developers with a declarative model for easy specification of general visualizations. Behind the scenes, Kyrix utilizes a suite of performance optimization techniques to achieve a response time within 500ms for various user interactions. We also report results from a performance study which shows that a novel dynamic fetching scheme adopted by Kyrix outperforms tile-based fetching used in earlier systems.

Journal ArticleDOI
TL;DR: In this article, a pair of conjugate metasurfaces were used to generate a focused accelerating beam for chromatic focal shift control and a wide tunable focal length range of 4.8 mm (a 667-diopter change).
Abstract: Two key metrics for imaging systems are their magnification and optical bandwidth. While high-quality imaging systems today achieve bandwidths spanning the whole visible spectrum and large changes in magnification via optical zoom, these often entail lens assemblies with bulky elements unfit for size-constrained applications. Metalenses present a methodology for miniaturization but their strong chromatic aberrations and the lack of a varifocal achromatic element limit their utility. While exemplary broadband achromatic metalenses are realizable via dispersion engineering, in practice, these designs are limited to small physical apertures as large area lenses would require phase compensating scatterers with aspect ratios infeasible for fabrication. Many applications, however, necessitate larger areas to collect more photons for better signal-to-noise ratio and furthermore must also operate with unpolarized light. In this paper, we simultaneously achieve achromatic operation at visible wavelengths and varifocal control using a polarization-insensitive, hybrid optical-digital system with area unconstrained by dispersion-engineered scatterers. We derive phase equations for a pair of conjugate metasurfaces that generate a focused accelerating beam for chromatic focal shift control and a wide tunable focal length range of 4.8 mm (a 667-diopter change). Utilizing this conjugate pair, we realize a near spectrally invariant point spread function across the visible regime. We then combine the metasurfaces with a post-capture deconvolution algorithm to image full-color patterns under incoherent white light, demonstrating an achromatic 5x zoom range. Simultaneously achromatic and varifocal metalenses could have applications in various fields including augmented reality, implantable microscopes, and machine vision sensors.

Posted Content
TL;DR: In this paper, the authors propose sZoom, a framework to automatically zoom into a high-resolution surveillance video, which selectively zooms into the sensitive regions of the video to present details of the scene, while still preserving the overall context required for situation assessment.
Abstract: Current cameras are capable of recording high resolution video. While viewing on a mobile device, a user can manually zoom into this high resolution video to get a more detailed view of objects and activities. However, manual zooming is not suitable for surveillance and monitoring. It is tiring to continuously keep zooming into various regions of the video. Also, while viewing one region, the operator may miss activities in other regions. In this paper, we propose sZoom, a framework to automatically zoom into a high resolution surveillance video. The proposed framework selectively zooms into the sensitive regions of the video to present details of the scene, while still preserving the overall context required for situation assessment. A multi-variate Gaussian penalty is introduced to ensure full coverage of the scene. The method achieves near real-time performance through a number of timing optimizations. An extensive user study shows that, while watching a full HD video on a mobile device, the system enhances the security operator's efficiency in understanding the details of the scene by 99% on average compared to a scaled version of the original high resolution video. The produced video achieved 46% higher ratings for usefulness in a surveillance task.

Journal ArticleDOI
TL;DR: A hybrid driving variable-focus optofluidic lens that has one water-oil interface shifted by an applied voltage and one tunable Polydimethylsiloxane lens deformed by pumping liquid in or out of the cavity is proposed.
Abstract: A conventional optofluidic lens usually has only one interface, which means that its zoom range is small and its ability to correct aberrations is poor. In this paper, we propose a hybrid driving variable-focus optofluidic lens. It has one water-oil interface shifted by an applied voltage and one tunable Polydimethylsiloxane (PDMS) lens deformed by pumping liquid into or out of the cavity. The proposed lens combines the advantages of an electrowetting lens and a mechanical lens. Therefore, it can provide a large focal-length tuning range with good image quality. The shortest positive and negative focal lengths are ∼6.02 mm and ∼−11.15 mm, respectively. The maximum resolution of our liquid lens reaches 18 lp/mm. We also designed and fabricated a zoom system using the hybrid driving variable-focus optofluidic lens. In the experiment, the zoom range of the system is 14 mm∼30 mm and the zoom ratio is ∼2.14× without any mechanical moving parts. Applications such as zoom telescope and zoom microscope systems are foreseeable.
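The focal-length tuning of such a two-element lens can be reasoned about with the standard Gaussian-optics combination formula for two thin lenses separated by a distance d; this is the textbook relation, not the paper's full design model:

$$ \frac{1}{f_{\mathrm{sys}}} = \frac{1}{f_{1}} + \frac{1}{f_{2}} - \frac{d}{f_{1} f_{2}} $$

where f_1 is the focal length of the electrowetting (water-oil) interface and f_2 that of the deformable PDMS lens; tuning either element shifts the overall system focal length.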

Journal ArticleDOI
TL;DR: This paper defines the boundaries of the reachable curvatures for a full range of monolithic sensors, and discusses how the curved focal plane shape is related to the imaged scenes and optical parameters.
Abstract: Curved sensors are a suitable technological solution to enhance the vast majority of optical systems. In this work, we show the entire process to create curved sensor-based optical systems and the possibilities they offer. This paper defines the boundaries of the reachable curvatures for a full range of monolithic sensors. We discuss how the curved focal plane shape is related to the imaged scenes and optical parameters. Two camera prototypes are designed, realized and tested, demonstrating a new compact optical architecture for a 40 degree compact objective, as well as a wide field fisheye zoom objective using a convex sensor to image a 180 degree field of view.

Journal ArticleDOI
TL;DR: MRF ZOOM, a fast and space-efficient MRF DGS algorithm designed based on the properties of the parameter-matching objective function characterized with full dictionary simulations, can dramatically save MRF DGS time without sacrificing matching accuracy.

Abstract: Objective: Magnetic resonance fingerprinting (MRF) is a new technique for simultaneously quantifying multiple MR parameters using one temporally resolved MR scan. In MRF, the MR signal is manipulated to have distinct temporal behavior for different combinations of the underlying MR parameters and across spatial regions. The temporal behavior of the acquired MR signal is then used as a key to find its unique counterpart in an MR signal dictionary. The dictionary generation and searching (DGS) process represents the most important part of MRF, but it can become intractable because the disk space requirement and the computational demand increase exponentially with the number of MR parameters, the spatial coverage, and the spatial resolution. The goal of this paper was to develop a fast and space-efficient MRF DGS algorithm. Methods: The optimal DGS algorithm, MRF ZOOM, was designed based on the properties of the parameter-matching objective function, characterized with full dictionary simulations. Both synthetic data and in-vivo data were used to validate the method. Conclusion: MRF ZOOM can dramatically save MRF DGS time without sacrificing matching accuracy. Significance: MRF ZOOM can facilitate a wide range of MRF applications.
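As a reference point for what the DGS step computes, the brute-force baseline below matches each measured signal evolution to the dictionary atom with the largest normalized inner product and returns that atom's parameters. MRF ZOOM's contribution is to avoid scanning the full dictionary, which this sketch does not attempt.

```python
import numpy as np

def match_fingerprints(signals, dictionary, params):
    """signals: (V, T) voxel time courses; dictionary: (D, T) simulated
    signal evolutions; params: (D, P) parameter combinations (e.g. T1, T2)."""
    s = signals / np.linalg.norm(signals, axis=1, keepdims=True)
    d = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    best = np.argmax(np.abs(s @ d.conj().T), axis=1)   # best atom per voxel
    return params[best]                                 # (V, P) parameter maps
```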

Journal ArticleDOI
TL;DR: A systematic approach to automatically retrieve the first-order designs of three-component zoom systems with fixed spacing between focal points, based on the Particle Swarm Optimization (PSO) algorithm, is proposed.
Abstract: In this paper, we propose a systematic approach to automatically retrieve the first-order designs of three-component zoom systems with fixed spacing between focal points based on Particle Swarm Optimization (PSO) algorithm. In this method, equations are derived for the first-order design of a three-component zoom lens system in the framework of geometrical optics to decide its basic optical parameters. To realize the design, we construct the mathematical model of the special zoom system with two fixed foci based on Gaussian reduction. In the optimization phase, we introduce a new merit function as a performance metric to optimize the first-order design, considering maximum zoom ratio, total optical length and aberration term. The optimization is performed by iteratively improving a candidate solution under the specific merit function in the multi-dimensional parametric space. The proposed method is demonstrated through several examples, which cover almost all the common application scenarios. The results show that this method is a practical and powerful tool for automatically retrieving the optimal first-order design for complex optical systems.

Journal ArticleDOI
TL;DR: A novel human-3DTV interaction system is presented, based on a set of simple free-hand gestures for direct-touch interaction with a virtual interface.
Abstract: Input methods based on free-hand gestures have gradually become a hot research direction in the field of human-computer interaction. Hand gestures like sign languages, however, demand quite a lot of knowledge and practice for interaction, and air-writing methods require their users to hold the arm and hand in mid-air for a period of time. These methods limit the user experience, and the problem becomes more severe when a large number of gestures are required. To address this, this paper presents a novel human-3DTV interaction system based on a set of simple free-hand gestures for direct-touch interaction with a virtual interface. Specifically, our system projects a virtual interface in front of the user, who wears 3D shutter glasses, and the user simply stretches out an arm and touches the virtual interface as if operating a smartphone with a touch screen, using gestures such as Click, Slide, Hold, Drag and Zoom In/Out. Our system is able to recognize the user’s gesture quickly and accurately, as the system only needs to search a small region neighboring the virtual interface for a small set of gesture types. Because we adopt the key gestures used on smartphones, our free-hand gestures can be easily used by anyone with only brief training. Users feel more comfortable than with traditional gesture input methods and can effectively interact with 3DTV using our system. We report a comprehensive user study on accuracy and speed to validate the advantages of the proposed human-3DTV interaction system.

Posted Content
TL;DR: A hierarchical graph neural network is proposed to detect abnormal lesions from medical images by automatically zooming into ROIs and achieves comparable AUC with state-of-the-art methods on mammogram analysis for breast cancer diagnosis.
Abstract: In clinical practice, human radiologists review medical images on high-resolution monitors and zoom into regions of interest (ROIs) for a close-up examination. Inspired by this observation, we propose a hierarchical graph neural network to detect abnormal lesions in medical images by automatically zooming into ROIs. We focus on mammogram analysis for breast cancer diagnosis in this study. Our proposed network consists of two graph attention networks performing two tasks: (1) node classification to predict whether to zoom into the next level; (2) graph classification to classify whether a mammogram is normal/benign or malignant. The model is trained and evaluated on the INbreast dataset, and we obtain AUC comparable with state-of-the-art methods.

Journal ArticleDOI
Hou Changlun, Yize Ren, Yufan Tan, Xin Qing, Zang Yue
TL;DR: A 3X optical zoom system with two pairs of Alvarez lenses was developed and shows promise for space-constrained applications such as mobile phone camera modules, wearable imaging systems and endoscopic systems.
Abstract: In this paper, we propose an ultra-slim optical zoom system with Alvarez freeform lenses. A 3X optical zoom system with two pairs of Alvarez lenses was developed. The Alvarez lenses are fabricated by precise injection molding, and the movable elements are actuated by voice coil motors. A slim camera module was fabricated with a size of 25 mm (width) × 25 mm (length) × 6 mm (height). The zooming and imaging capabilities of this Alvarez zoom system are demonstrated experimentally. The prototype shows promise for space-constrained applications such as mobile phone camera modules, wearable imaging systems and endoscopic systems.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A novel dual-camera system is developed that simultaneously captures zoomed-in views of multiple targets by combining an ultrafast pan-tilt camera and a fixed wide-view camera, generating five hundred virtual cameras with different zooming views per second.
Abstract: In this paper, we develop a novel dual-camera system that can simultaneously capture zoomed-in views of multiple targets by combining an ultrafast pan-tilt camera and a fixed wide-view camera. According to the positions of all the targets recognized with deep learning in the wide-view camera, the pan and tilt angles of multiple virtual pan-tilt cameras are controlled on the ultrafast pan-tilt camera through multi-threaded viewpoint control to simultaneously capture zoomed-in images of all the targets. Our system can generate five hundred virtual cameras with different zooming views per second, and its effectiveness is demonstrated through experiments on simultaneous zoom shooting of multiple running persons and cars at ranges of 70 m or more in a natural outdoor scene.
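A toy version of the viewpoint-control geometry: given a detection centered at pixel (u, v) in the wide-view camera with known pinhole intrinsics, compute the pan and tilt angles to command for that target. Assuming the two cameras are roughly co-located and axis-aligned is a simplification; a real system would calibrate the extrinsics between them.

```python
import math

def pixel_to_pan_tilt(u, v, fx, fy, cx, cy):
    """Map a pixel in the wide-view camera to pan/tilt angles (radians)."""
    pan = math.atan2(u - cx, fx)    # horizontal angle from the optical axis
    tilt = math.atan2(v - cy, fy)   # vertical angle from the optical axis
    return pan, tilt
```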

Book ChapterDOI
25 Apr 2019
TL;DR: This work performs a step-wise deformation of a bounding box with the goal of tightly framing the object, and proposes different reward functions to lead to a better guidance of the agent while following its search trajectories.
Abstract: We present a reinforcement learning approach for detecting objects within an image. Our approach performs a step-wise deformation of a bounding box with the goal of tightly framing the object. It uses a hierarchical tree-like representation of predefined region candidates, which the agent can zoom in on. This reduces the number of region candidates that must be evaluated so that the agent can afford to compute new feature maps before each step to enhance detection quality. We compare an approach that is based purely on zoom actions with one that is extended by a second refinement stage to fine-tune the bounding box after each zoom step. We also improve the fitting ability by allowing for different aspect ratios of the bounding box. Finally, we propose different reward functions to lead to a better guidance of the agent while following its search trajectories. Experiments indicate that each of these extensions leads to more correct detections. The best performing approach comprises a zoom stage and a refinement stage, uses aspect-ratio modifying actions and is trained using a combination of three different reward metrics.
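One common family of rewards for this kind of box-deformation agent scores each action by whether it improved overlap with the ground truth. The sketch below shows that IoU-improvement variant purely as an illustration; the paper itself compares several reward functions and metrics that are not reproduced here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def step_reward(box_before, box_after, gt_box):
    # Positive reward if the zoom/refinement action tightened the fit.
    return 1.0 if iou(box_after, gt_box) > iou(box_before, gt_box) else -1.0
```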

Journal ArticleDOI
TL;DR: The combination of classical zoom lens concepts with tuneable lenses offers the possibility to reach smaller system lengths for macroscopic zoom lenses while requiring only a small focal tuning range of the tuneable lens.
Abstract: Classical zoom lenses are based on movements of sub-modules along the optical axis. Generally, a constant image plane position requires at least one nonlinear sub-module movement. This nonlinearity poses a challenge for the mechanical implementation. Tuneable lenses can change their focal length without moving along the optical axis. This offers the possibility of small system lengths. Since the focal range of tuneable lenses with significant aperture diameters is still limited, the use of tuneable optics in zoom lenses is usually restricted to miniaturized applications. To solve the challenge of the nonlinear movement in classical zoom lenses and the limitations of tuneable lenses for macroscopic applications, we propose a combination of both concepts. The resulting ‘Hybrid Zoom Lens’ involves linear movements of sub-modules as well as changing the focal length of a tuneable lens. The movements of the sub-modules and the focal length tuning of the lens are already determined by the collinear layout of the zoom lens. Therefore, we focus on collinear considerations and develop a method that allows a targeted choice of specific collinear layouts for our ‘Hybrid Zoom Lenses’. Based on examples and an experimental setup we demonstrate the feasibility of our approach. We apply the proposed method to examples of classical zoom lenses and zoom lenses based exclusively on tuneable lenses. Thereby we are able to show possible advantages of our ‘Hybrid Zoom Lenses’ over these widespread system types. We demonstrate important collinear considerations for the integration of tuneable lenses into a zoom lens. We show that the combination of classical zoom lens concepts with tuneable lenses offers the possibility to reach smaller system lengths for macroscopic zoom lenses while requiring only a small focal tuning range of the tuneable lens.

Journal ArticleDOI
TL;DR: A background on mobile video analysis tools is provided, along with strategies to help physical educators discover ways to effectively implement this engaging technology into their curriculum.
Abstract: Physical educators are discovering the benefits of using video analysis to support their instruction and assessment. Slow-motion playback, zoom, and voice-over narration are just some of the features...

Proceedings ArticleDOI
10 Nov 2019
TL;DR: This work provides a visual analytics tool, VA4JVM, for error traces produced by either the Java Virtual Machine, or by Java Pathfinder, and shows in examples how filtering and zooming in can highlight a problem without having to read lengthy textual data.
Abstract: Analyzing executions of concurrent software is very difficult. Even if a trace is available, such traces are very hard to read and interpret. A textual trace contains a lot of data, most of which is not relevant to the issue at hand. Past visualization attempts either do not show concurrent behavior, or result in a view that is overwhelming for the user. We provide a visual analytics tool, VA4JVM, for error traces produced by either the Java Virtual Machine, or by Java Pathfinder. Its key features are a layout that spatially associates events with threads, a zoom function, and the ability to filter event data in various ways. We show in examples how filtering and zooming in can highlight a problem without having to read lengthy textual data.