
Showing papers by "Shi-Min Hu" published in 2018


Book ChapterDOI
08 Sep 2018
TL;DR: This paper uses an instance-level salient object detector to automatically generate salient instances (candidate objects) for training images, and proposes a graph-partitioning-based clustering algorithm that outperforms state-of-the-art weakly supervised alternatives by a large margin.
Abstract: Effectively bridging between image-level keyword annotations and corresponding image pixels is one of the main challenges in weakly supervised semantic segmentation. In this paper, we use an instance-level salient object detector to automatically generate salient instances (candidate objects) for training images. Using similarity features extracted from each salient instance in the whole training set, we build a similarity graph, then use a graph partitioning algorithm to separate it into multiple subgraphs, each of which is associated with a single keyword (tag). Our graph-partitioning-based clustering algorithm allows us to consider the relationships between all salient instances in the training set as well as the information within them. We further show that with the help of attention information, our clustering algorithm is able to correct certain wrong assignments, leading to more accurate results. The proposed framework is general, and any state-of-the-art fully-supervised network structure can be incorporated to learn the segmentation network. When working with DeepLab for semantic segmentation, our method outperforms state-of-the-art weakly supervised alternatives by a large margin, achieving 65.6% mIoU on the PASCAL VOC 2012 dataset. We also combine our method with Mask R-CNN for instance segmentation, demonstrating for the first time the ability to perform weakly supervised instance segmentation using only keyword annotations.
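The graph-partitioning step can be illustrated, in spirit, by clustering salient-instance feature vectors through a similarity graph. The sketch below is only a generic stand-in (spectral clustering via scikit-learn) for the paper's own partitioning algorithm; feature extraction and the attention-based correction are omitted, and `features` / `n_tags` are hypothetical placeholders.

```python
# Illustrative sketch only: cluster salient-instance features via a similarity
# graph, as a generic stand-in for the paper's graph-partitioning algorithm.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

def cluster_salient_instances(features: np.ndarray, n_tags: int) -> np.ndarray:
    """features: (num_instances, feat_dim) embeddings of salient instances.
    Returns one tag-cluster index per instance."""
    similarity = cosine_similarity(features)        # build the similarity graph
    similarity = np.clip(similarity, 0.0, 1.0)       # keep affinities non-negative
    clustering = SpectralClustering(
        n_clusters=n_tags, affinity="precomputed", assign_labels="discretize")
    return clustering.fit_predict(similarity)

# Example: 200 instances with 128-D features and 20 keywords (tags).
labels = cluster_salient_instances(np.random.rand(200, 128), n_tags=20)
```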

107 citations


Journal ArticleDOI
Sen-Zhe Xu, Jun Hu, Miao Wang, Tai-Jiang Mu, Shi-Min Hu
TL;DR: A novel online deep learning framework learns the stabilization transformation for each unsteady frame given historical steady frames; it is composed of a generative network with spatial transformer networks embedded in different layers, and generates a stable frame for the incoming unstable frame by computing an appropriate affine transformation.
Abstract: Video stabilization is necessary for many hand‐held shot videos. In the past decades, various video stabilization methods have been proposed based on the smoothing of 2D, 2.5D or 3D camera paths, yet there have hardly been any deep learning methods to solve this problem. Instead of explicitly estimating and smoothing the camera path, we present a novel online deep learning framework to learn the stabilization transformation for each unsteady frame, given historical steady frames. Our network is composed of a generative network with spatial transformer networks embedded in different layers, and generates a stable frame for the incoming unstable frame by computing an appropriate affine transformation. We also introduce an adversarial network to determine the stability of a piece of video. The network is trained directly using pairs of steady and unsteady videos. Experiments show that our method produces results similar to traditional methods; moreover, it is capable of handling challenging unsteady video of low quality, where traditional methods fail, such as video with heavy noise or multiple exposures. Our method runs in real time, which is much faster than traditional methods.
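The per-frame affine warp that such a network outputs can be applied as sketched below. The network itself is omitted; `predict_affine` is a hypothetical placeholder standing in for the paper's generative network, and the identity transform it returns here is purely illustrative.

```python
# Minimal sketch: apply a predicted 2x3 affine transform to stabilize one frame.
import cv2
import numpy as np

def stabilize_frame(unsteady_frame: np.ndarray, history: list) -> np.ndarray:
    affine = predict_affine(unsteady_frame, history)   # hypothetical model call
    h, w = unsteady_frame.shape[:2]
    return cv2.warpAffine(unsteady_frame, affine, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)

def predict_affine(frame, history):
    # Placeholder: identity transform; a trained StabNet-style model would
    # regress these six parameters from the frame and historical steady frames.
    return np.float32([[1, 0, 0], [0, 1, 0]])
```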

48 citations


Journal ArticleDOI
TL;DR: The proposed method outperforms state-of-the-art systems in terms of the accuracy of both recovered camera trajectories and reconstructed models; the algorithm has been implemented on the GPU, achieving real-time 3D scanning frame rates and updating the reconstructed model on the fly.
Abstract: We present an integrated approach for reconstructing high-fidelity three-dimensional (3D) models using consumer RGB-D cameras. RGB-D registration and reconstruction algorithms are prone to errors from scanning noise, making it hard to perform 3D reconstruction accurately. The key idea of our method is to assign a probabilistic uncertainty model to each depth measurement, which then guides the scan alignment and depth fusion. This allows us to effectively handle inherent noise and distortion in depth maps while keeping the overall scan registration procedure under the iterative closest point framework for simplicity and efficiency. We further introduce a local-to-global, submap-based, and uncertainty-aware global pose optimization scheme to improve scalability and guarantee global model consistency. Finally, we have implemented the proposed algorithm on the GPU, achieving real-time 3D scanning frame rates and updating the reconstructed model on-the-fly. Experimental results on simulated and real-world data demonstrate that the proposed method outperforms state-of-the-art systems in terms of the accuracy of both recovered camera trajectories and reconstructed models.
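One ingredient of this pipeline, uncertainty-guided depth fusion, can be sketched as an inverse-variance weighted running average over pixel-aligned depth maps. This is a simplified stand-in for the paper's probabilistic model; registration, the submap hierarchy, and global pose optimization are not shown.

```python
# Sketch of uncertainty-weighted depth fusion for pixel-aligned depth maps:
# each measurement contributes with weight inversely proportional to its variance.
import numpy as np

def fuse_depth(fused, fused_weight, new_depth, new_sigma):
    """fused, fused_weight: running fused depth and accumulated weight maps.
    new_depth, new_sigma: incoming depth map and its per-pixel std. deviation."""
    w_new = 1.0 / np.maximum(new_sigma ** 2, 1e-6)     # inverse-variance weight
    valid = new_depth > 0                               # ignore missing depth
    num = fused * fused_weight + np.where(valid, new_depth * w_new, 0.0)
    den = fused_weight + np.where(valid, w_new, 0.0)
    return num / np.maximum(den, 1e-6), den
```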

41 citations


Journal ArticleDOI
TL;DR: This paper investigates six popular blending algorithms (feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending, and convolution pyramid blending) for blending real-time panoramic videos.
Abstract: Unlike image blending algorithms, video blending algorithms have been little studied. In this paper, we investigate six popular blending algorithms—feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending, and convolution pyramid blending. We consider their application to blending real-time panoramic videos, a key problem in various virtual reality tasks. To evaluate the performance and suitability of the six algorithms for this problem, we have created a video benchmark with several videos captured under various conditions. We analyze the time and memory needed by the above six algorithms, for both CPU and GPU implementations (where readily parallelizable). The visual quality provided by these algorithms is also evaluated both objectively and subjectively. The video benchmark and algorithm implementations are publicly available at http://cg.cs.tsinghua.edu.cn/blending/.
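Feather blending, the simplest of the six compared algorithms, can be sketched as a distance-weighted average of two aligned images. The weight construction below (Euclidean distance transforms of the valid-pixel masks) is a common simplification, not necessarily the exact variant benchmarked in the paper.

```python
# Feather blending sketch: blend two aligned images with weights that fall off
# toward each image's boundary, so the seam region is a smooth mix of both.
import numpy as np
from scipy import ndimage

def feather_blend(img_a, img_b, mask_a, mask_b):
    """img_a, img_b: aligned float images (H, W, 3); mask_a, mask_b: boolean
    valid-pixel masks. Returns the feathered composite."""
    w_a = ndimage.distance_transform_edt(mask_a)   # distance to image boundary
    w_b = ndimage.distance_transform_edt(mask_b)
    total = np.maximum(w_a + w_b, 1e-6)
    w_a, w_b = (w_a / total)[..., None], (w_b / total)[..., None]
    return img_a * w_a + img_b * w_b
```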

30 citations


Journal ArticleDOI
Yuan Liang, Xiting Wang, Song-Hai Zhang, Shi-Min Hu, Shixia Liu
TL;DR: To better convey the compositions of a large number of example photos, this work has developed a multi-level, example photo layout method to balance multiple factors such as compactness, aspect ratio, composition distance, stability, and overlaps.
Abstract: We present a visual analysis method for interactively recomposing a large number of photos based on example photos with high-quality composition. The recomposition method is formulated as a matching problem between photos. The key to this formulation is a new metric for accurately measuring the composition distance between photos. We have also developed an earth-mover-distance-based online metric learning algorithm to support the interactive adjustment of the composition distance based on user preferences. To better convey the compositions of a large number of example photos, we have developed a multi-level, example photo layout method to balance multiple factors such as compactness, aspect ratio, composition distance, stability, and overlaps. By introducing an EulerSmooth-based straightening method, the composition of each photo is clearly displayed. The effectiveness and usefulness of the method have been demonstrated by the experimental results, user study, and case studies.
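As a toy illustration of the earth-mover-distance primitive underlying the composition metric, the sketch below compares two photos' horizontal saliency profiles with the 1-D Wasserstein distance. The paper's actual composition features and learned metric are more elaborate; only the distance primitive is shown, and the saliency-map inputs are assumed.

```python
# Toy illustration: compare two photos' composition signatures with the earth
# mover's (Wasserstein) distance over their horizontal saliency profiles.
import numpy as np
from scipy.stats import wasserstein_distance

def composition_distance(saliency_a: np.ndarray, saliency_b: np.ndarray) -> float:
    """saliency_a/b: saliency maps (H, W); compare their horizontal mass profiles."""
    profile_a = saliency_a.sum(axis=0)
    profile_b = saliency_b.sum(axis=0)
    xs = np.arange(len(profile_a))
    return wasserstein_distance(xs, xs, profile_a + 1e-9, profile_b + 1e-9)
```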

28 citations


Journal ArticleDOI
TL;DR: A novel temporally adaptive symplectic Euler scheme for MPM with regional time stepping (RTS), where different time steps are used in different regions, and a time stepping scheduler operating at the granularity of small blocks to maintain a natural consistency with the hybrid particle/grid nature of MPM is proposed.
Abstract: Spatially and temporally adaptive algorithms can substantially improve the computational efficiency of many numerical schemes in computational mechanics and physics‐based animation. Recently, a crucial need for temporal adaptivity in the Material Point Method (MPM) is emerging due to the potentially substantial variation of material stiffness and velocities in multi‐material scenes. In this work, we propose a novel temporally adaptive symplectic Euler scheme for MPM with regional time stepping (RTS), where different time steps are used in different regions. We design a time stepping scheduler operating at the granularity of small blocks to maintain a natural consistency with the hybrid particle/grid nature of MPM. Our method utilizes the Sparse Paged Grid (SPGrid) data structure and simultaneously offers high efficiency and notable ease of implementation with a practical multi‐threaded particle‐grid transfer strategy. We demonstrate the efficacy of our asynchronous MPM method on various examples including elastic objects, granular media, and fluids.
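The regional time-stepping idea can be illustrated by a per-block scheduler that picks each block's substep count as a power of two derived from a CFL-style stability bound. This is my own simplification of the scheduling concept, not the paper's MPM pipeline; `dx`, `dt_global`, and the per-block speed estimates are assumed inputs.

```python
# Sketch of a regional time-stepping scheduler: each grid block advances with
# dt_global divided by a power of two chosen from a CFL-style bound on the
# block's maximum particle/wave speed, so blocks stay synchronized at dt_global.
import numpy as np

def block_substeps(max_speed_per_block, dx, dt_global, cfl=0.5):
    """Return, for every block, how many equal substeps of dt_global it takes."""
    max_speed = np.maximum(max_speed_per_block, 1e-12)
    dt_stable = cfl * dx / max_speed                  # per-block stable time step
    ratio = np.maximum(dt_global / dt_stable, 1.0)
    # round up to the next power of two so substeps nest inside the global step
    return 2 ** np.ceil(np.log2(ratio)).astype(int)

# Example: four blocks with increasingly fast material.
print(block_substeps(np.array([0.1, 0.4, 1.6, 6.4]), dx=0.01, dt_global=0.01))
```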

26 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A novel cascaded 3D convolutional network architecture is introduced, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner.
Abstract: We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. Although well studied, algorithms for volumetric fusion from multi-view depth scans are still prone to scanning noise and occlusions, making it hard to obtain high-fidelity 3D reconstructions. In this paper, inspired by recent advances in efficient 3D deep learning techniques, we introduce a novel cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner. To this end, we also develop an algorithm for end-to-end training of the proposed cascaded structure. Qualitative and quantitative experimental results on both simulated and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in terms of quality and fidelity of reconstructed models.
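A coarse-to-fine cascade of 3D convolutions refining a TSDF-like volume can be sketched as below, in the spirit of the described architecture. Layer counts, channel widths, the residual design, and the trilinear upsampling are illustrative assumptions rather than the paper's exact network.

```python
# Minimal sketch of a cascaded, coarse-to-fine 3D CNN refining an implicit
# surface volume. Architecture details here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, 1, 3, padding=1))

    def forward(self, volume):                  # residual refinement of the volume
        return volume + self.net(volume)

class CascadedRefiner(nn.Module):
    def __init__(self, stages=3):
        super().__init__()
        self.stages = nn.ModuleList([RefineStage() for _ in range(stages)])

    def forward(self, coarse_volume):           # (N, 1, D, H, W) noisy fused volume
        out, outputs = coarse_volume, []
        for stage in self.stages:
            out = F.interpolate(out, scale_factor=2, mode="trilinear",
                                align_corners=False)   # progressively upsample
            out = stage(out)
            outputs.append(out)
        return outputs                           # coarse-to-fine predictions
```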

26 citations


Posted Content
TL;DR: A newly created dataset of Chinese text with about 1 million Chinese characters annotated by experts in over 30 thousand street view images, suitable for training robust neural networks for various tasks, particularly detection and recognition.
Abstract: We introduce Chinese Text in the Wild, a very large dataset of Chinese text in street view images. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detection and recognition of text in natural images is still a challenging problem, especially for more complicated character sets such as Chinese text. Lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper we provide details of a newly created dataset of Chinese text with about 1 million Chinese characters annotated by experts in over 30 thousand street view images. This is a challenging dataset with good diversity. It contains planar text, raised text, text in cities, text in rural areas, text under poor illumination, distant text, partially occluded text, etc. For each character in the dataset, the annotation includes its underlying character, its bounding box, and 6 attributes. The attributes indicate whether it has complex background, whether it is raised, whether it is handwritten or printed, etc. The large size and diversity of this dataset make it suitable for training robust neural networks for various tasks, particularly detection and recognition. We give baseline results using several state-of-the-art networks, including AlexNet, OverFeat, Google Inception and ResNet for character recognition, and YOLOv2 for character detection in images. Overall Google Inception has the best performance on recognition with 80.5% top-1 accuracy, while YOLOv2 achieves an mAP of 71.0% on detection. Dataset, source code and trained models will all be publicly available on the website.
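A character-recognition baseline in the spirit of the paper's ResNet experiment can be set up by fine-tuning a torchvision ResNet on cropped character images. Dataset loading, the CTW label vocabulary, and the training loop are omitted; `num_classes=1000` below is an arbitrary example value, not the dataset's exact vocabulary size.

```python
# Sketch of a character-recognition baseline: fine-tune a ResNet classifier
# on cropped character images (data pipeline omitted).
import torch
import torch.nn as nn
from torchvision import models

def build_character_classifier(num_classes: int) -> nn.Module:
    model = models.resnet18(pretrained=True)          # ImageNet-pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_character_classifier(num_classes=1000)  # example vocabulary size
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# training loop over cropped character images omitted
```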

24 citations


Journal ArticleDOI
TL;DR: The proposed general learning framework enables new portrait image editing applications such as occlusion removal and portrait extrapolation; evaluated on publicly available portrait image datasets, it outperforms other state-of-the-art general image completion methods.
Abstract: General image completion and extrapolation methods often fail on portrait images where parts of the human body need to be recovered - a task that requires accurate human body structure and appearance synthesis. We present a two-stage deep learning framework for tackling this problem. In the first stage, given a portrait image with an incomplete human body, we extract a complete, coherent human body structure through a human parsing network, which focuses on structure recovery inside the unknown region with the help of pose estimation. In the second stage, we use an image completion network to fill the unknown region, guided by the structure map recovered in the first stage. For realistic synthesis, the completion network is trained with both perceptual loss and conditional adversarial loss. We evaluate our method on public portrait image datasets, and show that it outperforms other state-of-the-art general image completion methods. Our method enables new portrait image editing applications such as occlusion removal and portrait extrapolation. We further show that the proposed general learning framework can be applied to other types of images, e.g. animal images.

23 citations


Journal ArticleDOI
TL;DR: This work proposes a method to automatically detect and localize visual distractors by learning from a manually labeled dataset, extracting features at the temporal-superpixel level within a traditional support vector machine based learning framework.
Abstract: Personal videos often contain visual distractors, which are objects that are accidentally captured and can distract viewers from focusing on the main subjects. We propose a method to automatically detect and localize these distractors through learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we propose extracting features at the temporal-superpixel level using a traditional support vector machine based learning framework. We also experiment with end-to-end learning using convolutional neural networks, which achieves slightly higher performance than other methods. The classification result is further refined in a postprocessing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve the video quality, including video hole filling, video frame replacement, and camera path replanning. The user study results show that our method can significantly improve the aesthetic quality of videos.
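The SVM stage can be sketched as a binary classifier over precomputed temporal-superpixel feature vectors. Feature extraction and the graph-cut refinement are omitted; the `.npy` file names below are hypothetical stand-ins for whatever precomputed features and labels are available.

```python
# Sketch of the SVM stage: classify temporal-superpixel features as
# distractor (1) vs. background (0). Data files are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X = np.load("superpixel_features.npy")    # (num_superpixels, feat_dim), assumed
y = np.load("superpixel_labels.npy")       # assumed binary labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, probability=True).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```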

20 citations


Journal ArticleDOI
TL;DR: This paper presents a novel hyper-lapse video creation approach based on multiple spatially-overlapping videos, which can synthesize novel virtual hyper-lapse routes that may not exist in any single source video.
Abstract: Hyper-lapse video with high speed-up rate is an efficient way to overview long videos, such as a human activity in first-person view. Existing hyper-lapse video creation methods produce a fast-forward video effect using only one video source. In this paper, we present a novel hyper-lapse video creation approach based on multiple spatially-overlapping videos. We assume the videos share a common view or location, and find transition points where jumps from one video to another may occur. We represent the collection of videos using a hyper-lapse transition graph; the edges between nodes represent possible hyper-lapse frame transitions. To create a hyper-lapse video, a shortest path search is performed on this digraph to optimize frame sampling and assembly simultaneously. Finally, we render the hyper-lapse results using video stabilization and appearance smoothing techniques on the selected frames. Our technique can synthesize novel virtual hyper-lapse routes, which may not exist originally. We show various application results on both indoor and outdoor video collections with static scenes, moving objects, and crowds.
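The frame-assembly step can be sketched as a shortest-path search over a transition digraph whose nodes are (video, frame) pairs. The edge costs below are placeholders; the paper's actual cost terms (speed-up adherence, transition quality, etc.) are not reproduced.

```python
# Sketch of hyper-lapse assembly as a shortest-path search over a transition
# graph: nodes are (video id, frame index) pairs, edge costs are placeholders.
import networkx as nx

def assemble_hyperlapse(graph: nx.DiGraph, start, goal):
    """graph: DiGraph whose edge attribute 'cost' encodes frame-skip penalty
    plus cross-video transition penalty. Returns the selected frame sequence."""
    return nx.shortest_path(graph, source=start, target=goal, weight="cost")

# Example: a tiny two-video graph with one cross-video transition.
g = nx.DiGraph()
g.add_edge(("A", 0), ("A", 8), cost=1.0)     # skip within video A
g.add_edge(("A", 8), ("B", 3), cost=2.5)     # jump to overlapping video B
g.add_edge(("B", 3), ("B", 11), cost=1.0)
print(assemble_hyperlapse(g, ("A", 0), ("B", 11)))
```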

Journal ArticleDOI
TL;DR: A new set of mathematical and computational schemes are proposed which enable efficient and robust fluid‐solid interaction within the MPM framework, and support simulation of both multiphase flow and fully‐coupled solid‐fluid systems.
Abstract: The material point method (MPM) has attracted increasing attention from the graphics community, as it combines the strengths of both particle‐ and grid‐based solvers. Like the smoothed particle hydrodynamics (SPH) scheme, MPM uses particles to discretize the simulation domain and represent the fundamental unknowns. This makes it insensitive to geometric and topological changes, and readily parallelizable on a GPU. Like grid‐based solvers, MPM uses a background mesh for calculating spatial derivatives, providing more accurate and more stable results than a purely particle‐based scheme. MPM has been very successful in simulating both fluid flow and solid deformation, but less so in dealing with multiple fluids and solids, where the dynamic fluid‐solid interaction poses a major challenge. To address this shortcoming of MPM, we propose a new set of mathematical and computational schemes which enable efficient and robust fluid‐solid interaction within the MPM framework. These versatile schemes support simulation of both multiphase flow and fully‐coupled solid‐fluid systems. A series of examples is presented to demonstrate their capabilities and performance in the presence of various interacting fluids and solids, including multiphase flow, fluid‐solid interaction, and dissolution.

Posted Content
TL;DR: This work proposes a convolutional neural network with a novel prediction layer and a zoom module, called LineNet, designed for state-of-the-art lane detection in an unordered crowdsourced image dataset, and introduces TTLane, a dataset for efficient lane detection in urban road modeling applications.
Abstract: High Definition (HD) maps play an important role in modern traffic scenes. However, HD map coverage grows slowly because of cost limitations. To efficiently model HD maps, we propose a convolutional neural network with a novel prediction layer and a zoom module, called LineNet. It is designed for state-of-the-art lane detection in an unordered crowdsourced image dataset. We also introduce TTLane, a dataset for efficient lane detection in urban road modeling applications. Combining LineNet and TTLane, we propose a pipeline to model HD maps with crowdsourced data for the first time, and the maps can be constructed precisely even with inaccurate crowdsourced data.

Journal ArticleDOI
TL;DR: Using the proposed method, which stitches environment-video frames to selfie frames and stabilizes the composed video with a portrait-preserving constraint, one can easily obtain a stable selfie video with expanded background content by merely capturing some background shots.
Abstract: Selfie photography with a hand-held camera is becoming a popular media type. Although convenient and flexible, it suffers from low camera motion stability, small field of view, and limited background content. These limitations can annoy users, especially when touring a place of interest and taking selfie videos. In this paper, we present a novel method to create what we call a BiggerSelfie that deals with these shortcomings. Using a video of the environment that has partial content overlap with the selfie video, we stitch plausible frames selected from the environment video to the original selfie frames and stabilize the composed video content with a portrait-preserving constraint. Using the proposed method, one can easily obtain a stable selfie video with expanded background content by merely capturing some background shots. We show various results and several evaluations to demonstrate the applicability of our method.

Posted Content
TL;DR: This paper proposes a convolutional neural network, called StabNet, that learns a transformation for each input unsteady frame progressively along the time-line, while creating a more stable latent camera path in real-time without explicitly representing the camera path.
Abstract: Video stabilization is essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D- and 3D-based stabilization techniques are well studied, but to our knowledge, no solutions based on deep neural networks have been proposed. The reason for this is mostly the shortage of training data, as well as the challenge of modeling the problem using neural networks. In this paper, we solve the video stabilization problem using a convolutional neural network (ConvNet). Instead of dealing with offline holistic camera path smoothing based on feature matching, we focus on low-latency real-time camera path smoothing without explicitly representing the camera path. Our network, called StabNet, learns a transformation for each input unsteady frame progressively along the time-line, while creating a more stable latent camera path. To train the network, we create a dataset of synchronized steady/unsteady video pairs via well-designed hand-held hardware. Experimental results show that the proposed online method (without using future frames) performs comparably to traditional offline video stabilization methods, while running about 30 times faster. Further, the proposed StabNet is able to handle night-time and blurry videos, where existing methods fail in robust feature matching.

Proceedings Article
11 Jul 2018
TL;DR: A practical static approach to effectively detect SAC bugs and automatically recommend patches to help fix them is proposed; evaluated on kernel modules of the Linux kernel and on the FreeBSD and NetBSD kernels, it finds 401 new real bugs.
Abstract: In a modern OS, kernel modules often use spinlocks and interrupt handlers to monopolize a CPU core to execute concurrent code in atomic context. In this situation, if the kernel module performs an operation that can sleep at runtime, a system hang may occur. We refer to this kind of concurrency bug as a sleep-in-atomic-context (SAC) bug. In practice, SAC bugs have received insufficient attention and are hard to find, as they do not always cause problems in real executions. In this paper, we propose a practical static approach named DSAC, to effectively detect SAC bugs and automatically recommend patches to help fix them. DSAC uses four key techniques: (1) a hybrid of flow-sensitive and flow-insensitive analysis to perform accurate and efficient code analysis; (2) a heuristics-based method to accurately extract kernel interfaces that can sleep at runtime; (3) a path-check method to effectively filter out repeated reports and false bugs; (4) a pattern-based method to automatically generate recommended patches to help fix the bugs. We evaluate DSAC on kernel modules (drivers, file systems, and network modules) of the Linux kernel, and on the FreeBSD and NetBSD kernels, and in total find 401 new real bugs. 272 of these bugs have been confirmed by the relevant kernel maintainers, and 43 patches generated by DSAC have been applied by kernel maintainers.
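The bug pattern itself can be illustrated with a deliberately naive, flow-insensitive scan that flags calls which may sleep (such as msleep, mutex_lock, or kmalloc with GFP_KERNEL) appearing between spin_lock and spin_unlock. This toy checker is only a sketch of the pattern; DSAC itself is far more precise, with inter-procedural analysis, path checks, interface extraction, and patch generation.

```python
# Toy, flow-insensitive illustration of the sleep-in-atomic-context pattern.
import re

SLEEPING = re.compile(r"\b(msleep|mutex_lock|kmalloc\s*\([^)]*GFP_KERNEL)")

def find_sac_candidates(c_source: str):
    reports, depth = [], 0
    for lineno, line in enumerate(c_source.splitlines(), 1):
        if re.search(r"\bspin_lock(_irqsave)?\s*\(", line):
            depth += 1                                   # entering atomic context
        if depth > 0 and SLEEPING.search(line):
            reports.append((lineno, line.strip()))       # possible SAC bug
        if re.search(r"\bspin_unlock(_irqrestore)?\s*\(", line):
            depth = max(depth - 1, 0)                    # leaving atomic context
    return reports

example = """spin_lock(&dev->lock);
buf = kmalloc(len, GFP_KERNEL);   /* may sleep: GFP_ATOMIC would be safe */
spin_unlock(&dev->lock);"""
print(find_sac_candidates(example))
```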

Journal ArticleDOI
Bo Ren, Tai-Ling Yuan, Chenfeng Li, Kun Xu, Shi-Min Hu
TL;DR: An efficient, robust and high-fidelity surface-flow simulation approach based on the shallow-water equations is presented; it achieves compatibility with existing 3D fluid simulators and supports physically realistic interactions with multiple fluids and solid surfaces.
Abstract: Surface flow phenomena, such as rain water flowing down a tree trunk and progressive water front in a shower room, are common in real life. However, compared with the 3D spatial fluid flow, these surface flow problems have been much less studied in the graphics community. To tackle this research gap, we present an efficient, robust and high-fidelity simulation approach based on the shallow-water equations. Specifically, the standard shallow-water flow model is extended to general triangle meshes with a feature-based bottom friction model, and a series of coherent mathematical formulations are derived to represent the full range of physical effects that are important for real-world surface flow phenomena. In addition, by achieving compatibility with existing 3D fluid simulators and by supporting physically realistic interactions with multiple fluids and solid surfaces, the new model is flexible and readily extensible for coupled phenomena. A wide range of simulation examples are presented to demonstrate the performance of the new approach.
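For reference, the standard shallow-water system that the paper extends to general triangle meshes can be written as follows, where h is the water height, v the depth-averaged velocity, b the bottom height, and g gravity; the generic friction term stands in for the paper's feature-based bottom friction model.

```latex
% Standard shallow-water equations; the paper extends this system to general
% triangle meshes and adds a feature-based bottom-friction model.
\begin{aligned}
  \frac{\partial h}{\partial t} + \nabla \cdot (h\,\mathbf{v}) &= 0, \\
  \frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\,\mathbf{v}
    &= -g\,\nabla (h + b) + \mathbf{f}_{\mathrm{friction}} .
\end{aligned}
```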

Posted Content
TL;DR: In this paper, a pose-based instance segmentation framework for humans is presented, which separates instances based on human pose, rather than proposal region detection, and achieves better accuracy than the state-of-the-art detection-based approach.
Abstract: The standard approach to image instance segmentation is to perform the object detection first, and then segment the object from the detection bounding-box. More recently, deep learning methods like Mask R-CNN perform them jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. Moreover, the human pose skeleton can be used to better distinguish instances with heavy occlusion than using bounding-boxes. In this paper, we present a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, rather than proposal region detection. We demonstrate that our pose-based framework can achieve better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and can moreover better handle occlusion. Furthermore, there are few public datasets containing many heavily occluded humans along with comprehensive annotations, which makes this a challenging problem seldom noticed by researchers. Therefore, in this paper we introduce a new benchmark "Occluded Human (OCHuman)", which focuses on occluded humans with comprehensive annotations including bounding-box, human pose and instance masks. This dataset contains 8110 detailed annotated human instances within 4731 images. With an average 0.67 MaxIoU for each person, OCHuman is the most complex and challenging dataset related to human instance segmentation. Through this dataset, we want to emphasize occlusion as a challenging problem for researchers to study.
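The MaxIoU statistic quoted for OCHuman (the maximum overlap between a person and any other person in the same image) can be computed as sketched below; this is my reading of the statistic and only shows the overlap computation, not the dataset tooling.

```python
# Sketch of the per-person MaxIoU statistic over instance masks in one image.
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def max_iou_per_person(masks):
    """masks: list of boolean (H, W) instance masks from one image."""
    out = []
    for i, m in enumerate(masks):
        others = [mask_iou(m, n) for j, n in enumerate(masks) if j != i]
        out.append(max(others) if others else 0.0)
    return out
```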

Posted Content
28 Mar 2018
TL;DR: This paper presents a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, not proposal region detection, and demonstrates that this framework can achieve similar accuracy to the detection-based approach, and can moreover better handle occlusion, which is the most challenging problem in the Detection-based framework.
Abstract: The general method of image instance segmentation is to perform the object detection first, and then segment the object from the detection bounding-box. More recently, deep learning methods like Mask R-CNN perform them jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. In this paper, we present a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, not proposal region detection. We demonstrate that our pose-based framework can achieve similar accuracy to the detection-based approach, and can moreover better handle occlusion, which is the most challenging problem in the detection-based framework.

Proceedings ArticleDOI
TL;DR: Towards Zero Copy (TZC) is an efficient IPC technique that generates messages divisible into two parts: one part is transmitted through a socket while the other resides in shared memory, and the shared-memory part is never copied or serialized during its lifetime.
Abstract: Inter-process communication (IPC) is one of the core functions of modern robotics middleware. We propose an efficient IPC technique called TZC (Towards Zero-Copy). As a core component of TZC, we design a novel algorithm called partial serialization. Our formulation can generate messages that can be divided into two parts. During message transmission, one part is transmitted through a socket and the other part uses shared memory. The part within shared memory is never copied or serialized during its lifetime. We have integrated TZC with ROS and ROS2 and find that TZC can be easily combined with current open-source platforms. By using TZC, the overhead of IPC remains constant when the message size grows. In particular, when the message size is 4MB (less than the size of a full HD image), TZC can reduce the overhead of ROS IPC from tens of milliseconds to hundreds of microseconds and can reduce the overhead of ROS2 IPC from hundreds of milliseconds to less than 1 millisecond. We also demonstrate the benefits of TZC by integrating it with a TurtleBot2 robot used in autonomous driving scenarios, showing that the braking distance can be shortened by 16% compared with standard ROS.
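The split between a small socket-transmitted descriptor and a large shared-memory payload can be mimicked in Python as sketched below. TZC itself is a C++ integration with ROS/ROS2; the names and JSON descriptor here are purely illustrative, and only the general "bulk data stays in shared memory, metadata goes over the channel" idea is shown.

```python
# Illustrative sketch of partial serialization: the bulky payload lives in
# shared memory; only a small descriptor travels over the socket or queue.
import json
import numpy as np
from multiprocessing import shared_memory

def publish(image: np.ndarray):
    shm = shared_memory.SharedMemory(create=True, size=image.nbytes)
    np.ndarray(image.shape, image.dtype, buffer=shm.buf)[:] = image   # one write
    descriptor = json.dumps({"shm": shm.name, "shape": list(image.shape),
                             "dtype": str(image.dtype)})
    return descriptor, shm            # descriptor is what goes over the socket

def subscribe(descriptor: str):
    meta = json.loads(descriptor)
    shm = shared_memory.SharedMemory(name=meta["shm"])
    view = np.ndarray(tuple(meta["shape"]), meta["dtype"], buffer=shm.buf)
    # zero-copy view into shared memory; keep `shm` referenced while it is used
    return view, shm
```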

Journal ArticleDOI
TL;DR: This paper proposes an efficient algorithm for dendritic crystal simulation that is able to reproduce arbitrary symmetry patterns with different levels of asymmetry breaking effect on general grids or meshes, including spreading on curved surfaces and growth in 3D.
Abstract: Real-world dendritic growth shows charming structures arising from an exquisite balance between symmetry and randomness in crystal formation. Beyond the variety seen in natural crystals, richer visual appearance can be obtained by artificially controlling the growth directions and shapes of the crystal. In this paper, by introducing one extra dimension of freedom, i.e. the orientation field, into the simulation, we propose an efficient algorithm for dendritic crystal simulation that is able to reproduce arbitrary symmetry patterns with different levels of asymmetry-breaking effects on general grids or meshes, including spreading on curved surfaces and growth in 3D. Flexible artistic control is also enabled in a unified manner by exploiting and guiding the orientation field in the visual simulation. We show the effectiveness of our approach through various demonstrations of simulation results.

Journal ArticleDOI
TL;DR: With the first computational tool to help ordinary users create transforming pop-up books, inexperienced users can create models in a short time; previously, even experienced artists often took weeks to create them manually.
Abstract: We present the first computational tool to help ordinary users create transforming pop-up books. In each transforming pop-up, when the user pulls a tab, an initial flat two-dimensional (2D) pattern, i.e., a 2D shape with a superimposed picture, such as an airplane, turns into a new 2D pattern, such as a robot. Given the two 2D patterns, our approach automatically computes a 3D pop-up mechanism that transforms one pattern into the other; it also outputs a design blueprint, allowing the user to easily make the final model. We also present a theoretical analysis of basic transformation mechanisms; combining these basic mechanisms allows more flexibility of final designs. Using our approach, inexperienced users can create models in a short time; previously, even experienced artists often took weeks to manually create them. We demonstrate our method on a variety of real-world examples.

Posted Content
TL;DR: An unsupervised algorithm based on object-level contexts is built, which explicitly models the joint probability distribution of object categories and bounding boxes using a Gaussian mixture model.
Abstract: In this work, we propose a novel topic consisting of two dual tasks: 1) given a scene, recommend objects to insert, 2) given an object category, retrieve suitable background scenes. A bounding box for the inserted object is predicted in both tasks, which helps downstream applications such as semi-automated advertising and video composition. The major challenge lies in the fact that the target object is neither present nor localized at test time, whereas available datasets only provide scenes with existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes with a Gaussian mixture model. Experiments on our newly annotated test set demonstrate that our system outperforms existing baselines on all subtasks, and does so under a unified framework. Our contribution promises future extensions and applications.
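The Gaussian-mixture context model can be sketched as fitting a mixture over (category, bounding-box) features and scoring candidate insertions by log-likelihood. The feature encoding and the training file below are simplified placeholders for the paper's object-level contexts.

```python
# Sketch of the context model: fit a Gaussian mixture over (object category,
# bounding-box) features and use it to score candidate insertions.
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed training matrix: one row per annotated object in existing scenes,
# e.g. [category_id, box_cx, box_cy, box_w, box_h] with normalized coordinates.
train = np.load("object_context_features.npy")         # hypothetical file
gmm = GaussianMixture(n_components=16, covariance_type="full").fit(train)

def score_candidate(category_id, cx, cy, w, h) -> float:
    """Higher log-likelihood = more plausible location/size for this category."""
    return float(gmm.score_samples([[category_id, cx, cy, w, h]])[0])
```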

Proceedings ArticleDOI
24 Feb 2018
TL;DR: The results show that AutoPA can automatically and successfully generate usable active drivers from original driver code and the performance of generated active drivers is not obviously degraded compared to original passive drivers.
Abstract: Original device drivers are often passive in common operating systems, and they should correctly handle synchronization when concurrently invoked by multiple external threads. However, many concurrency bugs have occurred in drivers due to incautious synchronization. To solve concurrency problems, the active driver has been proposed to replace the original passive driver. An active driver has its own thread and does not need to handle synchronization, so the occurrence probability of many concurrency bugs can be effectively reduced. But previous approaches to active drivers have some limitations. The biggest limitation is that original passive driver code needs to be manually rewritten. In this paper, we propose a practical approach, AutoPA, to automatically generate efficient active drivers from original passive driver code. AutoPA uses function analysis and code instrumentation to perform automated driver generation, and it uses an improved active driver architecture to reduce performance degradation. We have evaluated AutoPA on 20 Linux drivers. The results show that AutoPA can automatically and successfully generate usable active drivers from original driver code, and the generated active drivers can work normally with or without the synchronization primitives in the original driver code. To check the effect of AutoPA on driver reliability, we perform fault injection testing on the generated active drivers, and find that all injected concurrency faults are well tolerated and the drivers work normally. The performance of the generated active drivers is not noticeably degraded compared to the original passive drivers.

Posted Content
Haozhi Huang, Sen-Zhe Xu, Jun-Xiong Cai, Wei Liu, Shi-Min Hu
TL;DR: In this paper, a pixel-wise disharmony discriminator is used to improve the realism of video harmonization, and a temporal loss is introduced to increase temporal consistency between consecutive harmonized frames.
Abstract: Compositing is one of the most important editing operations for images and videos. The process of improving the realism of composite results is often called harmonization. Previous approaches for harmonization mainly focus on images. In this work, we take one step further to attack the problem of video harmonization. Specifically, we train a convolutional neural network in an adversarial way, exploiting a pixel-wise disharmony discriminator to achieve more realistic harmonized results and introducing a temporal loss to increase temporal consistency between consecutive harmonized frames. Thanks to the pixel-wise disharmony discriminator, we are also able to relieve the need for input foreground masks. Since existing video datasets that have ground-truth foreground masks and optical flows are not sufficiently large, we propose a simple yet efficient method to build up a synthetic dataset supporting supervised training of the proposed adversarial network. Experiments show that training on our synthetic dataset generalizes well to the real-world composite dataset. Also, our method successfully incorporates temporal consistency during training and achieves more harmonious results than previous methods.
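A temporal-consistency term in the spirit of the described temporal loss can be sketched as an L1 penalty between the current harmonized frame and the previous harmonized frame warped toward it by optical flow. This is my own simplified formulation, not the paper's exact loss; the warping helper assumes flow given from the current frame back to the previous one, in pixels.

```python
# Sketch of a temporal-consistency loss: compare the current harmonized frame
# with the previous harmonized frame warped into its coordinates by flow.
import torch
import torch.nn.functional as F

def warp_with_flow(prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """prev: (N, C, H, W); flow: (N, 2, H, W) flow from current to previous frame,
    channel 0 = x displacement, channel 1 = y displacement, in pixels."""
    n, _, h, w = prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(prev.device)      # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                # absolute coords
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                    # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)            # (N, H, W, 2)
    return F.grid_sample(prev, norm_grid, align_corners=True)

def temporal_loss(curr_harmonized, prev_harmonized, flow):
    return F.l1_loss(curr_harmonized, warp_with_flow(prev_harmonized, flow))
```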