
Showing papers on "Bounding overwatch" published in 2020


Journal ArticleDOI
TL;DR: A novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion, incorporated into an existing hierarchical deep network, which improves the precision of object localization.
Abstract: The objective of detection in remote sensing images is to determine the location and category of all targets in these images. Anchor-based methods are the most prevalent deep learning based methods, but they still have some problems that need to be addressed. First, the existing metric (i.e., intersection over union (IoU)) cannot measure the distance between two bounding boxes when they are non-overlapping. Second, the existing bounding box regression loss cannot directly optimize the metric in the training process. Third, the existing methods which adopt a hierarchical deep network only choose a single-level feature layer for the feature extraction of region proposals, meaning they do not take full advantage of multi-level features. To resolve the above problems, a novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion is proposed in this paper. First, a new metric named generalized IoU is applied, which can quantify the distance between two bounding boxes regardless of whether they are overlapping or not. Second, a novel bounding box regression loss is proposed, which can not only optimize the new metric (i.e., generalized IoU) directly but also overcome the problem that the existing bounding box regression loss based on the new metric cannot adaptively change the gradient based on the metric value. Finally, a multi-level features fusion module is proposed and incorporated into the existing hierarchical deep network, which can make full use of the multi-level features for each region proposal. Quantitative comparisons between the proposed method and the baseline method on the large-scale DIOR dataset demonstrate that incorporating the proposed bounding box regression loss, the multi-level features fusion module, and the combination of both into the baseline method yields absolute gains of about 0.7%, 1.4%, and 2.2% in mAP, respectively. Comparison with the state-of-the-art methods demonstrates that the proposed method achieves state-of-the-art performance. The curves of average precision with different thresholds show that the advantage of the proposed method is more evident when the threshold of generalized IoU (or IoU) is relatively high, which means that the proposed method can improve the precision of object localization. Similar conclusions can be obtained on the NWPU VHR-10 dataset.
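As a concrete reference for the generalized IoU metric described above, here is a minimal sketch (the standard GIoU definition for axis-aligned boxes, not the authors' implementation):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2).

    Unlike plain IoU, GIoU stays informative for non-overlapping boxes:
    it subtracts the fraction of the smallest enclosing box that is not
    covered by the union, so it ranges over (-1, 1].
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    # Smallest enclosing box of the pair.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

# Disjoint boxes: IoU is 0 regardless of their distance,
# while GIoU still reflects how far apart they are.
print(giou((0, 0, 2, 2), (3, 0, 5, 2)))  # -0.2
```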

94 citations


Posted Content
TL;DR: A new version of YOLO with better performance, extended with instance segmentation, called Poly-YOLO; its lite variant has the same precision as YOLOv3 but is three times smaller and twice as fast, and thus suitable for embedded devices.
Abstract: We present a new version of YOLO with better performance and extended with instance segmentation, called Poly-YOLO. Poly-YOLO builds on the original ideas of YOLOv3 and removes two of its weaknesses: a large amount of rewritten labels and an inefficient distribution of anchors. Poly-YOLO reduces these issues by aggregating features from a light SE-Darknet-53 backbone with a hypercolumn technique, using stairstep upsampling, and produces a single-scale output with high resolution. In comparison with YOLOv3, Poly-YOLO has only 60% of its trainable parameters but improves mAP by a relative 40%. We also present Poly-YOLO lite with fewer parameters and a lower output resolution. It has the same precision as YOLOv3, but it is three times smaller and twice as fast, and thus suitable for embedded devices. Finally, Poly-YOLO performs instance segmentation using bounding polygons. The network is trained to detect size-independent polygons defined on a polar grid. Vertices of each polygon are predicted with their confidence, and therefore Poly-YOLO produces polygons with a varying number of vertices.
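To illustrate the polar-grid polygon representation, here is a sketch under assumed conventions (the sector layout, names, and confidence threshold are illustrative, not Poly-YOLO's exact parametrization):

```python
import math

def decode_polar_polygon(cx, cy, dists, confs, conf_thresh=0.5):
    """Decode a bounding polygon from per-sector polar predictions:
    each angular sector k contributes one vertex at a predicted
    distance from the object center, kept only if its confidence
    clears the threshold, so polygons naturally vary in vertex count.
    """
    n = len(dists)
    vertices = []
    for k, (d, c) in enumerate(zip(dists, confs)):
        if c < conf_thresh:
            continue  # low-confidence vertex is dropped
        angle = 2.0 * math.pi * (k + 0.5) / n  # center of sector k
        vertices.append((cx + d * math.cos(angle),
                         cy + d * math.sin(angle)))
    return vertices

# Toy object: 8 sectors, uniform radius, one vertex suppressed.
print(decode_polar_polygon(0.0, 0.0, [1.0] * 8, [0.9] * 7 + [0.1]))
```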

71 citations


25 Jan 2020
TL;DR: A novel weakly supervised segmentation method based on several global constraints derived from box annotations is proposed, bringing a classical tightness prior to a deep learning setting by imposing a set of constraints on the network outputs.
Abstract: We propose a novel weakly supervised segmentation method based on several global constraints derived from box annotations. In particular, we bring a classical tightness prior to a deep learning setting by imposing a set of constraints on the network outputs. Such a powerful topological prior prevents solutions from excessive shrinking by enforcing any horizontal or vertical line within the bounding box to contain, at least, one pixel of the foreground region. Furthermore, we integrate our deep tightness prior with a global background emptiness constraint, guiding training with information outside the bounding box. We demonstrate experimentally that such a global constraint is much more powerful than standard cross-entropy for the background class. Our optimization problem is challenging as it takes the form of a large set of inequality constraints on the outputs of deep networks. We solve it with a sequence of unconstrained losses based on a recent powerful extension of the log-barrier method, which is well known in the context of interior-point methods. This accommodates standard stochastic gradient descent (SGD) for training deep networks, while avoiding computationally expensive and unstable Lagrangian dual steps and projections. Extensive experiments over two different public datasets and applications (prostate and brain lesions) demonstrate that the synergy between our global tightness and emptiness priors yields very competitive performance, approaching full supervision and significantly outperforming DeepCut. Furthermore, our approach removes the need for computationally expensive proposal generation. Our code is shared anonymously.
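A minimal sketch of the log-barrier extension referred to above, following the formulation of Kervadec et al.; the toy constraint below (each row inside the box should carry at least one pixel of foreground mass) is an illustrative stand-in for the paper's exact constraint set:

```python
import math
import torch

def log_barrier_extension(z, t=5.0):
    """Penalty for an inequality constraint z <= 0. Inside the feasible
    region it acts like an interior-point barrier; outside it continues
    as a linear penalty, so plain SGD applies without Lagrangian dual
    steps or projections. `t` is the barrier parameter, typically
    increased over training.
    """
    cond = z <= -1.0 / t**2
    barrier = -torch.log((-z).clamp(min=1e-8)) / t
    linear = t * z - math.log(1.0 / t**2) / t + 1.0 / t
    return torch.where(cond, barrier, linear)

# Tightness-style usage (illustrative): require the soft foreground
# mass of every horizontal line inside the box to be at least 1 pixel,
# i.e. z = 1 - row_sum <= 0.
probs = torch.rand(32, 32, requires_grad=True)  # toy softmax outputs
z = 1.0 - probs.sum(dim=1)
loss = log_barrier_extension(z).sum()
loss.backward()
```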

56 citations


Journal ArticleDOI
TL;DR: In this article, a comprehensive and up-to-date review of methods for object pose recovery, from 3D bounding box detectors to full 6D pose estimators, is presented; the methods mathematically model the problem as a classification, regression, classification & regression, template matching, or point-pair feature matching task.

52 citations


Posted Content
TL;DR: A variational quantum algorithm called Variational Quantum Fisher Information Estimation (VQFIE) is presented, which estimates lower and upper bounds on the QFI, based on bounding the fidelity, and outputs a range in which the actual QFI lies.
Abstract: The Quantum Fisher information (QFI) quantifies the ultimate precision of estimating a parameter from a quantum state, and can be regarded as a reliability measure of a quantum system as a quantum sensor. However, estimation of the QFI for a mixed state is in general a computationally demanding task. In this work we present a variational quantum algorithm called Variational Quantum Fisher Information Estimation (VQFIE) to address this task. By estimating lower and upper bounds on the QFI, based on bounding the fidelity, VQFIE outputs a range in which the actual QFI lies. This result can then be used to variationally prepare the state that maximizes the QFI, for the application of quantum sensing. In contrast to previous approaches, VQFIE does not require knowledge of the explicit form of the sensor dynamics. We simulate the algorithm for a magnetometry setup and demonstrate the tightening of our bounds as the state purity increases. For this example, we compare our bounds to literature bounds and show that our bounds are tighter.

49 citations


Journal ArticleDOI
TL;DR: A method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes, demonstrating its ability to generalize for different types of objects.
Abstract: In this paper we introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes. The proposed disentangling transformation isolates the contribution made by different groups of parameters to a given loss, without changing its nature. This brings two advantages: i) it simplifies the training dynamics in the presence of losses with complex interactions of parameters, and ii) it allows us to avoid the issue of balancing independent regression terms. We further apply this disentangling transformation to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. We also critically review the AP metric used in KITTI3D and resolve a flaw which affected and biased all previously published results on monocular 3D detection. Our improved metric is now used as official KITTI3D metric. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results. We provide additional results on all the classes of KITTI3D as well as nuScenes datasets to further validate the robustness of our method, demonstrating its ability to generalize for different types of objects.

26 citations


Journal ArticleDOI
19 Oct 2020
TL;DR: This work shows how one can robustly certify over 2.3 bits of device-independent local randomness from a two-qubit state using a sequence of measurements, going beyond the theoretical maximum of two bits that can be achieved with non-sequential measurements.
Abstract: An important problem in quantum information theory is that of bounding sets of correlations that arise from making local measurements on entangled states of arbitrary dimension. Currently, the best-known method to tackle this problem is the NPA hierarchy: an infinite sequence of semidefinite programs that provides increasingly tighter outer approximations to the desired set of correlations. In this work we consider a more general scenario in which one performs sequences of local measurements on an entangled state of arbitrary dimension. We show that a simple adaptation of the original NPA hierarchy provides an analogous hierarchy for this scenario, with comparable resource requirements and convergence properties. We then use the method to tackle some problems in device-independent quantum information. First, we show how one can robustly certify over 2.3 bits of device-independent local randomness from a two-qubit state using a sequence of measurements, going beyond the theoretical maximum of two bits that can be achieved with non-sequential measurements. Finally, we show tight upper bounds for two previously defined tasks in sequential Bell test scenarios.

26 citations


Journal ArticleDOI
TL;DR: A novel rotation detector is proposed that redesigns the matching strategy between oriented anchors and ground-truth boxes, thereby reducing the instability that the angle introduces into the matching process, and achieves state-of-the-art detection accuracy with higher efficiency.
Abstract: Oriented object detection has received extensive attention in recent years, especially for the task of detecting targets in aerial imagery. Traditional detectors locate objects by horizontal bounding boxes (HBBs), which may cause inaccuracies when detecting objects with arbitrary orientation angles, dense distributions, and large aspect ratios. Oriented bounding boxes (OBBs), which add rotation angles to the horizontal bounding boxes, can better deal with the above problems. New problems arise with the introduction of oriented bounding boxes for rotation detectors, such as an increase in the number of anchors and the sensitivity of the intersection over union (IoU) to changes of angle. To overcome these shortcomings while taking advantage of the oriented bounding boxes, we propose a novel rotation detector which redesigns the matching strategy between oriented anchors and ground-truth boxes. The main idea of the new strategy is to decouple the rotated bounding box into a horizontal bounding box during matching, thereby reducing the instability that the angle introduces into the matching process. Extensive experiments on public remote sensing datasets including DOTA, HRSC2016 and UCAS-AOD demonstrate that the proposed approach achieves state-of-the-art detection accuracy with higher efficiency.
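A hedged sketch of the decoupling idea (the paper's matching rule involves more machinery; this only shows why replacing the rotated box by its axis-aligned hull makes matching insensitive to small angle changes):

```python
import math

def obb_to_hbb(cx, cy, w, h, angle):
    """Axis-aligned bounding rectangle of an oriented box given as
    (center, size, rotation in radians)."""
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    bw = w * c + h * s
    bh = w * s + h * c
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)

def hbb_iou(a, b):
    """Plain IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# For a long thin box (aspect ratio 5), a 5-degree rotation collapses
# the rotated IoU, but the decoupled horizontal IoU degrades gently,
# so anchor matching stays stable.
anchor = obb_to_hbb(0, 0, 10, 2, 0.0)
gt = obb_to_hbb(0, 0, 10, 2, math.pi / 36)
print(hbb_iou(anchor, gt))
```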

26 citations


Journal ArticleDOI
Li Tangwei, Guanjun Tong, Hongying Tang, Baoqing Li, Chen Bo
TL;DR: The No-prior Fisheye Representation Method and the Distortion Shape Matching strategy are proposed, using irregular quadrilateral bounding boxes based on the contours of distorted objects as the core of the proposed object detector.
Abstract: Fisheye images have attracted increasing attention from the research community due to their large field of view (LFOV). However, the geometric transformations inherent in fisheye cameras result in unknown spatial distortion and large variations in the appearance of objects. This leads to poor performance of methods that represent the state of the art on conventional two-dimensional (2D) images. To address this problem, we propose a self-study and contour-based object detector for fisheye images, named FisheyeDet. The No-prior Fisheye Representation Method is proposed to guarantee that the network adaptively extracts distortion features without prior information such as prespecified lens parameters, special calibration patterns, etc. Furthermore, in order to tightly and robustly localize objects in fisheye images, the Distortion Shape Matching strategy is proposed, which uses irregular quadrilateral bounding boxes based on the contours of distorted objects as its core. By combining the "No-prior Fisheye Representation Method" and "Distortion Shape Matching", our proposed detector builds an end-to-end network. Finally, due to the lack of public fisheye datasets, we make a first attempt to create a multi-class fisheye dataset, VOC-Fisheye, for object detection. Our proposed detector shows favorable generalization ability and achieves 74.87% mAP (mean average precision) on VOC-Fisheye, outperforming the existing state-of-the-art methods.

21 citations


Posted Content
TL;DR: This paper proposes a novel tracking method based on a distance-IoU (DIoU) loss, in which the proposed tracker consists of a target estimation component and a target classification component; the latter is trained online and optimized with a conjugate-gradient-based strategy to guarantee real-time tracking speed.
Abstract: Most existing trackers are based on using a classifier and multi-scale estimation to estimate the target state. Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated. While trackers adopt a maximum overlap method based on an intersection-over-union (IoU) loss to mitigate this problem, there are defects in the IoU loss itself that make it impossible to continue to optimize the objective function when a given bounding box is completely contained within/without another bounding box; this makes it very challenging to accurately estimate the target state. Accordingly, in this paper, we address the above-mentioned problem by proposing a novel tracking method based on a distance-IoU (DIoU) loss, such that the proposed tracker consists of a target estimation component and a target classification component. The target estimation part is trained to predict the DIoU score between the target ground-truth bounding box and the estimated bounding box. The DIoU loss can maintain the advantage provided by the IoU loss while minimizing the distance between the center points of two bounding boxes, thereby making the target estimation more accurate. Moreover, we introduce a classification part that is trained online and optimized with a conjugate-gradient-based strategy to guarantee real-time tracking speed. Comprehensive experimental results demonstrate that the proposed method achieves competitive tracking accuracy when compared to state-of-the-art trackers while running at real-time speed.
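For reference, a minimal sketch of the DIoU score that the target estimation component is trained to predict (the standard distance-IoU definition, not the authors' code):

```python
def diou(box_a, box_b):
    """Distance-IoU of two boxes (x1, y1, x2, y2): IoU minus the squared
    center distance normalized by the squared diagonal of the smallest
    enclosing box. It keeps a useful gradient even when one box is
    completely contained in the other, where plain IoU saturates.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    d2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
          + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0
    c2 = ((max(ax2, bx2) - min(ax1, bx1)) ** 2
          + (max(ay2, by2) - min(ay1, by1)) ** 2)
    return inter / union - d2 / c2

# A small box fully inside a large one: IoU is 0.04 wherever it sits,
# but DIoU still prefers the concentric placement.
print(diou((0, 0, 10, 10), (4, 4, 6, 6)))    #  0.04 (centered)
print(diou((0, 0, 10, 10), (8, 8, 10, 10)))  # -0.12 (pushed to a corner)
```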

21 citations


Proceedings Article
01 Jan 2020
TL;DR: A novel representation based on 2D beta distribution, named Beta Representation, is proposed, which is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS.
Abstract: Recently significant progress has been made in pedestrian detection, but it remains challenging to achieve high performance in occluded and crowded scenes. It could be attributed mostly to the widely used representation of pedestrians, i.e., 2D axis-aligned bounding box, which just describes the approximate location and size of the object. Bounding box models the object as a uniform distribution within the boundary, making pedestrians indistinguishable in occluded and crowded scenes due to much noise. To eliminate the problem, we propose a novel representation based on 2D beta distribution, named Beta Representation. It pictures a pedestrian by explicitly constructing the relationship between full-body and visible boxes, and emphasizes the center of visual mass by assigning different probability values to pixels. As a result, Beta Representation is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS. What’s more, to fully exploit Beta Representation, a novel pipeline Beta R-CNN equipped with BetaHead and BetaMask is proposed, leading to high detection performance in occluded and crowded scenes. Code will be released at github.com/Guardian44x/Beta-R-CNN.

Posted Content
TL;DR: A new anchor-free object detector called Gaussian-FCOS is proposed that estimates localization uncertainty from the four directions of box offsets, which share similar properties, and avoids anchor tuning.
Abstract: Since many safety-critical systems, such as surgical robots and autonomous driving cars, operate in unstable environments with sensor noise and incomplete data, it is desirable for object detectors to take into account the confidence of localization prediction. There are three limitations of prior uncertainty estimation methods for anchor-based object detection. 1) They model the uncertainty based on object properties having different characteristics, such as location (center point) and scale (width, height). 2) They model a box offset as a Gaussian distribution and the ground truth as a Dirac delta distribution, which leads to a model misspecification problem, because the Dirac delta distribution cannot be exactly represented as a Gaussian for any $\mu$ and $\Sigma$. 3) Since anchor-based methods are sensitive to anchor hyper-parameters, the localization uncertainty modeling is also sensitive to these parameters. Therefore, we propose a new localization uncertainty estimation method called Gaussian-FCOS for anchor-free object detection. Our method captures the uncertainty based on four directions of box offsets (left, right, top, bottom) that have similar properties, which enables it to identify which direction is uncertain and to provide a quantitative value in the range [0, 1]. To this end, we design a new uncertainty loss, the negative power log-likelihood loss, which measures uncertainty by weighting the likelihood loss by the IoU, alleviating the model misspecification problem. Experiments on the COCO dataset demonstrate that our Gaussian-FCOS reduces false positives and finds more missing objects by mitigating over-confidence scores with the estimated uncertainty. We hope Gaussian-FCOS serves as a crucial component for reliability-required tasks.
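One plausible reading of the IoU-weighted likelihood loss is sketched below; the exact functional form of the paper's negative power log-likelihood loss is not reproduced here, so the weighting scheme and shapes are assumptions:

```python
import torch

def iou_weighted_gaussian_nll(mu, sigma, target, iou):
    """Each of the four box offsets (left, right, top, bottom) is
    modeled as a Gaussian with predicted mean and scale; the per-box
    negative log-likelihood is weighted by the predicted box's IoU,
    which corresponds to raising the likelihood to the power IoU.
    mu, sigma, target: (N, 4); iou: (N,).
    """
    dist = torch.distributions.Normal(mu, sigma)
    nll = -dist.log_prob(target).sum(dim=1)  # NLL over the 4 offsets
    return (iou * nll).mean()  # well-localized boxes weigh more

mu = torch.zeros(8, 4, requires_grad=True)
sigma = torch.nn.functional.softplus(torch.randn(8, 4)) + 1e-3
loss = iou_weighted_gaussian_nll(mu, sigma, torch.randn(8, 4),
                                 iou=torch.rand(8))
loss.backward()
```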

Journal ArticleDOI
TL;DR: An improved construction of the perspective transformation that is more robust and fully automatic, together with an extended experimental evaluation of speed estimation.
Abstract: Detection and tracking of vehicles captured by traffic surveillance cameras is a key component of intelligent transportation systems. We present an improved version of our algorithm for the detection of 3D bounding boxes of vehicles, their tracking, and subsequent speed estimation. Our algorithm utilizes the known geometry of vanishing points in the surveilled scene to construct a perspective transformation. The transformation enables an intuitive simplification of the problem of detecting 3D bounding boxes to detection of 2D bounding boxes with one additional parameter, using a standard 2D object detector. The main contributions of this paper are an improved construction of the perspective transformation, which is more robust and fully automatic, and an extended experimental evaluation of speed estimation. We test our algorithm on the speed estimation task of the BrnoCompSpeed dataset. We evaluate our approach with different configurations to gauge the relationship between accuracy and computational costs, and the benefits of 3D bounding box detection over 2D detection. All of the tested configurations run in real time and are fully automatic. Compared to other published state-of-the-art fully automatic results, our algorithm reduces the mean absolute speed measurement error by 32% (1.10 km/h to 0.75 km/h) and the absolute median error by 40% (0.97 km/h to 0.58 km/h).

Proceedings Article
03 Jun 2020
TL;DR: It is observed that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not.
Abstract: We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are corrupted not only by random noise but also by adversarial corruptions. We introduce an algorithm, Fast-Slow GP-UCB, based on Gaussian process methods, randomized selection between two instances labeled "fast" (but non-robust) and "slow" (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty. We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved. We observe that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not.

Posted Content
TL;DR: An energy-based a posteriori error bound is proposed for the physics-informed neural network solutions of elasticity problems that provides an upper bound of the global error of neural network discretization.
Abstract: An energy-based a posteriori error bound is proposed for the physics-informed neural network solutions of elasticity problems. An admissible displacement-stress solution pair is obtained from a mixed form of physics-informed neural networks, and the proposed error bound is formulated as the constitutive relation error defined by the solution pair. Such an error estimator provides an upper bound of the global error of the neural network discretization. The bounding property, as well as the asymptotic behavior of the physics-informed neural network solutions, are studied in a demonstration example.
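For reference, the constitutive relation error of an admissible displacement-stress pair $(u, \sigma)$ in linear elasticity is commonly defined as follows (standard form; the paper's normalization may differ):

```latex
E_{\mathrm{CRE}}(u, \sigma) \;=\;
\left( \int_{\Omega}
  \bigl(\sigma - C\,\varepsilon(u)\bigr) : C^{-1} :
  \bigl(\sigma - C\,\varepsilon(u)\bigr)\,
\mathrm{d}\Omega \right)^{1/2},
```

where $C$ is the elasticity tensor and $\varepsilon(u)$ the small-strain tensor; $E_{\mathrm{CRE}}$ vanishes exactly when the pair satisfies the constitutive relation $\sigma = C\,\varepsilon(u)$, which is what lets it upper-bound the energy-norm error of the admissible pair.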

Journal ArticleDOI
Jun Chu, Yiqing Zhang, Shaoming Li, Lu Leng, Jun Miao
TL;DR: Experimental results on the MS COCO dataset demonstrate that Syncretic-NMS can steadily increase the accuracy of instance segmentation, while experimental results on the Cityscapes dataset prove that the algorithm can adapt to changes in application scenarios.
Abstract: Instance segmentation is typically based on an object detection framework. Semantic segmentation is conducted on the bounding boxes that are returned by detectors. NMS (non-maximum suppression) is a common post-processing operation in instance segmentation and object detection tasks. It is typically used after bounding box regression to eliminate redundant bounding boxes. The evaluation criteria for object detection require that the bounding box be as close as possible to the ground truth, but they do not emphasize the integrity of the included object. However, sometimes the bounding boxes cannot contain the complete objects, and the parts beyond the bounding boxes cannot be correctly predicted in the subsequent semantic segmentation. To solve this problem, we propose the Syncretic-NMS algorithm. The algorithm takes traditional NMS as the first step, processes the bounding boxes obtained by traditional NMS, judges the neighboring bounding boxes of each bounding box, and merges the neighboring boxes that are strongly correlated with the corresponding bounding boxes. The coordinates of the merged box are the four coordinate extremes of the bounding box and its highly relevant neighboring box. Based on an analysis of the influences of the corresponding factors, the criteria for correlation judgment are specified. Experimental results on the MS COCO dataset demonstrate that Syncretic-NMS can steadily increase the accuracy of instance segmentation, while experimental results on the Cityscapes dataset prove that the algorithm can adapt to changes in application scenarios. The computational complexity of Syncretic-NMS is the same as that of traditional NMS. Syncretic-NMS is easy to implement, requires no additional training, and can be easily integrated into available instance segmentation frameworks.
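A minimal sketch of the merge step described above (the correlation criterion is only stubbed out here; the paper specifies its own judgment rules and thresholds):

```python
def merge_boxes(box, neighbor):
    """Combined box = the four coordinate extremes of a kept box and a
    strongly correlated neighboring box; boxes are (x1, y1, x2, y2)."""
    return (min(box[0], neighbor[0]), min(box[1], neighbor[1]),
            max(box[2], neighbor[2]), max(box[3], neighbor[3]))

def syncretic_step(kept, suppressed, is_correlated):
    """Post-NMS pass: each box kept by traditional NMS absorbs the
    suppressed neighboring boxes accepted by the correlation criterion,
    so objects truncated by their box get re-covered before the
    segmentation stage.
    """
    merged = []
    for box in kept:
        for neighbor in suppressed:
            if is_correlated(box, neighbor):
                box = merge_boxes(box, neighbor)
        merged.append(box)
    return merged

# Toy run with a criterion that accepts everything.
print(syncretic_step([(10, 10, 50, 90)], [(12, 80, 48, 120)],
                     lambda b, n: True))  # [(10, 10, 50, 120)]
```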

Journal ArticleDOI
TL;DR: An accumulated attention (A-ATT) mechanism to reason among all the attention modules jointly is proposed to reduce internal redundancies in visual grounding and is evaluated on four popular datasets.
Abstract: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. In real-world VG applications, however, we usually have to deal with ambiguous queries and images with complicated scene structures. Identifying the target based on highly redundant and correlated information can be very challenging, leading to unsatisfactory performance. To tackle this, in this paper, we exploit an attention module for each kind of information to reduce the internal redundancies. We then propose the Accumulated Attention mechanism to reason among all the attention modules jointly, so that the correlations among different kinds of information can be explicitly captured. Moreover, to improve the performance and robustness of our VG models, we introduce some noise into the training procedure to bridge the distribution gap between the human-labeled training data and the real-world poor-quality data. With this "noised" training strategy, we further learn a bounding box regressor, which can be used to refine the bounding box of the target object. We evaluate the proposed methods on four benchmark datasets. The experimental results show that our methods significantly outperform all previous works on every dataset in terms of both speed and accuracy.

Posted Content
TL;DR: The results provide the first nontrivial dimension-dependent lower bound for this problem, and establish an information theoretic limit for several popular sampling algorithms that operate by using stochastic gradients of the log density to generate a sample.
Abstract: We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all these algorithms. We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm would be at least $\varepsilon$ away from the target in total variation distance if the number of gradient queries is less than $\Omega(\sigma^2 d/\varepsilon^2)$, where $\sigma^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency routinely used in the comparison of statistical experiments along with standard information theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge our results provide the first nontrivial dimension-dependent lower bound for this problem.

Journal ArticleDOI
TL;DR: In this article, three different analytical methods for the computation of upper bounds for the rate of convergence to the limiting regime of one specific class of (in)homogeneous continuous-time Markov chains are considered.
Abstract: Consideration is given to three different analytical methods for the computation of upper bounds for the rate of convergence to the limiting regime of one specific class of (in)homogeneous continuous-time Markov chains. This class is particularly well suited to describe the evolution of the total number of customers in (in)homogeneous M/M/S queueing systems with possibly state-dependent arrival and service intensities, batch arrivals and services. One of the methods is based on the logarithmic norm of a linear operator function; the other two rely on Lyapunov functions and differential inequalities, respectively. Less restrictive conditions (compared with those known from the literature) under which the methods are applicable are formulated. Two numerical examples are given. It is also shown that, for homogeneous birth-death Markov processes defined on a finite state space with all transition rates being positive, all methods yield the same sharp upper bound.
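For orientation, the first method relies on the classical logarithmic-norm bound for a linear system $\dot{x} = A(t)\,x$ (standard definitions; the paper supplies the conditions under which they apply to the chains in question):

```latex
\mu(A) \;=\; \lim_{h \to 0^{+}} \frac{\lVert I + hA \rVert - 1}{h},
\qquad
\lVert x(t) \rVert \;\le\;
\exp\!\left( \int_{s}^{t} \mu\bigl(A(\tau)\bigr)\,\mathrm{d}\tau \right)
\lVert x(s) \rVert .
```

A negative logarithmic norm of the transposed intensity matrix (in a suitable weighted norm) thus yields an explicit exponential rate of convergence to the limiting regime.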

Posted Content
TL;DR: This research studies information freshness in an M/G/1 queueing system with a single buffer and a server taking multiple vacations, and derives closed-form expressions for information freshness metrics such as the expected Age of Information (AoI), the expected Peak Age of Information (PAoI), and the variance of peak age under each policy.
Abstract: In this research, we consider age-related metrics for queueing systems with a vacation server. Assuming that there is a single buffer at the queue to receive packets, we consider three variations of this single-buffer system, namely the Conventional Buffer System (CBS), the Buffer Relaxation System (BRS), and the Conventional Buffer System with Preemption in Service (CBS-P). We introduce a decomposition approach to derive closed-form expressions for the expected Age of Information (AoI), the expected Peak Age of Information (PAoI), as well as the variance of peak age for these systems. We then consider these three systems with non-independent vacations, and use a polling system as an example to show that the decomposition approach can be applied to derive closed-form expressions of PAoI for the general situation. We explore the conditions under which one of these systems has an advantage over the others, and we further perform numerical studies to validate our results and develop insights.
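As background for these metrics (standard age-of-information definitions; the paper's closed-form results build on them):

```latex
\Delta(t) = t - u(t), \qquad
\overline{\Delta} = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} \Delta(t)\,\mathrm{d}t,
```

where $u(t)$ is the generation time of the most recently received packet; the peak age is the value of $\Delta(t)$ immediately before each packet delivery, and the PAoI metrics above are its expectation and variance.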

Proceedings Article
01 Jan 2020
TL;DR: This work proposes a method which smoothly bounds user contributions by setting appropriate weights on data points, applies it to estimating the mean/quantiles, linear regression, and empirical risk minimization, and shows that the resulting algorithm provably outperforms the sample-limiting algorithm.
Abstract: A differentially private algorithm guarantees that the input of a single user won’t significantly change the output distribution of the algorithm. When a user contributes more data points, more information can be collected to improve the algorithm’s performance. But at the same time, more noise might need to be added to the algorithm in order to keep the algorithm differentially private and this might hurt the algorithm’s performance. [AKMV19] initiates the study on bounding user contributions and proposes a very natural algorithm which limits the number of samples each user can contribute by a threshold. For a better trade-off between utility and privacy guarantee, we propose a method which smoothly bounds user contributions by setting appropriate weights on data points and apply it to estimating the mean/quantiles, linear regression, and empirical risk minimization. We show that our algorithm provably outperforms the sample limiting algorithm. We conclude with experimental evaluations which validate our theoretical results.

Journal ArticleDOI
Kun Zhao, Yongkun Liu, Siyuan Hao, Shaoxing Lu, Hongbin Liu, Lijian Zhou
TL;DR: A novel approach based on a “bottom-up and top-down” framework that achieves a 12.65% performance improvement on macro-precision and 12% on macro-recall over image-level CNN-based models.
Abstract: Street view image classification aiming at urban land use analysis is difficult because the class labels (e.g., commercial area) are concepts with a higher abstraction level compared to those of general visual tasks (e.g., persons and cars). Therefore, classification models using only visual features often fail to achieve satisfactory performance. In this paper, a novel approach based on a "Detector-Encoder-Classifier" framework is proposed. Instead of directly using visual features of the whole image as common image-level models based on convolutional neural networks (CNNs) do, the proposed framework first obtains the bounding boxes of buildings in street view images from a detector. Their contextual information, such as the co-occurrence patterns of building classes and their layout, is then encoded into metadata by the proposed algorithm "CODING" (Context encOding of Detected buildINGs). Finally, these bounding box metadata are classified by a recurrent neural network (RNN). In addition, we built a dual-labeled dataset named "BEAUTY" (Building dEtection And Urban funcTional-zone portraYing) of 19,070 street view images and 38,857 buildings based on the existing BIC GSV [1]. The dataset can be used not only for street view image classification but also for multi-class building detection. Experiments on "BEAUTY" show that the proposed approach achieves a 12.65% performance improvement on macro-precision and 12% on macro-recall over image-level CNN-based models. Our code and dataset are available at this https URL

Journal ArticleDOI
TL;DR: An analysis indicates that the latent-variable augmentation method based on regularized latent-variable distributions can generate samples that fit the data distribution well, so that the proposed method can improve the performance of CNNs with insufficient samples.
Abstract: Image classification is an important part of pattern recognition. With the development of convolutional neural networks (CNNs), many CNN methods have been proposed; given a large number of training samples, they can achieve high performance. However, only limited samples may be available in some real-world applications. In order to improve the performance of CNN learning with insufficient samples, this article proposes a new method called the classifier method based on a variational autoencoder (CFVAE), which comprises two parts: 1) a standard CNN as a prior classifier and 2) a CNN based on a variational autoencoder (VAE) as a posterior classifier. First, the prior classifier is utilized to generate the prior label and information about the distributions of latent variables; the posterior classifier is then trained to augment some latent variables drawn from the regularized distributions to improve the performance. Second, we present the uniform objective function of CFVAE and put forward an optimization method based on the stochastic gradient variational Bayes method to solve the objective model. Third, we analyze the feasibility of CFVAE based on Hoeffding's inequality and Chernoff's bounding method. This analysis indicates that the latent-variable augmentation method based on regularized latent-variable distributions can generate samples that fit the data distribution well, so that the proposed method can improve the performance of CNNs with insufficient samples. Finally, the experiments show that our proposed CFVAE provides more accurate performance than state-of-the-art methods.
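For context, the Hoeffding bound invoked in the feasibility analysis is the standard statement (not the paper's exact lemma): for i.i.d. random variables $X_1, \dots, X_n$ taking values in $[a, b]$,

```latex
\Pr\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X_1] \right| \ge t \right)
\;\le\; 2\exp\!\left( -\frac{2nt^2}{(b-a)^2} \right).
```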

Patent
05 May 2020
TL;DR: In this paper, a method and apparatus for training a character detector based on weak supervision, a character detection system, and a computer-readable storage medium are provided, where the method includes inputting coarse-grained annotation information of a to-be-processed object, the coarse-grained annotation information including a whole bounding outline of a word, text bar, or line of the object to be processed.
Abstract: A method and apparatus for training a character detector based on weak supervision, a character detection system, and a computer-readable storage medium are provided. The method includes: inputting coarse-grained annotation information of a to-be-processed object, the coarse-grained annotation information including a whole bounding outline of a word, text bar, or line of the to-be-processed object; dividing the whole bounding outline of the coarse-grained annotation information to obtain a coarse bounding box of a character of the to-be-processed object; obtaining a predicted bounding box of the character of the to-be-processed object through a neural network model from the coarse-grained annotation information; and determining a fine bounding box of the character of the to-be-processed object as character-based annotation of the to-be-processed object, according to the coarse bounding box and the predicted bounding box.

Journal ArticleDOI
TL;DR: In this paper, the authors present contextual models that leverage contextual information (16 contextual relationships are applied) to enhance the performance of two state-of-the-art object detectors (i.e., Faster RCNN and YOLO); the models are applied as a post-processing step for most existing detectors, refining the confidences and associated categorical labels without refining the bounding boxes.
Abstract: Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides rich and complex information about digital scenes. It also plays an important role in improving object detection and determining out-of-context objects. In this work, we present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two of the state-of-the-art object detectors (i.e., Faster RCNN and YOLO). The models are applied as a post-processing step for most of the existing detectors, refining the confidences and associated categorical labels without refining the bounding boxes. We experimentally demonstrate that our models lead to enhanced detection performance using the most common dataset in this field (MSCOCO); in some experiments PASCAL2012 is also used. We also show that iterating the process of applying our contextual models enhances the detection performance further.

Journal ArticleDOI
TL;DR: A comparison analysis between the proposed deterministic bounding method and the classical least-squares adjustment has been conducted in terms of accuracy and reliability, and a new concept of Minimum Detectable Biases is proposed.
Abstract: Reliable confidence domains for positioning with the Global Navigation Satellite System (GNSS) and inconsistency measures for the observations are of great importance for any navigation system, especially for safety-critical applications. In this work, deterministic error bounds are introduced in the form of intervals to assess remaining observation errors. The intervals can be determined based on expert knowledge or, as in our case, based on a sensitivity analysis of the measurement correction process. Using convex optimization, bounding zones are computed for GPS positioning, which satisfy the geometrical constraints imposed by the observation intervals. The bounding zone is a convex polytope. When exploiting only the navigation geometry, a confidence domain is computed in the form of a zonotope. We show that the relative volume between the polytope and the zonotope can be considered as an inconsistency measure. A small polytope volume indicates bad consistency of the observations. In extreme cases, empty sets are obtained, which indicates large outliers. We explain how the shape and volume of the polytopes are related to the positioning geometry. Furthermore, we propose a new concept of Minimum Detectable Biases. Using the example of the Klobuchar ionospheric model and the Saastamoinen tropospheric model, we show how observation intervals can be determined via sensitivity analysis of these correction models for a real measurement campaign. Taking GPS code data from simulations and real experiments, a comparison analysis between the proposed deterministic bounding method and the classical least-squares adjustment has been conducted in terms of accuracy and reliability. It shows that the computed polytopes always enclose the reference trajectory. In the case of large outliers, large position deviations persist in the least-squares solution, while the polytope algorithm yields empty sets and thus successfully detects the cases with outliers.
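A small sketch of the geometric idea (an illustrative linearized model; the matrix `A`, observations `b`, and error bounds `e` below are placeholders, not the paper's GPS correction pipeline):

```python
import numpy as np
from scipy.optimize import linprog

def interval_bounding_box(A, b, e):
    """Observation intervals |A @ x - b| <= e define a convex polytope
    of positions consistent with all observations. Linear programs give
    a per-coordinate bounding box of that polytope; an infeasible LP
    means the polytope is empty, i.e. the observations are inconsistent
    (an outlier indicator, as in the abstract).
    """
    A_ub = np.vstack([A, -A])          # A x - b <= e and b - A x <= e
    b_ub = np.concatenate([b + e, e - b])
    n = A.shape[1]
    lo, hi = np.empty(n), np.empty(n)
    for j in range(n):
        c = np.zeros(n)
        c[j] = 1.0
        for sign, out in ((1.0, lo), (-1.0, hi)):  # min, then max of x_j
            res = linprog(sign * c, A_ub=A_ub, b_ub=b_ub,
                          bounds=[(None, None)] * n)
            if not res.success:
                return None  # empty polytope: outlier detected
            out[j] = res.x[j]
    return lo, hi

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))            # toy observation geometry
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + rng.uniform(-0.1, 0.1, size=8)
print(interval_bounding_box(A, b, e=np.full(8, 0.1)))
```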

Journal ArticleDOI
TL;DR: This study investigates a metric that can be used to quantify the view of a 3-D human model, whose value is maximized at the most favorable camera angle in accordance with subjective assessments done by users, and formulates a viewpoint optimization problem whose objective function is the sum of the metrics.
Abstract: Answering the question “what is the most preferable view of a three-dimensional (3-D) human model?” is a challenge in computer vision, computer graphics, and cinematography applications, because the appearance of a human, for a given pose, relies on the viewpoint of the user. Currently, to the best of the authors’ knowledge, solid research on the most preferable viewing angle for obtaining numerical subjective evaluation scores has not been conducted. In this study, we investigate a metric that can be used to quantify the view of a 3-D human model, whose value is maximized at the most favorable camera angle in accordance with subjective assessments done by users. For an objective assessment in numerical form, in this study we define three view selection metrics: 1) the normalized limb length sum; 2) the normalized area of a two-dimensional bounding box; and 3) the normalized visible area of a 3-D bounding box. Finally, we formulate a viewpoint optimization problem whose objective function is the sum of the metrics. However, the objective function is nonconcave, and the solution set of the constraint is nonconvex. To overcome this difficulty, we employ decomposition and penalty methods. From the simulation results, it is verified that the average viewpoint selection error between the ground-truth viewpoint and the optimal viewpoint obtained by the proposed algorithm is very close to the lower bound of the viewpoint selection error.

Posted Content
TL;DR: A generic method is developed for bounding the convergence rate of an averaging algorithm running in a multi-agent system with a time-varying network, where the associated stochastic matrices have a time-independent Perron vector.
Abstract: We develop a generic method for bounding the convergence rate of an averaging algorithm running in a multi-agent system with a time-varying network, where the associated stochastic matrices have a time-independent Perron vector. This method provides bounds on convergence rates that unify and refine most of the previously known bounds. They depend on geometric parameters of the dynamic communication graph, such as the normalized diameter or the bottleneck measure. As corollaries of these geometric bounds, we show that the convergence rate of the Metropolis algorithm in a system of n agents is less than 1 − 1/(4n²) with any communication graph that may vary in time, but is permanently connected and bidirectional. We prove a similar upper bound for the EqualNeighbor algorithm under the additional assumptions that the number of neighbors of each agent is constant and that the communication graph is not too irregular. Moreover, our bounds offer improved convergence rates for several averaging algorithms and specific families of communication graphs. Finally, we extend our methodology to a time-varying Perron vector and show how convergence times may dramatically degrade with even limited variations of Perron vectors.
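A small sketch checking the quoted Metropolis bound numerically (Metropolis weights are standard; the graph below is a toy path graph, and the comparison is illustrative rather than a proof):

```python
import numpy as np

def metropolis_matrix(adj):
    """Metropolis weights for an undirected graph: w_ij = 1 / (1 +
    max(deg_i, deg_j)) on each edge, with the self-weight absorbing the
    remainder. The matrix is symmetric and doubly stochastic, so its
    Perron vector is uniform and time-independent, matching the setting
    of the abstract.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Path graph on n agents: convergence rate = second-largest |eigenvalue|.
n = 6
adj = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
W = metropolis_matrix(adj)
rate = sorted(abs(np.linalg.eigvalsh(W)))[-2]
print(rate, 1.0 - 1.0 / (4 * n**2))  # rate stays below the quoted bound
```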

Journal ArticleDOI
Han Hu1, Libin Wang1, Mier Zhang1, Yulin Ding1, Qing Zhu1 
TL;DR: In this article, the problem of regularized arrangement of primitives on building facades to aligned locations and consistent sizes is cast into binary integer programming, which omits the requirements for real value parameters and is more efficient to be solved.
Abstract: Regularized arrangement of primitives on building facades to aligned locations and consistent sizes is important for structured reconstruction of urban environments. Mixed integer linear programming was previously used to solve the problem; however, it is extremely time-consuming even for state-of-the-art commercial solvers. Aiming to alleviate this issue, we cast the problem into binary integer programming, which omits the requirements for real-valued parameters and is more efficient to solve. Firstly, the bounding boxes of the primitives are detected using the YOLOv3 architecture in real time. Secondly, the coordinates of the upper-left corners and the sizes of the bounding boxes are automatically clustered in a binary integer programming optimization, which jointly considers the geometric fitness, regularity, and additional constraints; this step does not require a priori knowledge, such as the number of clusters or pre-defined grammars. Finally, the regularized bounding boxes can be directly used to guide facade reconstruction in an interactive environment. Experimental evaluations have revealed that the accuracies for the extraction of primitives are above 0.82, which is sufficient for the subsequent 3D reconstruction. The proposed approach takes only about 10% to 20% of the runtime of the previous approach and reduces the diversity of the bounding boxes to about 20% to 50%.

Posted Content
TL;DR: In this paper, the authors demonstrate the versatility of the tangle-tree duality theorem for abstract separation systems by using it to prove tree-of-tangles theorems.
Abstract: We demonstrate the versatility of the tangle-tree duality theorem for abstract separation systems by using it to prove tree-of-tangles theorems. This approach allows us to strengthen some of the existing tree-of-tangles theorems by bounding the node degrees in them. We also present a slight strengthening and simplified proof of the duality theorem, which allows us to derive a tree-of-tangles theorem also for tangles of different orders.