Showing papers on "Bounding overwatch published in 2020"

PDF

Open Access

Journal Article•DOI•

Object Detection in Remote Sensing Images Based on Improved Bounding Box Regression and Multi-Level Features Fusion

[...]

Xiaoliang Qian, Lin Sheng, Gong Cheng, Xiwen Yao, Hangli Ren, Wang Wei - Show less +2 more

01 Jan 2020-Remote Sensing

TL;DR: A novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion and incorporated into the existing hierarchical deep network, which can improve the precision of object localization.

...read moreread less

Abstract: The objective of detection in remote sensing images is to determine the location and category of all targets in these images. The anchor based methods are the most prevalent deep learning based methods, and still have some problems that need to be addressed. First, the existing metric (i.e., intersection over union (IoU)) could not measure the distance between two bounding boxes when they are nonoverlapping. Second, the exsiting bounding box regression loss could not directly optimize the metric in the training process. Third, the existing methods which adopt a hierarchical deep network only choose a single level feature layer for the feature extraction of region proposals, meaning they do not take full use of the advantage of multi-level features. To resolve the above problems, a novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion is proposed in this paper. First, a new metric named generalized IoU is applied, which can quantify the distance between two bounding boxes, regardless of whether they are overlapping or not. Second, a novel bounding box regression loss is proposed, which can not only optimize the new metric (i.e., generalized IoU) directly but also overcome the problem that existing bounding box regression loss based on the new metric cannot adaptively change the gradient based on the metric value. Finally, a multi-level features fusion module is proposed and incorporated into the existing hierarchical deep network, which can make full use of the multi-level features for each region proposal. The quantitative comparisons between the proposed method and baseline method on the large scale dataset DIOR demonstrate that incorporating the proposed bounding box regression loss, multi-level features fusion module, and a combination of both into the baseline method can obtain an absolute gain of 0.7%, 1.4%, and 2.2% or so in terms of mAP, respectively. Comparing this with the state-of-the-art methods demonstrates that the proposed method has achieved a state-of-the-art performance. The curves of average precision with different thresholds show that the advantage of the proposed method is more evident when the threshold of generalized IoU (or IoU) is relatively high, which means that the proposed method can improve the precision of object localization. Similar conclusions can be obtained on a NWPU VHR-10 dataset.

...read moreread less

94 citations

Posted Content•

Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3.

[...]

Petr Hurtik, Vojtech Molek, Jan Hula, Marek Vajgl, Pavel Vlašánek, Tomas Nejezchleba - Show less +2 more

27 May 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new version of Y OLO with better performance and extended with instance segmentation called Poly-YOLO, which has the same precision as YOLOv3, but it is three times smaller and twice as fast, thus suitable for embedded devices.

...read moreread less

Abstract: We present a new version of YOLO with better performance and extended with instance segmentation called Poly-YOLO. Poly-YOLO builds on the original ideas of YOLOv3 and removes two of its weaknesses: a large amount of rewritten labels and inefficient distribution of anchors. Poly-YOLO reduces the issues by aggregating features from a light SE-Darknet-53 backbone with a hypercolumn technique, using stairstep upsampling, and produces a single scale output with high resolution. In comparison with YOLOv3, Poly-YOLO has only 60% of its trainable parameters but improves mAP by a relative 40%. We also present Poly-YOLO lite with fewer parameters and a lower output resolution. It has the same precision as YOLOv3, but it is three times smaller and twice as fast, thus suitable for embedded devices. Finally, Poly-YOLO performs instance segmentation using bounding polygons. The network is trained to detect size-independent polygons defined on a polar grid. Vertices of each polygon are being predicted with their confidence, and therefore Poly-YOLO produces polygons with a varying number of vertices.

...read moreread less

71 citations

Bounding boxes for weakly supervised segmentation: Global constraints get close to full supervision

[...]

Hoel Kervadec, Jose Dolz, Shanshan Wang, Eric Granger, Ismail Ben Ayed - Show less +1 more

25 Jan 2020

TL;DR: A novel weakly supervised learning segmentation based on several global constraints derived from box annotations is proposed, leveraging a classical tightness prior to a deep learning setting via imposing a set of constraints on the network outputs.

...read moreread less

Abstract: We propose a novel weakly supervised learning segmentation based on several global constraints derived from box annotations. Particularly, we leverage a classical tightness prior to a deep learning setting via imposing a set of constraints on the network outputs. Such a powerful topological prior prevents solutions from excessive shrinking by enforcing any horizontal or vertical line within the bounding box to contain, at least, one pixel of the foreground region. Furthermore, we integrate our deep tightness prior with a global background emptiness constraint, guiding training with information outside the bounding box. We demonstrate experimentally that such a global constraint is much more powerful than standard cross-entropy for the background class. Our optimization problem is challenging as it takes the form of a large set of inequality constraints on the outputs of deep networks. We solve it with sequence of unconstrained losses based on a recent powerful extension of the log-barrier method, which is well-known in the context of interior-point methods. This accommodates standard stochastic gradient descent (SGD) for training deep networks, while avoiding computationally expensive and unstable Lagrangian dual steps and projections. Extensive experiments over two different public data sets and applications (prostate and brain lesions) demonstrate that the synergy between our global tightness and emptiness priors yield very competitive performances, approaching full supervision and outperforming significantly DeepCut. Furthermore, our approach removes the need for computationally expensive proposal generation. Our code is shared anonymously.

...read moreread less

56 citations

Journal Article•DOI•

A review on object pose recovery: From 3D bounding box detectors to full 6D pose estimators

[...]

Caner Sahin¹, Guillermo Garcia-Hernando¹, Juil Sock¹, Tae-Kyun Kim¹•Institutions (1)

Imperial College London¹

01 Apr 2020-Image and Vision Computing

TL;DR: In this article, a comprehensive and most recent review of the methods on object pose recovery, from 3D bounding box detectors to full 6D pose estimators, is presented, which mathematically model the problem as a classification, regression, classification & regression, template matching, and point-pair feature matching task.

...read moreread less

52 citations

Posted Content•

Variational Quantum Algorithm for Estimating the Quantum Fisher Information

[...]

Jacob L. Beckey, Marco Cerezo, Akira Sone, Patrick J. Coles

20 Oct 2020-arXiv: Quantum Physics

TL;DR: A variational quantum algorithm called Variational Quantum Fisher Information Estimation (VQFIE) is presented, which estimates lower and upper bounds on the QFI, based on bounding the fidelity, and outputs a range in which the actual QFI lies.

...read moreread less

Abstract: The Quantum Fisher information (QFI) quantifies the ultimate precision of estimating a parameter from a quantum state, and can be regarded as a reliability measure of a quantum system as a quantum sensor. However, estimation of the QFI for a mixed state is in general a computationally demanding task. In this work we present a variational quantum algorithm called Variational Quantum Fisher Information Estimation (VQFIE) to address this task. By estimating lower and upper bounds on the QFI, based on bounding the fidelity, VQFIE outputs a range in which the actual QFI lies. This result can then be used to variationally prepare the state that maximizes the QFI, for the application of quantum sensing. In contrast to previous approaches, VQFIE does not require knowledge of the explicit form of the sensor dynamics. We simulate the algorithm for a magnetometry setup and demonstrate the tightening of our bounds as the state purity increases. For this example, we compare our bounds to literature bounds and show that our bounds are tighter.

...read moreread less

49 citations

Journal Article•DOI•

Disentangling Monocular 3D Object Detection: From Single to Multi-Class Recognition.

[...]

Andrea Simonelli¹, Samuel Rota Bulò², Lorenzo Porzi², Manuel Lopez Antequera², Peter Kontschieder² - Show less +1 more•Institutions (2)

University of Trento¹, Facebook²

18 Sep 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes, demonstrating its ability to generalize for different types of objects.

...read moreread less

Abstract: In this paper we introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes. The proposed disentangling transformation isolates the contribution made by different groups of parameters to a given loss, without changing its nature. This brings two advantages: i) it simplifies the training dynamics in the presence of losses with complex interactions of parameters, and ii) it allows us to avoid the issue of balancing independent regression terms. We further apply this disentangling transformation to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. We also critically review the AP metric used in KITTI3D and resolve a flaw which affected and biased all previously published results on monocular 3D detection. Our improved metric is now used as official KITTI3D metric. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results. We provide additional results on all the classes of KITTI3D as well as nuScenes datasets to further validate the robustness of our method, demonstrating its ability to generalize for different types of objects.

...read moreread less

26 citations

Journal Article•DOI•

Bounding sets of sequential quantum correlations and device-independent randomness certification

[...]

Joseph Bowles¹, Flavio Baccari², Flavio Baccari¹, Alexia Salavrakos¹•Institutions (2)

ICFO – The Institute of Photonic Sciences¹, Max Planck Society²

19 Oct 2020

TL;DR: This work shows how one can robustly certify over 2.3 bits of device-independent local randomness from a two-quibt state using a sequence of measurements, going beyond the theoretical maximum of two bits that can be achieved with non-sequential measurements.

...read moreread less

Abstract: An important problem in quantum information theory is that of bounding sets of correlations that arise from making local measurements on entangled states of arbitrary dimension. Currently, the best-known method to tackle this problem is the NPA hierarchy; an infinite sequence of semidefinite programs that provides increasingly tighter outer approximations to the desired set of correlations. In this work we consider a more general scenario in which one performs sequences of local measurements on an entangled state of arbitrary dimension. We show that a simple adaptation of the original NPA hierarchy provides an analogous hierarchy for this scenario, with comparable resource requirements and convergence properties. We then use the method to tackle some problems in device-independent quantum information. First, we show how one can robustly certify over 2.3 bits of device-independent local randomness from a two-quibt state using a sequence of measurements, going beyond the theoretical maximum of two bits that can be achieved with non-sequential measurements. Finally, we show tight upper bounds to two previously defined tasks in sequential Bell test scenarios.

...read moreread less

26 citations

Journal Article•DOI•

Single-Stage Rotation-Decoupled Detector for Oriented Object

[...]

Bo Zhong, Kai Ao

08 Oct 2020-Remote Sensing

TL;DR: A novel rotation detector is proposed which redesigns the matching strategy between oriented anchors and ground truth boxes, thereby reducing the instability of the angle to the matching process and achieves state-of-the-art detection accuracy with higher efficiency.

...read moreread less

Abstract: Oriented object detection has received extensive attention in recent years, especially for the task of detecting targets in aerial imagery. Traditional detectors locate objects by horizontal bounding boxes (HBBs), which may cause inaccuracies when detecting objects with arbitrary oriented angles, dense distribution and a large aspect ratio. Oriented bounding boxes (OBBs), which add different rotation angles to the horizontal bounding boxes, can better deal with the above problems. New problems arise with the introduction of oriented bounding boxes for rotation detectors, such as an increase in the number of anchors and the sensitivity of the intersection over union (IoU) to changes of angle. To overcome these shortcomings while taking advantage of the oriented bounding boxes, we propose a novel rotation detector which redesigns the matching strategy between oriented anchors and ground truth boxes. The main idea of the new strategy is to decouple the rotating bounding box into a horizontal bounding box during matching, thereby reducing the instability of the angle to the matching process. Extensive experiments on public remote sensing datasets including DOTA, HRSC2016 and UCAS-AOD demonstrate that the proposed approach achieves state-of-the-art detection accuracy with higher efficiency.

...read moreread less

26 citations

Journal Article•DOI•

FisheyeDet: A Self-Study and Contour-Based Object Detector in Fisheye Images

[...]

Li Tangwei¹, Guanjun Tong¹, Hongying Tang¹, Baoqing Li¹, Chen Bo¹ - Show less +1 more•Institutions (1)

Chinese Academy of Sciences¹

14 Apr 2020-IEEE Access

TL;DR: The No-prior Fisheye Representation Method and the Distortion Shape Matching strategy is proposed, which invokes the irregular quadrilateral bounding boxes based on the contour of distorted objects as the core of the proposed object detector.

...read moreread less

Abstract: Fisheye Images have attracted increasing attention from the research community due to their large field of view (LFOV). However, the geometric transformations inherent in fisheye cameras result in unknown spatial distortion and large variations in the appearance of objects. And this fact leads to poor performance of the state-of-the-art methods in conventional two-dimensional (2D) images. To address this problem, we propose a self-study and contour-based object detector in fisheye images, named FisheyeDet. The No-prior Fisheye Representation Method is proposed to guarantee that the network adaptively extracts distortion features without prior information such as prespecified lens parameters, special calibration patterns, etc. Furthermore, in order to tightly and robustly localize objects in fisheye images, the Distortion Shape Matching strategy is proposed, which invokes the irregular quadrilateral bounding boxes based on the contour of distorted objects as the core. By combining with the “No-prior Fisheye Representation Method” and “Distortion Shape Matching”, our proposed detector builds an end-to-end network. Finally, due to the lack of public fisheye datasets, we are on the first attempt to create a multi-class fisheye dataset VOC-Fisheye for object detection. Our proposed detector shows favorable generalization ability and achieves 74.87% mAP (mean average precision) on the VOC-Fisheye, outperforming the existing state-of-the-art methods.

...read moreread less

21 citations

Posted Content•

Accurate Bounding-box Regression with Distance-IoU Loss for Visual Tracking.

[...]

Di Yuan, Xiaojun Chang, Zhenyu He¹•Institutions (1)

Harbin Institute of Technology¹

03 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a novel tracking method based on a distance-IoU (DIoU) loss, such that the proposed tracker consists of a target estimation component and a target classification component that is trained online and optimized with a Conjugate-Gradient-based strategy to guarantee real-time tracking speed.

...read moreread less

Abstract: Most existing trackers are based on using a classifier and multi-scale estimation to estimate the target state Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated While trackers adopt a maximum overlap method based on an intersection-over-union (IoU) loss to mitigate this problem, there are defects in the IoU loss itself, that make it impossible to continue to optimize the objective function when a given bounding box is completely contained within/without another bounding box; this makes it very challenging to accurately estimate the target state Accordingly, in this paper, we address the above-mentioned problem by proposing a novel tracking method based on a distance-IoU (DIoU) loss, such that the proposed tracker consists of target estimation and target classification The target estimation part is trained to predict the DIoU score between the target ground-truth bounding-box and the estimated bounding-box The DIoU loss can maintain the advantage provided by the IoU loss while minimizing the distance between the center points of two bounding boxes, thereby making the target estimation more accurate Moreover, we introduce a classification part that is trained online and optimized with a Conjugate-Gradient-based strategy to guarantee real-time tracking speed Comprehensive experimental results demonstrate that the proposed method achieves competitive tracking accuracy when compared to state-of-the-art trackers while with a real-time tracking speed

...read moreread less

21 citations

Proceedings Article•

Beta R-CNN: Looking into Pedestrian Detection from Another Perspective

[...]

Zixuan Xu¹, Banghuai Li¹, Ye Yuan², Anhong Dang•Institutions (2)

Peking University¹, Beihang University²

01 Jan 2020

TL;DR: A novel representation based on 2D beta distribution, named Beta Representation, is proposed, which is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS.

...read moreread less

Abstract: Recently significant progress has been made in pedestrian detection, but it remains challenging to achieve high performance in occluded and crowded scenes. It could be attributed mostly to the widely used representation of pedestrians, i.e., 2D axis-aligned bounding box, which just describes the approximate location and size of the object. Bounding box models the object as a uniform distribution within the boundary, making pedestrians indistinguishable in occluded and crowded scenes due to much noise. To eliminate the problem, we propose a novel representation based on 2D beta distribution, named Beta Representation. It pictures a pedestrian by explicitly constructing the relationship between full-body and visible boxes, and emphasizes the center of visual mass by assigning different probability values to pixels. As a result, Beta Representation is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS. What’s more, to fully exploit Beta Representation, a novel pipeline Beta R-CNN equipped with BetaHead and BetaMask is proposed, leading to high detection performance in occluded and crowded scenes. Code will be released at github.com/Guardian44x/Beta-R-CNN.

...read moreread less

Posted Content•

Localization Uncertainty Estimation for Anchor-Free Object Detection.

[...]

Youngwan Lee, Joong-won Hwang, Hyung-Il Kim, Kimin Yun, Jongyoul Park - Show less +1 more

28 Jun 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new object detector called Gaussian-FCOS is proposed that estimates the localization uncertainty based on an anchor-free detector that captures the uncertainty of similar property with four directions of box offsets and avoids the anchor tuning.

...read moreread less

Abstract: Since many safety-critical systems, such as surgical robots and autonomous driving cars, are in unstable environments with sensor noise and incomplete data, it is desirable for object detectors to take into account the confidence of localization prediction There are three limitations of the prior uncertainty estimation methods for anchor-based object detection 1) They model the uncertainty based on object properties having different characteristics, such as location (center point) and scale (width, height) 2) they model a box offset and ground-truth as Gaussian distribution and Dirac delta distribution, which leads to the model misspecification problem Because the Dirac delta distribution is not exactly represented as Gaussian, ie, for any $\mu$ and $\Sigma$ 3) Since anchor-based methods are sensitive to hyper-parameters of anchor, the localization uncertainty modeling is also sensitive to these parameters Therefore, we propose a new localization uncertainty estimation method called Gaussian-FCOS for anchor-free object detection Our method captures the uncertainty based on four directions of box offsets~(left, right, top, bottom) that have similar properties, which enables to capture which direction is uncertain and provide a quantitative value in range~[0, 1] To this end, we design a new uncertainty loss, negative power log-likelihood loss, to measure uncertainty by weighting IoU to the likelihood loss, which alleviates the model misspecification problem Experiments on COCO datasets demonstrate that our Gaussian-FCOS reduces false positives and finds more missing-objects by mitigating over-confidence scores with the estimated uncertainty We hope Gaussian-FCOS serves as a crucial component for the reliability-required task

...read moreread less

Journal Article•DOI•

Detection of 3D Bounding Boxes of Vehicles Using Perspective Transformation for Accurate Speed Measurement

[...]

Viktor Kocur¹, Milan Ftáčnik¹•Institutions (1)

Comenius University in Bratislava¹

29 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: An improved version of the perspective transformation which is more robust and fully automatic and an extended experimental evaluation of speed estimation is presented.

...read moreread less

Abstract: Detection and tracking of vehicles captured by traffic surveillance cameras is a key component of intelligent transportation systems. We present an improved version of our algorithm for detection of 3D bounding boxes of vehicles, their tracking and subsequent speed estimation. Our algorithm utilizes the known geometry of vanishing points in the surveilled scene to construct a perspective transformation. The transformation enables an intuitive simplification of the problem of detecting 3D bounding boxes to detection of 2D bounding boxes with one additional parameter using a standard 2D object detector. Main contribution of this paper is an improved construction of the perspective transformation which is more robust and fully automatic and an extended experimental evaluation of speed estimation. We test our algorithm on the speed estimation task of the BrnoCompSpeed dataset. We evaluate our approach with different configurations to gauge the relationship between accuracy and computational costs and benefits of 3D bounding box detection over 2D detection. All of the tested configurations run in real-time and are fully automatic. Compared to other published state-of-the-art fully automatic results our algorithm reduces the mean absolute speed measurement error by 32% (1.10 km/h to 0.75 km/h) and the absolute median error by 40% (0.97 km/h to 0.58 km/h).

...read moreread less

Proceedings Article•

Corruption-Tolerant Gaussian Process Bandit Optimization

[...]

Ilija Bogunovic¹, Andreas Krause¹, Jonathan Scarlett²•Institutions (2)

ETH Zurich¹, National University of Singapore²

03 Jun 2020

TL;DR: It is observed that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not.

...read moreread less

Abstract: We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback We consider a novel variant of this problem in which the point evaluations are not only corrupted by random noise, but also adversarial corruptions We introduce an algorithm Fast-Slow GP-UCB based on Gaussian process methods, randomized selection between two instances labeled "fast" (but non-robust) and "slow" (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved We observe that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not

...read moreread less

Posted Content•

An energy-based error bound of physics-informed neural network solutions in elasticity

[...]

Mengwu Guo¹, Ehsan Haghighat²•Institutions (2)

University of Texas at Austin¹, Massachusetts Institute of Technology²

18 Oct 2020-arXiv: Numerical Analysis

TL;DR: An energy-based a posteriori error bound is proposed for the physics-informed neural network solutions of elasticity problems that provides an upper bound of the global error of neural network discretization.

...read moreread less

Abstract: An energy-based a posteriori error bound is proposed for the physics-informed neural network solutions of elasticity problems. An admissible displacement-stress solution pair is obtained from a mixed form of physics-informed neural networks, and the proposed error bound is formulated as the constitutive relation error defined by the solution pair. Such an error estimator provides an upper bound of the global error of neural network discretization. The bounding property, as well as the asymptotic behavior of the physics-informed neural network solutions, are studied in a demonstrating example.

...read moreread less

Journal Article•DOI•

Syncretic-NMS: A Merging Non-Maximum Suppression Algorithm for Instance Segmentation

[...]

Jun Chu¹, Yiqing Zhang¹, Shaoming Li¹, Lu Leng¹, Jun Miao¹ - Show less +1 more•Institutions (1)

Nanchang Hangkong University¹

22 Jun 2020-IEEE Access

TL;DR: Experimental results on the MS COCO dataset demonstrate that Syncretic-NMS can steadily increase the accuracy of instance segmentation, while experimentalresults on the Cityscapes dataset prove that the algorithm can adapt to application scenario changes.

...read moreread less

Abstract: Instance segmentation is typically based on an object detection framework Semantic segmentation is conducted on the bounding boxes that are returned by detectors NMS (non-maximum suppression) is a common post-processing operation in instance segmentation and object detection tasks It is typically used after bounding box regression to eliminate redundant bounding boxes The evaluation criteria for object detection require that the bounding box be as close as possible to the ground truth, but they do not emphasize the integrity of the included object However, sometimes the bounding boxes cannot contain the complete objects, and the parts beyond the bounding boxes cannot be correctly predicted in the subsequent semantic segmentation To solve this problem, we propose the Syncretic-NMS algorithm The algorithm takes traditional NMS as the first step and processes the bounding boxes obtained by traditional NMS, judges the neighboring bounding boxes of each bounding box, and combines the neighboring boxes that are strongly correlated with the corresponding bounding boxes The coordinates of the merged box are the four coordinate extremes of the bounding box and the highly relevant neighboring box The neighboring box with strong correlation is merged with the corresponding bounding box Based on an analysis of the influences of corresponding factors, the criteria for correlation judgment are specified Experimental results on the MS COCO dataset demonstrate that Syncretic-NMS can steadily increase the accuracy of instance segmentation, while experimental results on the Cityscapes dataset prove that the algorithm can adapt to application scenario changes The computational complexity of Syncretic-NMS is the same as that of traditional NMS Syncretic-NMS is easy to implement, requires no additional training, and can be easily integrated into the available instance segmentation framework

...read moreread less

Journal Article•DOI•

Visual Grounding via Accumulated Attention.

[...]

Chaorui Deng¹, Qi Wu², Qingyao Wu¹, Fan Lyu³, Fuyuan Hu⁴, Mingkui Tan¹ - Show less +2 more•Institutions (4)

South China University of Technology¹, University of Adelaide², Tianjin University³, Suzhou University of Science and Technology⁴

21 Sep 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An accumulated attention (A-ATT) mechanism to reason among all the attention modules jointly is proposed to reduce internal redundancies in visual grounding and is evaluated on four popular datasets.

...read moreread less

Abstract: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. In real-world VG applications, however, we usually have to deal with ambiguous queries and images with complicated scene structures. Identifying the target based on highly redundant and correlated information can be very challenging, leading to unsatisfactory performance. To tackle this, in this paper, we exploit an attention module for each kind of information to reduce the internal redundancies. We then propose the Accumulated Attention mechanism to reason among all the attention modules jointly, thus the correlations among different kinds of information can be explicitly captured. Moreover, to improve the performance and robustness of our VG models, we introduce some noises into the training procedure to bridge the distribution gap between the human-labeled training data and the real-world poor quality data. With this ``noised'' training strategy, we further learn a bounding box regressor, which can be used to refine the bounding box of the target object. We evaluate the proposed methods on four benchmark datasets. The experimental results show that our methods significantly outperform all previous works on every dataset in terms of both speed and accuracy.

...read moreread less

Posted Content•

Oracle lower bounds for stochastic gradient sampling algorithms.

[...]

Niladri S. Chatterji¹, Peter L. Bartlett¹, Philip M. Long²•Institutions (2)

University of California, Berkeley¹, Google²

01 Feb 2020-arXiv: Machine Learning

TL;DR: The results provide the first nontrivial dimension-dependent lower bound for this problem, and establish an information theoretic limit for several popular sampling algorithms that operate by using stochastic gradients of the log density to generate a sample.

...read moreread less

Abstract: We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all these algorithms. We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm would be at least $\varepsilon$ away from the target in total variation distance if the number of gradient queries is less than $\Omega(\sigma^2 d/\varepsilon^2)$, where $\sigma^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency routinely used in the comparison of statistical experiments along with standard information theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge our results provide the first nontrivial dimension-dependent lower bound for this problem.

...read moreread less

Journal Article•DOI•

On three methods for bounding the rate of convergence for some continuous–time Markov chains

[...]

Alexander Zeifman, Yacov Satin, Anastasia Kryukova, Rostislav Razumchik, Ksenia Kiseleva, Galina Shilova - Show less +2 more

01 Jun 2020-International Journal of Applied Mathematics and Computer Science

TL;DR: In this article, three different analytical methods for the computation of upper bounds for the rate of convergence to the limiting regime of one specific class of (in)homogeneous continuous-time Markov chains are considered.

...read moreread less

Abstract: Abstract Consideration is given to three different analytical methods for the computation of upper bounds for the rate of convergence to the limiting regime of one specific class of (in)homogeneous continuous-time Markov chains. This class is particularly well suited to describe evolutions of the total number of customers in (in)homogeneous M/M/S queueing systems with possibly state-dependent arrival and service intensities, batch arrivals and services. One of the methods is based on the logarithmic norm of a linear operator function; the other two rely on Lyapunov functions and differential inequalities, respectively. Less restrictive conditions (compared with those known from the literature) under which the methods are applicable are being formulated. Two numerical examples are given. It is also shown that, for homogeneous birth-death Markov processes defined on a finite state space with all transition rates being positive, all methods yield the same sharp upper bound.

...read moreread less

Posted Content•

Age of Information for Single Buffer Systems with Vacation Server.

[...]

Jin Xu, I-Hong Hou, Natarajan Gautam¹•Institutions (1)

Texas A&M University¹

24 Apr 2020-arXiv: Performance

TL;DR: This research studies the information freshness in M/G/1 queueing system with a single buffer and the server taking multiple vacations and derives closed-form expressions of informationfreshness metrics such as the expected Age of Information (AoI), the expected Peak Age of information (PAoI) and the variance of peak age under each policy.

...read moreread less

Abstract: In this research, we consider age-related metrics for queueing systems with vacation server. Assuming that there is a single buffer at the queue to receive packets, we consider three variations of this single buffer system, namely Conventional Buffer System (CBS), Buffer Relaxation System (BRS), and Conventional Buffer System with Preemption in Service (CBS-P). We introduce a decomposition approach to derive the closed-form expressions for expected Age of Information (AoI), expected Peak Age of Information (PAoI) as well as the variance of peak age for these systems. We then consider these three systems with non-independent vacations, and use polling system as an example to show that the decomposition approach can be applied to derive closed-form expressions of PAoI for general situation. We explore the conditions under which one of these systems has advantage over the others, and we further perform numerical studies to validate our results and develop insights.

...read moreread less

Proceedings Article•

Smoothly Bounding User Contributions in Differential Privacy

[...]

Alessandro Epasto¹, Mohammad Mahdian¹, Jieming Mao², Vahab Mirrokni¹, Lijie Ren - Show less +1 more•Institutions (2)

Google¹, University of Pennsylvania²

01 Jan 2020

TL;DR: This work proposes a method which smoothly bounds user contributions by setting appropriate weights on data points and applies it to estimating the mean/quantiles, linear regression, and empirical risk minimization and shows that the algorithm provably outperforms the sample limiting algorithm.

...read moreread less

Abstract: A differentially private algorithm guarantees that the input of a single user won’t significantly change the output distribution of the algorithm. When a user contributes more data points, more information can be collected to improve the algorithm’s performance. But at the same time, more noise might need to be added to the algorithm in order to keep the algorithm differentially private and this might hurt the algorithm’s performance. [AKMV19] initiates the study on bounding user contributions and proposes a very natural algorithm which limits the number of samples each user can contribute by a threshold. For a better trade-off between utility and privacy guarantee, we propose a method which smoothly bounds user contributions by setting appropriate weights on data points and apply it to estimating the mean/quantiles, linear regression, and empirical risk minimization. We show that our algorithm provably outperforms the sample limiting algorithm. We conclude with experimental evaluations which validate our theoretical results.

...read moreread less

Journal Article•DOI•

Bounding Boxes Are All We Need: Street View Image Classification via Context Encoding of Detected Buildings

[...]

Kun Zhao¹, Yongkun Liu¹, Siyuan Hao¹, Shaoxing Lu¹, Hongbin Liu, Lijian Zhou¹ - Show less +2 more•Institutions (1)

Qingdao University¹

03 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A novel approach based on a “bottom-up and top-down” framework that achieves a 12.65% performance improvement on macroprecision and 12% on macrorecall over image-level CNN-based models.

...read moreread less

Abstract: Street view images classification aiming at urban land use analysis is difficult because the class labels (e.g., commercial area), are concepts with higher abstract level compared to the ones of general visual tasks (e.g., persons and cars). Therefore, classification models using only visual features often fail to achieve satisfactory performance. In this paper, a novel approach based on a "Detector-Encoder-Classifier" framework is proposed. Instead of using visual features of the whole image directly as common image-level models based on convolutional neural networks (CNNs) do, the proposed framework firstly obtains the bounding boxes of buildings in street view images from a detector. Their contextual information such as the co-occurrence patterns of building classes and their layout are then encoded into metadata by the proposed algorithm "CODING" (Context encOding of Detected buildINGs). Finally, these bounding box metadata are classified by a recurrent neural network (RNN). In addition, we made a dual-labeled dataset named "BEAUTY" (Building dEtection And Urban funcTional-zone portraYing) of 19,070 street view images and 38,857 buildings based on the existing BIC GSV [1]. The dataset can be used not only for street view image classification, but also for multi-class building detection. Experiments on "BEAUTY" show that the proposed approach achieves a 12.65% performance improvement on macro-precision and 12% on macro-recall over image-level CNN based models. Our code and dataset are available at this https URL

...read moreread less

Journal Article•DOI•

An Efficient Image Categorization Method With Insufficient Training Samples.

[...]

Luyue Lin¹, Bo Liu¹, Xin Zheng¹, Yanshan Xiao¹•Institutions (1)

Guangdong University of Technology¹

11 Aug 2020-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: An analysis indicates that the latent variables augmentation method based on regularized latent variables distributions can generate samples fitting well with the distribution of data such that the proposed method can improve the performance of CNN with insufficient samples.

...read moreread less

Abstract: Image classification is an important part of pattern recognition. With the development of convolutional neural networks (CNNs), many CNN methods are proposed, which have a large number of samples for training, which can have high performance. However, there may exist limited samples in some real-world applications. In order to improve the performance of CNN learning with insufficient samples, this article proposes a new method called the classifier method based on a variational autoencoder (CFVAE), which is comprised of two parts: 1) a standard CNN as a prior classifier and 2) a CNN based on variational autoencoder (VAE) as a posterior classifier. First, the prior classifier is utilized to generate the prior label and information about distributions of latent variables; and the posterior classifier is trained to augment some latent variables from regularized distributions to improve the performance. Second, we also present the uniform objective function of CFVAE and put forward an optimization method based on the stochastic gradient variational Bayes method to solve the objective model. Third, we analyze the feasibility of CFVAE based on Hoeffding's inequality and Chernoff's bounding method. This analysis indicates that the latent variables augmentation method based on regularized latent variables distributions can generate samples fitting well with the distribution of data such that the proposed method can improve the performance of CNN with insufficient samples. Finally, the experiments manifest that our proposed CFVAE can provide more accurate performance than state-of-the-art methods.

...read moreread less

Patent•

Method and apparatus for training a character detector based on weak supervision, system and medium

[...]

Chengquan Zhang¹, Jiaming Liu¹, Han Junyu, Errui Ding•Institutions (1)

Baidu¹

05 May 2020

TL;DR: In this paper, a method and apparatus for training a character detector based on weak supervision, a character detection system and a computer readable storage medium are provided, where the method includes: inputting coarse-grained annotation information of a to-be-processed object, wherein the coarse-general annotation information including a whole bounding outline of a word, text bar or line of the object to be processed.

...read moreread less

Abstract: A method and apparatus for training a character detector based on weak supervision, a character detection system and a computer readable storage medium are provided, wherein the method includes: inputting coarse-grained annotation information of a to-be-processed object, wherein the coarse-grained annotation information including a whole bounding outline of a word, text bar or line of the to-be-processed objected; dividing the whole bounding outline of the coarse-grained annotation information, to obtain a coarse bounding box of a character of the to-be-processed object; obtaining a predicted bounding box of the character of the to-be-processed object through a neural network model from the coarse-grained annotation information; and determining a fine bounding box of the character of the to-be-processed object as character-based annotation of the to-be-processed object, according to the coarse bounding box and the predicted bounding box.

...read moreread less

Journal Article•DOI•

Improving object detection performance using scene contextual constraints

[...]

Faisal Alamri¹, Nicolas Pugeault²•Institutions (2)

University of Exeter¹, The Turing Institute²

09 Jul 2020-IEEE Transactions on Cognitive and Developmental Systems

TL;DR: In this paper, the authors present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two of the state-of-the-art object detectors (i.e., Faster RCNN and YOLO), which are applied as a post-processing process for most of the existing detectors, especially for refining the confidences and associated categorical labels, without refining bounding boxes.

...read moreread less

Abstract: Contextual information, such as the co-occurrence of objects and the spatial and relative size among objects, provides rich and complex information about digital scenes. It also plays an important role in improving object detection and determining out-of-context objects. In this work, we present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two of the state-of-the-art object detectors (i.e., Faster RCNN and YOLO), which are applied as a post-processing process for most of the existing detectors, especially for refining the confidences and associated categorical labels, without refining bounding boxes. We experimentally demonstrate that our models lead to enhancement in detection performance using the most common dataset used in this field (MSCOCO), where in some experiments PASCAL2012 is also used.We also show that iterating the process of applying our contextual models also enhances the detection performance further.

...read moreread less

Journal Article•DOI•

Reliable Bounding Zones and Inconsistency Measures for GPS Positioning using Geometrical Constraints

[...]

Hani Dbouk¹, Steffen Schön¹•Institutions (1)

Leibniz University of Hanover¹

19 Mar 2020-Acta Cybernetica

TL;DR: A comparison analysis between the proposed deterministic bounding method and the classical least-squares adjustment has been conducted in terms of accuracy and reliability, and a new concept of Minimum Detectable Biases is proposed.

...read moreread less

Abstract: Reliable confidence domains for positioning with Global Navigation Satellite System (GNSS) and inconsistency measures for the observations are of great importance for any navigation system, especially for safety critical applications. In this work, deterministic error bounds are introduced in form of intervals to assess remaining observation errors. The intervals can be determined based on expert knowledge or - as in our case - based on a sensitivity analysis of the measurement correction process. Using convex optimization, bounding zones are computed for GPS positioning, which satisfy the geometrical constraints imposed by the observation intervals. The bounding zone is a convex polytope. When exploiting only the navigation geometry, a confidence domain is computed in form of a zonotope. We show that the relative volume between the polytope and the zonotope can be considered as an inconsistency measure. A small polytope volume indicates bad consistency of the observations. In extreme cases, empty sets are obtained which indicates large outliers. We explain how shape and volume of the polytopes are related to the positioning geometry. Furthermore, we propose a new concept of Minimum Detectable Biases. Using the example of the Klobuchar ionospheric model and Saastamoinen tropospheric model, we show how observation intervals can be determined via sensitivity analysis of these correction models for a real measurement campaign. Taking GPS code data from simulations and real experiments, a comparison analysis between the proposed deterministic bounding method and the classical least-squares adjustment has been conducted in terms of accuracy and reliability. It shows that the computed polytopes always enclose the reference trajectory. In case of large outliers, large position deviations persist in the least-squares solution while the polytope algorithm yields empty sets and thus successfully detects the cases with outliers.

...read moreread less

Journal Article•DOI•

Optimal Camera Point Selection Toward the Most Preferable View of 3-D Human Pose

[...]

Beom Kwon, Jungwoo Huh, Kyoungoh Lee, Sanghoon Lee

01 Jan 2020-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This study investigates a metric that can be used to quantify the view of a 3-D human model, whose value is maximized at the most favorable camera angle in accordance with subjective assessments done by users and forms a viewpoint optimization problem whose objective function is the sum of the metrics.

...read moreread less

Abstract: Answering the question “what is the most preferable view of a three-dimensional (3-D) human model?” is a challenge in computer vision, computer graphics, and cinematography applications because the appearance of a human, for a given pose, relies on the viewpoint of the user. Currently, to the best of the authors’ knowledge, solid research on the most preferable viewing angle for obtaining numerical subjective evaluation scores has not been conducted. In this study, we investigate a metric that can be used to quantify the view of a 3-D human model, whose value is maximized at the most favorable camera angle in accordance with subjective assessments done by users. For an objective assessment in a numerical form, in this study, we define three view selection metrics: the 1) normalized limb length sum; 2) normalized area of a two-dimensional bounding box; and 3) normalized visible area of a 3-D bounding box. Finally, we formulate a viewpoint optimization problem whose objective function is the sum of the metrics. However, the objective function is nonconcave, and the solution set of the constraint is nonconvex. To overcome this difficulty, we employ decomposition and penalty methods. From the simulation results, it is verified that the average of the viewpoint selection error between the ground truth viewpoint and the optimal viewpoint obtained by the proposed algorithm is very close to the lower bound of the viewpoint selection error.

...read moreread less

Posted Content•

Geometric Bounds for Convergence Rates of Averaging Algorithms

[...]

Bernadette Charron-Bost¹•Institutions (1)

École Polytechnique¹

07 Dec 2020-arXiv: Multiagent Systems

TL;DR: A generic method for bounding the convergence rate of an averaging algorithm running in a multi-agent system with a time-varying network, where the associated stochastic matrices have a time -independent Perron vector is developed.

...read moreread less

Abstract: We develop a generic method for bounding the convergence rate of an averaging algorithm running in a multi-agent system with a time-varying network, where the associated stochastic matrices have a time-independent Perron vector. This method provides bounds on convergence rates that unify and refine most of the previously known bounds. They depend on geometric parameters of the dynamic communication graph such as the normalized diameter or the bottleneck measure. As corollaries of these geometric bounds, we show that the convergence rate of the Metropolis algorithm in a system of n agents is less than 1 − 1/4n 2 with any communication graph that may vary in time, but is permanently connected and bidirectional. We prove a similar upper bound for the EqualNeighbor algorithm under the additional assumptions that the number of neighbors of each agent is constant and that the communication graph is not too irregular. Moreover our bounds offer improved convergence rates for several averaging algorithms and specific families of communication graphs. Finally we extend our methodology to a time-varying Perron vector and show how convergence times may dramatically degrade with even limited variations of Perron vectors.

...read moreread less

Journal Article•DOI•

Fast and regularized reconstruction of building façades from street-view images using binary integer programming

[...]

Han Hu¹, Libin Wang¹, Mier Zhang¹, Yulin Ding¹, Qing Zhu¹ - Show less +1 more•Institutions (1)

Southwest Jiaotong University¹

03 Aug 2020-ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

TL;DR: In this article, the problem of regularized arrangement of primitives on building facades to aligned locations and consistent sizes is cast into binary integer programming, which omits the requirements for real value parameters and is more efficient to be solved.

...read moreread less

Abstract: . Regularized arrangement of primitives on building facades to aligned locations and consistent sizes is important towards structured reconstruction of urban environment. Mixed integer linear programing was used to solve the problem, however, it is extremely time consuming even for state-of-the-art commercial solvers. Aiming to alleviate this issue, we cast the problem into binary integer programming, which omits the requirements for real value parameters and is more efficient to be solved. Firstly, the bounding boxes of the primitives are detected using the YOLOv3 architecture in real-time. Secondly, the coordinates of the upper left corners and the sizes of the bounding boxes are automatically clustered in a binary integer programming optimization, which jointly considers the geometric fitness, regularity and additional constraints; this step does not require a priori knowledge, such as the number of clusters or pre-defined grammars. Finally, the regularized bounding boxes can be directly used to guide the facade reconstruction in an interactive environment. Experimental evaluations have revealed that the accuracies for the extraction of primitives are above 0.82, which is sufficient for the following 3D reconstruction. The proposed approach only takes about 10% to 20% of the runtime than previous approach and reduces the diversity of the bounding boxes to about 20% to 50%.

...read moreread less

Posted Content•

Obtaining trees of tangles from tangle-tree duality

[...]

Christian Elbracht, Jakob Kneip, Maximilian Teegen

19 Nov 2020-arXiv: Combinatorics

TL;DR: In this paper, the authors demonstrate the versatility of the tangle-tree duality theorem for abstract separation systems by using it to prove tree-of-tangles theorems.

...read moreread less

Abstract: We demonstrate the versatility of the tangle-tree duality theorem for abstract separation systems by using it to prove tree-of-tangles theorems. This approach allows us to strengthen some of the existing tree-of-tangles theorems by bounding the node degrees in them. We also present a slight strengthening and simplified proof of the duality theorem, which allows us to derive a tree-of-tangles theorem also for tangles of different orders.

...read moreread less