
Showing papers on "Distance transform" published in 2019


Journal ArticleDOI
TL;DR: A new method to automatically segment nuclei from Haematoxylin and Eosin stained histopathology data with fully convolutional networks is described and superior performance is demonstrated as compared to other approaches using Convolutional Neural Networks.
Abstract: The advent of digital pathology provides us with the challenging opportunity to automatically analyze whole slides of diseased tissue in order to derive quantitative profiles that can be used for diagnosis and prognosis tasks. In particular, for the development of interpretable models, the detection and segmentation of cell nuclei is of the utmost importance. In this paper, we describe a new method to automatically segment nuclei from Haematoxylin and Eosin (H&E) stained histopathology data with fully convolutional networks. In particular, we address the problem of segmenting touching nuclei by formulating the segmentation problem as a regression task of the distance map. We demonstrate superior performance of this approach as compared to other approaches using Convolutional Neural Networks.

338 citations
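
As a rough illustration of the distance-map regression target described in the nuclei segmentation entry above, the sketch below builds a per-nucleus normalised distance map from an instance label image. It is a brute-force toy in plain Python, not the authors' implementation; the function name `instance_distance_map`, the per-instance normalisation, and the toy label grid are assumptions made for this example.

```python
import math


def instance_distance_map(labels):
    """Per-instance normalised distance map (toy, brute force).

    labels: 2D list of ints, 0 = background, k > 0 = nucleus k.
    Each nucleus pixel gets its Euclidean distance to the nearest pixel
    that does not belong to the same nucleus, divided by the largest
    such distance inside that nucleus. Background pixels stay at 0.
    Assumes at least one pixel lies outside every nucleus.
    """
    h, w = len(labels), len(labels[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    dist = [[0.0] * w for _ in range(h)]
    peak = {}
    for r, c in coords:
        k = labels[r][c]
        if k == 0:
            continue
        d = min(math.dist((r, c), q) for q in coords if labels[q[0]][q[1]] != k)
        dist[r][c] = d
        peak[k] = max(peak.get(k, 0.0), d)
    return [
        [dist[r][c] / peak[labels[r][c]] if labels[r][c] else 0.0 for c in range(w)]
        for r in range(h)
    ]


# Two touching 3x3 "nuclei" (labels 1 and 2) on a background of zeros.
labels = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
dm = instance_distance_map(labels)
```

In this toy case each nucleus gets its own peak (1.0 at its centre) and lower values (0.5) along the shared border, which is the property that makes a regressed map easier to split into separate nuclei than a binary mask. For real images, a library routine such as `scipy.ndimage.distance_transform_edt` would normally replace the brute-force search.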


Posted Content
TL;DR: In this article, the authors proposed three loss functions to estimate the Hausdorff distance (HD) from the segmentation probability map produced by a CNN and used these loss functions for training CNNs for segmentation of the prostate, liver and pancreas in ultrasound, magnetic resonance, and computed tomography images.
Abstract: The Hausdorff Distance (HD) is widely used in evaluating medical image segmentation methods. However, existing segmentation methods do not attempt to reduce HD directly. In this paper, we present novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of reducing HD directly. We propose three methods to estimate HD from the segmentation probability map produced by a CNN. One method makes use of the distance transform of the segmentation boundary. Another method is based on applying morphological erosion on the difference between the true and estimated segmentation maps. The third method works by applying circular/spherical convolution kernels of different radii on the segmentation probability maps. Based on these three methods for estimating HD, we suggest three loss functions that can be used for training to reduce HD. We use these loss functions to train CNNs for segmentation of the prostate, liver, and pancreas in ultrasound, magnetic resonance, and computed tomography images and compare the results with commonly-used loss functions. Our results show that the proposed loss functions can lead to approximately 18-45 % reduction in HD without degrading other segmentation performance criteria such as the Dice similarity coefficient. The proposed loss functions can be used for training medical image segmentation methods in order to reduce the large segmentation errors.

165 citations
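
For readers unfamiliar with the metric in the entry above, here is a minimal sketch of the Hausdorff distance itself between two sets of boundary points. It shows only the evaluation metric, not any of the three proposed loss functions; the point sets `truth` and `pred` are made up for illustration.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical boundary points; one predicted point is far from the truth.
truth = [(0, 0), (0, 1), (0, 2)]
pred = [(0, 0), (0, 1), (3, 2)]
hd = hausdorff(truth, pred)
```

The inner `min` is the value that a distance transform of `b` would hold at point `p`, which is why a distance transform is a natural ingredient when estimating HD from dense segmentation maps. Because HD is driven by the single worst point, one stray prediction dominates the result here.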


Proceedings ArticleDOI
01 Jul 2019
TL;DR: This work proposes a novel architecture called Psi-Net with a single encoder and three parallel decoders to facilitate joint training of three tasks, and proposes a new joint loss function which consists of a weighted combination of Negative Log Likelihood and Mean Square Error loss.
Abstract: Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in most cases, networks similar to U-Net produce coarse and non-smooth segmentations with many discontinuities. To improve and refine the performance of U-Net-like networks, we propose the use of parallel decoders which, along with performing the mask predictions, also perform contour prediction and distance map estimation. The contour and distance map aid in ensuring smoothness in the segmentation predictions. To facilitate joint training of the three tasks, we propose a novel architecture called Psi-Net with a single encoder and three parallel decoders (thus having a shape of Ψ): one decoder learns the segmentation mask prediction and the other two decoders learn the auxiliary tasks of contour detection and distance map estimation. Learning these auxiliary tasks helps in capturing the shape and boundary information. We also propose a new joint loss function for the proposed architecture. The loss function consists of a weighted combination of Negative Log Likelihood and Mean Square Error loss. To evaluate our model, we have used two publicly available datasets: 1) the Origa dataset for the task of optic cup and disc segmentation and 2) the Endovis segment dataset for the task of polyp segmentation. We have conducted extensive experiments using our network to show that our model gives better results in terms of segmentation, boundary, and shape metrics.

97 citations
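
The joint loss mentioned in the Psi-Net entry above is described only as a weighted combination of Negative Log Likelihood and Mean Square Error. A minimal sketch of such a combination follows, assuming (this is not stated in the abstract) that NLL is applied to the mask and contour outputs and MSE to the distance map; the weights `w` and all function names are hypothetical.

```python
import math


def nll(probs, targets):
    """Mean negative log likelihood of binary targets given foreground probabilities."""
    eps = 1e-12  # avoids log(0)
    return -sum(
        math.log((p if t else 1.0 - p) + eps) for p, t in zip(probs, targets)
    ) / len(probs)


def mse(pred, target):
    """Mean squared error between two flat lists of values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(mask_p, mask_t, contour_p, contour_t, dist_p, dist_t,
               w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are an assumption)."""
    return (
        w[0] * nll(mask_p, mask_t)
        + w[1] * nll(contour_p, contour_t)
        + w[2] * mse(dist_p, dist_t)
    )


# Perfect mask and contour predictions, imperfect distance map.
loss = joint_loss(
    mask_p=[1.0, 0.0], mask_t=[1, 0],
    contour_p=[0.0, 1.0], contour_t=[0, 1],
    dist_p=[0.5, 1.0], dist_t=[0.0, 1.0],
    w=(1.0, 1.0, 2.0),
)
```

With the classification terms at essentially zero, the result is just the weighted distance-map error, which makes it easy to see how the weights trade the auxiliary tasks off against the main one.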


Journal ArticleDOI
TL;DR: Several cardiac indices that often serve as diagnostic biomarkers, specifically blood pool volume, myocardial mass, and ejection fraction, computed using the proposed regularization method are better correlated with the indices computed from the reference, ground truth segmentation.
Abstract: PURPOSE Cardiac image segmentation is a critical process for generating personalized models of the heart and for quantifying cardiac performance parameters. Fully automatic segmentation of the left ventricle (LV), the right ventricle (RV), and the myocardium from cardiac cine MR images is challenging due to variability of the normal and abnormal anatomy, as well as the imaging protocols. This study proposes a multi-task learning (MTL)-based regularization of a convolutional neural network (CNN) to obtain accurate segmentation of the cardiac structures from cine MR images. METHODS We train a CNN to perform the main task of semantic segmentation, along with the simultaneous, auxiliary task of pixel-wise distance map regression. The network also predicts uncertainties associated with both tasks, such that their losses are weighted by the inverse of their corresponding uncertainties. As a result, during training, the task featuring a higher uncertainty is weighted less and vice versa. The proposed distance map regularizer is a decoder network added to the bottleneck layer of an existing CNN architecture, helping the network learn robust global features. The regularizer block is removed after training, so that the original number of network parameters does not change. The trained network outputs per-pixel segmentation when a new patient cine MR image is provided as an input. RESULTS We show that the proposed regularization method improves both binary and multi-class segmentation performance over the corresponding state-of-the-art CNN architectures. The evaluation was conducted on two publicly available cardiac cine MRI datasets, yielding average Dice coefficients of 0.84 ± 0.03 and 0.91 ± 0.04. We also demonstrate improved generalization performance of the distance map regularized network on cross-dataset segmentation, showing as much as 42% improvement in myocardium Dice coefficient from 0.56 ± 0.28 to 0.80 ± 0.14.
CONCLUSIONS We have presented a method for accurate segmentation of cardiac structures from cine MR images. Our experiments verify that the proposed method exceeds the segmentation performance of three existing state-of-the-art methods. Furthermore, several cardiac indices that often serve as diagnostic biomarkers, specifically blood pool volume, myocardial mass, and ejection fraction, computed using our method are better correlated with the indices computed from the reference, ground truth segmentation. Hence, the proposed method has the potential to become a non-invasive screening and diagnostic tool for the clinical assessment of various cardiac conditions, as well as a reliable aid for generating patient specific models of the cardiac anatomy for therapy planning, simulation, and guidance.

59 citations
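
The uncertainty weighting described in the cardiac segmentation entry above can be sketched as below. The abstract states only that each task loss is weighted by the inverse of its uncertainty; the log-variance parameterisation and the additive regularisation term are assumptions borrowed from common multi-task practice, and the function names are hypothetical.

```python
import math


def task_weight(log_var):
    """Inverse-uncertainty weight: exp(-log sigma^2) = 1 / sigma^2."""
    return math.exp(-log_var)


def uncertainty_weighted_loss(seg_loss, dist_loss, log_var_seg, log_var_dist):
    """Combine two task losses, down-weighting the more uncertain task.

    The added log-variance terms (an assumption here) stop the optimiser
    from driving both uncertainties to infinity to zero out the loss.
    """
    return (
        task_weight(log_var_seg) * seg_loss
        + task_weight(log_var_dist) * dist_loss
        + log_var_seg
        + log_var_dist
    )


# Equal confidence in both tasks: the losses are simply summed.
equal = uncertainty_weighted_loss(1.0, 2.0, 0.0, 0.0)

# Distance-map task four times more uncertain (sigma^2 = 4): weight drops to 1/4.
w_dist = task_weight(math.log(4.0))
```

In a trained network the two log-variances would be learned outputs or parameters rather than fixed numbers; they are constants here only to make the weighting visible.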



Book ChapterDOI
13 Oct 2019
TL;DR: In this paper, the authors propose complementary-task learning to enforce shape-prior leveraging the existing target labels for multi-organ segmentation in whole-body computed tomography (CT).
Abstract: Multi-organ segmentation in whole-body computed tomography (CT) is a routine pre-processing step which finds its application in organ-specific image retrieval, radiotherapy planning, and interventional image analysis. We address this problem from an organ-specific shape-prior learning perspective. We introduce the idea of complementary-task learning to enforce shape priors by leveraging the existing target labels. We propose two complementary tasks, namely (i) distance map regression and (ii) contour map detection, to explicitly encode the geometric properties of each organ. We evaluate the proposed solution on the public VISCERAL dataset containing CT scans of multiple organs. We report a significant improvement of the overall Dice score from 0.8849 to 0.9018 due to the incorporation of complementary-task learning.

49 citations


Book ChapterDOI
24 Dec 2019
TL;DR: Dense RepPoints is shown to represent and learn object segments well, with the use of a novel distance transform sampling method combined with set-to-set supervision, leading to performance that surpasses counterparts based on contours or grids.
Abstract: We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level. Techniques are proposed to efficiently process these dense points, maintaining near-constant complexity with increasing point numbers. Dense RepPoints is shown to represent and learn object segments well, with the use of a novel distance transform sampling method combined with set-to-set supervision. The distance transform sampling combines the strengths of contour and grid representations, leading to performance that surpasses counterparts based on contours or grids. Code is available at https://github.com/justimyhxu/Dense-RepPoints.

40 citations


17 Apr 2019
TL;DR: A novel distance map derived loss penalty term for semantic segmentation is proposed, using distance maps, derived from ground truth masks, to create a penalty term, guiding the network's focus towards hard-to-segment boundary regions.
Abstract: Convolutional neural networks for semantic segmentation suffer from low performance at object boundaries. In medical imaging, accurate representation of tissue surfaces and volumes is important for tracking of disease biomarkers such as tissue morphology and shape features. In this work, we propose a novel distance map derived loss penalty term for semantic segmentation. We propose to use distance maps, derived from ground truth masks, to create a penalty term, guiding the network's focus towards hard-to-segment boundary regions. We investigate the effects of this penalizing factor against cross-entropy, Dice, and focal loss, among others, evaluating performance on a 3D MRI bone segmentation task from the publicly available Osteoarthritis Initiative dataset. We observe a significant improvement in the quality of segmentation, with better shape preservation at bone boundaries and areas affected by partial volume. We ultimately aim to use our loss penalty term to improve the extraction of shape biomarkers and derive metrics to quantitatively evaluate the preservation of shape.

38 citations
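
To make the idea of a distance-map-derived penalty in the entry above concrete, the following sketch weights a per-pixel cross-entropy by a factor that grows near the ground-truth boundary. It uses a 1D profile for brevity. The weighting form `1 + 1/d` and every function name are assumptions for illustration; the abstract does not give the actual formula.

```python
import math


def boundary_distance(mask):
    """Distance from each pixel of a 1D binary profile to the nearest
    pixel of the other class. Assumes both classes are present."""
    n = len(mask)
    return [
        min(abs(i - j) for j in range(n) if mask[j] != mask[i]) for i in range(n)
    ]


def penalty_weights(mask):
    """Weights that peak next to the boundary (assumed form: 1 + 1/d)."""
    return [1.0 + 1.0 / d for d in boundary_distance(mask)]


def penalised_cross_entropy(probs, mask):
    """Mean binary cross-entropy, scaled per pixel by the boundary penalty."""
    eps = 1e-12  # avoids log(0)
    weights = penalty_weights(mask)
    ce = [-math.log((p if t else 1.0 - p) + eps) for p, t in zip(probs, mask)]
    return sum(w * c for w, c in zip(weights, ce)) / len(mask)


mask = [0, 0, 0, 1, 1, 1]                    # ground truth, boundary between 2 and 3
good = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]        # confident and correct everywhere
near_error = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]  # mistake right at the boundary
far_error = [0.9, 0.1, 0.1, 0.9, 0.9, 0.9]   # same mistake far from the boundary
```

The same misclassification costs more when it sits next to the boundary than when it sits far away, which is the behaviour the penalty term is meant to produce.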


Journal ArticleDOI
TL;DR: Wang et al. developed an improved watershed analysis that defines a dimensionless parameter α to discriminate between contacts and necks, and that can effectively segment 3D particle contacts while preserving the continuity of necks for different granular soils.

34 citations


Journal ArticleDOI
TL;DR: This work introduces a new semantic segmentation regularization based on the regression of a distance transform, which requires almost no modification of the network structure and adds a very low overhead to the training process.

31 citations


Posted Content
TL;DR: In this paper, distance maps derived from ground truth masks are used to create a penalty term, guiding the network's focus towards hard-to-segment boundary regions, which shows a significant improvement in the quality of segmentation.
Abstract: Convolutional neural networks for semantic segmentation suffer from low performance at object boundaries. In medical imaging, accurate representation of tissue surfaces and volumes is important for tracking of disease biomarkers such as tissue morphology and shape features. In this work, we propose a novel distance map derived loss penalty term for semantic segmentation. We propose to use distance maps, derived from ground truth masks, to create a penalty term, guiding the network's focus towards hard-to-segment boundary regions. We investigate the effects of this penalizing factor against cross-entropy, Dice, and focal loss, among others, evaluating performance on a 3D MRI bone segmentation task from the publicly available Osteoarthritis Initiative dataset. We observe a significant improvement in the quality of segmentation, with better shape preservation at bone boundaries and areas affected by partial volume. We ultimately aim to use our loss penalty term to improve the extraction of shape biomarkers and derive metrics to quantitatively evaluate the preservation of shape.

Posted ContentDOI
Sheng Chen1, Zhe Sun1, Yutong Lu1, Huiying Zhao1, Yuedong Yang1 
01 Aug 2019-bioRxiv
TL;DR: This study represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profile based on an image captioning learning frame, which is the first method to employ 2D distance map for predicting protein properties.
Abstract: Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that the sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.

Journal ArticleDOI
TL;DR: Three experimental results show that the proposed method improves the accuracy of the centerline and solves the problem of broken centerlines, and that the road reconstruction method is excellent at maintaining road integrity.
Abstract: Traditional road extraction algorithms, which focus on improving the accuracy of road surfaces, cannot overcome the interference of shelter caused by vegetation, buildings, and shadows. In this paper, we extract the roads via road centerline extraction, road width extraction, broken centerline connection, and road reconstruction. We use a multiscale segmentation algorithm to segment the images, and feature extraction to get the initial road. The fast marching method (FMM) algorithm is employed to obtain the boundary distance field and the source distance field, and the branch backing-tracking method is used to acquire the initial centerline. The road width of each initial centerline is calculated by combining the boundary distance fields, before a tensor field is applied to connect the broken centerline and obtain the final centerline. The final centerline is matched with its road width when the final road is reconstructed. Three experimental results show that the proposed method improves the accuracy of the centerline and solves the problem of broken centerlines, and that the road reconstruction method is excellent at maintaining road integrity.

Journal ArticleDOI
TL;DR: This paper presents an unsupervised approach for automatic cell segmentation and counting, namely CSC, in high-throughput microscopy images and shows that CSC outperforms the current state-of-the-art techniques.
Abstract: New technological advances in automated microscopy have given rise to large volumes of data, which have made human-based analysis infeasible, heightening the need for automatic systems for high-throughput microscopy applications. In particular, in the field of fluorescence microscopy, automatic tools for image analysis are making an essential contribution in order to increase the statistical power of the cell analysis process. The development of these automatic systems is a difficult task due to both the diversification of the staining patterns and the local variability of the images. In this paper, we present an unsupervised approach for automatic cell segmentation and counting, namely CSC, in high-throughput microscopy images. The segmentation is performed by dividing the whole image into square patches that undergo a gray level clustering followed by an adaptive thresholding. Subsequently, the cell labeling is obtained by detecting the centers of the cells, using both distance transform and curvature analysis, and by applying a region growing process. The advantages of CSC are manifold. The foreground detection process works on gray levels rather than on individual pixels, so it proves to be very efficient. Moreover, the combination of distance transform and curvature analysis makes the counting process very robust to clustered cells. A further strength of the CSC method is the limited number of parameters that must be tuned. Indeed, two different versions of the method have been considered, CSC-7 and CSC-3, depending on the number of parameters to be tuned. The CSC method has been tested on several publicly available image datasets of real and synthetic images. Results in terms of standard metrics and spatially aware measures show that CSC outperforms the current state-of-the-art techniques.
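
The distance-transform part of the cell-centre detection described above can be sketched as follows: peaks of the distance transform of a binary foreground mask mark candidate cell centres, even when cells are joined into one connected blob. This is a toy in plain Python; the curvature analysis and region growing steps of CSC are not reproduced, and the function names and the example mask are assumptions.

```python
import math


def distance_transform(mask):
    """Euclidean distance from each foreground pixel to the nearest
    background pixel (brute force; background pixels get 0)."""
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    return [
        [
            min(math.dist((r, c), b) for b in background) if mask[r][c] else 0.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def find_peaks(dist):
    """Foreground pixels strictly greater than all 8 neighbours.

    Strict maxima miss flat plateaus, so this is only a toy criterion.
    """
    h, w = len(dist), len(dist[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            if dist[r][c] <= 0.0:
                continue
            neighbours = [
                dist[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if all(dist[r][c] > v for v in neighbours):
                peaks.append((r, c))
    return peaks


# Two 3x3 "cells" joined by a one-pixel neck: a single connected component.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
centres = find_peaks(distance_transform(mask))
```

Connected-component labelling would count one object here, while the distance transform yields two peaks, one per cell, which is why it helps with clustered cells.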

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A U-Net-based approach for direct skeleton extraction within the Pixel SkelNetOn CVPR 2019 challenge is introduced, inspired by the success of CNNs at skeleton extraction from real images.
Abstract: Skeletonization is a process that aims to extract a line-like representation of an object's shape, the skeleton, which is of great interest for optical character recognition, shape-based object matching, recognition, biomedical image analysis, etc. Existing methods for skeleton extraction are typically based on topological, morphological, or distance transform approaches; they are known to be sensitive to noise on the boundary and require a post-processing step to prune redundant branches. In this work, we introduce a U-Net-based approach for direct skeleton extraction within the Pixel SkelNetOn CVPR 2019 challenge, inspired by the success of CNNs at skeleton extraction from real images. The main idea of our approach is to consistently edit a skeleton mask by feature propagation through layers at different scales. This contrasts with generating the final skeleton from object shape representations at different scales, as occurs in approaches that use deep supervision for skeleton extraction from real images. Our U-Net-based model achieved an F1-score of ~0.75 on the validation set, and an ensemble of eight identical models, trained on different data subsets, achieved an F1-score of 0.7846 on the test data.

Journal ArticleDOI
TL;DR: This paper proposes a new automated cephalometric landmark localization method under the framework of GAN that trains an adversarial network to learn the mapping from features to the distance map of a specific target landmark.
Abstract: Locating anatomical landmarks in a cephalometric X-ray image is a crucial step in cephalometric analysis. Manual landmark localization suffers from inter- and intra-observer variability, which makes the development of automated localization methods an urgent clinical need. Most existing techniques follow the conventional approach of estimating numerical values of displacements or coordinates for the target landmarks. Additionally, there are no reported applications of generative adversarial networks (GAN) in cephalometric landmark localization. Motivated by these facts, we propose a new automated cephalometric landmark localization method under the framework of GAN. The principle behind our approach is fundamentally different from the conventional ones. It trains an adversarial network under the framework of GAN to learn the mapping from features to the distance map of a specific target landmark. Namely, the output of the adversarial network in this paper is image data, instead of displacements or coordinates as in the conventional approaches. Based on the trained networks, we can predict the distance maps of all target landmarks in a new cephalometric image. Subsequently, the target landmarks are detected from the predicted distance maps by an approach similar to regression voting. Experimental results validate the good performance of our method in localization of cephalometric landmarks in dental X-ray images.

Journal ArticleDOI
TL;DR: An automatic method is proposed to detect terminations for tree-like structures based on a multiscale ray-shooting model and a termination visual prior; it outperforms other state-of-the-art termination detection methods.
Abstract: Digital reconstruction (tracing) of tree-like structures, such as neurons, retinal blood vessels, and bronchi, from volumetric images and 2D images is very important to biomedical research. Many existing reconstruction algorithms rely on a set of good seed points. The 2D or 3D terminations are good candidates for such seed points. In this paper, we propose an automatic method to detect terminations for tree-like structures based on a multiscale ray-shooting model and a termination visual prior. The multiscale ray-shooting model detects 2D terminations by extracting and analyzing the multiscale intensity distribution features around a termination candidate. The range of scale is adaptively determined according to the local neurite diameter estimated by the Rayburst sampling algorithm in combination with the gray-weighted distance transform. The termination visual prior is based on a key observation: when observing a 3D termination from three orthogonal directions without occlusion, we can recognize it in at least two views. Using this prior with the multiscale ray-shooting model, we can detect 3D terminations with high accuracy. Experiments on 3D neuron image stacks, 2D neuron images, 3D bronchus image stacks, and 2D retinal blood vessel images exhibit average precision and recall rates of 87.50% and 90.54%. The experimental results confirm that the proposed method outperforms other state-of-the-art termination detection methods.

Journal ArticleDOI
TL;DR: A new approach called pixel replication is described, which uses the image Euclidean distance transform in combination with Gaussian mixture models to better exploit practically effective optimization for delineating objects with elliptical decision boundaries.
Abstract: One of the most important and error-prone tasks in biological image analysis is the segmentation of touching or overlapping cells. Particularly for optical microscopy, including transmitted light and confocal fluorescence microscopy, there is often no consistent discriminative information to separate cells that touch or overlap. It is desired to partition touching foreground pixels into cells using the binary threshold image information only, and optionally incorporating gradient information. The most common approaches for segmenting touching and overlapping cells in these scenarios are based on the watershed transform. We describe a new approach called pixel replication for the task of segmenting elliptical objects that touch or overlap. Pixel replication uses the image Euclidean distance transform in combination with Gaussian mixture models to better exploit practically effective optimization for delineating objects with elliptical decision boundaries. Pixel replication improves significantly on commonly used methods based on watershed transforms, or based on fitting Gaussian mixtures directly to the thresholded image data. Pixel replication works equivalently on both 2-D and 3-D image data, and naturally combines information from multi-channel images. The accuracy of the proposed technique is measured using both the segmentation accuracy on simulated ellipse data and the tracking accuracy on validated stem cell tracking results extracted from hundreds of live-cell microscopy image sequences. Pixel replication is shown to be significantly more accurate compared with other approaches. Variance relationships are derived, allowing a more practically effective Gaussian mixture model to extract cell boundaries for data generated from the threshold image using the uniform elliptical distribution and from the distance transform image using the triangular elliptical distribution.

Journal ArticleDOI
TL;DR: The proposed method uses Fuzzy Distance Transform based adaptive stroke filter which can effectively localize text regions from camera captured images with complex background and the visual response of text segmentation is quite impressive.
Abstract: Localization of text in camera-captured images with complex backgrounds is a growing demand in modern IT-enabled services. Most current text localization techniques are sensitive to text features such as color, size, and style, and also to background clutter. Among the methods proposed in the literature, the stroke filter is particularly effective at localizing text. The effectiveness of the traditional stroke filter is limited by its fixed width: it can only segment strokes and text within a predefined range of widths. The proposed method uses a Fuzzy Distance Transform based adaptive stroke filter which can effectively localize text regions in camera-captured images with complex backgrounds. The method is evaluated on a database containing 600 images, and the visual response of text segmentation is quite impressive. To measure the accuracy of the proposed method, it is applied to a set of 16 test images and the segmentation results are compared with the ground truth images, giving recall, precision, and F-measure values of 96.65%, 87.77%, and 91.89%, respectively.

Journal ArticleDOI
TL;DR: This work proposes an alternative distance transform method, the random-walk distance transform, and demonstrates its effectiveness in high-throughput segmentation of three microCT datasets of biological tilings (i.e., structures composed of a large number of similar repeating units).
Abstract: Various 3D imaging techniques are routinely used to examine biological materials, the results of which are usually a stack of grayscale images. In order to quantify structural aspects of the biological materials, however, they must first be extracted from the dataset in a process called segmentation. If the individual structures to be extracted are in contact or very close to each other, distance-based segmentation methods utilizing the Euclidean distance transform are commonly employed. Major disadvantages of the Euclidean distance transform, however, are its susceptibility to noise (very common in biological data), which often leads to incorrect segmentations (i.e. poor separation of objects of interest), and its limitation of being only effective for roundish objects. In the present work, we propose an alternative distance transform method, the random-walk distance transform, and demonstrate its effectiveness in high-throughput segmentation of three microCT datasets of biological tilings (i.e. structures composed of a large number of similar repeating units). In contrast to the Euclidean distance transform, this random-walk approach represents the global, rather than the local, geometric character of the objects to be segmented and, thus, is less susceptible to noise. In addition, it is directly applicable to structures with anisotropic shape characteristics. Using three case studies—stingray tessellated cartilage, starfish dermal endoskeleton, and the prismatic layer of bivalve mollusc shell—we provide a typical workflow for the segmentation of tiled structures, describe core image processing concepts that are underused in biological research, and show that for each study system, large amounts of biologically-relevant data can be rapidly segmented, visualized and analyzed.

Journal ArticleDOI
TL;DR: Experimental results verify that the proposed approach is effective to detect foreground objects from complex background environments, and outperforms some state-of-the-art methods.
Abstract: Background modeling and subtraction, the task of detecting moving objects in a scene, is a fundamental and critical step for many high-level computer vision tasks. However, background subtraction modeling is still an open and challenging problem, particularly in practical scenarios with drastic illumination changes and dynamic backgrounds. In this paper, we propose a novel foreground detection method based on convolutional neural networks (CNNs) to deal with the challenges of background subtraction. First, given a clean background image without moving objects, an adjustable neighborhood is constructed around each pixel of the background image to form windows; CNN features are extracted with a pre-trained CNN model for each window to form a feature-based background model. Second, features are extracted from the current frame of the video scene with the same operation as for the background model. The Euclidean distance between the CNN features of the current frame and those of the background image is used to build a distance map. Third, the distance map is fed into a graph cut algorithm to obtain the foreground mask. To deal with background changes, the background model is updated at a certain rate. Experimental results verify that the proposed approach is effective at detecting foreground objects in complex background environments, and outperforms some state-of-the-art methods.

Posted Content
TL;DR: Dense RepPoints uses a large set of points to describe an object at multiple levels, including both box level and pixel level, and learns object segments with a distance transform sampling method combined with set-to-set supervision.
Abstract: We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level. Techniques are proposed to efficiently process these dense points, maintaining near-constant complexity with increasing point numbers. Dense RepPoints is shown to represent and learn object segments well, with the use of a novel distance transform sampling method combined with set-to-set supervision. The distance transform sampling combines the strengths of contour and grid representations, leading to performance that surpasses counterparts based on contours or grids. Code is available at https://github.com/justimyhxu/Dense-RepPoints.

Journal ArticleDOI
TL;DR: The Distance Transform Network (DTN) is proposed, which combines the power of networks and the richness of information provided by the Euclidean distance transform for shape analysis, and is effective for natural shape classification, achieving higher success rates in all cases.

Proceedings ArticleDOI
Ziheng Zhang, Anpei Chen, Ling Xie, Jingyi Yu, Shenghua Gao
15 Oct 2019
TL;DR: This work introduces a new representation, namely a semantics-aware distance map (sem-dist map), to serve as a target for amodal segmentation instead of the commonly used masks and heatmaps, and introduces a novel convolutional neural network architecture, which is referred to as semantic layering network, to estimate sem-dist maps layer by layer.
Abstract: In this work, we demonstrate yet another approach to tackle the amodal segmentation problem. Specifically, we first introduce a new representation, namely a semantics-aware distance map (sem-dist map), to serve as our target for amodal segmentation instead of the commonly used masks and heatmaps. The sem-dist map is a kind of level-set representation, of which the different regions of an object are placed into different levels on the map according to their visibility. It is a natural extension of masks and heatmaps, where modal, amodal segmentation, as well as depth order information, are all well-described. Then we also introduce a novel convolutional neural network (CNN) architecture, which we refer to as semantic layering network, to estimate sem-dist maps layer by layer, from the global-level to the instance-level, for all objects in an image. Extensive experiments on the COCOA and D2SA datasets have demonstrated that our framework can predict amodal segmentation, occlusion, and depth order with state-of-the-art performance.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: An algorithm is presented that detects outliers in object counting from color and shape information, applying the Median Absolute Deviation to histograms of both features.
Abstract: Object counting based on image data has been studied extensively. It is a fast, automatic, and non-contact solution that is applied in health, microbiology, object tracking, robotics, and industry. During counting, an object that differs from the majority of objects might be present in a frame. Such an outlier object should be detected and not counted. This study presents an algorithm to detect outliers in object counting based on color and shape information. The color feature is based on hue, whilst the shape feature is based on the distance transform. Both features were chosen because they are invariant to position and in-plane rotation. Outlier detection utilizes the Median Absolute Deviation (MAD) on histograms of both features. Testing shows a promising result (accuracy of 94.3%) on 35 images with simple backgrounds.
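
As a rough illustration of MAD-based outlier detection, the sketch below flags values whose modified z-score exceeds a cutoff. It is a simplification: it works on one scalar feature per object rather than on histograms as in the paper, and the function name, the 0.6745 scaling constant, the 3.5 cutoff, and the handling of a zero MAD are conventional choices assumed here, not taken from the abstract.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return one boolean per value: True if the value is a MAD-based outlier."""
    med = median(values)
    # Median Absolute Deviation: median of absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: with no spread at all, anything off the median is an outlier.
        return [v != med for v in values]
    # Modified z-score; 0.6745 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]
```

For instance, given hypothetical per-object feature values `[10, 11, 9, 10, 12, 10, 30]`, only the last object is flagged.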

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a novel semantic-elevation mapping approach for navigation tasks that works more reliably and safely than an elevation-only approach.
Abstract: A probabilistic traversable map plays a critical role in safe and reliable navigation for mobile robots. Unlike in structured environments, traversable regions in unstructured environments, such as grass and sidewalks, are relatively complex. A traditional elevation-based traversable map cannot represent such complex environments well and may therefore cause navigation failure. To address this limitation, this paper proposes a novel semantic-elevation mapping approach for navigation tasks. We first build a multi-layer semantic map from continuous semantic segmentation images. Then, this multi-layer semantic map is fused and converted into a probabilistic map by a distance transform approach. The generated semantic probabilistic map is then fused with an elevation map at the path planning stage. The proposed approach is tested on an Unmanned Ground Vehicle (UGV) platform. The results show that our semantic-elevation mapping approach works more reliably and safely than an elevation-only approach.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The algorithm combines inverted binary thresholding with Otsu's method to find the optimal threshold value of the image in an inverted color space and achieves a real-time processing speed of approximately 33.1 frames per second.
Abstract: This paper presents a real-time watershed-based algorithm for detecting multiple potholes on asphalt road surfaces. The algorithm uses (i) inverted binary thresholding in combination with Otsu's method to find the optimal threshold value of the image in an inverted color space; then (ii) morphological opening followed by closing to filter out small noise and emphasize pothole edges in the image; and (iii) a distance transform to find markers on the pre-watershed-phase image before applying the watershed algorithm. As a result, the algorithm achieves a real-time processing speed of approximately 33.1 frames per second (fps). Based on the tested images, the algorithm can effectively detect potholes of different sizes and structures on three types of road surface, namely smooth, aged, and degraded.
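
Step (iii), finding watershed markers from a distance transform, can be sketched in plain Python. This is a minimal stand-in under stated assumptions: it uses a two-pass city-block (L1) distance transform on a binary mask instead of whatever library routine and metric the authors used, and the function names and the `ratio` cutoff are hypothetical. Thresholding, morphology, and the watershed itself are omitted.

```python
def cityblock_distance_transform(mask):
    """City-block distance from each foreground pixel (1) to the nearest background pixel (0)."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any city-block distance inside the grid
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def marker_pixels(dist, ratio=0.7):
    """Keep pixels whose distance is at least `ratio` times the peak distance."""
    peak = max(max(row) for row in dist)
    return [[1 if peak > 0 and d >= ratio * peak else 0 for d in row] for row in dist]
```

Pixels deep inside a region have the largest distances, so keeping only those near the peak yields compact "sure foreground" seeds that a watershed can grow from. Note that a mask with no background pixels leaves every distance at the `inf` placeholder.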

Journal ArticleDOI
TL;DR: A novel multiscale region transform (MReT) is proposed to perform region integral over different contour-steered strips at all possible scales to effectively integrate patch features, and thus enables a better description of the shape image in a coarse-to-fine manner.
Abstract: In this paper, a multiscale contour steered region integral (MCSRI) method is proposed to classify highly similar shapes with flexible interior connection architectures. A component distance map (CDM) is developed to robustly characterize the flexible interior connection structure, shape of the exterior contour, and their inter-relationship in a shape image. A novel multiscale region transform (MReT) is proposed to perform region integral over different contour-steered strips at all possible scales to effectively integrate patch features, and thus enables a better description of the shape image in a coarse-to-fine manner. It is applied to solve a challenging problem of classifying cultivars from leaf images, which is a new attempt in both biology and computer vision research communities. A soybean cultivar leaf vein database (SoyCultivarVein), which is the first cultivar leaf vein database, is created and presented for performance evaluation. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art methods in similar shape classification and the possibility of cultivar recognition via leaf pattern analysis, which may lead to a new research interest towards fine-level shape analysis on cultivar classification.

Posted Content
TL;DR: In this article, a new semantic segmentation regularization based on the regression of a distance transform is introduced, which requires almost no modification of the network structure and adds a very low overhead to the training process.
Abstract: Understanding visual scenes relies more and more on dense pixel-wise classification obtained via deep fully convolutional neural networks. However, due to the nature of the networks, predictions often suffer from blurry boundaries and ill-segmented shapes, fueling the need for post-processing. This work introduces a new semantic segmentation regularization based on the regression of a distance transform. After computing the distance transform on the label masks, we train an FCN in a multi-task setting in both discrete and continuous spaces by jointly learning classification and distance regression. This requires almost no modification of the network structure and adds very low overhead to the training process. Learning to approximate the distance transform back-propagates spatial cues that implicitly regularize the segmentation. We validate this technique with several architectures on various datasets, and we show significant improvements compared to competitive baselines.
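
The multi-task objective described above, classification plus distance regression, might look like the following sketch. The abstract does not give the exact loss, so the choice of cross-entropy plus a weighted L1 term, the function name, and the `weight` parameter are assumptions; pixels are passed as flat lists for simplicity, and the distance targets are assumed to have been computed from the label masks beforehand.

```python
import math


def joint_loss(class_probs, labels, pred_dist, target_dist, weight=1.0):
    """Mean cross-entropy plus weighted mean L1 distance-regression error.

    class_probs: per-pixel lists of class probabilities (each summing to 1)
    labels:      per-pixel ground-truth class indices
    pred_dist:   per-pixel predicted distance-transform values
    target_dist: per-pixel distance-transform values of the label mask
    """
    n = len(labels)
    cross_entropy = -sum(math.log(p[y]) for p, y in zip(class_probs, labels)) / n
    l1_error = sum(abs(p - t) for p, t in zip(pred_dist, target_dist)) / n
    return cross_entropy + weight * l1_error
```

In an actual training setup both terms would be computed on tensors by a deep learning framework so that gradients flow through a shared network; the scalar version here only shows how the two terms combine.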

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work presents a novel and generic approach, Figure and Formula Detector (FFD), to detect the formulas and figures from document images using traditional computer vision approaches in addition to deep models.
Abstract: In this work, we present a novel and generic approach, Figure and Formula Detector (FFD), to detect formulas and figures in document images. Our proposed method employs traditional computer vision approaches in addition to deep models. We transform input images by applying connected component analysis (CC), distance transform, and colour transform, which are stacked together to generate an input image for the network. The best results produced by FFD for figure and formula detection are F1-scores of 0.906 and 0.905, respectively. We also propose a new dataset for figure and formula detection to aid future research in this direction. The obtained results suggest that enhancing the input representation can simplify the subsequent optimization problem, resulting in significant gains over conventional counterparts.