Showing papers by "National University of Defense Technology" published in 2017


Proceedings ArticleDOI
25 Jun 2017
TL;DR: An overview of blockchain architecture is provided, some typical consensus algorithms used in different blockchains are compared, and possible future trends for blockchain are laid out.
Abstract: Blockchain, the foundation of Bitcoin, has received extensive attention recently. Blockchain serves as an immutable ledger which allows transactions to take place in a decentralized manner. Blockchain-based applications are springing up, covering numerous fields including financial services, reputation systems, and the Internet of Things (IoT). However, blockchain technology still faces many challenges, such as scalability and security problems, that are waiting to be overcome. This paper presents a comprehensive overview of blockchain technology. We first provide an overview of blockchain architecture and compare some typical consensus algorithms used in different blockchains. Furthermore, technical challenges and recent advances are briefly listed. We also lay out possible future trends for blockchain.

2,642 citations
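
The "immutable ledger" idea in the abstract above can be illustrated with a toy hash chain. This is a minimal sketch, not the architecture of any particular blockchain: the block layout, the helper names `block_hash`, `make_block`, and `is_valid`, and the choice of SHA-256 over a JSON encoding are all assumptions made for this example, and consensus is left out entirely.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def block_hash(index, prev_hash, transactions):
    """Hash the block contents (hypothetical layout: index, prev_hash, transactions)."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(chain, transactions):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": list(transactions),
        "hash": block_hash(index, prev_hash, list(transactions)),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
    return True


chain = []
make_block(chain, ["alice pays bob 5"])
make_block(chain, ["bob pays carol 2"])
```

Editing a transaction in an earlier block changes that block's recomputed hash, so `is_valid(chain)` returns `False` unless every later block is rewritten as well.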


Journal ArticleDOI
TL;DR: This overview reviews theoretical underpinnings of multi-view learning and attempts to identify promising venues and point out some specific challenges which can hopefully promote further research in this rapidly developing field.

679 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: The Improved Deep Embedded Clustering (IDEC) algorithm is proposed, which manipulates feature space to scatter data points using a clustering loss as guidance and can jointly optimize cluster label assignment and learn features that are suitable for clustering with local structure preservation.
Abstract: Deep clustering uses neural networks to learn deep feature representations that favor the clustering task. Some pioneering work proposes to simultaneously learn embedded features and perform clustering by explicitly defining a clustering-oriented loss. Though promising performance has been demonstrated in various applications, we observe that a vital ingredient has been overlooked by these works: the defined clustering loss may corrupt the feature space, which leads to non-representative, meaningless features and in turn hurts clustering performance. To address this issue, in this paper we propose the Improved Deep Embedded Clustering (IDEC) algorithm, which takes care of data structure preservation. Specifically, we manipulate feature space to scatter data points using a clustering loss as guidance. To constrain the manipulation and maintain the local structure of the data-generating distribution, an under-complete autoencoder is applied. By integrating the clustering loss and the autoencoder’s reconstruction loss, IDEC can jointly optimize cluster label assignment and learn features that are suitable for clustering with local structure preservation. The resultant optimization problem can be effectively solved by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.

566 citations
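
The combination of a clustering loss with an autoencoder reconstruction loss described in the abstract above can be sketched numerically. The abstract does not spell out the loss, so this sketch assumes the formulation commonly used in deep embedded clustering: a Student's t soft assignment, a sharpened target distribution, and a KL-divergence clustering term. The function names and the weight `gamma` are hypothetical, and no network training is shown.

```python
import math


def soft_assign(z, centers):
    """Soft cluster assignments q_ij from a Student's t kernel (assumed form)."""
    q = []
    for zi in z:
        w = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zi, c))) for c in centers]
        total = sum(w)
        q.append([x / total for x in w])
    return q


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    num_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(num_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(num_clusters)]
        total = sum(w)
        p.append([x / total for x in w])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )


def mse(x, x_hat):
    """Mean squared reconstruction error over all entries."""
    count = sum(len(row) for row in x)
    return sum((a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh)) / count


def joint_loss(x, x_hat, z, centers, gamma=0.1):
    """Reconstruction loss plus gamma times the clustering loss (gamma is illustrative)."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    return mse(x, x_hat) + gamma * kl_divergence(p, q)


x = [[0.0, 1.0], [1.0, 0.0]]          # toy inputs
x_hat = [[0.1, 0.9], [0.8, 0.1]]      # toy autoencoder reconstructions
z = [[0.1, 0.0], [0.9, 1.0]]          # toy embedded points
centers = [[0.0, 0.0], [1.0, 1.0]]    # toy cluster centers
loss = joint_loss(x, x_hat, z, centers)
```

The reconstruction term is what ties the embedding to the input data; with `gamma = 0` the expression reduces to a plain autoencoder objective.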


Proceedings ArticleDOI
Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Čehovin Zajc, Tomas Vojir, Gustav Häger, Alan Lukezic, Abdelrahman Eldesokey, Gustavo Fernandez, Alvaro Garcia-Martin, Andrej Muhič, Alfredo Petrosino, Alireza Memarmoghadam, Andrea Vedaldi, Antoine Manzanera, Antoine Tran, A. Aydin Alatan, Bogdan Mocanu, Boyu Chen, Chang Huang, Changsheng Xu, Chong Sun, Dalong Du, David Zhang, Dawei Du, Deepak Mishra, Erhan Gundogdu, Erik Velasco-Salido, Fahad Shahbaz Khan, Francesco Battistone, Gorthi R. K. Sai Subrahmanyam, Goutam Bhat, Guan Huang, Guilherme Sousa Bastos, Guna Seetharaman, Hongliang Zhang, Houqiang Li, Huchuan Lu, Isabela Drummond, Jack Valmadre, Jae-chan Jeong, Jaeil Cho, Jae-Yeong Lee, Jana Noskova, Jianke Zhu, Jin Gao, Jingyu Liu, Ji-Wan Kim, João F. Henriques, José M. Martínez, Junfei Zhuang, Junliang Xing, Junyu Gao, Kai Chen, Kannappan Palaniappan, Karel Lebeda, Ke Gao, Kris M. Kitani, Lei Zhang, Lijun Wang, Lingxiao Yang, Longyin Wen, Luca Bertinetto, Mahdieh Poostchi, Martin Danelljan, Matthias Mueller, Mengdan Zhang, Ming-Hsuan Yang, Nianhao Xie, Ning Wang, Ondrej Miksik, Payman Moallem, Pallavi Venugopal M, Pedro Senna, Philip H. S. Torr, Qiang Wang, Qifeng Yu, Qingming Huang, Rafael Martin-Nieto, Richard Bowden, Risheng Liu, Ruxandra Tapu, Simon Hadfield, Siwei Lyu, Stuart Golodetz, Sunglok Choi, Tianzhu Zhang, Titus Zaharia, Vincenzo Santopietro, Wei Zou, Weiming Hu, Wenbing Tao, Wenbo Li, Wengang Zhou, Xianguo Yu, Xiao Bian, Yang Li, Yifan Xing, Yingruo Fan, Zheng Zhu, Zhipeng Zhang, Zhiqun He
01 Jul 2017
TL;DR: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative; results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years.
Abstract: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a real-time tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website.

485 citations


Proceedings Article
06 Jul 2017
TL;DR: In this article, a dual path network (DPN) is proposed for image classification, which shares common features while maintaining the flexibility to explore new features through dual path architectures, achieving state-of-the-art performance on the ImageNet-1k, Places365 and PASCAL VOC datasets.
Abstract: In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which presents a new topology of connection paths internally. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits of both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over state-of-the-art methods. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model over various applications.

475 citations


Journal ArticleDOI
TL;DR: A novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures, is introduced and it is demonstrated that without supervision, the network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.
Abstract: We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes. We demonstrate that without supervision, our network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.

403 citations


Journal ArticleDOI
TL;DR: A novel convolutional neural network framework is proposed for time series classification that can discover and extract the suitable internal structure to generate deep features of the input time series automatically by using convolution and pooling operations.

398 citations


Book ChapterDOI
14 Nov 2017
TL;DR: A convolutional autoencoder structure is developed to learn embedded features in an end-to-end way, and a clustering-oriented loss is directly built on the embedded features to jointly perform feature refinement and cluster assignment.
Abstract: Deep clustering utilizes deep neural networks to learn feature representations that are suitable for clustering tasks. Though existing deep clustering algorithms demonstrate promising performance in various applications, we observe that they either do not take full advantage of convolutional neural networks or do not adequately preserve the local structure of the data-generating distribution in the learned feature space. To address this issue, we propose a deep convolutional embedded clustering algorithm in this paper. Specifically, we develop a convolutional autoencoder structure to learn embedded features in an end-to-end way. Then, a clustering-oriented loss is directly built on the embedded features to jointly perform feature refinement and cluster assignment. To avoid the feature space being distorted by the clustering loss, we retain the decoder, which can preserve the local structure of data in feature space. In sum, we simultaneously minimize the reconstruction loss of the convolutional autoencoder and the clustering loss. The resultant optimization problem can be effectively solved by mini-batch stochastic gradient descent and back-propagation. Experiments on benchmark datasets empirically validate the power of convolutional autoencoders for feature learning and the effectiveness of local structure preservation.

377 citations


Posted Content
TL;DR: This work reveals the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, and finds that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations.
Abstract: In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which presents a new topology of connection paths internally. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits of both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over state-of-the-art methods. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model over various applications.

342 citations


Posted Content
TL;DR: A novel monocular visual odometry system called UnDeepVO, able to estimate the 6-DoF pose of a monocular camera and the depth of its view by using deep neural networks, is proposed.
Abstract: We propose a novel monocular visual odometry (VO) system called UnDeepVO in this paper. UnDeepVO is able to estimate the 6-DoF pose of a monocular camera and the depth of its view by using deep neural networks. There are two salient features of the proposed UnDeepVO: one is the unsupervised deep learning scheme, and the other is the absolute scale recovery. Specifically, we train UnDeepVO using stereo image pairs to recover the scale but test it using consecutive monocular images. Thus, UnDeepVO is a monocular system. The loss function defined for training the networks is based on spatial and temporal dense information. A system overview is shown in Fig. 1. Experiments on the KITTI dataset show that UnDeepVO achieves good performance in terms of pose accuracy.

324 citations


Journal ArticleDOI
TL;DR: A large-scale performance evaluation for texture classification is presented, empirically assessing forty texture features, including thirty-two of the most promising recent LBP variants and eight non-LBP descriptors based on deep convolutional networks, on thirteen widely used texture datasets.

Journal ArticleDOI
TL;DR: A new diversified DBN is developed by regularizing the pretraining and fine-tuning procedures with a diversity-promoting prior over latent factors; it obtains much better results than the original DBNs and comparable or even better performance than other recent hyperspectral image classification methods.
Abstract: In the remote sensing literature, deep models with multiple layers have demonstrated their potential for learning abstract and invariant features for better representation and classification of hyperspectral images. The usual supervised deep models, such as convolutional neural networks, need a large number of labeled training samples to learn their model parameters. However, real-world hyperspectral image classification tasks provide only a limited number of training samples. This paper adopts another popular deep model, deep belief networks (DBNs), to deal with this problem. DBNs allow unsupervised pretraining over unlabeled samples first and then supervised fine-tuning over labeled samples. But the usual pretraining and fine-tuning method makes many hidden units in the learned DBNs behave very similarly or act as “dead” (never responding) or “potential over-tolerant” (always responding) latent factors. These results can negatively affect the description ability and thus the classification performance of DBNs. To further improve DBN performance, this paper develops a new diversified DBN by regularizing the pretraining and fine-tuning procedures with a diversity-promoting prior over latent factors. Moreover, the regularized pretraining and fine-tuning can be efficiently implemented through the usual recursive greedy and back-propagation learning framework. Experiments on real-world hyperspectral images demonstrate that the diversity-promoting prior in both the pretraining and fine-tuning procedures leads to learned DBNs with more diverse latent factors, which directly makes the diversified DBNs obtain much better results than the original DBNs and comparable or even better performance than other recent hyperspectral image classification methods.

Journal ArticleDOI
TL;DR: This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty, indicating the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.
Abstract: Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest $r^2$ = 0.78) and lowest error, but generally models employing mobile data only yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.

Journal ArticleDOI
TL;DR: In this paper, an integrated local trajectory planning and tracking control (ILTPTC) framework is proposed for autonomous vehicles driving along a reference path with obstacle avoidance, where an efficient state-space sampling-based trajectory planning scheme is employed to smoothly follow the reference path.

Journal ArticleDOI
TL;DR: In this paper, a frequency-selective surface (FSS) with high in-band transmission at high frequency and wideband absorption at low frequency is presented.
Abstract: This communication presents a novel frequency-selective surface (FSS) with high in-band transmission at high frequency and wideband absorption at low frequency. It consists of a resistive sheet and a metallic bandpass FSS separated by a foam spacer. The resistive element is realized by inserting a strip-type parallel $LC$ (PLC) structure into the center of a lumped-resistor-loaded metallic dipole. The PLC resonates at the passband of the bandpass FSS and exhibits an infinite impedance, which splits the resistive dipole into two short sections per the surface current; this setup allows for high in-band transmission at high frequency. Below the resonance frequency, the PLC becomes finite inductive and the entire FSS performs as an absorber with the metallic FSS as a ground plane. The surface current distribution on the resistive element can be controlled at various frequencies via the PLC structure. The wideband absorption and high in-band transmission of the proposed design are verified by both numerical simulation and experimental measurements. The potential extension to polarization-insensitive designs is also discussed.
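
The two regimes of the PLC described in the abstract above follow from the impedance of an ideal, lossless parallel $LC$ circuit. The expression below is a textbook idealization rather than the paper's full-wave model; $\omega$ and $\omega_0$ are generic symbols introduced here for illustration.

$$
Z_{\mathrm{PLC}}(\omega) = \left( \frac{1}{j\omega L} + j\omega C \right)^{-1} = \frac{j\omega L}{1 - \omega^{2} L C}, \qquad \omega_0 = \frac{1}{\sqrt{LC}}
$$

At $\omega = \omega_0$ the denominator vanishes and $|Z_{\mathrm{PLC}}| \to \infty$, which corresponds to the open circuit that splits the dipole. For $\omega < \omega_0$ the denominator is positive, so $Z_{\mathrm{PLC}}$ is a finite, positive reactance, that is, inductive.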

Journal ArticleDOI
10 Feb 2017-Sensors
TL;DR: An improved detection method based on Faster R-CNN is proposed, which employs a hyper region proposal network (HRPN) to extract vehicle-like targets with a combination of hierarchical feature maps and replaces the classifier after RPN by a cascade of boosted classifiers to verify the candidate regions.
Abstract: Detecting vehicles in aerial imagery plays an important role in a wide range of applications. Current vehicle detection methods are mostly based on sliding-window search and handcrafted or shallow-learning-based features, which have limited description capability and heavy computational costs. Recently, owing to their powerful feature representations, detection methods based on region convolutional neural networks (CNNs), especially Faster R-CNN, have achieved state-of-the-art performance in computer vision. However, directly using Faster R-CNN for vehicle detection in aerial images has many limitations: (1) the region proposal network (RPN) in Faster R-CNN performs poorly at accurately locating small-sized vehicles, due to the relatively coarse feature maps; and (2) the classifier after the RPN cannot distinguish vehicles from complex backgrounds well. In this study, an improved detection method based on Faster R-CNN is proposed in order to address the two challenges mentioned above. Firstly, to improve the recall, we employ a hyper region proposal network (HRPN) to extract vehicle-like targets with a combination of hierarchical feature maps. Then, we replace the classifier after the RPN with a cascade of boosted classifiers to verify the candidate regions, aiming to reduce false detections through negative example mining. We evaluate our method on the Munich vehicle dataset and the collected vehicle dataset, with improvements in accuracy and robustness compared to existing methods.

Journal ArticleDOI
TL;DR: This work focused on miR-24-1 and found that this miRNA unconventionally activates gene transcription by targeting enhancers, and demonstrates a novel mechanism of miRNA as an enhancer trigger.
Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that function as negative gene expression regulators. Emerging evidence shows that, in addition to functioning in the cytoplasm, miRNAs are also present in the nucleus. However, the functional significance of nuclear miRNAs remains largely undetermined. By screening a miRNA database, we have identified a subset of miRNAs that function as enhancer regulators. Here, we found that a set of miRNAs show gene-activation function. We focused on miR-24-1 and found that this miRNA unconventionally activates gene transcription by targeting enhancers. Consistently, the activation was completely abolished when the enhancer sequence was deleted by TALEN. Furthermore, we found that miR-24-1 activates enhancer RNA (eRNA) expression, alters histone modification, and increases the enrichment of p300 and RNA Pol II at the enhancer locus. Our results demonstrate a novel mechanism of miRNA as an enhancer trigger.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, a fast-to-train two-streamed CNN is proposed to predict depth and depth gradients, which are then fused together into an accurate and detailed depth map.
Abstract: Estimating depth from a single RGB image is an ill-posed and inherently ambiguous problem. State-of-the-art deep learning methods can now estimate accurate 2D depth maps, but when the maps are projected into 3D, they lack local detail and are often highly distorted. We propose a fast-to-train two-streamed CNN that predicts depth and depth gradients, which are then fused together into an accurate and detailed depth map. We also define a novel set loss over multiple images; by regularizing the estimation between a common set of images, the network is less prone to overfitting and achieves better accuracy than competing methods. Experiments on the NYU Depth v2 dataset show that our depth predictions are competitive with the state of the art and lead to faithful 3D projections.

Journal ArticleDOI
TL;DR: An optimized artificial potential field algorithm for multi-UAV operation in 3-D dynamic space with a distance factor and jump strategy to solve common problems, such as unreachable targets, and ensure that the UAV will not collide with any obstacles is presented.
Abstract: Unmanned aerial vehicle (UAV) systems are one of the most rapidly developing, highest level and most practical applied unmanned aerial systems. Collision avoidance and trajectory planning are the core areas of any UAV system. However, there are theoretical and practical problems associated with the existing methods. To manage these problems, this paper presents an optimized artificial potential field (APF) algorithm for multi-UAV operation in 3-D dynamic space. The classic APF algorithm is restricted to single UAV trajectory planning and usually fails to guarantee the avoidance of collisions. To overcome this challenge, a method is proposed with a distance factor and jump strategy to solve common problems, such as unreachable targets, and ensure that the UAV will not collide with any obstacles. The method considers the UAV companions as dynamic obstacles to realize collaborative trajectory planning. Furthermore, the jitter problem is solved using the dynamic step adjustment method. Several resolution scenarios are illustrated. The method has been validated in quantitative test simulation models and satisfactory results were obtained in a simulated urban environment.
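
For context, the classic APF that the abstract above takes as its starting point can be sketched as follows. This is only the textbook formulation in 3-D, with an attractive force toward the goal and a repulsive force inside an influence radius; it does not reproduce the paper's distance factor, jump strategy, or dynamic step adjustment. The function names and the gains `k_att`, `k_rep`, `rho0`, and `step_size` are hypothetical.

```python
import math


def attractive(pos, goal, k_att=1.0):
    """Attractive force: negative gradient of 0.5 * k_att * |pos - goal|^2."""
    return [k_att * (g - p) for p, g in zip(pos, goal)]


def repulsive(pos, obstacle, k_rep=1.0, rho0=2.0):
    """Repulsive force from one point obstacle, active only within distance rho0."""
    diff = [p - o for p, o in zip(pos, obstacle)]
    rho = math.sqrt(sum(d * d for d in diff))
    if rho >= rho0 or rho == 0.0:
        return [0.0, 0.0, 0.0]
    # Negative gradient of 0.5 * k_rep * (1/rho - 1/rho0)^2
    magnitude = k_rep * (1.0 / rho - 1.0 / rho0) / rho ** 2
    return [magnitude * d / rho for d in diff]


def step(pos, goal, obstacles, step_size=0.5):
    """Move a fixed distance along the direction of the total force."""
    force = attractive(pos, goal)
    for obstacle in obstacles:
        rep = repulsive(pos, obstacle)
        force = [f + r for f, r in zip(force, rep)]
    norm = math.sqrt(sum(f * f for f in force))
    if norm == 0.0:
        return list(pos)  # zero net force: the classic method stalls here
    return [p + step_size * f / norm for p, f in zip(pos, force)]


# Other UAVs can be passed in `obstacles` at their current positions,
# which is one simple way to treat companions as dynamic obstacles.
next_pos = step([0.0, 0.0, 0.0], [10.0, 0.0, 0.0], obstacles=[[3.0, 0.5, 0.0]])
```

The zero-force branch in `step` marks the kind of stall that the classic method is known for; how the paper's modifications handle such cases is not shown here.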

Journal ArticleDOI
TL;DR: Experimental results on the moving and stationary target acquisition and recognition data set indicate that the branched ensemble model based on the unit architecture can achieve 99% classification accuracy with all training data.
Abstract: The deep convolutional neural network (CNN) has been widely used for target classification, because it can learn highly useful representations from data. However, it is difficult to apply a CNN for synthetic aperture radar (SAR) target classification directly, for it often requires a large volume of labeled training data, which is impractical for SAR applications. The highway network is a newly proposed architecture based on CNN that can be trained with smaller data sets. This letter proposes a novel architecture called the convolutional highway unit to train deeper networks with limited SAR data. The unit architecture is formed by modified convolutional highway layers, a maxpool layer, and a dropout layer. Then, the networks can be flexibly formed by stacking the unit architecture to extract deep feature representations for classification. Experimental results on the moving and stationary target acquisition and recognition data set indicate that the branched ensemble model based on the unit architecture can achieve 99% classification accuracy with all training data. When the training data are reduced to 30%, the classification accuracy of the ensemble model can still reach 94.97%.

Journal ArticleDOI
F. P. An, A. B. Balantekin, H. R. Band, M. Bishai, +199 more authors (39 institutions)
TL;DR: While measurements of the evolution in the IBD spectrum show general agreement with predictions from recent reactor models, the measured evolution in total IBD yield disagrees with recent predictions at 3.1σ, indicating that an overall deficit in the measured flux with respect to predictions does not result from equal fractional deficits from the primary fission isotopes.
Abstract: The Daya Bay experiment has observed correlations between reactor core fuel evolution and changes in the reactor antineutrino flux and energy spectrum. Four antineutrino detectors in two experimental halls were used to identify 2.2 million inverse beta decays (IBDs) over 1230 days spanning multiple fuel cycles for each of six 2.9 GW$_{\mathrm{th}}$ reactor cores at the Daya Bay and Ling Ao nuclear power plants. Using detector data spanning effective $^{239}$Pu fission fractions $F_{239}$ from 0.25 to 0.35, Daya Bay measures an average IBD yield $\sigma_f$ of $(5.90 \pm 0.13) \times 10^{-43}$ cm$^2$/fission and a fuel-dependent variation in the IBD yield, $d\sigma_f/dF_{239}$, of $(-1.86 \pm 0.18) \times 10^{-43}$ cm$^2$/fission. This observation rejects the hypothesis of a constant antineutrino flux as a function of the $^{239}$Pu fission fraction at 10 standard deviations. The variation in IBD yield is found to be energy dependent, rejecting the hypothesis of a constant antineutrino energy spectrum at 5.1 standard deviations. While measurements of the evolution in the IBD spectrum show general agreement with predictions from recent reactor models, the measured evolution in total IBD yield disagrees with recent predictions at 3.1$\sigma$. This discrepancy indicates that an overall deficit in the measured flux with respect to predictions does not result from equal fractional deficits from the primary fission isotopes $^{235}$U, $^{239}$Pu, $^{238}$U, and $^{241}$Pu. Based on measured IBD yield variations, yields of $(6.17 \pm 0.17)$ and $(4.27 \pm 0.26) \times 10^{-43}$ cm$^2$/fission have been determined for the two dominant fission parent isotopes $^{235}$U and $^{239}$Pu. A 7.8% discrepancy between the observed and predicted $^{235}$U yields suggests that this isotope may be the primary contributor to the reactor antineutrino anomaly.
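
As a rough illustration of the size of the fuel-dependent effect, the central values quoted in the abstract above can be combined in a back-of-the-envelope estimate. This assumes the yield varies linearly across the whole quoted range of $F_{239}$ and uses the average yield as the reference value; uncertainties are ignored, and the result is not a figure reported in the abstract.

$$
\Delta\sigma_f \approx \frac{d\sigma_f}{dF_{239}} \, \Delta F_{239} = \left( -1.86 \times 10^{-43} \right) \left( 0.35 - 0.25 \right) = -0.186 \times 10^{-43}\ \mathrm{cm^2/fission}
$$

$$
\frac{\Delta\sigma_f}{\bar{\sigma}_f} \approx \frac{-0.186}{5.90} \approx -3.2\%
$$

Under these assumptions, the IBD yield falls by roughly 3% as the $^{239}$Pu fission fraction rises from 0.25 to 0.35.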

Journal ArticleDOI
TL;DR: This paper explored the use of deep convolutional neural network methodology for the automatic classification of diabetic retinopathy using color fundus image, and obtained an accuracy of 94.5% on the authors' dataset, outperforming the results obtained by using classical approaches.
Abstract: The automatic detection of diabetic retinopathy is of vital importance, as it is the main cause of irreversible vision loss in the working-age population in the developed world. The early detection of diabetic retinopathy occurrence can be very helpful for clinical treatment; although several different feature extraction approaches have been proposed, the classification task for retinal images is still tedious even for those trained clinicians. Recently, deep convolutional neural networks have manifested superior performance in image classification compared to previous handcrafted feature-based image classification methods. Thus, in this paper, we explored the use of deep convolutional neural network methodology for the automatic classification of diabetic retinopathy using color fundus image, and obtained an accuracy of 94.5% on our dataset, outperforming the results obtained by using classical approaches.

Proceedings ArticleDOI
18 May 2017
TL;DR: Using the object proposals generated by Faster R-CNN as the guard windows of the CFAR algorithm, this method picks up small-sized targets by reevaluating the bounding boxes that have relatively low classification scores in the detection network, improving detection performance.
Abstract: SAR ship detection is essential to marine monitoring. Recently, with the development of deep neural networks and the rapid growth in available SAR images, SAR ship detection based on deep neural networks has become a trend. However, the multi-scale ships in SAR images cause undesirable differences in features, which decrease the accuracy of ship detection based on deep learning methods. To address this problem, this paper modifies Faster R-CNN, a state-of-the-art object detection network, with the traditional constant false alarm rate (CFAR) detector. Using the object proposals generated by Faster R-CNN as the guard windows of the CFAR algorithm, this method picks up the small-sized targets. By reevaluating the bounding boxes that have relatively low classification scores in the detection network, this method achieves better detection performance.
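
To make the CFAR step in the abstract above concrete, here is a minimal one-dimensional cell-averaging CFAR sketch. It is a generic textbook detector, not the paper's pipeline: the paper takes its guard windows from Faster R-CNN proposals on 2-D SAR images, whereas this example uses fixed guard and training cells on a 1-D power profile. The function name `ca_cfar` and all parameter values are hypothetical, and the threshold factor assumes exponentially distributed (square-law detected) noise.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Return indices whose power exceeds a cell-averaging CFAR threshold.

    num_train: total number of training cells (split evenly on both sides).
    num_guard: guard cells on EACH side of the cell under test.
    pfa: desired probability of false alarm.
    """
    if num_train % 2 != 0:
        raise ValueError("num_train must be even")
    half_train = num_train // 2
    # Threshold factor for CA-CFAR under exponentially distributed noise power.
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    margin = half_train + num_guard
    detections = []
    for i in range(margin, len(power) - margin):
        # Training cells sit outside the guard cells on both sides.
        left = power[i - margin : i - num_guard]
        right = power[i + num_guard + 1 : i + margin + 1]
        noise_level = (sum(left) + sum(right)) / num_train
        if power[i] > alpha * noise_level:
            detections.append(i)
    return detections


power = [1.0] * 40   # flat background power
power[20] = 100.0    # one strong target cell
hits = ca_cfar(power)
```

The guard cells keep energy from the target itself out of the noise estimate. Because the threshold scales with the local noise level, the detector can adapt to backgrounds of different strength, which is a plausible reason to pair it with proposals that received low classification scores.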

Journal ArticleDOI
TL;DR: Fang et al. fabricate one- and two-dimensional nonlinear acoustic metamaterials with a broadband, low-frequency response, greatly suppressing low-frequency noise.
Abstract: Linear acoustic metamaterials (LAMs) are widely used to manipulate sound; however, it is challenging to obtain bandgaps with a generalized width (ratio of the bandgap width to its start frequency) >1 through linear mechanisms. Here we adopt both theoretical and experimental approaches to describe the nonlinear chaotic mechanism in both one-dimensional (1D) and two-dimensional (2D) nonlinear acoustic metamaterials (NAMs). This mechanism enables NAMs to reduce wave transmissions by as much as 20–40 dB in an ultra-low and ultra-broad band that consists of bandgaps and chaotic bands. With subwavelength cells, the generalized width reaches 21 in a 1D NAM and it goes up to 39 in a 2D NAM, which overcomes the bandwidth limit for wave suppression in current LAMs. This work enables further progress in elucidating the dynamics of NAMs and opens new avenues in double-ultra acoustic manipulation.
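The generalized width defined in this abstract (bandgap width divided by its start frequency) can be made concrete with a small calculation. The function name and the example frequencies below are hypothetical and are not taken from the paper; they only show what a generalized width of 21 means numerically.

```python
def generalized_width(f_start_hz, f_end_hz):
    """Ratio of bandgap width to its start frequency (dimensionless)."""
    if f_start_hz <= 0 or f_end_hz < f_start_hz:
        raise ValueError("expected 0 < f_start_hz <= f_end_hz")
    return (f_end_hz - f_start_hz) / f_start_hz

# Hypothetical band from 100 Hz to 2200 Hz: width 2100 Hz, start 100 Hz.
gamma = generalized_width(100.0, 2200.0)  # 21.0
```

Under this definition, a band that starts at a lower frequency needs far less absolute width to reach the same generalized width, which is why the measure emphasizes low-frequency, broadband suppression.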

Posted Content
TL;DR: In this paper, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency.
Abstract: In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed LRTR method outperforms other denoising algorithms on real corrupted hyperspectral data and can preserve the global structure of HSIs and simultaneously remove Gaussian noise and sparse noise.
Abstract: This paper studies the hyperspectral image (HSI) denoising problem under the assumption that the signal is low in rank. In this paper, a mixture of Gaussian noise and sparse noise is considered. The sparse noise includes stripes, impulse noise, and dead pixels. The denoising task is formulated as a low-rank tensor recovery (LRTR) problem from Gaussian noise and sparse noise. Traditional low-rank tensor decomposition methods are generally NP-hard to compute. Besides, these tensor decomposition based methods are sensitive to sparse noise. In contrast, the proposed LRTR method can preserve the global structure of HSIs and simultaneously remove Gaussian noise and sparse noise. The proposed method is based on a new tensor singular value decomposition and tensor nuclear norm. The NP-hard tensor recovery task is well accomplished by polynomial time algorithms. The convergence of the algorithm and the parameter settings are also described in detail. Preliminary numerical experiments have demonstrated that the proposed method is effective for low-rank tensor recovery from Gaussian noise and sparse noise. Experimental results also show that the proposed LRTR method outperforms other denoising algorithms on real corrupted hyperspectral data.
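To give a feel for what a tensor singular value decomposition and tensor nuclear norm can look like in practice, here is a minimal sketch of singular value thresholding under one common Fourier-domain formulation of the tensor SVD. The abstract does not spell out the paper's definitions, so the formulation, the function name `tensor_svt`, and the threshold `tau` are assumptions; the full recovery algorithm (including the sparse-noise term) is not shown.

```python
import numpy as np

def tensor_svt(X, tau):
    """Singular value thresholding for a 3-way tensor (illustrative).

    Assumes a Fourier-domain tensor SVD: transform along the third
    mode, shrink the singular values of every frontal slice by tau,
    then transform back.
    """
    Xf = np.fft.fft(X, axis=2)
    Yf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)   # soft-threshold singular values
        Yf[:, :, k] = (U * s) @ Vh     # rebuild the low-rank slice
    # For real input the result is real up to rounding error.
    return np.real(np.fft.ifft(Yf, axis=2))

# Usage: a small random tensor standing in for an HSI cube.
rng = np.random.default_rng(0)
cube = rng.standard_normal((3, 4, 5))
smoothed = tensor_svt(cube, tau=0.5)
```

Each call costs one FFT pass plus one matrix SVD per slice, which illustrates how a step of this kind stays within polynomial time.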

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work introduces a novel bridge between the modality-specific representations by creating a co-embedding space based on a recurrent residual fusion (RRF) block that adapts the recurrent mechanism to residual learning, so that it can recursively improve feature embeddings while retaining the shared parameters.
Abstract: A major challenge in matching between vision and language is that they typically have completely different features and representations. In this work, we introduce a novel bridge between the modality-specific representations by creating a co-embedding space based on a recurrent residual fusion (RRF) block. Specifically, RRF adapts the recurrent mechanism to residual learning, so that it can recursively improve feature embeddings while retaining the shared parameters. Then, a fusion module is used to integrate the intermediate recurrent outputs and generate a more powerful representation. In the matching network, RRF acts as a feature enhancement component that gathers visual and textual representations into a more discriminative embedding space, which narrows the cross-modal gap between vision and language. Moreover, we employ a bi-rank loss function to enforce separability of the two modalities in the embedding space. In the experiments, we evaluate the proposed RRF-Net using two multi-modal datasets where it achieves state-of-the-art results.
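The recurrent residual idea described in this abstract (one residual update applied repeatedly with shared parameters, followed by fusion of the intermediate outputs) can be sketched in a few lines. Everything specific here is an assumption for illustration: the ReLU nonlinearity, the number of steps, and fusion by concatenation followed by a linear map are not stated in the abstract, and the names `W`, `b`, and `W_fuse` are hypothetical.

```python
import numpy as np

def recurrent_residual_fusion(x, W, b, W_fuse, steps=3):
    """Illustrative recurrent residual block with output fusion.

    The same weights (W, b) are reused at every step, so the depth of
    refinement grows without adding parameters.
    """
    outputs = [x]
    h = x
    for _ in range(steps):
        h = h + np.maximum(W @ h + b, 0.0)  # residual update, shared weights
        outputs.append(h)
    # Assumed fusion: concatenate all intermediate outputs, then project.
    return W_fuse @ np.concatenate(outputs)

# Usage with a 4-dimensional embedding and 3 recurrent steps.
rng = np.random.default_rng(0)
d, steps = 4, 3
x = rng.standard_normal(d)
W = 0.1 * rng.standard_normal((d, d))
b = np.zeros(d)
W_fuse = rng.standard_normal((d, d * (steps + 1)))
fused = recurrent_residual_fusion(x, W, b, W_fuse, steps=steps)
```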

Journal ArticleDOI
TL;DR: To accurately extract vehicle-like targets, an accurate-vehicle-proposal-network (AVPN) based on hyper feature map which combines hierarchical feature maps that are more accurate for small object detection is developed and a coupled R-CNN method is proposed, which combines an AVPN and a vehicle attribute learning network to extract the vehicle's location and attributes simultaneously.
Abstract: Vehicle detection in aerial images, being an interesting but challenging problem, plays an important role in a wide range of applications. Traditional methods are based on sliding-window search and handcrafted or shallow-learning-based features with heavy computational costs and limited representation power. Recently, deep learning algorithms, especially region-based convolutional neural networks (R-CNNs), have achieved state-of-the-art detection performance in computer vision. However, several challenges limit the applications of R-CNNs in vehicle detection from aerial images: 1) vehicles in large-scale aerial images are relatively small in size, and R-CNNs have poor localization performance with small objects; 2) R-CNNs are particularly designed for detecting the bounding box of the targets without extracting attributes; 3) manual annotation is generally expensive, and the available manual annotations of vehicles for training R-CNNs are not sufficient in number. To address these problems, this paper proposes a fast and accurate vehicle detection framework. On one hand, to accurately extract vehicle-like targets, we develop an accurate-vehicle-proposal-network (AVPN) based on a hyper feature map that combines hierarchical feature maps and is more accurate for small object detection. On the other hand, we propose a coupled R-CNN method, which combines an AVPN and a vehicle attribute learning network to extract the vehicle's location and attributes simultaneously. For original large-scale aerial images with limited manual annotations, we use cropped image blocks for training with data augmentation to avoid overfitting. Comprehensive evaluations on the public Munich vehicle dataset and the collected vehicle dataset demonstrate the accuracy and effectiveness of the proposed method.

Journal ArticleDOI
06 Jul 2017-Nature
TL;DR: The results suggest that Hrd1 forms a retro-translocation channel for the movement of misfolded polypeptides through the endoplasmic reticulum membrane.
Abstract: Misfolded endoplasmic reticulum proteins are retro-translocated through the membrane into the cytosol, where they are poly-ubiquitinated, extracted from the membrane, and degraded by the proteasome, a pathway termed endoplasmic reticulum-associated protein degradation (ERAD). Proteins with misfolded domains in the endoplasmic reticulum lumen or membrane are discarded through the ERAD-L and ERAD-M pathways, respectively. In Saccharomyces cerevisiae, both pathways require the ubiquitin ligase Hrd1, a multi-spanning membrane protein with a cytosolic RING finger domain. Hrd1 is the crucial membrane component for retro-translocation, but it is unclear whether it forms a protein-conducting channel. Here we present a cryo-electron microscopy structure of S. cerevisiae Hrd1 in complex with its endoplasmic reticulum luminal binding partner, Hrd3. Hrd1 forms a dimer within the membrane with one or two Hrd3 molecules associated at its luminal side. Each Hrd1 molecule has eight transmembrane segments, five of which form an aqueous cavity extending from the cytosol almost to the endoplasmic reticulum lumen, while a segment of the neighbouring Hrd1 molecule forms a lateral seal. The aqueous cavity and lateral gate are reminiscent of features of protein-conducting conduits that facilitate polypeptide movement in the opposite direction, from the cytosol into or across membranes. Our results suggest that Hrd1 forms a retro-translocation channel for the movement of misfolded polypeptides through the endoplasmic reticulum membrane.

Journal ArticleDOI
TL;DR: A heuristic approach to controlling the iteration number in the fusion process of a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures is proposed.
Abstract: Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations may address this problem from different aspects, since visual data objects described by multiple features can be decomposed into multiple views that often provide complementary information. In this paper, we propose a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting a graph structure of data samples, where an input similarity matrix can be improved through the propagation of a graph random walk. In particular, we construct multiple graphs, each corresponding to an individual view, and present a cross-view fusion approach based on graph random walk to derive an optimal distance measure by fusing multiple metrics. Our method scales to a large amount of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are incorporated into the graph random walk to balance the views. However, such a strategy may lead to an over-smooth similarity metric, where affinities between dissimilar samples may be enlarged by excessive cross-view fusion. Thus, we devise a heuristic approach to controlling the iteration number in the fusion process in order to avoid over-smoothing. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
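The general flavor of fusing per-view similarity matrices through random-walk propagation, with a capped iteration count, can be sketched as follows. This is a generic cross-diffusion scheme written under stated assumptions, not the paper's algorithm: the update rule, the fixed (rather than learned) view weights, the dense matrices in place of anchor graphs, and the names `cross_view_fusion` and `num_iters` are all hypothetical. It expects at least two views with strictly positive similarities.

```python
import numpy as np

def row_normalize(S):
    """Turn a nonnegative similarity matrix into a row-stochastic one."""
    return S / S.sum(axis=1, keepdims=True)

def cross_view_fusion(similarities, weights=None, num_iters=5):
    """Illustrative cross-view diffusion over per-view similarity graphs."""
    views = [row_normalize(np.asarray(S, dtype=float)) for S in similarities]
    m = len(views)
    if weights is None:
        weights = np.full(m, 1.0 / m)
    P = [v.copy() for v in views]  # fixed random-walk transition matrices
    for _ in range(num_iters):     # capped to limit over-smoothing
        updated = []
        for v in range(m):
            # Weighted average of the other views' current similarities.
            w_sum = sum(weights[u] for u in range(m) if u != v)
            mix = sum(weights[u] * views[u] for u in range(m) if u != v) / w_sum
            # Propagate the other views' evidence through this view's graph.
            updated.append(row_normalize(P[v] @ mix @ P[v].T))
        views = updated
    return sum(w * S for w, S in zip(weights, views))

# Usage: two hypothetical 3-sample views with positive similarities.
S1 = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S2 = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.1], [0.3, 0.1, 1.0]])
fused = cross_view_fusion([S1, S2], num_iters=3)
```

Raising `num_iters` pushes all rows toward a common distribution, which is the over-smoothing effect the abstract warns about and the reason a stopping heuristic matters.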