Showing papers on "Tree (data structure)" published in 2019


Journal ArticleDOI
TL;DR: The current version of iTOL v4 introduces four new dataset types, together with numerous new features, and is the first tool which supports direct visualization of Qiime 2 trees and associated annotations.
Abstract: The Interactive Tree Of Life (https://itol.embl.de) is an online tool for the display, manipulation and annotation of phylogenetic and other trees. It is freely available and open to everyone. The current version introduces four new dataset types, together with numerous new features. Annotation options have been expanded and new control options added for many display elements. An interactive spreadsheet-like editor has been implemented, providing dataset creation and editing directly in the web interface. Font support has been rewritten with full support for UTF-8 character encoding throughout the user interface. Google Web Fonts are now fully supported in the tree text labels. iTOL v4 is the first tool which supports direct visualization of Qiime 2 trees and associated annotations. The user account system has been streamlined and expanded with new navigation options, and currently handles >700 000 trees from more than 40 000 individual users. Full batch access has been implemented allowing programmatic upload and export of trees and annotations.

4,233 citations


Journal ArticleDOI
TL;DR: RAxML-NG is presented, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML, which offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML.
Abstract: Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. Availability and implementation: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. Supplementary information: Supplementary data are available at Bioinformatics online.

1,765 citations


Journal ArticleDOI
TL;DR: The new version of Evolview was designed to provide simple tree uploads, manipulation and viewing options with additional annotation types, and the ‘dataset system’ used for visualizing tree information has evolved substantially from the previous version.
Abstract: Evolview is an interactive tree visualization tool designed to help researchers visualize phylogenetic trees and annotate them with additional information. It offers users a platform to upload trees in the most common tree formats, such as Newick/Phylip, Nexus, Nhx and PhyloXML, and provides a range of visualization options, using fifteen types of custom annotation datasets. The new version of Evolview was designed to provide simple tree uploads, manipulation and viewing options with additional annotation types. The 'dataset system' used for visualizing tree information has evolved substantially from the previous version, and the user can draw on a wide range of additional example visualizations. Developments since the last public release include a complete redesign of the user interface, new annotation dataset types, additional tree visualization styles, full-text search of the documentation, and some backend updates. The project management aspect of Evolview was also updated, with a unified approach to tree and project management and sharing. Evolview is freely available at: https://www.evolgenius.info/evolview/.

436 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: A hybrid learning procedure is developed which integrates end-task supervised learning and the tree structure reinforcement learning, where the former's evaluation result serves as a self-critic for the latter's structure exploration.
Abstract: We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual QA. The dynamic structure varies from image to image and task to task, allowing more content-/task-specific message passing among objects. To construct a VCTree, we design a score function that calculates the task-dependent validity between each object pair, and the tree is the binary version of the maximum spanning tree from the score matrix. Then, visual contexts are encoded by bidirectional TreeLSTM and decoded by task-specific models. We develop a hybrid learning procedure which integrates end-task supervised learning and the tree structure reinforcement learning, where the former's evaluation result serves as a self-critic for the latter's structure exploration. Experimental results on two benchmarks, which require reasoning over contexts: Visual Genome for scene graph generation and VQA2.0 for visual Q&A, show that VCTree outperforms state-of-the-art results while discovering interpretable visual context structures.

346 citations
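
The VCTree abstract above derives its tree from a maximum spanning tree over a pairwise score matrix. A minimal sketch of that single step, using Prim's algorithm on a small symmetric matrix, might look as follows. The function name `maximum_spanning_tree` and the toy scores are illustrative assumptions, and the conversion to a binary tree mentioned in the abstract is not shown.

```python
def maximum_spanning_tree(scores, root=0):
    """Return (parent, child) edges of a maximum spanning tree via Prim's algorithm.

    `scores` is a symmetric square matrix (list of lists) of pairwise scores.
    """
    n = len(scores)
    in_tree = {root}
    edges = []
    while len(in_tree) < n:
        best = None
        # Look for the highest-scoring edge that leaves the current tree.
        for u in in_tree:
            for v in range(n):
                if v in in_tree:
                    continue
                if best is None or scores[u][v] > best[0]:
                    best = (scores[u][v], u, v)
        _, u, v = best
        in_tree.add(v)
        edges.append((u, v))
    return edges


# Toy example: four objects with hypothetical pairwise validity scores.
toy_scores = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.8, 0.3],
    [0.2, 0.8, 0.0, 0.7],
    [0.1, 0.3, 0.7, 0.0],
]
tree_edges = maximum_spanning_tree(toy_scores)
```

With these toy scores the tree is a chain through objects 0, 1, 2 and 3, because each step attaches the object with the strongest link to the part already built.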


Journal ArticleDOI
TL;DR: A multilayer convolutional neural network (MCNN) is proposed for the classification of Mango leaves infected by the Anthracnose fungal disease, and the results indicate a higher classification accuracy for the proposed MCNN model than for other state-of-the-art approaches.
Abstract: Fungal diseases not only reduce the economic importance of plants and their products but also diminish their ecological prominence. The Mango tree, specifically its fruits and leaves, is highly affected by the fungal disease Anthracnose. The main aim of this paper is to develop an appropriate and effective method for diagnosing the disease and its symptoms, and thereby to provide a suitable system for an early and cost-effective solution to this problem. Over the last few years, computer vision and deep learning methodologies have gained popularity in the classification of assorted fungal diseases because of their higher performance in terms of computation and accuracy. Therefore, in this paper, a multilayer convolutional neural network (MCNN) is proposed for the classification of Mango leaves infected by the Anthracnose fungal disease. The method is validated on a real-time dataset captured at the Shri Mata Vaishno Devi University, Katra, J&K, India, consisting of 1070 images of Mango tree leaves. The dataset contains both healthy and infected leaf images. The results indicate a higher classification accuracy for the proposed MCNN model than for other state-of-the-art approaches.

193 citations


Posted Content
TL;DR: Improvements to the interpretability of tree-based models through the first polynomial time algorithm to compute optimal explanations based on game theory, and a new type of explanation that directly measures local feature interaction effects.
Abstract: Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

186 citations
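
Game-theoretic explanations of the kind described in the abstract above are commonly formalized as Shapley values. The sketch below computes them exactly by enumerating every feature subset, which costs exponential time; it is a brute-force illustration and not the polynomial-time tree algorithm the abstract refers to. Replacing absent features with a fixed `baseline` vector is an assumption of this sketch, and `toy_tree_model` is a hand-written stand-in for a trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset (exponential cost).

    Features outside a subset are replaced by `baseline` values, which is one
    simple (assumed) way to define the value of a coalition.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_tree_model(features):
    """A hand-written depth-2 decision tree standing in for a trained model."""
    if features[0] > 0.5:
        return 10.0 if features[1] > 0.5 else 4.0
    return 1.0


x = [1.0, 1.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_tree_model, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the efficiency property that makes Shapley values attractive as explanations.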


Journal ArticleDOI
TL;DR: This semi-supervised deep learning approach demonstrates that remote sensing can overcome a lack of labeled training data by generating noisy data for initial training using unsupervised methods and retraining the resulting models with high quality labeled data.
Abstract: Remote sensing can transform the speed, scale, and cost of biodiversity and forestry surveys. Data acquisition currently outpaces the ability to identify individual organisms in high resolution imagery. We outline an approach for identifying tree-crowns in RGB imagery while using a semi-supervised deep learning detection network. Individual crown delineation has been a long-standing challenge in remote sensing and available algorithms produce mixed results. We show that deep learning models can leverage existing Light Detection and Ranging (LIDAR)-based unsupervised delineation to generate trees that are used for training an initial RGB crown detection model. Despite limitations in the original unsupervised detection approach, this noisy training data may contain information from which the neural network can learn initial tree features. We then refine the initial model using a small number of higher-quality hand-annotated RGB images. We validate our proposed approach while using an open-canopy site in the National Ecological Observation Network. Our results show that a model using 434,551 self-generated trees with the addition of 2848 hand-annotated trees yields accurate predictions in natural landscapes. Using an intersection-over-union threshold of 0.5, the full model had an average tree crown recall of 0.69, with a precision of 0.61 for the visually-annotated data. The model had an average tree detection rate of 0.82 for the field collected stems. The addition of a small number of hand-annotated trees improved the performance over the initial self-supervised model. This semi-supervised deep learning approach demonstrates that remote sensing can overcome a lack of labeled training data by generating noisy data for initial training using unsupervised methods and retraining the resulting models with high quality labeled data.

157 citations
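
The recall and precision figures in the abstract above are reported at an intersection-over-union (IoU) threshold of 0.5. A minimal sketch of how such figures can be computed for axis-aligned boxes is given below. The greedy one-to-one matching and the `(xmin, ymin, xmax, ymax)` box layout are assumptions of this sketch; the paper's exact evaluation protocol may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, annotated, threshold=0.5):
    """Greedy one-to-one matching of predictions to annotations at an IoU threshold."""
    unmatched = list(annotated)
    true_positives = 0
    for pred in predicted:
        # Best still-unmatched annotation for this prediction, if any remain.
        best = max(unmatched, key=lambda truth: iou(pred, truth), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical crowns: two annotated, three predicted (one is a false alarm).
annotated_crowns = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted_crowns = [(1, 1, 10, 10), (50, 50, 60, 60), (21, 20, 30, 30)]
precision, recall = precision_recall(predicted_crowns, annotated_crowns)
```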


Journal ArticleDOI
TL;DR: In this paper, the reliability and robustness of tree height observations obtained via a conventional field inventory, airborne laser scanning (ALS) and terrestrial laser scanner (TLS) were investigated.
Abstract: Quantitative comparisons of tree height observations from different sources are scarce due to the difficulties in effective sampling. In this study, the reliability and robustness of tree height observations obtained via a conventional field inventory, airborne laser scanning (ALS) and terrestrial laser scanning (TLS) were investigated. A carefully designed non-destructive experiment was conducted that included 1174 individual trees in 18 sample plots (32 m × 32 m) in a Scandinavian boreal forest. The point density of the ALS data was approximately 450 points/m². The TLS data were acquired with multi-scans from the center and the four quadrant directions of the sample plots. Both the ALS and TLS data represented the cutting edge point cloud products. Tree heights were manually measured from the ALS and TLS point clouds with the aid of existing tree maps. Therefore, the evaluation results revealed the capacities of the applied laser scanning (LS) data while excluding the influence of data processing approach such as the individual tree detection. The reliability and robustness of different tree height sources were evaluated through a cross-comparison of the ALS-, TLS-, and field- based tree heights. Compared to ALS and TLS, field measurements were more sensitive to stand complexity, crown classes, and species. Overall, field measurements tend to overestimate height of tall trees, especially tall trees in codominant crown class. In dense stands, high uncertainties also exist in the field measured heights for small trees in intermediate and suppressed crown class. The ALS-based tree height estimates were robust across all stand conditions. The taller the tree, the more reliable was the ALS-based tree height. The highest uncertainty in ALS-based tree heights came from trees in intermediate crown class, due to the difficulty of identifying treetops. When using TLS, reliable tree heights can be expected for trees lower than 15–20 m in height, depending on the complexity of forest stands. The advantage of LS systems was the robustness of the geometric accuracy of the data. The greatest challenges of the LS techniques in measuring individual tree heights lie in the occlusion effects, which lead to omissions of trees in intermediate and suppressed crown classes in ALS data and incomplete crowns of tall trees in TLS data.

154 citations


Journal ArticleDOI
TL;DR: This study transmutes the programs’ OpCodes into a vector space and employs fuzzy and fast fuzzy pattern tree methods for malware detection and categorization, obtaining a high degree of accuracy with reasonable run-times, especially for the fast fuzzy pattern tree.

151 citations


Proceedings ArticleDOI
01 Aug 2019
TL;DR: A tree-structured neural model that generates an expression tree in a goal-driven manner is proposed, and experimental results on the Math23K dataset show that the model significantly outperforms several state-of-the-art models.
Abstract: Most existing neural models for math word problems exploit Seq2Seq model to generate solution expressions sequentially from left to right, whose results are far from satisfactory due to the lack of goal-driven mechanism commonly seen in human problem solving. This paper proposes a tree-structured neural model to generate expression tree in a goal-driven manner. Given a math word problem, the model first identifies and encodes its goal to achieve, and then the goal gets decomposed into sub-goals combined by an operator in a top-down recursive way. The whole process is repeated until the goal is simple enough to be realized by a known quantity as leaf node. During the process, two-layer gated-feedforward networks are designed to implement each step of goal decomposition, and a recursive neural network is used to encode fulfilled subtrees into subtree embeddings, which provides a better representation of subtrees than the simple goals of subtrees. Experimental results on the dataset Math23K have shown that our tree-structured model outperforms significantly several state-of-the-art models.

146 citations
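
The top-down decomposition described in the abstract above yields an expression tree whose leaves are known quantities. The sketch below shows only the data structure side of that idea: it rebuilds a tree from a pre-order (prefix) token list and evaluates it bottom-up. It does not model the neural goal decomposition, and the token format is an assumption made for illustration.

```python
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def build_tree(tokens):
    """Consume a prefix (pre-order) token list and return a nested tuple tree."""
    token = tokens.pop(0)
    if token in OPERATORS:
        left = build_tree(tokens)   # first sub-goal
        right = build_tree(tokens)  # second sub-goal
        return (token, left, right)
    return float(token)  # a known quantity becomes a leaf


def evaluate(node):
    """Evaluate the expression tree bottom-up."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPERATORS[op](evaluate(left), evaluate(right))
    return node


# (3 + 5) * 2 written in prefix order.
expression_tree = build_tree(["*", "+", "3", "5", "2"])
result = evaluate(expression_tree)
```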


Journal ArticleDOI
TL;DR: This work proposes the utilization of unmanned aerial vehicles (UAVs) to collect data in dense wireless sensor networks using projection-based compressive data gathering (CDG) as a novel solution methodology and proposes a set of effective algorithms to generate solutions for relatively large-scale network scenarios.
Abstract: Fifth generation wireless networks are expected to provide advanced capabilities and create new markets. Among the emerging markets, Internet of Things (IoT) use cases are standing out with the proliferation of a wide range of sensors that can be configured to continuously monitor and transmit data for intelligent processing and decision making. Devices in such scenarios are normally extremely energy-constrained and often exist in large numbers and can be located in hard-to-reach areas; the fact that necessitates the design and implementation of effective energy-aware data collection mechanisms. To this end, we propose the utilization of unmanned aerial vehicles (UAVs) to collect data in dense wireless sensor networks using projection-based compressive data gathering (CDG) as a novel solution methodology. CDG is utilized to aggregate data en-route from a large set of sensor nodes to selected projection nodes acting as cluster heads (CHs) in order to reduce the number of needed transmissions leading to notable energy savings and extended network lifetime. The UAV transfers the gathered data from the CHs to a remote sink node, e.g., a 5G cellular base station, which avoids the need for long range transmissions or multihop communications among the sensors. Our problem definition aims at clustering the sensors, constructing an optimized forwarding tree per cluster, and gathering the data from selected CH nodes based on projection-based CDG with minimized UAV trajectory distance. We formulate a joint optimization problem and divide it into four complementary subproblems to generate close-to-optimal results with lower complexity. Moreover, we propose a set of effective algorithms to generate solutions for relatively large-scale network scenarios. We demonstrate the superiority of the proposed approach and the designed algorithms via detailed performance results with analysis, comparisons, and insights.

Journal ArticleDOI
TL;DR: In this paper, the interpretable trees (inTrees) framework extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions.
Abstract: Tree ensembles such as random forests and boosted trees are accurate but difficult to understand. In this work, we provide the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions. The inTrees framework can be applied to multiple types of tree ensembles, e.g., random forests, regularized random forests, and boosted trees. We implemented the inTrees algorithms in the “inTrees” R package.
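
The rule extraction step mentioned in the abstract above can be pictured as turning every root-to-leaf path of a tree into one conjunctive rule. The sketch below does this for a single hand-written tree stored as nested dictionaries. The dictionary layout and the function name `extract_rules` are assumptions for illustration and are not the API of the inTrees R package; measuring, pruning, and selecting rules are not shown.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, prediction) rule per root-to-leaf path."""
    if "prediction" in node:
        return [(list(conditions), node["prediction"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical decision tree with two splits and three leaves.
toy_decision_tree = {
    "feature": "age",
    "threshold": 50,
    "left": {"prediction": "low"},
    "right": {
        "feature": "bmi",
        "threshold": 30,
        "left": {"prediction": "medium"},
        "right": {"prediction": "high"},
    },
}
rules = extract_rules(toy_decision_tree)
```

Applying the same function to every tree of an ensemble and pooling the results would give the raw rule set that later steps could score and prune.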

Journal ArticleDOI
TL;DR: In this paper, a visual analytic system aiming at interpreting random forest models and predictions is proposed, which eventually reflects the working mechanism of the model and reduces users' mental burden of interpretation.
Abstract: As an ensemble model that consists of many independent decision trees, random forests generate predictions by feeding the input to internal trees and summarizing their outputs. The ensemble nature of the model helps random forests outperform any individual decision tree. However, it also leads to a poor model interpretability, which significantly hinders the model from being used in fields that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection. The interpretation challenges stem from the variety and complexity of the contained decision trees. Each decision tree has its unique structure and properties, such as the features used in the tree and the feature threshold in each tree node. Thus, a data input may lead to a variety of decision paths. To understand how a final prediction is achieved, it is desired to understand and compare all decision paths in the context of all tree structures, which is a huge challenge for any users. In this paper, we propose a visual analytic system aiming at interpreting random forest models and predictions. In addition to providing users with all the tree information, we summarize the decision paths in random forests, which eventually reflects the working mechanism of the model and reduces users' mental burden of interpretation. To demonstrate the effectiveness of our system, two usage scenarios and a qualitative user study are conducted.

Journal ArticleDOI
TL;DR: PastML was applied to the phylogeography of Dengue serotype 2 (DENV2), and the evolution of drug resistances in a large HIV data set, and showed the uncertainty of the human-sylvatic DENV2 geographic origin.
Abstract: The reconstruction of ancestral scenarios is widely used to study the evolution of characters along phylogenetic trees. One commonly uses the marginal posterior probabilities of the character states, or the joint reconstruction of the most likely scenario. However, marginal reconstructions provide users with state probabilities, which are difficult to interpret and visualize, whereas joint reconstructions select a unique state for every tree node and thus do not reflect the uncertainty of inferences. We propose a simple and fast approach, which is in between these two extremes. We use decision-theory concepts (namely, the Brier score) to associate each node in the tree to a set of likely states. A unique state is predicted in tree regions with low uncertainty, whereas several states are predicted in uncertain regions, typically around the tree root. To visualize the results, we cluster the neighboring nodes associated with the same states and use graph visualization tools. The method is implemented in the PastML program and web server. The results on simulated data demonstrate the accuracy and robustness of the approach. PastML was applied to the phylogeography of Dengue serotype 2 (DENV2), and the evolution of drug resistances in a large HIV data set. These analyses took a few minutes and provided convincing results. PastML retrieved the main transmission routes of human DENV2 and showed the uncertainty of the human-sylvatic DENV2 geographic origin. With HIV, the results show that resistance mutations mostly emerge independently under treatment pressure, but resistance clusters are found, corresponding to transmissions among untreated patients.
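
One way to read the Brier-score idea in the abstract above is sketched below: for a node with given marginal state probabilities, try predicting the k most probable states with equal weight and keep the k that gives the smallest squared error against the marginals. Using a uniform distribution over the top k states as the candidate prediction is an assumption of this sketch and is not stated in the abstract; the function name and the toy probabilities are likewise hypothetical.

```python
def likely_states(marginals):
    """Pick the state set whose uniform distribution is closest (Brier score) to the marginals.

    `marginals` maps each state name to its marginal posterior probability.
    """
    ranked = sorted(marginals.items(), key=lambda item: item[1], reverse=True)
    best_k, best_score = 1, float("inf")
    for k in range(1, len(ranked) + 1):
        # Squared error for the k selected states (predicted 1/k each) ...
        score = sum((p - 1.0 / k) ** 2 for _, p in ranked[:k])
        # ... plus squared error for the excluded states (predicted 0).
        score += sum(p ** 2 for _, p in ranked[k:])
        if score < best_score:
            best_k, best_score = k, score
    return {state for state, _ in ranked[:best_k]}


# A confident node keeps one state; an uncertain node keeps several.
confident = likely_states({"Asia": 0.90, "Africa": 0.07, "America": 0.03})
uncertain = likely_states({"Asia": 0.45, "Africa": 0.45, "America": 0.10})
```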

Journal ArticleDOI
TL;DR: This paper scaffolds the neural network’s architecture around a leading-order model of the physics underlying the data, and calls the framework Junipr: “Jets from UNsupervised Interpretable PRobabilistic models”.
Abstract: In applications of machine learning to particle physics, a persistent challenge is how to go beyond discrimination to learn about the underlying physics. To this end, a powerful tool would be a framework for unsupervised learning, where the machine learns the intricate high-dimensional contours of the data upon which it is trained, without reference to pre-established labels. In order to approach such a complex task, an unsupervised network must be structured intelligently, based on a qualitative understanding of the data. In this paper, we scaffold the neural network’s architecture around a leading-order model of the physics underlying the data. In addition to making unsupervised learning tractable, this design actually alleviates existing tensions between performance and interpretability. We call the framework Junipr: “Jets from UNsupervised Interpretable PRobabilistic models”. In this approach, the set of particle momenta composing a jet are clustered into a binary tree that the neural network examines sequentially. Training is unsupervised and unrestricted: the network could decide that the data bears little correspondence to the chosen tree structure. However, when there is a correspondence, the network’s output along the tree has a direct physical interpretation. Junipr models can perform discrimination tasks, through the statistically optimal likelihood-ratio test, and they permit visualizations of discrimination power at each branching in a jet’s tree. Additionally, Junipr models provide a probability distribution from which events can be drawn, providing a data-driven Monte Carlo generator. As a third application, Junipr models can reweight events from one (e.g. simulated) data set to agree with distributions from another (e.g. experimental) data set.

Proceedings ArticleDOI
01 Oct 2019
Abstract: Visual grounding, a task to ground (i.e., localize) natural language in images, essentially requires composite visual reasoning. However, existing methods over-simplify the composite nature of language into a monolithic sentence embedding or a coarse composition of subject-predicate-object triplet. In this paper, we propose to ground natural language in an intuitive, explainable, and composite fashion as it should be. In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction where as needed. NMTree disentangles the visual grounding from the composite reasoning, allowing the former to only focus on primitive and easy-to-generalize patterns. To reduce the impact of parsing errors, we train the modules and their assembly end-to-end by using the Gumbel-Softmax approximation and its straight-through gradient estimator, accounting for the discrete nature of module assembly. Overall, the proposed NMTree consistently outperforms the state-of-the-arts on several benchmarks. Qualitative results show explainable grounding score calculation in great detail.

Journal ArticleDOI
TL;DR: This paper details the use of deep convolution neural networks architecture based on single-stage detectors to detect and count fruits within the tree canopy using new convolutional deep learning techniques.
Abstract: Image/video processing for fruit detection in the tree using hard-coded feature extraction algorithms has shown high accuracy in recent years. While accurate, these approaches, even with high-end hardware, are still computationally intensive and too slow for real-time systems. This paper details the use of a deep convolutional neural network architecture based on single-stage detectors. Using deep-learning techniques eliminates the need to hard-code specific features for specific fruit shapes, colors and/or other attributes. This architecture takes the input image and divides it into an A×A grid, where A is a configurable hyper-parameter that defines the fineness of the grid. An image detection and localization algorithm is applied to each grid cell. Each cell is responsible for predicting bounding boxes and a confidence score for fruit (apple and pear in this study) detected in that cell. The confidence score should be high if a fruit exists in a cell and zero otherwise. More than 100 images of apple and pear trees were taken. Each tree image contained approximately 50 fruits, which resulted in more than 5000 images of apple and pear fruits each. Labeling images for training consisted of manually specifying the bounding boxes for fruits, where (x, y) are the center coordinates of the box and (w, h) are its width and height. This architecture showed an accuracy of more than 90% in fruit detection. Based on the correlation between the number of visible fruits, the fruits detected in one frame and the real number of fruits on one tree, a model was created to accommodate this error rate. Processing speed is higher than 20 FPS, which is fast enough for any grasping/harvesting robotic arm or other real-time applications. Highlights: Using new convolutional deep learning techniques based on single-shot detectors to detect and count fruits (apple and pear) within the tree canopy.
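
The grid scheme in the abstract above assigns each fruit to a cell of an A-by-A grid. A minimal sketch of that assignment is shown below, assuming pixel coordinates and the usual single-stage convention that the cell containing the box center (x, y) is the responsible one. The function name and the image size in the example are hypothetical.

```python
def responsible_cell(x_center, y_center, image_width, image_height, grid_size):
    """Return (row, column) of the grid cell that contains a box center.

    The image is divided into grid_size x grid_size equal cells; centers that
    fall exactly on the right or bottom edge are clamped into the last cell.
    """
    col = min(int(x_center / image_width * grid_size), grid_size - 1)
    row = min(int(y_center / image_height * grid_size), grid_size - 1)
    return row, col


# A fruit centered at (320, 100) in a 640 x 480 image with A = 7.
cell = responsible_cell(320, 100, 640, 480, grid_size=7)
```

A larger `grid_size` gives finer cells, so neighboring fruits are less likely to share a cell, at the cost of more predictions per image.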

Journal ArticleDOI
TL;DR: In this paper, the tree tensor network (TTN) is proposed for generative modeling of natural data such as images and handwritten digits, which exhibits balanced performance in expressibility and efficient training and sampling.
Abstract: Matrix product states (MPSs), a tensor network designed for one-dimensional quantum systems, were recently proposed for generative modeling of natural data (such as images) in terms of the ``Born machine.'' However, the exponential decay of correlation in MPSs restricts its representation power heavily for modeling complex data such as natural images. In this work, we push forward the effort of applying tensor networks to machine learning by employing the tree tensor network (TTN), which exhibits balanced performance in expressibility and efficient training and sampling. We design the tree tensor network to utilize the two-dimensional prior of the natural images and develop sweeping learning and sampling algorithms which can be efficiently implemented utilizing graphical processing units. We apply our model to random binary patterns and the binary MNIST data sets of handwritten digits. We show that the TTN is superior to MPSs for generative modeling in keeping the correlation of pixels in natural images, as well as giving better log-likelihood scores in standard data sets of handwritten digits. We also compare its performance with state-of-the-art generative models such as variational autoencoders, restricted Boltzmann machines, and PixelCNN. Finally, we discuss the future development of tensor network states in machine learning problems.

Journal ArticleDOI
TL;DR: UAV photogrammetric tree height measurements are shown to be a viable option for intensive forest monitoring plots, and the possibility of acquiring within-season tree growth measurements merits further study. Negative and positive biases evident in field-based and UAV-based photogrammetric tree height measurements could potentially lead to misinterpretation of results.
Abstract: The measurement of tree height has long been an important tree attribute for the purpose of calculating tree growth, volume, and biomass, which in turn deliver important ecological and economical information to decision makers. Tree height has traditionally been measured by indirect field-based techniques, however these methods are rarely contested. With recent advances in Unmanned Aerial Vehicle (UAV) remote sensing technologies, the possibility to acquire accurate tree heights semi-automatically has become a reality. In this study, photogrammetric and field-based tree height measurements of a Scots Pine stand were validated using destructive methods. The intensive forest monitoring site implemented for the study was configured with permanent ground control points (GCPs) measured with a Total Station (TS). Field-based tree height measurements resulted in a similar level of error to that of the photogrammetric measurements, with root mean square error (RMSE) values of 0.304 m (1.82%) and 0.34 m (2.07%), respectively (n = 34). A conflicting bias was, however, discovered where field measurements tended to overestimate tree heights and photogrammetric measurements were underestimated. The photogrammetric tree height measurements of all trees (n = 285) were validated against the field-based measurements and resulted in a RMSE of 0.479 m (2.78%). Additionally, two separate photogrammetric tree height datasets were compared (n = 251), and a very low amount of error was observed with a RMSE of 0.138 m (0.79%), suggesting a high potential for repeatability. This study shows that UAV photogrammetric tree height measurements are a viable option for intensive forest monitoring plots and that the possibility to acquire within-season tree growth measurements merits further study. Additionally, it was shown that negative and positive biases evident in field-based and UAV-based photogrammetric tree height measurements could potentially lead to misinterpretation of results when field-based measurements are used as validation.
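
The error figures in the abstract above combine a root mean square error in metres, a relative value in percent, and a signed bias. A minimal sketch of how these three quantities relate is given below. Normalizing the percentage by the mean reference height is an assumption of this sketch, and the heights in the example are invented toy values, not data from the study.

```python
from math import sqrt


def height_errors(measured, reference):
    """Return (bias, RMSE, RMSE as a percentage of the mean reference height)."""
    diffs = [m - r for m, r in zip(measured, reference)]
    n = len(diffs)
    bias = sum(diffs) / n                       # positive means overestimation
    rmse = sqrt(sum(d * d for d in diffs) / n)  # magnitude of error, sign-free
    rmse_percent = 100.0 * rmse / (sum(reference) / n)
    return bias, rmse, rmse_percent


# Toy heights in metres: one overestimate and one underestimate of equal size.
bias, rmse, rmse_percent = height_errors([10.5, 19.5], [10.0, 20.0])
```

The toy case shows why bias and RMSE are reported separately: opposite errors cancel in the bias while the RMSE still reflects their size.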

Journal ArticleDOI
17 Jul 2019
TL;DR: A tree-structured policy gradient recommendation (TPGR) framework is proposed, in which a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree.
Abstract: Reinforcement learning (RL) has recently been introduced to interactive recommender systems (IRS) because of its nature of learning from dynamic interactions and planning for long-run performance. Because an IRS typically has thousands of items to recommend (i.e., thousands of actions), most existing RL-based methods fail to handle such a large discrete action space and thus become inefficient. Existing work that tries to deal with the large discrete action space by utilizing the deep deterministic policy gradient framework suffers from the inconsistency between the continuous action representation (the output of the actor network) and the real discrete action. To avoid such inconsistency and achieve high efficiency and recommendation effectiveness, in this paper, we propose a Tree-structured Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree. Extensive experiments on carefully-designed environments based on two real-world datasets demonstrate that our model provides superior recommendation performance and significant efficiency improvement over state-of-the-art methods.
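
The idea of picking an item by walking from the root to a leaf can be sketched in Python as follows. This is not the TPGR implementation: it assumes items are already assigned to the leaves of a full tree, and `node_policy`, `uniform_policy`, and the branching factor of 10 are hypothetical stand-ins for the learned per-node policies.

```python
import random


def tree_depth(num_items, branching):
    """Smallest depth d such that branching**d >= num_items."""
    depth = 1
    while branching ** depth < num_items:
        depth += 1
    return depth


def path_to_item(path, branching):
    """Map a root-to-leaf path (one child index per level) to a leaf index."""
    index = 0
    for child in path:
        index = index * branching + child
    return index


def sample_path(node_policy, depth, branching, rng):
    """Walk down from the root, sampling one child per level."""
    path = []
    for _ in range(depth):
        # node_policy is a hypothetical callable returning child weights
        # for the node identified by the path taken so far.
        weights = node_policy(tuple(path))
        child = rng.choices(range(branching), weights=weights, k=1)[0]
        path.append(child)
    return path


def uniform_policy(prefix):
    """Placeholder policy: every child of every node is equally likely."""
    return [1.0] * 10


depth = tree_depth(1000, 10)   # 1000 items, 10 children per node
path = sample_path(uniform_policy, depth, 10, random.Random(0))
print(depth, path, path_to_item(path, 10))
```

With 1000 items and 10 children per node, each recommendation takes three 10-way choices rather than one 1000-way choice, which is the source of the efficiency the abstract describes.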

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The Tree Structure Reference Joints Image (TSRJI), a novel skeleton image representation to be used as input to CNNs, has been introduced, which has the advantage of combining the use of reference joints and a tree structure skeleton.
Abstract: In recent years, the computer vision research community has studied how to model temporal dynamics in videos for 3D human action recognition. To that end, two main baseline approaches have been researched: (i) Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM); and (ii) skeleton image representations used as input to a Convolutional Neural Network (CNN). Although RNN approaches present excellent results, such methods lack the ability to efficiently learn the spatial relations between the skeleton joints. On the other hand, the representations used to feed CNN approaches have the natural ability to learn structural information from 2D arrays (i.e., they learn spatial relations from the skeleton joints). To further improve such representations, we introduce the Tree Structure Reference Joints Image (TSRJI), a novel skeleton image representation to be used as input to CNNs. The proposed representation has the advantage of combining the use of reference joints and a tree structure skeleton. While the former incorporates different spatial relationships between the joints, the latter preserves important spatial relations by traversing a skeleton tree with a depth-first order algorithm. Experimental results demonstrate the effectiveness of the proposed representation for 3D action recognition on two datasets, achieving state-of-the-art results on the recent NTU RGB+D 120 dataset.
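
The depth-first ordering of skeleton joints mentioned in this abstract can be illustrated with a small Python sketch. The joint names and the tree below are a hypothetical, simplified skeleton, and the traversal is a plain pre-order walk; the traversal used for TSRJI may differ in detail (for example, in whether joints are revisited on the way back up).

```python
# Hypothetical, simplified skeleton: parent joint -> list of child joints.
SKELETON = {
    "spine": ["neck", "left_hip", "right_hip"],
    "neck": ["head", "left_shoulder", "right_shoulder"],
    "left_shoulder": ["left_hand"],
    "right_shoulder": ["right_hand"],
    "left_hip": ["left_foot"],
    "right_hip": ["right_foot"],
}


def depth_first_order(tree, root):
    """Return the joints in pre-order depth-first order (iterative)."""
    order, stack = [], [root]
    while stack:
        joint = stack.pop()
        order.append(joint)
        # Push children reversed so the first listed child is visited first.
        stack.extend(reversed(tree.get(joint, [])))
    return order


print(depth_first_order(SKELETON, "spine"))
```

In this ordering, joints that are adjacent in the body (such as a shoulder and its hand) end up next to each other, so they would occupy neighbouring rows or columns of an image built from the sequence.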

Journal ArticleDOI
TL;DR: A novel tree-based method called fast splitting algorithm based on consecutive slot status detection (FSA-CSS), which includes a fast splitting (FS) mechanism and a shrink mechanism and achieves a system throughput of 0.41, outperforming the existing ultra high frequency RFID solutions.
Abstract: Efficient and effective object identification using radio frequency identification (RFID) is always a challenge in large-scale industrial and commercial applications. Among existing solutions, the tree-based splitting scheme has attracted increasing attention because of its high extendibility and feasibility. However, the conventional tree splitting algorithms can only resolve collisions among tags whose counter value equals zero and usually suffer performance degradation when the number of tags is large. To overcome such drawbacks, we propose a novel tree-based method called fast splitting algorithm based on consecutive slot status detection (FSA-CSS), which includes a fast splitting (FS) mechanism and a shrink mechanism. Specifically, the FS mechanism is used to reduce collisions by increasing commands when the number of consecutive collisions is above a threshold, whereas the shrink mechanism is used to reduce extra idle slots introduced by the FS. Simulation results supplemented by prototyping tests show that the proposed FSA-CSS achieves a system throughput of 0.41, outperforming the existing ultra high frequency RFID solutions.

Proceedings ArticleDOI
01 Feb 2019
TL;DR: This work proposes grid search-based hyperparameter tuning (GSHPT) for random forest parameters to classify microarray cancer data; experimental results show an improvement over state-of-the-art methods.
Abstract: Cancer is a group of diseases caused by abnormal cell growth. Due to the innovation of microarray technology, a large variety of microarray cancer datasets are produced and hence open up avenues to carry out research work across several disciplines such as Statistics, Computational Biology, Genomic studies and other related fields. The main challenges in analyzing microarray cancer data are the curse of dimensionality, small sample size, noisy data, and the class imbalance problem. In this work, we propose grid search-based hyperparameter tuning (GSHPT) for random forest parameters to classify Microarray Cancer Data. A grid search is designed by a set of fixed parameter values which are essential in providing optimal accuracy on the basis of n-fold cross-validation. In our work, 10-fold cross-validation is used. The grid search algorithm provides best parameters such as the number of features to consider at each split, number of trees in the forest, the maximum depth of the tree and the minimum number of samples required to be split at the leaf node. The maximum number of trees considered are 10, 20 and 70 respectively for Ovarian, 3-class Leukemia, and 3-class Leukemia cancer data. In the case of MLL and SRBCT, 50 trees are generated to achieve the maximum classification accuracy. The Gini index is employed as the criterion to split the nodes and the maximum depth of the tree is set to 2 for all datasets. Experimental results of the proposed work show an improvement over the state-of-the-art methods. The performance of the proposed method is evaluated using standard metrics such as classification accuracy, precision, recall, f1-score, confusion matrix and misclassification rate. A comparative analysis is also performed, and its results are provided to reveal the performance of the proposed method.
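
The grid search with k-fold cross-validation described in this abstract follows a generic pattern that can be sketched in plain Python. This is not the authors' code: the `evaluate` callable is assumed to train a model on the training indices and return its accuracy on the test indices, and `toy_evaluate` is a placeholder scoring rule so the sketch runs without any data or machine learning library. The parameter names are borrowed from scikit-learn's random forest for illustration only.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def grid_search(param_grid, evaluate, n_samples, k=10):
    """Return (best_params, best_mean_score) over the full Cartesian grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_evaluate(params, train, test):
    """Placeholder score, NOT a real model: peaks at 50 trees, depth 2."""
    return -abs(params["n_estimators"] - 50) - params["max_depth"]


grid = {"n_estimators": [10, 20, 50, 70], "max_depth": [2, 4, 8]}
print(grid_search(grid, toy_evaluate, n_samples=60, k=10))
```

In practice `toy_evaluate` would be replaced by a function that fits a random forest on the training fold and scores it on the held-out fold.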

Proceedings ArticleDOI
07 Jun 2019
TL;DR: A new iterative refinement algorithm is designed that induces a multi-root dependency tree while predicting the output summary of single-document extractive summarization, and performs competitively against state-of-the-art methods.
Abstract: In this paper, we conceptualize single-document extractive summarization as a tree induction problem. In contrast to previous approaches which have relied on linguistically motivated document representations to generate summaries, our model induces a multi-root dependency tree while predicting the output summary. Each root node in the tree is a summary sentence, and the subtrees attached to it are sentences whose content relates to or explains the summary sentence. We design a new iterative refinement algorithm: it induces the trees through repeatedly refining the structures predicted by previous iterations. We demonstrate experimentally on two benchmark datasets that our summarizer performs competitively against state-of-the-art methods.

Journal ArticleDOI
10 Dec 2019
TL;DR: Comparing the multitudes of genomes for phylogenetic relatedness is computationally expensive and laborious and there are many strategies to reduce the complexity of the data for downstream analysis, especially using nucleotide stretches of length k (kmers).
Abstract: In the past decade, the number of publicly available bacterial genomes has increased dramatically. These genomes have been generated for impactful initiatives, especially in the field of genomic epidemiology (Brown, Dessai, McGarry, & Gerner-Smidt, 2019; Timme et al., 2017). Genomes are sequenced, shared publicly, and subsequently analyzed for phylogenetic relatedness. If two genomes of epidemiological interest are found to be related, further investigation might be prompted. However, comparing the multitudes of genomes for phylogenetic relatedness is computationally expensive and, with large numbers, laborious. Consequently, there are many strategies to reduce the complexity of the data for downstream analysis, especially using nucleotide stretches of length k (kmers).
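
A minimal Python sketch of the k-mer idea mentioned at the end of this abstract: each sequence is reduced to its set of length-k substrings, and two sequences are compared through the overlap of those sets. The Jaccard similarity used here is one common choice and is an assumption of this sketch, as are the function names and the toy sequences; practical tools typically also handle reverse complements and subsample the k-mers.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a nucleotide sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy sequences for illustration only.
first = kmers("ACGTAC", 3)    # ACG, CGT, GTA, TAC
second = kmers("ACGTTC", 3)   # ACG, CGT, GTT, TTC
print(jaccard(first, second))
```

Comparing sets of short substrings avoids aligning whole genomes, which is what makes this kind of reduction attractive for large collections.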

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This work proposes representing social-media conversations as binarized constituency trees, which allows features in source posts and their replies to be compared effectively, and using convolution units in Tree LSTMs, which are better at learning patterns in features obtained from the source and reply posts.
Abstract: Learning from social-media conversations has gained significant attention recently because of its applications in areas like rumor detection. In this research, we propose a new way to represent social-media conversations as binarized constituency trees that allows comparing features in source-posts and their replies effectively. Moreover, we propose to use convolution units in Tree LSTMs that are better at learning patterns in features obtained from the source and reply posts. Our Tree LSTM models employ multi-task (stance + rumor) learning and propagate the useful stance signal up in the tree for rumor classification at the root node. The proposed models achieve state-of-the-art performance, outperforming the current best model by 12% and 15% on F1-macro for rumor-veracity classification and stance classification tasks respectively.

Proceedings ArticleDOI
26 May 2019
TL;DR: This paper reports on the experience of deploying a new deep learning tree-based defect prediction model built upon the tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code.
Abstract: Defects are common in software systems and cause many problems for software users. Different methods have been developed to make early prediction about the most likely defective modules in large codebases. Most focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and multiple levels of semantics of source code, a potentially important capability for building accurate prediction models. In this paper, we report on our experience of deploying a new deep learning tree-based defect prediction model in practice. This model is built upon the tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code. We discuss a number of lessons learned from developing the model and evaluating it on two datasets, one from open source projects contributed by our industry partner Samsung and the other from the public PROMISE repository.


Journal ArticleDOI
01 Jun 2019-Sensors
TL;DR: Simulation results indicate that the improved basic versions of the tree growth algorithm and the elephant herding optimization swarm intelligence metaheuristics can obtain more consistent and accurate locations of the unknown target nodes in wireless sensor networks topology than other approaches that have been proposed in the literature.
Abstract: Wireless sensor networks, as an emerging paradigm of networking and computing, have applications in diverse fields such as medicine, military, environmental control, climate forecasting, surveillance, etc. For successfully tackling the node localization problem, as one of the most significant challenges in this domain, many algorithms and metaheuristics have been proposed. By analyzing available modern literature sources, it can be seen that the swarm intelligence metaheuristics have obtained significant results in this domain. Research that is presented in this paper is aimed towards achieving further improvements in solving the wireless sensor networks localization problem by employing swarm intelligence. To accomplish this goal, we have improved basic versions of the tree growth algorithm and the elephant herding optimization swarm intelligence metaheuristics and applied them to solve the wireless sensor networks localization problem. In order to determine whether the improvements are accomplished, we have conducted empirical experiments on different sizes of sensor networks ranging from 25 to 150 target nodes, for which distance measurements are corrupted by Gaussian noise. Comparative analysis with other state-of-the-art swarm intelligence algorithms that have been already tested on the same problem instance, the butterfly optimization algorithm, the particle swarm optimization algorithm, and the firefly algorithm, is conducted. Simulation results indicate that our proposed algorithms can obtain more consistent and accurate locations of the unknown target nodes in wireless sensor networks topology than other approaches that have been proposed in the literature.

Proceedings ArticleDOI
Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian
15 Jun 2019
TL;DR: BridgeNet is proposed for age estimation; it combines local regressors, which partition the data space into multiple overlapping subspaces to tackle heterogeneous data, with gating networks, which learn continuity-aware weights through a bridge-tree structure that introduces bridge connections into tree models to enforce the similarity between neighbor nodes.
Abstract: Age estimation is an important yet very challenging problem in computer vision. Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process. However, the facial aging process is also a continuous process, and the continuity relationship between different components has not been effectively exploited. In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. The proposed BridgeNet consists of local regressors and gating networks. Local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data and gating networks learn continuity aware weights for the results of local regressors by employing the proposed bridge-tree structure, which introduces bridge connections into tree models to enforce the similarity between neighbor nodes. Moreover, these two components of BridgeNet can be jointly learned in an end-to-end way. We show experimental results on the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet outperforms the state-of-the-art methods.