
Showing papers on "Layer (object-oriented design) published in 2020"


Proceedings ArticleDOI
23 Aug 2020
TL;DR: This paper proposes a general graph neural network framework designed specifically for multivariate time series data that outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.
Abstract: Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation, which means they cannot be applied directly to multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.
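
To make the graph learning module concrete, the following is a minimal PyTorch sketch of how uni-directed relations can be learned from trainable node embeddings; the embedding size, the saturation factor, and the top-k sparsification value are illustrative assumptions rather than the authors' exact configuration.

import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    # Learns a sparse, uni-directed adjacency matrix over the series' variables.
    def __init__(self, num_nodes, emb_dim=40, alpha=3.0, k=20):
        super().__init__()
        self.emb1 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.emb2 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.alpha, self.k = alpha, k

    def forward(self):
        m1 = torch.tanh(self.alpha * self.emb1)
        m2 = torch.tanh(self.alpha * self.emb2)
        # The anti-symmetric score makes relations uni-directed:
        # a large A[i, j] pushes A[j, i] toward zero.
        adj = torch.relu(torch.tanh(self.alpha * (m1 @ m2.T - m2 @ m1.T)))
        # Keep only the k strongest neighbours per node for a sparse graph.
        mask = torch.zeros_like(adj)
        mask.scatter_(1, adj.topk(self.k, dim=1).indices, 1.0)
        return adj * mask

adj = GraphLearner(num_nodes=137)()   # e.g. 137 variables -> a 137 x 137 adjacency matrix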

576 citations


Posted Content
TL;DR: Universal Dependencies as mentioned in this paper is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework, which consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers.
Abstract: Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
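
To illustrate how these layers surface in the data, the short Python sketch below separates the morphological and syntactic layers of a single token line in the CoNLL-U format used by UD; the example token and feature values are illustrative rather than taken from an actual treebank.

# CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
token_line = "2\tdogs\tdog\tNOUN\t_\tNumber=Plur\t3\tnsubj\t_\t_"

cols = token_line.split("\t")
morphological_layer = {"lemma": cols[2], "upos": cols[3], "feats": cols[5]}
syntactic_layer = {"head": int(cols[6]), "deprel": cols[7]}   # relation to the governing word

print(morphological_layer)   # {'lemma': 'dog', 'upos': 'NOUN', 'feats': 'Number=Plur'}
print(syntactic_layer)       # {'head': 3, 'deprel': 'nsubj'}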

290 citations


Journal ArticleDOI
TL;DR: This review surveys the existing research to obtain a comprehensive framework for mixed reality applications and introduces MR development steps and analytical models, a simulation toolkit, system types, and architecture types, in addition to practical issues for stakeholders such as considering the different MR domains.
Abstract: Currently, new technologies have enabled the design of smart applications that are used as decision-making tools in the problems of daily life. The key issue in designing such an application is the increasing level of user interaction. Mixed reality (MR) is an emerging technology that offers maximum user interaction in the real world compared with other similar technologies. Developing an MR application is complicated and depends on the different components that have been addressed in the previous literature. In addition to the extraction of such components, a comprehensive study that presents a generic framework comprising all components required to develop MR applications needs to be performed. This review surveys the existing research to obtain a comprehensive framework for MR applications. The suggested framework comprises five layers: the first layer considers system components; the second and third layers focus on architectural issues for component integration; the fourth layer is the application layer that executes the architecture; and the fifth layer is the user interface layer that enables user interaction. The merits of this study are as follows: this review can act as a proper resource for MR basic concepts, and it introduces MR development steps and analytical models, a simulation toolkit, system types, and architecture types, in addition to practical issues for stakeholders such as considering the different MR domains.

118 citations


Posted Content
TL;DR: Semantic Calibration for Cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism, demonstrating the effectiveness and flexibility of the proposed attention based soft layer association mechanism for cross-layer distillation.
Abstract: Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map based variants explore knowledge transfer between manually assigned teacher-student pairs in intermediate layers for further improvement. However, layer semantics may vary in different neural networks and semantic mismatch in manual layer associations will lead to performance degeneration due to negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a specific intermediate layer for appropriate cross-layer supervision. We further provide a theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. Code is available at \url{this https URL}.
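
The cross-layer association can be pictured with the hedged PyTorch sketch below: a pooled student-layer feature attends over several pooled teacher-layer features, and per-layer distillation losses are combined with the learned attention weights. The projection size, the per-layer linear regressors, and the mean-squared-error loss are simplifying assumptions for illustration, not the exact SemCKD formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttentionKD(nn.Module):
    def __init__(self, s_dim, t_dims, proj_dim=128):
        super().__init__()
        self.proj_s = nn.Linear(s_dim, proj_dim)
        self.proj_t = nn.ModuleList(nn.Linear(d, proj_dim) for d in t_dims)
        # Regressors map the student feature into each teacher layer's space.
        self.regressors = nn.ModuleList(nn.Linear(s_dim, d) for d in t_dims)

    def forward(self, f_s, f_t_list):
        # f_s: (B, s_dim) pooled student feature; f_t_list: pooled teacher-layer features.
        q = self.proj_s(f_s)
        scores = torch.stack(
            [(q * self.proj_t[i](f_t)).sum(-1) for i, f_t in enumerate(f_t_list)], dim=1)
        alpha = torch.softmax(scores, dim=1)                 # soft layer association, (B, T)
        losses = torch.stack(
            [F.mse_loss(self.regressors[i](f_s), f_t, reduction="none").mean(-1)
             for i, f_t in enumerate(f_t_list)], dim=1)      # per-teacher-layer loss, (B, T)
        return (alpha * losses).sum(dim=1).mean()            # attention-weighted distillation loss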

92 citations


Journal ArticleDOI
TL;DR: This study presents a novel memory-efficient method for unsupervised learning of high-resolution video datasets whose computational cost scales only linearly with the resolution.
Abstract: Training of a generative adversarial network (GAN) on a video dataset is a challenge because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training a GAN scales exponentially with the resolution. In this study, we present a novel memory-efficient method for unsupervised learning of high-resolution video datasets whose computational cost scales only linearly with the resolution. We achieve this by designing the generator model as a stack of small sub-generators and training the model in a specific way. We train each sub-generator with its own specific discriminator. At training time, we introduce between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame-rate by a certain ratio. This procedure allows each sub-generator to learn the distribution of the video at a different level of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms its predecessor in terms of inception scores.
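
The stacking-plus-subsampling idea can be sketched as follows; the sub-generator internals, the tensor layout, and the frame-rate ratios are placeholders standing in for the authors' architecture, not a reproduction of it.

import torch.nn as nn

class StackedVideoGenerator(nn.Module):
    def __init__(self, sub_generators, frame_ratios):
        super().__init__()
        self.subs = nn.ModuleList(sub_generators)   # each sub-generator refines/upsamples a clip
        self.ratios = frame_ratios                  # e.g. [1, 2, 2]: keep every r-th frame

    def forward(self, clip, training=True):
        # clip: (B, C, T, H, W) low-resolution video produced from the latent code.
        outputs = []                                # one output (and one discriminator) per level
        x = clip
        for sub, r in zip(self.subs, self.ratios):
            if training and r > 1:
                x = x[:, :, ::r]                    # auxiliary subsampling layer: drop frames
            x = sub(x)
            outputs.append(x)
        return outputs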

69 citations


Proceedings Article
01 May 2020
TL;DR: Universal Dependencies as mentioned in this paper is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework, which consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers.
Abstract: Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the universal guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.

62 citations


Book ChapterDOI
23 Aug 2020
TL;DR: This paper presents a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy and can improve the performance on the COCO dataset compared to the non-local operation.
Abstract: Non-local operation is widely explored to model the long-range dependencies. However, the redundant computation in this operation leads to a prohibitive complexity. In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy. Instead of propagating the messages from all positions, our RepGraph layer computes the response of one node merely with a few representative nodes. The locations of representative nodes come from a learned spatial offset matrix. The RepGraph layer is flexible to integrate into many visual architectures and combine with other operations. With the application of semantic segmentation, without any bells and whistles, our RepGraph network can compete or perform favourably against the state-of-the-art methods on three challenging benchmarks: ADE20K, Cityscapes, and PASCAL-Context datasets. In the task of object detection, our RepGraph layer can also improve the performance on the COCO dataset compared to the non-local operation. Code is available at https://git.io/RepGraph.
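
A simplified sketch of the idea is given below: every position attends to only S representative nodes instead of all positions. For brevity the learned spatial offsets are replaced by a uniform stride over positions, so this is an illustration of the reduced-redundancy attention pattern rather than the full RepGraph layer.

import torch
import torch.nn as nn

class RepGraphSketch(nn.Module):
    def __init__(self, in_ch, num_rep=8):
        super().__init__()
        self.query = nn.Conv2d(in_ch, in_ch // 2, 1)
        self.key = nn.Conv2d(in_ch, in_ch // 2, 1)
        self.value = nn.Conv2d(in_ch, in_ch, 1)
        self.num_rep = num_rep                      # number of representative nodes S

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (B, HW, C/2)
        k = self.key(x).flatten(2)                          # (B, C/2, HW)
        v = self.value(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        # Pick S representative positions; the paper learns spatial offsets,
        # here a uniform stride is used purely for illustration.
        idx = torch.linspace(0, H * W - 1, self.num_rep).long()
        attn = torch.softmax(q @ k[:, :, idx], dim=-1)      # (B, HW, S) instead of (B, HW, HW)
        out = (attn @ v[:, idx]).transpose(1, 2).reshape(B, C, H, W)
        return x + out                                       # residual connection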

45 citations


Journal ArticleDOI
TL;DR: An integrated architecture of a Convolutional Neural Network and a Long Short-Term Memory network is proposed to identify the polarity of words on the Google cloud, with computations performed on Google Colaboratory, to provide an appropriate solution for analyzing sentiments and classifying opinions into positive and negative classes.
Abstract: The rapid development of social media, and special websites with critical reviews of products, have created a huge collection of resources for customers all over the world. These data may contain a lot of information, including product reviews, predictions of market changes, and the polarity of opinions. Machine learning and deep learning algorithms provide the necessary tools for intelligent analysis of these challenges. In current competitive markets, it is essential to understand the opinions and sentiments of reviewers by extracting and analyzing their features. Moreover, processing and analyzing this volume of data in the cloud can strongly increase the cost of the system. Fewer dependencies on expensive hardware, storage space, and related software can be provided through cloud computing and Natural Language Processing (NLP). In our work, we propose an integrated architecture of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to identify the polarity of words on the Google cloud while performing computations on Google Colaboratory. Our proposed model, based on deep learning algorithms with a word-embedding technique, learns features through a CNN layer, and these features are fed directly into a bidirectional LSTM layer to capture long-term feature dependencies. They can then be reused by a CNN layer to provide abstract features before the final dense layers. The main goal of this work is to provide an appropriate solution for analyzing sentiments and classifying opinions into positive and negative classes. Our implementation shows that, based on the proposed model, an accuracy of more than 89.02% is achievable.
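
The CNN-to-bidirectional-LSTM pipeline described here can be sketched in a few lines of PyTorch; the embedding size, filter count, kernel size, and hidden size are illustrative assumptions, not the authors' hyperparameters.

import torch
import torch.nn as nn

class CnnBiLstmSentiment(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 2)                   # positive / negative logits

    def forward(self, tokens):                               # tokens: (B, L) word indices
        x = self.emb(tokens).transpose(1, 2)                 # (B, emb_dim, L)
        x = torch.relu(self.conv(x)).transpose(1, 2)         # local n-gram features, (B, L, n_filters)
        out, _ = self.lstm(x)                                # bidirectional long-term dependencies
        return self.fc(out[:, -1])                           # classify from the last time step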

35 citations


Journal ArticleDOI
TL;DR: A functional Bayesian network is further constructed to infer the information from the basic layer, which can be customized according to user demands, such as fault detection, fault diagnosis, and classification of operating status.
Abstract: In this brief, a hierarchical Bayesian network modeling framework is formulated for large-scale process monitoring and decision making, which includes a basic layer and a functional layer. First, the whole process is decomposed into different units, where local Bayesian networks are constructed, providing monitoring information and decision-making capability for the upper layer. The network structure is determined automatically based on the process data in each local unit of the basic layer. Then, through incorporating the topological structure of the process, a functional Bayesian network is further constructed to infer the information from the basic layer, which can be customized according to user demands, such as fault detection, fault diagnosis, and classification of operating status. The performance of the proposed method is evaluated through a benchmark process.

32 citations


Journal ArticleDOI
14 Mar 2020-Sensors
TL;DR: This work proposes a mixture of “static” and “dynamic” activation functions, which are stochastically selected at each layer of a CNN, to design new models to be used as stand-alone networks or as a component of an ensemble.
Abstract: In recent years, the field of deep learning has achieved considerable success in pattern recognition, image segmentation, and many other classification fields. There are many studies and practical applications of deep learning in image, video, and text classification. Activation functions play a crucial role in the discriminative capabilities of deep neural networks, and the design of new "static" or "dynamic" activation functions is an active area of research. The main difference between "static" and "dynamic" functions is that the first class of activations considers all the neurons and layers as identical, while the second class learns parameters of the activation function independently for each layer or even each neuron. Although the "dynamic" activation functions perform better in some applications, the increased number of trainable parameters requires more computational time and can lead to overfitting. In this work, we propose a mixture of "static" and "dynamic" activation functions, which are stochastically selected at each layer. Our idea for model design is based on a method for changing some layers within the different functional blocks of the best-performing CNN models, with the aim of designing new models to be used as stand-alone networks or as components of an ensemble. We propose to replace each activation layer of a CNN (usually a ReLU layer) with a different activation function stochastically drawn from a set of activation functions: in this way, the resulting CNN has a different set of activation function layers.
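
A minimal sketch of the stochastic replacement step is shown below; the pool of candidate activations is illustrative (the paper also mixes learnable, "dynamic" functions into the pool).

import random
import torch.nn as nn

ACTIVATION_POOL = [nn.ReLU, nn.LeakyReLU, nn.ELU, nn.SELU]   # illustrative "static" candidates

def randomize_activations(module: nn.Module) -> nn.Module:
    # Replace every ReLU layer with an activation drawn at random from the pool.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, random.choice(ACTIVATION_POOL)())
        else:
            randomize_activations(child)
    return module

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
model = randomize_activations(model)   # the ReLU may now be, e.g., an ELU or SELU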

31 citations


Posted Content
TL;DR: This work takes a multi-task learning approach, where the classification is implemented as an attention layer and enforces the model to also represent the prior, which leads to a strong inductive bias.
Abstract: A main challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image, or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for the perception and prior knowledge. Instead, we take a multi-task learning approach, where the classification is implemented as an attention layer. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model to also represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations leads to a significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning, this leads to accurate predictions with 1% of annotated images only.

Journal ArticleDOI
TL;DR: The proposed method can stimulate further theoretical research and provide basic insights for practical applications of multilayer networks, including detecting spreading routes and locating the sources of rumors or fake news.
Abstract: This study focuses on topology identification in two-layer networks with peer-to-peer unidirectional couplings, where one layer (the response layer) receives information from the other layer (the drive layer). The goal is to construct a theoretical framework for identifying the topology of the response layer based on the dynamics observed in both layers. In particular, an auxiliary layer is constructed. Based on the LaSalle-type invariance principle, simple control inputs and updating laws are designed to enable nodes in the auxiliary layer to reach complete synchronization with their counterparts in the response layer. Simultaneously, the topology of the response layer is adaptively identified. Numerical simulations are conducted to illustrate the effectiveness of the method. The impact of the inter-layer information transmission speed on the identification performance is further investigated. It is revealed that neither too slow nor too fast information transmission favors efficient identification, and there exists an optimal level of transmission speed. The duplex framework can model many real-world systems, such as communication-rumor spreading networks. Therefore, the method proposed in this study can stimulate further theoretical research and provide basic insights for practical applications of multilayer networks, including detecting spreading routes and locating the sources of rumors or fake news.

Journal ArticleDOI
TL;DR: This article presents a software layer to abstract users of unmanned aerial vehicles from the specific hardware of the platform and the autopilot interfaces, to simplify the development and testing of higher-level algorithms in aerial robotics.
Abstract: This article presents a software layer to abstract users of unmanned aerial vehicles from the specific hardware of the platform and the autopilot interfaces. The main objective of our unmanned aeri...

Journal ArticleDOI
TL;DR: An empirical analysis of feature dependencies in three real-world automotive systems shows that features in modern vehicles are highly interdependent and that developers are not aware of these dependencies in most cases.

Posted Content
07 Feb 2020
TL;DR: A novel topological layer for general deep learning models based on persistent landscapes, in which the topological structure is learned during training via backpropagation, without requiring any input featurization or data preprocessing.
Abstract: We propose a novel topological layer for general deep learning models based on persistent landscapes, in which we can efficiently exploit underlying topological features of the input data structure. We use the robust DTM function and show differentiability with respect to layer inputs, for a general persistent homology with arbitrary filtration. Thus, our proposed layer can be placed anywhere in the network architecture and feed critical information on the topological features of input data into subsequent layers to improve the learnability of the networks toward a given task. A task-optimal structure of the topological layer is learned during training via backpropagation, without requiring any input featurization or data preprocessing. We provide a tight stability theorem, and show that the proposed layer is robust towards noise and outliers. We demonstrate the effectiveness of our approach by classification experiments on various datasets.

Journal ArticleDOI
TL;DR: This paper presents a novel hybrid deep learning network for human activity recognition that also employs multimodal sensor data, and the proposed model is a ConvLSTM pipeline that makes full use of the information in each layer extracted along the temporal domain.
Abstract: Human activity recognition (HAR) using body-worn sensors is an active research area in human-computer interaction and human activity analysis. The traditional methods use hand-crafted features to classify multiple activities, which is both heavily dependent on human domain knowledge and results in shallow feature extraction. Rapid developments in deep learning have caused most researchers to switch to deep learning methods, which extract features from raw data automatically. Most of the existing works on human activity recognition tasks involve multimodal sensor data, and these networks mainly focus on the top representation extracted from bottom-up feedforward process without reusing other features from bottom layers. In this paper, we present a novel hybrid deep learning network for human activity recognition that also employs multimodal sensor data; however, our proposed model is a ConvLSTM pipeline that makes full use of the information in each layer extracted along the temporal domain. Thus, we propose a dense connection module (DCM) to ensure maximum information flow between the network layers. Furthermore, we employ a multilayer feature aggregation module (MFAM) to extract features along the spatial domain, and we aggregate the features obtained from every convolutional layer according to the importance of features in different spatial locations. The output of the MFAM is input into two LSTM layers to further model the temporal dependencies. Finally, a fully connected layer and a softmax function are used to compute the probability of each class. We demonstrate the effectiveness of our proposed model on two benchmark datasets: Opportunity and UniMiB-SHAR. The results illustrate that our designed network outperforms the state-of-the-art models. We also conduct experiments on efficiency, multimodal fusion and different hyperparameters to analyze our proposed network. Finally, we carry out ablation and visualization experiments to reveal the effectiveness of the two proposed modules.

Proceedings ArticleDOI
08 Nov 2020
TL;DR: This paper describes the set up needed for the application of a probabilistic approach, and proposes the use of a novel tool – the scenario theory – to overcome limitations of the traditional tools from statistics.
Abstract: This paper discusses the problem of testing the performance of the adaptation layer in a self-adaptive system. The problem is notoriously hard, due to the high degree of uncertainty and variability inherent in an adaptive software application. In particular, providing any type of formal guarantee for this problem is extremely difficult. In this paper we propose the use of a rigorous probabilistic approach to overcome the mentioned difficulties and provide probabilistic guarantees on the software performance. We describe the set up needed for the application of a probabilistic approach. We then discuss the traditional tools from statistics that could be applied to analyse the results, highlighting their limitations and motivating why they are unsuitable for the given problem. We propose the use of a novel tool – the scenario theory – to overcome said limitations. We conclude the paper with a thorough empirical evaluation of the proposed approach, using two adaptive software applications: the Tele-Assistance Service and the Self-Adaptive Video Encoder. With the first, we empirically expose the trade-off between data collection and confidence in the testing campaign. With the second, we demonstrate how to compare different adaptation strategies.

Journal ArticleDOI
Zidong Du, Qi Guo, Zhao Yongwei, Tian Zhi, Yunji Chen, Zhiwei Xu
24 Mar 2020
TL;DR: A comprehensive SaNNS, called MinMaxNN, is proposed from a new perspective, namely the model layer, to exploit more opportunities for high efficiency; it features model switching and elastic sparsity based on monitored information from the execution environment.
Abstract: Neural network (NN) processors are specially designed to handle deep learning tasks by utilizing multilayer artificial NNs. They have been demonstrated to be useful in broad application fields such as image recognition, speech processing, machine translation, and scientific computing. Meanwhile, innovative self-aware techniques, whereby a system can dynamically react based on continuously sensed information from the execution environment, have attracted attention from both academia and industry. Indeed, various self-aware techniques have been applied to NN systems to significantly improve computational speed and energy efficiency. This article surveys state-of-the-art self-aware NN systems (SaNNSs), which can be achieved at different layers, that is, the architectural layer, the physical layer, and the circuit layer. At the architectural layer, SaNNSs can be characterized from a data-centric perspective where different data properties (i.e., data value, data precision, dataflow, and data distribution) are exploited. At the physical layer, various parameters of the physical implementation are considered. At the circuit layer, different logics and devices can be used for high efficiency. In fact, the self-awareness of existing SaNNSs is still in a preliminary form. We propose a comprehensive SaNNS from a new perspective, that is, the model layer, to exploit more opportunities for high efficiency. The proposed system is called MinMaxNN; it features model switching and elastic sparsity based on monitored information from the execution environment. The model switching mechanism means that the models (i.e., the min and max models) dynamically switch for different inputs to serve both efficiency and accuracy. The elastic sparsity mechanism means that the sparsity of the NNs can be dynamically adjusted in each layer for efficiency. The experimental results show that, compared with traditional SaNNSs, MinMaxNN achieves a 5.64x performance improvement and a 19.66% energy reduction, without notable loss of accuracy or negative effects on developers’ productivity.
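
As a rough illustration of the model-switching mechanism, the sketch below runs the lightweight "min" model first and falls back to the "max" model when confidence is low; the confidence-threshold criterion and batch-size-one assumption are illustrative, not necessarily the paper's switching rule.

import torch

def min_max_inference(x, min_model, max_model, tau=0.9):
    probs = torch.softmax(min_model(x), dim=-1)      # cheap pass with the min model
    conf, pred = probs.max(dim=-1)
    if conf.item() >= tau:                           # confident enough: keep the cheap prediction
        return pred
    return max_model(x).argmax(dim=-1)               # otherwise pay for the accurate max model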

Journal ArticleDOI
Ming Tong, Yiran Chen, Lei Ma, He Bai, Xing Yue
TL;DR: A new nonnegative matrix factorization with a local constraint (LC-NMF) is presented; a nonnegative matrix factorization with a temporal dependencies constraint (TD-NMF) fully mines the spatiotemporal relationship in a video not only between adjacent frames but also between multi-interval frames; and a Deep NMF method is established that takes the proposed TD-NMF as the unit algorithm of each layer.
Abstract: In order to improve action recognition accuracy, a new nonnegative matrix factorization with local constraint (LC-NMF) is firstly presented. By applying it to effective trajectory clustering, complex backgrounds are removed and the motion-salient regions are obtained. Secondly, a nonnegative matrix factorization with temporal dependencies constraint (TD-NMF) is proposed, which fully mines the spatiotemporal relationship in a video not only between adjacent frames, but also between multi-interval frames. Meanwhile, the introduction of the $l_{2,1}$-norm gives the spatiotemporal features better sparseness and robustness. In addition, these features are directly learned from data and thus have an inherent generalization ability. Finally, a Deep NMF method is established, which takes the proposed TD-NMF as the unit algorithm of each layer. By introducing the hierarchical feature extraction strategy, the base matrix of the first layer is gradually decomposed; then, it is supplemented and completed layer by layer. Consequently, more complete and accurate local feature estimates are obtained, the discriminative and expressive abilities of features are effectively enhanced, and recognition performance is further improved. Adequate and extensive experiments verify the effectiveness of the proposed methods. Moreover, the update rules and convergence proofs for LC-NMF and TD-NMF are also given.

Posted Content
TL;DR: A novel zero-shot representation learning framework that jointly learns discriminative global and local features using only class-level attributes and points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of the image representation.
Abstract: From the beginning of zero-shot learning research, visual attributes have been shown to play an important role. In order to better transfer attribute-based knowledge from known to unknown classes, we argue that an image representation with integrated attribute localization ability would be beneficial for zero-shot learning. To this end, we propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features using only class-level attributes. While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features. We show that our locality augmented image representations achieve a new state-of-the-art on three zero-shot learning benchmarks. As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g., for the CUB dataset, confirming the improved attribute localization ability of our image representation.

Posted Content
TL;DR: FADER, a novel technique for speeding up detection-based methods, is introduced; it addresses the issues above by employing RBF networks as detectors, and by fixing the number of required prototypes the runtime complexity of adversarial example detectors can be controlled.
Abstract: Deep neural networks are vulnerable to adversarial examples, i.e., carefully-crafted inputs that mislead classification at test time. Recent defenses have been shown to improve adversarial robustness by detecting anomalous deviations from legitimate training samples at different layer representations - a behavior normally exhibited by adversarial attacks. Despite technical differences, all aforementioned methods share a common backbone structure that we formalize and highlight in this contribution, as it can help in identifying promising research directions and drawbacks of existing methods. The first main contribution of this work is the review of these detection methods in the form of a unifying framework designed to accommodate both existing defenses and newer ones to come. In terms of drawbacks, the aforementioned defenses require comparing input samples against an oversized number of reference prototypes, possibly at different representation layers, dramatically worsening test-time efficiency. Besides, such defenses are typically based on ensembling classifiers with heuristic methods, rather than optimizing the whole architecture in an end-to-end manner to better perform detection. As a second main contribution of this work, we introduce FADER, a novel technique for speeding up detection-based methods. FADER overcomes the issues above by employing RBF networks as detectors: by fixing the number of required prototypes, the runtime complexity of adversarial example detectors can be controlled. Our experiments show up to a 73x reduction in prototypes compared to the analyzed detectors on the MNIST dataset and up to a 50x reduction on the CIFAR10 dataset, without sacrificing classification accuracy on either clean or adversarial data.
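
The RBF-detector idea can be sketched as follows; the number of prototypes, the kernel width, and the single rejection score are illustrative choices, not the FADER architecture itself.

import torch
import torch.nn as nn

class RBFDetector(nn.Module):
    def __init__(self, feat_dim, n_prototypes=50, gamma=1.0):
        super().__init__()
        # A fixed number of prototypes bounds the detector's runtime complexity.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
        self.gamma = gamma
        self.out = nn.Linear(n_prototypes, 1)

    def forward(self, feats):                        # feats: (B, feat_dim) layer representation
        d2 = torch.cdist(feats, self.prototypes) ** 2
        phi = torch.exp(-self.gamma * d2)            # RBF activations w.r.t. the prototypes
        return torch.sigmoid(self.out(phi))          # high score => flag as adversarial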

Journal ArticleDOI
13 Jan 2020-Sensors
TL;DR: An executive module is proposed that coordinates the activity of several independent modules that are connected by an inter-process communication mechanism and uses hierarchical interpreted binary Petri nets to define the behavior expected from the car in different scenarios according to the traffic rules.
Abstract: Most autonomous car control frameworks are based on a middleware layer with several independent modules that are connected by an inter-process communication mechanism. These modules implement basic actions and report events about their state by subscribing and publishing messages. Here, we propose an executive module that coordinates the activity of these modules. This executive module uses hierarchical interpreted binary Petri nets (PNs) to define the behavior expected from the car in different scenarios according to the traffic rules. The module commands actions by sending messages to other modules and evolves its internal state according to the events (messages) received. A programming environment named RoboGraph (RG) is introduced with this architecture. RG includes a graphical interface that allows the edition, execution, tracing, and maintenance of the PNs. For the execution, a dispatcher loads these PNs and executes the different behaviors. The RG monitor that shows the state of all the running nets has proven to be very useful for debugging and tracing purposes. The whole system has been applied to an autonomous car designed for elderly or disabled people.
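
The core firing rule of an interpreted binary Petri net is small enough to sketch directly; the place names, guard, and action below are made up for illustration and do not come from the RoboGraph tool.

class BinaryPetriNet:
    # Places hold 0/1 tokens; a transition fires when all of its input places are
    # marked and its guard condition holds, then marks its output places and runs an action.
    def __init__(self, marking):
        self.marking = dict(marking)                 # place name -> 0 or 1
        self.transitions = []                        # (inputs, outputs, guard, action)

    def add_transition(self, inputs, outputs, guard=lambda: True, action=lambda: None):
        self.transitions.append((inputs, outputs, guard, action))

    def step(self):
        for inputs, outputs, guard, action in self.transitions:
            if all(self.marking[p] for p in inputs) and guard():
                for p in inputs:
                    self.marking[p] = 0
                for p in outputs:
                    self.marking[p] = 1
                action()                             # e.g. publish a command message to a module

net = BinaryPetriNet({"at_stop_line": 1, "light_green": 1, "crossing": 0})
net.add_transition(["at_stop_line", "light_green"], ["crossing"],
                   action=lambda: print("command: proceed"))
net.step()                                           # fires the transition and marks the crossing place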

Posted Content
TL;DR: A novel Memory-based Attentive Fusion layer, which fuses modes by incorporating both the current features and long-term dependencies in the data, thus allowing the model to understand the relative importance of modes over time.
Abstract: The use of multi-modal data for deep machine learning has shown promise when compared to uni-modal approaches, with fusion of multi-modal features resulting in improved performance in several applications. However, most state-of-the-art methods use naive fusion, which processes feature streams independently, ignoring possible long-term dependencies within the data during fusion. In this paper, we present a novel Memory-based Attentive Fusion (MBAF) layer, which fuses modes by incorporating both the current features and long-term dependencies in the data, thus allowing the model to understand the relative importance of modes over time. We introduce an explicit memory block within the fusion layer which stores features containing long-term dependencies of the fused data. The feature inputs from uni-modal encoders are fused through attentive composition and transformation, followed by naive fusion of the resultant memory-derived features with layer inputs. Following state-of-the-art methods, we have evaluated the performance and the generalizability of the proposed fusion approach on two different datasets with different modalities. In our experiments, we replace the naive fusion layer in benchmark networks with our proposed layer to enable a fair comparison. Experimental results indicate that the MBAF layer can generalise across different modalities and networks to enhance fusion and improve performance.
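
One way to picture the layer is the hedged sketch below: the naively fused current features attend over an explicit memory and the result is combined with the fused input. For simplicity the memory is a learnable parameter here, whereas the paper stores features of the fused data; sizes and names are illustrative.

import torch
import torch.nn as nn

class MemoryAttentiveFusion(nn.Module):
    def __init__(self, dim, mem_slots=32):           # dim must be divisible by num_heads
        super().__init__()
        self.memory = nn.Parameter(torch.randn(mem_slots, dim))      # explicit memory block
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, feat_a, feat_b):               # two uni-modal features, each (B, dim)
        fused = feat_a + feat_b                      # naive fusion of the current features
        mem = self.memory.unsqueeze(0).expand(fused.size(0), -1, -1)
        ctx, _ = self.attn(fused.unsqueeze(1), mem, mem)   # attend over long-term memory
        return self.fuse(torch.cat([fused, ctx.squeeze(1)], dim=-1))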

Journal ArticleDOI
TL;DR: The results show that the proposed CSS-IoV scheme provides the best results for both types of communication in VANETs.
Abstract: This article offers a social networking platform for vehicles, based on cloud computing and a service-oriented architecture, called CSS-IoV, for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication and collaboration. CSS-IoV contains five layers: the application layer, application service layer, services layer, cloud infrastructure layer, and vehicle layer. The application layer operation is associated with the functionality of on-board units (OBUs) and road-side units (RSUs). The service layer runs on a service-oriented architecture (SOA) and cloud application services to improve applications. To expand and combine web services on this layer, application developers and builders can effectively develop new functions and programs for communication. With the support of dynamic and automated service collaboration, people can easily help and communicate with each other by using OBUs. Our results show that the proposed scheme provides the best results for both types of communication in VANETs.

Journal ArticleDOI
Rang Meng, Weijie Chen, Di Xie, Zhang Yuan, Shiliang Pu
03 Apr 2020
TL;DR: Meng et al. as mentioned in this paper proposed an efficient one-shot layer assignment search approach via inherited sampling, where the optimal layer assignment searched in the shallow network can be provided as a strong sampling prior for training and searching the deeper ones in the supernet, which greatly reduces the network search space.
Abstract: Layer assignment is seldom picked out as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments on network performance by building an architecture dataset of layer assignment on CIFAR-100. Through analyzing this dataset, we discover a neural inheritance relation among the networks with different layer assignments, that is, the optimal layer assignments for deeper networks always inherit from those for shallow networks. Inspired by this neural inheritance relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment searched in the shallow network can be provided as a strong sampling prior for training and searching the deeper ones in the supernet, which greatly reduces the network search space. Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet. Our searched results are remarkably superior to the handcrafted ones under unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights for universal neural architecture search.

Posted Content
TL;DR: It is demonstrated that fixed classifiers offer no additional benefit compared to simply removing the output layer along with its parameters, and that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count.
Abstract: Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification. While this design has been successful, for datasets with a large number of categories, the fully connected layers often account for a large percentage of the network's parameters. For applications with memory constraints, such as mobile devices and embedded platforms, this is not ideal. Recently, a family of architectures that involve replacing the learned fully connected output layer with a fixed layer has been proposed as a way to achieve better efficiency. In this paper we examine this idea further and demonstrate that fixed classifiers offer no additional benefit compared to simply removing the output layer along with its parameters. We further demonstrate that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count. We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets, while not having a fully connected output layer at all.
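
A minimal sketch of classification without any output layer is shown below, under the assumption that the backbone's final feature maps already have one channel per class; global average pooling then yields the logits directly, with no classification parameters at all.

import torch.nn as nn

class NoOutputLayerClassifier(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone          # assumed to end with num_classes feature channels

    def forward(self, x):
        feats = self.backbone(x)          # (B, num_classes, H, W)
        return feats.mean(dim=(2, 3))     # (B, num_classes) logits via global average pooling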

Proceedings ArticleDOI
19 May 2020
TL;DR: Ye et al. as discussed by the authors proposed a cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations, which can build spatial dependencies among multi-level layers and learn more discriminative features.
Abstract: Extracting and fusing part features has become the key to fine-grained image recognition. Recently, the Non-local (NL) module has shown excellent improvements in image recognition. However, it lacks the mechanism to model the interactions between multi-scale part features, which is vital for fine-grained recognition. In this paper, we propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations. First, CNL computes correlations between features of a query layer and all response layers. Second, all response features are weighted according to the correlations and are added to the query features. Due to the interactions of cross-layer features, our model builds spatial dependencies among multi-level layers and learns more discriminative features. In addition, we can reduce the aggregation cost if we set a low-dimensional deep layer as the query layer. Experiments show that our model achieves or surpasses state-of-the-art results on three benchmark datasets of fine-grained classification. Our codes can be found at github.com/FouriYe/CNL-ICIP2020.
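
For a single response layer, the cross-layer attention can be sketched as below; channel sizes and the residual form are illustrative, and the paper aggregates over all response layers rather than one.

import torch
import torch.nn as nn

class CrossLayerNonLocal(nn.Module):
    def __init__(self, q_ch, r_ch, inter_ch=64):
        super().__init__()
        self.theta = nn.Conv2d(q_ch, inter_ch, 1)    # query transform (deep, low-dimensional layer)
        self.phi = nn.Conv2d(r_ch, inter_ch, 1)      # key transform for the response layer
        self.g = nn.Conv2d(r_ch, q_ch, 1)            # value transform back to query channels

    def forward(self, x_q, x_r):
        B, Cq, Hq, Wq = x_q.shape
        q = self.theta(x_q).flatten(2).transpose(1, 2)       # (B, Nq, inter)
        k = self.phi(x_r).flatten(2)                         # (B, inter, Nr)
        v = self.g(x_r).flatten(2).transpose(1, 2)           # (B, Nr, Cq)
        attn = torch.softmax(q @ k, dim=-1)                  # correlations between the two layers
        out = (attn @ v).transpose(1, 2).reshape(B, Cq, Hq, Wq)
        return x_q + out                                     # weighted responses added to the query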

Journal ArticleDOI
03 Apr 2020
TL;DR: Experimental results demonstrate that TP-RNN consistently outperforms existing RNNs for learning long-term and multi-scale dependencies in sequential data.
Abstract: Learning long-term and multi-scale dependencies in sequential data is a challenging task for recurrent neural networks (RNNs). In this paper, a novel RNN structure called temporal pyramid RNN (TP-RNN) is proposed to achieve these two goals. TP-RNN is a pyramid-like structure and generally has multiple layers. In each layer of the network, there are several sub-pyramids connected by a shortcut path to the output, which can efficiently aggregate historical information from hidden states and provide many gradient feedback short-paths. This avoids back-propagating through many hidden states as in usual RNNs. In particular, in the multi-layer structure of TP-RNN, the input sequence of the higher layer is a large-scale aggregated state sequence produced by the sub-pyramids in the previous layer, instead of the usual sequence of hidden states. In this way, TP-RNN can explicitly learn multi-scale dependencies with multi-scale input sequences of different layers, and shorten the input sequence and gradient feedback paths of each layer. This avoids the vanishing gradient problem in deep RNNs and allows the network to efficiently learn long-term dependencies. We evaluate TP-RNN on several sequence modeling tasks, including the masked addition problem, pixel-by-pixel image classification, signal recognition and speaker identification. Experimental results demonstrate that TP-RNN consistently outperforms existing RNNs for learning long-term and multi-scale dependencies in sequential data.

Posted Content
TL;DR: A novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations that builds spatial dependencies among multi-level layers and learns more discriminative features.
Abstract: Extracting and fusing part features has become the key to fine-grained image recognition. Recently, the Non-local (NL) module has shown excellent improvements in image recognition. However, it lacks the mechanism to model the interactions between multi-scale part features, which is vital for fine-grained recognition. In this paper, we propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations. First, CNL computes correlations between features of a query layer and all response layers. Second, all response features are weighted according to the correlations and are added to the query features. Due to the interactions of cross-layer features, our model builds spatial dependencies among multi-level layers and learns more discriminative features. In addition, we can reduce the aggregation cost if we set a low-dimensional deep layer as the query layer. Experiments show that our model achieves or surpasses state-of-the-art results on three benchmark datasets of fine-grained classification. Our codes can be found at this http URL.

Book ChapterDOI
23 Jan 2020
TL;DR: In this paper, a review of the use of IoT, cloud, and machine learning in the context of agriculture is presented, discussing the various types of sensors used, the type of IoT support developed, and the kind of data analytics used in recent research.
Abstract: Agriculture is the backbone of India, and its produce must be optimized. The technology trio of IoT, cloud, and machine learning can have a significant impact on the agriculture domain. Each of these techniques can contribute greatly to increasing agricultural productivity. Most of the research carried out in this direction has implemented a layered architecture. The lowermost layer is the sensor layer, followed by the data collection layer, and finally the data analytics layer. The sensor layer consists of various kinds of sensors (moisture, light, temperature, etc.) and, together with IoT, provides an environment for capturing plant characteristics and generating the data. The second layer is the data collection layer, which uses cloud or fog computing to structure the enormous amount of data generated by the sensors; the final layer is the analytics layer, where machine learning or another rule-based framework is used to analyze the data and provide the best possible solution for better crop yield. The various types of sensors used, the type of IoT support developed, and the kind of data analytics used in recent research are discussed in this review paper.