
Showing papers on "Sequence learning" published in 2019


Journal ArticleDOI
17 Jul 2019
TL;DR: Point2Sequence employs a novel sequence learning model for point clouds that captures correlations by aggregating the multi-scale areas of each local region with attention; the correlation between area scales is captured during aggregation by a recurrent neural network (RNN) based encoder-decoder structure.
Abstract: Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.

233 citations
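
The attention step described in the Point2Sequence abstract above can be illustrated with a minimal sketch. It shows only attention-weighted pooling over per-scale feature vectors; the RNN encoder-decoder is omitted, and the dot-product scoring, the `query` vector, and the function names are assumptions made for illustration, not details taken from the paper.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_scales(scale_features, query):
    """Pool per-scale feature vectors into one vector with dot-product attention."""
    # One score per area scale (assumed scoring rule: dot product with a query).
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in scale_features]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, scale_features))
        for d in range(dim)
    ]
    return weights, pooled

# Three hypothetical area scales of one local region, each a 2-d feature.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, pooled = attend_scales(features, query)
```

In this toy input the third scale scores highest against the query, so it receives the largest weight in the pooled feature.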


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, the authors propose a protocol to decide when to update importance weights, which data to use to update them, and how to accumulate the importance weights at each update step.
Abstract: Methods proposed in the literature towards continual deep learning typically operate in a task-based sequential learning setup. A sequence of tasks is learned, one at a time, with all data of the current task available but not of previous or future tasks. Task boundaries and identities are known at all times. This setup, however, is rarely encountered in practical applications. Therefore we investigate how to transform continual learning to an online setup. We develop a system that keeps on learning over time in a streaming fashion, with data distributions gradually changing and without the notion of separate tasks. To this end, we build on the work on Memory Aware Synapses, and show how this method can be made online by providing a protocol to decide i) when to update the importance weights, ii) which data to use to update them, and iii) how to accumulate the importance weights at each update step. Experimental results show the validity of the approach in the context of two applications: (self-)supervised learning of a face recognition model by watching soap series and teaching a robot to avoid collisions.

159 citations
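
Point iii) of the protocol in the abstract above, accumulating importance weights across update steps, can be sketched as follows. The abstract does not state the accumulation rule, so the cumulative moving average used here is an assumption, as are the function names; the quadratic penalty is the usual form for importance-weighted regularisers and is shown only to indicate how the weights would be used.

```python
def accumulate_importance(old_omega, new_omega, num_updates):
    """Cumulative moving average of importance weights.

    `num_updates` is the number of estimates already folded into `old_omega`.
    (Assumed rule; the abstract does not specify how weights are accumulated.)
    """
    return [
        (o * num_updates + n) / (num_updates + 1)
        for o, n in zip(old_omega, new_omega)
    ]

def importance_penalty(params, anchor_params, omega, strength=1.0):
    # Penalise changes to parameters in proportion to their importance.
    return strength * sum(
        w * (p - a) ** 2 for w, p, a in zip(omega, params, anchor_params)
    )

omega = [0.0, 0.0]
# Two hypothetical importance estimates arriving at successive update steps.
for step, estimate in enumerate([[1.0, 0.0], [3.0, 2.0]]):
    omega = accumulate_importance(omega, estimate, step)

penalty = importance_penalty([1.0, 1.0], [0.0, 0.0], omega)
```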


Proceedings Article
01 Jan 2019
TL;DR: In this article, memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning, which solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.
Abstract: People can learn a new concept and use it compositionally, understanding how to "blicket twice" after learning how to "blicket." In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. In this approach, models train on a series of seq2seq problems to acquire the compositional skills needed to solve new seq2seq problems. Meta seq2seq learning solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

159 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: The framework consists of a 3D convolutional residual network for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling that is optimized in an alternate way for weakly supervised continuous sign language recognition.
Abstract: In this paper, we propose an alignment network with iterative optimization for weakly supervised continuous sign language recognition. Our framework consists of two modules: a 3D convolutional residual network (3D-ResNet) for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling. The above two modules are optimized in an alternate way. In the encoder-decoder sequence learning network, two decoders are included, i.e., LSTM decoder and CTC decoder. Both decoders are jointly trained by maximum likelihood criterion with a soft Dynamic Time Warping (soft-DTW) alignment constraint. The warping path, which indicates the possible alignment between input video clips and sign words, is used to fine-tune the 3D-ResNet as training labels with classification loss. After fine-tuning, the improved features are extracted for optimization of encoder-decoder sequence learning network in next iteration. The proposed algorithm is evaluated on two large scale continuous sign language recognition benchmarks, i.e., RWTH-PHOENIX-Weather and CSL. Experimental results demonstrate the effectiveness of our proposed method.

152 citations
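
The soft-DTW alignment constraint mentioned in the abstract above replaces the hard minimum of classical dynamic time warping with a smooth soft-minimum. The sketch below is a minimal illustration on scalar sequences with a squared-difference cost; the paper aligns video clip features with sign words, and the cost function, the `gamma` values, and the function names here are assumptions for illustration only.

```python
import math

def soft_min(values, gamma):
    # Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between two scalar sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Same recursion as DTW, with min replaced by soft_min.
            r[i][j] = cost + soft_min(
                [r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]], gamma
            )
    return r[n][m]

same = soft_dtw([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], gamma=0.01)
far = soft_dtw([0.0, 1.0, 2.0], [5.0, 6.0, 7.0], gamma=0.01)
```

As `gamma` shrinks, the value approaches the classical DTW cost; larger values give a smoother quantity that is easier to differentiate through.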


Journal ArticleDOI
TL;DR: This work introduces a dense connection strategy, proposing a novel Densely Connected Graph Convolutional Network (DCGCN), able to integrate both local and non-local features to learn a better structural representation of a graph.
Abstract: We focus on graph-to-sequence learning, which can be framed as transducing graph structures to sequences for text generation. To capture structural information associated with graphs, we investigat...

117 citations


Journal ArticleDOI
TL;DR: This work solves the sequence learning problem as an image classification task using convolutional neural networks, and builds a classification network with stacked residual blocks and having a special design called linear skip gated connection which can benefit information propagation across multiple residual blocks.
Abstract: For skeleton-based action recognition, most of the existing works used recurrent neural networks. Using convolutional neural networks (CNNs) is another attractive solution considering their advantages in parallelization, effectiveness in feature learning, and model base sufficiency. Besides these, skeleton data are low-dimensional features. It is natural to arrange a sequence of skeleton features chronologically into an image, which retains the original information. Therefore, we solve the sequence learning problem as an image classification task using CNNs. For better learning ability, we build a classification network with stacked residual blocks and having a special design called linear skip gated connection which can benefit information propagation across multiple residual blocks. When arranging the coordinates of body joints in one frame into a skeleton feature, we systematically investigate the performance of part-based, chain-based, and traversal-based orders. Furthermore, a fully convolutional permutation network is designed to learn an optimized order for data rearrangement. Without any bells and whistles, our proposed model achieves state-of-the-art performance on two challenging benchmark datasets, outperforming existing methods significantly.

106 citations


01 Jan 2019
TL;DR: In this paper, an end-to-end deep learning framework was proposed to predict the number of passengers alighting at each station in the near future, given the last few short-term periods.
Abstract: Accurate short-term passenger flow prediction is of great significance for real-time public transit management, timely emergency response as well as systematic medium and long-term planning. In this paper, we propose an end-to-end deep learning framework that can simultaneously make multi-step predictions for all stations in a large scale metro system. A sequence to sequence model embedded with the attention mechanism forms the backbone of this framework. The sequence to sequence model consists of an encoder network and a decoder network, making it good at modeling sequential data with varying lengths, and the attention mechanism further enhances its ability to capture long-range dependencies. We use the proposed framework to predict the number of passengers alighting at each station in the near future, given the number of passengers boarding at each station in the last few short-term periods. The large quantities of real-world data collected from Singapore’s metro system are used to validate the proposed model. In addition, a set of comparisons made among our model and other classical approaches indicates that the proposed model is more scalable and robust than other baselines in making multi-step and network-wide predictions for short-term passenger flow.

95 citations


Journal ArticleDOI
TL;DR: It is found that a large fraction of non-greedy decisions that humans make in volatile environments do not stem from exploration but from the limited precision of learning, and further identify its neurophysiological correlates.
Abstract: When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these ‘non-greedy’ decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood oxygen level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus–norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning. Findling, Skvortsova et al. find that a large fraction of non-greedy decisions that humans make in volatile environments do not stem from exploration but from the limited precision of learning, and further identify its neurophysiological correlates.

87 citations


Proceedings ArticleDOI
18 Mar 2019
TL;DR: An attention-based sequence-to-sequence model that combines a convolutional neural network as a generic feature extractor with a recurrent neural network to encode both the visual information, as well as the temporal context between characters in the input image, and uses a separate recurrent network to decode the actual character sequence.
Abstract: Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network to encode both the visual information, as well as the temporal context between characters in the input image, and uses a separate recurrent neural network to decode the actual character sequence. We make experimental comparisons between various attention mechanisms and positional encodings, in order to find an appropriate alignment between the input and output sequence. The model can be trained end-to-end and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. We achieve competitive results on the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without the use of a language model, and we significantly improve over any recent sequence-to-sequence approaches.

81 citations


Proceedings ArticleDOI
Dehong Ma1, Sujian Li1, Fangzhao Wu, Xing Xie, Houfeng Wang1 
01 Jul 2019
TL;DR: To formalize ATE as a sequence-to-sequence (Seq2Seq) learning task where the source sequence and target sequence are composed of words and labels respectively, the proposed gated unit networks and position-aware attention mechanism are designed.
Abstract: Aspect term extraction (ATE) aims at identifying all aspect terms in a sentence and is usually modeled as a sequence labeling problem. However, sequence labeling based methods cannot make full use of the overall meaning of the whole sentence and are limited in processing dependencies between labels. To tackle these problems, we first explore formalizing ATE as a sequence-to-sequence (Seq2Seq) learning task where the source sequence and target sequence are composed of words and labels respectively. At the same time, to make Seq2Seq learning suited to ATE, where labels correspond to words one by one, we design the gated unit networks to incorporate the corresponding word representation into the decoder, and position-aware attention to pay more attention to the adjacent words of a target word. The experimental results on two datasets show that Seq2Seq learning is effective in ATE when accompanied by our proposed gated unit networks and position-aware attention mechanism.

79 citations


Journal ArticleDOI
Yingying Chen1, Jinqiao Wang1, Bingke Zhu1, Ming Tang1, Hanqing Lu1 
TL;DR: An end-to-end deep sequence learning architecture for moving object detection is proposed and a novel attention long short-term memory (Attention ConvLSTM) is proposed to model pixelwise changes over time.
Abstract: Moving object detection is an essential, well-studied but still open problem in computer vision and plays a fundamental role in many applications. Traditional approaches usually reconstruct background images with hand-crafted visual features, such as color, texture, and edge. Due to lack of prior knowledge or semantic information, it is difficult to deal with complicated and rapidly changing scenes. To exploit the temporal structure of the pixel-level semantic information, in this paper, we propose an end-to-end deep sequence learning architecture for moving object detection. First, the video sequences are input into a deep convolutional encoder–decoder network for extracting pixel-wise semantic features. Then, to exploit the temporal context, we propose a novel attention long short-term memory (Attention ConvLSTM) to model pixelwise changes over time. A spatial transformer network and a conditional random field layer are finally appended to reduce the sensitivity to camera motion and smooth the foreground boundaries. A multi-task loss is proposed to jointly optimize frame-based classification and temporal prediction in an end-to-end network. Experimental results on CDnet 2014 and LASIESTA show 12.15% and 16.71% improvement to the state of the art, respectively.

Journal ArticleDOI
TL;DR: This work articulates the new hypothesis that different learning mechanisms act in synergy, as they affect neural structures that often rely on the widespread action of neuromodulators, and discusses the empirical evidence supporting it with specific reference to motor adaptation and sequence learning.

Proceedings ArticleDOI
TL;DR: ProSeNet is proposed, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning that can achieve accuracy on par with state-of-the-art deep learning models and provides a user-friendly approach to model steering.
Abstract: One of the major challenges in machine learning nowadays is to provide predictions with not only high accuracy but also user-friendly explanations. Although in recent years we have witnessed increasingly popular use of deep neural networks for sequence modeling, it is still challenging to explain the rationales behind the model outputs, which is essential for building trust and supporting the domain experts to validate, critique and refine the model. We propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. The prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain. For better interpretability, we define several criteria for constructing the prototypes, including simplicity, diversity, and sparsity and propose the learning objective and the optimization procedure. ProSeNet also provides a user-friendly approach to model steering: domain experts without any knowledge on the underlying model or parameters can easily incorporate their intuition and experience by manually refining the prototypes. We conduct experiments on a wide range of real-world applications, including predictive diagnostics for automobiles, ECG, and protein sequence classification and sentiment analysis on texts. The result shows that ProSeNet can achieve accuracy on par with state-of-the-art deep learning models. We also evaluate the interpretability of the results with concrete case studies. Finally, through user study on Amazon Mechanical Turk (MTurk), we demonstrate that the model selects high-quality prototypes which align well with human knowledge and can be interactively refined for better interpretability without loss of performance.

Journal ArticleDOI
04 Feb 2019-eLife
TL;DR: These findings support the existence of multiple computational systems for sequence processing involving statistical inferences at multiple scales and the possibility that successive brain responses reflect the progressive extraction of sequence statistics at different timescales.
Abstract: Extracting the temporal structure of sequences of events is crucial for perception, decision-making, and language processing. Here, we investigate the mechanisms by which the brain acquires knowledge of sequences and the possibility that successive brain responses reflect the progressive extraction of sequence statistics at different timescales. We measured brain activity using magnetoencephalography in humans exposed to auditory sequences with various statistical regularities, and we modeled this activity as theoretical surprise levels using several learning models. Successive brain waves related to different types of statistical inferences. Early post-stimulus brain waves denoted a sensitivity to a simple statistic, the frequency of items estimated over a long timescale (habituation). Mid-latency and late brain waves conformed qualitatively and quantitatively to the computational properties of a more complex inference: the learning of recent transition probabilities. Our findings thus support the existence of multiple computational systems for sequence processing involving statistical inferences at multiple scales.
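
The notion of "theoretical surprise" under learned transition probabilities, as used in the abstract above, can be illustrated with a small sketch. It estimates first-order transition probabilities by counting and reports the surprise of each item as the negative log probability under the estimate held before that item was seen. The uniform prior count and the use of the full history are assumptions; the abstract refers to recent transition probabilities, which would additionally require some form of forgetting that is not modelled here.

```python
import math
from collections import defaultdict

def transition_surprise(sequence, prior_count=1.0):
    """Surprise (in bits) of each item given the preceding item."""
    items = sorted(set(sequence))
    counts = defaultdict(lambda: defaultdict(float))
    surprises = []
    for prev, cur in zip(sequence, sequence[1:]):
        row = counts[prev]
        # Predictive probability from counts so far, with a uniform prior.
        total = sum(row.values()) + prior_count * len(items)
        p = (row[cur] + prior_count) / total
        surprises.append(-math.log2(p))
        # Update the estimate only after scoring the observation.
        row[cur] += 1.0
    return surprises

# A strictly alternating sequence becomes less surprising as it is learned.
surprises = transition_surprise(list("ABABABAB"))
```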

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This paper proposed a parallel iterative edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC) and OCR correction.
Abstract: We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC). Recent approaches are based on the popular encoder-decoder (ED) model for sequence to sequence learning. The ED model auto-regressively captures full dependency among output tokens but is slow due to sequential decoding. The PIE model does parallel decoding, giving up the advantage of modeling full dependency in the output, yet it achieves accuracy competitive with the ED model for four reasons: 1. predicting edits instead of tokens, 2. labeling sequences instead of generating sequences, 3. iteratively refining predictions to capture dependencies, and 4. factorizing logits over edits and their token argument to harness pre-trained language models like BERT. Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.
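
Two of the ideas listed in the abstract above, predicting edits instead of tokens and iteratively refining the output, can be sketched as follows. The edit vocabulary (`copy`, `delete`, `replace`, `append`), the function names, and the rule-based `toy_predictor` standing in for a trained model are all hypothetical; the abstract does not specify the edit set.

```python
def apply_edits(tokens, edits):
    """Apply one edit label per input token and return the edited tokens."""
    output = []
    for token, edit in zip(tokens, edits):
        op = edit[0]
        if op == "copy":
            output.append(token)
        elif op == "delete":
            continue
        elif op == "replace":
            output.append(edit[1])
        elif op == "append":
            # Keep the token and insert a new token after it.
            output.extend([token, edit[1]])
        else:
            raise ValueError(f"unknown edit: {op}")
    return output

def refine(tokens, predict_edits, max_rounds=3):
    """Re-apply the predictor until it proposes no further changes."""
    for _ in range(max_rounds):
        edits = predict_edits(tokens)
        if all(e[0] == "copy" for e in edits):
            break
        tokens = apply_edits(tokens, edits)
    return tokens

def toy_predictor(tokens):
    # Hypothetical stand-in for a model that labels all tokens in parallel.
    fixes = {"go": "goes"}
    return [("replace", fixes[t]) if t in fixes else ("copy",) for t in tokens]

corrected = refine(["she", "go", "home"], toy_predictor)
```

Because every label depends only on the current input tokens, all labels in a round can be predicted at once; dependencies between output tokens are recovered by running further rounds.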

Proceedings ArticleDOI
09 Jul 2019
TL;DR: In this article, a joint ASR and speaker diarization system using a recurrent neural network transducer was proposed to tackle the two tasks by using both linguistic and acoustic cues to infer speaker roles.
Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective functions. Often the SD systems operate directly on the acoustics and are not constrained to respect word boundaries and this deficiency is overcome in an ad hoc manner. Motivated by recent advances in sequence to sequence learning, we propose a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer. Our approach utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues. We evaluated the performance of our approach on a large corpus of medical conversations between physicians and patients. Compared to a competitive conventional baseline, our approach improves word-level diarization error rate from 15.8% to 2.2%.


Journal ArticleDOI
TL;DR: Analysis of training-dependent and off-line changes of two sub-processes of procedural learning: namely, sequence learning and statistical learning can contribute to a deeper understanding of the dynamic changes of multiple parallel learning and consolidation processes that occur during procedural memory formation.
Abstract: Procedural learning is a fundamental cognitive function that facilitates efficient processing of and automatic responses to complex environmental stimuli. Here, we examined training-dependent and off-line changes of two sub-processes of procedural learning: namely, sequence learning and statistical learning. Whereas sequence learning requires the acquisition of order-based relationships between the elements of a sequence, statistical learning is based on the acquisition of probabilistic associations between elements. Seventy-eight healthy young adults (58 females and 20 males) completed the modified version of the Alternating Serial Reaction Time task that was designed to measure Sequence and Statistical Learning simultaneously. After training, participants were randomly assigned to one of three conditions: active wakefulness, quiet rest, or daytime sleep. We examined off-line changes in Sequence and Statistical Learning as well as further improvements after extended practice. Performance in Sequence Learning increased during training, while Statistical Learning plateaued relatively rapidly. After the off-line period, both the acquired sequence and statistical knowledge was preserved, irrespective of the vigilance state (awake, quiet rest or sleep). Sequence Learning further improved during extended practice, while Statistical Learning did not. Moreover, within the sleep group, cortical oscillations and sleep spindle parameters showed differential associations with Sequence and Statistical Learning. Our findings can contribute to a deeper understanding of the dynamic changes of multiple parallel learning and consolidation processes that occur during procedural memory formation.

Posted Content
TL;DR: Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.
Abstract: We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC). Recent approaches are based on the popular encoder-decoder (ED) model for sequence to sequence learning. The ED model auto-regressively captures full dependency among output tokens but is slow due to sequential decoding. The PIE model does parallel decoding, giving up the advantage of modelling full dependency in the output, yet it achieves accuracy competitive with the ED model for four reasons: 1. predicting edits instead of tokens, 2. labeling sequences instead of generating sequences, 3. iteratively refining predictions to capture dependencies, and 4. factorizing logits over edits and their token argument to harness pre-trained language models like BERT. Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.

Journal ArticleDOI
Chenglu Sun1, Jiahao Fan1, Chen Chen1, Wei Li1, Wei Chen1 
TL;DR: Comparison experiments show that combining hand-crafted features with network-trained features improves classification performance, and that an RNN is a good choice for learning temporal information across sleep epochs.
Abstract: Sleep stage classification is a fundamental but cumbersome task in sleep analysis. To score the sleep stage automatically, this study presents a stage classification method based on a two-stage neural network. The first stage, feature learning, fuses network-trained features with traditional hand-crafted features. A recurrent neural network (RNN) in the second stage learns temporal information between sleep epochs and produces the classification results. To address the serious sample imbalance problem, a novel pre-training process combined with data augmentation was introduced. The proposed method was evaluated on two public databases, the Sleep-EDF and Sleep Apnea (SA). The proposed method achieves an F1-score and Kappa coefficient of 0.806 and 0.80 for healthy subjects, respectively, and 0.790 and 0.74 for subjects with suspected sleep disorders, respectively. The results show that the method achieves better performance than state-of-the-art methods on the same databases. Comparison experiments in the model analysis show that combining the hand-crafted features with the network-trained features improves classification performance. In addition, the RNN is a good choice for learning temporal information in sleep epochs, and the pre-training process with data augmentation is verified to reduce the impact of sample imbalance. The proposed model has the potential to exploit sleep information comprehensively.

Proceedings Article
01 Jan 2019
TL;DR: A large-scale empirical study of catastrophic forgetting in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning indicates that there is no model that avoids CF for all investigated datasets and SLTs under application conditions.
Abstract: We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that enforces typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. We conclude with a discussion of potential solutions and workarounds to CF, notably for the EWC and IMM models.

Posted Content
TL;DR: This work proposes a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer that utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues.
Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective functions. Often the SD systems operate directly on the acoustics and are not constrained to respect word boundaries and this deficiency is overcome in an ad hoc manner. Motivated by recent advances in sequence to sequence learning, we propose a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer. Our approach utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues. We evaluated the performance of our approach on a large corpus of medical conversations between physicians and patients. Compared to a competitive conventional baseline, our approach improves word-level diarization error rate from 15.8% to 2.2%.

Proceedings ArticleDOI
25 Jul 2019
TL;DR: ProSeNet is an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning; its prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain.
Abstract: One of the major challenges in machine learning nowadays is to provide predictions with not only high accuracy but also user-friendly explanations. Although in recent years we have witnessed increasingly popular use of deep neural networks for sequence modeling, it is still challenging to explain the rationales behind the model outputs, which is essential for building trust and supporting the domain experts to validate, critique and refine the model. We propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. The prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain. For better interpretability, we define several criteria for constructing the prototypes, including simplicity, diversity, and sparsity and propose the learning objective and the optimization procedure. ProSeNet also provides a user-friendly approach to model steering: domain experts without any knowledge on the underlying model or parameters can easily incorporate their intuition and experience by manually refining the prototypes. We conduct experiments on a wide range of real-world applications, including predictive diagnostics for automobiles, ECG, and protein sequence classification and sentiment analysis on texts. The result shows that ProSeNet can achieve accuracy on par with state-of-the-art deep learning models. We also evaluate the interpretability of the results with concrete case studies. Finally, through user study on Amazon Mechanical Turk (MTurk), we demonstrate that the model selects high-quality prototypes which align well with human knowledge and can be interactively refined for better interpretability without loss of performance.

Journal ArticleDOI
17 Jul 2019
TL;DR: The authors propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood, thus providing better interpretability to users.
Abstract: Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.

Posted Content
TL;DR: In this article, a large-scale empirical study of catastrophic forgetting in modern deep neural network (DNN) models that perform sequential (or: incremental) learning is presented, and a new experimental protocol is proposed that enforces typical constraints encountered in application scenarios.
Abstract: We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that enforces typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. We conclude with a discussion of potential solutions and workarounds to CF, notably for the EWC and IMM models.
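
For readers unfamiliar with EWC (elastic weight consolidation), one of the models named in this abstract, the following is a minimal sketch of its quadratic penalty as it is commonly formulated. It is not taken from this study: parameters are flattened into plain lists, the per-parameter importance values (`fisher`) are assumed to be precomputed, and all names and numbers are illustrative.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params        -- current parameter values while training on the new task
    anchor_params -- parameter values saved after training on the previous task
    fisher        -- per-parameter importance estimates (assumed precomputed)
    lam           -- strength of the pull back toward the saved parameters
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(new_task_loss, params, anchor_params, fisher, lam):
    """Loss on the new task plus the penalty that protects the previous task."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher, lam)


# Illustrative numbers only: the first parameter has moved, the second has not.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0))
```

Parameters with large importance values are expensive to move, while unimportant ones remain free to adapt to the new task.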

Proceedings ArticleDOI
31 May 2019
TL;DR: This work approaches the medical concept normalization problem as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions.
Abstract: In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in a free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology is very different when coming from health care professionals or from the general public in the form of social media texts. We approach it as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state of the art models.

Journal ArticleDOI
TL;DR: A collaborative optimization algorithm, combining ant colony optimization and a genetic algorithm to provide learners with a personalized learning path, establishes that the hybrid approach provides a better solution than the traditional approach.

Journal ArticleDOI
Bo Dai, Chongshi Gu, Erfeng Zhao, Kai Zhu, Wenhan Cao, Xiangnan Qin
TL;DR: The comparative results demonstrate that the improved online sequential extreme learning machine can provide highly accurate forecasts and reasonably identify crack behavior.
Abstract: Prediction models are essential in dam crack behavior identification. Prototype monitoring data arrive sequentially in dam safety monitoring. Given such characteristic, sequential learning algorith...

Journal ArticleDOI
TL;DR: A review of the variety of attempts to use RL for instructional sequencing finds that reinforcement learning has been most successful in cases where it has been constrained with ideas and theories from cognitive psychology and the learning sciences.
Abstract: Since the 1960s, researchers have been trying to optimize the sequencing of instructional activities using the tools of reinforcement learning (RL) and sequential decision making under uncertainty. Many researchers have realized that reinforcement learning provides a natural framework for optimal instructional sequencing given a particular model of student learning, and excitement towards this area of research is as alive now as it was over fifty years ago. But does RL actually help students learn? If so, when and where might we expect it to be most helpful? To help answer these questions, we review the variety of attempts to use RL for instructional sequencing. First, we present a historical narrative of this research area. We identify three waves of research, which gives us a sense of the various communities of researchers that have been interested in this problem and where the field is going. Second, we review all of the empirical research that has compared RL-induced instructional policies to baseline methods of sequencing. We find that over half of the studies found that RL-induced policies significantly outperform baselines. Moreover, we identify five clusters of studies with different characteristics and varying levels of success in using RL to help students learn. We find that reinforcement learning has been most successful in cases where it has been constrained with ideas and theories from cognitive psychology and the learning sciences. However, given that our theories and models are limited, we also find that it has been useful to complement this approach with running more robust offline analyses that do not rely heavily on the assumptions of one particular model. Given that many researchers are turning to deep reinforcement learning and big data to tackle instructional sequencing, we believe keeping these best practices in mind can help guide the way to the reward in using RL for instructional sequencing.

Journal ArticleDOI
TL;DR: Experimental results on three natural language tasks confirm that the proposed RNN-SVAE yields higher performance than two benchmark models, and the mean and standard deviation of the continuous semantic space are learned to take advantage of the variational method.