Top 229 papers published on the topic of sequence learning in 2019

Journal Article•DOI•

Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network

Xinhai Liu¹, Zhizhong Han², Yu-Shen Liu¹, Matthias Zwicker²

Tsinghua University¹, University of Maryland, College Park²

17 Jul 2019

TL;DR: Point2Sequence employs a novel sequence learning model for point clouds that captures correlations by aggregating the multi-scale areas of each local region with attention. It uses a recurrent neural network (RNN) based encoder-decoder structure to capture the correlation between area scales during this aggregation.

Abstract: Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.
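
The attention step described above amounts to a softmax-weighted sum over per-scale features. What follows is a minimal Python sketch, not the authors' implementation. The dot-product scoring, the `query` vector (a stand-in for a decoder hidden state) and the name `attention_aggregate` are all assumptions.

```python
import math


def attention_aggregate(query, scale_features):
    """Softmax-attention aggregation of per-scale feature vectors.

    query: list[float], a stand-in for a decoder hidden state (assumption).
    scale_features: one feature vector per area scale of a local region.
    Returns (weights, context).
    """
    # Dot-product score between the query and each area-scale feature (assumed scoring).
    scores = [sum(q * h for q, h in zip(query, feat)) for feat in scale_features]
    # Numerically stable softmax gives one attention weight per area scale.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the scale features.
    dim = len(scale_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, scale_features)) for d in range(dim)]
    return weights, context


weights, context = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(weights, context)
```

Scales whose features align with the query receive larger weights. In the paper, this aggregation runs inside an RNN encoder-decoder rather than over fixed vectors.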

233 citations

Proceedings Article•DOI•

Task-Free Continual Learning

Rahaf Aljundi¹, Klaas Kelchtermans¹, Tinne Tuytelaars¹

Katholieke Universiteit Leuven¹

15 Jun 2019

TL;DR: In this article, the authors propose a protocol to decide when to update importance weights, which data to use to update them, and how to accumulate the importance weights at each update step.

Abstract: Methods proposed in the literature towards continual deep learning typically operate in a task-based sequential learning setup. A sequence of tasks is learned, one at a time, with all data of the current task available but not of previous or future tasks. Task boundaries and identities are known at all times. This setup, however, is rarely encountered in practical applications. Therefore we investigate how to transform continual learning to an online setup. We develop a system that keeps on learning over time in a streaming fashion, with data distributions gradually changing and without the notion of separate tasks. To this end, we build on the work on Memory Aware Synapses, and show how this method can be made online by providing a protocol to decide i) when to update the importance weights, ii) which data to use to update them, and iii) how to accumulate the importance weights at each update step. Experimental results show the validity of the approach in the context of two applications: (self-)supervised learning of a face recognition model by watching soap series and teaching a robot to avoid collisions.
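
Memory Aware Synapses estimates each parameter's importance from the gradient of the squared L2 norm of the network output. The sketch below shows this for a toy linear model F(x) = Wx, where that gradient has the closed form 2·(Wx)_i·x_j. It then accumulates importance across update steps with a cumulative moving average. The linear model, the function names and the averaging rule are all illustrative assumptions, not the paper's exact protocol.

```python
def mas_importance_linear(W, samples):
    """MAS-style importance for a linear model F(x) = W x (toy assumption).

    d||W x||^2 / dW[i][j] = 2 * (W x)[i] * x[j]; importance is the mean
    absolute gradient over the given (unlabelled) samples.
    """
    rows, cols = len(W), len(W[0])
    omega = [[0.0] * cols for _ in range(rows)]
    for x in samples:
        y = [sum(W[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        for i in range(rows):
            for j in range(cols):
                omega[i][j] += abs(2.0 * y[i] * x[j])
    return [[v / len(samples) for v in row] for row in omega]


def accumulate_importance(old, new, n_previous_updates):
    """Cumulative moving average of importance over update steps (assumed rule)."""
    n = n_previous_updates
    return [[(o * n + v) / (n + 1) for o, v in zip(row_old, row_new)]
            for row_old, row_new in zip(old, new)]


W = [[1.0, 0.0], [0.0, 2.0]]
omega = mas_importance_linear(W, [[1.0, 1.0]])
print(omega)  # [[2.0, 2.0], [4.0, 4.0]]
print(accumulate_importance([[0.0, 0.0], [0.0, 0.0]], omega, 1))  # [[1.0, 1.0], [2.0, 2.0]]
```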

159 citations

Proceedings Article•

Compositional generalization through meta sequence-to-sequence learning

Brenden M. Lake

01 Jan 2019

TL;DR: This paper shows how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. The approach solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

Abstract: People can learn a new concept and use it compositionally, understanding how to "blicket twice" after learning how to "blicket." In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. In this approach, models train on a series of seq2seq problems to acquire the compositional skills needed to solve new seq2seq problems. Meta seq2seq learning solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

159 citations

Proceedings Article•DOI•

Iterative Alignment Network for Continuous Sign Language Recognition

Junfu Pu¹, Wengang Zhou¹, Houqiang Li¹

University of Science and Technology of China¹

01 Jun 2019

TL;DR: The framework consists of a 3D convolutional residual network for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling that is optimized in an alternate way for weakly supervised continuous sign language recognition.

Abstract: In this paper, we propose an alignment network with iterative optimization for weakly supervised continuous sign language recognition. Our framework consists of two modules: a 3D convolutional residual network (3D-ResNet) for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling. The two modules are optimized alternately. The encoder-decoder sequence learning network includes two decoders, i.e., an LSTM decoder and a CTC decoder. Both decoders are jointly trained by a maximum likelihood criterion with a soft Dynamic Time Warping (soft-DTW) alignment constraint. The warping path, which indicates the possible alignment between input video clips and sign words, is used as training labels to fine-tune the 3D-ResNet with a classification loss. After fine-tuning, the improved features are extracted to optimize the encoder-decoder sequence learning network in the next iteration. The proposed algorithm is evaluated on two large-scale continuous sign language recognition benchmarks, i.e., RWTH-PHOENIX-Weather and CSL. Experimental results demonstrate the effectiveness of our proposed method.
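
The soft-DTW alignment constraint mentioned above replaces the hard minimum in dynamic time warping with a smoothed minimum. This makes the alignment cost differentiable. Below is a minimal sketch of the standard soft-DTW recursion on 1-D sequences. The squared-difference cost and the `gamma` values are assumptions; the paper applies soft-DTW to network outputs rather than to scalars.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between 1-D sequences x and y (squared-difference cost)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # R[i][j]: soft alignment cost of x[:i] and y[:j]; row/column 0 are boundaries.
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Smoothed choice among diagonal, vertical and horizontal predecessors.
            R[i][j] = cost + soft_min([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]], gamma)
    return R[n][m]


print(soft_dtw([0.0, 0.0], [1.0], gamma=1e-3))  # 2.0, the hard DTW cost
```

As `gamma` shrinks, the value approaches ordinary DTW. With larger `gamma` it can fall below zero, because the smoothed minimum is never larger than the true minimum.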

152 citations

Journal Article•DOI•

Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning

Zhijiang Guo¹, Yan Zhang¹, Zhiyang Teng¹,², Wei Lu¹

Singapore University of Technology and Design¹, Westlake University²

19 Jun 2019-Transactions of the Association for Computational Linguistics

TL;DR: This work introduces a dense connection strategy, proposing a novel Densely Connected Graph Convolutional Network (DCGCN), able to integrate both local and non-local features to learn a better structural representation of a graph.

Abstract: We focus on graph-to-sequence learning, which can be framed as transducing graph structures to sequences for text generation. To capture structural information associated with graphs, we investigat...

117 citations

Journal Article•DOI•

Skeleton-Based Action Recognition With Gated Convolutional Neural Networks

Congqi Cao¹, Cuiling Lan², Yifan Zhang³, Wenjun Zeng², Hanqing Lu³, Yanning Zhang¹

Northwestern Polytechnical University¹, Microsoft², Chinese Academy of Sciences³

01 Feb 2019-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: This work solves the sequence learning problem as an image classification task using convolutional neural networks. It builds a classification network of stacked residual blocks with a special design, called linear skip gated connection, that benefits information propagation across multiple residual blocks.

Abstract: For skeleton-based action recognition, most existing works use recurrent neural networks. Using convolutional neural networks (CNNs) is another attractive solution considering their advantages in parallelization, effectiveness in feature learning, and model base sufficiency. Besides these, skeleton data are low-dimensional features. It is natural to arrange a sequence of skeleton features chronologically into an image, which retains the original information. Therefore, we solve the sequence learning problem as an image classification task using CNNs. For better learning ability, we build a classification network of stacked residual blocks with a special design, called linear skip gated connection, which benefits information propagation across multiple residual blocks. When arranging the coordinates of body joints in one frame into a skeleton feature, we systematically investigate the performance of part-based, chain-based, and traversal-based orders. Furthermore, a fully convolutional permutation network is designed to learn an optimized order for data rearrangement. Without any bells and whistles, our proposed model achieves state-of-the-art performance on two challenging benchmark datasets, outperforming existing methods significantly.
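
Arranging a skeleton sequence chronologically into an image can be sketched as follows. Rows are frames, columns are joints in a chosen order, and the x, y, z coordinates play the role of colour channels. The function name, the toy skeleton and the traversal order `[0, 1, 2, 1, 0]` are hypothetical. The paper's actual part-, chain- and traversal-based orders depend on the dataset's joint layout.

```python
def skeleton_sequence_to_image(frames, joint_order):
    """Arrange a skeleton sequence chronologically into an image-like array.

    frames: list over time; each frame is a list of (x, y, z) joint coordinates.
    joint_order: joint indices defining the column order (part-, chain- or
    traversal-based); a traversal order may visit a joint more than once.
    Returns image[t][c] = [x, y, z]: rows are frames, columns follow
    joint_order, and the three coordinates act as colour channels.
    """
    return [[list(frame[j]) for j in joint_order] for frame in frames]


# Toy 3-joint skeleton over two frames; [0, 1, 2, 1, 0] is a hypothetical traversal order.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
]
image = skeleton_sequence_to_image(frames, [0, 1, 2, 1, 0])
print(len(image), len(image[0]))  # 2 5
```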

106 citations

Sequence to Sequence Learning with Attention Mechanism for Short-term Passenger Flow Prediction in Large-scale Metro System

Siyu Hao, Der-Horng Lee

01 Jan 2019

TL;DR: In this paper, an end-to-end deep learning framework is proposed to predict the number of passengers alighting at each station in the near future, given the number of passengers boarding at each station in the last few short-term periods.

Abstract: Accurate short-term passenger flow prediction is of great significance for real-time public transit management, timely emergency response, and systematic medium- and long-term planning. In this paper, we propose an end-to-end deep learning framework that can simultaneously make multi-step predictions for all stations in a large-scale metro system. A sequence to sequence model embedded with the attention mechanism forms the backbone of this framework. The sequence to sequence model consists of an encoder network and a decoder network, which makes it good at modeling sequential data with varying lengths, and the attention mechanism further enhances its ability to capture long-range dependencies. We use the proposed framework to predict the number of passengers alighting at each station in the near future, given the number of passengers boarding at each station in the last few short-term periods. Large quantities of real-world data collected from Singapore’s metro system are used to validate the proposed model. In addition, comparisons between our model and other classical approaches indicate that the proposed model is more scalable and robust than other baselines in making multi-step and network-wide predictions for short-term passenger flow.
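
The prediction setup above maps recent boarding counts to future alighting counts. The training pairs for such an encoder-decoder can be built with a sliding window, sketched below. The window lengths, the toy data and the name `make_windows` are assumptions; the paper's preprocessing is not described here.

```python
def make_windows(boarding, alighting, n_in, n_out):
    """Build (encoder input, decoder target) pairs for multi-step prediction.

    boarding, alighting: per-period lists of counts, one value per station.
    n_in: number of past periods of boarding counts fed to the encoder.
    n_out: number of future periods of alighting counts the decoder predicts.
    """
    if len(boarding) != len(alighting):
        raise ValueError("both series must cover the same periods")
    pairs = []
    for t in range(n_in, len(boarding) - n_out + 1):
        # Past n_in periods of boarding -> next n_out periods of alighting.
        pairs.append((boarding[t - n_in:t], alighting[t:t + n_out]))
    return pairs


# Toy data: 6 periods, 2 stations (values are made up for illustration).
boarding = [[i, 2 * i] for i in range(6)]
alighting = [[10 + i, 20 + i] for i in range(6)]
pairs = make_windows(boarding, alighting, n_in=3, n_out=2)
print(len(pairs))  # 2
```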

95 citations

Journal Article•DOI•

Computational noise in reward-guided learning drives behavioral variability in volatile environments

Charles Findling¹,², Vasilisa Skvortsova², Rémi Dromnelle²,³, Stefano Palminteri², Valentin Wyart²

ENSAE ParisTech¹, École Normale Supérieure², University of Paris³

28 Oct 2019-Nature Neuroscience

TL;DR: A large fraction of the non-greedy decisions that humans make in volatile environments stem not from exploration but from the limited precision of learning. The study also identifies the neurophysiological correlates of this imprecision.

Abstract: When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these ‘non-greedy’ decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood oxygen level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus–norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning. Findling, Skvortsova et al. find that a large fraction of non-greedy decisions that humans make in volatile environments do not stem from exploration but from the limited precision of learning, and further identify its neurophysiological correlates.
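
The core idea can be illustrated with a toy two-armed bandit. A learner that always chooses greedily still makes "non-greedy" choices relative to a noise-free learner once its value updates are noisy. Several parts of this setup are simplifying assumptions, not the authors' fitted models:

- a delta-rule (Rescorla-Wagner) update;
- complete feedback on both arms;
- noise whose standard deviation scales with the prediction-error magnitude.

```python
import random


def count_non_greedy(rewards, alpha=0.3, zeta=0.5, seed=0):
    """Count choices that differ from those of a noise-free learner.

    rewards: list of (reward_arm0, reward_arm1), one pair per trial.
    alpha: learning rate of the delta rule.
    zeta: learning-noise scale; the update noise SD is zeta * |prediction error|
          (an assumption made for this sketch).
    Both outcomes are observed on every trial (complete feedback, a simplification),
    and the noisy learner always chooses greedily, so any deviation from the
    noise-free learner's greedy choice comes from learning noise, not exploration.
    """
    rng = random.Random(seed)
    q_exact = [0.5, 0.5]  # noise-free delta-rule values
    q_noisy = [0.5, 0.5]  # same updates corrupted by learning noise
    non_greedy = 0
    for r in rewards:
        greedy_choice = 0 if q_exact[0] >= q_exact[1] else 1
        choice = 0 if q_noisy[0] >= q_noisy[1] else 1  # greedy on noisy values
        if choice != greedy_choice:
            non_greedy += 1
        for arm in (0, 1):
            q_exact[arm] += alpha * (r[arm] - q_exact[arm])
            delta = r[arm] - q_noisy[arm]
            q_noisy[arm] += alpha * delta + rng.gauss(0.0, zeta * abs(delta))
    return non_greedy


# A volatile environment: the better arm switches halfway through.
rewards = [(1.0, 0.0)] * 20 + [(0.0, 1.0)] * 20
print(count_non_greedy(rewards, zeta=0.0))  # 0
print(count_non_greedy(rewards, zeta=1.0, seed=1))
```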

87 citations

Proceedings Article•DOI•

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Johannes Michael¹, Roger Labahn¹, Tobias Grüning, Jochen Zöllner

University of Rostock¹

18 Mar 2019

TL;DR: The authors propose an attention-based sequence-to-sequence model. It combines a convolutional neural network, used as a generic feature extractor, with a recurrent neural network that encodes both the visual information and the temporal context between characters in the input image. A separate recurrent network then decodes the actual character sequence.

Abstract: Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network to encode both the visual information, as well as the temporal context between characters in the input image, and uses a separate recurrent neural network to decode the actual character sequence. We make experimental comparisons between various attention mechanisms and positional encodings, in order to find an appropriate alignment between the input and output sequence. The model can be trained end-to-end and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. We achieve competitive results on the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without the use of a language model, and we significantly improve over any recent sequence-to-sequence approaches.
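
The abstract mentions comparing positional encodings but does not say which ones. As one common example, the sinusoidal encoding used in Transformer models is sketched below. Treat it as an illustrative assumption, not necessarily a scheme evaluated in the paper.

```python
import math


def sinusoidal_positional_encoding(num_positions, dim):
    """Sinusoidal positional encodings, one row of length `dim` per position.

    Dimension pairs (2k, 2k + 1) share the angular frequency 1 / 10000^(2k / dim);
    even dimensions use sine and odd dimensions use cosine.
    """
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** ((i - i % 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = sinusoidal_positional_encoding(4, 6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```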

81 citations

Proceedings Article•DOI•

Exploring Sequence-to-Sequence Learning in Aspect Term Extraction

Dehong Ma¹, Sujian Li¹, Fangzhao Wu, Xing Xie, Houfeng Wang¹

Peking University¹

01 Jul 2019

TL;DR: Aspect term extraction (ATE) is formalized as a sequence-to-sequence (Seq2Seq) learning task in which the source sequence is composed of words and the target sequence of labels. Gated unit networks and a position-aware attention mechanism are designed to adapt Seq2Seq learning to this task.

Abstract: Aspect term extraction (ATE) aims at identifying all aspect terms in a sentence and is usually modeled as a sequence labeling problem. However, sequence labeling based methods cannot make full use of the overall meaning of the whole sentence and are limited in handling dependencies between labels. To tackle these problems, we first explore formalizing ATE as a sequence-to-sequence (Seq2Seq) learning task where the source sequence and target sequence are composed of words and labels, respectively. At the same time, to adapt Seq2Seq learning to ATE, where labels correspond to words one by one, we design gated unit networks to incorporate the corresponding word representation into the decoder, and position-aware attention to pay more attention to the words adjacent to a target word. The experimental results on two datasets show that Seq2Seq learning is effective in ATE when accompanied by our proposed gated unit networks and position-aware attention mechanism.
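
In this formulation the target sequence has exactly one label per source word. Building such a target from annotated aspect spans could look like the sketch below. The B/I/O label scheme, the example sentence and the function name are assumptions; the abstract does not name the label set.

```python
def spans_to_labels(tokens, aspect_spans):
    """Build the target sequence for Seq2Seq ATE: one label per source word.

    aspect_spans: (start, end) token indices of aspect terms, end exclusive.
    Uses a B/I/O scheme (an assumption; the abstract does not name the label set).
    """
    labels = ["O"] * len(tokens)
    for start, end in aspect_spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


tokens = "The battery life is great but the screen is dim".split()
print(list(zip(tokens, spans_to_labels(tokens, [(1, 3), (7, 8)]))))
```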

79 citations

Journal Article•DOI•

Pixelwise Deep Sequence Learning for Moving Object Detection

Yingying Chen¹, Jinqiao Wang¹, Bingke Zhu¹, Ming Tang¹, Hanqing Lu¹

Chinese Academy of Sciences¹

01 Sep 2019-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: An end-to-end deep sequence learning architecture for moving object detection is proposed. It includes a novel attention convolutional long short-term memory (Attention ConvLSTM) that models pixelwise changes over time.

Abstract: Moving object detection is an essential, well-studied but still open problem in computer vision and plays a fundamental role in many applications. Traditional approaches usually reconstruct background images with hand-crafted visual features, such as color, texture, and edge. Due to a lack of prior knowledge or semantic information, it is difficult for them to deal with complicated and rapidly changing scenes. To exploit the temporal structure of the pixel-level semantic information, in this paper, we propose an end-to-end deep sequence learning architecture for moving object detection. First, the video sequences are input into a deep convolutional encoder–decoder network for extracting pixel-wise semantic features. Then, to exploit the temporal context, we propose a novel attention convolutional long short-term memory (Attention ConvLSTM) to model pixelwise changes over time. A spatial transformer network and a conditional random field layer are finally appended to reduce the sensitivity to camera motion and smooth the foreground boundaries. A multi-task loss is proposed to jointly optimize frame-based classification and temporal prediction in an end-to-end network. Experimental results on CDnet 2014 and LASIESTA show improvements of 12.15% and 16.71% over the state of the art, respectively.

Journal Article•DOI•

The super-learning hypothesis: Integrating learning processes across cortex, cerebellum and basal ganglia

Daniele Caligiore¹, Michael A. Arbib², R. Chris Miall³, Gianluca Baldassarre¹

National Research Council¹, University of California, San Diego², University of Birmingham³

01 May 2019-Neuroscience & Biobehavioral Reviews

TL;DR: The article articulates a new hypothesis: different learning mechanisms act in synergy as they affect neural structures, often relying on the widespread action of neuromodulators. It then discusses empirical evidence supporting this hypothesis, with specific reference to motor adaptation and sequence learning.

Proceedings Article•DOI•

Interpretable and Steerable Sequence Learning via Prototypes

Yao Ming¹, Panpan Xu², Huamin Qu¹, Liu Ren²

Hong Kong University of Science and Technology¹, Bosch²

23 Jul 2019-arXiv: Learning

TL;DR: The authors propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. It achieves accuracy on par with state-of-the-art deep learning models and provides a user-friendly approach to model steering.

Abstract: One of the major challenges in machine learning nowadays is to provide predictions with not only high accuracy but also user-friendly explanations. Although in recent years we have witnessed increasingly popular use of deep neural networks for sequence modeling, it is still challenging to explain the rationales behind the model outputs, which is essential for building trust and supporting domain experts in validating, critiquing and refining the model. We propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. The prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain. For better interpretability, we define several criteria for constructing the prototypes, including simplicity, diversity, and sparsity, and propose the learning objective and the optimization procedure. ProSeNet also provides a user-friendly approach to model steering: domain experts without any knowledge of the underlying model or parameters can easily incorporate their intuition and experience by manually refining the prototypes. We conduct experiments on a wide range of real-world applications, including predictive diagnostics for automobiles, ECG and protein sequence classification, and sentiment analysis on texts. The results show that ProSeNet can achieve accuracy on par with state-of-the-art deep learning models. We also evaluate the interpretability of the results with concrete case studies. Finally, through a user study on Amazon Mechanical Turk (MTurk), we demonstrate that the model selects high-quality prototypes which align well with human knowledge and can be interactively refined for better interpretability without loss of performance.
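
Predicting by comparison with prototypes can be sketched as follows. Compute a similarity between the sequence embedding and each prototype, then combine the similarities linearly into class scores. This sketch makes three assumptions: the embedding is given directly (the paper obtains it from a recurrent encoder), the similarity is exp(-squared distance), and the prototypes and weights are toy values.

```python
import math


def prototype_predict(embedding, prototypes, class_weights):
    """Classify a sequence embedding by its similarity to learned prototypes.

    similarity_k = exp(-||e - p_k||^2); the class score is a weighted sum of
    similarities, score_c = sum_k class_weights[c][k] * similarity_k.
    Returns (predicted class index, similarities).
    """
    sims = [math.exp(-sum((e - p) ** 2 for e, p in zip(embedding, proto)))
            for proto in prototypes]
    scores = [sum(w * s for w, s in zip(row, sims)) for row in class_weights]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, sims


prototypes = [[0.0, 0.0], [5.0, 5.0]]    # toy 2-D prototypes (assumed values)
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # prototype k votes for class k
label, sims = prototype_predict([0.1, 0.0], prototypes, class_weights)
print(label)  # 0
```

The similarities double as the explanation: the prediction is "close to prototype k". Editing a prototype's vector directly changes the model's behaviour, which is the kind of steering the abstract describes.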

Journal Article•DOI•

Brain signatures of a multiscale process of sequence learning in humans.

Maxime Maheu¹,², Stanislas Dehaene²,³, Florent Meyniel²

Paris Descartes University¹, Université Paris-Saclay², Collège de France³

04 Feb 2019-eLife

TL;DR: These findings support the existence of multiple computational systems for sequence processing involving statistical inferences at multiple scales and the possibility that successive brain responses reflect the progressive extraction of sequence statistics at different timescales.

Abstract: Extracting the temporal structure of sequences of events is crucial for perception, decision-making, and language processing. Here, we investigate the mechanisms by which the brain acquires knowledge of sequences and the possibility that successive brain responses reflect the progressive extraction of sequence statistics at different timescales. We measured brain activity using magnetoencephalography in humans exposed to auditory sequences with various statistical regularities, and we modeled this activity as theoretical surprise levels using several learning models. Successive brain waves related to different types of statistical inferences. Early post-stimulus brain waves denoted a sensitivity to a simple statistic, the frequency of items estimated over a long timescale (habituation). Mid-latency and late brain waves conformed qualitatively and quantitatively to the computational properties of a more complex inference: the learning of recent transition probabilities. Our findings thus support the existence of multiple computational systems for sequence processing involving statistical inferences at multiple scales.
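
The two kinds of inference contrasted above can be illustrated with two count-based learners, each reporting surprise (-log2 p) for every observation. One learner tracks item frequencies; the other tracks transition probabilities. These simple learners, the add-one prior and the optional `decay` (leaky counts for shorter timescales) are assumptions. They are not the authors' Bayesian observer models.

```python
import math


def surprise_traces(seq, items=("A", "B"), decay=1.0):
    """Return (frequency_surprise, transition_surprise) in bits for each observation.

    Counts start at 1 (add-one prior) and are multiplied by `decay` after every
    observation; decay < 1 gives a leaky, shorter-timescale estimate.
    """
    freq = {a: 1.0 for a in items}
    trans = {(a, b): 1.0 for a in items for b in items}
    freq_s, trans_s = [], []
    prev = None
    for x in seq:
        # Item-frequency learner: p(x) from (leaky) item counts.
        freq_s.append(-math.log2(freq[x] / sum(freq.values())))
        # Transition learner: p(x | previous item) from (leaky) transition counts.
        if prev is None:
            p_trans = 1.0 / len(items)
        else:
            p_trans = trans[(prev, x)] / sum(trans[(prev, b)] for b in items)
        trans_s.append(-math.log2(p_trans))
        # Leaky update: forget a little, then count the new observation.
        for k in freq:
            freq[k] *= decay
        for k in trans:
            trans[k] *= decay
        freq[x] += 1.0
        if prev is not None:
            trans[(prev, x)] += 1.0
        prev = x
    return freq_s, trans_s


freq_s, trans_s = surprise_traces("ABABABAB")
print([round(s, 2) for s in trans_s])  # [1.0, 1.0, 1.0, 0.58, 0.58, 0.42, 0.42, 0.32]
```

On a strictly alternating sequence, transition surprise drops as the regularity is learned, while frequency surprise stays near one bit.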

Proceedings Article•DOI•

Parallel Iterative Edit Models for Local Sequence Transduction

Abhijeet Awasthi¹, Sunita Sarawagi¹, Rasna Goyal¹, Sabyasachi Ghosh¹, Vihari Piratla¹

Indian Institute of Technology Bombay¹

01 Nov 2019

TL;DR: This paper proposes a parallel iterative edit (PIE) model for the problem of local sequence transduction arising in tasks like grammatical error correction (GEC) and OCR correction.

Abstract: We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC). Recent approaches are based on the popular encoder-decoder (ED) model for sequence to sequence learning. The ED model auto-regressively captures full dependency among output tokens but is slow due to sequential decoding. The PIE model does parallel decoding, giving up the advantage of modeling full dependency in the output, yet it achieves accuracy competitive with the ED model for four reasons: 1. predicting edits instead of tokens, 2. labeling sequences instead of generating sequences, 3. iteratively refining predictions to capture dependencies, and 4. factorizing logits over edits and their token argument to harness pre-trained language models like BERT. Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.
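
Predicting edits instead of tokens, and refining iteratively, can be sketched as follows. Every token gets an edit label; all labels are applied in parallel; the process repeats until the tagger predicts no further changes. The label set (COPY, DELETE, APPEND, REPLACE) and `toy_tagger` are hypothetical stand-ins, not PIE's trained model or its exact edit vocabulary.

```python
def apply_edits(tokens, edits):
    """Apply one round of per-token edit labels in parallel.

    Each edit is "COPY", "DELETE", ("APPEND", word) or ("REPLACE", word);
    this label set is an assumption made for the sketch.
    """
    out = []
    for tok, edit in zip(tokens, edits):
        op = edit[0] if isinstance(edit, tuple) else edit
        if op == "COPY":
            out.append(tok)
        elif op == "DELETE":
            continue
        elif op == "APPEND":
            out.extend([tok, edit[1]])
        elif op == "REPLACE":
            out.append(edit[1])
        else:
            raise ValueError(f"unknown edit: {edit!r}")
    return out


def iterative_refine(tokens, predict_edits, max_rounds=4):
    """Re-label and re-apply edits until the tagger predicts only COPY."""
    for _ in range(max_rounds):
        edits = predict_edits(tokens)
        if all(e == "COPY" for e in edits):
            break
        tokens = apply_edits(tokens, edits)
    return tokens


def toy_tagger(tokens):
    """Hypothetical stand-in for a trained tagger: delete immediate repeats."""
    return ["DELETE" if i > 0 and tok == tokens[i - 1] else "COPY" for i, tok in enumerate(tokens)]


print(apply_edits(["He", "go", "to", "school"], ["COPY", ("REPLACE", "goes"), "COPY", ("APPEND", ".")]))
# ['He', 'goes', 'to', 'school', '.']
print(iterative_refine(["the", "the", "cat"], toy_tagger))  # ['the', 'cat']
```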

Proceedings Article•DOI•

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Laurent El Shafey¹, Hagen Soltau¹, Izhak Shafran¹

Google¹

09 Jul 2019

TL;DR: In this article, a joint ASR and speaker diarization system using a recurrent neural network transducer was proposed to tackle the two tasks by using both linguistic and acoustic cues to infer speaker roles.

Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective functions. Often the SD systems operate directly on the acoustics and are not constrained to respect word boundaries and this deficiency is overcome in an ad hoc manner. Motivated by recent advances in sequence to sequence learning, we propose a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer. Our approach utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues. We evaluated the performance of our approach on a large corpus of medical conversations between physicians and patients. Compared to a competitive conventional baseline, our approach improves word-level diarization error rate from 15.8% to 2.2%.
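
The word-level diarization error rate quoted above counts recognized words that are attributed to the wrong speaker. The sketch below is simplified: it assumes reference and hypothesis words are already aligned one-to-one, so it skips the ASR alignment step. The speaker tags "dr" and "pt" are hypothetical.

```python
def word_diarization_error_rate(ref, hyp):
    """Fraction of aligned words whose speaker label is wrong.

    ref, hyp: lists of (word, speaker) pairs, assumed aligned one-to-one
    (a simplification: no insertions or deletions are handled here).
    Both correctly recognized and substituted words count if the speaker is wrong.
    """
    if len(ref) != len(hyp):
        raise ValueError("this sketch assumes a one-to-one word alignment")
    wrong = sum(1 for (_, s_ref), (_, s_hyp) in zip(ref, hyp) if s_ref != s_hyp)
    return wrong / len(ref)


ref = [("how", "dr"), ("are", "dr"), ("you", "dr"), ("fine", "pt")]
hyp = [("how", "dr"), ("are", "dr"), ("you", "pt"), ("fine", "pt")]
print(word_diarization_error_rate(ref, hyp))  # 0.25
```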

Journal Article•DOI•

Corrective Feedback and the Role of Implicit Sequence-Learning Ability in L2 Online Performance.

Gisela Granena¹, Yucel Yilmaz²

Open University of Catalonia¹, Indiana University²

01 Mar 2019-Language Learning

Journal Article•DOI•

Deconstructing Procedural Memory: Different Learning Trajectories and Consolidation of Sequence and Statistical Learning.

Péter Simor¹, Zsófia Zavecz¹,², Kata Horváth¹,², Noémi Éltető¹, Csenge Török¹, Orsolya Pesthy¹, Ferenc Gombos³, Karolina Janacsek¹,², Dezso Nemeth¹,²,⁴

Eötvös Loránd University¹, Hungarian Academy of Sciences², Pázmány Péter Catholic University³, University of Lyon⁴

09 Jan 2019-Frontiers in Psychology

TL;DR: The study analyzes training-dependent and off-line changes in two sub-processes of procedural learning, namely sequence learning and statistical learning. These results can contribute to a deeper understanding of the dynamic changes in the multiple parallel learning and consolidation processes that occur during procedural memory formation.

Abstract: Procedural learning is a fundamental cognitive function that facilitates efficient processing of and automatic responses to complex environmental stimuli. Here, we examined training-dependent and off-line changes of two sub-processes of procedural learning: namely, sequence learning and statistical learning. Whereas sequence learning requires the acquisition of order-based relationships between the elements of a sequence, statistical learning is based on the acquisition of probabilistic associations between elements. Seventy-eight healthy young adults (58 females and 20 males) completed a modified version of the Alternating Serial Reaction Time task that was designed to measure Sequence and Statistical Learning simultaneously. After training, participants were randomly assigned to one of three conditions: active wakefulness, quiet rest, or daytime sleep. We examined off-line changes in Sequence and Statistical Learning as well as further improvements after extended practice. Performance in Sequence Learning increased during training, while Statistical Learning plateaued relatively rapidly. After the off-line period, both the acquired sequence and statistical knowledge were preserved, irrespective of the vigilance state (awake, quiet rest or sleep). Sequence Learning further improved during extended practice, while Statistical Learning did not. Moreover, within the sleep group, cortical oscillations and sleep spindle parameters showed differential associations with Sequence and Statistical Learning. Our findings can contribute to a deeper understanding of the dynamic changes of multiple parallel learning and consolidation processes that occur during procedural memory formation.
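
The Alternating Serial Reaction Time task interleaves pattern elements with random ones. This is why it can separate order-based from probability-based learning. The sketch below shows the standard structure as assumed here; the modified task in the paper may differ in detail. Any run of three consecutive elements (a triplet) whose first and third elements follow the pattern is "high-frequency".

```python
import random


def asrt_sequence(pattern, n_cycles, n_keys=4, seed=0):
    """Alternate pattern ('P') and random ('R') elements, as in the ASRT task."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_cycles):
        for p in pattern:
            seq.append(("P", p))
            seq.append(("R", rng.randint(1, n_keys)))
    return seq


def is_high_frequency(first, third, pattern):
    """A triplet (first, _, third) is high-frequency when `third` follows `first`
    in the cyclic pattern, regardless of the middle element."""
    k = pattern.index(first)
    return pattern[(k + 1) % len(pattern)] == third


pattern = [2, 4, 3, 1]  # hypothetical pattern over response keys 1-4
keys = range(1, 5)
n_high = sum(is_high_frequency(a, c, pattern) for a in keys for b in keys for c in keys)
print(n_high, "of 64 unique triplets are high-frequency")  # 16 of 64 unique triplets are high-frequency
```

Every triplet ending on a pattern element is high-frequency by construction. A triplet ending on a random element is high-frequency only by chance. Comparing responses to pattern and random elements separates order-based (sequence) knowledge from purely probabilistic (statistical) knowledge; the exact contrasts used in the paper are not reproduced here.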

Posted Content•

Parallel Iterative Edit Models for Local Sequence Transduction

Abhijeet Awasthi¹, Sunita Sarawagi¹, Rasna Goyal¹, Sabyasachi Ghosh¹, Vihari Piratla¹

Indian Institute of Technology Bombay¹

07 Oct 2019-arXiv: Computation and Language

TL;DR: Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.

Abstract: We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC). Recent approaches are based on the popular encoder-decoder (ED) model for sequence to sequence learning. The ED model auto-regressively captures full dependency among output tokens but is slow due to sequential decoding. The PIE model does parallel decoding, giving up the advantage of modelling full dependency in the output, yet it achieves accuracy competitive with the ED model for four reasons: 1. predicting edits instead of tokens, 2. labeling sequences instead of generating sequences, 3. iteratively refining predictions to capture dependencies, and 4. factorizing logits over edits and their token argument to harness pre-trained language models like BERT. Experiments on tasks spanning GEC, OCR correction and spell correction demonstrate that the PIE model is an accurate and significantly faster alternative for local sequence transduction.

Journal Article•DOI•

A Two-Stage Neural Network for Sleep Stage Classification Based on Feature Learning, Sequence Learning, and Data Augmentation

Chenglu Sun¹, Jiahao Fan¹, Chen Chen¹, Wei Li¹, Wei Chen¹

Fudan University¹

08 Aug 2019-IEEE Access

TL;DR: Comparison experiments show that combining hand-crafted features with network-trained features improves classification performance. They also show that an RNN is a good choice for learning temporal information across sleep epochs.

Abstract: Sleep stage classification is a fundamental but cumbersome task in sleep analysis. To score sleep stages automatically, this study presents a classification method based on a two-stage neural network. The first stage, feature learning, fuses network-trained features with traditional hand-crafted features. A recurrent neural network (RNN) in the second stage learns temporal information between sleep epochs and produces the classification results. To address the serious sample imbalance problem, a novel pre-training process combined with data augmentation was introduced. The proposed method was evaluated on two public databases, Sleep-EDF and Sleep Apnea (SA). It achieves an F1-score of 0.806 and a kappa coefficient of 0.80 for healthy subjects, and 0.790 and 0.74, respectively, for subjects with suspected sleep disorders. The results show that the method achieves better performance than state-of-the-art methods on the same databases. Comparison experiments showed that combining hand-crafted and network-trained features improves classification performance. In addition, the RNN is a good choice for learning temporal information in sleep epochs, and the pre-training process with data augmentation was verified to reduce the impact of sample imbalance. The proposed model has the potential to exploit sleep information comprehensively.
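
The kappa coefficient reported above measures agreement between predicted and reference sleep stages beyond what chance would produce: κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e is the agreement expected from the marginal label frequencies. A minimal computation from a confusion matrix is shown below; the two-class matrix is made up for illustration.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = reference, cols = predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement from the row (reference) and column (predicted) marginals.
    row_marg = [sum(confusion[i]) / total for i in range(k)]
    col_marg = [sum(confusion[i][j] for i in range(k)) / total for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up two-class example: 70% raw agreement, 50% expected by chance.
print(round(cohens_kappa([[20, 5], [10, 15]]), 3))  # 0.4
```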

Proceedings Article•

A comprehensive, application-oriented study of catastrophic forgetting in DNNs

Benedikt Pfülb¹, Alexander Gepperth¹•Institutions (1)

Fulda University of Applied Sciences¹

01 Jan 2019

TL;DR: A large-scale empirical study of catastrophic forgetting in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning indicates that there is no model that avoids CF for all investigated datasets and SLTs under application conditions.

Abstract: We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that enforces typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. We conclude with a discussion of potential solutions and workarounds to CF, notably for the EWC and IMM models.
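
The abstract names EWC as one candidate workaround for CF. Elastic weight consolidation adds a quadratic penalty that anchors parameters that were important for earlier tasks, weighted by a diagonal Fisher information estimate. The sketch below assumes per-sample log-likelihood gradients from the old task are already available as arrays. The function names and the strength `lam` are hypothetical, and this is not the authors' experimental code.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: mean of squared per-sample log-likelihood gradients."""
    g = np.asarray(per_sample_grads, dtype=float)
    return np.mean(g ** 2, axis=0)

def ewc_penalty(theta, theta_old, fisher, lam):
    """EWC regularizer (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_old, dtype=float)
    return 0.5 * lam * float(np.sum(fisher * diff ** 2))

def ewc_penalty_grad(theta, theta_old, fisher, lam):
    """Gradient of the penalty w.r.t. theta, added to the new task's loss gradient."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_old, dtype=float)
    return lam * fisher * diff

# Toy example: parameter 0 mattered for the old task, parameter 1 did not.
grads_old_task = [[1.0, 0.0], [3.0, 0.0]]
fisher = diagonal_fisher(grads_old_task)      # [5., 0.]
theta_old = np.array([0.0, 0.0])
theta_new = np.array([1.0, 10.0])
print(ewc_penalty(theta_new, theta_old, fisher, lam=2.0))  # 5.0
```

Moving the unimportant parameter by 10 costs nothing here, while a unit move of the important one is penalized; that asymmetry is what lets EWC trade plasticity against forgetting.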

Posted Content•

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Laurent El Shafey¹, Hagen Soltau¹, Izhak Shafran¹•Institutions (1)

Google¹

09 Jul 2019-arXiv: Computation and Language

TL;DR: This work proposes a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer that utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues.

Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective functions. SD systems often operate directly on the acoustics and are not constrained to respect word boundaries, a deficiency that is overcome in an ad hoc manner. Motivated by recent advances in sequence-to-sequence learning, we propose a novel approach that tackles the two tasks with a joint ASR and SD system using a recurrent neural network transducer. Our approach utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues. We evaluated the performance of our approach on a large corpus of medical conversations between physicians and patients. Compared to a competitive conventional baseline, our approach improves the word-level diarization error rate from 15.8% to 2.2%.
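
The headline result is a word-level diarization error rate (WDER). A simplified sketch, assuming the recognized words have already been aligned one-to-one with reference words (the paper's metric also accounts for ASR substitution errors, which are not modeled here), counts the fraction of words attributed to the wrong speaker under the best mapping between hypothesis and reference speaker labels. The function name and labels are hypothetical.

```python
from itertools import permutations

def word_diarization_error_rate(ref_speakers, hyp_speakers):
    """Fraction of aligned words assigned to the wrong speaker, minimized over
    one-to-one mappings from hypothesis speaker labels to reference labels."""
    if len(ref_speakers) != len(hyp_speakers) or not ref_speakers:
        raise ValueError("expected two non-empty, equally long label sequences")
    ref_labels = sorted(set(ref_speakers))
    hyp_labels = sorted(set(hyp_speakers))
    # Extra hypothesis speakers beyond the reference ones map to None (always wrong).
    candidates = ref_labels + [None] * max(0, len(hyp_labels) - len(ref_labels))
    best = len(ref_speakers)
    # Brute force is fine for the handful of speakers in a conversation.
    for assignment in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, assignment))
        errors = sum(mapping[h] != r for r, h in zip(ref_speakers, hyp_speakers))
        best = min(best, errors)
    return best / len(ref_speakers)

ref = ["doctor", "doctor", "patient", "patient", "doctor"]
hyp = ["spk1", "spk1", "spk2", "spk1", "spk1"]
print(word_diarization_error_rate(ref, hyp))  # 0.2
```

A system that directly emits role labels such as "doctor" and "patient", as the joint model described above does, needs no label mapping; the search over mappings only matters for anonymous speaker clusters.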

Proceedings Article•DOI•

Interpretable and Steerable Sequence Learning via Prototypes

Yao Ming¹, Panpan Xu², Huamin Qu¹, Liu Ren²•Institutions (2)

Hong Kong University of Science and Technology¹, Bosch²

25 Jul 2019

TL;DR: ProSeNet is an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning: its predictions are obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain.

Abstract: One of the major challenges in machine learning nowadays is to provide predictions with not only high accuracy but also user-friendly explanations. Although in recent years we have witnessed increasingly popular use of deep neural networks for sequence modeling, it is still challenging to explain the rationales behind the model outputs, which is essential for building trust and supporting domain experts to validate, critique and refine the model. We propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. The prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain. For better interpretability, we define several criteria for constructing the prototypes, including simplicity, diversity, and sparsity, and propose the corresponding learning objective and optimization procedure. ProSeNet also provides a user-friendly approach to model steering: domain experts without any knowledge of the underlying model or parameters can easily incorporate their intuition and experience by manually refining the prototypes. We conduct experiments on a wide range of real-world applications, including predictive diagnostics for automobiles, ECG and protein sequence classification, and sentiment analysis on texts. The results show that ProSeNet can achieve accuracy on par with state-of-the-art deep learning models. We also evaluate the interpretability of the results with concrete case studies. Finally, through a user study on Amazon Mechanical Turk (MTurk), we demonstrate that the model selects high-quality prototypes that align well with human knowledge and can be interactively refined for better interpretability without loss of performance.
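
ProSeNet's prediction rests on comparing an input's encoding with a small set of learned prototypes. A minimal sketch of such a prototype layer is shown below, assuming the sequence encoder has already produced a fixed-length embedding and using a similarity of the form exp(-||e - p||^2) followed by a linear layer and softmax. Names, shapes and weights are illustrative assumptions; the training objectives for simplicity, diversity and sparsity are omitted.

```python
import numpy as np

def prototype_similarities(embedding, prototypes):
    """Similarity a_i = exp(-||e - p_i||^2) between one embedding and each prototype."""
    sq_dist = np.sum((prototypes - embedding) ** 2, axis=1)
    return np.exp(-sq_dist)

def predict_with_prototypes(embedding, prototypes, weights):
    """Class probabilities from a linear layer over prototype similarities."""
    sims = prototype_similarities(embedding, prototypes)
    logits = weights @ sims
    exp_logits = np.exp(logits - logits.max())   # numerically stable softmax
    return sims, exp_logits / exp_logits.sum()

# Hypothetical setup: 2 prototypes in a 2-d embedding space, one per class.
prototypes = np.array([[0.0, 0.0], [3.0, 0.0]])
weights = np.array([[2.0, 0.0], [0.0, 2.0]])     # rows: classes, cols: prototypes
embedding = np.array([0.2, 0.1])                 # stand-in for an encoder output
sims, probs = predict_with_prototypes(embedding, prototypes, weights)
explanation = int(np.argmax(sims))               # index of the most similar prototype
```

In this sketch, steering amounts to editing a row of `prototypes`: the similarities, the prediction and the explanation (the nearest prototype) all follow from the edited exemplar.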

Journal Article•DOI•

Contextualized Non-Local Neural Networks for Sequence Learning

Pengfei Liu¹, Shuaichen Chang², Xuanjing Huang¹, Jian Tang³, Jackie Chi Kit Cheung²•Institutions (3)

Fudan University¹, Ohio State University², Université de Montréal³

17 Jul 2019

TL;DR: The authors propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood, thus providing better interpretability to users.

Abstract: Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
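
CN3 draws on two ingredients named in the abstract: self-attention, which lets every token attend to every other token, and local dependencies within a neighbourhood. The sketch below is plain scaled dot-product self-attention without learned projections, with an optional window that restricts attention to nearby positions. It only illustrates the contrast between non-local and local context and is not the CN3 model itself; the `window` parameter is a hypothetical name.

```python
import numpy as np

def self_attention(x, window=None):
    """Scaled dot-product self-attention over x of shape (n, d), using x itself as
    queries, keys and values (no learned projections). If `window` is set (>= 0),
    position i only attends to positions j with |i - j| <= window."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if window is not None:
        idx = np.arange(n)
        too_far = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(too_far, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)   # diagonal is never masked
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 4))             # 6 toy token vectors of size 4
_, global_w = self_attention(tokens)             # every token sees the whole sequence
_, local_w = self_attention(tokens, window=1)    # only immediate neighbours
```

Each row of `global_w` spreads weight over the whole sentence, while `local_w` is banded; the abstract's point is that a model benefits from combining both kinds of structure.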

Posted Content•

A comprehensive, application-oriented study of catastrophic forgetting in DNNs

Benedikt Pfülb¹, Alexander Gepperth¹•Institutions (1)

Fulda University of Applied Sciences¹

20 May 2019-arXiv: Learning

TL;DR: In this article, a large-scale empirical study of catastrophic forgetting in modern deep neural network (DNN) models that perform sequential (or: incremental) learning is presented, and a new experimental protocol is proposed that enforces typical constraints encountered in application scenarios.

Abstract: We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that enforces typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. We conclude with a discussion of potential solutions and workarounds to CF, notably for the EWC and IMM models.

Proceedings Article•DOI•

Deep Neural Models for Medical Concept Normalization in User-Generated Texts

Zulfat Miftahutdinov¹, Elena Tutubalina²•Institutions (2)

Kazan Federal University¹, Samsung²

31 May 2019

TL;DR: This work approaches the medical concept normalization problem as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions.

Abstract: In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in a free-form text to a concept in a controlled vocabulary, usually the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology differs greatly depending on whether it comes from health care professionals or from the general public in the form of social media texts. We approach it as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state-of-the-art models.
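
One common way to frame concept normalization is to encode the mention and rank vocabulary concepts by similarity. The sketch below assumes mention and concept vectors have already been produced by some encoder (for example an RNN or a contextualized model) and ranks concepts by cosine similarity. The concept identifiers are placeholders rather than real UMLS CUIs, and this ranking formulation is an assumption, not necessarily the paper's exact output layer.

```python
import numpy as np

def rank_concepts(mention_vec, concept_ids, concept_vecs, top_k=1):
    """Return the top_k (concept_id, cosine similarity) pairs for one mention vector."""
    m = mention_vec / np.linalg.norm(mention_vec)
    c = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = c @ m
    order = np.argsort(-sims)[:top_k]
    return [(concept_ids[i], float(sims[i])) for i in order]

# Placeholder identifiers and toy 3-d vectors; real systems use learned encoders.
concept_ids = ["CONCEPT_A", "CONCEPT_B", "CONCEPT_C"]
concept_vecs = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
mention_vec = np.array([0.9, 0.3, 0.1])   # e.g. an encoded social-media phrase
best = rank_concepts(mention_vec, concept_ids, concept_vecs, top_k=2)
```

Ranking by similarity to concept embeddings can handle concepts unseen during training, whereas a softmax classifier over a fixed concept list cannot; which trade-off suits a given benchmark depends on its vocabulary coverage.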

Journal Article•DOI•

Collaborative optimization algorithm for learning path construction in E-learning

V. Vanitha¹, Pandian Krishnan², R. Elakkiya•Institutions (2)

Techno India¹, National Academy of Agricultural Research Management²

01 Jul 2019-Computers & Electrical Engineering

TL;DR: A collaborative optimization algorithm that combines ant colony optimization and a genetic algorithm provides learners with a personalized learning path; the results establish that this hybrid approach provides a better solution than the traditional approach.

Journal Article•DOI•

Improved online sequential extreme learning machine for identifying crack behavior in concrete dam

Bo Dai¹, Chongshi Gu¹, Erfeng Zhao¹, Kai Zhu¹, Wenhan Cao¹, Xiangnan Qin¹•Institutions (1)

Hohai University¹

01 Jan 2019-Advances in Structural Engineering

TL;DR: The comparative results demonstrate that the improved online sequential extreme learning machine can provide highly accurate forecasts and reasonably identify crack behavior.

Abstract: Prediction models are essential in dam crack behavior identification. Prototype monitoring data arrive sequentially in dam safety monitoring. Given this characteristic, sequential learning algorith...
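
The abstract is truncated before the improvement is described, so the sketch below shows only the standard online sequential extreme learning machine (OS-ELM) that such work builds on: random, fixed hidden-layer weights, and output weights updated by recursive least squares as monitoring data arrive in chunks. The small ridge term `reg`, the sigmoid activation and all names are assumptions for illustration, not the paper's improved variant.

```python
import numpy as np

class OSELM:
    """Standard OS-ELM: random fixed hidden layer, output weights updated by
    recursive least squares (with a small ridge term) as chunks of data arrive."""

    def __init__(self, n_inputs, n_hidden, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_inputs, n_hidden))  # never trained
        self.b = rng.standard_normal(n_hidden)               # never trained
        self.reg = reg
        self.P = None      # inverse of the regularized Gram matrix H^T H + reg*I
        self.beta = None   # output weights

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid features

    def init_fit(self, X0, T0):
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0 + self.reg * np.eye(H0.shape[1]))
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        H = self._hidden(X)
        PHt = self.P @ H.T
        S = np.eye(H.shape[0]) + H @ PHt
        # Woodbury update of P for the new chunk, then correct beta by the residual.
        self.P = self.P - PHt @ np.linalg.solve(S, PHt.T)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
T = np.sin(X.sum(axis=1, keepdims=True))   # toy target standing in for sensor data
model = OSELM(n_inputs=3, n_hidden=8)
model.init_fit(X[:50], T[:50])
for start in range(50, 200, 25):            # data arriving in chunks of 25
    model.partial_fit(X[start:start + 25], T[start:start + 25])
```

Because the update is recursive least squares, processing the data chunk by chunk gives the same output weights as a single ridge regression on all data seen so far, without revisiting old samples.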

Journal Article•DOI•

Where’s the Reward?

Shayan Doroudi¹²³, Vincent Aleven¹, Emma Brunskill³•Institutions (3)

Carnegie Mellon University¹, University of California, Irvine², Stanford University³

01 Dec 2019-International Journal of Artificial Intelligence in Education

TL;DR: A review of the variety of attempts to use RL for instructional sequencing finds that reinforcement learning has been most successful in cases where it has been constrained with ideas and theories from cognitive psychology and the learning sciences.

Abstract: Since the 1960s, researchers have been trying to optimize the sequencing of instructional activities using the tools of reinforcement learning (RL) and sequential decision making under uncertainty. Many researchers have realized that reinforcement learning provides a natural framework for optimal instructional sequencing given a particular model of student learning, and excitement towards this area of research is as alive now as it was over fifty years ago. But does RL actually help students learn? If so, when and where might we expect it to be most helpful? To help answer these questions, we review the variety of attempts to use RL for instructional sequencing. First, we present a historical narrative of this research area. We identify three waves of research, which gives us a sense of the various communities of researchers that have been interested in this problem and where the field is going. Second, we review all of the empirical research that has compared RL-induced instructional policies to baseline methods of sequencing. We find that over half of the studies found that RL-induced policies significantly outperform baselines. Moreover, we identify five clusters of studies with different characteristics and varying levels of success in using RL to help students learn. We find that reinforcement learning has been most successful in cases where it has been constrained with ideas and theories from cognitive psychology and the learning sciences. However, given that our theories and models are limited, we also find that it has been useful to complement this approach with running more robust offline analyses that do not rely heavily on the assumptions of one particular model. Given that many researchers are turning to deep reinforcement learning and big data to tackle instructional sequencing, we believe keeping these best practices in mind can help guide the way to the reward in using RL for instructional sequencing.

Journal Article•DOI•

Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning

Myeongjun Jang, Seungwan Seo, Pilsung Kang

01 Jul 2019-Information Sciences

TL;DR: Experimental results on three natural language tasks confirm that the proposed RNN-SVAE yields higher performance than two benchmark models; the mean and standard deviation of the continuous semantic space are learned to take advantage of the variational method.
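
The TL;DR mentions learning the mean and standard deviation of a continuous semantic space, which is the core of the variational method. Below is a minimal sketch of the two standard VAE pieces, the reparameterization trick and the closed-form KL divergence to a standard normal prior, assuming the encoder outputs a mean and a log-variance per latent dimension. How RNN-SVAE builds its semantic vector from the RNN states is not shown, and the toy values are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); in an autodiff framework this
    keeps the sample differentiable with respect to mu and log_var."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])        # hypothetical encoder outputs for one sentence
log_var = np.array([0.0, -2.0])
z = reparameterize(mu, log_var, rng)   # latent code passed to the decoder
kl = kl_to_standard_normal(mu, log_var)
```

During training, the KL term is added to the decoder's reconstruction loss; predicting the log-variance rather than the standard deviation keeps sigma positive without an explicit constraint.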
