Proceedings ArticleDOI

Character-based feature extraction with LSTM networks for POS-tagging task

01 Oct 2016-pp 7991654
TL;DR: An LSTM-based feature extraction layer that reads in a sequence of characters corresponding to a word and outputs a single fixed-length real-valued vector that can offer a solution to the out-of-vocabulary words problem.
Abstract: In this paper we describe a work in progress on designing continuous vector space word representations able to map unseen data adequately. We propose an LSTM-based feature extraction layer that reads in a sequence of characters corresponding to a word and outputs a single fixed-length real-valued vector. We then test our model on a POS tagging task on four typologically different languages. The results of the experiments suggest that the model can offer a solution to the out-of-vocabulary words problem, as in a comparable setting its OOV accuracy improves over that of a state-of-the-art tagger.
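The following is a minimal sketch (PyTorch) of the idea the abstract describes: an LSTM reads a word character by character and its final hidden state serves as a fixed-length word vector, so unseen words still receive a representation. All names, sizes, and the toy input are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    def __init__(self, n_chars, char_dim=32, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len) indices of a word's characters
        emb = self.char_emb(char_ids)              # (batch, len, char_dim)
        _, (h_n, _) = self.lstm(emb)               # h_n: (1, batch, word_dim)
        return h_n.squeeze(0)                      # fixed-length word vectors

# Usage: encode two (hypothetical) words padded to the same length.
encoder = CharWordEncoder(n_chars=30)
words = torch.tensor([[3, 1, 20, 0], [4, 15, 7, 19]])
vectors = encoder(words)                           # shape (2, 128)
```

Because the vector is built from characters rather than looked up from a word list, an out-of-vocabulary word simply yields a new character sequence and still gets a representation.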
Citations
Journal ArticleDOI
Guangyu Wang, Zhibin Li, Guangjun Li, Guyu Dai, Qing Xiao, Long Bai, Yisong He, Yaxin Liu, Sen Bai
TL;DR: In this paper, machine learning methods were applied to predict external respiratory motion signals and to infer internal liver motion in this therapeutic context. The LSTM-based integrated model performs well at predicting liver motion from external respiratory signals with system latencies of up to 450 ms.
Abstract: Surface-guided radiation therapy can be used to continuously monitor a patient’s surface motions during radiotherapy by a non-irradiating, noninvasive optical surface imaging technique. In this study, machine learning methods were applied to predict external respiratory motion signals and predict internal liver motion in this therapeutic context. Seven groups of interrelated external/internal respiratory liver motion samples lasting from 5 to 6 min collected simultaneously were used as a dataset, Dv. Long short-term memory (LSTM) and support vector regression (SVR) networks were then used to establish external respiratory signal prediction models (LSTMpred/SVRpred) and external/internal respiratory motion correlation models (LSTMcorr/SVRcorr). These external prediction and external/internal correlation models were then combined into an integrated model. Finally, the LSTMcorr model was used to perform five groups of model updating experiments to confirm the necessity of continuously updating the external/internal correlation model. The root-mean-square error (RMSE), mean absolute error (MAE), and maximum absolute error (MAX_AE) were used to evaluate the performance of each model. The models established using the LSTM neural network performed better than those established using the SVR network in the tasks of predicting external respiratory signals for latency-compensation (RMSE < 0.5 mm at a latency of 450 ms) and predicting internal liver motion using external signals (RMSE < 0.6 mm). The prediction errors of the integrated model (RMSE ≤ 1.0 mm) were slightly higher than those of the external prediction and external/internal correlation models. The RMSE/MAE of the fifth model update was approximately ten times smaller than that of the first model update. The LSTM networks outperform SVR networks at predicting external respiratory signals and internal liver motion because of LSTM’s strong ability to deal with time-dependencies. The LSTM-based integrated model performs well at predicting liver motion from external respiratory signals with system latencies of up to 450 ms. It is necessary to update the external/internal correlation model continuously.
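As an illustration of the kind of latency-compensation model the abstract describes, the sketch below (PyTorch) trains an LSTM to map a window of past respiratory positions to the position roughly one system latency ahead. The window size, network width, and synthetic breathing trace are assumptions for demonstration only, not the study's data or architecture.

```python
import torch
import torch.nn as nn

class MotionPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # position one horizon step ahead

# Toy data: a synthetic sine-like breathing trace sampled at 25 Hz.
t = torch.linspace(0, 60, 1500)
signal = torch.sin(2 * torch.pi * t / 4)            # ~4 s breathing cycle
window, horizon = 50, 12                            # 12 samples ~ 480 ms lookahead
X = torch.stack([signal[i:i + window] for i in range(len(signal) - window - horizon)])
y = signal[window + horizon:].unsqueeze(1)

model = MotionPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                                  # a few illustrative epochs
    pred = model(X.unsqueeze(-1))
    loss = nn.functional.mse_loss(pred, y)
    opt.zero_grad(); loss.backward(); opt.step()
```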

17 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A language-independent, deep learning-based approach to the task of morphological disambiguation that improves on the language-dependent state of the art for two agglutinative languages and can be potentially applied to other morphologically complex languages.
Abstract: We develop a language-independent, deep learning-based approach to the task of morphological disambiguation. Guided by the intuition that the correct analysis should be “most similar” to the context, we propose dense representations for morphological analyses and surface context and a simple yet effective way of combining the two to perform disambiguation. Our approach improves on the language-dependent state of the art for two agglutinative languages (Turkish and Kazakh) and can be potentially applied to other morphologically complex languages.
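A hedged sketch of the "pick the analysis most similar to the context" intuition: given a dense vector for the surface context and one vector per candidate morphological analysis, choose the candidate with the highest cosine similarity. The random vectors below are stand-ins; in the paper's setting these would come from trained encoders.

```python
import torch
import torch.nn.functional as F

def disambiguate(context_vec, analysis_vecs):
    # analysis_vecs: (n_candidates, dim); context_vec: (dim,)
    scores = F.cosine_similarity(analysis_vecs, context_vec.unsqueeze(0), dim=1)
    return int(scores.argmax()), scores

context = torch.randn(100)
candidates = torch.randn(4, 100)          # e.g. 4 analyses of an ambiguous word
best, scores = disambiguate(context, candidates)
```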

17 citations


Cites background from "Character-based feature extraction ..."

  • ...It is becoming increasingly popular to use richer architectures to learn better embeddings from characters/words (Yessenbayev and Makazhanov, 2016; Ling et al., 2015; Wieting et al., 2016)....


Proceedings ArticleDOI
06 Jul 2019
TL;DR: This work implements a POS tagger for the biomedical domain using deep neural network architectures and evaluates it on a publicly accessible dataset from GENIA.
Abstract: POS tagging is the process of classifying words into their parts of speech, such as noun, verb, or preposition. It is one of the most basic and important processes in NLP and acts as an essential preprocessing step for other natural language processing (NLP) applications such as sentiment analysis, NER, and speech recognition. POS tagging is treated as a sequence labeling problem in which words are labeled with their appropriate part of speech. This work implements a POS tagger for the biomedical domain using deep neural network architectures. RNN, LSTM, and GRU models are expected to give better performance since they are able to access more context information; we evaluate them on a publicly accessible dataset from GENIA. Many NLP applications have seen substantial progress due to advances in neural networks and deep learning.
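Below is a minimal sketch (PyTorch) of the sequence-labeling setup this abstract refers to: a recurrent encoder (a bidirectional GRU here; an LSTM is a drop-in swap) assigns one POS tag score vector per token. Vocabulary size, tag count, and dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn

class RNNTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))
        return self.out(h)                 # (batch, seq_len, n_tags) tag scores

tagger = RNNTagger(vocab_size=20000, n_tags=45)
sentence = torch.randint(0, 20000, (1, 12))        # one 12-token sentence
predicted_tags = tagger(sentence).argmax(-1)
```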

14 citations


Cites background from "Character-based feature extraction ..."

  • ...Zhandos Yessenbayev [11] discussed character-based feature extraction with LSTM networks for the POS-tagging task....


  • ...The great success of several studies [5][6][11][13][14][15] on sequence labeling tasks using the three recurrent neural networks is the motivation for this work....


Proceedings ArticleDOI
Kobayashi Yuka, Yoshida Takami, Iwata Kenji, Hiroshi Fujimura, Masami Akamine
01 Dec 2018
TL;DR: A Recurrent Neural Network encoder-decoder model is used, and a method that uses only in-domain data is proposed; the model is robust against over-fitting because it is independent of the slot values in the training data.
Abstract: This paper proposes an approach to detecting out-of-domain slot values from user utterances in spoken dialogue systems based on contexts. The approach detects keywords of slot values from utterances and consults domain knowledge (i.e., an ontology) to check whether the keywords are out-of-domain. This can prevent the systems from responding improperly to user requests. We use a Recurrent Neural Network (RNN) encoder-decoder model and propose a method that uses only in-domain data. The method replaces word embedding vectors of the keywords corresponding to slot values with random vectors during training of the model. This allows the model to exploit context information. The model is robust against over-fitting problems because it is independent of the slot values of the training data. Experiments show that the proposed method achieves a 65% gain in F1 score relative to a baseline model and a further 13 percentage points by combining with other methods.
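An illustrative sketch of the training trick described above: during training, the embedding of a token that carries a slot value is replaced by a random vector, forcing the model to rely on the surrounding context rather than the specific value. The tensors, mask, and helper name below are toy assumptions, not the paper's code.

```python
import torch

def randomize_slot_values(word_embeddings, slot_mask):
    # word_embeddings: (batch, seq_len, dim); slot_mask: (batch, seq_len) bool,
    # True where the token is a slot-value keyword.
    noise = torch.randn_like(word_embeddings)
    return torch.where(slot_mask.unsqueeze(-1), noise, word_embeddings)

emb = torch.randn(2, 6, 50)                       # embeddings of two utterances
mask = torch.zeros(2, 6, dtype=torch.bool)
mask[0, 3] = True                                 # e.g. a slot-value token at position 3
train_input = randomize_slot_values(emb, mask)
```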

5 citations

Proceedings ArticleDOI
15 Sep 2019
TL;DR: A new method for slot filling of out-of-domain (OOD) slot values, which are not included in the training data, in spoken dialogue systems, using two encoders that distinctly encode contexts and keywords, respectively.
Abstract: This paper proposes a new method for slot filling of out-of-domain (OOD) slot values, which are not included in the training data, in spoken dialogue systems. Word embeddings have been proposed to estimate the OOD slot values included in the word embedding model from keyword information. At the same time, context information is an important clue for estimation because the values in a given slot tend to appear in similar contexts. The proper use of either or both keyword and context information depends on the sentence. Conventional methods input a whole sentence into an encoder and extract important clues by the attention mechanism. However, it is difficult to properly distinguish context and keyword information from the encoder outputs because these two features are already mixed. Our proposed method uses two encoders, which distinctly encode contexts and keywords, respectively. The model calculates weights for the two encoders based on a user utterance and estimates a slot with weighted outputs from the two encoders. Experimental results show that the proposed method achieves a 50% relative improvement in F1 score compared with a baseline model, which detects slot values from user utterances and estimates slots at once with a single encoder.
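A sketch of the two-encoder idea: context and keyword are encoded separately, a scalar gate computed from the utterance weights the two encodings, and the weighted sum feeds a slot classifier. The dimensions and the simple sigmoid gate are assumptions chosen for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoEncoderSlotModel(nn.Module):
    def __init__(self, dim=64, n_slots=10):
        super().__init__()
        self.context_enc = nn.GRU(dim, dim, batch_first=True)
        self.keyword_enc = nn.GRU(dim, dim, batch_first=True)
        self.gate = nn.Linear(2 * dim, 1)
        self.classify = nn.Linear(dim, n_slots)

    def forward(self, context_emb, keyword_emb):
        _, c = self.context_enc(context_emb)       # c: (1, batch, dim)
        _, k = self.keyword_enc(keyword_emb)
        c, k = c.squeeze(0), k.squeeze(0)
        w = torch.sigmoid(self.gate(torch.cat([c, k], dim=-1)))  # per-utterance weight
        return self.classify(w * c + (1 - w) * k)  # slot scores

model = TwoEncoderSlotModel()
scores = model(torch.randn(3, 8, 64), torch.randn(3, 2, 64))
```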

4 citations

References
Journal ArticleDOI
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
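To make the gating described in the abstract concrete, here is a minimal single-step LSTM cell in NumPy: multiplicative input, forget, and output gates regulate a self-connected cell state (the "constant error carousel"). Weight shapes and initialization are illustrative assumptions, not the paper's original formulation in every detail.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W: (4*hidden, input+hidden), b: (4*hidden,)
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)                # additive cell-state update
    h = o * np.tanh(c)                             # exposed hidden state
    return h, c

hidden, inp = 8, 4
W = np.random.randn(4 * hidden, inp + hidden) * 0.1
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for x in np.random.randn(5, inp):                  # run over a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
```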

72,897 citations


"Character-based feature extraction ..." refers background in this paper

  • ...The long short-term memory (LSTM) architecture was proposed by Hochreiter and Schmidhuber [9]; it consists of four major components: a self-connected memory cell and three multiplicative units – the input, output and forget gates [7]....


Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations

Book
18 Nov 2016
TL;DR: Deep learning, as presented in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods; it produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
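As a sketch of the weighted least-squares objective the abstract describes, the NumPy snippet below evaluates a GloVe-style loss on a toy co-occurrence matrix: only nonzero counts contribute, and each term fits w_i · w̃_j + b_i + b̃_j to log X_ij. The matrix sizes, weighting exponent, and cutoff follow common defaults but are assumptions here.

```python
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100, alpha=0.75):
    i_idx, j_idx = np.nonzero(X)                       # skip zero co-occurrence counts
    x = X[i_idx, j_idx]
    weight = np.minimum((x / x_max) ** alpha, 1.0)
    pred = np.sum(W[i_idx] * W_tilde[j_idx], axis=1) + b[i_idx] + b_tilde[j_idx]
    return np.sum(weight * (pred - np.log(x)) ** 2)

V, d = 50, 10                                          # toy vocabulary and dimension
X = np.random.poisson(0.3, size=(V, V)).astype(float)  # sparse synthetic counts
W, W_tilde = np.random.randn(V, d) * 0.1, np.random.randn(V, d) * 0.1
b, b_tilde = np.zeros(V), np.zeros(V)
loss = glove_loss(X, W, W_tilde, b, b_tilde)
```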

30,558 citations


"Character-based feature extraction ..." refers methods in this paper

  • ...Possible alternatives are global vectors for representations [27], which capture both the statistical information via count-based methods and meaningful structures via the log-bilinear prediction-based methods....


Posted Content
TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
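To illustrate the prediction-based idea behind these architectures, here is a toy skip-gram-style sketch (PyTorch): a center-word embedding is trained to score observed (center, context) pairs above randomly sampled ones. The corpus, sizes, and single-negative sampling scheme are simplified assumptions, not the original implementation.

```python
import torch
import torch.nn as nn

vocab, dim = 100, 16
center_emb = nn.Embedding(vocab, dim)
context_emb = nn.Embedding(vocab, dim)
opt = torch.optim.Adam(list(center_emb.parameters()) + list(context_emb.parameters()), lr=0.01)

pairs = torch.randint(0, vocab, (256, 2))              # toy (center, true context) pairs
for _ in range(10):
    c, ctx = pairs[:, 0], pairs[:, 1]
    neg = torch.randint(0, vocab, ctx.shape)            # one negative sample per pair
    pos_score = (center_emb(c) * context_emb(ctx)).sum(-1)
    neg_score = (center_emb(c) * context_emb(neg)).sum(-1)
    loss = -(torch.sigmoid(pos_score).log() + torch.sigmoid(-neg_score).log()).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```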

20,077 citations