
Using Pause Information for More Accurate Entity Recognition

TLDR
This article shows that a linguistic observation about pauses can be used to improve accuracy in machine-learnt language understanding tasks, and applies pause duration to enrich contextual embeddings for improved shallow parsing of entities.
Abstract
Entity tags in human-machine dialog are integral to natural language understanding (NLU) tasks in conversational assistants. However, current systems struggle to parse spoken queries accurately when relying on text input alone, as is typical, and often fail to understand user intent. Previous work in linguistics has identified a cross-language tendency for longer speech pauses surrounding nouns as compared to verbs. We demonstrate that this linguistic observation about pauses can be used to improve accuracy in machine-learnt language understanding tasks. Analysis of pauses in French and English utterances from a commercial voice assistant shows a statistically significant difference in pause duration around multi-token entity span boundaries compared to within entity spans. Additionally, in contrast to text-based NLU, we apply pause duration to enrich contextual embeddings and improve shallow parsing of entities. Results show that our proposed embeddings reduce the relative error rate by up to 8% consistently across three domains for French, without any added annotation or alignment costs to the parser.
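
The abstract does not spell out how pause durations are combined with the contextual embeddings. The sketch below is a hypothetical illustration in PyTorch, assuming the simplest fusion: concatenating a per-token pause-duration scalar to each contextual embedding before a token-level entity classifier. The class name PauseAugmentedTagger, the dimensions, and the concatenation strategy are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: one way pause durations could enrich contextual
# embeddings for entity tagging. Simple concatenation is assumed here
# purely for illustration.
import torch
import torch.nn as nn


class PauseAugmentedTagger(nn.Module):
    """Predicts a BIO-style label for each token from its contextual
    embedding concatenated with a pause-duration feature."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 9):
        super().__init__()
        # +1 accounts for the scalar pause feature appended to each embedding
        self.classifier = nn.Linear(hidden_size + 1, num_labels)

    def forward(self, contextual_embeddings: torch.Tensor,
                pause_durations: torch.Tensor) -> torch.Tensor:
        # contextual_embeddings: (batch, seq_len, hidden_size), e.g. BERT outputs
        # pause_durations: (batch, seq_len), normalized pause length per token
        pause_feature = pause_durations.unsqueeze(-1)       # (batch, seq_len, 1)
        enriched = torch.cat([contextual_embeddings, pause_feature], dim=-1)
        return self.classifier(enriched)                    # (batch, seq_len, num_labels)


# Toy usage with random tensors standing in for real model outputs
embeddings = torch.randn(2, 6, 768)   # e.g. last hidden states from a BERT encoder
pauses = torch.rand(2, 6)             # normalized pause durations per token
logits = PauseAugmentedTagger()(embeddings, pauses)
print(logits.shape)                   # torch.Size([2, 6, 9])
```

Because the pause feature is attached at the embedding level, a tagger like this needs no extra annotation beyond the pause timings already available from the speech front end.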

