Open Access · Journal Article · DOI

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

- 01 Jan 2022
- Vol. 10, pp. 73-91
TLDR
CANINE is a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, together with a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
Abstract
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE, a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by 2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.
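The abstract compresses the architectural recipe into one sentence: embed raw character sequences without a fixed vocabulary, downsample the long sequence before the expensive deep transformer stack, then upsample so every character position still receives a contextual output. The PyTorch sketch below is a minimal illustration of that downsample, encode, upsample shape under stated assumptions: the hashed code-point embedding mimics CANINE's vocabulary-free input, but the layer sizes, hash multipliers, convolutional downsampler, and repetition-based upsampler are simplifications rather than the paper's exact implementation (which, for example, also applies local attention over characters before downsampling).

```python
# Minimal sketch of a CANINE-style tokenization-free encoder (illustrative:
# sizes, hashing scheme, and the down/upsampling ops are assumptions).
import torch
import torch.nn as nn


class HashedCharEmbedding(nn.Module):
    """Embed raw Unicode code points without a vocabulary: hash each code
    point into several buckets and concatenate the bucket embeddings."""

    def __init__(self, dim=768, n_hashes=8, n_buckets=16384):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(n_buckets, dim // n_hashes) for _ in range(n_hashes)
        )
        self.n_buckets = n_buckets
        # Arbitrary odd multipliers stand in for independent hash functions.
        self.register_buffer(
            "mults", torch.tensor([31, 37, 41, 43, 47, 53, 59, 61][:n_hashes])
        )

    def forward(self, codepoints):                       # (batch, chars)
        pieces = [
            table((codepoints * m) % self.n_buckets)     # (batch, chars, dim/n_hashes)
            for table, m in zip(self.tables, self.mults)
        ]
        return torch.cat(pieces, dim=-1)                 # (batch, chars, dim)


class CharEncoder(nn.Module):
    """Downsample characters, run a deep transformer stack, upsample back."""

    def __init__(self, dim=768, rate=4, depth=12, heads=12):
        super().__init__()
        self.embed = HashedCharEmbedding(dim)
        self.down = nn.Conv1d(dim, dim, kernel_size=rate, stride=rate)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.deep_stack = nn.TransformerEncoder(layer, depth)
        self.rate = rate

    def forward(self, codepoints):                       # (batch, chars)
        chars = self.embed(codepoints)                   # character-level features
        short = self.down(chars.transpose(1, 2)).transpose(1, 2)  # rate-x shorter
        short = self.deep_stack(short)                   # deep context, cheaply
        # Upsample by repetition so every character position gets an output.
        up = short.repeat_interleave(self.rate, dim=1)[:, : chars.size(1)]
        return up + chars                                # residual to char features


ids = torch.tensor([[ord(c) for c in "hello world!"]])   # code points, no tokenizer
print(CharEncoder()(ids).shape)                          # torch.Size([1, 12, 768])
```

The savings come from the deep stack running on a sequence several times shorter than the character input, which is what makes the finer-grained input affordable.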


Citations
Journal Article · DOI

An ensemble of pre-trained transformer models for imbalanced multiclass malware classification

TL;DR: A transformer-based model is proposed that processes API call sequences as a whole and learns relationships among API calls through multi-head attention and positional embeddings.
Proceedings Article · DOI

Continuous Prompt Tuning Based Textual Entailment Model for E-commerce Entity Typing

TL;DR: Zhang et al. propose a textual entailment model with continuous-prompt-tuning-based hypotheses and fusion embeddings for e-commerce entity typing, which can handle new entities that are not present during training.
Journal Article · DOI

A Survey of Text Representation Methods and Their Genealogy

- 01 Jan 2022
TL;DR: Text representation methods have evolved so quickly that the research community struggles to retain knowledge of the methods and their interrelations; this survey addresses that lack of compilation, composition, and systematization by arranging current approaches in a genealogy.
Proceedings Article · DOI

Hierarchical Transformers Are More Efficient Language Models

TL;DR: Hourglass is a hierarchical Transformer language model that combines the best-performing downsampling and upsampling layers and achieves state-of-the-art results on the ImageNet32 generation task.
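Since the summary only names the ingredients, a rough sketch of the hourglass shape may help: a few layers at full resolution, a shortened middle stack where most of the depth (and the savings) lives, then upsampling and a final refinement at full resolution. The layer counts, mean-pooling shortening, and repetition-based upsampling below are illustrative assumptions, and a real autoregressive Hourglass additionally needs causal masking and shifted shortening to avoid leaking future tokens, which this encoder-style sketch omits.

```python
# Rough hourglass-shaped block (assumed layer counts, mean-pool shortening,
# and repetition upsampling; causal masking for LM use is omitted).
import torch
import torch.nn as nn


def make_stack(dim, heads, depth):
    layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
    return nn.TransformerEncoder(layer, depth)


class HourglassBlock(nn.Module):
    def __init__(self, dim=512, heads=8, k=3):
        super().__init__()
        self.pre = make_stack(dim, heads, 2)    # full resolution
        self.mid = make_stack(dim, heads, 6)    # k-times shorter, so much cheaper
        self.post = make_stack(dim, heads, 2)   # full resolution again
        self.k = k

    def forward(self, x):                          # (batch, seq, dim), seq % k == 0
        x = self.pre(x)
        short = nn.functional.avg_pool1d(          # shorten by factor k
            x.transpose(1, 2), self.k, stride=self.k
        ).transpose(1, 2)
        short = self.mid(short)
        up = short.repeat_interleave(self.k, dim=1)[:, : x.size(1)]
        return self.post(x + up)                   # residual, then refine


x = torch.randn(2, 12, 512)
print(HourglassBlock()(x).shape)                   # torch.Size([2, 12, 512])
```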
References
Journal Article · DOI

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word vector is the sum of these n-gram representations, allowing models to be trained quickly on large corpora and representations to be computed for words that did not appear in the training data.
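A toy rendering of that idea, with assumptions not in the summary (a hashing trick in place of an explicit n-gram table, made-up bucket count and dimensionality): any word, seen or unseen, maps to the sum of its character n-gram vectors, so morphologically related words share components.

```python
# Toy bag-of-character-n-grams word vectors (hashing trick and sizes are
# made up for the demo; real fastText learns the n-gram vectors from data).
import numpy as np

N_BUCKETS, DIM = 100_000, 64
rng = np.random.default_rng(0)
ngram_vectors = rng.normal(scale=0.1, size=(N_BUCKETS, DIM))


def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"                            # boundary markers
    return [
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def word_vector(word):
    rows = [hash(g) % N_BUCKETS for g in char_ngrams(word)]   # no vocabulary needed
    return ngram_vectors[rows].sum(axis=0)                    # sum of n-gram vectors


# An out-of-vocabulary word still gets a vector, and it shares components
# with related words through their common n-grams.
v1, v2 = word_vector("tokenization"), word_vector("tokenizer")
print(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))    # cosine similarity
```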
Proceedings Article · DOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
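What makes these representations pluggable downstream is a small, task-specific mixing step: softmax-normalized weights over the pretrained bidirectional language model's layers plus a global scale. The sketch below shows only that mixing, with the frozen biLM stubbed out by random layer outputs; the class name and tensor sizes are illustrative assumptions.

```python
# Task-specific mixture over a pretrained biLM's layers (the biLM itself is
# stubbed out with random tensors; names and sizes are illustrative).
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(n_layers))    # per-layer logits
        self.gamma = nn.Parameter(torch.ones(()))              # task-specific scale

    def forward(self, layer_outputs):            # list of (batch, seq, dim) tensors
        s = torch.softmax(self.weights, dim=0)   # softmax-normalized layer weights
        mixed = sum(w * h for w, h in zip(s, layer_outputs))
        return self.gamma * mixed


# Stand-in for the frozen biLM's per-layer outputs on a 5-token sentence.
layers = [torch.randn(1, 5, 256) for _ in range(3)]
print(ScalarMix(n_layers=3)(layers).shape)       # torch.Size([1, 5, 256])
```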
Proceedings Article · DOI

Neural Machine Translation of Rare Words with Subword Units

TL;DR: This paper introduces a simpler and more effective approach that makes the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 English-German and English-Russian translation tasks by up to 1.1 and 1.3 BLEU, respectively.
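The subword units in this paper come from an adaptation of byte pair encoding: start from characters and repeatedly merge the most frequent adjacent symbol pair, so that rare and unknown words decompose into smaller, known units. The toy sketch below learns merges from a made-up word list and is a simplified illustration, not the paper's segmentation code.

```python
# Toy byte-pair-encoding merge learning (simplified; corpus is made up).
from collections import Counter


def learn_bpe(words, n_merges):
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter({tuple(w) + ("</w>",): c for w, c in Counter(words).items()})
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, count in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += count
        vocab = new_vocab
    return merges


corpus = "low low low lower lower newest newest newest widest widest".split()
print(learn_bpe(corpus, n_merges=5))              # e.g. ('e', 's'), ('es', 't'), ...
```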
Proceedings Article · DOI

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER); this paper describes the data sets and evaluation method and gives a general overview of the participating systems and their performance.
Proceedings Article · DOI

Unsupervised Cross-lingual Representation Learning at Scale

TL;DR: Pretraining multilingual language models at scale is shown to yield significant performance gains on a wide range of cross-lingual transfer tasks, and the paper demonstrates, for the first time, that multilingual modeling is possible without sacrificing per-language performance.