Open Access · Journal Article · DOI

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

- 01 Jan 2022
- Vol. 10, pp. 73-91
TLDR
CANINE is a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, together with a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
Abstract
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE, a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by 2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.
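The abstract compresses the architectural recipe into one sentence: embed raw character sequences without a fixed vocabulary, downsample the long sequence before the expensive deep transformer stack, then upsample so every character position still receives a contextual output. The PyTorch sketch below is a minimal illustration of that downsample, encode, upsample shape under stated assumptions: the hashed code-point embedding mimics CANINE's vocabulary-free input, but the layer sizes, hash multipliers, convolutional downsampler, and repetition-based upsampler are simplifications rather than the paper's exact implementation (which, for example, also applies local attention over characters before downsampling).

```python
# Minimal sketch of a CANINE-style tokenization-free encoder (illustrative:
# sizes, hashing scheme, and the down/upsampling ops are assumptions).
import torch
import torch.nn as nn


class HashedCharEmbedding(nn.Module):
    """Embed raw Unicode code points without a vocabulary: hash each code
    point into several buckets and concatenate the bucket embeddings."""

    def __init__(self, dim=768, n_hashes=8, n_buckets=16384):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(n_buckets, dim // n_hashes) for _ in range(n_hashes)
        )
        self.n_buckets = n_buckets
        # Arbitrary odd multipliers stand in for independent hash functions.
        self.register_buffer(
            "mults", torch.tensor([31, 37, 41, 43, 47, 53, 59, 61][:n_hashes])
        )

    def forward(self, codepoints):                       # (batch, chars)
        pieces = [
            table((codepoints * m) % self.n_buckets)     # (batch, chars, dim/n_hashes)
            for table, m in zip(self.tables, self.mults)
        ]
        return torch.cat(pieces, dim=-1)                 # (batch, chars, dim)


class CharEncoder(nn.Module):
    """Downsample characters, run a deep transformer stack, upsample back."""

    def __init__(self, dim=768, rate=4, depth=12, heads=12):
        super().__init__()
        self.embed = HashedCharEmbedding(dim)
        self.down = nn.Conv1d(dim, dim, kernel_size=rate, stride=rate)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.deep_stack = nn.TransformerEncoder(layer, depth)
        self.rate = rate

    def forward(self, codepoints):                       # (batch, chars)
        chars = self.embed(codepoints)                   # character-level features
        short = self.down(chars.transpose(1, 2)).transpose(1, 2)  # rate-x shorter
        short = self.deep_stack(short)                   # deep context, cheaply
        # Upsample by repetition so every character position gets an output.
        up = short.repeat_interleave(self.rate, dim=1)[:, : chars.size(1)]
        return up + chars                                # residual to char features


ids = torch.tensor([[ord(c) for c in "hello world!"]])   # code points, no tokenizer
print(CharEncoder()(ids).shape)                          # torch.Size([1, 12, 768])
```

The savings come from the deep stack running on a sequence several times shorter than the character input, which is what makes the finer-grained input affordable.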


Citations
Journal Article · DOI

An ensemble of pre-trained transformer models for imbalanced multiclass malware classification

TL;DR: A transformer-based model is proposed that processes API call sequences as a whole and learns relationships among API calls through multi-head attention and positional embeddings.
Proceedings Article · DOI

Continuous Prompt Tuning Based Textual Entailment Model for E-commerce Entity Typing

TL;DR: Zhang et al. propose a textual entailment model with continuous-prompt-tuning-based hypotheses and fusion embeddings for e-commerce entity typing, which can handle new entities that are not present during training.
Journal Article · DOI

A Survey of Text Representation Methods and Their Genealogy

- 01 Jan 2022
TL;DR: Text representation methods have evolved so quickly that the research community struggles to retain knowledge of the methods and their interrelations; this survey addresses that lack of compilation, composition, and systematization by arranging current approaches in a genealogy.
Proceedings Article · DOI

Hierarchical Transformers Are More Efficient Language Models

TL;DR: Hourglass is a hierarchical Transformer language model that combines the best-performing downsampling and upsampling layers and achieves state-of-the-art results on the ImageNet32 generation task.
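Since the summary only names the ingredients, a rough sketch of the hourglass shape may help: a few layers at full resolution, a shortened middle stack where most of the depth (and the savings) lives, then upsampling and a final refinement at full resolution. The layer counts, mean-pooling shortening, and repetition-based upsampling below are illustrative assumptions, and a real autoregressive Hourglass additionally needs causal masking and shifted shortening to avoid leaking future tokens, which this encoder-style sketch omits.

```python
# Rough hourglass-shaped block (assumed layer counts, mean-pool shortening,
# and repetition upsampling; causal masking for LM use is omitted).
import torch
import torch.nn as nn


def make_stack(dim, heads, depth):
    layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
    return nn.TransformerEncoder(layer, depth)


class HourglassBlock(nn.Module):
    def __init__(self, dim=512, heads=8, k=3):
        super().__init__()
        self.pre = make_stack(dim, heads, 2)    # full resolution
        self.mid = make_stack(dim, heads, 6)    # k-times shorter, so much cheaper
        self.post = make_stack(dim, heads, 2)   # full resolution again
        self.k = k

    def forward(self, x):                          # (batch, seq, dim), seq % k == 0
        x = self.pre(x)
        short = nn.functional.avg_pool1d(          # shorten by factor k
            x.transpose(1, 2), self.k, stride=self.k
        ).transpose(1, 2)
        short = self.mid(short)
        up = short.repeat_interleave(self.k, dim=1)[:, : x.size(1)]
        return self.post(x + up)                   # residual, then refine


x = torch.randn(2, 12, 512)
print(HourglassBlock()(x).shape)                   # torch.Size([2, 12, 512])
```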
References
Journal Article · DOI

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word vector is the sum of these n-gram representations, allowing models to be trained quickly on large corpora and representations to be computed for words that did not appear in the training data.
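A toy rendering of that idea, with assumptions not in the summary (a hashing trick in place of an explicit n-gram table, made-up bucket count and dimensionality): any word, seen or unseen, maps to the sum of its character n-gram vectors, so morphologically related words share components.

```python
# Toy bag-of-character-n-grams word vectors (hashing trick and sizes are
# made up for the demo; real fastText learns the n-gram vectors from data).
import numpy as np

N_BUCKETS, DIM = 100_000, 64
rng = np.random.default_rng(0)
ngram_vectors = rng.normal(scale=0.1, size=(N_BUCKETS, DIM))


def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"                            # boundary markers
    return [
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def word_vector(word):
    rows = [hash(g) % N_BUCKETS for g in char_ngrams(word)]   # no vocabulary needed
    return ngram_vectors[rows].sum(axis=0)                    # sum of n-gram vectors


# An out-of-vocabulary word still gets a vector, and it shares components
# with related words through their common n-grams.
v1, v2 = word_vector("tokenization"), word_vector("tokenizer")
print(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))    # cosine similarity
```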
Proceedings Article · DOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
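What makes these representations pluggable downstream is a small, task-specific mixing step: softmax-normalized weights over the pretrained bidirectional language model's layers plus a global scale. The sketch below shows only that mixing, with the frozen biLM stubbed out by random layer outputs; the class name and tensor sizes are illustrative assumptions.

```python
# Task-specific mixture over a pretrained biLM's layers (the biLM itself is
# stubbed out with random tensors; names and sizes are illustrative).
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(n_layers))    # per-layer logits
        self.gamma = nn.Parameter(torch.ones(()))              # task-specific scale

    def forward(self, layer_outputs):            # list of (batch, seq, dim) tensors
        s = torch.softmax(self.weights, dim=0)   # softmax-normalized layer weights
        mixed = sum(w * h for w, h in zip(s, layer_outputs))
        return self.gamma * mixed


# Stand-in for the frozen biLM's per-layer outputs on a 5-token sentence.
layers = [torch.randn(1, 5, 256) for _ in range(3)]
print(ScalarMix(n_layers=3)(layers).shape)       # torch.Size([1, 5, 256])
```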
Proceedings Article · DOI

Neural Machine Translation of Rare Words with Subword Units

TL;DR: This paper introduces a simpler and more effective approach that makes the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 English-German and English-Russian translation tasks by up to 1.1 and 1.3 BLEU, respectively.
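The subword units in this paper come from an adaptation of byte pair encoding: start from characters and repeatedly merge the most frequent adjacent symbol pair, so that rare and unknown words decompose into smaller, known units. The toy sketch below learns merges from a made-up word list and is a simplified illustration, not the paper's segmentation code.

```python
# Toy byte-pair-encoding merge learning (simplified; corpus is made up).
from collections import Counter


def learn_bpe(words, n_merges):
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter({tuple(w) + ("</w>",): c for w, c in Counter(words).items()})
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, count in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += count
        vocab = new_vocab
    return merges


corpus = "low low low lower lower newest newest newest widest widest".split()
print(learn_bpe(corpus, n_merges=5))              # e.g. ('e', 's'), ('es', 't'), ...
```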
Proceedings Article · DOI

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER); this paper describes the data sets and evaluation method and gives a general overview of the participating systems and their performance.
Proceedings Article · DOI

Unsupervised Cross-lingual Representation Learning at Scale

TL;DR: Pretraining multilingual language models at scale is shown to yield significant performance gains on a wide range of cross-lingual transfer tasks, and the paper demonstrates, for the first time, that multilingual modeling is possible without sacrificing per-language performance.