Open Access · Posted Content
KLUE: Korean Language Understanding Evaluation.
Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jang-Won Park, Chisung Song, Junseong Kim, Yongsook Song, Tae-Hwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, KyungTae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
TLDR
The Korean Language Understanding Evaluation (KLUE) benchmark as mentioned in this paper is a collection of 8 Korean NLP tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking.
Abstract
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release the pretrained language models (PLMs) KLUE-BERT and KLUE-RoBERTa to help reproduce baseline models on KLUE and thereby facilitate future research. Preliminary experiments on the proposed benchmark suite already demonstrate its usefulness and yield a few interesting observations. First, we find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective for tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate creating similar resources for other languages in the future. KLUE is available at this https URL.
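To make the "fine-tuning recipes" concrete, here is a minimal sketch of fine-tuning a released KLUE PLM on one KLUE task (topic classification). It assumes the resources are distributed through the Hugging Face Hub: the model id klue/roberta-large, the dataset id klue with configuration ynat, and the field names title and label are assumptions, not details confirmed by the abstract.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed Hub identifiers for the KLUE data and pretrained model.
dataset = load_dataset("klue", "ynat")  # topic classification task
tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "klue/roberta-large", num_labels=7)  # YNAT is assumed to have 7 topics

def tokenize(batch):
    # "title" is the assumed input text field of the YNAT examples.
    return tokenizer(batch["title"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="klue-ynat", num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
trainer.train()
print(trainer.evaluate())
```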
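The tokenization finding can likewise be illustrated with a short sketch: segment each sentence into morphemes with a morphological analyzer first, then train a BPE vocabulary on the morpheme-segmented text. Mecab (via konlpy) stands in for whatever analyzer the authors used, and corpus.txt is a placeholder path; none of these specifics come from the paper.

```python
from konlpy.tag import Mecab  # assumed stand-in morphological analyzer
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

mecab = Mecab()

def pre_tokenize(line: str) -> str:
    # Morpheme-level pre-tokenization: split the raw sentence into
    # morphemes before BPE ever sees it.
    return " ".join(mecab.morphs(line))

# Train a BPE vocabulary on the morpheme-segmented corpus.
bpe = Tokenizer(BPE(unk_token="[UNK]"))
bpe.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])

with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus path
    segmented = [pre_tokenize(line.strip()) for line in f]
bpe.train_from_iterator(segmented, trainer=trainer)

print(bpe.encode(pre_tokenize("한국어 자연어 이해 벤치마크")).tokens)
```

Because BPE merges never cross the whitespace boundaries introduced by the analyzer, subword units stay aligned with morphemes, which is what makes the scheme attractive for morpheme-level tagging tasks.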
Citations
Posted Content
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim, HyoungSeok Kim, Sang Woo Lee, Gichang Lee, Dong-Hyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jin-Seong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jae-Wook Kang, Inho Kang, Jung-Woo Ha, Woo-Myoung Park, Nako Sung
TL;DR: HyperCLOVA, as discussed by the authors, is an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.
Journal Article
Enhancing Korean Named Entity Recognition With Linguistic Tokenization Strategies
TL;DR: In this article, the authors focus on the effect of tokenization strategies on the quality of input features for Korean named entity recognition, and quantitatively and qualitatively analyze how each tokenization strategy copes with the challenges the language poses.
Posted Content
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing
TL;DR: Transformer-based pretrained language models (T-PTLMs), as discussed by the authors, have achieved great success in almost every NLP task; they are built on top of transformers, self-supervised learning, and transfer learning.
Posted Content
Language Models are Few-shot Multilingual Learners
TL;DR: The authors showed that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, performing significantly better than random prediction.
References
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT, as mentioned in this paper, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained and that, when pretrained more carefully, it can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, along with their evaluations.