scispace - formally typeset
Author

Inkwon Lee

Bio: Inkwon Lee is an academic researcher at Naver Corporation who has contributed to research on language models and relation extraction. The author has an h-index of 1 and has co-authored 1 publication, which has received 7 citations.

Papers
Posted Content
TL;DR: The Korean Language Understanding Evaluation (KLUE) benchmark is a collection of 8 Korean NLP tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking.
Abstract: We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release the pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from preliminary experiments on the proposed benchmark suite that already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate the creation of similar resources for other languages in the future. KLUE is available at this https URL.
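The abstract's last finding — that BPE combined with morpheme-level pre-tokenization helps morpheme-level tasks — can be sketched as follows. The toy morpheme analysis and subword vocabulary here are illustrative stand-ins, not KLUE's actual tokenizer; the point is that pre-tokenization keeps subword merges from crossing morpheme boundaries.

```python
# Sketch: morpheme-level pre-tokenization followed by BPE-style subword
# segmentation. MORPHEMES stands in for a real morphological analyzer, and
# SUBWORD_VOCAB for a learned BPE vocabulary (both hypothetical).

MORPHEMES = {"먹었다": ["먹", "었", "다"]}  # "ate" = stem + past ending + final ending

SUBWORD_VOCAB = {"먹", "었", "다", "었다"}

def pre_tokenize(word):
    # split the word at morpheme boundaries before subword segmentation
    return MORPHEMES.get(word, [word])

def greedy_subword(piece, vocab):
    # longest-match-first segmentation, a simple stand-in for BPE decoding
    tokens, i = [], 0
    while i < len(piece):
        for j in range(len(piece), i, -1):
            if piece[i:j] in vocab:
                tokens.append(piece[i:j])
                i = j
                break
        else:
            tokens.append(piece[i])  # fall back to a single character
            i += 1
    return tokens

def tokenize(word):
    # subword merges are confined within each morpheme
    out = []
    for morpheme in pre_tokenize(word):
        out.extend(greedy_subword(morpheme, SUBWORD_VOCAB))
    return out
```

Without pre-tokenization, `greedy_subword("먹었다", ...)` merges across the morpheme boundary and yields `["먹", "었다"]`, whereas `tokenize("먹었다")` keeps the boundaries and yields `["먹", "었", "다"]` — the alignment that morpheme-level tagging benefits from.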

7 citations


Cited by
Posted Content
TL;DR: HyperCLOVA is an 82B-parameter GPT-3-style Korean language model trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.
Abstract: GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on data at the hundred-billion scale. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of different-sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into a prompt engineering pipeline. We then discuss the possibility of realizing a No Code AI paradigm by introducing HyperCLOVA Studio, an interactive prompt engineering interface that provides AI prototyping capabilities to non-experts in ML. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
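The prompt engineering pipeline the abstract mentions can be sketched as rendering the same demonstrations under several candidate templates and keeping whichever scores best on held-out data. The templates below are illustrative, not HyperCLOVA's actual prompts, and the scoring step (querying the model) is omitted.

```python
# Sketch of few-shot prompt construction with candidate templates, as used
# in prompt-based in-context learning. Templates and labels are made up
# for illustration; a real pipeline would score each template with the LM.

TEMPLATES = [
    "문장: {text}\n감정: {label}",      # Korean-style template
    "Review: {text} => {label}",        # terse arrow-style template
]

def render(template, demos, query):
    """demos: list of (text, label) pairs; query: the unlabeled test input."""
    blocks = [template.format(text=t, label=y) for t, y in demos]
    # render the query with an empty label slot for the model to complete
    blocks.append(template.format(text=query, label="").rstrip())
    return "\n\n".join(blocks)

demos = [("정말 좋아요", "positive"), ("별로예요", "negative")]
prompt = render(TEMPLATES[0], demos, "최고의 영화")
```

In a full pipeline, each template's prompt is sent to the model over a validation set, and the template with the highest accuracy is selected — no parameter updates involved.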

6 citations

Journal ArticleDOI
Gyeongmin Kim, Junyoung Son, Jinsung Kim, Hyunhee Lee, Heuiseok Lim
TL;DR: The authors examine the effect of tokenization strategies on the quality of input features for Korean NER, quantitatively and qualitatively analyzing how each strategy copes with the challenges posed by the language's agglutinative characteristics.
Abstract: Tokenization is a significant primary step for the training of the Pre-trained Language Model (PLM), which alleviates the challenging Out-of-Vocabulary problem in the area of Natural Language Processing. As tokenization strategies can change linguistic understanding, it is essential to consider the composition of input features based on the characteristics of the language for model performance. This study answers the question of "Which tokenization strategy enhances the characteristics of the Korean language for the Named Entity Recognition (NER) task based on a language model?", focusing on tokenization, which significantly affects the quality of input features. We present two significant challenges for the NER task arising from the agglutinative characteristics of the Korean language. Next, we quantitatively and qualitatively analyze how each tokenization strategy copes with these challenges. By adopting various linguistic segmentations such as morpheme, syllable, and subcharacter, we demonstrate the effectiveness of each strategy and compare the performance of PLMs built on each of them. We validate that the strategy most consistent across the challenges of the Korean language is syllable-level tokenization based on SentencePiece.
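The three segmentation granularities the abstract compares can be sketched for a single Korean word. Morpheme boundaries require a morphological analyzer, so the morpheme split below is hard-coded and illustrative; the syllable and subcharacter (jamo) splits use only Unicode properties.

```python
import unicodedata

# Sketch of morpheme vs. syllable vs. subcharacter segmentation of Korean.
# The morpheme split is a hypothetical analyzer output, not a real parse.

def syllable_split(word):
    # each character of Korean text is one precomposed Hangul syllable
    return list(word)

def subcharacter_split(word):
    # NFD normalization decomposes each precomposed syllable into its
    # constituent jamo (initial consonant, vowel, optional final consonant)
    return list(unicodedata.normalize("NFD", word))

word = "학교에"                      # "to school"
morphemes = ["학교", "에"]           # hypothetical: noun "school" + particle "to"
syllables = syllable_split(word)     # 3 syllable tokens
jamo = subcharacter_split(word)      # 7 jamo tokens
```

The granularities diverge: the noun "학교" is one morpheme but two syllables, and each syllable further decomposes into two or three jamo — which is why the choice of segmentation changes the input features a PLM sees.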

5 citations

Posted Content
TL;DR: Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task and are built on top of transformers, self-supervised learning, and transfer learning.
Abstract: Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on top of transformers, self-supervised learning, and transfer learning. Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings, and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various benchmarks, including both intrinsic and extrinsic. We present a summary of various useful libraries to work with T-PTLMs. Finally, we highlight some of the future research directions which will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with the recent happenings in T-PTLMs.

2 citations

Posted Content
TL;DR: The authors showed that given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and they are significantly better than random prediction.
Abstract: General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages without any parameter updates. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones. Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art cross-lingual models and translation models.
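The cross-lingual setup described above — English demonstrations in the context, a non-English test sample to classify, and no parameter updates — can be sketched as a prompt builder. The sentiment task, labels, and template here are illustrative, not the paper's own evaluation suite.

```python
# Sketch of in-context few-shot cross-lingual prompting: labeled English
# demonstrations followed by an unlabeled non-English query, which the LM
# would complete with a label. All example texts and labels are made up.

def cross_lingual_prompt(english_demos, query):
    """english_demos: list of (text, label) pairs; query: non-English text."""
    blocks = [f"Text: {t}\nLabel: {y}" for t, y in english_demos]
    blocks.append(f"Text: {query}\nLabel:")  # the LM completes this slot
    return "\n\n".join(blocks)

demos = [
    ("I loved this film.", "positive"),
    ("Terrible, a waste of time.", "negative"),
]
prompt = cross_lingual_prompt(demos, "정말 재미있는 영화였다.")
```

Sending this prompt to a frozen GPT- or T5-style model and reading off the completed label is the entire "prediction" step — which is what makes beating the random baseline (1/K for K classes) a meaningful result.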

1 citation

16 Sep 2021