Author

Gichang Lee

Bio: Gichang Lee is an academic researcher who has contributed to research on Tokenization (data security). The author has an h-index of 1 and has co-authored 2 publications receiving 5 citations.

Topics: Tokenization (data security)

Papers

Posted Content

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Boseop Kim, HyoungSeok Kim, Sang Woo Lee¹, Gichang Lee, Dong-Hyun Kwak¹, Dong Hyeon Jeon, Sunghyun Park², Sungju Kim, Seonhoon Kim³, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee⁴, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park³, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo¹, Minsuk Chang⁵, Soobin Suh, Sookyo In, Jin-Seong Park⁶, Kyungduk Kim⁷, Hiun Kim, Jisu Jeong¹, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee⁸, Jae-Wook Kang⁹, Inho Kang¹, Jung-Woo Ha¹, Woo-Myoung Park⁷, Nako Sung¹

Institutions (9): Naver Corporation¹, Amazon.com², Seoul National University³, Dong-eui University⁴, KAIST⁵, Hanyang University⁶, Samsung⁷, Yonsei University⁸, Chonbuk National University⁹

10 Sep 2021, arXiv: Computation and Language

TL;DR: HyperCLOVA is an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.

Abstract: GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on data at the scale of hundreds of billions of tokens. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of models of different sizes, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of realizing the No Code AI paradigm, which provides AI prototyping capabilities to non-experts in ML, by introducing HyperCLOVA Studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.

6 citations

Proceedings Article

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Naver Corporation¹, Amazon.com², Seoul National University³, Dong-eui University⁴, KAIST⁵, Hanyang University⁶, Samsung⁷, Yonsei University⁸, Chonbuk National University⁹

10 Sep 2021

TL;DR: HyperCLOVA is an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.

Cited by

Posted Content

Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Tom Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Févry, Jason A. Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush

15 Oct 2021, arXiv: Learning

TL;DR: The authors developed a system for easily mapping general natural language tasks into a human-readable prompted form, and fine-tuned a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.

Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.
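The central step in the abstract above, turning one supervised example into several human-readable prompted input/target pairs, can be sketched in plain Python. This is a minimal illustration assuming simple `str.format` templates; the promptsource toolkit linked above has its own template language and API, which are not reproduced here. The class name, template names, and wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PromptTemplate:
    """Hypothetical prompt template: an input format string plus a list that
    maps each integer label to a natural-language answer."""

    name: str
    input_template: str
    answer_choices: list[str]

    def apply(self, example: dict) -> tuple[str, str]:
        # Fill the named placeholders with the example's fields; unused keys
        # (such as "label") are ignored by str.format.
        prompt = self.input_template.format(**example)
        # Verbalize the integer label as its natural-language answer.
        target = self.answer_choices[example["label"]]
        return prompt, target


# Two differently worded prompts for the same NLI-style dataset, mirroring the
# idea of "multiple prompts using varying natural language" per dataset.
TEMPLATES = [
    PromptTemplate(
        name="does_it_imply",
        input_template='{premise}\nQuestion: Does this imply that "{hypothesis}"? Yes, no, or maybe?',
        answer_choices=["Yes", "Maybe", "No"],
    ),
    PromptTemplate(
        name="guaranteed_true",
        input_template='Given "{premise}", is it guaranteed true that "{hypothesis}"? Yes, no, or maybe?',
        answer_choices=["Yes", "Maybe", "No"],
    ),
]

example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": 0,  # assumed convention: 0 = entailment, 1 = neutral, 2 = contradiction
}

for template in TEMPLATES:
    prompt, target = template.apply(example)
    print(f"[{template.name}]\n{prompt}\n-> {target}\n")
```

Each template yields a different surface form for the same labeled example, which is what lets a single dataset contribute many prompted training (or held-out evaluation) instances.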

7 citations

Proceedings Article

Multitask Prompted Training Enables Zero-Shot Task Generalization

25 Apr 2022

TL;DR: The authors developed a system for easily mapping general natural language tasks into a human-readable prompted form, and fine-tuned a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.

4 citations

Posted Content

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Xiao Liu, Kaixuan Ji, Yicheng Fu, Zhengxiao Du, Zhilin Yang, Jie Tang

14 Oct 2021, arXiv: Computation and Language

TL;DR: P-Tuning v2 is a version of prefix-tuning optimized and adapted for NLU that matches the performance of fine-tuning while tuning only 0.1%-3% of the parameters.

Abstract: Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pre-trained models. We also find that existing methods of prompt tuning cannot handle hard sequence tagging tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks. It matches the performance of fine-tuning while having only 0.1%-3% tuned parameters. Our method P-Tuning v2 is not a new method, but a version of prefix-tuning (Li and Liang, 2021) optimized and adapted for NLU. Given the universality and simplicity of P-Tuning v2, we believe it can serve as an alternative to fine-tuning and a strong baseline for future research.
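The 0.1%-3% figure can be sanity-checked with a back-of-the-envelope count. The sketch below assumes trainable prefixes are added as one key vector and one value vector per prefix position at every Transformer layer, with no reparameterization network; the backbone (a BERT-large-like model of roughly 335M parameters) and the prefix length of 20 are illustrative assumptions, not settings taken from the paper.

```python
def prefix_param_count(num_layers: int, prefix_len: int, hidden_size: int) -> int:
    """Trainable parameters for deep prefixes, assuming one key and one value
    vector of size hidden_size per prefix position at every layer."""
    return num_layers * 2 * prefix_len * hidden_size


# Hypothetical BERT-large-like backbone: 24 layers, hidden size 1024,
# roughly 335M frozen parameters in total (approximate figure).
backbone_params = 335_000_000
tuned = prefix_param_count(num_layers=24, prefix_len=20, hidden_size=1024)
fraction = tuned / backbone_params
print(f"tuned parameters: {tuned:,} ({fraction:.2%} of the backbone)")
```

Under these assumed numbers the prefixes add 983,040 trainable parameters, about 0.29% of the frozen backbone, which falls inside the range the abstract reports; longer prefixes raise the fraction proportionally.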

1 citation

Posted Content

Intent-based Product Collections for E-commerce using Pretrained Language Models

Hiun Kim, Jisu Jeong, Kyungmin Kim, Dongjun Lee, Hyun Dong Lee, Dongpil Seo, Jeeseung Han, Dong Wook Park, Ji Ae Heo, Rak Yeong Kim

15 Oct 2021, arXiv: Information Retrieval

TL;DR: The authors use a pretrained language model (PLM) that leverages textual attributes of web-scale products to build intent-based product collections, training a BERT with triplet loss in which an intent sentence is the anchor and the corresponding products are positive examples.

Abstract: Building a shopping product collection has primarily been a human job. Through manual craftsmanship, experts collect related but diverse products with a common shopping intent that are effective when displayed together, e.g., backpacks, laptop bags, and messenger bags for freshman bag gifts. Automatically constructing a collection requires an ML system to learn a complex relationship between the customer's intent and the product's attributes. However, several factors make the problem difficult: 1) long and complicated intent sentences, 2) rich and diverse product attributes, and 3) a huge semantic gap between them. In this paper, we use a pretrained language model (PLM) that leverages textual attributes of web-scale products to make intent-based product collections. Specifically, we train a BERT with triplet loss, using an intent sentence as the anchor and the corresponding products as positive examples. We further improve the model's performance through search-based negative sampling and category-wise positive pair augmentation. Our model significantly outperforms the search-based baseline model for intent-based product matching in offline evaluations. Furthermore, online experiments on our e-commerce platform show that the PLM-based method can construct product collections with higher CTR, CVR, and order diversity than expert-crafted collections.
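The training objective described above can be illustrated with a minimal triplet-loss sketch over toy embeddings. It assumes Euclidean distance and a hinge with margin 1.0; the paper's exact distance function and margin are not stated here, and the `hardest_negative` helper is a generic hard-negative heuristic standing in for, not reproducing, the paper's search-based negative sampling.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Hinge-style triplet loss: zero once the positive is closer to the
    anchor than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor: Vector, candidates: Sequence[Vector]) -> Vector:
    """Pick the candidate closest to the anchor (generic hard-negative mining,
    hypothetical stand-in for the paper's search-based sampler)."""
    return min(candidates, key=lambda c: euclidean(anchor, c))


# Toy 2-D "embeddings" standing in for BERT outputs.
intent = [0.0, 0.0]                               # anchor: intent sentence
product = [0.0, 1.0]                              # positive: matching product
others = [[3.0, 4.0], [1.0, 0.0], [0.0, -2.0]]    # candidate negatives

neg = hardest_negative(intent, others)
print(neg, triplet_loss(intent, product, neg))
```

An easy negative such as `[3.0, 4.0]` already satisfies the margin and contributes zero loss, which is why harder negatives (here `[1.0, 0.0]`) tend to give a more useful training signal.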

Posted Content

PAGnol: An Extra-Large French Generative Model

Julien Launay, E. L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah

16 Oct 2021, arXiv: Computation and Language

TL;DR: PAGnol-XL is the largest pre-trained model to date for the French language and achieves state-of-the-art performance in the abstract summarization task.

Abstract: Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and performant versions of PAGnol, exploring the capabilities of French extreme-scale models. For this first release, we focus on the pre-training and scaling calculations underlying PAGnol. We fit a scaling law for compute for the French language and compare it with its English counterpart. We find that the pre-training dataset significantly conditions the quality of the outputs, with common datasets such as OSCAR leading to low-quality offensive text. We evaluate our models on discriminative and generative tasks in French, comparing them to other state-of-the-art French and multilingual models, and reach the state of the art in the abstract summarization task. Our research was conducted on the public GENCI Jean Zay supercomputer, and our models up to the Large size are made publicly available.
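Fitting a compute scaling law, as the abstract describes, is commonly done by assuming a pure power law L(C) = a·C^b and running ordinary least squares on log-transformed values, since log L = log a + b·log C is linear. The sketch below assumes that form; PAGnol's exact functional form and fitting procedure may differ, and the data points are synthetic numbers generated from a known law, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], loss: Sequence[float]) -> Tuple[float, float]:
    """Fit loss = a * compute**b by least squares in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line is the power-law exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space maps back to the prefactor a.
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic points generated from loss = 10 * C**-0.05 (made-up numbers).
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10 * c ** -0.05 for c in compute]
a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.4f}")
```

On noise-free synthetic data the fit recovers the generating parameters; with real training runs, the fitted exponent is what lets one pick a model size for a given compute budget.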