
Showing papers by "Naver Corporation" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: StarGAN, as discussed by the authors, proposes a unified model architecture to perform image-to-image translation for multiple domains using only a single model, which leads to superior quality of translated images compared to existing models as well as the capability of flexibly translating an input image to any desired target domain.
Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on facial attribute transfer and facial expression synthesis tasks.
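
The abstract's key idea, a single generator shared across all domains, comes down to conditioning the network on a target-domain label. Below is a minimal PyTorch sketch of that conditioning step; the layer sizes and the label-tiling scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of StarGAN-style domain conditioning: the generator G(x, c)
# receives the image together with a spatially tiled target-domain label.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, img_channels=3, num_domains=5):
        super().__init__()
        # The label contributes `num_domains` extra input channels.
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + num_domains, 64, 7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 7, padding=3),
            nn.Tanh(),
        )

    def forward(self, x, c):
        # c: (batch, num_domains) one-hot target label.
        # Replicate it spatially and concatenate with the image channels.
        c_map = c.view(c.size(0), c.size(1), 1, 1).expand(-1, -1, x.size(2), x.size(3))
        return self.net(torch.cat([x, c_map], dim=1))

g = ConditionalGenerator()
fake = g(torch.randn(2, 3, 128, 128), torch.eye(5)[:2])  # translate to domains 0 and 1
```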

2,479 citations


Journal ArticleDOI
TL;DR: The core opportunities and risks of AI for society are introduced; a synthesis of five ethical principles that should undergird its development and adoption is presented; and 20 concrete recommendations are offered to serve as a firm foundation for the establishment of a Good AI Society.
Abstract: This article reports the findings of AI4People, an Atomium—EISMD initiative designed to lay the foundations for a “Good AI Society”. We introduce the core opportunities and risks of AI for society; present a synthesis of five ethical principles that should undergird its development and adoption; and offer 20 concrete recommendations—to assess, to develop, to incentivise, and to support good AI—which in some cases may be undertaken directly by national or supranational policy makers, while in others may be led by other stakeholders. If adopted, these recommendations would serve as a firm foundation for the establishment of a Good AI Society.

855 citations


Proceedings Article
21 May 2018
TL;DR: BAN is proposed, which finds bilinear attention distributions to utilize given vision-language information seamlessly; the model is evaluated quantitatively and qualitatively on the visual question answering and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
Abstract: Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost to learn attention distributions for every pair of multimodal input channels is prohibitively expensive. To solve this problem, co-attention builds two separate attention distributions for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of BAN efficiently. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
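
As a reading aid, here is a hedged PyTorch sketch of one bilinear attention map built from low-rank bilinear pooling as the abstract describes; the hidden size, nonlinearity, and normalization are assumptions rather than the authors' exact configuration.

```python
# A single bilinear attention map: every (object, word) pair interacts via
# an elementwise product in a low-rank projected space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttention(nn.Module):
    def __init__(self, d_x, d_y, d_hid=512):
        super().__init__()
        self.U = nn.Linear(d_x, d_hid)   # projects visual channels
        self.V = nn.Linear(d_y, d_hid)   # projects question channels
        self.p = nn.Linear(d_hid, 1)     # low-rank "pooling" vector

    def forward(self, X, Y):
        # X: (B, n_x, d_x) object features, Y: (B, n_y, d_y) word features.
        Xp = torch.relu(self.U(X))                 # (B, n_x, h)
        Yp = torch.relu(self.V(Y))                 # (B, n_y, h)
        joint = Xp.unsqueeze(2) * Yp.unsqueeze(1)  # (B, n_x, n_y, h)
        logits = self.p(joint).squeeze(-1)         # (B, n_x, n_y)
        # Normalize over all pairs so the map is one distribution.
        return F.softmax(logits.flatten(1), dim=1).view_as(logits)

att = BilinearAttention(2048, 300)
A = att(torch.randn(2, 36, 2048), torch.randn(2, 14, 300))  # (2, 36, 14)
```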

384 citations


Posted Content
TL;DR: This article proposes bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly; the motivating problem is that the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive.
Abstract: Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost to learn attention distributions for every pair of multimodal input channels is prohibitively expensive. To solve this problem, co-attention builds two separate attention distributions for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of BAN efficiently. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.

281 citations


Posted Content
TL;DR: This paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature, and methodically explores the design decisions at every stage of the model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.
Abstract: Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.

180 citations


Proceedings ArticleDOI
01 Jul 2018
TL;DR: CAPE, the first content-aware POI embedding model, which utilizes text content that provides information about the characteristics of a POI, is proposed, and a large-scale POI dataset is constructed to validate it.
Abstract: Recommending a point-of-interest (POI) a user will visit next based on temporal and spatial context information is an important task in mobile-based applications. Recently, several POI recommendation models based on conventional sequential-data modeling approaches have been proposed. However, such models focus on only a user's check-in sequence information and the physical distance between POIs. Furthermore, they do not utilize the characteristics of POIs or the relationships between POIs. To address this problem, we propose CAPE, the first content-aware POI embedding model which utilizes text content that provides information about the characteristics of a POI. CAPE consists of a check-in context layer and a text content layer. The check-in context layer captures the geographical influence of POIs from the check-in sequence of a user, while the text content layer captures the characteristics of POIs from the text content. To validate the efficacy of CAPE, we constructed a large-scale POI dataset. In the experimental evaluation, we show that the performance of the existing POI recommendation models can be significantly improved by simply applying CAPE to the models.
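
The abstract's two-layer design (a check-in context layer plus a text content layer) can be pictured as two embedding tables whose outputs are combined per POI. The sketch below is a simplified assumption of that structure, not the paper's training procedure.

```python
# Two-part POI representation: a learned check-in context embedding
# concatenated with a mean-of-words text content embedding.
import torch
import torch.nn as nn

class CapeStyleEmbedding(nn.Module):
    def __init__(self, num_pois, vocab_size, dim=128):
        super().__init__()
        self.checkin = nn.Embedding(num_pois, dim)    # check-in context layer
        self.word = nn.EmbeddingBag(vocab_size, dim)  # text content layer (mean of words)

    def poi_vector(self, poi_ids, text_token_ids, offsets):
        # Final representation: check-in context + text content, concatenated.
        return torch.cat([self.checkin(poi_ids),
                          self.word(text_token_ids, offsets)], dim=-1)

model = CapeStyleEmbedding(num_pois=1000, vocab_size=5000)
vec = model.poi_vector(torch.tensor([42]), torch.tensor([1, 7, 99]), torch.tensor([0]))
print(vec.shape)  # torch.Size([1, 256])
```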

171 citations


Posted Content
TL;DR: In this paper, a knowledge transfer method via distillation of activation boundaries formed by hidden neurons is proposed, where the student learns a separating boundary between activation region and deactivation region formed by each neuron in the teacher.
Abstract: An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has long been considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification-friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature on knowledge transfer. In this paper, we propose a knowledge transfer method via distillation of the activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that has its minimum value when the boundaries generated by the student coincide with those of the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss approximating it. With the proposed method, the student learns a separating boundary between the activation region and the deactivation region formed by each neuron in the teacher. Through experiments on various aspects of knowledge transfer, it is verified that the proposed method outperforms the current state-of-the-art.
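
A hedged sketch of the idea: where a teacher neuron is active, the student's pre-activation is pushed above a margin, and below a negative margin otherwise, giving a piecewise differentiable surrogate. The margin and the squaring are assumptions; this paraphrases the loss rather than reproducing the authors' exact formulation.

```python
# Margin-based surrogate for matching the teacher's activation pattern.
import torch

def activation_boundary_loss(student_pre, teacher_pre, margin=1.0):
    # student_pre, teacher_pre: (batch, neurons) pre-activation values.
    active = (teacher_pre > 0).float()           # teacher's activation pattern
    hinge_on = torch.relu(margin - student_pre)  # penalize if not clearly active
    hinge_off = torch.relu(margin + student_pre) # penalize if not clearly inactive
    return ((active * hinge_on + (1 - active) * hinge_off) ** 2).mean()

loss = activation_boundary_loss(torch.randn(8, 256), torch.randn(8, 256))
```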

155 citations


Proceedings ArticleDOI
17 Oct 2018
TL;DR: This paper proposes a novel GAN-based collaborative filtering (CF) framework to provide higher accuracy in recommendation, and validates that the vector-wise adversarial training employed in CFGAN is effective in solving the problem of existing GAN-based CF methods.
Abstract: Generative Adversarial Networks (GAN) have achieved great success in various domains such as image generation, music generation, and natural language generation. In this paper, we propose a novel GAN-based collaborative filtering (CF) framework to provide higher accuracy in recommendation. We first identify a fundamental problem of existing GAN-based methods in CF and highlight it quantitatively via a series of experiments. Next, we suggest a new direction of vector-wise adversarial training to solve the problem and propose our GAN-based CF framework, called CFGAN, based on the direction. We identify a unique challenge that arises when vector-wise adversarial training is employed in CF. We then propose three CF methods realized on top of our CFGAN that are able to address the challenge. Finally, via extensive experiments on real-world datasets, we validate that the vector-wise adversarial training employed in CFGAN is indeed effective in solving the problem of existing GAN-based CF methods. Furthermore, we demonstrate that our proposed CF methods on CFGAN provide recommendation accuracy consistently and universally higher than those of the state-of-the-art recommenders.
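
To make "vector-wise adversarial training" concrete, the sketch below has the generator emit an entire per-user item vector and the discriminator judge whole vectors rather than sampled item indices; the network sizes and the masking-by-history trick are simplified assumptions.

```python
# Vector-wise adversarial training for CF: whole item vectors, not samples.
import torch
import torch.nn as nn

num_items = 1000
G = nn.Sequential(nn.Linear(num_items, 256), nn.ReLU(),
                  nn.Linear(256, num_items), nn.Sigmoid())
D = nn.Sequential(nn.Linear(num_items, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

real = (torch.rand(32, num_items) > 0.95).float()  # observed purchase vectors
mask = real                                         # condition on each user's history
fake = G(mask) * mask                               # generated vector, masked to observed slots

d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
g_loss = bce(D(fake), torch.ones(32, 1))            # generator tries to fool D
```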

149 citations


Proceedings ArticleDOI
17 Dec 2018
TL;DR: A new context-aware correlation-filter-based tracking framework is proposed to achieve both high computational speed and state-of-the-art performance among real-time trackers; extrinsic denoising processes and a new orthogonality loss term are introduced for pre-training and fine-tuning of the expert auto-encoders.
Abstract: We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in the proposed deep feature compression that is achieved by a context-aware scheme utilizing multiple expert auto-encoders; a context in our framework refers to the coarse category of the tracking target according to appearance patterns. In the pre-training phase, one expert auto-encoder is trained per category. In the tracking phase, the best expert auto-encoder is selected for a given target, and only this auto-encoder is used. To achieve high tracking performance with the compressed feature map, we introduce extrinsic denoising processes and a new orthogonality loss term for pre-training and fine-tuning of the expert auto-encoders. We validate the proposed context-aware framework through a number of experiments, where our method achieves a comparable performance to state-of-the-art trackers which cannot run in real-time, while running at a high speed of over 100 fps.
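
The expert-selection step the abstract describes can be sketched as picking the auto-encoder with the lowest reconstruction error on the target's feature map, then using only that expert; the auto-encoder structure below is an illustrative assumption.

```python
# Select the per-category expert auto-encoder that best reconstructs the target.
import torch
import torch.nn as nn

def make_expert(c=512, code=64):
    # 1x1-conv bottleneck standing in for an expert auto-encoder.
    return nn.Sequential(nn.Conv2d(c, code, 1), nn.ReLU(), nn.Conv2d(code, c, 1))

experts = [make_expert() for _ in range(4)]  # one expert per context category

def select_expert(feature_map):
    # feature_map: (1, C, H, W) deep feature of the tracking target.
    errors = [((e(feature_map) - feature_map) ** 2).mean().item() for e in experts]
    best = min(range(len(experts)), key=errors.__getitem__)
    return experts[best]

expert = select_expert(torch.randn(1, 512, 7, 7))
```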

133 citations


Posted Content
TL;DR: The authors propose a densely-connected co-attentive recurrent neural network (C-RNN), each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers.
Abstract: Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. For these tasks, understanding the logical and semantic relationship between two sentences is required, but it remains challenging. Although the attention mechanism is useful to capture the semantic relationship and to properly align the elements of two sentences, previous attention mechanisms simply use a summation operation which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers. It enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the problem of an ever-increasing feature vector size due to dense concatenation operations, we also propose using an autoencoder after dense concatenation. We evaluate our proposed architecture on highly competitive benchmark datasets related to sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance for most of the tasks.
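
A minimal sketch of the dense connectivity described above: each recurrent layer consumes the concatenation of all preceding hidden features plus co-attentive features, and a linear bottleneck stands in for the autoencoder that caps the growing width. The dimensions and the dot-product attention are assumptions.

```python
# Densely-connected co-attentive recurrent stack (simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

def co_attention(a, b):
    # Align b to a with simple dot-product attention: (B, La, d), (B, Lb, d).
    scores = torch.bmm(a, b.transpose(1, 2))        # (B, La, Lb)
    return torch.bmm(F.softmax(scores, dim=-1), b)  # attentive features for a

d = 100
# Layer i input width grows with dense concatenation: (i + 2) * d.
rnns = nn.ModuleList([nn.GRU(d * (i + 2), d, batch_first=True) for i in range(3)])
bottleneck = nn.Linear(d * 4, d)  # the autoencoder's encoder half

a, b = torch.randn(4, 20, d), torch.randn(4, 20, d)
feats = [a]
for rnn in rnns:
    att = co_attention(feats[-1], b)       # co-attentive features
    x = torch.cat(feats + [att], dim=-1)   # dense concatenation
    h, _ = rnn(x)
    feats.append(h)

compressed = bottleneck(torch.cat(feats, dim=-1))  # fight the width blow-up
```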

107 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A novel approach is proposed to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated palette, using a manually curated dataset called Palette-and-Text (PAT).
Abstract: This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In contrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it. For this task, we introduce our manually curated dataset called Palette-and-Text (PAT). Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks. The former captures the semantics of the text input and produces relevant color palettes. The latter colorizes a grayscale image using the generated color palette. Our evaluation results show that people preferred our generated palettes over ground truth palettes and that our model can effectively reflect the given palette when colorizing an image.
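
The two-stage pipeline (text to palette, then palette-based colorization) can be sketched as two chained conditional networks; the stand-in modules below only illustrate the data flow, not the paper's cGAN architectures.

```python
# Chained stages: text embedding -> palette -> palette-conditioned colorization.
import torch
import torch.nn as nn

palette_gen = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 5 * 3))

class Colorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1 + 15, 3, 3, padding=1)  # gray image + tiled palette -> RGB

    def forward(self, gray, palette):
        # Tile the 5-color palette over the image and concatenate as channels.
        p = palette.view(palette.size(0), -1, 1, 1).expand(-1, -1, gray.size(2), gray.size(3))
        return torch.tanh(self.net(torch.cat([gray, p], dim=1)))

text_emb = torch.randn(1, 300)   # sentence embedding (assumed input)
palette = palette_gen(text_emb)  # 5 RGB colors
colored = Colorizer()(torch.randn(1, 1, 64, 64), palette)
```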

Book ChapterDOI
08 Sep 2018
TL;DR: Ablation studies confirm the best performance of the dual attention mechanism combined with late fusion, and MDAM achieves new state-of-the-art results with significant margins compared to the runner-up models.
Abstract: We propose a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM). The key idea is to use a dual attention mechanism with late fusion. MDAM uses self-attention to learn the latent concepts in scene frames and captions. Given a question, MDAM uses the second attention over these latent concepts. Multimodal fusion is performed after the dual attention processes (late fusion). Using this processing pipeline, MDAM learns to infer a high-level vision-language joint representation from an abstraction of the full video content. We evaluate MDAM on PororoQA and MovieQA datasets which have large-scale QA annotations on cartoon videos and movies, respectively. For both datasets, MDAM achieves new state-of-the-art results with significant margins compared to the runner-up models. We confirm the best performance of the dual attention mechanism combined with late fusion by ablation studies. We also perform qualitative analysis by visualizing the inference mechanisms of MDAM.
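
A hedged sketch of the processing pipeline named in the abstract: self-attention over frames and captions separately, a second question-guided attention over each, and multimodal fusion only afterwards (late fusion). The modules and the elementwise-product fusion are simplified assumptions.

```python
# Dual attention with late fusion, reduced to its data flow.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 256
self_att = nn.MultiheadAttention(d, 4, batch_first=True)

def question_attend(question, memory):
    # question: (B, d); memory: (B, L, d). Pool memory with question-guided weights.
    w = F.softmax(torch.bmm(memory, question.unsqueeze(-1)).squeeze(-1), dim=-1)
    return torch.bmm(w.unsqueeze(1), memory).squeeze(1)

frames, captions = torch.randn(2, 40, d), torch.randn(2, 30, d)
q = torch.randn(2, d)

frames_sa, _ = self_att(frames, frames, frames)         # latent visual concepts
captions_sa, _ = self_att(captions, captions, captions) # latent caption concepts
v = question_attend(q, frames_sa)                       # second (question) attention
t = question_attend(q, captions_sa)
fused = v * t                                           # fusion happens last (late fusion)
```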

Proceedings ArticleDOI
02 Sep 2018
TL;DR: This paper investigates regularization techniques, a multi-step training scheme, and a residual connection with pooling layers from the perspective of mitigating speaker overfitting, which leads to considerable performance improvements.
Abstract: In this research, we propose novel raw-waveform end-to-end DNNs for text-independent speaker verification. For speaker verification, many studies utilize the speaker embedding scheme, which trains deep neural networks as speaker identifiers to extract speaker features. However, this scheme has an intrinsic limitation in which the speaker feature, trained to classify only known speakers, is required to represent the identity of unknown speakers. Owing to this mismatch, speaker embedding systems tend to generalize well towards unseen utterances from known speakers, but are overfitted to known speakers. This phenomenon is referred to as speaker overfitting. In this paper, we investigated regularization techniques, a multi-step training scheme, and a residual connection with pooling layers from the perspective of mitigating speaker overfitting, which leads to considerable performance improvements. Technique effectiveness is evaluated using the VoxCeleb dataset, which comprises over 1,200 speakers from various uncontrolled environments. To the best of our knowledge, we are the first to verify the success of end-to-end DNNs operating directly on raw waveforms in a text-independent scenario. The system shows an equal error rate of 7.4%, which is lower than that of i-vector/probabilistic linear discriminant analysis and of end-to-end DNNs that use spectrograms.

Book ChapterDOI
08 Sep 2018
TL;DR: The evaluation protocol of the VisDrone-SOT2018 challenge and the results of a comparison of 22 trackers on the benchmark dataset are presented; both are publicly available on the challenge website.
Abstract: Single-object tracking, also known as visual tracking, on the drone platform attracts much attention recently with various applications in computer vision, such as filming and surveillance. However, the lack of commonly accepted annotated datasets and a standard evaluation platform prevents the development of algorithms. To address this issue, the Vision Meets Drone Single-Object Tracking (VisDrone-SOT2018) Challenge workshop was organized in conjunction with the 15th European Conference on Computer Vision (ECCV 2018) to track and advance the technologies in this field. Specifically, we collect a dataset, including 132 video sequences divided into three non-overlapping sets, i.e., training (86 sequences with 69,941 frames), validation (11 sequences with 7,046 frames), and testing (35 sequences with 29,367 frames) sets. We provide fully annotated bounding boxes of the targets as well as several useful attributes, e.g., occlusion, background clutter, and camera motion. The tracking targets in these sequences include pedestrians, cars, buses, and animals. The dataset is extremely challenging due to various factors, such as occlusion, large scale, pose variation, and fast motion. We present the evaluation protocol of the VisDrone-SOT2018 challenge and the results of a comparison of 22 trackers on the benchmark dataset, which are publicly available on the challenge website: http://www.aiskyeye.com/. We hope this challenge largely boosts the research and development in single object tracking on drone platforms.

Posted Content
TL;DR: DialogWAE, a conditional Wasserstein autoencoder specially designed for dialogue modeling, is proposed; it models the distribution of data by training a GAN within the latent variable space and develops a Gaussian mixture prior network to enrich the latent space.
Abstract: Variational autoencoders (VAEs) have shown promise in data-driven conversation modeling. However, most VAE conversation models match the approximate posterior distribution over the latent variables to a simple prior such as the standard normal distribution, thereby restricting the generated responses to a relatively simple (e.g., unimodal) scope. In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. Unlike VAEs that impose a simple distribution over the latent variables, DialogWAE models the distribution of data by training a GAN within the latent variable space. Specifically, our model samples from the prior and posterior distributions over the latent variables by transforming context-dependent random noise using neural networks, and minimizes the Wasserstein distance between the two distributions. We further develop a Gaussian mixture prior network to enrich the latent space. Experiments on two popular datasets show that DialogWAE outperforms the state-of-the-art approaches in generating more coherent, informative and diverse responses.
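
The prior side of the design can be sketched as a Gaussian mixture prior network that transforms context-dependent noise into a latent sample; the component count, the Gumbel-softmax component selection, and the shapes below are assumptions.

```python
# Gaussian mixture prior network: context -> (component, mean, variance) -> z.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_ctx, d_z, K = 256, 64, 3
mix_logits = nn.Linear(d_ctx, K)       # which mixture component
mu = nn.Linear(d_ctx, K * d_z)
logvar = nn.Linear(d_ctx, K * d_z)

def sample_prior(context):
    # Pick a component per example (Gumbel-softmax keeps it differentiable).
    comp = F.gumbel_softmax(mix_logits(context), hard=True)  # (B, K)
    m = torch.einsum('bk,bkd->bd', comp, mu(context).view(-1, K, d_z))
    lv = torch.einsum('bk,bkd->bd', comp, logvar(context).view(-1, K, d_z))
    eps = torch.randn_like(m)                                # context-dependent noise
    return m + eps * torch.exp(0.5 * lv)

z = sample_prior(torch.randn(8, d_ctx))  # (8, 64), fed to the response decoder
```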

Proceedings ArticleDOI
27 Jun 2018
TL;DR: This work presents a scalable review-aware recommendation method, called SentiRec, that is guided to incorporate the sentiments of reviews when modeling the users and the items, and that drastically reduces the training time and the memory usage.
Abstract: Existing review-aware recommendation methods represent users (or items) through the concatenation of the reviews written by (or for) them, and depend entirely on convolutional neural networks (CNNs) to extract meaningful features for modeling users (or items). However, understanding reviews based only on the raw words of reviews is challenging because of the inherent ambiguity contained in them, originating from users' different tendencies in writing. Moreover, it is inefficient in time and memory to model users/items by the concatenation of their associated reviews, owing to the considerably large inputs to CNNs. In this work, we present a scalable review-aware recommendation method, called SentiRec, that is guided to incorporate the sentiments of reviews when modeling the users and the items. SentiRec is a two-step approach composed of the first step that includes the encoding of each review into a fixed-size review vector that is trained to embody the sentiment of the review, followed by the second step that generates recommendations based on the vector-encoded reviews. Through our experiments, we show that SentiRec not only outperforms the existing review-aware methods, but also drastically reduces the training time and the memory usage. We also conduct a qualitative evaluation on the vector-encoded reviews trained by SentiRec to demonstrate that the overall sentiments are indeed encoded therein.
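
The two-step recipe can be sketched as (1) a review encoder supervised by sentiment, then (2) a recommender consuming the frozen review vectors; the encoder below is a deliberately simplified assumption, not the paper's network.

```python
# Step 1: encode reviews into fixed-size vectors trained on sentiment.
# Step 2: build the recommender on the (detached) review vectors.
import torch
import torch.nn as nn

d = 64
embed = nn.EmbeddingBag(10000, d)   # mean of word embeddings per review
proj = nn.Linear(d, d)
sentiment_head = nn.Linear(d, 1)    # step-1 supervision target

tokens = torch.randint(0, 10000, (50,))
offsets = torch.tensor([0, 25])                 # two reviews of 25 tokens each
review_vec = proj(embed(tokens, offsets))       # (2, d) fixed-size review vectors
sentiment_loss = nn.BCEWithLogitsLoss()(
    sentiment_head(review_vec).squeeze(-1), torch.tensor([1.0, 0.0]))

rating = nn.Linear(d, 1)(review_vec.detach())   # step 2: recommend from frozen vectors
```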

Journal ArticleDOI
TL;DR: Single-molecule FRET is used to study λ-exonuclease, finding that metal-ion coordination is correlated with the enzymatic reaction steps and offering insights into the origin of dynamic heterogeneity in enzymatic catalysis.
Abstract: Metal ions at the active site of an enzyme act as cofactors, and their dynamic fluctuations can potentially influence enzyme activity. Here, we use λ-exonuclease as a model enzyme with two Mg2+ binding sites and probe activity at various concentrations of magnesium by single-molecule FRET. We find that while MgA2+ and MgB2+ have similar binding constants, the dissociation rate of MgA2+ is two orders of magnitude lower than that of MgB2+ due to a kinetic barrier difference. At physiological Mg2+ concentration, the MgB2+ ion near the 5'-terminal side of the scissile phosphate dissociates in each round of degradation, facilitating a series of DNA cleavages via fast product release concomitant with enzyme translocation. At a low magnesium concentration, occasional dissociation and slow re-coordination of MgA2+ result in pauses during processive degradation. Our study highlights the importance of metal-ion coordination dynamics in correlation with the enzymatic reaction steps, and offers insights into the origin of dynamic heterogeneity in enzymatic catalysis.

Posted Content
TL;DR: In this paper, the authors proposed a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM), which uses self-attention to learn the latent concepts in scene frames and captions.
Abstract: We propose a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM). The key idea is to use a dual attention mechanism with late fusion. MDAM uses self-attention to learn the latent concepts in scene frames and captions. Given a question, MDAM uses the second attention over these latent concepts. Multimodal fusion is performed after the dual attention processes (late fusion). Using this processing pipeline, MDAM learns to infer a high-level vision-language joint representation from an abstraction of the full video content. We evaluate MDAM on PororoQA and MovieQA datasets which have large-scale QA annotations on cartoon videos and movies, respectively. For both datasets, MDAM achieves new state-of-the-art results with significant margins compared to the runner-up models. We confirm the best performance of the dual attention mechanism combined with late fusion by ablation studies. We also perform qualitative analysis by visualizing the inference mechanisms of MDAM.

Proceedings ArticleDOI
01 Nov 2018
TL;DR: In this paper, a replay attack spoofing detection system for automatic speaker verification using multi-task learning of noise classes is proposed; the multi-task learning includes classifying the noise of playback devices, recording environments, and recording devices as well as the spoofing detection itself.
Abstract: In this paper, we propose a replay attack spoofing detection system for automatic speaker verification using multi-task learning of noise classes. We define the noise that is caused by the replay attack as replay noise. We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification. The multi-task learning includes classifying the noise of playback devices, recording environments, and recording devices, as well as the spoofing detection. Each of the three types of noise classes also includes a genuine class. The experimental results on version 1.0 of the ASVspoof 2017 dataset demonstrate that the performance of our proposed system is relatively improved by 30% on the evaluation set.
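
The multi-task setup can be sketched as one shared trunk with a spoofing-detection head plus three replay-noise heads; the layer sizes and class counts below are assumptions (each noise head also carries a genuine class, per the abstract).

```python
# Shared trunk, four classification heads, summed cross-entropy losses.
import torch
import torch.nn as nn

trunk = nn.Sequential(nn.Linear(400, 256), nn.ReLU())
heads = nn.ModuleDict({
    'spoof': nn.Linear(256, 2),         # genuine vs replay
    'playback': nn.Linear(256, 10),     # playback devices + genuine
    'environment': nn.Linear(256, 10),  # recording environments + genuine
    'recording': nn.Linear(256, 10),    # recording devices + genuine
})

def multitask_loss(features, labels):
    h = trunk(features)
    ce = nn.CrossEntropyLoss()
    return sum(ce(head(h), labels[name]) for name, head in heads.items())

x = torch.randn(16, 400)
y = {k: torch.randint(0, 2, (16,)) for k in heads}  # dummy labels
loss = multitask_loss(x, y)
```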

Patent
21 Sep 2018
TL;DR: In this paper, a method for processing personal data based on a block chain and a system thereof is presented, where a personal identification key is used to track and utilize the personal data for the corresponding user in the different services through the Personal Identification Key.
Abstract: The present invention provides a method for processing personal data based on a block chain and a system thereof. According to embodiments of the present invention, the method for processing personal data utilizes a personal identification key in a block chain network registered for a user in different services identifying the same user by different identifiers and provides personal data for the corresponding user on the block chain network, thereby tracking and utilizing the personal data for the corresponding user in the different services through the personal identification key. The method for processing personal data of a medium comprises the following steps of: managing the identifier of a member registered in the medium; interlocking the personal identification key which is issued by the block chain network by the member and identifies the user corresponding to the member with the identifier; and transmitting a block to participants of the block chain network by using the personal identification key so that the block including data related to an activity of the member is connected to the block chain.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A new manually annotated dataset of user-generated data is created from the same domain as the training dataset but from other sources, and the differences between the new dataset and the standard ABSA dataset are analysed.
Abstract: In this paper, we test state-of-the-art Aspect Based Sentiment Analysis (ABSA) systems, trained on a widely used dataset, on actual data. We created a new manually annotated dataset of user-generated data from the same domain as the training dataset, but from other sources, and analyse the differences between the new and the standard ABSA dataset. We then analyse the performance of different versions of the same system on both datasets. We also propose light adaptation methods to increase system robustness.

Posted Content
TL;DR: An overview of the evolution of local features from handcrafted to deep-learning-based methods is presented, followed by a discussion of several benchmarks and papers evaluating such local features, to help the reader fully understand the topic of image and region description in order to make the best use of it in modern computer vision applications.
Abstract: This paper presents an overview of the evolution of local features from handcrafted to deep-learning-based methods, followed by a discussion of several benchmarks and papers evaluating such local features. Our investigations are motivated by 3D reconstruction problems, where the precise location of the features is important. As we describe these methods, we highlight and explain the challenges of feature extraction and potential ways to overcome them. We first present handcrafted methods, followed by methods based on classical machine learning and finally we discuss methods based on deep-learning. This largely chronologically-ordered presentation will help the reader to fully understand the topic of image and region description in order to make best use of it in modern computer vision applications. In particular, understanding handcrafted methods and their motivation can help to understand modern approaches and how machine learning is used to improve the results. We also provide references to most of the relevant literature and code.

Posted Content
TL;DR: In this paper, a teacher-student learning framework is applied to short utterance compensation for, to the authors' knowledge, the first time, and an integrated text-independent speaker verification system that takes utterances with a short duration of 2 seconds or less as input is proposed.
Abstract: The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems. This study aimed to develop an integrated text-independent speaker verification system that inputs utterances with a short duration of 2 seconds or less. We propose an approach using a teacher-student learning framework for this goal, applied to short utterance compensation for the first time to our knowledge. The core concept of the proposed system is to conduct the compensation throughout the network that extracts the speaker embedding, mainly at the phonetic level, rather than compensating via a separate system after extracting the speaker embedding. In the proposed architecture, phonetic-level features, where each feature represents a segment of 130 ms, are extracted using convolutional layers. A layer of gated recurrent units extracts an utterance-level feature from the phonetic-level features. The proposed approach also adopts a new objective function for teacher-student learning that considers both the Kullback-Leibler divergence of the output layers and the cosine distance of the speaker embedding layers. Experiments were conducted on the VoxCeleb1 dataset using deep neural networks that take raw waveforms as input and output speaker embeddings. The proposed model could compensate for approximately 65% of the performance degradation due to the shortened duration.
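
The combined objective named in the abstract (KL divergence between teacher and student output layers plus cosine distance between their speaker embeddings) can be written down directly; the weighting factor and the tensor sizes below are assumptions.

```python
# Teacher-student objective: output-distribution KL + embedding cosine distance.
import torch
import torch.nn.functional as F

def ts_loss(student_logits, teacher_logits, student_emb, teacher_emb, alpha=1.0):
    # Teacher sees the full utterance; student sees the short crop.
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction='batchmean')
    cos_dist = (1 - F.cosine_similarity(student_emb, teacher_emb, dim=-1)).mean()
    return kl + alpha * cos_dist

loss = ts_loss(torch.randn(8, 1000), torch.randn(8, 1000),
               torch.randn(8, 512), torch.randn(8, 512))
```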

Posted Content
TL;DR: An LP-WaveNet vocoder is proposed, in which the complicated interactions between vocal source and vocal tract components are jointly trained within a mixture density network-based WaveNet model; it outperforms the conventional WaveNet vocoders both objectively and subjectively.
Abstract: We propose a linear prediction (LP)-based waveform generation method via the WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to effectively train the neural vocoder when the target database contains a massive amount of acoustical information such as prosody, style, or expressiveness. As a solution, approaches that generate only the vocal source component with a neural vocoder have been proposed. However, they tend to generate synthetic noise because the vocal source component is handled independently, without considering the entire speech production process, which inevitably results in a mismatch between the vocal source and the vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, in which the complicated interactions between vocal source and vocal tract components are jointly trained within a mixture density network-based WaveNet model. The experimental results verify that the proposed system outperforms the conventional WaveNet vocoders both objectively and subjectively. In particular, the proposed method achieves 4.47 MOS within the TTS framework.

Posted Content
TL;DR: In this article, a WaveNet-based neural excitation model (ExcitNet) is proposed for statistical parametric speech synthesis systems, which employs an adaptive inverse filter to decouple spectral components from the speech signal.
Abstract: This paper proposes a WaveNet-based neural excitation model (ExcitNet) for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework. However, they often suffer from noisy outputs because of the difficulties in capturing the complicated time-varying nature of speech signals. To improve modeling efficiency, the proposed ExcitNet vocoder employs an adaptive inverse filter to decouple spectral components from the speech signal. The residual component, i.e. excitation signal, is then trained and generated within the WaveNet framework. In this way, the quality of the synthesized speech signal can be further improved since the spectral component is well represented by a deep learning framework and, moreover, the residual component is efficiently generated by the WaveNet framework. Experimental results show that the proposed ExcitNet vocoder, trained both speaker-dependently and speaker-independently, outperforms traditional linear prediction vocoders and similarly configured conventional WaveNet vocoders.
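
The decoupling step, separating the spectral envelope from the waveform so that only the residual excitation is modeled, can be sketched with standard LP analysis; the librosa/scipy tooling below is an assumption about implementation, not the authors' code.

```python
# LP inverse filtering: waveform -> excitation (residual) -> waveform.
import numpy as np
import scipy.signal
import librosa

x, sr = librosa.load(librosa.example('trumpet'), sr=16000)  # stand-in signal
frame = x[:400] * np.hanning(400)                           # one windowed frame

a = librosa.lpc(frame, order=16)                     # a[0] == 1, LP coefficients
excitation = scipy.signal.lfilter(a, [1.0], frame)   # inverse filter -> residual
reconstructed = scipy.signal.lfilter([1.0], a, excitation)  # synthesis filter

# The analysis/synthesis pair is lossless up to floating-point error.
assert np.allclose(frame, reconstructed, atol=1e-4)
```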

Posted Content
TL;DR: A new block called Concentrated-Comprehensive Convolution (C3) is proposed, which applies asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss due to the dilated convolution.
Abstract: One of the practical choices for making a lightweight semantic segmentation model is to combine a depth-wise separable convolution with a dilated convolution. However, the simple combination of these two methods results in an over-simplified operation which causes severe performance degradation due to loss of information contained in the feature map. To resolve this problem, we propose a new block called Concentrated-Comprehensive Convolution (C3) which applies the asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss due to dilated convolution. The C3 block consists of a concentration stage and a comprehensive convolution stage. The first stage uses two depth-wise asymmetric convolutions for compressed information from the neighboring pixels to alleviate the information loss. The second stage increases the receptive field by using a depth-wise separable dilated convolution from the feature map of the first stage. We applied the C3 block to various segmentation frameworks (ESPNet, DRN, ERFNet, ENet) for proving the beneficial properties of our proposed method. Experimental results show that the proposed method preserves the original accuracies on Cityscapes dataset while reducing the complexity. Furthermore, we modified ESPNet to achieve about 2% better performance while reducing the number of parameters by half and the number of FLOPs by 35% compared with the original ESPNet. Finally, experiments on ImageNet classification task show that C3 block can successfully replace dilated convolutions.
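
From the abstract's description, a C3 block can be sketched as two depth-wise asymmetric convolutions (the concentration stage) followed by a depth-wise separable dilated convolution (the comprehensive stage); the kernel sizes and dilation rate below are illustrative.

```python
# C3-style block: concentration stage, then comprehensive stage.
import torch
import torch.nn as nn

class C3Block(nn.Module):
    def __init__(self, c, k=3, dilation=2):
        super().__init__()
        self.concentration = nn.Sequential(
            nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c),  # depth-wise, vertical
            nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),  # depth-wise, horizontal
        )
        self.comprehensive = nn.Sequential(
            nn.Conv2d(c, c, k, padding=dilation * (k // 2),
                      dilation=dilation, groups=c),                  # depth-wise dilated
            nn.Conv2d(c, c, 1),                                      # point-wise
        )

    def forward(self, x):
        return self.comprehensive(self.concentration(x))

y = C3Block(64)(torch.randn(1, 64, 56, 56))  # spatial size preserved
```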

Journal ArticleDOI
TL;DR: The experimental results from two tasks of knowledge graph embedding prove that the proposed method not only incorporates new knowledge of new triples into the existing embedding successfully but also preserves the knowledge of the current embedding.
Abstract: This paper addresses an enrichment of translation-based knowledge graph embeddings. When new knowledge triples become available after a knowledge graph is embedded onto a vector space, the embedding should be enriched with the new triples, but without the triples used in training the embedding. The main challenge is that the enrichment of new triples should be accomplished without forgetting the knowledge of current embedding. This paper achieves the goal by minimizing a risk over the new triples penalized by rapid parameter change between old and new embedding models. The effectiveness of the proposed method is shown by learning a translation-based knowledge graph embedding trained incrementally using a series of knowledge triples. The experimental results from two tasks of knowledge graph embedding prove that the proposed method not only incorporates new knowledge of new triples into the existing embedding successfully but also preserves the knowledge of the current embedding.
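
The objective described, a risk over the new triples penalized by rapid parameter change, can be sketched with a TransE-style margin loss plus an L2 drift penalty; TransE and the penalty weight are illustrative choices consistent with "translation-based", not necessarily the paper's exact model.

```python
# Incremental enrichment: margin loss on new triples + penalty on drift
# away from the old embedding parameters.
import torch

def enrichment_loss(ent, rel, ent_old, rel_old, pos, neg, margin=1.0, lam=10.0):
    def score(t):  # TransE energy ||h + r - t|| for triples t = (head, rel, tail)
        h, r, tail = ent[t[:, 0]], rel[t[:, 1]], ent[t[:, 2]]
        return (h + r - tail).norm(p=2, dim=-1)
    risk = torch.relu(margin + score(pos) - score(neg)).mean()   # new triples
    drift = ((ent - ent_old) ** 2).sum() + ((rel - rel_old) ** 2).sum()
    return risk + lam * drift                                    # keep old knowledge

ent = torch.randn(100, 50, requires_grad=True)
rel = torch.randn(20, 50, requires_grad=True)
# Indices kept small so they are valid for both entity and relation tables.
pos = torch.randint(0, 20, (32, 3))
neg = torch.randint(0, 20, (32, 3))
loss = enrichment_loss(ent, rel, ent.detach().clone(), rel.detach().clone(), pos, neg)
```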

Journal ArticleDOI
TL;DR: In this article, a generic mathematical framework is proposed which extends a low-dimensional manifold regularization in the conventional sigmoid space to an inverse elastic source problem with sparse measurements.
Abstract: An inverse elastic source problem with sparse measurements is our concern. A generic mathematical framework is proposed which extends a low-dimensional manifold regularization in the conventional sigmoid space to an inverse elastic source problem with sparse measurements.

Proceedings ArticleDOI
06 Dec 2018
TL;DR: A novel disease prediction method, EHAN (EHR History-based prediction using Attention Network), based on a recurrent neural network (RNN) and an attention mechanism is proposed; it outperformed the state-of-the-art model with respect to various performance metrics.
Abstract: Precise prediction of severe diseases resulting in mortality is one of the main issues in medical fields. Even if pathological and radiological measurements provide competitive precision, they usually require large costs of time and expense to obtain and analyze the data for prediction. Recently, end-to-end approaches based on deep neural networks have been proposed; however, they still suffer from low classification performance and difficulties of interpretation. In this study, we propose a novel disease prediction method, EHAN (EHR History-based prediction using Attention Network), based on the recurrent neural network (RNN) and attention mechanism. The proposed method incorporates (1) bidirectional gated recurrent units (GRU) for automated sequential modeling, (2) an attention mechanism for improving long-term dependency modeling, and (3) RNN-based gradient-weighted class activation mapping (Grad-CAM) to visualize the class-specific attention weights. We conducted experiments to predict the occurrence of risky diseases, including cardiovascular and cerebrovascular diseases, from more than 40,000 hypertension patients' electronic health records (EHR). The results showed that the proposed method outperformed the state-of-the-art model with respect to various performance metrics. Furthermore, we confirmed that the proposed visualization methods can be used to assist data-driven discovery.
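
Components (1) and (2), a bidirectional GRU with attention pooling over the visit sequence, can be sketched directly; the input sizes below are illustrative, and the Grad-CAM visualization step is omitted.

```python
# Bidirectional GRU over EHR visits with attention pooling for prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EhanStyleModel(nn.Module):
    def __init__(self, d_in=128, d_h=64):
        super().__init__()
        self.gru = nn.GRU(d_in, d_h, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * d_h, 1)  # attention over time steps
        self.out = nn.Linear(2 * d_h, 1)  # disease-occurrence logit

    def forward(self, x):
        h, _ = self.gru(x)                              # (B, T, 2*d_h)
        w = F.softmax(self.att(h).squeeze(-1), dim=-1)  # (B, T) attention weights
        ctx = torch.bmm(w.unsqueeze(1), h).squeeze(1)   # weighted sum of states
        return self.out(ctx), w                         # w doubles as an explanation

logit, weights = EhanStyleModel()(torch.randn(4, 30, 128))
```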