Institution

Naver Corporation

Company, Seongnam-si, South Korea
About: Naver Corporation is a company based in Seongnam-si, South Korea. It is known for research contributions in the topics Terminal (electronics) and Computer science. The organization has 4038 authors who have published 4294 publications receiving 35045 citations. The organization is also known as NAVER Corporation and NAVER.


Papers
Posted Content
TL;DR: A unified four-stage STR framework is introduced that most existing STR models fit into and allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations.
Abstract: Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claims to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to inconsistent choices of training and evaluation datasets. This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gap that results from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets. Such analyses remove the obstacles in current comparisons, making it possible to understand the performance gain of the existing modules.

149 citations

Proceedings ArticleDOI
17 Oct 2018
TL;DR: This paper proposes a novel GAN-based collaborative filtering (CF) framework to provide higher accuracy in recommendation and validates that vector-wise adversarial training employed in CFGAN is effective in solving the problem of existing GAN-based CF methods.
Abstract: Generative Adversarial Networks (GAN) have achieved big success in various domains such as image generation, music generation, and natural language generation. In this paper, we propose a novel GAN-based collaborative filtering (CF) framework to provide higher accuracy in recommendation. We first identify a fundamental problem of existing GAN-based methods in CF and highlight it quantitatively via a series of experiments. Next, we suggest a new direction of vector-wise adversarial training to solve the problem and propose our GAN-based CF framework, called CFGAN, based on the direction. We identify a unique challenge that arises when vector-wise adversarial training is employed in CF. We then propose three CF methods realized on top of our CFGAN that are able to address the challenge. Finally, via extensive experiments on real-world datasets, we validate that vector-wise adversarial training employed in CFGAN is really effective to solve the problem of existing GAN-based CF methods. Furthermore, we demonstrate that our proposed CF methods on CFGAN provide recommendation accuracy consistently and universally higher than those of the state-of-the-art recommenders.

149 citations

Posted Content
TL;DR: This work presents Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning.
Abstract: Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from vision and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.

149 citations

Patent
11 Apr 2005
TL;DR: In this article, a method and a system for providing content are disclosed where a plurality of user clients coupled by a mesh structure transmit large-size multimedia data at a high speed.
Abstract: A method and a system for providing content are disclosed where a plurality of user clients coupled by a mesh structure transmit large-size multimedia data at a high speed. A user client receives content data from other user clients or a content server. Even if many users request content, the load of a server does not increase because the content server and the user clients provide content together. A user client requests content data from a plurality of nodes and receives content data by way of a parallel/distribution method for stable data receipt.

148 citations

Posted Content
TL;DR: A novel loss function, weighted source-to-distortion ratio (wSDR) loss, is designed to directly correlate with a quantitative evaluation measure, and the proposed method achieves state-of-the-art performance in all metrics.
Abstract: Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin.

147 citations


Authors


Name              H-index  Papers  Citations
Andrea Vedaldi    89       305     63305
Sunghun Kim       51       115     12994
Eric Gaussier     41       231     8203
Un Ju Jung        39       98      5696
Hyun-Soo Kim      37       421     5650
Gabriela Csurka   37       145     10959
Nojun Kwak        34       234     6026
Young-Jin Park    31       257     3759
Sung Joo Kim      31       196     3078
Jae-Hoon Kim      30       323     5847
Jung-Ryul Lee     29       222     3322
Joon Son Chung    28       73      4900
Ok-Hwan Lee       27       163     2896
Diane Larlus      27       69      4722
Jung Goo Lee      26       142     1917
Network Information
Related Institutions (5)
Kyungpook National University
42.1K papers, 834.6K citations

80% related

Pusan National University
45K papers, 819.3K citations

80% related

Korea University
82.4K papers, 1.8M citations

80% related

Seoul National University
138.7K papers, 3.7M citations

79% related

Chungnam National University
32.1K papers, 543.3K citations

79% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2022  6
2021  144
2020  174
2019  138
2018  82
2017  64