Institution

Naver Corporation

Company · Seongnam-si, South Korea
About: Naver Corporation is a company based in Seongnam-si, South Korea. It is known for research contributions in the topics: Terminal (electronics) & Computer science. The organization has 4038 authors who have published 4294 publications receiving 35045 citations. The organization is also known as: NAVER Corporation & NAVER.


Papers
Patent
05 Jul 2011
TL;DR: In this article, a method and system for providing a representative phrase corresponding to a real-time (current-time) popular keyword is presented; the popular keyword and the generated phrases are displayed on a web page or the like.
Abstract: A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.

95 citations
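The patent describes the pipeline only at a high level. As a rough illustration of the combine-keyword-with-criterion-word step (not the patented method itself), here is a toy Python sketch; the tokenization, filtering, and sample data are assumptions made for brevity.

```python
from collections import Counter

def representative_phrases(keyword, cluster_docs, top_n=3):
    """Toy illustration of the idea in the patent: find frequent
    content words in a cluster of documents and combine each with
    the popular keyword to form candidate phrases. Whitespace
    tokenization is a simplifying assumption; real morpheme analysis
    (e.g. for Korean) needs a proper morphological analyzer."""
    counts = Counter()
    for doc in cluster_docs:
        for token in doc.lower().split():
            # Skip very short tokens and the keyword itself.
            if len(token) > 2 and token not in keyword.lower():
                counts[token] += 1
    return [f"{keyword} {word}" for word, _ in counts.most_common(top_n)]

if __name__ == "__main__":
    docs = [
        "world cup final schedule announced today",
        "world cup final tickets sold out",
        "fans gather ahead of the world cup final",
    ]
    print(representative_phrases("world cup", docs))
```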

Book ChapterDOI
23 Aug 2020
TL;DR: A new scalable approach to data collection for sign recognition in continuous videos is introduced, and it is shown that BSL-1K can be used to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks.
Abstract: Recent progress in fine-grained gesture and action classification, and machine translation, point to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 h of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data—the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks—we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.

94 citations
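The collection idea, a weakly aligned subtitle window narrowed down by a keyword spotter driven by mouthing cues, can be illustrated with a toy sketch. The per-frame confidences, threshold, and window below are hypothetical stand-ins for the paper's actual spotting model.

```python
import numpy as np

def localise_sign(keyword, subtitle_window, spotter_scores, fps=25, threshold=0.5):
    """Toy version of the collection pipeline: the weakly aligned
    subtitle gives a time window in which the sign might occur, and a
    keyword spotter (modelled here as an array of per-frame
    confidences, assumed to come from a mouthing-cue model) picks the
    precise frame. Scores and threshold are illustrative assumptions."""
    start_f, end_f = int(subtitle_window[0] * fps), int(subtitle_window[1] * fps)
    window = spotter_scores[start_f:end_f]
    if window.size == 0 or window.max() < threshold:
        return None  # no confident localisation; discard this candidate
    peak = start_f + int(window.argmax())
    return {"keyword": keyword, "frame": peak, "confidence": float(window.max())}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(500) * 0.4   # background confidences
    scores[210] = 0.93               # pretend the mouthing model fires here
    print(localise_sign("happy", (7.0, 10.0), scores))
```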

Proceedings ArticleDOI
12 May 2019
TL;DR: In this article, audio-to-video synchronisation is set up as a cross-modal retrieval problem, where the objective is to find the most relevant audio segment given a short video clip.
Abstract: This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronisation. Here, we set up the problem as one of cross-modal retrieval, where the objective is to find the most relevant audio segment given a short video clip. The method builds on the recent advances in learning representations from cross-modal self-supervision. The main contributions of this paper are as follows: (1) we propose a new learning strategy where the embeddings are learnt via a multi-way matching problem, as opposed to a binary classification (matching or non-matching) problem as proposed by recent papers; (2) we demonstrate that performance of this method far exceeds the existing baselines on the synchronisation task; (3) we use the learnt embeddings for visual speech recognition in self-supervision, and show that the performance matches the representations learnt end-to-end in a fully-supervised manner.

94 citations
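The paper's central change is training embeddings with a multi-way matching objective rather than a binary match/non-match classifier. A minimal PyTorch sketch of such an objective follows; the batch layout and the negative-squared-distance similarity are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multiway_matching_loss(video_emb, audio_embs):
    """Multi-way matching objective: compare one video embedding with
    N candidate audio segments and train with a softmax over all of
    them, instead of a binary match/non-match decision. The shapes and
    the negative-squared-distance similarity are assumptions for this
    sketch, not the paper's exact configuration.

    video_emb:  (B, D)    one embedding per video clip
    audio_embs: (B, N, D) N candidates per clip, true match at index 0
    """
    diffs = audio_embs - video_emb.unsqueeze(1)   # (B, N, D)
    logits = -(diffs ** 2).sum(dim=-1)            # (B, N) similarity logits
    targets = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    B, N, D = 4, 8, 128
    video = torch.randn(B, D, requires_grad=True)
    audio = torch.randn(B, N, D)
    loss = multiway_matching_loss(video, audio)
    loss.backward()
    print(float(loss))
```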

Proceedings ArticleDOI
17 Apr 2019
TL;DR: In this paper, end-to-end deep neural networks that input raw waveforms are explored to improve various aspects of speaker verification: front-end speaker embedding extraction (including model architecture, pre-training scheme, and additional objective functions) and back-end classification.
Abstract: Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification. Adjustment of model architecture using a pre-training scheme can extract speaker embeddings, giving a significant improvement in performance. Additional objective functions simplify the process of extracting speaker embeddings by merging conventional two-phase processes: extracting utterance-level features such as i-vectors or x-vectors and the feature enhancement phase, e.g., linear discriminant analysis. Effective back-end classification models that suit the proposed speaker embedding are also explored. We propose an end-to-end system that comprises two deep neural networks, one front-end for utterance-level speaker embedding extraction and the other for back-end classification. Experiments conducted on the VoxCeleb1 dataset demonstrate that the proposed model achieves state-of-the-art performance among systems without data augmentation. The proposed system is also comparable to the state-of-the-art x-vector system that adopts data augmentation.

94 citations
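A minimal sketch of a raw-waveform front-end in the spirit described above: strided 1-D convolutions followed by temporal pooling into a fixed-size speaker embedding. The layer sizes are illustrative assumptions rather than the paper's architecture, and the back-end classification network is omitted.

```python
import torch
import torch.nn as nn

class RawWaveformEmbedder(nn.Module):
    """Minimal raw-waveform front-end: strided 1-D convolutions over
    the waveform, temporal average pooling, and a linear projection to
    a fixed-size utterance-level speaker embedding. Layer sizes are
    illustrative assumptions, not the paper's architecture; the
    back-end classifier is omitted."""

    def __init__(self, emb_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=5), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, wave):              # wave: (B, num_samples)
        x = self.conv(wave.unsqueeze(1))  # (B, 256, T')
        x = x.mean(dim=-1)                # temporal average pooling
        return self.proj(x)               # (B, emb_dim)

if __name__ == "__main__":
    model = RawWaveformEmbedder()
    wav = torch.randn(2, 16000)           # two 1-second clips at 16 kHz
    print(model(wav).shape)               # torch.Size([2, 256])
```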

Book ChapterDOI
23 Aug 2020
TL;DR: In this article, a self-supervised approach is proposed to transform a video into a set of discrete audio-visual objects, using attention to localize and group sound sources and optical flow to aggregate information over time.
Abstract: Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end, we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time. We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks: (a) multi-speaker sound source separation, (b) localizing and tracking speakers, (c) correcting misaligned audio-visual data, and (d) active speaker detection. Using our representation, these tasks can be solved entirely by training on unlabeled video, without the aid of object detectors. We also demonstrate the generality of our method by applying it to non-human speakers, including cartoons and puppets. Our model significantly outperforms other self-supervised approaches, and obtains performance competitive with methods that use supervised face detection.

87 citations
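The attention step, which compares an audio embedding against every spatial location of a visual feature map to localise a sound source, can be sketched as follows; the feature extractors, the optical-flow aggregation over time, and the training objective are all omitted, and the shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def audio_visual_attention(visual_feats, audio_emb):
    """Compare an audio embedding against every spatial location of a
    visual feature map to localise the sound source. Feature
    extraction, optical-flow aggregation over time, and training are
    omitted; shapes are assumptions for illustration.

    visual_feats: (B, D, H, W)  per-location visual embeddings
    audio_emb:    (B, D)        one embedding per audio clip
    """
    B, D, H, W = visual_feats.shape
    v = F.normalize(visual_feats.flatten(2), dim=1)    # (B, D, H*W)
    a = F.normalize(audio_emb, dim=1).unsqueeze(1)     # (B, 1, D)
    attn = torch.bmm(a, v).view(B, H, W)               # cosine similarity map
    flat = attn.view(B, -1).argmax(dim=1)
    ys = torch.div(flat, W, rounding_mode="floor")
    return attn, torch.stack((ys, flat % W), dim=1)    # map + peak (y, x)

if __name__ == "__main__":
    attn, peak = audio_visual_attention(torch.randn(2, 128, 14, 14),
                                        torch.randn(2, 128))
    print(attn.shape, peak)
```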


Authors

Showing all 4041 results

Name               H-index   Papers   Citations
Andrea Vedaldi     89        305      63305
Sunghun Kim        51        115      12994
Eric Gaussier      41        231      8203
Un Ju Jung         39        98       5696
Hyun-Soo Kim       37        421      5650
Gabriela Csurka    37        145      10959
Nojun Kwak         34        234      6026
Young-Jin Park     31        257      3759
Sung Joo Kim       31        196      3078
Jae-Hoon Kim       30        323      5847
Jung-Ryul Lee      29        222      3322
Joon Son Chung     28        73       4900
Ok-Hwan Lee        27        163      2896
Diane Larlus       27        69       4722
Jung Goo Lee       26        142      1917
Network Information
Related Institutions (5)

Kyungpook National University
42.1K papers, 834.6K citations, 80% related

Pusan National University
45K papers, 819.3K citations, 80% related

Korea University
82.4K papers, 1.8M citations, 80% related

Seoul National University
138.7K papers, 3.7M citations, 79% related

Chungnam National University
32.1K papers, 543.3K citations, 79% related

Performance Metrics
No. of papers from the Institution in previous years
Year   Papers
2022   6
2021   144
2020   174
2019   138
2018   82
2017   64