Institution

Naver Corporation

Company · Seongnam-si, South Korea
About: Naver Corporation is a company based in Seongnam-si, South Korea. It is known for research contributions in the topics: Terminal (electronics) & Computer science. The organization has 4038 authors who have published 4294 publications receiving 35045 citations. The organization is also known as: NAVER Corporation & NAVER.


Papers
Patent
05 Jul 2011
TL;DR: In this article, a method and system for providing a representative phrase corresponding to a real-time (current-time) popular keyword is presented; the popular keyword and the generated phrases are displayed on a web page or the like.
Abstract: A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.

95 citations
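The patent describes the pipeline only at a high level. As a rough illustration of the combine-keyword-with-criterion-word step (not the patented method itself), here is a toy Python sketch; the tokenization, filtering, and sample data are assumptions made for brevity.

```python
from collections import Counter

def representative_phrases(keyword, cluster_docs, top_n=3):
    """Toy illustration of the idea in the patent: find frequent
    content words in a cluster of documents and combine each with
    the popular keyword to form candidate phrases. Whitespace
    tokenization is a simplifying assumption; real morpheme analysis
    (e.g. for Korean) needs a proper morphological analyzer."""
    counts = Counter()
    for doc in cluster_docs:
        for token in doc.lower().split():
            # Skip very short tokens and the keyword itself.
            if len(token) > 2 and token not in keyword.lower():
                counts[token] += 1
    return [f"{keyword} {word}" for word, _ in counts.most_common(top_n)]

if __name__ == "__main__":
    docs = [
        "world cup final schedule announced today",
        "world cup final tickets sold out",
        "fans gather ahead of the world cup final",
    ]
    print(representative_phrases("world cup", docs))
```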

Book ChapterDOI
23 Aug 2020
TL;DR: A new scalable approach to data collection for sign recognition in continuous videos is introduced, and it is shown that BSL-1K can be used to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks.
Abstract: Recent progress in fine-grained gesture and action classification, and machine translation, point to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 h of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data—the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks—we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.

94 citations
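The collection idea, a weakly aligned subtitle window narrowed down by a keyword spotter driven by mouthing cues, can be illustrated with a toy sketch. The per-frame confidences, threshold, and window below are hypothetical stand-ins for the paper's actual spotting model.

```python
import numpy as np

def localise_sign(keyword, subtitle_window, spotter_scores, fps=25, threshold=0.5):
    """Toy version of the collection pipeline: the weakly aligned
    subtitle gives a time window in which the sign might occur, and a
    keyword spotter (modelled here as an array of per-frame
    confidences, assumed to come from a mouthing-cue model) picks the
    precise frame. Scores and threshold are illustrative assumptions."""
    start_f, end_f = int(subtitle_window[0] * fps), int(subtitle_window[1] * fps)
    window = spotter_scores[start_f:end_f]
    if window.size == 0 or window.max() < threshold:
        return None  # no confident localisation; discard this candidate
    peak = start_f + int(window.argmax())
    return {"keyword": keyword, "frame": peak, "confidence": float(window.max())}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(500) * 0.4   # background confidences
    scores[210] = 0.93               # pretend the mouthing model fires here
    print(localise_sign("happy", (7.0, 10.0), scores))
```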

Proceedings ArticleDOI
12 May 2019
TL;DR: In this article, audio-to-video synchronisation is set up as a cross-modal retrieval problem, where the objective is to find the most relevant audio segment given a short video clip.
Abstract: This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronisation. Here, we set up the problem as one of cross-modal retrieval, where the objective is to find the most relevant audio segment given a short video clip. The method builds on the recent advances in learning representations from cross-modal self-supervision. The main contributions of this paper are as follows: (1) we propose a new learning strategy where the embeddings are learnt via a multi-way matching problem, as opposed to a binary classification (matching or non-matching) problem as proposed by recent papers; (2) we demonstrate that performance of this method far exceeds the existing baselines on the synchronisation task; (3) we use the learnt embeddings for visual speech recognition in self-supervision, and show that the performance matches the representations learnt end-to-end in a fully-supervised manner.

94 citations
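The paper's central change is training embeddings with a multi-way matching objective rather than a binary match/non-match classifier. A minimal PyTorch sketch of such an objective follows; the batch layout and the negative-squared-distance similarity are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multiway_matching_loss(video_emb, audio_embs):
    """Multi-way matching objective: compare one video embedding with
    N candidate audio segments and train with a softmax over all of
    them, instead of a binary match/non-match decision. The shapes and
    the negative-squared-distance similarity are assumptions for this
    sketch, not the paper's exact configuration.

    video_emb:  (B, D)    one embedding per video clip
    audio_embs: (B, N, D) N candidates per clip, true match at index 0
    """
    diffs = audio_embs - video_emb.unsqueeze(1)   # (B, N, D)
    logits = -(diffs ** 2).sum(dim=-1)            # (B, N) similarity logits
    targets = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    B, N, D = 4, 8, 128
    video = torch.randn(B, D, requires_grad=True)
    audio = torch.randn(B, N, D)
    loss = multiway_matching_loss(video, audio)
    loss.backward()
    print(float(loss))
```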

Proceedings ArticleDOI
17 Apr 2019
TL;DR: In this paper, end-to-end deep neural networks that input raw waveforms are explored to improve various aspects of speaker verification: front-end speaker embedding extraction (including model architecture, pre-training scheme, and additional objective functions) and back-end classification.
Abstract: Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification. Adjustment of model architecture using a pre-training scheme can extract speaker embeddings, giving a significant improvement in performance. Additional objective functions simplify the process of extracting speaker embeddings by merging conventional two-phase processes: extracting utterance-level features such as i-vectors or x-vectors and the feature enhancement phase, e.g., linear discriminant analysis. Effective back-end classification models that suit the proposed speaker embedding are also explored. We propose an end-to-end system that comprises two deep neural networks, one front-end for utterance-level speaker embedding extraction and the other for back-end classification. Experiments conducted on the VoxCeleb1 dataset demonstrate that the proposed model achieves state-of-the-art performance among systems without data augmentation. The proposed system is also comparable to the state-of-the-art x-vector system that adopts data augmentation.

94 citations
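A minimal sketch of a raw-waveform front-end in the spirit described above: strided 1-D convolutions followed by temporal pooling into a fixed-size speaker embedding. The layer sizes are illustrative assumptions rather than the paper's architecture, and the back-end classification network is omitted.

```python
import torch
import torch.nn as nn

class RawWaveformEmbedder(nn.Module):
    """Minimal raw-waveform front-end: strided 1-D convolutions over
    the waveform, temporal average pooling, and a linear projection to
    a fixed-size utterance-level speaker embedding. Layer sizes are
    illustrative assumptions, not the paper's architecture; the
    back-end classifier is omitted."""

    def __init__(self, emb_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=5), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, wave):              # wave: (B, num_samples)
        x = self.conv(wave.unsqueeze(1))  # (B, 256, T')
        x = x.mean(dim=-1)                # temporal average pooling
        return self.proj(x)               # (B, emb_dim)

if __name__ == "__main__":
    model = RawWaveformEmbedder()
    wav = torch.randn(2, 16000)           # two 1-second clips at 16 kHz
    print(model(wav).shape)               # torch.Size([2, 256])
```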

Book ChapterDOI
23 Aug 2020
TL;DR: In this article, a self-supervised approach is proposed to transform a video into a set of discrete audio-visual objects, using attention to localize and group sound sources and optical flow to aggregate information over time.
Abstract: Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end, we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time. We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks: (a) multi-speaker sound source separation, (b) localizing and tracking speakers, (c) correcting misaligned audio-visual data, and (d) active speaker detection. Using our representation, these tasks can be solved entirely by training on unlabeled video, without the aid of object detectors. We also demonstrate the generality of our method by applying it to non-human speakers, including cartoons and puppets. Our model significantly outperforms other self-supervised approaches, and obtains performance competitive with methods that use supervised face detection.

87 citations
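The attention step, which compares an audio embedding against every spatial location of a visual feature map to localise a sound source, can be sketched as follows; the feature extractors, the optical-flow aggregation over time, and the training objective are all omitted, and the shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def audio_visual_attention(visual_feats, audio_emb):
    """Compare an audio embedding against every spatial location of a
    visual feature map to localise the sound source. Feature
    extraction, optical-flow aggregation over time, and training are
    omitted; shapes are assumptions for illustration.

    visual_feats: (B, D, H, W)  per-location visual embeddings
    audio_emb:    (B, D)        one embedding per audio clip
    """
    B, D, H, W = visual_feats.shape
    v = F.normalize(visual_feats.flatten(2), dim=1)    # (B, D, H*W)
    a = F.normalize(audio_emb, dim=1).unsqueeze(1)     # (B, 1, D)
    attn = torch.bmm(a, v).view(B, H, W)               # cosine similarity map
    flat = attn.view(B, -1).argmax(dim=1)
    ys = torch.div(flat, W, rounding_mode="floor")
    return attn, torch.stack((ys, flat % W), dim=1)    # map + peak (y, x)

if __name__ == "__main__":
    attn, peak = audio_visual_attention(torch.randn(2, 128, 14, 14),
                                        torch.randn(2, 128))
    print(attn.shape, peak)
```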


Authors

Showing all 4041 results

Name               H-index   Papers   Citations
Andrea Vedaldi     89        305      63305
Sunghun Kim        51        115      12994
Eric Gaussier      41        231      8203
Un Ju Jung         39        98       5696
Hyun-Soo Kim       37        421      5650
Gabriela Csurka    37        145      10959
Nojun Kwak         34        234      6026
Young-Jin Park     31        257      3759
Sung Joo Kim       31        196      3078
Jae-Hoon Kim       30        323      5847
Jung-Ryul Lee      29        222      3322
Joon Son Chung     28        73       4900
Ok-Hwan Lee        27        163      2896
Diane Larlus       27        69       4722
Jung Goo Lee       26        142      1917
Network Information
Related Institutions (5)

Kyungpook National University
42.1K papers, 834.6K citations, 80% related

Pusan National University
45K papers, 819.3K citations, 80% related

Korea University
82.4K papers, 1.8M citations, 80% related

Seoul National University
138.7K papers, 3.7M citations, 79% related

Chungnam National University
32.1K papers, 543.3K citations, 79% related

Performance Metrics
No. of papers from the Institution in previous years
Year   Papers
2022   6
2021   144
2020   174
2019   138
2018   82
2017   64