Author

Mingyu Kim

Bio: Mingyu Kim is an academic researcher from Stanford University. The author has contributed to research in topics: Virtual reality & Human–computer interaction. The author has an h-index of 1 and has co-authored 1 publication receiving 2,517 citations.

Papers
Proceedings Article
28 Jun 2011
TL;DR: This work presents a series of tasks for multimodal learning and shows how to train deep networks that learn features to address these tasks, and demonstrates cross-modality feature learning, where better features for one modality can be learned if multiple modalities are present at feature-learning time.
Abstract: Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data and vice-versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating best published visual speech classification on AVLetters and effective shared representation learning.
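The shared-representation idea in this abstract can be sketched as a small bimodal network. The sketch below is only illustrative: the paper itself builds on RBM pretraining and a bimodal deep autoencoder, and every layer size, feature dimension, and name here is an assumption.

```python
import torch
import torch.nn as nn

class BimodalNet(nn.Module):
    """Illustrative shared-representation model (all dimensions are assumed)."""

    def __init__(self, audio_dim=100, video_dim=300, hidden=256, shared=128):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, shared), nn.ReLU())
        self.audio_dec = nn.Linear(shared, audio_dim)
        self.video_dec = nn.Linear(shared, video_dim)

    def forward(self, audio, video):
        # fuse both encodings into a single shared code z
        z = self.fuse(torch.cat([self.audio_enc(audio), self.video_enc(video)], dim=-1))
        return self.audio_dec(z), self.video_dec(z), z

# Cross-modality training idea from the abstract: occasionally zero out one
# modality's input but still require both reconstructions, so the shared code z
# stays useful when only audio (or only video) is available at test time.
model = BimodalNet()
audio, video = torch.randn(8, 100), torch.randn(8, 300)
a_hat, v_hat, z = model(audio, torch.zeros_like(video))
loss = nn.functional.mse_loss(a_hat, audio) + nn.functional.mse_loss(v_hat, video)
```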

2,830 citations

Journal ArticleDOI
TL;DR: An interface optimized for an asymmetric virtual environment, in which virtual reality (VR) and augmented reality (AR) users participate together, is designed by adopting deep learning, and metaverse content for an experience environment and a survey experiment is created.
Abstract: In this study, we design an interface optimized for the platform by adopting deep learning in an asymmetric virtual environment where virtual reality (VR) and augmented reality (AR) users participate together. We also propose a novel experience environment called deep learning-based asymmetric virtual environment (DAVE) for immersive experiential metaverse content. First, VR users use their real hands to intuitively interact with the virtual environment and objects. A gesture interface is designed based on deep learning to directly link gestures to actions. AR users interact with virtual scenes, objects, and VR users via a touch-based input method in a mobile platform environment. A text interface is designed using deep learning to directly link handwritten text to actions. This study aims to propose a novel asymmetric virtual environment via an intuitive, easy, and fast interactive interface design as well as to create metaverse content for an experience environment and a survey experiment. This survey experiment is conducted with users to statistically analyze and investigate user interface satisfaction, user experience, and user presence in the experience environment.
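The gesture interface described above "directly links gestures to actions"; a minimal sketch of that pattern follows. The gesture vocabulary, action strings, and network shape are all hypothetical, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical gesture vocabulary and action mapping; the paper does not
# publish its actual label set or architecture in this abstract.
GESTURES = ["grab", "release", "point", "swipe"]
ACTIONS = {
    "grab": "attach the nearest virtual object to the hand",
    "release": "drop the held object",
    "point": "highlight the targeted object",
    "swipe": "advance the scene",
}

class GestureClassifier(nn.Module):
    def __init__(self, n_joints=21, n_frames=30, n_classes=len(GESTURES)):
        super().__init__()
        # classify a short window of tracked 3-D hand-joint positions
        self.net = nn.Sequential(
            nn.Linear(n_joints * 3 * n_frames, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, joints):  # joints: (batch, frames, joints, 3)
        return self.net(joints.flatten(1))

model = GestureClassifier()
window = torch.randn(1, 30, 21, 3)            # one tracked-hand window
gesture = GESTURES[int(model(window).argmax(dim=-1))]
print(gesture, "->", ACTIONS[gesture])        # gesture directly linked to action
```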

7 citations

Journal ArticleDOI
TL;DR: In this paper, the authors developed virtual reality-visual exploration therapy (VR-VET), combining elements from the FOPR test and visual exploration therapy, and examined its efficacy for hemispatial neglect (HSN) rehabilitation following stroke.
Abstract: Background Hemispatial neglect (HSN) was diagnosed using a virtual reality-based test (FOPR test) that explores the field of perception (FOP) and field of regard (FOR). Here, we developed virtual reality-visual exploration therapy (VR-VET) combining elements from the FOPR test and visual exploration therapy (VET) and examined its efficacy for HSN rehabilitation following stroke. Methods Eleven participants were randomly assigned to different groups, training with VR-VET first then waiting without VR-VET training (TW), or vice versa (WT). The TW group completed 20 sessions of a VR-VET program using a head-mounted display followed by 4 weeks of waiting, while the WT group completed the opposite regimen. Clinical HSN measurements [line bisection test (LBT), star cancellation test (SCT), Catherine Bergego Scale (CBS), CBS perceptual-attentional (CBS-PA), and CBS motor-explanatory (CBS-ME)] and FOPR tests [response time (RT), success rate (SR), and head movement (HM) for both FOP and FOR] were assessed by blinded face-to-face assessments. Results Five and six participants were allocated to the TW and WT groups, respectively, and no dropout occurred throughout the study. VR-VET considerably improved LBT scores, FOR variables (FOR-RT, FOR-SR), FOP-LEFT variables (FOP-LEFT-RT, FOP-LEFT-SR), and FOR-LEFT variables (FOR-LEFT-RT, FOR-LEFT-SR) compared to waiting without VR-VET. Additionally, VR-VET extensively improved FOP-SR, CBS, and CBS-PA, where waiting failed to make a significant change. The VR-VET made more improvements in the left hemispace than in the right hemispace in FOP-RT, FOP-SR, FOR-RT, and FOR-SR. Conclusion The observed improvements in clinical assessments and FOPR tests represent the translatability of these improvements to real-world function and the multi-dimensional effects of VR-VET training. Clinical trial registration https://clinicaltrials.gov/ct2/show/NCT03463122, identifier NCT03463122.
Journal ArticleDOI
TL;DR: In this article, the authors interviewed special education teachers who had applied augmented reality as an edtech tool for students with developmental disabilities, examining its effectiveness, the prerequisites for its use, ways to improve its utility, and its compatibility with Universal Design for Learning (UDL) principles.
Abstract: [Purpose] This study explores and presents the potential of augmented reality (AR) as an edtech tool in special education by examining its effectiveness when applied to the education of students with developmental disabilities, the prerequisites for applying AR, ways to improve its utility, and its compatibility with the principles of Universal Design for Learning (UDL). [Method] In-depth telephone interviews were conducted with eight special education (class) teachers who had experience applying AR. [Results] First, in teaching and learning for students with developmental disabilities, AR enabled self-directed learning, sustained attention, and improved cooperative activities. It also supported transfer of learning and was convenient for immediate feedback and assessment. When linked with individual subjects, it aided students' learning and could serve as a simulation for field trips, and in language education it was seen as effective for improving vocabulary and sentence-expression skills. Second, as prerequisites, both teacher and student competence in using AR are required, and an environment in which AR can be realized must be established. Third, when examining how AR could be applied according to UDL principles, the special education teachers perceived it as applicable to each principle. To improve its applicability, teacher training and opportunities for teachers to use AR directly are needed; to improve accessibility in the field, AR coding that can be linked to physical objects, an online repository of produced AR codes, wider distribution of wearable smart devices, and AR development centered on curriculum content were seen as necessary. [Conclusion] Special education teachers perceived AR as a useful edtech tool. However, applying it requires teacher and student competence, and measures to improve AR's applicability were also suggested. AR was perceived as applicable in connection with UDL principles.

Cited by
Posted Content
TL;DR: A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural networks to the domain adaptation scenario; DAN can learn transferable features with statistical guarantees and scales linearly via an unbiased estimate of the kernel embedding.
Abstract: Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural networks to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multi-kernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees and scales linearly via an unbiased estimate of the kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.
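The quantity DAN matches across domains is the maximum mean discrepancy (MMD) between feature distributions under a family of kernels. Below is a minimal multi-kernel MMD sketch; for readability it uses the quadratic-time biased estimator with an assumed fixed bandwidth family, whereas the paper uses a linear-time unbiased estimate and learned kernel weights.

```python
import torch

def multi_kernel_mmd(source, target, gammas=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Biased MMD^2 estimate between two feature batches under a sum of RBF kernels."""
    def k(x, y):
        d2 = torch.cdist(x, y) ** 2             # pairwise squared distances
        return sum(torch.exp(-g * d2) for g in gammas)
    return k(source, source).mean() + k(target, target).mean() - 2 * k(source, target).mean()

# Added to the task loss for each adaptation layer, this penalty pulls the mean
# embeddings of the source and target domains together in the RKHS.
src_feats, tgt_feats = torch.randn(32, 128), torch.randn(32, 128)
mmd_penalty = multi_kernel_mmd(src_feats, tgt_feats)
```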

3,351 citations

Journal ArticleDOI
TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.
Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.
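As a concrete point of reference for the detection frameworks this review surveys, a pretrained two-stage detector can be run in a few lines. The sketch below assumes torchvision's Faster R-CNN and a dummy image; the 0.5 score threshold is an arbitrary choice, not one taken from the paper.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained weights
model.eval()

image = torch.rand(3, 480, 640)                     # dummy RGB image in [0, 1]
with torch.no_grad():
    pred = model([image])[0]                        # dict with boxes/labels/scores

keep = pred["scores"] > 0.5                         # arbitrary confidence cutoff
print(pred["boxes"][keep], pred["labels"][keep])
```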

3,097 citations

Journal ArticleDOI
TL;DR: This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning; the solutions surveyed are independent of data size and can be applied to big data environments.
Abstract: Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
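A common concrete instance of the transfer this survey describes is feature reuse with a retrained head. The sketch below assumes torchvision's ResNet-18 and an arbitrary 10-class target task; it is one recipe among the many solutions the survey covers, not the paper's own method.

```python
import torch.nn as nn
from torchvision import models

# Reuse features learned on ImageNet (the easily obtained source domain):
# freeze the transferred backbone, then retrain only a new task head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze transferred features
model.fc = nn.Linear(model.fc.in_features, 10)   # new head: 10 target classes (assumed)

# During fine-tuning, only model.fc's parameters are passed to the optimizer.
```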

2,900 citations

Book
Li Deng1, Dong Yu1
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations

Journal ArticleDOI

2,404 citations