Author

Myunggi Lee

Bio: Myunggi Lee is an academic researcher from Seoul National University. The author has contributed to research in topics: Deep learning & Software system. The author has an h-index of 5 and has co-authored 11 publications receiving 181 citations.

Papers
Book ChapterDOI
Myunggi Lee, Seungeui Lee, Sung Joon Son, Gyutae Park, Nojun Kwak
08 Sep 2018
TL;DR: MFNet as mentioned in this paper uses motion blocks to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end with only a small additional cost.
Abstract: Spatio-temporal representations in frame sequences play an important role in the task of action recognition. Previously, a method that uses optical flow as temporal information in combination with a set of RGB images containing spatial information has shown great performance enhancement in action recognition tasks. However, it has a high computational cost and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network), which contains motion blocks that make it possible to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end. The motion block can be attached to any existing CNN-based action recognition framework with only a small additional cost. We evaluate our network on two action recognition datasets (Jester and Something-Something) and achieve competitive performance on both datasets by training the networks from scratch.

86 citations
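A minimal Python/PyTorch sketch of the general idea described in the abstract above: encoding temporal differences between adjacent-frame features inside a 2D-CNN backbone. This is an illustration under assumptions (module name, channel reduction, zero-padding of the last frame, residual fusion), not the authors' motion block, whose exact filter design is given in the paper.

import torch
import torch.nn as nn

class MotionBlockSketch(nn.Module):
    # Encodes differences between adjacent-frame feature maps and fuses them
    # back into the backbone through a residual connection (illustrative only).
    def __init__(self, channels, reduced_channels=None):
        super().__init__()
        reduced_channels = reduced_channels or channels // 4
        self.reduce = nn.Conv2d(channels, reduced_channels, kernel_size=1, bias=False)
        self.expand = nn.Conv2d(reduced_channels, channels, kernel_size=1, bias=False)

    def forward(self, x, num_frames):
        # x: (batch * num_frames, C, H, W), frames of each clip stacked along the batch axis
        bt, c, h, w = x.shape
        b = bt // num_frames
        feat = self.reduce(x).view(b, num_frames, -1, h, w)
        # temporal difference between neighboring frames; the last frame is zero-padded
        diff = feat[:, 1:] - feat[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        motion = self.expand(diff.reshape(bt, -1, h, w))
        return x + motion

# usage: block = MotionBlockSketch(256); y = block(features, num_frames=8)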

Posted Content
Myunggi Lee, Seungeui Lee, Sung Joon Son, Gyutae Park, Nojun Kwak
TL;DR: This paper proposes MFNet (Motion Feature Network) containing motion blocks which make it possible to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end.
Abstract: Spatio-temporal representations in frame sequences play an important role in the task of action recognition. Previously, a method that uses optical flow as temporal information in combination with a set of RGB images containing spatial information has shown great performance enhancement in action recognition tasks. However, it has a high computational cost and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network), which contains motion blocks that make it possible to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end. The motion block can be attached to any existing CNN-based action recognition framework with only a small additional cost. We evaluate our network on two action recognition datasets (Jester and Something-Something) and achieve competitive performance on both datasets by training the networks from scratch.

77 citations

Proceedings ArticleDOI
28 Dec 2015
TL;DR: The controllers for THORMANG were developed with stability as the first priority, because a humanoid rescue robot must perform complex tasks in unexpected environments.
Abstract: This paper presents the technical approaches, including the system architecture and the controllers, used by Team SNU at the DARPA Robotics Challenge (DRC) Finals 2015. The platform we used, THORMANG, is a modular humanoid robot developed by ROBOTIS. On top of this platform, Team SNU developed an iris camera module and an end effector with a passive palm in order to increase the success rate of the tasks at the DRC Finals. We also developed a software architecture to operate the robot intuitively in spite of degraded communication. The interface enables the operator to select the sensor data to be communicated during each task. These efforts on the hardware and software reduce the operation time of the tasks and increase the reliability of the robot. Finally, the controllers for THORMANG were developed with stability as the first priority, because a humanoid rescue robot must perform complex tasks in unexpected environments. The proposed approaches were verified at the DRC Finals 2015, where Team SNU ranked 12th out of 23 teams.

25 citations

Journal ArticleDOI
TL;DR: This paper presents the technical approaches used and experimental results obtained by Team SNU (Seoul National University) at the 2015 DARPA Robotics Challenge (DRC) Finals and a number of lessons learned by analyzing the 2015 DRC Finals.
Abstract: This paper presents the technical approaches used and the experimental results obtained by Team SNU (Seoul National University) at the 2015 DARPA Robotics Challenge (DRC) Finals. Team SNU is one of the newly qualified teams, unlike the 12 teams that previously participated in the December 2013 DRC Trials. The hardware platform we used, THORMANG, was developed by ROBOTIS and was one of the smallest robots at the DRC Finals. Based on this platform, we focused on developing the software architecture and controllers needed to perform complex tasks in disaster response situations and on modifying hardware modules to maximize manipulability. Stability and modularization are the two main keywords in the technical approach of the architecture. We designed our interface and controllers to achieve a higher level of robustness against disaster situations. Moreover, we concentrated on developing our software architecture by integrating a number of modules to reduce software system complexity and programming errors. With these efforts on the hardware and software, we successfully finished the competition without falling and ranked 12th out of 23 teams. The paper concludes with a number of lessons learned from analyzing the 2015 DRC Finals.

18 citations

Posted Content
TL;DR: A novel UV map generative model that learns to generate diverse and realistic synthetic UV maps without requiring high-quality UV maps for training is presented.
Abstract: Reconstructing 3D human faces in the wild with the 3D Morphable Model (3DMM) has become popular in recent years. While most prior work focuses on estimating more robust and accurate geometry, relatively little attention has been paid to improving the quality of the texture model. Meanwhile, with the advent of Generative Adversarial Networks (GANs), there has been great progress in reconstructing realistic 2D images. Recent work demonstrates that GANs trained with abundant high-quality UV maps can produce high-fidelity textures superior to those produced by existing methods. However, such high-quality UV maps are difficult to obtain because they are expensive to acquire and require laborious refinement. In this work, we present a novel UV map generative model that learns to generate diverse and realistic synthetic UV maps without requiring high-quality UV maps for training. Our proposed framework can be trained solely with in-the-wild images (i.e., UV maps are not required) by leveraging a combination of GANs and a differentiable renderer. Both quantitative and qualitative evaluations demonstrate that our proposed texture model produces more diverse and higher-fidelity textures compared to existing methods.

8 citations
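A schematic Python/PyTorch training step for the kind of framework described above, assuming a standard non-saturating GAN loss: a generator produces a UV texture, a differentiable renderer maps it onto 3DMM geometry estimated from in-the-wild images, and a discriminator compares renderings with the real photos. All module and function names (generator.latent_dim, fit_3dmm, renderer) are hypothetical placeholders, not the authors' implementation.

import torch
import torch.nn.functional as F

def training_step(generator, discriminator, renderer, fit_3dmm, real_images, opt_g, opt_d):
    # generator: latent code -> UV texture map; renderer: differentiable rendering of the
    # texture onto 3DMM geometry; fit_3dmm: estimates geometry and camera from the images.
    z = torch.randn(real_images.size(0), generator.latent_dim)  # latent_dim is an assumed attribute
    uv_texture = generator(z)
    geometry, camera = fit_3dmm(real_images)
    fake_images = renderer(geometry, uv_texture, camera)

    # discriminator update: real in-the-wild photos vs. rendered faces (non-saturating GAN loss)
    d_loss = (F.softplus(-discriminator(real_images)).mean()
              + F.softplus(discriminator(fake_images.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator update: gradients flow through the differentiable renderer into the UV generator
    g_loss = F.softplus(-discriminator(fake_images)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()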


Cited by

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Temporal Shift Module (TSM) as mentioned in this paper shifts part of the channels along the temporal dimension to facilitate information exchange among neighboring frames; it can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters.
Abstract: The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNNs while maintaining 2D CNNs' complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication, and on Jetson Nano and Galaxy Note8 it achieves low latencies of 13 ms and 35 ms, respectively, for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.

892 citations
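A minimal sketch of the temporal shift operation described above: one group of channels is shifted toward earlier frames, another toward later frames, and the rest left in place, adding no parameters and negligible computation. The fold size of 1/8 and the (batch x frames, C, H, W) layout are common choices assumed here, not necessarily the exact settings of the released code.

import torch

def temporal_shift(x, num_frames, fold_div=8):
    # x: (batch * num_frames, C, H, W), frames of each clip stacked along the batch axis
    bt, c, h, w = x.shape
    b = bt // num_frames
    x = x.view(b, num_frames, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift one channel group toward earlier frames
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift another group toward later frames
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # keep the remaining channels in place
    return out.view(bt, c, h, w)

# usage: apply inside a residual block of a 2D CNN before the convolution,
# e.g. y = conv(temporal_shift(features, num_frames=8))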

Posted Content
TL;DR: A generic and effective Temporal Shift Module (TSM) that achieves the performance of 3D CNNs while maintaining 2D CNNs' complexity, and is extended to the online setting, enabling real-time low-latency online video recognition and video object detection.
Abstract: The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNNs while maintaining 2D CNNs' complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication, and on Jetson Nano and Galaxy Note8 it achieves low latencies of 13 ms and 35 ms, respectively, for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.

721 citations

Proceedings ArticleDOI
04 Apr 2019
TL;DR: It is empirically demonstrated that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks, and this leads to an architecture -- Channel-Separated Convolutional Network (CSN) -- which is simple, efficient, yet accurate.
Abstract: Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks. This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks. Our experiments suggest two main findings. First, it is a good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture -- Channel-Separated Convolutional Network (CSN) -- which is simple, efficient, yet accurate. On Sports1M and Kinetics, our CSNs are comparable with or better than the state-of-the-art while being 2-3 times more efficient.

505 citations
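A short sketch of the factorization the abstract describes: channel interactions handled by a pointwise 1x1x1 convolution and spatiotemporal interactions by a depthwise 3x3x3 convolution with one group per channel. Layer sizes and the exact block composition are illustrative assumptions, not the paper's full CSN architecture.

import torch.nn as nn

def channel_separated_conv3d(in_channels, out_channels):
    return nn.Sequential(
        # pointwise 1x1x1 convolution: all channel interactions, no spatiotemporal extent
        nn.Conv3d(in_channels, out_channels, kernel_size=1, bias=False),
        # depthwise 3x3x3 convolution: spatiotemporal filtering, no channel mixing
        nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1,
                  groups=out_channels, bias=False),
    )

# usage: layer = channel_separated_conv3d(64, 128); y = layer(clip)  # clip: (N, 64, T, H, W)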

Proceedings ArticleDOI
Christoph Feichtenhofer
14 Jun 2020
TL;DR: This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth, finding that networks with high spatiotemporal resolution can perform well, while being extremely light in terms of network width and parameters.
Abstract: This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that a good accuracy-to-complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x and 5.5x fewer multiply-adds and parameters for similar accuracy to previous work. Our most surprising finding is that networks with high spatiotemporal resolution can perform well while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code is available at: https://github.com/facebookresearch/SlowFast.

392 citations
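A schematic sketch of the stepwise expansion strategy described above: at each step one candidate axis (frames, resolution, width, depth, ...) is expanded, the candidate with the best accuracy gain per unit of added compute is kept, and the loop repeats until the target complexity is reached. The selection criterion and the helper functions (train_and_eval, expand, flops) are hypothetical simplifications, not the paper's exact procedure.

def progressive_expand(base_config, axes, target_flops, train_and_eval, expand, flops):
    # Greedy forward expansion: enlarge exactly one axis per step and keep the
    # candidate that improves accuracy the most per unit of additional compute.
    config = dict(base_config)
    base_acc = train_and_eval(config)
    while flops(config) < target_flops:
        best = None
        for axis in axes:
            candidate = expand(config, axis)   # e.g. more frames, higher resolution, wider, deeper
            acc = train_and_eval(candidate)    # small proxy training run to score the candidate
            gain = (acc - base_acc) / max(flops(candidate) - flops(config), 1e-9)
            if best is None or gain > best[0]:
                best = (gain, acc, candidate)
        _, base_acc, config = best
    return config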