Author

Akihiro Sugimoto

Other affiliations: Kyoto University, Bosch, University College London ...

Bio: Akihiro Sugimoto is an academic researcher at the National Institute of Informatics. The author has contributed to research on the topics of image registration and image segmentation. The author has an h-index of 24 and has co-authored 174 publications receiving 1,870 citations. Previous affiliations of Akihiro Sugimoto include Kyoto University and Bosch.

Papers published on a yearly basis (chart; years shown: 1993-1994, 1996-2000, 2002-2023)

Papers

Proceedings Article

Fast unsupervised ego-action learning for first-person sports videos

Kris M. Kitani, Takahiro Okabe¹, Yoichi Sato¹, Akihiro Sugimoto²

University of Tokyo¹, National Institute of Informatics²

20 Jun 2011

TL;DR: This work addresses the novel task of discovering first-person action categories (which it calls ego-actions), which can be useful for tasks such as video indexing and retrieval, and investigates the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content.

Abstract: Portable high-quality sports cameras (e.g., head- or helmet-mounted) built for recording dynamic first-person video footage are becoming a common item among many sports enthusiasts. We address the novel task of discovering first-person action categories (which we call ego-actions) which can be useful for such tasks as video indexing and retrieval. In order to learn ego-action categories, we investigate the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content. Our approach assumes a completely unsupervised scenario, where labeled training videos are not available, videos are not pre-segmented and the number of ego-action categories is unknown. In our proposed framework we show that a stacked Dirichlet process mixture model can be used to automatically learn a motion histogram codebook and the set of ego-action categories. We quantitatively evaluate our approach on both in-house and public YouTube videos and demonstrate robust ego-action categorization across several sports genres. Comparative analysis shows that our approach outperforms other state-of-the-art topic models with respect to both classification accuracy and computational speed. Preliminary results indicate that on average, the categorical content of a 10-minute video sequence can be indexed in under 5 seconds.

266 citations

Journal Article

Anabranch network for camouflaged object segmentation

Trung-Nghia Le¹, Tam V. Nguyen², Zhongliang Nie², Minh-Triet Tran, Akihiro Sugimoto³

Graduate University for Advanced Studies¹, University of Dayton², National Institute of Informatics³

01 Jul 2019-Computer Vision and Image Understanding

TL;DR: This paper proposes a general end-to-end network, called the Anabranch Network, that leverages both classification and segmentation tasks and has a second branch for classification that predicts the probability that an image contains camouflaged object(s).

200 citations

Proceedings Article

Using individuality to track individuals: Clustering individual trajectories in crowds using local appearance and frequency trait

Daisuke Sugimura¹, Kris M. Kitani², Takahiro Okabe¹, Yoichi Sato¹, Akihiro Sugimoto³

University of Tokyo¹, University of Electro-Communications², National Institute of Informatics³

01 Sep 2009

TL;DR: The key novelty of the method is to make use of a person's individuality, that is, the gait features and the temporal consistency of local appearance to track each individual in a crowd.

Abstract: In this work, we propose a method for tracking individuals in crowds. Our method is based on a trajectory-based clustering approach that groups trajectories of image features that belong to the same person. The key novelty of our method is to make use of a person's individuality, that is, the gait features and the temporal consistency of local appearance to track each individual in a crowd. Gait features in the frequency domain have been shown to be an effective biometric cue in discriminating between individuals, and our method uses such features for tracking people in crowds for the first time. Unlike existing trajectory-based tracking methods, our method evaluates the dissimilarity of trajectories with respect to a group of three adjacent trajectories. In this way, we incorporate the temporal consistency of local patch appearance to differentiate trajectories of multiple people moving in close proximity. Our experiments show that the use of gait features and the temporal consistency of local appearance contributes to significant performance improvement in tracking people in crowded scenes.

73 citations

Journal Article

Video Salient Object Detection Using Spatiotemporal Deep Features

Trung-Nghia Le¹, Akihiro Sugimoto²

Graduate University for Advanced Studies¹, National Institute of Informatics²

22 Jun 2018-IEEE Transactions on Image Processing

TL;DR: The proposed method first segments an input video into multiple scales and then computes a saliency map at each scale level using spatiotemporal deep (STD) features with STCRF, a new spatiotemporal conditional random field.

Abstract: This paper presents a method for detecting salient objects in videos, where temporal information in addition to spatial information is fully taken into account. Following recent reports on the advantage of deep features over conventional handcrafted features, we propose a new set of spatiotemporal deep (STD) features that utilize local and global contexts over frames. We also propose a new spatiotemporal conditional random field (STCRF) to compute saliency from STD features. STCRF is our extension of CRF to the temporal domain and describes the relationships among neighboring regions both in a frame and over frames. STCRF leads to temporally consistent saliency maps over frames, contributing to accurate detection of salient objects’ boundaries and noise reduction during detection. Our proposed method first segments an input video into multiple scales and then computes a saliency map at each scale level using STD features with STCRF. The final saliency map is computed by fusing saliency maps at different scale levels. Our experiments, using publicly available benchmark datasets, confirm that the proposed method significantly outperforms the state-of-the-art methods. We also applied our saliency computation to the video object segmentation task, showing that our method outperforms existing video object segmentation methods.
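The final fusion step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the per-scale maps are fused, so the sketch below assumes a plain average of min-max normalized maps that have already been resized to a common shape; the names `normalize` and `fuse_saliency` are hypothetical and are not taken from the paper.

```python
from typing import List

Map = List[List[float]]  # a saliency map as rows of per-pixel scores


def normalize(m: Map) -> Map:
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse_saliency(maps: List[Map]) -> Map:
    """Fuse per-scale maps of identical shape by averaging (an assumed rule)."""
    if not maps:
        raise ValueError("need at least one saliency map")
    normed = [normalize(m) for m in maps]
    height, width = len(normed[0]), len(normed[0][0])
    return [
        [sum(m[y][x] for m in normed) / len(normed) for x in range(width)]
        for y in range(height)
    ]


# Two toy 2x2 maps standing in for saliency computed at two scale levels.
coarse = [[0.0, 2.0], [4.0, 8.0]]
fine = [[0.0, 1.0], [1.0, 1.0]]
fused = fuse_saliency([coarse, fine])
```

Averaging is only one possible choice; a weighted sum or a learned fusion would fit the same interface.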

66 citations

Proceedings Article

Saliency-based image editing for guiding visual attention

Aiko Hagiwara¹, Akihiro Sugimoto², Kazuhiko Kawamoto¹

Chiba University¹, National Institute of Informatics²

18 Sep 2011

TL;DR: This paper proposes a method that, given a region in an image, edits the image so that the region becomes the most salient; experimental results confirm that the method naturally draws human visual attention toward the specified region.

Abstract: The most important part of an information system that assists human activities is a natural interface with human beings. Gaze information strongly reflects the human interest or their attention, and thus, a gaze-based interface is promising for future usage. In particular, if we can smoothly guide the user's visual attention toward a target without interrupting their current visual attention, the usefulness of the gaze-based interface will be highly enhanced. To realize such an interface, this paper proposes a method for editing an image, when given a region in the image, to synthesize the image in which the region is most salient. Our method first computes a saliency map of a given image and then iteratively adjusts the intensity and color until the saliency inside the region becomes the highest for the entire image. Experimental results confirm that our image editing method naturally draws the human visual attention toward our specified region.
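The iterative loop described in the abstract (compute a saliency map, adjust the image, repeat until the target region is the most salient) can be sketched as follows. This is a toy illustration under strong assumptions, not the authors' method: saliency is replaced by a hypothetical stand-in (absolute deviation from the global mean intensity), the image is grayscale, and the only adjustment is brightening the region by a fixed step, whereas the paper adjusts both intensity and color.

```python
from typing import List, Set, Tuple

Image = List[List[float]]        # grayscale intensities in [0, 255]
Region = Set[Tuple[int, int]]    # (row, column) coordinates of the target


def toy_saliency(img: Image) -> Image:
    """Stand-in saliency: absolute deviation of each pixel from the global mean."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    return [[abs(v - mean) for v in row] for row in img]


def region_is_most_salient(img: Image, region: Region) -> bool:
    """True when every region pixel is more salient than every other pixel."""
    if not region:
        raise ValueError("region must not be empty")
    sal = toy_saliency(img)
    inside = [sal[y][x] for (y, x) in region]
    outside = [
        sal[y][x]
        for y in range(len(img))
        for x in range(len(img[0]))
        if (y, x) not in region
    ]
    return not outside or min(inside) > max(outside)


def edit_until_salient(img: Image, region: Region,
                       step: float = 10.0, max_iters: int = 100) -> Image:
    """Brighten the region step by step until it is the most salient area."""
    out = [row[:] for row in img]  # work on a copy; the input is left untouched
    for _ in range(max_iters):
        if region_is_most_salient(out, region):
            break
        for (y, x) in region:
            out[y][x] = min(255.0, out[y][x] + step)
    return out


image = [[100.0, 120.0], [110.0, 200.0]]
target = {(0, 0)}
edited = edit_until_salient(image, target)
```

The `max_iters` cap is there because, with this toy saliency and a brighten-only rule, the loop is not guaranteed to reach the stopping condition for every image.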

66 citations

Cited by

Journal Article

Early results of bypass surgery using only bilateral internal thoracic arteries with Y-anastomosis in patients with multivessel coronary artery disease

성기익, 이영탁, 박계현, 전태국, 박표원, 한일용, 장윤희

01 Mar 2003-The Korean Journal of Thoracic and Cardiovascular Surgery

28,685 citations

[Constellation of New Books x] Us/Art, and the 'Museum of Sorrow'

이화영

01 Jan 2015

12,972 citations

Journal Article

Content-based image retrieval at the end of the early years

Arnold W. M. Smeulders¹, Marcel Worring¹, Simone Santini², Amarnath Gupta², Ramesh Jain

University of Amsterdam¹, University of California, San Diego²

01 Dec 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.

Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations

Computer vision: a modern approach

David Forsyth, Jean Ponce

01 Jan 2004

TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.

Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

Proceedings Article

Social LSTM: Human Trajectory Prediction in Crowded Spaces

Alexandre Alahi¹, Kratarth Goel¹, Vignesh Ramanathan¹, Alexandre Robicquet¹, Li Fei-Fei¹, Silvio Savarese¹

Stanford University¹

27 Jun 2016

TL;DR: This work proposes an LSTM model which can learn general human movement and predict their future trajectories and outperforms state-of-the-art methods on some of these datasets.

Abstract: Pedestrians follow different trajectories to avoid obstacles and accommodate fellow pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and accordingly adjust its path to avoid collisions. This problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions. Following the recent success of Recurrent Neural Network (RNN) models for sequence prediction tasks, we propose an LSTM model which can learn general human movement and predict their future trajectories. This is in contrast to traditional approaches which use hand-crafted functions such as Social forces. We demonstrate the performance of our method on several public datasets. Our model outperforms state-of-the-art methods on some of these datasets. We also analyze the trajectories predicted by our model to demonstrate the motion behaviour learned by our model.

2,587 citations
