Home
/
Authors
/
Takayuki Okatani

Author

Takayuki Okatani

Bio: Takayuki Okatani is an academic researcher from Tohoku University. The author has contributed to research in topics: Image restoration & Convolutional neural network. The author has an hindex of 26, co-authored 167 publications receiving 2695 citations. Previous affiliations of Takayuki Okatani include University of Tokyo.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps With Accurate Object Boundaries

[...]

Junjie Hu, Mete Ozay, Yan Zhang, Takayuki Okatani

04 Mar 2019

TL;DR: Zhang et al. as mentioned in this paper proposed an improved network architecture consisting of four modules: an encoder, decoder, multi-scale feature fusion module, and refinement module, which achieved higher accuracy than the current state-of-the-art.

...read moreread less

Abstract: This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in the research of this problem. However, most existing methods suffer from loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One is about the strategy of fusing features extracted at different scales, for which we propose an improved network architecture consisting of four modules: an encoder, decoder, multi-scale feature fusion module, and refinement module. The other is about loss functions for measuring inference errors used in training. We show that three loss terms, which measure errors in depth, gradients and surface normals, respectively, contribute to improvement of accuracy in an complementary fashion. Experimental results show that these two improvements enable to attain higher accuracy than the current state-of-the-arts, which is given by finer resolution reconstruction, for example, with small objects and object boundaries.

...read moreread less

256 citations

Proceedings Article•DOI•

Improved Fusion of Visual and Language Representations by Dense Symmetric Co-attention for Visual Question Answering

[...]

Duy-Kien Nguyen, Takayuki Okatani

18 Jun 2018

TL;DR: In this article, an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boost accuracy of prediction of answers in visual question answering (VQA).

...read moreread less

Abstract: A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boost accuracy of prediction of answers. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends on image regions and each image region attends on question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through experiments that the proposed architecture achieves a new state-of-the-art on VQA and VQA 2.0 despite its small size. We also present qualitative evaluation, demonstrating how the proposed attention mechanism can generate reasonable attention maps on images and questions, which leads to the correct answer prediction.

...read moreread less

206 citations

Journal Article•DOI•

Shape Reconstruction from an Endoscope Image by Shape from Shading Technique for a Point Light Source at the Projection Center

[...]

Takayuki Okatani¹, Koichiro Deguchi¹•Institutions (1)

University of Tokyo¹

01 May 1997-Computer Vision and Image Understanding

TL;DR: Experimental results for real medical endoscope images of the stomach wall show the feasibility of the Kimmel?Bruckstein algorithm of shape from shading to the endoscope image and its promising availability for morphological analyses of tumors on human inner organs.

...read moreread less

195 citations

Posted Content•

Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

[...]

Duy-Kien Nguyen, Takayuki Okatani

03 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a simple architecture that is fully symmetric between visual and language representations, in which each question word attends on image regions and each image region attends on question words, and shows through experiments that the proposed architecture achieves a new state-of-the-art on V QA and VQA 2.0 despite its small size.

...read moreread less

186 citations

Proceedings Article•DOI•

Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration

[...]

Xing Liu¹, Masanori Suganuma¹, Zhun Sun, Takayuki Okatani¹•Institutions (1)

Tohoku University¹

15 Jun 2019

TL;DR: In this article, the authors propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up-and down-sampling or convolution with large-and small-size kernels.

...read moreread less

Abstract: In this paper, we study design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to which arbitrary paired operations are inserted. Adopting the "unraveled" view of the residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in a block interact with the second operation in any subsequent blocks. Specifying the two operations in each of the stacked blocks, we build a complete network for each individual task of image restoration. We experimentally evaluate the proposed approach on five image restoration tasks using nine datasets. The results show that the proposed networks with properly chosen paired operations outperform previous methods on almost all of the tasks and datasets.

...read moreread less

173 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Proceedings Article•

Robot vision

[...]

Y.J. Tejwani¹•Institutions (1)

Marquette University¹

01 Jan 1989

TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.

...read moreread less

Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented. >

...read moreread less

2,000 citations

Journal Article•DOI•

Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

[...]

Naveed Akhtar¹, Ajmal Mian¹•Institutions (1)

University of Western Australia¹

19 Feb 2018-IEEE Access

TL;DR: A comprehensive survey on adversarial attacks on deep learning in computer vision can be found in this paper, where the authors review the works that design adversarial attack, analyze the existence of such attacks and propose defenses against them.

...read moreread less

Abstract: Deep learning is at the heart of the current rise of artificial intelligence. In the field of computer vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas, deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has recently led to a large influx of contributions in this direction. This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in the real-world scenarios. Finally, drawing on the reviewed literature, we provide a broader outlook of this research direction.

...read moreread less

1,542 citations

Posted Content•

Neural Architecture Search: A Survey

[...]

Thomas Elsken¹, Jan Hendrik Metzen¹, Frank Hutter²•Institutions (2)

Bosch¹, University of Freiburg²

16 Aug 2018-arXiv: Machine Learning

TL;DR: An overview of existing work in this field of research is provided and neural architecture search methods are categorized according to three dimensions: search space, search strategy, and performance estimation strategy.

...read moreread less

Abstract: Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect for this progress are novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy.

...read moreread less

1,362 citations

Quantitative Analysis of Culture Using Millions of Digitized Books

[...]

Björn-Olav Dozo

17 Dec 2010

TL;DR: The authors survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, using a corpus of digitized texts containing about 4% of all books ever printed.

...read moreread less

Abstract: L'article, publie dans Science, sur une des premieres utilisations analytiques de Google Books, fondee sur les n-grammes (Google Ngrams) We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can ...

...read moreread less

735 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse