SciSpace (formerly Typeset)
Author

Aditya Kanade

Bio: Aditya Kanade is an academic researcher from the Indian Institute of Science. The author has contributed to research in the topics of computer science and concurrency, has an h-index of 18, and has co-authored 45 publications receiving 1,254 citations. Previous affiliations of Aditya Kanade include Google and the University of Pennsylvania.


Papers
Proceedings Article
12 Feb 2017
TL;DR: DeepFix is a multi-layered sequence-to-sequence neural network with attention, trained to predict erroneous program locations together with the required correct statements; it fixed 1881 programs completely and 1338 programs partially.
Abstract: The problem of automatically fixing programming errors is a very active research topic in software engineering. This is a challenging problem as fixing even a single error may require analysis of the entire program. In practice, a number of errors arise due to a programmer's inexperience with the programming language or lack of attention to detail. We call these common programming errors. These are analogous to grammatical errors in natural languages. Compilers detect such errors, but their error messages are usually inaccurate. In this work, we present an end-to-end solution, called DeepFix, that can fix multiple such errors in a program without relying on any external tool to locate or fix them. At the heart of DeepFix is a multi-layered sequence-to-sequence neural network with attention which is trained to predict erroneous program locations along with the required correct statements. On a set of 6971 erroneous C programs written by students for 93 programming tasks, DeepFix could fix 1881 (27%) programs completely and 1338 (19%) programs partially.
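To make the architecture concrete, here is a minimal sketch of an attention-based sequence-to-sequence repair model of the kind the abstract describes; the layer sizes, the target format (a line identifier followed by the corrected statement), and the training snippet are illustrative assumptions, not DeepFix's actual implementation.

```python
# Minimal sketch of an attention-based sequence-to-sequence repair model
# (illustrative only; not the DeepFix implementation or its hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqFixer(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.attn_out = nn.Linear(2 * hidden_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the tokenized buggy program.
        enc_states, enc_final = self.encoder(self.embed(src_tokens))
        # Decode the fix, e.g. "<line-id> corrected statement tokens ...".
        dec_states, _ = self.decoder(self.embed(tgt_tokens), enc_final)
        # Dot-product attention over the encoder states.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))
        context = torch.bmm(F.softmax(scores, dim=-1), enc_states)
        combined = torch.tanh(self.attn_out(torch.cat([dec_states, context], dim=-1)))
        return self.proj(combined)  # logits over the output vocabulary

# Toy usage: batch of 2 programs, 50 source tokens, 10 target tokens.
model = Seq2SeqFixer(vocab_size=1000)
src = torch.randint(0, 1000, (2, 50))
tgt = torch.randint(0, 1000, (2, 10))
logits = model(src, tgt)               # shape: (2, 10, 1000)
loss = F.cross_entropy(logits.reshape(-1, 1000), tgt.reshape(-1))
```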

415 citations

Posted Content
TL;DR: This paper curates a massive, deduplicated corpus of 7.4M Python files from GitHub to pre-train CuBERT, an open-sourced code-understanding BERT model, and creates an open-sourced benchmark comprising five classification tasks and one program-repair task, akin to code-understanding tasks proposed previously in the literature; fine-tuned CuBERT outperforms all baseline models, even with shorter training and fewer labeled examples.
Abstract: Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come with the development of pre-trained contextual embeddings, such as BERT, which can be fine-tuned for downstream tasks with less labeled data and training budget, while achieving better accuracies. However, there is no attempt yet to obtain a high-quality contextual embedding of source code, and to evaluate it on multiple program-understanding tasks simultaneously; that is the gap that this paper aims to mitigate. Specifically, first, we curate a massive, deduplicated corpus of 7.4M Python files from GitHub, which we use to pre-train CuBERT, an open-sourced code-understanding BERT model; and, second, we create an open-sourced benchmark that comprises five classification tasks and one program-repair task, akin to code-understanding tasks proposed in the literature before. We fine-tune CuBERT on our benchmark tasks, and compare the resulting models to different variants of Word2Vec token embeddings, BiLSTM and Transformer models, as well as published state-of-the-art models, showing that CuBERT outperforms them all, even with shorter training, and with fewer labeled examples. Future work on source-code embedding can benefit from reusing our benchmark, and from comparing against CuBERT models as a strong baseline.
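As a rough illustration of the fine-tuning setup the abstract describes, the sketch below attaches a classification head to a pre-trained BERT-style encoder and trains it on a labeled code-classification task; the encoder construction, sizes, and learning rate are placeholders, not the released CuBERT model or benchmark.

```python
# Sketch of fine-tuning a pre-trained code encoder for a benchmark-style
# classification task (e.g. buggy vs. not buggy). All names and sizes here
# are placeholders, not the released CuBERT artifacts.
import torch
import torch.nn as nn

class CodeClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_labels: int):
        super().__init__()
        self.encoder = encoder          # pre-trained BERT-style encoder
        self.head = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids):
        states = self.encoder(token_ids)     # (batch, seq_len, hidden_dim)
        cls_state = states[:, 0, :]          # summary vector at the [CLS] position
        return self.head(cls_state)          # logits over task labels

# Placeholder "pre-trained" encoder: embeddings + Transformer layers.
vocab_size, hidden_dim = 50000, 512
encoder = nn.Sequential(
    nn.Embedding(vocab_size, hidden_dim),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=8, batch_first=True),
        num_layers=4,
    ),
)
model = CodeClassifier(encoder, hidden_dim, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # typical fine-tuning LR

tokens = torch.randint(0, vocab_size, (8, 128))   # a batch of tokenized functions
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
optimizer.step()
```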

145 citations

Proceedings ArticleDOI
09 Jun 2014
TL;DR: This work defines the happens-before relation for Android applications and develops a dynamic race detection technique based on it; the results indicate that data races are prevalent in Android applications and that DroidRacer is an effective tool for identifying them.
Abstract: Programming environments for smartphones expose a concurrency model that combines multi-threading and asynchronous event-based dispatch. While this enables the development of efficient and feature-rich applications, unforeseen thread interleavings coupled with non-deterministic reorderings of asynchronous tasks can lead to subtle concurrency errors in the applications. In this paper, we formalize the concurrency semantics of the Android programming model. We further define the happens-before relation for Android applications, and develop a dynamic race detection technique based on this relation. Our relation generalizes the so far independently studied happens-before relations for multi-threaded programs and single-threaded event-driven programs. Additionally, our race detection technique uses a model of the Android runtime environment to reduce false positives. We have implemented a tool called DroidRacer. It generates execution traces by systematically testing Android applications and detects data races by computing the happens-before relation on the traces. We analyzed 15 Android applications including popular applications such as Facebook, Twitter and K-9 Mail. Our results indicate that data races are prevalent in Android applications, and that DroidRacer is an effective tool to identify data races.
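The following toy detector illustrates the general happens-before idea on a trace that mixes program order with asynchronous task posting; it is a simplified sketch with an invented trace format, not DroidRacer's algorithm or its model of the Android runtime.

```python
# Simplified happens-before race detector over a trace of an event-driven,
# multi-threaded execution. Illustrative vector-clock sketch only.
from collections import defaultdict

class VectorClock(dict):
    def increment(self, tid):
        self[tid] = self.get(tid, 0) + 1
    def join(self, other):
        for tid, c in other.items():
            self[tid] = max(self.get(tid, 0), c)
    def happens_before(self, other):
        return all(c <= other.get(tid, 0) for tid, c in self.items())

def detect_races(trace):
    """trace: list of (op, task, arg) with op in {'post', 'read', 'write'}.
    'post' means `task` enqueues asynchronous task `arg`; reads and writes
    name a shared variable `arg`."""
    clocks = defaultdict(VectorClock)          # per-task vector clocks
    accesses = defaultdict(list)               # var -> [(op, task, clock snapshot)]
    races = []
    for op, task, arg in trace:
        clocks[task].increment(task)
        if op == 'post':
            # Everything the poster did so far happens-before the posted task.
            clocks[arg].join(clocks[task])
        else:
            snap = VectorClock(clocks[task])
            for prev_op, prev_task, prev_snap in accesses[arg]:
                conflicting = 'write' in (op, prev_op) and prev_task != task
                unordered = not prev_snap.happens_before(snap)
                if conflicting and unordered:
                    races.append((arg, (prev_op, prev_task), (op, task)))
            accesses[arg].append((op, task, snap))
    return races

# Two tasks write the same field with no post edge between them: a race.
trace = [('write', 'uiTask', 'count'), ('post', 'uiTask', 'bgTask'),
         ('write', 'bgTask', 'count'), ('write', 'otherTask', 'count')]
print(detect_races(trace))   # reports 'otherTask' racing with the earlier writes
```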

123 citations

Posted Content
TL;DR: It is beneficial to train a model that jointly and directly localizes and repairs variable-misuse bugs; the experimental results show that the joint model significantly outperforms an enumerative solution that uses a pointer-based model for repair alone.
Abstract: Due to its potential to improve programmer productivity and software quality, automated program repair has been an active topic of research. Newer techniques harness neural networks to learn directly from examples of buggy programs and their fixes. In this work, we consider a recently identified class of bugs called variable-misuse bugs. The state-of-the-art solution for variable misuse enumerates potential fixes for all possible bug locations in a program, before selecting the best prediction. We show that it is beneficial to train a model that jointly and directly localizes and repairs variable-misuse bugs. We present multi-headed pointer networks for this purpose, with one head each for localization and repair. The experimental results show that the joint model significantly outperforms an enumerative solution that uses a pointer based model for repair alone.
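A minimal sketch of the two-headed pointer idea follows: a shared encoder over program tokens feeds one head that points at the suspected bug location and another that points at the token to use for the repair; the architecture details and training snippet are illustrative assumptions, not the paper's model.

```python
# Sketch of a two-headed pointer model: one head points at the likely buggy
# variable use, the other at the token to copy as its replacement.
# Layer sizes and the training setup are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointLocalizeRepair(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.loc_head = nn.Linear(2 * hidden_dim, 1)     # bug-location pointer
        self.rep_head = nn.Linear(2 * hidden_dim, 1)     # repair-token pointer

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))  # (B, T, 2H)
        loc_logits = self.loc_head(states).squeeze(-1)   # (B, T) pointer scores
        rep_logits = self.rep_head(states).squeeze(-1)   # (B, T) pointer scores
        return loc_logits, rep_logits

model = JointLocalizeRepair(vocab_size=2000)
tokens = torch.randint(0, 2000, (4, 60))      # 4 programs, 60 tokens each
loc_target = torch.randint(0, 60, (4,))       # index of the misused variable
rep_target = torch.randint(0, 60, (4,))       # index of a correct occurrence
loc_logits, rep_logits = model(tokens)
# Joint training: sum of the two pointer losses over token positions.
loss = F.cross_entropy(loc_logits, loc_target) + F.cross_entropy(rep_logits, rep_target)
loss.backward()
```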

108 citations

Proceedings ArticleDOI
19 Oct 2008
TL;DR: Aimed at verifying safety properties and improving simulation coverage for hybrid-systems models of embedded control software, this work combines numerical simulation with symbolic state-set computation: given a concrete initial state, it computes a set of initial states guaranteed to produce simulation trajectories with the same discrete behavior.
Abstract: Aimed at verifying safety properties and improving simulation coverage for hybrid systems models of embedded control software, we propose a technique that combines numerical simulation and symbolic methods for computing state-sets. We consider systems with linear dynamics described in the commercial modeling tool Simulink/Stateflow. Given an initial state x and a discrete-time simulation trajectory, our method computes a set of initial states that are guaranteed to be equivalent to x, where two initial states are considered to be equivalent if the resulting simulation trajectories contain the same discrete components at each step of the simulation. We illustrate the benefits of our method on two case studies. One case study is a benchmark proposed in the literature for hybrid systems verification and the other is a Simulink demo model from MathWorks.
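The toy sketch below captures the core idea for a discrete-time linear system whose discrete behavior is decided by linear sign tests: simulate from a concrete initial state and collect linear constraints on the initial state that force the same sequence of test outcomes. The dynamics, guard, and function names are invented for illustration and ignore Simulink/Stateflow specifics.

```python
# Toy sketch: for x_{k+1} = A x_k with discrete behaviour decided by tests
# g.x >= 0, compute constraints on initial states that reproduce the same
# sequence of test outcomes as a concrete simulation from x0.
import numpy as np

def equivalent_initial_set(A, guards, x0, steps):
    """Return constraints [(coeffs, outcome)]: an initial state y is equivalent
    to x0 iff (coeffs @ y >= 0) == outcome holds for every entry."""
    constraints = []
    power = np.eye(len(x0))          # A^k, so that x_k = A^k @ x0
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        for g in guards:
            outcome = bool(g @ x >= 0)
            constraints.append((g @ power, outcome))   # constraint on x0
        x = A @ x
        power = A @ power
    return constraints

A = np.array([[0.9, 0.2], [0.0, 0.8]])
guards = [np.array([1.0, -1.0])]      # mode-switch test: x1 - x2 >= 0 ?
cons = equivalent_initial_set(A, guards, x0=[1.0, 0.5], steps=3)

def same_mode_sequence(y):
    return all((c @ y >= 0) == outcome for c, outcome in cons)

print(same_mode_sequence(np.array([1.1, 0.4])))   # True: same discrete behaviour
print(same_mode_sequence(np.array([0.2, 1.5])))   # False: different first test
```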

99 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered, along with neural networks, kernel methods, graphical models, and a discussion of combining models in the context of machine learning.
Abstract: Table of contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal Article
TL;DR: A 540-billion-parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark.
Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state of the art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
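As a small illustration of the few-shot setting mentioned above, the snippet below builds a prompt from a handful of task examples instead of using them for gradient updates; the task and the commented-out model call are placeholders, since the abstract does not imply any particular inference API.

```python
# Minimal illustration of few-shot prompting: task examples go into the prompt
# rather than into fine-tuning. The task and `generate` call are hypothetical.
def build_few_shot_prompt(examples, query):
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

examples = [("12 + 7 = ?", "19"), ("30 - 4 = ?", "26")]
prompt = build_few_shot_prompt(examples, "45 + 9 = ?")
print(prompt)
# completion = some_language_model.generate(prompt)   # hypothetical model call
```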

1,429 citations

01 Jan 2016
Mathematical Methods of Statistics (no abstract available for this cited work).

878 citations

Posted Content
TL;DR: This work develops CodeBERT with a Transformer-based neural architecture and trains it with a hybrid objective function that incorporates the pre-training task of replaced token detection, i.e., detecting plausible alternative tokens sampled from generators.
Abstract: We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. We develop CodeBERT with a Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.
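The sketch below illustrates the replaced-token-detection objective mentioned in the abstract: a small generator samples substitutes at masked positions and the main encoder learns to classify each token as original or replaced. Module sizes, the corruption rate, and the omission of the generator's own masked-language-modelling loss are simplifications, not CodeBERT's actual setup.

```python
# Sketch of a replaced-token-detection (RTD) objective. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 1000, 64
generator = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))
discriminator = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, 1))

tokens = torch.randint(0, vocab_size, (2, 16))          # NL-PL token sequences
mask = torch.rand(tokens.shape) < 0.15                   # positions to corrupt

# Generator samples plausible replacements at the masked positions.
gen_logits = generator(tokens)                           # (2, 16, vocab)
sampled = torch.distributions.Categorical(logits=gen_logits).sample()
corrupted = torch.where(mask, sampled, tokens)

# Discriminator predicts, per token, whether it was replaced.
replaced = (corrupted != tokens).float()
disc_logits = discriminator(corrupted).squeeze(-1)       # (2, 16)
rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, replaced)
rtd_loss.backward()
# In practice the generator would also be trained with a masked-language-
# modelling loss of its own; that part is omitted from this sketch.
```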

867 citations