Author

Thibaud Lutellier

Bio: Thibaud Lutellier is an academic researcher from the University of Waterloo. His research focuses on topics including computer science and software quality. He has an h-index of 9 and has co-authored 12 publications receiving 263 citations. His previous affiliations include Queen's University and Télécom Saint-Étienne.

Papers
Proceedings ArticleDOI
18 Jul 2020
TL;DR: CoCoNuT is a new G&V program repair technique that uses ensemble learning on a combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages.
Abstract: Automated generate-and-validate (G&V) program repair (APR) techniques typically rely on hard-coded rules, and thus only fix bugs that follow specific fix patterns. These rules require significant manual effort to discover, and they are hard to adapt to different programming languages. To address these challenges, we propose a new G&V technique, CoCoNuT, which uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, we introduce a new context-aware NMT architecture that represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs), since CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs and combines these models using ensemble learning to fix more bugs. Our evaluation on six popular benchmarks for four programming languages (Java, C, Python, and JavaScript) shows that CoCoNuT correctly fixes (i.e., the first generated patch is semantically equivalent to the developer's patch) 509 bugs, including 309 bugs that are fixed by none of the 27 techniques with which we compare.

176 citations
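
To make the context-aware architecture concrete, here is a minimal PyTorch-style sketch of the idea described in the abstract: the buggy line and its surrounding context are encoded by separate stacked-CNN encoders before a decoder emits a candidate fix. All class names, vocabulary and layer sizes, and the pooled-memory decoder are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of a context-aware repair model: two SEPARATE CNN
    # encoders (buggy line vs. surrounding context), not CoCoNuT's code.
    import torch
    import torch.nn as nn

    class CnnEncoder(nn.Module):
        def __init__(self, vocab=10000, emb=128, channels=256, layers=3):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            blocks = []
            for i in range(layers):
                # Stacked convolutions extract hierarchical features
                # (token -> statement -> function), the stated reason
                # for preferring CNNs over RNNs.
                blocks += [nn.Conv1d(emb if i == 0 else channels, channels,
                                     kernel_size=3, padding=1), nn.ReLU()]
            self.convs = nn.Sequential(*blocks)

        def forward(self, tokens):                  # (batch, seq)
            x = self.embed(tokens).transpose(1, 2)  # (batch, emb, seq)
            return self.convs(x).transpose(1, 2)    # (batch, seq, channels)

    class ContextAwareRepair(nn.Module):
        def __init__(self, vocab=10000):
            super().__init__()
            self.buggy_enc = CnnEncoder(vocab)  # encodes the buggy line
            self.ctx_enc = CnnEncoder(vocab)    # encodes surrounding code
            self.dec_emb = nn.Embedding(vocab, 128)
            self.decoder = nn.GRU(128, 256, batch_first=True)
            self.out = nn.Linear(256, vocab)

        def forward(self, buggy, context, prev_fix):
            memory = torch.cat([self.buggy_enc(buggy),
                                self.ctx_enc(context)], dim=1)
            # Simplification: seed the decoder with pooled memory instead
            # of full attention, to keep the sketch short.
            h0 = memory.mean(dim=1).unsqueeze(0)    # (1, batch, 256)
            dec, _ = self.decoder(self.dec_emb(prev_fix), h0)
            return self.out(dec)                    # next-token logits

CoCoNuT then trains many such models under different hyperparameters and combines their patches through ensemble learning, which is where the additional fixed bugs come from.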

Proceedings ArticleDOI
22 May 2021
TL;DR: CURE is an NMT-based APR technique that pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task, and uses a subword tokenization technique to generate a smaller search space that contains more correct fixes.
Abstract: Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to automatically fix software bugs. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on searching for compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.

130 citations
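
The code-aware search strategy lends itself to a short, hedged illustration (not CURE's actual code): candidate patches from beam search are re-ranked so that uncompilable patches, and patches whose length diverges from the buggy line, are pushed down. The helper compiles() is a hypothetical stand-in that uses Python's own syntax check; CURE itself validates Java patches.

    # Hedged sketch of code-aware re-ranking; compiles() is a hypothetical
    # stand-in for a real parser/compiler check in the target language.
    def compiles(patch: str) -> bool:
        try:
            compile(patch, "<patch>", "exec")   # Python-only syntax check
            return True
        except SyntaxError:
            return False

    def code_aware_rank(candidates, buggy_line, alpha=0.5):
        """Re-rank (neg_log_prob, patch) pairs; lower score is better.
        Penalizes uncompilable patches and length divergence from the
        buggy line, per the strategy described in the abstract."""
        scored = []
        for nll, patch in candidates:
            gap = abs(len(patch.split()) - len(buggy_line.split()))
            penalty = (0.0 if compiles(patch) else 100.0) + alpha * gap
            scored.append((nll + penalty, patch))
        return [p for _, p in sorted(scored)]

    beam = [(0.9, "x = x +"), (1.2, "x = x + 1"), (1.5, "x = x + step")]
    print(code_aware_rank(beam, "x = x - 1")[0])   # -> "x = x + 1"

A real implementation would fold these penalties into the beam scores during decoding rather than re-ranking after the fact.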

Proceedings ArticleDOI
25 May 2019
TL;DR: This work proposes CRADLE, a new approach that performs cross-implementation inconsistency checking to detect bugs in DL libraries, and leverages anomaly propagation tracking and analysis to localize the faulty functions in DL libraries that cause the bugs.
Abstract: Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous driving cars. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and KGS Go game), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies.

120 citations
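
A minimal sketch of CRADLE's first step, cross-implementation checking, follows; run_on_backend() is a hypothetical stand-in for executing the same pre-trained model under different backends, and the distance metric is one reasonable choice, not necessarily the paper's.

    # Hedged sketch: flag inputs whose outputs diverge across backends.
    import numpy as np

    def run_on_backend(backend: str, x: np.ndarray) -> np.ndarray:
        # Hypothetical placeholder: in CRADLE this is the same pre-trained
        # model executed on TensorFlow, CNTK, or Theano.
        raise NotImplementedError

    def relative_distance(a: np.ndarray, b: np.ndarray) -> float:
        # One plausible output-distance metric; the paper defines its own.
        return float(np.abs(a - b).sum() / (np.abs(b).sum() + 1e-12))

    def find_inconsistencies(inputs, backends=("tensorflow", "theano"),
                             tol=0.05):
        flagged = []
        for x in inputs:
            out_a, out_b = (run_on_backend(b, x) for b in backends)
            d = relative_distance(out_a, out_b)
            if d > tol:                 # same model + input, outputs differ
                flagged.append((x, d))  # candidate DL-library bug
        return flagged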

Proceedings ArticleDOI
21 Dec 2020
TL;DR: In this paper, the authors study the variance of deep learning systems and the awareness of this variance among researchers and practitioners, and find that only 19.5±3% of papers in recent top software engineering (SE), artificial intelligence (AI), and systems conferences use multiple identical training runs to quantify the variance in their DL approaches.
Abstract: Deep learning (DL) training algorithms utilize nondeterminism to improve models' accuracy and training efficiency. Hence, multiple identical training runs (e.g., identical training data, algorithm, and network) produce different models with different accuracies and training times. In addition to these algorithmic factors, DL libraries (e.g., TensorFlow and cuDNN) introduce additional variance (referred to as implementation-level variance) due to parallelism, optimization, and floating-point computation. This work is the first to study the variance of DL systems and the awareness of this variance among researchers and practitioners. Our experiments on three datasets with six popular networks show large overall accuracy differences among identical training runs. Even after excluding weak models, the accuracy difference is 10.8%. In addition, implementation-level factors alone cause the accuracy difference across identical training runs to be up to 2.9%, the per-class accuracy difference to be up to 52.4%, and the training time difference to be up to 145.3%. All core libraries (TensorFlow, CNTK, and Theano) and low-level libraries (e.g., cuDNN) exhibit implementation-level variance across all evaluated versions. Our researcher and practitioner survey shows that 83.8% of the 901 participants are unaware of or unsure about any implementation-level variance. In addition, our literature survey shows that only 19.5±3% of papers in recent top software engineering (SE), artificial intelligence (AI), and systems conferences use multiple identical training runs to quantify the variance of their DL approaches. This paper raises awareness of DL variance and directs SE researchers to challenging tasks such as creating deterministic DL implementations to facilitate debugging and improving the reproducibility of DL software and results.

75 citations
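
The measurement protocol behind these numbers is easy to sketch (a reconstruction, not the authors' scripts): repeat identical training jobs and report the spread of final accuracy. train_and_evaluate() is a hypothetical stand-in for a fixed data/network/hyperparameter pipeline.

    # Hedged sketch of quantifying variance across identical training runs.
    import statistics

    def train_and_evaluate(run_id: int) -> float:
        # Hypothetical placeholder: train the SAME network on the SAME data
        # with the SAME hyperparameters; variance enters only through
        # nondeterministic kernels, parallelism, and floating point.
        raise NotImplementedError

    def accuracy_spread(n_runs: int = 16) -> dict:
        accs = [train_and_evaluate(i) for i in range(n_runs)]
        return {
            "min": min(accs),
            "max": max(accs),
            "diff": max(accs) - min(accs),   # the paper's headline metric
            "stdev": statistics.stdev(accs),
        }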


Cited by
Proceedings ArticleDOI
01 Jun 2021
TL;DR: Analysis reveals that PLBART learns program syntax, style, and logical flow that are crucial to program semantics, and thus excels even with limited annotations, outperforming or rivaling state-of-the-art models.
Abstract: Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation enables the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART's effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming conventions), and logical flow (e.g., an "if" block inside an "else" block is equivalent to an "else if" block) that are crucial to program semantics, and thus excels even with limited annotations.

318 citations
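
The denoising-autoencoding objective PLBART pre-trains with can be sketched in a few lines, in the spirit of BART; the exact noising operations and rates here are assumptions. A token sequence is corrupted by masking and deletion, and an encoder-decoder is trained to reconstruct the original.

    # Hedged sketch of the noising step in denoising-autoencoding
    # pre-training; the 0.35 rate and mask/delete split are assumptions.
    import random

    MASK = "<mask>"

    def corrupt(tokens, mask_prob=0.35, seed=None):
        rng = random.Random(seed)
        noisy = []
        for tok in tokens:
            r = rng.random()
            if r < mask_prob / 2:
                noisy.append(MASK)   # token masking
            elif r < mask_prob:
                continue             # token deletion
            else:
                noisy.append(tok)
        return noisy

    src = "def add ( a , b ) : return a + b".split()
    print(corrupt(src, seed=0))  # model input (corrupted sequence)
    print(src)                   # reconstruction target
    # Pairs (corrupt(x), x) train an encoder-decoder; the cross-entropy
    # loss on reconstructing x is the pre-training signal.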

Posted Content
TL;DR: This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research, covering 144 papers on testing properties, testing components, testing workflow, and application scenarios.
Abstract: This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.

225 citations

Journal ArticleDOI
TL;DR: In this article, the authors review roadblocks to developing and assessing methods in computer analysis of medical images, discuss ongoing efforts to counteract these problems, and provide recommendations on how to further address them in the future.
Abstract: Research in computer analysis of medical images bears many promises to improve patients' health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper, we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.

114 citations