Author

Lars Buitinck

Bio: Lars Buitinck is an academic researcher from the University of Amsterdam. The author has contributed to research on the topics Python (programming language) and Application programming interface. The author has an h-index of 7 and has co-authored 9 publications receiving 1207 citations.

Papers

Posted Content

API design for machine learning software: experiences from the scikit-learn project

Lars Buitinck, Gilles Louppe, Mathieu Blondel¹, Fabian Pedregosa², Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort², Jaques Grobler², Robert Layton, Jake Vanderplas³, Arnaud Joly, Brian Holt⁴, Gaël Varoquaux²

Institutions: Kobe University¹, French Institute for Research in Computer Science and Automation², University of Washington³, Samsung⁴

01 Sep 2013 - arXiv: Learning

TL;DR: Scikit-learn is a machine learning library written in Python, designed to be simple and efficient, accessible to non-experts, and reusable in various contexts.

Abstract: Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.

1,122 citations

Journal Article

Scikit-learn: Machine Learning Without Learning the Machinery

Gaël Varoquaux¹, Lars Buitinck², Gilles Louppe³, Olivier Grisel¹, Fabian Pedregosa¹, A. Mueller⁴

Institutions: French Institute for Research in Computer Science and Automation¹, University of Amsterdam², University of Liège³, Amazon.com⁴

01 Jun 2015

TL;DR: A quick introduction to scikit-learn as well as to machine-learning basics is given.

Abstract: Machine learning is a pervasive development at the intersection of statistics and computer science. While it can benefit many data-related applications, the technical nature of the research literature and the corresponding algorithms slows down its adoption. Scikit-learn is an open-source software project that aims at making machine learning accessible to all, whether it be in academia or in industry. It benefits from the general-purpose Python language, which is both broadly adopted in the scientific world, and supported by a thriving ecosystem of contributors. Here we give a quick introduction to scikit-learn as well as to machine-learning basics.

391 citations

Proceedings Article

API design for machine learning software: experiences from the scikit-learn project

Lars Buitinck, Gilles Louppe, Mathieu Blondel¹, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas², Arnaud Joly, Brian Holt³, Gaël Varoquaux

Institutions: Kobe University¹, University of Washington², Samsung³

23 Sep 2013

TL;DR: Scikit-learn is a machine learning library written in Python, designed to be simple and efficient, accessible to non-experts, and reusable in various contexts.

337 citations

Book Chapter

Multi-emotion Detection in User-Generated Reviews

Lars Buitinck¹, Jesse van Amerongen¹, Ed S. Tan¹, Maarten de Rijke¹

Institutions: University of Amsterdam¹

29 Mar 2015

TL;DR: A new dataset of user-generated movie reviews annotated for emotional expressions is described, and two algorithms that can detect multiple emotions in each sentence of these reviews are experimentally validated.

Abstract: Expressions of emotion abound in user-generated content, whether it be in blogs, reviews, or on social media. Much work has been devoted to detecting and classifying these emotions, but little of it has acknowledged the fact that emotionally charged text may express multiple emotions at the same time. We describe a new dataset of user-generated movie reviews annotated for emotional expressions, and experimentally validate two algorithms that can detect multiple emotions in each sentence of these reviews.

16 citations

Proceedings Article

Linking the kingdom: enriched access to a historiographical text

Victor de Boer¹, Johan van Doornik², Lars Buitinck², Maarten Marx², Tim Veken³, Kees Ribbens³

Institutions: VU University Amsterdam¹, University of Amsterdam², NIOD Institute for War, Holocaust and Genocide Studies³

23 Jun 2013

TL;DR: This paper presents a method for connecting a historiographical text to the Linked Data cloud, and presents two sources of structured knowledge that link to individual text sources, retrievable on the Web of Data.

Abstract: Digital history is a branch of digital humanities concerned with using ICT to improve the study of history. Linked Data provides an effective way of enriching digital access to scientific texts about history (historiographies). In this paper, we present a method for connecting a historiographical text to the Linked Data cloud. We present the method and tools that we use in each of the method's steps. We focus on one extensive case study: the enriched access of an important work of Dutch World War II historiography, "Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog". We describe the digitization and present two sources of structured knowledge that link to individual text sources, retrievable on the Web of Data. The first is the manually constructed and highly curated "Back of the Book Index". The second is a list of extracted Named Entities. We compare both structured sources as stepping stones to the Web of Data and present a number of use cases relevant both for historical researchers and for the general public.

15 citations

Cited by

Journal Article

MLlib: Machine Learning in Apache Spark

Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks¹, Shivaram Venkataraman¹, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen², Doris Xin³, Reynold Xin, Michael J. Franklin¹, Reza Bosagh Zadeh⁴, Matei Zaharia⁵, Ameet Talwalkar⁶

Institutions: University of California, Berkeley¹, Cloudera², Urbana University³, Stanford University⁴, Massachusetts Institute of Technology⁵, University of California, Los Angeles⁶

01 Jan 2016 - Journal of Machine Learning Research

TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.

Abstract: Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

1,551 citations

Proceedings Article

Auto-Keras: An Efficient Neural Architecture Search System

Haifeng Jin¹, Qingquan Song¹, Xia Hu¹

Institutions: Texas A&M University¹

25 Jul 2019

TL;DR: The authors propose a framework that uses Bayesian optimization to guide network morphism for efficient neural architecture search. Network morphism keeps the functionality of a neural network while changing its architecture, enabling more efficient training during the search.

Abstract: Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to efficiently explore the search space. Extensive experiments on real-world benchmark datasets have been done to demonstrate the superior performance of the developed framework over the state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The code and documentation are available at https://autokeras.com. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.

563 citations

Journal Article

Scikit-learn: Machine Learning Without Learning the Machinery

Gaël Varoquaux¹, Lars Buitinck², Gilles Louppe³, Olivier Grisel¹, Fabian Pedregosa¹, A. Mueller⁴

Institutions: French Institute for Research in Computer Science and Automation¹, University of Amsterdam², University of Liège³, Amazon.com⁴

01 Jun 2015

TL;DR: A quick introduction to scikit-learn as well as to machine-learning basics is given.

391 citations

Journal Article

CatBoost for big data: an interdisciplinary review

John Hancock¹, Taghi M. Khoshgoftaar¹

Institutions: Florida Atlantic University¹

19 Aug 2020 - Journal of Big Data

TL;DR: This survey takes an interdisciplinary approach to cover studies related to CatBoost in a single work, giving researchers an in-depth understanding that helps clarify the proper application of CatBoost in solving problems.

Abstract: Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.

247 citations

Journal Article

Robust Smartphone App Identification via Encrypted Network Traffic Analysis

Vincent F. Taylor¹, Riccardo Spolaor², Mauro Conti², Ivan Martinovic¹

Institutions: University of Oxford¹, University of Padua²

01 Jan 2018 - IEEE Transactions on Information Forensics and Security

TL;DR: This paper shows that a passive eavesdropper can feasibly identify smartphone apps by fingerprinting the network traffic that they send. The apps installed on a smartphone can reveal much information about a user, such as their medical conditions, sexual orientation, or religious beliefs.

Abstract: The apps installed on a smartphone can reveal much information about a user, such as their medical conditions, sexual orientation, or religious beliefs. In addition, the presence or absence of particular apps on a smartphone can inform an adversary, who is intent on attacking the device. In this paper, we show that a passive eavesdropper can feasibly identify smartphone apps by fingerprinting the network traffic that they send. Although SSL/TLS hides the payload of packets, side-channel data, such as packet size and direction is still leaked from encrypted connections. We use machine learning techniques to identify smartphone apps from this side-channel data. In addition to merely fingerprinting and identifying smartphone apps, we investigate how app fingerprints change over time, across devices, and across different versions of apps. In addition, we introduce strategies that enable our app classification system to identify and mitigate the effect of ambiguous traffic, i.e., traffic in common among apps, such as advertisement traffic. We fully implemented a framework to fingerprint apps and ran a thorough set of experiments to assess its performance. We fingerprinted 110 of the most popular apps in the Google Play Store and were able to identify them six months later with up to 96% accuracy. Additionally, we show that app fingerprints persist to varying extents across devices and app versions.

225 citations
