Author

A. Mueller

Bio: A. Mueller is an academic researcher from Amazon.com. The author has an h-index of 1 and has co-authored 1 publication receiving 223 citations.

Papers


Journal Article

Scikit-learn: Machine Learning Without Learning the Machinery


Gaël Varoquaux¹, Lars Buitinck², Gilles Louppe³, Olivier Grisel¹, Fabian Pedregosa¹, A. Mueller⁴

French Institute for Research in Computer Science and Automation¹, University of Amsterdam², University of Liège³, Amazon.com⁴

01 Jun 2015

TL;DR: A quick introduction to scikit-learn, as well as to machine-learning basics, is given.


Abstract: Machine learning is a pervasive development at the intersection of statistics and computer science. While it can benefit many data-related applications, the technical nature of the research literature and the corresponding algorithms slows down its adoption. Scikit-learn is an open-source software project that aims at making machine learning accessible to all, whether it be in academia or in industry. It benefits from the general-purpose Python language, which is both broadly adopted in the scientific world, and supported by a thriving ecosystem of contributors. Here we give a quick introduction to scikit-learn as well as to machine-learning basics.


391 citations

Cited by


Journal Article

Image-Based malware classification using ensemble of CNN architectures (IMCEC)


Danish Vasan¹,², Mamoun Alazab³, Sobia Wassan⁴,⁵, Babak Safaei⁶, Qin Zheng¹

Tsinghua University¹, Isra University², Charles Darwin University³, University of Sindh⁴, Nanjing University⁵, Eastern Mediterranean University⁶

01 May 2020-Computers & Security

TL;DR: A novel architecture based on an ensemble of convolutional neural networks (CNNs), named Image-based Malware Classification using Ensemble of CNNs (IMCEC), is proposed for effective detection of both packed and unpacked malware.


221 citations

Journal Article

A novel stacked generalization ensemble-based hybrid LGBM-XGB-MLP model for Short-Term Load Forecasting


Mohamed Massaoudi¹,², Shady S. Refaat², Ines Chihi³, Mohamed Trabelsi, Fakhreddine S. Oueslati¹, Haitham Abu-Rub²

Carthage College¹, Texas A&M University at Qatar², Tunis El Manar University³

01 Jan 2021-Energy

TL;DR: A novel stacking ensemble-based algorithm is proposed that copes with the stochastic variations of the load demand using a stacked generalization approach and is validated using two datasets from different locations: Malaysia and New England.


145 citations

Posted Content

sktime: A Unified Interface for Machine Learning with Time Series.


Markus Löning, Anthony J. Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, Franz J. Király

17 Sep 2019-arXiv: Learning

TL;DR: The main rationale for creating a unified interface, including reduction, and the design of sktime's core API are discussed, supported by a clear overview of common time series tasks and reduction approaches.


Abstract: We present sktime -- a new scikit-learn compatible Python library with a unified interface for machine learning with time series. Time series data gives rise to various distinct but closely related learning tasks, such as forecasting and time series classification, many of which can be solved by reducing them to related simpler tasks. We discuss the main rationale for creating a unified interface, including reduction, as well as the design of sktime's core API, supported by a clear overview of common time series tasks and reduction approaches.
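The idea of reduction mentioned in the abstract can be illustrated with a small sketch: a forecasting task is recast as tabular regression by sliding a fixed-length window over the series. The code below is a hand-rolled illustration in plain Python and does not use or reproduce sktime's API; the function names `make_windows` and `nearest_neighbour_forecast`, and the choice of a 1-nearest-neighbour regressor, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (X, y) pairs: each row of X holds `window`
    consecutive values and y holds the value that follows them."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    X, y = [], []
    for start in range(len(series) - window):
        X.append(list(series[start:start + window]))
        y.append(series[start + window])
    return X, y


def nearest_neighbour_forecast(series: Sequence[float], window: int) -> float:
    """One-step-ahead forecast via a 1-nearest-neighbour regressor
    (hypothetical stand-in for any tabular regressor) on the windows."""
    X, y = make_windows(series, window)
    query = list(series[-window:])  # the most recent window

    def squared_distance(row: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, query))

    # Pick the historical window closest to the query and return its target.
    best = min(range(len(X)), key=lambda i: squared_distance(X[i]))
    return y[best]


if __name__ == "__main__":
    history = [1, 2, 3, 1, 2, 3, 1, 2]
    print(nearest_neighbour_forecast(history, window=2))
```

Once the series is in tabular form, any regressor could in principle replace the nearest-neighbour step; that substitution is the point of treating forecasting as a reduction to regression.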


111 citations

Journal Article

SMILES-based deep generative scaffold decorator for de-novo drug design


Josep Arús-Pous¹,², Atanas Patronov¹, Esben Jannik Bjerrum¹, Christian Tyrchan¹, Jean-Louis Reymond², Hongming Chen, Ola Engkvist¹

AstraZeneca¹, University of Bern²

29 May 2020-Journal of Cheminformatics

TL;DR: A new SMILES-based molecular generative architecture is reported that generates molecules from scaffolds and can be trained from any arbitrary molecular set; the approach also serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets.


Abstract: Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.


104 citations

Journal Article

Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test.


Andrew Soltan¹,², Samaneh Kouchaki²,³, Tingting Zhu², Dani Kiyasseh², Thomas Taylor², Zaamin B. Hussain⁴, Tim E. A. Peto¹,²,⁵, Andrew Brent¹,², David W Eyre¹,²,⁵, David A. Clifton²

John Radcliffe Hospital¹, University of Oxford², University of Surrey³, Harvard University⁴, Public Health England⁵

01 Feb 2021

TL;DR: Two early-detection models for COVID-19 were developed and validated, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs).


Abstract:

Background: The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure.

Methods: We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results.

Findings: We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%).

Interpretation: Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries.

Funding: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
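The dependence of negative predictive value (NPV) on prevalence, which the abstract explores with test sets of varying prevalence, can be sketched with Bayes' rule. The snippet below is only an illustration under the assumption that sensitivity and specificity stay fixed as prevalence changes; it is not the authors' evaluation procedure, and the function name `negative_predictive_value` is introduced here for the example. The sensitivity and specificity values are the ED model figures quoted in the abstract.

```python
def negative_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """NPV = P(no disease | negative test), computed with Bayes' rule.

    All three arguments are proportions in [0, 1].
    """
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    if true_negatives + false_negatives == 0.0:
        raise ValueError("NPV is undefined: no negative results are possible")
    return true_negatives / (true_negatives + false_negatives)


if __name__ == "__main__":
    # ED model figures from the abstract: 77.4% sensitivity, 95.7% specificity.
    for prevalence in (0.01, 0.02, 0.05):
        npv = negative_predictive_value(0.774, 0.957, prevalence)
        print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}")
```

Under these assumptions the computed NPV at 5% prevalence stays above 98·5%, in line with the range the abstract reports, and it rises as prevalence falls because false negatives become rarer relative to true negatives.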


81 citations
