
Showing papers by Luca Oneto published in 2019


Journal ArticleDOI
TL;DR: This work aims to combine technical and fundamental analysis through the application of data science and machine learning techniques, producing a robust predictive model able to forecast the trend of a portfolio composed of the twenty most capitalized companies listed in the NASDAQ100 index.
Abstract: Stock market prediction is one of the most challenging problems, and it has occupied both researchers and financial analysts for more than half a century. To tackle this problem, two completely opposite approaches, namely technical and fundamental analysis, emerged. Technical analysis bases its predictions on mathematical indicators constructed on the stock price, while fundamental analysis exploits information retrieved from news, profitability, and macroeconomic factors. The competition between these schools of thought has led to many interesting achievements; however, to date, no satisfactory solution has been found. Our work aims to combine both technical and fundamental analysis through the application of data science and machine learning techniques. In this paper, the stock market prediction problem is mapped to a classification task on time series data. Indicators of technical analysis and the sentiment of news articles are both exploited as input. The outcome is a robust predictive model able to forecast the trend of a portfolio composed of the twenty most capitalized companies listed in the NASDAQ100 index. As a proof of the real-world effectiveness of our approach, we exploit the predictions to run a high-frequency trading simulation reaching an annualized return of more than 80%. This project represents a step forward in combining technical and fundamental analysis and provides a starting point for developing new trading strategies.
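As a hedged illustration of the recipe the abstract describes (technical indicators plus news sentiment feeding a trend classifier), here is a minimal Python sketch; the indicator, sentiment source, model, and labels are all illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sma(prices, window):
    """Simple moving average: one classic technical-analysis indicator."""
    return np.convolve(prices, np.ones(window) / window, mode="valid")

def build_features(prices, sentiment, window=10):
    """Stack a technical indicator with a per-day news sentiment score."""
    ind = sma(prices, window)
    sent = sentiment[window - 1:]                        # align the two series
    X = np.column_stack([ind, sent])
    y = (np.diff(prices[window - 1:]) > 0).astype(int)   # 1 = next-day up-trend
    return X[:-1], y

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=500)) + 100           # synthetic price series
sentiment = rng.uniform(-1, 1, size=500)                 # synthetic news sentiment
X, y = build_features(prices, sentiment)
clf = RandomForestClassifier(n_estimators=100).fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```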

167 citations


Journal ArticleDOI
TL;DR: In this paper, a data-driven Digital Twin of the ship is built, leveraging the large amount of information collected from the on-board sensors, and is used for estimating the speed loss due to marine fouling.

101 citations


Proceedings ArticleDOI
27 Jan 2019
TL;DR: This paper proposes to use Multitask Learning (MTL), enhanced with fairness constraints, to jointly learn group-specific classifiers that leverage information between sensitive groups, and proposes a three-pronged approach to tackling fairness: increasing accuracy on each group, enforcing measures of fairness during training, and protecting sensitive information during testing.
Abstract: A central goal of algorithmic fairness is to reduce bias in automated decision making. An unavoidable tension exists between accuracy gains obtained by using sensitive information as part of a statistical model and any commitment to protect these characteristics. Often, due to biases present in the data, using the sensitive information in the functional form of a classifier improves classification accuracy. In this paper we show how it is possible to get the best of both worlds: optimize model accuracy and fairness without explicitly using the sensitive feature in the functional form of the model, thereby treating different individuals equally. Our method is based on two key ideas. On the one hand, we propose to use Multitask Learning (MTL), enhanced with fairness constraints, to jointly learn group-specific classifiers that leverage information between sensitive groups. On the other hand, since learning group-specific models might not be permitted, we propose to first predict the sensitive features by any learning method and then use the predicted sensitive feature to train MTL with fairness constraints. This enables us to tackle fairness with a three-pronged approach, that is, by increasing accuracy on each group, enforcing measures of fairness during training, and protecting sensitive information during testing. Experimental results on two real datasets support our proposal, showing substantial improvements in both accuracy and fairness.
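A simplified sketch of the two-stage idea (predict the sensitive feature, then train group-specific models) follows; plain per-group logistic regressions stand in for the paper's MTL-with-fairness-constraints machinery, and all data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
s = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)            # sensitive group
y = (X[:, 1] + 0.5 * s + rng.normal(size=1000) > 0).astype(int)  # outcome

# Stage 1: predict the sensitive feature (it is never an input of the final model).
s_model = LogisticRegression().fit(X, s)
s_hat = s_model.predict(X)

# Stage 2: one classifier per *predicted* group; in the paper these are tasks of
# an MTL problem coupled through shared weights and fairness constraints.
group_models = {g: LogisticRegression().fit(X[s_hat == g], y[s_hat == g])
                for g in (0, 1)}

def predict(x):
    g = int(s_model.predict(x.reshape(1, -1))[0])   # route via the predicted group
    return int(group_models[g].predict(x.reshape(1, -1))[0])

print(predict(X[0]))
```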

52 citations


Posted Content
TL;DR: It is shown that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold, and the overall procedure is shown to be statistically consistent in terms of both the classification error and the fairness measure.
Abstract: We study the problem of fair binary classification using the notion of Equal Opportunity, which requires the true positive rate to be distributed equally across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and is shown to be statistically consistent in terms of both the classification error and the fairness measure. Finally, we present numerical experiments which indicate that our method is often superior to, or competitive with, state-of-the-art methods on benchmark datasets.
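The plug-in recipe is concrete enough to sketch: estimate eta(x) = P(Y=1|x) on labeled data, then pick a group-dependent threshold so the true positive rates match. The toy below calibrates on a labeled validation split for brevity, whereas the paper calibrates on unlabeled data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
s = (rng.random(2000) > 0.5).astype(int)                  # sensitive group
y = (X[:, 0] + 0.8 * s + rng.normal(size=2000) > 0).astype(int)

eta = LogisticRegression().fit(X[:1000], y[:1000])        # estimate P(Y=1|x)
p = eta.predict_proba(X[1000:])[:, 1]
yv, sv = y[1000:], s[1000:]

def threshold_for_tpr(scores, target_tpr):
    """Threshold on the positives' scores achieving the target TPR."""
    return np.quantile(scores, 1.0 - target_tpr)

target = 0.8                                              # common TPR level
thr = {g: threshold_for_tpr(p[(sv == g) & (yv == 1)], target) for g in (0, 1)}
y_hat = (p >= np.where(sv == 1, thr[1], thr[0])).astype(int)
for g in (0, 1):
    pos = (sv == g) & (yv == 1)
    print(f"group {g} TPR: {y_hat[pos].mean():.2f}")      # both close to 0.8
```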

33 citations


Proceedings Article
08 Dec 2019
TL;DR: In this paper, the authors propose a fair binary classification method based on the notion of Equal Opportunity, which requires the true positive rate to be distributed equally across the sensitive groups.
Abstract: We study the problem of fair binary classification using the notion of Equal Opportunity, which requires the true positive rate to be distributed equally across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and is shown to be statistically consistent in terms of both the classification error and the fairness measure. Finally, we present numerical experiments which indicate that our method is often superior to, or competitive with, state-of-the-art methods on benchmark datasets.

23 citations


Journal ArticleDOI
01 Dec 2019
TL;DR: An induction motor bearing monitoring tool which leverages stator current signals processed with a deep learning architecture able to extract, from the stator current signal, a compact and expressive representation of the bearing state, ultimately providing a bearing fault detection system.
Abstract: Induction motors are fundamental components of many modern automation systems, and they are a central pivot of the developing e-mobility era. The most vulnerable parts of an induction motor are the bearings, the stator winding, and the rotor bars. Consequently, monitoring and maintaining them during operation is vital. In this work, the authors propose an induction motor bearing monitoring tool which leverages stator current signals processed with a deep learning architecture. Unlike state-of-the-art approaches, which exploit vibration signals collected by intrusive and easily damaged vibration probes, stator current signals are already commonly available, or easily and unobtrusively collectable. Moreover, instead of using now-classical data-driven models, the authors exploit a deep learning architecture able to extract from the stator current signal a compact and expressive representation of the bearing state, ultimately providing a bearing fault detection system. To estimate the effectiveness of the proposal, the authors collected a series of data from an inverter-fed motor mounting different artificially damaged bearings. Results show that the proposed approach provides a promising and effective, yet simple, bearing fault detection system.
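A hedged sketch of such an architecture: a small 1D CNN mapping a raw stator-current window to a compact embedding plus a fault/no-fault output. PyTorch, the layer sizes, and the window length are assumptions; the paper's actual network and preprocessing are not reproduced here:

```python
import torch
import torch.nn as nn

class CurrentCNN(nn.Module):
    def __init__(self, emb_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),               # collapse time axis
        )
        self.embed = nn.Linear(32, emb_dim)        # compact bearing-state embedding
        self.head = nn.Linear(emb_dim, 2)          # healthy vs faulty bearing

    def forward(self, x):                          # x: (batch, 1, samples)
        z = self.features(x).squeeze(-1)
        e = self.embed(z)
        return self.head(e), e

model = CurrentCNN()
logits, embedding = model(torch.randn(8, 1, 4096))   # 8 raw current windows
print(logits.shape, embedding.shape)                 # (8, 2) (8, 32)
```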

21 citations


Proceedings ArticleDOI
05 Jun 2019
TL;DR: The paper describes the whole IAMS decision process based on a real railway signaling use case, from field data acquisition to decision support, including data collection, preparation, and analytics to extract knowledge on the current and future status of the assets.
Abstract: One of the main benefits of the railways' digital transformation is the possibility of increasing the efficiency of the Asset Management process through the combination of data-driven models and decision support systems, paving the way towards an Intelligent Asset Management System (IAMS). The paper describes the whole IAMS decision process based on a real railway signaling use case: from field data acquisition to decision support. The process includes data collection, preparation, and analytics to extract knowledge on the current and future status of the assets. The extracted knowledge is then used within the decision support system to prioritize asset management interventions in a fully automated way, applying optimization logic and operational constraints. The target is to optimize the scheduling of maintenance activities, maximize service reliability, optimize both the usage of resources and possession times, and avoid (or minimize) contractual penalties and delays. In this context, a real use case related to the signaling system and, in particular, to track circuits is presented, applying the proposed methodology to an Italian urban rail network and showing the usefulness of the approach and its possible further developments.
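As a toy illustration of the "prioritize interventions under operational constraints" step, the sketch below greedily selects maintenance jobs by predicted risk per hour of track possession under a possession-time budget; the job ids, risks, and hours are invented, and the paper's actual optimization logic is richer than this:

```python
jobs = [  # (job id, predicted failure risk, possession hours needed) -- illustrative
    ("TC-031", 0.90, 4), ("TC-102", 0.40, 1),
    ("TC-077", 0.75, 3), ("TC-015", 0.20, 2),
]
budget_hours = 6

# Greedy knapsack: highest risk mitigated per hour of possession first.
ranked = sorted(jobs, key=lambda j: j[1] / j[2], reverse=True)
plan, used = [], 0
for job_id, risk, hours in ranked:
    if used + hours <= budget_hours:
        plan.append(job_id)
        used += hours
print(plan, f"({used}h of {budget_hours}h possession used)")
```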

21 citations


Journal ArticleDOI
TL;DR: The authors investigate the problem of predicting the hull condition in real operations based on data measured by the on-board systems, and use an unsupervised Machine Learning (ML) modelling technique to eliminate the need for collecting labeled data on the hull and propeller fouling condition.

19 citations


Posted Content
TL;DR: This work argues that the goal of imposing demographic parity can be substantially facilitated within a multitask learning setting and derives learning bounds establishing that the learned representation transfers well to novel tasks both in terms of prediction performance and fairness metrics.
Abstract: Developing learning methods which do not discriminate subgroups in the population is a central goal of algorithmic fairness. One way to reach this goal is by modifying the data representation in order to meet certain fairness constraints. In this work we measure fairness according to demographic parity, which requires the probability of the possible model decisions to be independent of the sensitive information. We argue that the goal of imposing demographic parity can be substantially facilitated within a multitask learning setting. We leverage task similarities by encouraging a shared fair representation across the tasks via low-rank matrix factorization. We derive learning bounds establishing that the learned representation transfers well to novel tasks, both in terms of prediction performance and fairness metrics. We present experiments on three real-world datasets, showing that the proposed method outperforms state-of-the-art approaches by a significant margin.
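A rough sketch of the shared-representation idea: learn per-task weights, extract a shared low-rank subspace, and re-fit each task inside it. The paper additionally imposes the demographic-parity constraint on the representation, which is omitted here; data, dimensions, and the SVD-based factorization are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
d, k, T = 10, 2, 5
shared = rng.normal(size=(k, d))                    # ground-truth shared subspace
tasks = []
for _ in range(T):
    X = rng.normal(size=(200, d))
    w = rng.normal(size=k) @ shared                 # task weights live in the subspace
    tasks.append((X, X @ w + 0.1 * rng.normal(size=200)))

W = np.vstack([Ridge(alpha=1.0).fit(X, y).coef_ for X, y in tasks])  # T x d
_, _, Vt = np.linalg.svd(W, full_matrices=False)
B = Vt[:k]                                          # shared low-rank representation

for X, y in tasks:                                  # re-fit each task inside it
    model = Ridge(alpha=1.0).fit(X @ B.T, y)
    print("task R^2 in shared subspace:", round(model.score(X @ B.T, y), 3))
```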

18 citations


Journal ArticleDOI
TL;DR: An RF implementation called ReForeSt is developed, which, unlike the currently available solutions, can distribute data on available machines in two different ways to optimize the computational and memory requirements of RF with arbitrarily large datasets.
Abstract: In the current big data era, naive implementations of well-known learning algorithms cannot efficiently and effectively deal with large datasets. Random forests (RFs) are a popular ensemble-based method for classification. RFs have been shown to be effective in many different real-world classification problems and are commonly considered one of the best learning algorithms in this context. In this paper, we develop an RF implementation called ReForeSt, which, unlike the currently available solutions, can distribute data on available machines in two different ways to optimize the computational and memory requirements of RF with arbitrarily large datasets ranging from millions of samples to millions of features. A recently proposed improved RF formulation called random rotation ensembles can be used in conjunction with model selection to automatically tune the RF hyperparameters. We perform an extensive experimental evaluation on a wide range of large datasets and several environments with different numbers of machines and numbers of cores per machine. Results demonstrate that ReForeSt, in comparison to other state-of-the-art alternatives such as MLlib, is less computationally intensive, more memory efficient, and more effective.
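The "random rotation ensembles" variant mentioned in the abstract is easy to sketch: each tree sees the data through its own random orthogonal rotation. ReForeSt itself is a distributed implementation; this toy single-machine version only illustrates the rotation trick, on synthetic data:

```python
import numpy as np
from scipy.stats import ortho_group
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6))
y = (X[:, :2].sum(axis=1) > 0).astype(int)

trees = []
for _ in range(25):
    R = ortho_group.rvs(X.shape[1])                 # random orthogonal rotation
    idx = rng.integers(0, len(X), len(X))           # bootstrap resample
    t = DecisionTreeClassifier(max_features="sqrt").fit(X[idx] @ R, y[idx])
    trees.append((R, t))

def predict(Xq):
    """Majority vote of trees, each applied in its own rotated feature space."""
    votes = np.mean([t.predict(Xq @ R) for R, t in trees], axis=0)
    return (votes > 0.5).astype(int)

print("train accuracy:", (predict(X) == y).mean())
```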

14 citations


Proceedings ArticleDOI
14 Jul 2019
TL;DR: Results on a real trading simulation show how, by formulating stock market forecasting as regression of market returns, the predictions can be used to invest in high-performing stocks and achieve higher profits with fewer trades.
Abstract: Forecasting stock market behavior is an interesting and challenging problem. Regression of prices and classification of daily returns have been widely studied with the main goal of supplying forecasts useful in real trading scenarios. Unfortunately, the outcomes are not directly related to the maximization of financial gain. Firstly, the optimal strategy requires investing in the best-performing asset every period, and trading accordingly is not trivial given the predictions. Secondly, price fluctuations of different magnitudes are often treated as equal, even though market trading incurs losses or gains of different intensities. In this paper, the problem of stock market forecasting is formulated as regression of market returns. This approach is able to estimate the amount of price change and thus the best-performing assets. Price fluctuations of different magnitudes are treated differently through the application of different weights to samples, and the scarcity of data is addressed using transfer learning. Results on a real trading simulation show how, given a finite amount of capital, the predictions can be used to invest in high-performing stocks and, hence, achieve higher profits with fewer trades.
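A minimal sketch of the core idea, with an assumed weighting scheme (weight proportional to the magnitude of the move); the features, model, and the paper's transfer learning component are not reproduced:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 8))                      # e.g. lagged returns / indicators
ret = 0.3 * X[:, 0] + 0.1 * rng.normal(size=600)   # next-period return target

w = np.abs(ret) / np.abs(ret).mean()               # weight grows with the move size
model = Ridge(alpha=1.0).fit(X[:500], ret[:500], sample_weight=w[:500])

pred = model.predict(X[500:])
top = np.argsort(pred)[::-1][:5]                   # invest in the top predicted assets
print("top predicted returns:", np.round(pred[top], 3))
```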

Posted Content
TL;DR: The generalized fairness measure reduces to well-known notions of fairness available in the literature, and learning guarantees are derived for the method which imply, in particular, its statistical consistency, both in terms of the risk and the fairness measure.
Abstract: We tackle the problem of algorithmic fairness, where the goal is to avoid the unfair influence of sensitive information, in the general context of regression with possibly continuous sensitive attributes. We extend the framework of fair empirical risk minimization to this general scenario, covering in this way the whole standard supervised learning setting. Our generalized fairness measure reduces to well-known notions of fairness available in the literature. We derive learning guarantees for our method that imply, in particular, its statistical consistency, both in terms of the risk and the fairness measure. We then specialize our approach to kernel methods and propose a convex fair estimator in that setting. We test the estimator on a commonly used benchmark dataset (Communities and Crime) and on a new dataset collected at the University of Genova, containing information on the academic careers of five thousand students. The latter dataset provides a challenging real-world case of unfair behaviour by standard regression methods that benefits from our methodology. The experimental results show that our estimator is effective at mitigating the trade-off between accuracy and fairness requirements.
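In the linear case, a convex fair estimator of this flavour can be sketched in closed form: ridge regression plus a penalty on the gap between average predictions across groups (the mean-prediction gap equals d·w, where d is the group mean difference, so penalizing (d·w)² keeps the problem convex). This is an illustration in the spirit of the paper, not its exact estimator, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(800, 5))
s = (rng.random(800) > 0.5).astype(int)
X[:, 4] += 1.5 * s                                   # a feature correlated with the group
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.8]) + 0.1 * rng.normal(size=800)

d = X[s == 0].mean(axis=0) - X[s == 1].mean(axis=0)  # group mean difference

def fit(mu, lam=1.0):
    """Ridge with a convex fairness penalty mu * n * (d . w)^2."""
    A = X.T @ X + lam * np.eye(5) + mu * len(X) * np.outer(d, d)
    return np.linalg.solve(A, X.T @ y)

for mu in (0.0, 100.0):                              # plain ridge vs fair ridge
    w = fit(mu)
    gap = (X[s == 0] @ w).mean() - (X[s == 1] @ w).mean()
    print(f"mu={mu:5.1f}  mean-prediction gap: {gap:+.3f}")
```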

Book ChapterDOI
18 Apr 2019
TL;DR: Results show that the most important patent features for predicting IC relate to the specific technological areas, the backward citations, the technological domains, and the family size.
Abstract: Capabilities and, in particular, Innovation Capability (IC) are fundamental strategic assets for companies in providing and sustaining their competitive advantage. IC is a firm's ability to mobilize and create new knowledge by applying appropriate process technologies, and it has been investigated by means of its main determinants, usually divided into internal and external factors. In this paper, starting from patent data, a patent's forward citations are used as a proxy of IC and the main patent features are considered as proxies of the determinants. In detail, the main purpose of the paper is to understand which patent features are relevant to predict IC. Three different machine learning algorithms, i.e., Regularized Least Squares (RLS), Deep Neural Networks (DNN), and Decision Trees (DT), are employed for this investigation. Results show that the most important patent features for predicting IC relate to the specific technological areas, the backward citations, the technological domains, and the family size. These findings are confirmed by all three algorithms.
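A toy version of the pipeline (predict forward citations, the IC proxy, from patent features, then inspect which features matter); the feature names and data below are illustrative, not the paper's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 1000
features = ["tech_area", "backward_citations", "tech_domains", "family_size", "claims"]
X = rng.poisson(lam=(3, 8, 2, 4, 10), size=(n, 5)).astype(float)
y = 0.6 * X[:, 1] + 0.4 * X[:, 3] + rng.normal(size=n)   # forward-citations proxy

model = RandomForestRegressor(n_estimators=200).fit(X, y)
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:20s} {imp:.3f}")                       # importance ranking
```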

Journal ArticleDOI
TL;DR: In this paper, a hybrid approach is proposed to predict cavitation noise spectra without requiring an actual test in a cavitation tunnel with a model of the propeller, which can be used to overcome typical model-scale problems.

Book ChapterDOI
18 Apr 2019
TL;DR: The paper describes the architecture of a system that leverages both Big Data Technologies and Distributed Ledger Technologies to better manage maintenance actions in the railway context.
Abstract: Big Data Technologies (BDTs) and Distributed Ledger Technologies (DLTs) can bring disruptive innovation in the way we handle, store, and process data to gain knowledge. In this paper, we describe the architecture of a system that leverages both of these technologies to better manage maintenance actions in the railway context. On one side, we employ a permissioned DLT to ensure the complete transparency and auditability of the process, the integrity and availability of the inserted data and, most of all, the non-repudiation of the actions performed by each participant in the maintenance management process. On the other side, exploiting the availability of the data in a single repository (the ledger) and in a standardised format, thanks to the use of a DLT, we adopt BDTs to leverage the features of each maintenance job, together with external factors, to estimate the maintenance restoration time.

Book ChapterDOI
01 Jan 2019
TL;DR: The last chapter deals with the practical implementation in hardware of systems similar to the ones presented in previous chapters and tested by simulation only.
Abstract: The last chapter deals with the practical implementation in hardware of systems similar to the ones presented in previous chapters and tested by simulation only. The devices that host the projects are Field-Programmable Gate Arrays (FPGAs), inserted on commercially available boards and managed by Deeds and proprietary tools. A short description of the devices and the associated tools is presented. An original, hands-on introduction of the VHDL hardware description language is included. A few exercises of digital system design and prototyping complete the chapter.

Proceedings Article
01 Jan 2019
TL;DR: It is shown that, even where technological solutions are available, the law needs to keep up in order to support and accurately regulate the use of those solutions, and stumbling points are identified in this regard.
Abstract: This paper discusses whether the law is equipped to regulate machine learning ("ML") model-based decision-making in the context of the railways. We especially deal with the fairness and accountability of these models when exploited in the context of train traffic management ("TTM"). Railway sector-specific regulation, given the railways' status as a network industry, serves as a pilot here. We show that, even where technological solutions are available, the law needs to keep up in order to support and accurately regulate the use of those solutions, and we identify stumbling points in this regard.

Book ChapterDOI
18 Apr 2019
TL;DR: This work exploits finance-related numerical and textual data to predict trend windows of different lengths through several learning algorithms, and demonstrates the non-optimality of daily trend prediction with the aim of establishing a new guideline for future research.
Abstract: The problem of predicting future market trends has been attracting the interest of researchers, mathematicians, and financial analysts for more than fifty years. Many different approaches have been proposed to solve the task. However, only a few of them have focused on the selection of the optimal trend window to be forecasted, and most of the research focuses on daily prediction without a proper justification. In this work, we exploit finance-related numerical and textual data to predict trend windows of different lengths through several learning algorithms. We demonstrate the non-optimality of daily trend prediction with the aim of establishing a new guideline for future research.
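The multi-window labeling the paper compares is simple to illustrate: the same price series yields a different classification target for each horizon. The horizons and data below are illustrative:

```python
import numpy as np

prices = np.cumsum(np.random.default_rng(5).normal(size=300)) + 100

def trend_labels(prices, horizon):
    """1 if the price is higher `horizon` days ahead, else 0."""
    return (prices[horizon:] > prices[:-horizon]).astype(int)

for h in (1, 5, 20):                     # daily, weekly, monthly trend windows
    y = trend_labels(prices, h)
    print(f"horizon {h:2d}: {y.mean():.2f} fraction of up-trend labels")
```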

Journal ArticleDOI
TL;DR: This work proposes a multi-purpose algorithm for unsupervised or semi-supervised learning that determines a simple continuous region of points which can be adopted to describe the nominal behavior of a component or product and used to detect anomalies that fall outside it.
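A minimal stand-in for the "continuous region of nominal behavior" idea, using a one-class SVM in place of the paper's algorithm; all data is synthetic:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(6)
nominal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))    # healthy behavior only
region = OneClassSVM(nu=0.05, gamma="scale").fit(nominal)  # learn the nominal region

new_points = np.vstack([rng.normal(size=(5, 3)),           # nominal-like points
                        rng.normal(loc=6.0, size=(5, 3))]) # anomalous points
print(region.predict(new_points))   # +1 inside the region, -1 outside (anomaly)
```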

Book ChapterDOI
18 Apr 2019
TL;DR: This work deals with the problem of building an interpretable and reliable restoration time prediction system which leverages the large amount of data generated by the network, other freely available exogenous data such as weather information, and the experience of the operators.
Abstract: Every time an asset of a large-scale railway network is affected by a failure or undergoes maintenance, the impact extends beyond the functional behaviour of the single asset to the normal execution of railway operations and train circulation. In this framework, the restoration time, namely the time needed to restore the asset's functionality, is a crucial piece of information for handling and reducing this impact. In this work, we deal with the problem of building an interpretable and reliable restoration time prediction system which leverages the large amount of data generated by the network, other freely available exogenous data such as weather information, and the experience of the operators. Results on real-world data coming from the Italian railway network show the effectiveness and potential of our proposal.


Book ChapterDOI
01 Jan 2019
TL;DR: In this chapter, positional number systems (decimal, binary, octal, hexadecimal), BCD and Gray codes are presented together with the rules for the conversion between numbers encoded in different bases and the representations of negative numbers.
Abstract: The representation of numbers is essential for digital logic design. In this chapter, positional number systems (decimal, binary, octal, hexadecimal), BCD and Gray codes are presented together with the rules for the conversion between numbers encoded in different bases and the representations of negative numbers. Then, the rules for the arithmetic operations and the circuits that execute them are presented. The addition of binary numbers is examined with particular attention, since it is the operation at the basis of all computational circuits. Alphanumeric codes and the concept of parity for error detection complete the chapter.
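A small worked example of the conversions the chapter covers: one value rendered in the positional bases discussed, plus an 8-bit two's-complement encoding of its negative:

```python
n = 45
print(bin(n), oct(n), hex(n))          # 0b101101 0o55 0x2d

def twos_complement(value, bits=8):
    """Two's-complement encoding of a (possibly negative) integer."""
    return format(value & (2**bits - 1), f"0{bits}b")

print(twos_complement(-45))            # 11010011
print(twos_complement(45))             # 00101101
```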

Proceedings Article
01 Jan 2019
TL;DR: This tutorial aims to showcase the state of the art on these increasingly relevant topics among ML theoreticians and practitioners: incorporating human-relevant requirements such as safety, fairness, privacy, and interpretability, while also considering broad societal issues such as ethics and legislation.
Abstract: It has been argued that Artificial Intelligence (AI) is experiencing a fast process of commodification. Such a characterization is in the interest of big IT companies, but it correctly reflects the current industrialization of AI. This phenomenon means that AI systems and products are reaching society at large and, therefore, that societal issues related to the use of AI and Machine Learning (ML) cannot be ignored any longer. Designing ML models from this human-centered perspective means incorporating human-relevant requirements such as safety, fairness, privacy, and interpretability, but also considering broad societal issues such as ethics and legislation. These are essential aspects for fostering the acceptance of ML-based technologies, as well as for ensuring compliance with an evolving legislation concerning the impact of digital technologies on ethically and privacy-sensitive matters. The ESANN special session for which this tutorial acts as an introduction aims to showcase the state of the art on these increasingly relevant topics among ML theoreticians and practitioners. For this purpose, we welcomed both solid contributions and preliminary relevant results showing the potential, the limitations, and the challenges of new ideas, as well as refinements or hybridizations among the different fields of research, ML, and related approaches in facing real-world problems involving societal issues.

Proceedings ArticleDOI
14 Jul 2019
TL;DR: This work proposes a hybrid model for the prediction of ship propeller underwater radiated noise, able to exploit both the physical knowledge of the problem and the real data obtained from cavitation tunnel experiments performed on different propellers in different working conditions.
Abstract: In recent years, models combining the physical knowledge of a phenomenon with statistical inference have become of great interest in many real-world applications. In this context, ship propeller underwater radiated noise is an interesting field of application for these so-called hybrid models, especially when the propeller cavitates. Nowadays, model-scale tests are considered the state-of-the-art technique to predict cavitation noise spectra. Unfortunately, they are negatively affected by scale effects which can alter the onset of some interesting cavitating phenomena with respect to the full-scale propeller; as a consequence, for some ship operational conditions it is not trivial to correctly reproduce the cavitation pattern in model-scale tests. Moreover, model-scale tests are quite expensive and time-consuming, so it is not feasible to include them in the early stages of the design. Nevertheless, data collected during these tests can be adopted to tune a data-driven model, while the physical equations describing the phenomenon can be used to refine the prediction. In this work, the authors propose a hybrid model for the prediction of ship propeller underwater radiated noise, able to exploit both the physical knowledge of the problem and the real data obtained from cavitation tunnel experiments performed on different propellers in different working conditions. Results on real data support the validity and effectiveness of the proposal.
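The hybrid scheme can be sketched generically as a physics-based baseline plus a data-driven correction fitted to the experimental residuals; the "physics" below is a placeholder scaling law, not the actual cavitation-noise equations, and the data is synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def physics_baseline(cond):
    """Stand-in physical law: toy noise-level scaling with the operating point."""
    return 120.0 + 10.0 * np.log10(cond[:, 0])

rng = np.random.default_rng(6)
cond = rng.uniform(1, 10, size=(300, 3))            # operating conditions
measured = physics_baseline(cond) + 2.0 * cond[:, 1] + rng.normal(size=300)

residual = measured - physics_baseline(cond)        # what the physics misses
correction = GradientBoostingRegressor().fit(cond, residual)

def hybrid_predict(cond):
    """Physics baseline refined by the data-driven correction."""
    return physics_baseline(cond) + correction.predict(cond)

rmse = np.sqrt(np.mean((hybrid_predict(cond) - measured) ** 2))
print(f"training RMSE of the hybrid model: {rmse:.2f}")
```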


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter introduces the idea of digitally representing analog quantities and goes step by step through the main concepts of Boolean algebra: variables, functions, truth tables, operations, and properties.
Abstract: This chapter introduces the idea of digitally representing analog quantities and goes step by step through the main concepts of Boolean algebra: variables, functions, truth tables, operations, and properties. The chapter is quite detailed and accompanied by many examples and exercises in order to provide a precise framework of the fundamentals of digital design. It includes the theorems which constitute the foundation for the application of Boolean algebra to logic networks, with a particular focus on their application to combinational network design.
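A tiny illustration of the chapter's central object, a truth table, here for an example Boolean function f(a, b, c) = (a AND b) OR (NOT c):

```python
from itertools import product

def f(a, b, c):
    """Example Boolean function: (a AND b) OR (NOT c)."""
    return (a and b) or (not c)

print(" a b c | f")
for a, b, c in product((0, 1), repeat=3):   # enumerate all input combinations
    print(f" {a} {b} {c} | {int(f(a, b, c))}")
```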

Book ChapterDOI
01 Jan 2019
TL;DR: The chapter introduces the concepts and techniques of synchronization that are further examined in the following chapters.
Abstract: The transition from combinational to sequential networks is explained step by step, starting from a simple gate with feedback and arriving at the structure and behavior of the principal types of flip-flops. They are classified according to their temporal response (direct command, level-enabled, master–slave, and edge-triggered) and their logical operation (SR, D, JK). The timing parameters of physically implemented devices are considered. The chapter introduces the concepts and techniques of synchronization that are further examined in the following chapters.

Book ChapterDOI
18 Apr 2019
TL;DR: This work proposes a hybrid approach for the prediction of ship propeller cavitating vortex noise, adopting real data collected during extensive model-scale tests in a cavitation tunnel.
Abstract: In many real-world applications, the physical knowledge of a phenomenon and data science can be combined to obtain mutual benefits. As a result, it is possible to formulate a so-called hybrid model from the combination of the two approaches. In this work, we propose a hybrid approach for the prediction of ship propeller cavitating vortex noise, adopting real data collected during extensive model-scale tests in a cavitation tunnel. Results show the effectiveness of the proposal.

Proceedings Article
01 Jan 2019
TL;DR: This work further develops the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually needing to know it, and defines a prior and a posterior which give more weight to functions that exhibit good generalization and fairness properties.
Abstract: We address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. We face this issue in the PAC-Bayes framework and we present an approach which trades off and bounds the risk and the fairness of the Gibbs classifier, measured with respect to different state-of-the-art fairness measures. For this purpose, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually needing to know it. In particular, we define a prior and a posterior which give more weight to functions that exhibit good generalization and fairness properties.
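For reference, the classical PAC-Bayes bound (in the Seeger/Maurer form) that this line of work builds on is shown below; the paper's fairness-aware variant, with its distribution-dependent prior and fairness terms, is not reproduced here:

```latex
% Classical PAC-Bayes bound: with probability at least 1-\delta over an i.i.d.
% sample of size n, simultaneously for all posteriors Q,
\[
  \mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
\]
% where P is the prior and Q the posterior over classifiers, \hat{L}(Q) and
% L(Q) are the empirical and true risks of the Gibbs classifier, and kl is the
% binary relative entropy.
```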

Book ChapterDOI
18 Apr 2019
TL;DR: Results on real-world data coming from the Italian railway network show that the proposed solution outperforms the fully data-driven approach and could help operators to promptly identify and schedule the best train overtaking solution.
Abstract: Every time two or more trains are in the wrong relative position on the railway network because of maintenance, delays, or other causes, it is necessary to decide if, where, and when to make them overtake one another. This is a quite complex problem that train operators tackle every day by exploiting their knowledge and experience, since no effective automatic tools are available for large-scale railway networks. In this work, we propose a hybrid train overtaking prediction system. Our model is hybrid in the sense that it both encapsulates the experience of the operators and integrates this knowledge with information coming from the historical data about the railway network, using state-of-the-art data-driven techniques. Results on real-world data coming from the Italian railway network show that the proposed solution outperforms the fully data-driven approach and could help operators to promptly identify and schedule the best train overtaking solution.