Author

Paolo Napoletano

Bio: Paolo Napoletano is an academic researcher from University of Milano-Bicocca. The author has contributed to research in topics: Convolutional neural network & Deep learning. The author has an h-index of 27 and has co-authored 131 publications receiving 2634 citations. Previous affiliations of Paolo Napoletano include University of Milan & Istituto Nazionale di Fisica Nucleare.


Papers
Journal Article
TL;DR: An in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition, giving a complete view of which solutions have been explored so far and which research directions are worth exploring in the future.
Abstract: This paper presents an in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition. For each DNN, multiple performance indices are observed, such as recognition accuracy, model complexity, computational complexity, memory usage, and inference time. The behavior of such performance indices and some combinations of them are analyzed and discussed. To measure the indices, we experiment with DNNs on two different computer architectures: a workstation equipped with an NVIDIA Titan X Pascal and an embedded system based on an NVIDIA Jetson TX1 board. This experimentation allows a direct comparison between DNNs running on machines with very different computational capacities. This paper is useful for researchers to have a complete view of which solutions have been explored so far and which research directions are worth exploring in the future, and for practitioners to select the DNN architecture(s) that better fit the resource constraints of practical deployments and applications. To complete this work, all the DNNs, as well as the software used for the analysis, are available online.

626 citations

Journal Article
TL;DR: A new dataset of acceleration samples, acquired with an Android smartphone and designed for human activity recognition and fall detection, is presented; the evaluation shows that the presence of samples of the same subject in both the training and the test datasets increases the performance of the classifiers regardless of the feature vector used.
Abstract: Smartphones, smartwatches, fitness trackers, and ad-hoc wearable devices are being increasingly used to monitor human activities. Data acquired by the hosted sensors are usually processed by machine-learning-based algorithms to classify human activities. The success of those algorithms mostly depends on the availability of training (labeled) data that, if made publicly available, would allow researchers to make objective comparisons between techniques. Nowadays, there are only a few publicly available datasets, which often contain samples from subjects with overly similar characteristics and very often lack specific information, so that it is not possible to select subsets of samples according to specific criteria. In this article, we present a new dataset of acceleration samples acquired with an Android smartphone designed for human activity recognition and fall detection. The dataset includes 11,771 samples of both human activities and falls performed by 30 subjects of ages ranging from 18 to 60 years. Samples are divided into 17 fine-grained classes grouped into two coarse-grained classes: one containing samples of 9 types of activities of daily living (ADL) and the other containing samples of 8 types of falls. The dataset has been stored to include all the information useful to select samples according to different criteria, such as the type of ADL performed, the age, the gender, and so on. Finally, the dataset has been benchmarked with four different classifiers and with two different feature vectors. We evaluated four different classification tasks: fall vs. no fall, 9 activities, 8 falls, 17 activities and falls. For each classification task, we performed a 5-fold cross-validation (i.e., including samples from all the subjects in both the training and the test dataset) and a leave-one-subject-out cross-validation (i.e., the test data include the samples of one subject only, and the training data include the samples of all the other subjects).
Regarding the classification tasks, the major findings can be summarized as follows: (i) it is quite easy to distinguish between falls and ADLs, regardless of the classifier and the feature vector selected. Indeed, these classes of activities present quite different acceleration shapes that simplify the recognition task; (ii) on average, it is more difficult to distinguish between types of falls than between types of activities, regardless of the classifier and the feature vector selected. This is due to the similarity between the acceleration shapes of different kinds of falls. On the contrary, ADL acceleration shapes present differences except for a small group. Finally, the evaluation shows that the presence of samples of the same subject in both the training and the test datasets increases the performance of the classifiers regardless of the feature vector used. This happens because each human subject differs from other subjects in performing activities even if they share the same physical characteristics.
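The difference between the two evaluation protocols described in this abstract can be illustrated with a minimal sketch. The function names, the toy subject labels, and the round-robin fold assignment are assumptions made for illustration; the abstract does not say how the 5 folds were drawn, and this is not the authors' benchmark code.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    for subject in sorted(by_subject):
        test = by_subject[subject]
        # Training data: every sample that does not belong to the held-out subject.
        train = sorted(
            i for other, indices in by_subject.items() if other != subject for i in indices
        )
        yield subject, train, test


def k_fold(n_samples, k):
    """Yield (train_indices, test_indices) for k folds, ignoring subject identity.

    Samples are assigned to folds round-robin (an assumption; real setups
    usually shuffle first).
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


# Toy data: six samples recorded from three hypothetical subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
loso_splits = list(leave_one_subject_out(subject_ids))
kfold_splits = list(k_fold(len(subject_ids), 3))
```

In the subject-agnostic folds, a test subject's other samples typically end up in the training set, which is the condition the abstract associates with higher classifier performance.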

352 citations

Journal Article
TL;DR: The best proposal, named DeepBIQ, estimates the image quality by average-pooling the scores predicted on multiple subregions of the original image, having a linear correlation coefficient with human subjective scores of almost 0.91.
Abstract: In this work, we investigate the use of deep learning for distortion-generic blind image quality assessment. We report on different design choices, ranging from the use of features extracted from pre-trained convolutional neural networks (CNNs) as a generic image description, to the use of features extracted from a CNN fine-tuned for the image quality task. Our best proposal, named DeepBIQ, estimates the image quality by average-pooling the scores predicted on multiple subregions of the original image. Experimental results on the LIVE In the Wild Image Quality Challenge Database show that DeepBIQ outperforms the state-of-the-art methods compared, having a linear correlation coefficient with human subjective scores of almost 0.91. These results are further confirmed on four benchmark databases of synthetically distorted images: LIVE, CSIQ, TID2008, and TID2013.
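The two quantities named in this abstract, average-pooling of per-subregion scores and the linear (Pearson) correlation with subjective scores, can be sketched as follows. This is only an illustration under stated assumptions: the grid cropping scheme, the function names, and the `mean_intensity` scorer (a stand-in for the fine-tuned CNN) are hypothetical and are not taken from the paper.

```python
import math


def crop_subregions(image, size, stride):
    """Return all size x size crops of a 2-D list-of-lists image on a regular grid."""
    height, width = len(image), len(image[0])
    crops = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            crops.append([row[left:left + size] for row in image[top:top + size]])
    return crops


def image_quality(image, score_patch, size, stride):
    """Average-pool the scores predicted on each subregion into one image-level score."""
    scores = [score_patch(crop) for crop in crop_subregions(image, size, stride)]
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Linear correlation coefficient between predicted and subjective scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def mean_intensity(patch):
    """Hypothetical stand-in for a CNN that predicts a quality score per patch."""
    values = [value for row in patch for value in row]
    return sum(values) / len(values)


toy_image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
pooled_score = image_quality(toy_image, mean_intensity, size=2, stride=2)
```

With real data, `pearson` would be applied to the pooled predictions and the human scores across a whole test set.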

254 citations

Journal Article
12 Jan 2018-Sensors
TL;DR: A region-based method for the detection and localization of anomalies in SEM images, based on Convolutional Neural Networks (CNNs) and self-similarity, which outperforms the state of the art.
Abstract: Automatic detection and localization of anomalies in nanofibrous materials help to reduce the cost of the production process and the time of the post-production visual inspection process. Amongst all the monitoring methods, those exploiting Scanning Electron Microscope (SEM) imaging are the most effective. In this paper, we propose a region-based method for the detection and localization of anomalies in SEM images, based on Convolutional Neural Networks (CNNs) and self-similarity. The method evaluates the degree of abnormality of each subregion of an image under consideration by computing a CNN-based visual similarity with respect to a dictionary of anomaly-free subregions belonging to a training set. The proposed method outperforms the state of the art.
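The idea of scoring each subregion against a dictionary of anomaly-free subregions can be sketched as below. The abstract only states that a CNN-based visual similarity is used; the nearest-neighbour Euclidean distance, the fixed threshold, and the toy two-dimensional feature vectors here are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abnormality(feature, dictionary):
    """Degree of abnormality: distance to the closest anomaly-free dictionary entry."""
    return min(euclidean(feature, entry) for entry in dictionary)


def flag_anomalies(features, dictionary, threshold):
    """Mark each subregion whose abnormality exceeds the (assumed) threshold."""
    return [abnormality(feature, dictionary) > threshold for feature in features]


# Hypothetical feature vectors; in practice these would come from a CNN.
dictionary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
test_features = [[0.1, 0.0], [3.0, 4.0]]
flags = flag_anomalies(test_features, dictionary, threshold=1.0)
```

Because each subregion is scored separately, the flagged entries also indicate where in the image the anomaly lies.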

218 citations

Journal Article
TL;DR: A new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications, benchmarked with an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class.
Abstract: We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1027 canteen trays for a total of 3616 food instances belonging to 73 food classes. The food on the tray images has been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class. We have experimented with three different classification strategies, also using several visual descriptors. We achieve about 79% food and tray recognition accuracy using convolutional-neural-network-based features. The dataset, as well as the benchmark framework, is available to the research community.

180 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Covers probability distributions and linear models for regression and classification, along with neural networks, kernel methods, graphical models, and combining models, in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudo code. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

01 Jan 2006

3,012 citations

Journal Article
TL;DR: The challenges of using deep learning for remote-sensing data analysis are analyzed, recent advances are reviewed, and resources are provided that the authors hope will make deep learning in remote sensing seem ridiculously simple.
Abstract: Central to the looming paradigm shift toward data-intensive science, machine-learning techniques are becoming increasingly important. In particular, deep learning has proven to be both a major breakthrough and an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a black-box solution? These are controversial issues within the remote-sensing community. In this article, we analyze the challenges of using deep learning for remote-sensing data analysis, review recent advances, and provide resources we hope will make deep learning in remote sensing seem ridiculously simple. More importantly, we encourage remote-sensing scientists to bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges, such as climate change and urbanization.

2,095 citations

Proceedings Article
01 Jan 1989
TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.

2,000 citations