
Showing papers published in 2011 by the French Institute for Research in Computer Science and Automation (Inria)


Journal ArticleDOI
TL;DR: In this article, the authors show how to improve the performance of NumPy-based numerical computations through vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
Abstract: In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.

9,149 citations


Journal ArticleDOI
TL;DR: As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
Abstract: In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts. We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
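The three techniques are easy to illustrate. The following minimal sketch (the arrays and coefficients are invented for the example, not taken from the paper) evaluates a polynomial over a million points using vectorization, in-place operations to avoid copies, and Horner's rule to reduce the operation count:

```python
import numpy as np

# Illustrative only: evaluate y = a*x**2 + b*x + c over one million points.
x = np.linspace(0.0, 1.0, 1_000_000)
a, b, c = 2.0, -3.0, 0.5

# 1. Vectorize: one array expression instead of a Python-level loop.
y = a * x**2 + b * x + c

# 2. Avoid copies: reuse a preallocated buffer and operate in place.
out = np.empty_like(x)
np.multiply(x, x, out=out)   # out = x*x without an extra temporary
out *= a                     # in-place scaling
out += b * x
out += c

# 3. Minimize operation counts: Horner's rule uses fewer multiplications.
y_horner = (a * x + b) * x + c

assert np.allclose(y, out) and np.allclose(y, y_horner)
```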

5,307 citations


Journal ArticleDOI
TL;DR: This paper introduces a product quantization-based approach for approximate nearest neighbor search that decomposes the space into a Cartesian product of low-dimensional subspaces and quantizes each subspace separately.
Abstract: This paper introduces a product quantization-based approach for approximate nearest neighbor search. The idea is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. A vector is represented by a short code composed of its subspace quantization indices. The Euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code. Experimental results show that our approach searches for nearest neighbors efficiently, in particular in combination with an inverted file system. Results for SIFT and GIST image descriptors show excellent search accuracy, outperforming three state-of-the-art approaches. The scalability of our approach is validated on a data set of two billion vectors.
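A toy sketch of the encoding and asymmetric distance computation steps is shown below (the dimensions, codebook sizes and the use of scipy's k-means are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
d, m, k = 16, 4, 32            # vector dimension, number of subspaces, centroids per subspace
ds = d // m                    # dimension of each subspace

train = rng.standard_normal((2000, d))
base = rng.standard_normal((500, d))
query = rng.standard_normal(d)

# Learn one codebook per subspace and encode each database vector as m small indices.
codebooks = []
codes = np.empty((len(base), m), dtype=np.uint8)
for j in range(m):
    sl = slice(j * ds, (j + 1) * ds)
    centroids, _ = kmeans2(train[:, sl], k, minit="points")
    codebooks.append(centroids)
    dists = ((base[:, None, sl] - centroids[None]) ** 2).sum(-1)
    codes[:, j] = dists.argmin(1)          # nearest centroid index in this subspace

# Asymmetric distance computation: the uncompressed query is compared to codes
# through a per-subspace lookup table of squared distances to the centroids.
lut = np.stack([((query[j * ds:(j + 1) * ds] - codebooks[j]) ** 2).sum(-1)
                for j in range(m)])        # shape (m, k)
approx_dist = lut[np.arange(m), codes].sum(1)
print("approximate nearest neighbor:", approx_dist.argmin())
```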

2,559 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work introduces a novel descriptor based on motion boundary histograms, which is robust to camera motion and consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos.
Abstract: Feature trajectories have been shown to be effective for representing videos. Typically, they are extracted using the KLT tracker or by matching SIFT descriptors between frames. However, the quality as well as the quantity of these trajectories is often not sufficient. Inspired by the recent success of dense sampling in image classification, we propose an approach to describe videos by dense trajectories. We sample dense points from each frame and track them based on displacement information from a dense optical flow field. Given a state-of-the-art optical flow algorithm, our trajectories are robust to fast irregular motions as well as shot boundaries. Additionally, dense trajectories cover the motion information in videos well. We also investigate how to design descriptors to encode the trajectory information. We introduce a novel descriptor based on motion boundary histograms, which is robust to camera motion. This descriptor consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos. We evaluate our video description in the context of action classification with a bag-of-features approach. Experimental results show a significant improvement over the state of the art on four datasets of varying difficulty, namely KTH, YouTube, Hollywood2 and UCF Sports.
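The key property of the motion boundary histogram is that it is built from spatial derivatives of the optical flow, so locally constant (camera) motion contributes little. A minimal sketch, assuming a precomputed dense flow field and illustrative parameters rather than the configuration used in the paper:

```python
import numpy as np

def mbh_histogram(flow, n_bins=8):
    """Toy MBH over a whole patch; `flow` has shape (H, W, 2)."""
    hists = []
    for c in range(2):                       # horizontal (MBHx) and vertical (MBHy) flow
        gy, gx = np.gradient(flow[..., c])   # spatial derivatives of the flow component
        mag = np.hypot(gx, gy)               # constant camera motion has near-zero gradient
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
        hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
        hists.append(hist / (hist.sum() + 1e-12))
    return np.concatenate(hists)             # MBHx and MBHy concatenated

flow = np.random.randn(64, 64, 2).astype(np.float32)   # stand-in for real optical flow
print(mbh_histogram(flow))
```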

2,383 citations


Journal ArticleDOI
TL;DR: This survey focuses on approaches that aim at the classification of full-body motions, such as kicking, punching, and waving, and categorizes them according to how they represent the spatial and temporal structure of actions.

1,058 citations


Book ChapterDOI
TL;DR: Based on an analysis of the current landscape of smart city pilot programmes, Future Internet experimentally-driven research, and projects in the domain of Living Labs, common research and innovation resources can be identified that can be shared in open innovation environments.
Abstract: Cities nowadays face complex challenges to meet objectives regarding socio-economic development and quality of life. The concept of "smart cities" is a response to these challenges. This paper explores "smart cities" as environments of open and user-driven innovation for experimenting and validating Future Internet-enabled services. Based on an analysis of the current landscape of smart city pilot programmes, Future Internet experimentally-driven research and projects in the domain of Living Labs, common resources regarding research and innovation can be identified that can be shared in open innovation environments. Effectively sharing these common resources for the purpose of establishing urban and regional innovation ecosystems requires sustainable partnerships and cooperation strategies among the main stakeholders.

1,007 citations


Posted Content
TL;DR: In this article, the authors present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties, including proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions.
Abstract: Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
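As a concrete instance of the proximal methods covered in the paper, the sketch below applies the proximal-gradient (ISTA) iteration to an l1-regularized least-squares problem; the data, regularization weight and iteration count are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=200):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by proximal gradient steps."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                  # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50); w_true[:5] = 3.0
y = X @ w_true + 0.1 * rng.standard_normal(100)
print(np.nonzero(ista(X, y, lam=5.0))[0])         # mostly recovers the first 5 coordinates
```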

776 citations


Book
23 Dec 2011
TL;DR: This monograph covers proximal methods, block-coordinate descent, reweighted l2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments to compare various algorithms from a computational point of view.
Abstract: Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate nonsmooth norms. The goal of this monograph is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted l2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.

775 citations


Journal ArticleDOI
TL;DR: The Assemblathon 1 competition is described, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies, and it is established that it is possible to assemble the genome to a high level of coverage and accuracy.
Abstract: Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) it is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/

548 citations


Journal ArticleDOI
01 Mar 2011
TL;DR: Mayavi is a general-purpose, open source 3D scientific visualization package that is tightly integrated with the rich ecosystem of Python scientific packages, providing a continuum of tools for developing scientific applications, ranging from interactive and script-based data visualization in Python to full-blown custom end-user applications.
Abstract: Mayavi is a general purpose, open source 3D scientific visualization package that is tightly integrated with the rich ecosystem of Python scientific packages. Mayavi provides a continuum of tools for developing scientific applications, ranging from interactive and script-based data visualization in Python to full-blown custom end-user applications.
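A minimal usage sketch of the mlab scripting interface (assuming Mayavi and a working GUI backend are installed; the data is invented):

```python
import numpy as np
from mayavi import mlab

t = np.linspace(0, 4 * np.pi, 200)
x, y, z = np.sin(t), np.cos(t), 0.1 * t

mlab.plot3d(x, y, z, t, tube_radius=0.05)   # a 3D helix, colored by the parameter t
mlab.show()
```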

520 citations


Journal ArticleDOI
TL;DR: This article identifies requirements for the next generation of IoT experimental facilities, provides a taxonomy, surveys currently available research testbeds, identifies existing gaps, and suggests new directions based on experience from recent efforts in this field.
Abstract: The initial vision of the Internet of Things was of a world in which all physical objects are tagged and uniquely identified by RFID transponders. However, the concept has grown into multiple dimensions, encompassing sensor networks able to provide real-world intelligence and goal-oriented collaboration of distributed smart objects via local networks or global interconnections such as the Internet. Despite significant technological advances, difficulties associated with the evaluation of IoT solutions under realistic conditions in real-world experimental deployments still hamper their maturation and significant rollout. In this article we identify requirements for the next generation of IoT experimental facilities. While providing a taxonomy, we also survey currently available research testbeds, identify existing gaps, and suggest new directions based on experience from recent efforts in this field.

Journal ArticleDOI
TL;DR: A large-scale analysis of the Aux/IAA-ARF pathway in the shoot apex of Arabidopsis uncovered an unexpectedly simple distribution and structure of this pathway, providing evidence that the auxin signalling network is essential to create robust patterns at the shoot apex.
Abstract: The plant hormone auxin is thought to provide positional information for patterning during development. It is still unclear, however, precisely how auxin is distributed across tissues and how the hormone is sensed in space and time. The control of gene expression in response to auxin involves a complex network of over 50 potentially interacting transcriptional activators and repressors, the auxin response factors (ARFs) and Aux/IAAs. Here, we perform a large-scale analysis of the Aux/IAA-ARF pathway in the shoot apex of Arabidopsis, where dynamic auxin-based patterning controls organogenesis. A comprehensive expression map and full interactome uncovered an unexpectedly simple distribution and structure of this pathway in the shoot apex. A mathematical model of the Aux/IAA-ARF network predicted a strong buffering capacity along with spatial differences in auxin sensitivity. We then tested and confirmed these predictions using a novel auxin signalling sensor that reports input into the signalling pathway, in conjunction with the published DR5 transcriptional output reporter. Our results provide evidence that the auxin signalling network is essential to create robust patterns at the shoot apex.

Journal ArticleDOI
22 Jan 2011
TL;DR: The most notable initiatives towards whole-application scalability in cloud environments are presented, along with relevant efforts at the edge of state-of-the-art technology, providing an encompassing overview of the trends they each follow.
Abstract: Scalability is said to be one of the major advantages brought by the cloud paradigm and, more specifically, the one that makes it different to an "advanced outsourcing" solution. However, there are some important pending issues before making the dreamed automated scaling for applications come true. In this paper, the most notable initiatives towards whole application scalability in cloud environments are presented. We present relevant efforts at the edge of state of the art technology, providing an encompassing overview of the trends they each follow. We also highlight pending challenges that will likely be addressed in new research efforts and present an ideal scalable cloud system.

Journal ArticleDOI
TL;DR: This paper details the organization of the challenge, the data and evaluation methods, and the outcome of the initial launch, which comprised the comprehensive evaluation and comparison of 20 individual algorithms from leading academic and industrial research groups.
Abstract: EMPIRE10 (Evaluation of Methods for Pulmonary Image REgistration 2010) is a public platform for fair and meaningful comparison of registration algorithms which are applied to a database of intra-patient thoracic CT image pairs. Evaluation of nonrigid registration techniques is a nontrivial task. This is compounded by the fact that researchers typically test only on their own data, which varies widely. For this reason, reliable assessment and comparison of different registration algorithms has been virtually impossible in the past. In this work we present the results of the launch phase of EMPIRE10, which comprised the comprehensive evaluation and comparison of 20 individual algorithms from leading academic and industrial research groups. All algorithms are applied to the same set of 30 thoracic CT pairs. Algorithm settings and parameters are chosen by researchers expert in the configuration of their own method and the evaluation is independent, using the same criteria for all participants. All results are published on the EMPIRE10 website (http://empire10.isi.uu.nl). The challenge remains ongoing and open to new participants. Full results from 24 algorithms have been published at the time of writing. This paper details the organization of the challenge, the data and evaluation methods and the outcome of the initial launch with 20 algorithms. The gain in knowledge and future work are discussed.

Journal ArticleDOI
TL;DR: It is shown that, for any time-invariant exponentially stable linear system with additive disturbances, time-varying exponentially stable interval observers can be constructed.

Journal ArticleDOI
TL;DR: A general-purpose deformable registration algorithm referred to as "DRAMMS" is presented, which extracts Gabor attributes at each voxel and selects the optimal components, so that they form a highly distinctive morphological signature reflecting the anatomical context around each voxel in a multi-scale and multi-resolution fashion.

Journal ArticleDOI
TL;DR: An action descriptor is developed that captures the structure of temporal similarities and dissimilarities within an action sequence and is shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored.
Abstract: This paper addresses recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this temporal self-similarity descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two different instances of the same action class can be explicitly recovered with dynamic time warping, as will be demonstrated, to achieve cross-view action synchronization. More central to the current work, temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features type of approach. Sufficient action discrimination is still retained in this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multiview correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public data sets. It has similar or superior performance compared to related methods and it performs well even in extreme conditions, such as when recognizing actions from top views while using side views only for training.
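The descriptor starts from a temporal self-similarity matrix of per-frame features; a minimal sketch of that first step (the per-frame features and sizes are invented, and the log-polar block descriptor built on top of the matrix in the paper is not reproduced):

```python
import numpy as np

def self_similarity_matrix(frames):
    """frames: (T, d) array, one feature vector per frame; returns the (T, T) SSM."""
    diff = frames[:, None, :] - frames[None, :, :]
    return np.linalg.norm(diff, axis=-1)       # D[i, j] = ||f_i - f_j||

T, d = 60, 10
frames = np.cumsum(np.random.randn(T, d), axis=0)   # toy per-frame features
ssm = self_similarity_matrix(frames)
print(ssm.shape)   # (60, 60); the pattern of this matrix is what stays stable across views
```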

Journal ArticleDOI
TL;DR: A common dataset with known ground truth and a reproducible methodology are used to quantitatively evaluate the performance of various diffusion models and tractography algorithms, providing evidence that diffusion models such as (fiber) orientation distribution functions correctly model the underlying fiber distribution.

Journal ArticleDOI
TL;DR: An approach is proposed for establishing correspondences between two sets of visual features using higher-order constraints instead of the unary or pairwise ones used in classical methods; it is compared to state-of-the-art algorithms on both synthetic and real data.
Abstract: This paper addresses the problem of establishing correspondences between two sets of visual features using higher order constraints instead of the unary or pairwise ones used in classical methods. Concretely, the corresponding hypergraph matching problem is formulated as the maximization of a multilinear objective function over all permutations of the features. This function is defined by a tensor representing the affinity between feature tuples. It is maximized using a generalization of spectral techniques where a relaxed problem is first solved by a multidimensional power method and the solution is then projected onto the closest assignment matrix. The proposed approach has been implemented, and it is compared to state-of-the-art algorithms on both synthetic and real data.
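The relaxation-and-projection steps can be sketched on a tiny synthetic problem: a third-order affinity tensor is built so that triples of correct assignments score highly, a multidimensional power iteration finds the leading relaxed solution, and the result is projected onto an assignment (here with the Hungarian algorithm, one possible choice of projection). All sizes and affinities below are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

n = 5
rng = np.random.default_rng(0)
true_perm = rng.permutation(n)

# Affinity tensor over candidate assignments p = i*n + j (feature i matched to j):
# triples of assignments consistent with the ground-truth permutation get high affinity.
N = n * n
A = 0.01 * rng.random((N, N, N))
good = [i * n + true_perm[i] for i in range(n)]
for p in good:
    for q in good:
        for r in good:
            A[p, q, r] = 1.0

v = np.ones(N) / np.sqrt(N)
for _ in range(30):                              # multidimensional power method
    v = np.einsum('ijk,j,k->i', A, v, v)
    v /= np.linalg.norm(v)

score = v.reshape(n, n)                          # relaxed assignment scores
rows, cols = linear_sum_assignment(-score)       # project onto the closest assignment
print("recovered:", cols, "ground truth:", true_perm)
```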

Book ChapterDOI
18 May 2011
TL;DR: This paper presents a new privacy-preserving smart metering system that is private under the differential privacy model and therefore provides strong and provable guarantees.
Abstract: This paper presents a new privacy-preserving smart metering system. Our scheme is private under the differential privacy model and therefore provides strong and provable guarantees. With our scheme, an (electricity) supplier can periodically collect data from smart meters and derive aggregated statistics without learning anything about the activities of individual households. For example, a supplier cannot tell from a user's trace whether or when he watched TV or turned on heating. Our scheme is simple, efficient and practical. Processing cost is very limited: smart meters only have to add noise to their data and encrypt the results with an efficient stream cipher.
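The core idea can be sketched in a few lines: each meter adds a small gamma-distributed noise share (the shares sum to Laplace noise, which yields differential privacy on the aggregate) and masks its report with a value that cancels out in the sum. The masking below is a stand-in for the stream-cipher encryption of the actual protocol, and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sensitivity, eps = 100, 1.0, 0.5
readings = rng.uniform(0.0, 1.0, n)             # one reporting period of consumption (kWh)

scale = sensitivity / eps
noise_shares = (rng.gamma(1.0 / n, scale, n) -  # Laplace noise is divisible into
                rng.gamma(1.0 / n, scale, n))   # n gamma-distributed shares

masks = rng.standard_normal(n)
masks -= masks.mean()                           # pairwise-cancelling "keys" (toy encryption)

reports = readings + noise_shares + masks       # what each meter would send
estimate = reports.sum()                        # masks cancel; total noise ~ Laplace(scale)
print("true total:", round(readings.sum(), 2), "private estimate:", round(estimate, 2))
```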

Journal ArticleDOI
TL;DR: This work proposes effective pursuit methods that aim to solve inverse problems regularized with the analysis-model prior, accompanied by a preliminary theoretical study of their performance.
Abstract: After a decade of extensive study of the sparse representation synthesis model, we can safely say that this is a mature and stable field, with clear theoretical foundations, and appealing applications. Alongside this approach, there is an analysis counterpart model, which, despite its similarity to the synthesis alternative, is markedly different. Surprisingly, the analysis model did not get a similar attention, and its understanding today is shallow and partial. In this paper we take a closer look at the analysis approach, better define it as a generative model for signals, and contrast it with the synthesis one. This work proposes effective pursuit methods that aim to solve inverse problems regularized with the analysis-model prior, accompanied by a preliminary theoretical study of their performance. We demonstrate the effectiveness of the analysis model in several experiments.

Journal ArticleDOI
TL;DR: A new prioritized task-regulation framework based on a sequence of quadratic programs (QPs) is proposed that removes the limitation in handling inequality constraints; it is implemented and illustrated in simulation on the humanoid robot HRP-2.
Abstract: Redundant mechanical systems like humanoid robots are designed to fulfill multiple tasks at a time. A task, in velocity-resolved inverse kinematics, is a desired value for a function of the robot configuration that can be regulated with an ordinary differential equation (ODE). When facing simultaneous tasks, the corresponding equations can be grouped in a single system or, better, sorted in priority and solved each in the solutions set of higher priority tasks. This elegant framework for hierarchical task regulation has been implemented as a sequence of least-squares problems. Its limitation lies in the handling of inequality constraints, which are usually transformed into more restrictive equality constraints through potential fields. In this paper, we propose a new prioritized task-regulation framework based on a sequence of quadratic programs (QP) that removes the limitation. At the basis of the proposed algorithm, there is a study of the optimal sets resulting from the sequence of QPs. The algorithm is implemented and illustrated in simulation on the humanoid robot HRP-2.
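For context, the classical equality-only hierarchy that the paper generalizes can be written as a least-squares cascade in which each task is solved in the null space of the higher-priority ones; the inequality-aware QP sequence that is the paper's contribution is not reproduced here, and the task Jacobians below are random placeholders:

```python
import numpy as np

def prioritized_velocities(tasks, n_dof):
    """tasks: list of (J, e) pairs sorted by decreasing priority; returns joint velocities."""
    dq = np.zeros(n_dof)
    P = np.eye(n_dof)                              # projector onto the remaining free motions
    for J, e in tasks:
        JP = J @ P
        dq += np.linalg.pinv(JP) @ (e - J @ dq)    # solve the task without disturbing higher ones
        P = P @ (np.eye(n_dof) - np.linalg.pinv(JP) @ JP)
    return dq

rng = np.random.default_rng(0)
n_dof = 7
task1 = (rng.standard_normal((3, n_dof)), np.array([0.1, 0.0, -0.2]))   # e.g. hand position
task2 = (rng.standard_normal((2, n_dof)), np.array([0.05, 0.05]))       # lower-priority task
print(prioritized_velocities([task1, task2], n_dof))
```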

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work addresses the problem of person detection and tracking in crowded video scenes by exploring constraints imposed by the crowd density and formulating person detection as the optimization of a joint energy function combining crowd density estimation and the localization of individual people.
Abstract: We address the problem of person detection and tracking in crowded video scenes. While the detection of individual objects has been improved significantly over the recent years, crowd scenes remain particularly challenging for the detection and tracking tasks due to heavy occlusions, high person densities and significant variation in people's appearance. To address these challenges, we propose to leverage information on the global structure of the scene and to resolve all detections jointly. In particular, we explore constraints imposed by the crowd density and formulate person detection as the optimization of a joint energy function combining crowd density estimation and the localization of individual people. We demonstrate how the optimization of such an energy function significantly improves person detection and tracking in crowds. We validate our approach on a challenging video dataset of crowded scenes.

Journal ArticleDOI
15 Apr 2011
TL;DR: This is the first study using ground truth to show that the overly fine granularity of database entries makes their accuracy worse, not better; the accuracy of geolocation databases is quantified on a large European ISP based on ground-truth information.
Abstract: The most widely used technique for IP geolocation consists in building a database to keep the mapping between IP blocks and a geographic location. Several databases are available and are frequently used by many services and web sites in the Internet. Contrary to widespread belief, geolocation databases are far from being as reliable as they claim. In this paper, we conduct a comparison of several current geolocation databases -both commercial and free- to gain insight into the limitations of their usability. First, the vast majority of entries in the databases refer only to a few popular countries (e.g., U.S.). This creates an imbalance in the representation of countries across the IP blocks of the databases. Second, these entries do not reflect the original allocation of IP blocks, nor BGP announcements. In addition, we quantify the accuracy of geolocation databases on a large European ISP based on ground truth information. This is the first study using a ground truth showing that the overly fine granularity of database entries makes their accuracy worse, not better. Geolocation databases can claim country-level accuracy, but certainly not city-level.

Posted Content
TL;DR: Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions, and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning.
Abstract: Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this monograph, we present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions.
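The Lovász extension mentioned in point (2) has a simple closed form based on sorting; a minimal sketch with an invented coverage-type submodular function:

```python
import numpy as np

def lovasz_extension(F, w):
    """Greedy formula: f(w) = sum_k w_(k) * [F(top-k set) - F(top-(k-1) set)], with F(empty) = 0."""
    order = np.argsort(-w)
    f, prev, chosen = 0.0, 0.0, set()
    for idx in order:
        chosen.add(int(idx))
        val = F(chosen)
        f += w[idx] * (val - prev)
        prev = val
    return f

# Example submodular function: number of groups touched by the set S (a coverage function).
groups = [{0, 1}, {2, 3, 4}]
F = lambda S: sum(1 for g in groups if S & g)

w = np.array([0.9, 0.1, 0.5, 0.0, -0.2])
print(lovasz_extension(F, w))   # on 0/1 indicator vectors the extension coincides with F
```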

Journal ArticleDOI
TL;DR: A clearer picture is provided of the frontier between decidability and undecidability of reasoning with positive rules, which have the same logical form as tuple-generating dependencies in databases and as conceptual graph rules.

Proceedings ArticleDOI
28 Mar 2011
TL;DR: YAGO2, an extension of the YAGO knowledge base with a focus on temporal and spatial knowledge, is presented; it is automatically built from Wikipedia, GeoNames, and WordNet, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge.
Abstract: We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and WordNet, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge. An enhanced data representation introduces time and location as first-class citizens. The wealth of spatio-temporal information in YAGO can be explored either graphically or through a special time- and space-aware query language.

Journal ArticleDOI
TL;DR: Self-composition enables the use of standard techniques for information flow policy verification, such as program logics and model checking, that are suitable in Proof Carrying Code infrastructures; its applicability is illustrated in several settings, covering security policies such as non-interference and controlled forms of declassification, and programming languages including an imperative language with parallel composition.
Abstract: Information flow policies are confidentiality policies that control information leakage through program execution. A common way to enforce secure information flow is through information flow type systems. Although type systems are compositional and usually enjoy decidable type checking or inference, their extensibility is very poor: type systems need to be redefined and proved sound for each new variation of security policy and programming language for which secure information flow verification is desired. In contrast, program logics offer a general mechanism for enforcing a variety of safety policies, and for this reason are favoured in Proof Carrying Code, which is a promising security architecture for mobile code. However, the encoding of information flow policies in program logics is not straightforward because they refer to a relation between two program executions. The purpose of this paper is to investigate logical formulations of secure information flow based on the idea of self-composition, which reduces the problem of secure information flow of a program P to a safety property for a program derived from P by composing P with a renaming of itself. Self-composition enables the use of standard techniques for information flow policy verification, such as program logics and model checking, that are suitable in Proof Carrying Code infrastructures. We illustrate the applicability of self-composition in several settings, including different security policies such as non-interference and controlled forms of declassification, and programming languages including an imperative language with parallel composition, a non-deterministic language and, finally, a language with shared mutable data structures.
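The reduction is easy to see on a toy program: non-interference of a function from (high, low) inputs to a low output becomes a safety property of the program composed with a renamed copy of itself, run on equal low inputs. The example program and the brute-force check below are illustrative stand-ins for the program logics and model checking discussed in the paper:

```python
def program(high: int, low: int) -> int:
    # a leaky variant would be: return low + (1 if high > 0 else 0)
    return 2 * low + 1

def self_composed(high1, high2, low):
    out1 = program(high1, low)       # original program
    out2 = program(high2, low)       # renamed copy, same low input, arbitrary high input
    assert out1 == out2, "information flows from high to low"

# Finite brute-force check standing in for a verifier of the composed program.
for h1 in range(-3, 4):
    for h2 in range(-3, 4):
        for lo in range(-3, 4):
            self_composed(h1, h2, lo)
print("non-interference holds on the tested inputs")
```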

Journal ArticleDOI
01 Nov 2011
TL;DR: This work presents paris, an approach for the automatic alignment of ontologies that aligns not only instances but also relations and classes, providing a truly holistic solution to the problem of ontology alignment.
Abstract: One of the main challenges that the Semantic Web faces is the integration of a growing number of independently designed ontologies. In this work, we present paris, an approach for the automatic alignment of ontologies. paris aligns not only instances, but also relations and classes. Alignments at the instance level cross-fertilize with alignments at the schema level. Thereby, our system provides a truly holistic solution to the problem of ontology alignment. The heart of the approach is probabilistic, i.e., we measure degrees of matchings based on probability estimates. This allows paris to run without any parameter tuning. We demonstrate the efficiency of the algorithm and its precision through extensive experiments. In particular, we obtain a precision of around 90% in experiments with some of the world's largest ontologies.

Journal ArticleDOI
TL;DR: A family of objective measures is proposed that aims to predict subjective scores based on the decomposition of the estimation error into several distortion components and on the use of the PEMO-Q perceptual salience measure to provide multiple features that are then combined.
Abstract: We aim to assess the perceived quality of estimated source signals in the context of audio source separation. These signals may involve one or more kinds of distortions, including distortion of the target source, interference from the other sources or musical noise artifacts. We propose a subjective test protocol to assess the perceived quality with respect to each kind of distortion and collect the scores of 20 subjects over 80 sounds. We then propose a family of objective measures aiming to predict these subjective scores based on the decomposition of the estimation error into several distortion components and on the use of the PEMO-Q perceptual salience measure to provide multiple features that are then combined. These measures increase correlation with subjective scores up to 0.5 compared to nonlinear mapping of individual state-of-the-art source separation measures. Finally, we released the data and code presented in this paper in a freely available toolkit called PEASS.
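A much-simplified sketch of the kind of error decomposition these measures build on: the estimated source is split into a target part, an interference part and an artifact part by least-squares projections onto the reference signals (the perceptual PEMO-Q weighting and the learned combination into subjective-score predictors are not reproduced, and the signals are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
s_target = np.sin(2 * np.pi * 0.01 * np.arange(T))     # true source of interest
s_interf = rng.standard_normal(T)                       # competing source
estimate = 0.9 * s_target + 0.2 * s_interf + 0.05 * rng.standard_normal(T)

def project(x, basis):
    # least-squares projection of x onto the span of the rows of `basis`
    coeffs, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
    return basis.T @ coeffs

e_target = project(estimate, s_target[None])                              # wanted component
e_interf = project(estimate, np.vstack([s_target, s_interf])) - e_target  # leakage from other sources
e_artif = estimate - e_target - e_interf                                  # everything else ("musical noise")

def db(x): return 10 * np.log10(np.sum(x ** 2))
print("target-to-interference (dB):", round(db(e_target) - db(e_interf), 1))
print("target-to-artifact     (dB):", round(db(e_target) - db(e_artif), 1))
```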