Author

Jorge Cadima

Bio: Jorge Cadima is an academic researcher from Instituto Superior de Agronomia. The author has contributed to research in the topics of Principal component analysis and Germination. The author has an h-index of 11 and has co-authored 21 publications receiving 2,702 citations. Previous affiliations of Jorge Cadima include Technical University of Lisbon and University of Lisbon.

Papers
Journal ArticleDOI
TL;DR: Introduces the basic ideas of PCA, discusses what the technique can and cannot do, and describes variants of PCA tailored to different data types and structures.
Abstract: Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori , hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.

4,289 citations
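The abstract describes PCA as reducing to an eigenvalue/eigenvector problem on uncorrelated, variance-maximizing new variables. A minimal NumPy sketch of that computation on synthetic data (illustrative only, not the paper's own code):

```python
import numpy as np

# Toy data: 200 observations of 5 correlated variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

Xc = X - X.mean(axis=0)                 # column-centre the data
C = np.cov(Xc, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # symmetric eigenproblem
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                   # principal component scores
```

The scores are uncorrelated by construction, and their variances are exactly the (decreasing) eigenvalues, which is what makes the first few components an optimal low-dimensional summary.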

Journal ArticleDOI
TL;DR: Several algorithms for the optimization problems resulting from three different criteria in the context of principal components analysis are considered, and computational results are presented.

146 citations

Journal ArticleDOI
TL;DR: Considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few principal components, stressing dimensionality reduction in terms of the original variables rather than derived ones.
Abstract: Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.

95 citations
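The abstract's idea of scoring variable subsets by how well they approximate the full set can be sketched as follows. The indicator below (fraction of total centred variance reproduced by regressing all variables on the subset) is an illustrative choice, not necessarily either of the two criteria the paper defines, and the exhaustive search stands in for the paper's stepwise algorithms:

```python
import numpy as np
from itertools import combinations

def subset_fit(X, cols):
    """Illustrative subset-quality indicator: fraction of total centred
    variance reproduced when every variable is regressed on the subset."""
    Xc = X - X.mean(axis=0)
    S = Xc[:, list(cols)]
    coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)  # project all variables onto span(S)
    fitted = S @ coef
    return np.sum(fitted**2) / np.sum(Xc**2)

def best_subset(X, k):
    """Exhaustive search over all k-variable subsets (stepwise selection,
    as in the paper, scales better for large p)."""
    p = X.shape[1]
    return max(combinations(range(p), k), key=lambda c: subset_fit(X, c))
```

With two nearly identical variables and one independent one, the best two-variable subset keeps one copy of the duplicated information plus the independent variable, rather than both near-duplicates.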

01 Jan 2009
TL;DR: Explores the relationships between the results of standard, column-centred PCA and its uncentred counterpart, finding that the two types of analysis have more in common than might be supposed.
Abstract: Principal component analysis (PCA) can be seen as a singular value decomposition (SVD) of a column-centred data matrix. In a number of applications, no pre-processing of the data is carried out, and it is the uncentred data matrix that is subjected to an SVD, in what is often called an uncentred PCA. This paper explores the relationships between the results from both the standard, column-centred, PCA, and its uncentred counterpart. In particular, it obtains both exact results and bounds relating the eigenvalues and eigenvectors of the covariation matrices, as well as the principal components, in both types of analysis. These relationships highlight how the eigenvalues of both the covariance matrix and the matrix of non-central second moments contain much information that is highly informative for a comparative assessment of PCA and its uncentred variant. The relations and the examples also suggest that the results of both types of PCA have more in common than might be supposed.

55 citations
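The two analyses compared in the abstract differ only in whether the data matrix is column-centred before the SVD. A short sketch of both on synthetic data with a nonzero mean (an illustration of the setup, not the paper's bounds or exact results):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, size=(100, 4))   # data with a clearly nonzero mean

# Standard PCA: SVD of the column-centred matrix
Xc = X - X.mean(axis=0)
_, s_c, _ = np.linalg.svd(Xc, full_matrices=False)

# "Uncentred PCA": SVD of the raw data matrix
_, s_u, _ = np.linalg.svd(X, full_matrices=False)

cov_eigs = s_c**2 / (X.shape[0] - 1)  # eigenvalues of the covariance matrix
ncm_eigs = s_u**2 / X.shape[0]        # eigenvalues of the non-central second-moment matrix
```

When the mean is large relative to the spread, the leading uncentred eigenvalue is inflated by the mean term, which is one reason the two sets of eigenvalues are jointly informative for comparing the analyses.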

Journal ArticleDOI
TL;DR: In this article, the separation of morphometric variation into a component related to size and other components associated with shape is discussed, and a new technique is proposed within the principal component analysis (PCA) class.
Abstract: The separation of morphometric variation into a component related to size and other components associated with shape is of considerable interest and has generated much discussion in the literature. One class of approaches to achieving this separation is based on principal component analysis. A new technique is proposed within this class, which overcomes some of the disadvantages of existing approaches.

42 citations
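One well-known member of the PCA-based class of size/shape methods the abstract refers to projects out an isometric "size" axis (equal loadings on all measurements) before extracting components. The sketch below illustrates that general idea; it is an assumption for illustration, not necessarily the new technique the paper proposes:

```python
import numpy as np

def shape_pcs(X):
    """Sketch: remove an isometric size direction before PCA so the
    remaining components reflect shape variation (illustrative only,
    not the paper's proposed technique)."""
    Xc = X - X.mean(axis=0)
    p = X.shape[1]
    size = np.ones(p) / np.sqrt(p)          # isometric size axis: equal loadings
    Xs = Xc - np.outer(Xc @ size, size)     # project out the size component
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    return eigvals[::-1], eigvecs[:, ::-1]  # "shape" components, decreasing variance
```

By construction, every resulting component is orthogonal to the size axis, and no residual variance remains along it.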


Cited by
Book
03 May 2007
TL;DR: A book on statistical methods for ecological data, covering techniques from linear regression and generalised linear models through mixed modelling, ordination, and time series analysis, illustrated with a wide range of ecological case studies.
Abstract: Introduction.- Data management and software.- Advice for teachers.- Exploration.- Linear regression.- Generalised linear modelling.- Additive and generalised additive modelling.- Introduction to mixed modelling.- Univariate tree models.- Measures of association.- Ordination--first encounter.- Principal component analysis and redundancy analysis.- Correspondence analysis and canonical correspondence analysis.- Introduction to discriminant analysis.- Principal coordinate analysis and non-metric multidimensional scaling.- Time series analysis--Introduction.- Common trends and sudden changes.- Analysis and modelling lattice data.- Spatially continuous data analysis and modelling.- Univariate methods to analyse abundance of decapod larvae.- Analysing presence and absence data for flatfish distribution in the Tagus estuary, Portugal.- Crop pollination by honeybees in an Argentinean pampas system using additive mixed modelling.- Investigating the effects of rice farming on aquatic birds with mixed modelling.- Classification trees and radar detection of birds for North Sea wind farms.- Fish stock identification through neural network analysis of parasite fauna.- Monitoring for change: using generalised least squares, nonmetric multidimensional scaling, and the Mantel test on western Montana grasslands.- Univariate and multivariate analysis applied on a Dutch sandy beach community.- Multivariate analyses of South-American zoobenthic species--spoilt for choice.- Principal component analysis applied to harbour porpoise fatty acid data.- Multivariate analysis of morphometric turtle data--size and shape.- Redundancy analysis and additive modelling applied on savanna tree data.- Canonical correspondence analysis of lowland pasture vegetation in the humid tropics of Mexico.- Estimating common trends in Portuguese fisheries landings.- Common trends in demersal communities on the Newfoundland-Labrador Shelf.- Sea level change and salt marshes in the Wadden Sea: a time series analysis.- Time series analysis of Hawaiian waterbirds.- Spatial modelling of forest community features in the Volzhsko-Kamsky reserve.

1,788 citations

Journal ArticleDOI
TL;DR: Presents glmulti, an R package for automated model selection and multi-model inference with glm and related functions; the package is optimized for large candidate sets by avoiding memory limitations, facilitating parallelization, and providing, in addition to exhaustive screening, a compiled genetic algorithm method.
Abstract: We introduce glmulti, an R package for automated model selection and multi-model inference with glm and related functions. From a list of explanatory variables, the provided function glmulti builds all possible unique models involving these variables and, optionally, their pairwise interactions. Restrictions can be specified for candidate models, by excluding specific terms, enforcing marginality, or controlling model complexity. Models are fitted with standard R functions like glm. The n best models and their support (e.g., (Q)AIC, (Q)AICc, or BIC) are returned, allowing model selection and multi-model inference through standard R functions. The package is optimized for large candidate sets by avoiding memory limitation, facilitating parallelization and providing, in addition to exhaustive screening, a compiled genetic algorithm method. This article briefly presents the statistical framework and introduces the package, with applications to simulated and real data.

962 citations
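glmulti is an R package, but the exhaustive-screening idea in the abstract (build every candidate model, fit it, rank by an information criterion) can be sketched in Python for ordinary least squares. This is a toy illustration of the concept, not the package's algorithm or API:

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """Gaussian OLS AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (k + 1)   # +1 for the error variance

def exhaustive_screen(X, y, names):
    """Fit every non-empty subset of predictors (plus intercept), rank by AIC."""
    n, p = X.shape
    results = []
    for r in range(1, p + 1):
        for cols in itertools.combinations(range(p), r):
            Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
            results.append((ols_aic(Xs, y), tuple(names[c] for c in cols)))
    return sorted(results)
```

The number of candidate models grows as 2^p, which is why the package also offers a genetic algorithm for large candidate sets instead of exhaustive screening.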

Journal ArticleDOI
TL;DR: Introduces a modified principal component technique that borrows the LASSO idea from multiple regression to constrain loadings, driving some coefficients to exactly zero and making the resulting components easier to interpret.
Abstract: In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by “principal variables.” This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the “least absolute shrinkage and selection operator.”

841 citations
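The contrast the abstract draws is between ad hoc thresholding of loadings (which can mislead) and building sparsity into the derivation itself. A crude way to illustrate the latter is power iteration on the covariance matrix with soft-thresholding of the loadings at each step; this is a sketch of the general idea only, not the paper's actual method:

```python
import numpy as np

def sparse_leading_pc(C, penalty, iters=200):
    """Crude illustration of LASSO-style sparsity in a leading component:
    power iteration with soft-thresholding of the loadings
    (a sketch of the idea only, not the paper's technique)."""
    v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])
    for _ in range(iters):
        v = C @ v
        v = np.sign(v) * np.maximum(np.abs(v) - penalty, 0.0)  # soft-threshold
        norm = np.linalg.norm(v)
        if norm == 0.0:
            break
        v /= norm
    return v
```

On data where two variables carry almost all the variance, the penalized component puts exactly zero weight on the remaining variables, so no after-the-fact thresholding of small coefficients is needed.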

Journal ArticleDOI
TL;DR: A brief overview of text classification algorithms, covering text feature extraction, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods, along with the limitations of each technique and their application to real-world problems.
Abstract: In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluations methods. Finally, the limitations of each technique and their application in real-world problems are discussed.

624 citations