Relative-Error $CUR$ Matrix Decompositions
TL;DR: Subspace sampling, introduced by the authors, is a sampling method for low-rank matrix decompositions that comes with relative-error guarantees; previously, it was not even known whether such decompositions exist in general.

Abstract:
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, a matrix $C$ of columns of $A$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial-time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple, and they run in time of the order needed to approximately compute the top $k$ singular vectors of $A$.
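As a concrete illustration of the first algorithm, the sketch below (ours, not the paper's code; the name `cx_approx` and its parameters are hypothetical) samples columns with probabilities derived from the top-$k$ right singular vectors and forms $A' = CC^+A$ with NumPy:

```python
import numpy as np

def cx_approx(A, k, c, rng=None):
    """Sketch of a column-based low-rank approximation A' = C C^+ A.

    Columns are sampled with probabilities proportional to the squared
    row norms of the top-k right singular vectors of A (a simplified
    form of the paper's subspace sampling, without rescaling).
    """
    rng = np.random.default_rng(rng)
    # Top-k right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                        # n x k, orthonormal columns
    # Subspace-sampling probabilities; they sum to 1 since Vk is orthonormal.
    p = (Vk ** 2).sum(axis=1) / k
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx]                        # c actual columns of A
    # A' = C C^+ A, with C^+ the Moore-Penrose pseudoinverse.
    return C @ (np.linalg.pinv(C) @ A)

A = np.random.default_rng(0).standard_normal((50, 30))
A_approx = cx_approx(A, k=5, c=20, rng=1)
```

Since $CC^+$ is an orthogonal projection, the approximation error can never exceed $\|A\|_F$, and when $A$ has exact rank $k$ and the sampled columns span its column space, the reconstruction is exact.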
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
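The CUR construction can be sketched in the same spirit. In the hypothetical helper below (our names, not the paper's), columns and rows are drawn by subspace sampling from the top-$k$ right and left singular vectors respectively; for the middle matrix we use $U = C^+AR^+$, which is the Frobenius-optimal choice for fixed $C$ and $R$, rather than the paper's generalized inverse of the sampled intersection:

```python
import numpy as np

def cur_approx(A, k, c, r, rng=None):
    """Sketch of a CUR decomposition A ~ C U R (not the paper's exact
    algorithm: no rescaling, and U = C^+ A R^+ instead of a generalized
    inverse of the C/R intersection)."""
    rng = np.random.default_rng(rng)
    U_sv, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Subspace-sampling probabilities for columns and rows.
    p_col = (Vt[:k] ** 2).sum(axis=0) / k
    p_row = (U_sv[:, :k] ** 2).sum(axis=1) / k
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=True, p=p_row)
    C, R = A[:, cols], A[rows, :]        # actual columns and rows of A
    # Frobenius-optimal middle matrix for this C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

A = np.random.default_rng(0).standard_normal((50, 30))
C, U, R = cur_approx(A, k=5, c=20, r=20, rng=1)
```

Because $C$ and $R$ are actual columns and rows of the data, the factors inherit the sparsity and interpretability of the original features, which is the point of the decomposition.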
Citations
Journal Article
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
TL;DR: This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
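The core primitive in that randomized framework can be sketched in a few lines: multiply $A$ by a random test matrix to sample its range, orthonormalize, and compute the SVD of the small compressed matrix. This is our own minimal sketch (function and parameter names are assumptions, not the survey's code):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Minimal sketch of a randomized rank-k SVD: a Gaussian test
    matrix samples the range of A, QR gives an orthonormal basis Q,
    and the SVD of the small matrix Q^T A is lifted back."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # basis for the sampled range
    B = Q.T @ A                          # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

M = np.random.default_rng(0).standard_normal((60, 5)) \
    @ np.random.default_rng(1).standard_normal((5, 40))
U, s, Vt = randomized_svd(M, k=5, rng=2)
```

When $A$ has exact rank $k$, the sampled range almost surely captures the full column space and the factorization is exact up to floating-point error; oversampling controls the accuracy in the general case.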
Journal Article
Hybrid Whale Optimization Algorithm with Simulated Annealing for Feature Selection
Majdi Mafarja, Seyedali Mirjalili +1 more
TL;DR: The experimental results confirm the efficiency of the proposed approaches in improving the classification accuracy compared to other wrapper-based algorithms, which insures the ability of WOA algorithm in searching the feature space and selecting the most informative attributes for classification tasks.
Journal Article
CUR matrix decompositions for improved data analysis
TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
Posted Content
Randomized algorithms for matrices and data
TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Book
Sketching as a Tool for Numerical Linear Algebra
TL;DR: A survey of linear sketching algorithms for numerical linear algebra, covering least squares as well as robust regression problems, low-rank approximation, and graph sparsification.
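The sketch-and-solve idea for least squares, one of the central examples in that line of work, fits in a few lines: compress the tall problem with a random sketching matrix and solve the small problem instead. A minimal illustration of ours (a dense Gaussian sketch for simplicity; fast structured transforms are used in practice, and the names here are hypothetical):

```python
import numpy as np

def sketched_lstsq(A, b, s, rng=None):
    """Sketch-and-solve least squares: solve min ||S A x - S b||_2 for
    a random s x m Gaussian sketch S, giving an approximate solution
    to min ||A x - b||_2 at the cost of an s x n problem."""
    rng = np.random.default_rng(rng)
    S = rng.standard_normal((s, A.shape[0])) / np.sqrt(s)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = A @ np.arange(10.0) + rng.standard_normal(200)
x_sk = sketched_lstsq(A, b, s=100, rng=1)
```

With a sketch size modestly larger than the number of unknowns, the sketched solution's residual is within a small factor of the optimal residual with high probability.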
References
Book
Matrix Analysis
Roger A. Horn, Charles R. Johnson +1 more
TL;DR: In this book, the authors present both classic and recent results of matrix analysis using canonical forms as a unifying theme, and demonstrate their importance in a variety of applications of linear algebra and matrix theory.
Journal Article
An introduction to variable and feature selection
Isabelle Guyon, André Elisseeff +1 more
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Journal Article
A haplotype map of the human genome
John W. Belmont, Andrew Boudreau, Suzanne M. Leal, Paul Hardenbol +232 more (The International HapMap Consortium)
TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Book
Generalized inverses: theory and applications
Adi Ben-Israel, T. N. E. Greville +1 more
TL;DR: In this book, the Moore-Penrose inverse is described as a generalized inverse of a linear operator between Hilbert spaces, and a spectral theory for rectangular matrices is proposed.