
Showing papers by "David L. Donoho published in 2019"


Posted Content
TL;DR: This paper analyzes the generalization risk, with its bias and variance components, for recurrent unrolled networks using Stein's Unbiased Risk Estimator (SURE), and proves that the degrees of freedom (DOF) are well approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights.
Abstract: Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared error) and its link to network design and training sample size remain mysterious. Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk, with its bias and variance components, for recurrent unrolled networks. We particularly investigate the degrees-of-freedom (DOF) component of SURE, the trace of the end-to-end network Jacobian, to quantify the prediction variance. We prove that the DOF is well approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights. Empirically, we examine the SURE components as a function of training sample size for both recurrent and non-recurrent (with many more parameters) unrolled networks. Our key observations indicate that: 1) the DOF increases with training sample size and converges to the generalization risk for both recurrent and non-recurrent schemes; 2) the recurrent network converges significantly faster (with fewer training samples) than the non-recurrent scheme, hence recurrence serves as a regularizer in the low-sample-size regime.
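The DOF term referenced here is the divergence of the end-to-end map, i.e. the trace of its Jacobian, which for a black-box reconstruction map is commonly estimated by Monte Carlo probing. Below is a minimal NumPy sketch of that idea, using a soft-thresholding denoiser as a hypothetical stand-in for an unrolled network; the function names, probe count, and test setup are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sure_estimate(f, y, sigma, eps=1e-3, n_probes=8, seed=None):
    """Monte Carlo SURE for a reconstruction map f, using the classical decomposition
    SURE = ||f(y) - y||^2 - n*sigma^2 + 2*sigma^2 * DOF,
    where DOF = trace of the Jacobian of f at y (the divergence term)."""
    rng = np.random.default_rng(seed)
    n = y.size
    fy = f(y)
    residual = np.sum((fy - y) ** 2)
    # Hutchinson-style probing: b^T (f(y + eps*b) - f(y)) / eps ~ b^T J b,
    # whose expectation over unit-covariance probes b is trace(J) = DOF.
    dof = 0.0
    for _ in range(n_probes):
        b = rng.standard_normal(y.shape)
        dof += b.ravel() @ (f(y + eps * b) - fy).ravel() / eps
    dof /= n_probes
    return residual - n * sigma**2 + 2 * sigma**2 * dof, dof

if __name__ == "__main__":
    # Hypothetical test: a sparse signal in Gaussian noise, denoised by soft thresholding.
    rng = np.random.default_rng(0)
    x = np.zeros(1000); x[:20] = 5.0
    sigma = 1.0
    y = x + sigma * rng.standard_normal(1000)
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 2 * sigma, 0.0)
    sure, dof = sure_estimate(soft, y, sigma)
    mse = np.sum((soft(y) - x) ** 2)
    print(f"SURE ~ {sure:.1f}, true squared error = {mse:.1f}, DOF ~ {dof:.1f}")
```

For soft thresholding the estimated DOF should land near the number of coefficients surviving the threshold, which is the classical closed-form divergence for that estimator; for an unrolled network the same probing scheme applies with f replaced by the trained end-to-end map.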

6 citations


Journal ArticleDOI
TL;DR: The ability to recover a sparse object decreases with an increasing number of exhaustively sampled dimensions; novel exact formulas are developed for the sparsity/undersampling tradeoffs in such measurement systems, assuming uniform sparsity fractions in each column.
Abstract: We study anisotropic undersampling schemes like those used in multi-dimensional NMR spectroscopy and MR imaging, which sample exhaustively in certain time dimensions and randomly in others. Our analysis shows that anisotropic undersampling schemes are equivalent to certain block-diagonal measurement systems. We develop novel exact formulas for the sparsity/undersampling tradeoffs in such measurement systems. Our formulas predict finite-N phase transition behavior differing substantially from the well-known asymptotic phase transitions for classical Gaussian undersampling. Extensive empirical work shows that our formulas accurately describe observed finite-N behavior, while the usual formulas based on universality are substantially inaccurate. We also vary the anisotropy, keeping the total number of samples fixed, and for each variation we determine the precise sparsity/undersampling tradeoff (phase transition). We show that, other things being equal, the ability to recover a sparse object decreases with an increasing number of exhaustively sampled dimensions.
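The claimed equivalence between anisotropic undersampling and a block-diagonal measurement system can be illustrated with a small synthetic sketch; the sizes, the Fourier sensing model along the undersampled dimension, and all variable names below are illustrative assumptions rather than details from the paper. Measuring every exhaustively sampled column with the same randomly row-subsampled operator is the same as applying a block-diagonal operator (a Kronecker product with the identity) to the column-stacked object.

```python
import numpy as np

# Hypothetical sizes: N1 is the exhaustively sampled (direct) dimension,
# N2 the randomly undersampled (indirect) dimension, keeping n2 of N2 samples.
N1, N2, n2 = 8, 64, 16
rng = np.random.default_rng(1)

# Random row selection in the indirect dimension, shared by all columns.
rows = np.sort(rng.choice(N2, size=n2, replace=False))
F2 = np.fft.fft(np.eye(N2)) / np.sqrt(N2)   # unitary Fourier matrix along dim 2
A_block = F2[rows, :]                       # one n2 x N2 measurement block

# Anisotropic measurement of a sparse 2-D object X (one column per direct-dimension index):
# exhaustive in dimension 1, undersampled Fourier in dimension 2.
X = np.zeros((N2, N1), dtype=complex)
X[rng.choice(N2, 5, replace=False), rng.choice(N1, 5)] = 1.0
Y = A_block @ X                             # every column measured with the SAME block

# Equivalent block-diagonal view: the full operator is kron(I_{N1}, A_block)
# acting on the column-stacked object.
A_full = np.kron(np.eye(N1), A_block)
assert np.allclose(A_full @ X.flatten(order="F"), Y.flatten(order="F"))
```

Because the blocks are identical and much smaller than a dense Gaussian matrix of the same total size, the finite-N recovery behavior differs from the classical Gaussian phase-transition predictions, which is the regime the paper's formulas address.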

6 citations


Posted Content
TL;DR: This article discusses several painless computing stacks that abstract away the difficulties of massive experimentation, thereby allowing a proliferation of ambitious experiments for scientific discovery.
Abstract: Modern data science research can involve massive computational experimentation; an ambitious PhD in computational fields may do experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops or shared campus-resident resources, are inadequate for experiments at the massive scale and varied scope that we now see in data science. On the other hand, modern cloud computing promises seemingly unlimited computational resources that can be custom configured, and seems to offer a powerful new venue for ambitious data-driven science. Exploiting the cloud fully, the amount of work that could be completed in a fixed amount of time can expand by several orders of magnitude. As potentially powerful as cloud-based experimentation may be in the abstract, it has not yet become a standard option for researchers in many academic disciplines. The prospect of actually conducting massive computational experiments in today's cloud systems confronts the potential user with daunting challenges. Leading considerations include: (i) the seeming complexity of today's cloud computing interface, (ii) the difficulty of executing an overwhelmingly large number of jobs, and (iii) the difficulty of monitoring and combining a massive collection of separate results. Starting a massive experiment 'bare-handed' therefore seems highly problematic and prone to rapid 'researcher burnout'. New software stacks are emerging that render massive cloud experiments relatively painless. Such stacks simplify experimentation by systematizing experiment definition, automating distribution and management of tasks, and allowing easy harvesting of results and documentation. In this article, we discuss several painless computing stacks that abstract away the difficulties of massive experimentation, thereby allowing a proliferation of ambitious experiments for scientific discovery.

5 citations


Journal ArticleDOI
01 Jul 2019
TL;DR: In this article, the authors discuss three such painless computing stacks, CodaLab, PyWren, and ElastiCluster-ClusterJob, which simplify experimentation by systematizing experiment definition, automating distribution and management of all tasks, and allowing easy harvesting of results and documentation.
Abstract: Modern data science research, at the cutting edge, can involve massive computational experimentation; an ambitious PhD in computational fields may conduct experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops, PCs, or campus-resident resources with shared policies, are awkward or inadequate for experiments at the massive scale and varied scope that we now see in the most ambitious data science. On the other hand, modern cloud computing promises seemingly unlimited computational resources that can be custom configured, and seems to offer a powerful new venue for ambitious data-driven science. Exploiting the cloud fully, the amount of raw experimental work that could be completed in a fixed amount of calendar time ought to expand by several orders of magnitude. Still, at the moment, starting a massive experiment using cloud resources from scratch is commonly perceived as cumbersome, problematic, and prone to rapid 'researcher burnout.' New software stacks are emerging that render massive cloud-based experiments relatively painless, thereby allowing a proliferation of ambitious experiments for scientific discovery. Such stacks simplify experimentation by systematizing experiment definition, automating distribution and management of all tasks, and allowing easy harvesting of results and documentation. In this article, we discuss three such painless computing stacks. These include CodaLab, from Percy Liang's lab in Stanford Computer Science; PyWren, developed by Eric Jonas in the RISELab at UC Berkeley; and the ElastiCluster-ClusterJob stack developed at David Donoho's research lab in Stanford Statistics in collaboration with the University of Zurich.
Keywords: ambitious data science, painless computing stacks, cloud computing, experiment management system, massive computational experiments
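As a flavor of how such stacks collapse a massive experiment into a single map over a function, here is a minimal sketch in the style of PyWren's map interface; the experiment function, parameter grid, and returned metric are hypothetical placeholders, and running it requires a configured AWS account.

```python
import pywren

def run_trial(params):
    """One cell of a hypothetical experiment grid (placeholder computation)."""
    n, k = params
    return params, k / n  # stand-in for the metric a real experiment would compute

# Hypothetical parameter grid; each tuple becomes one serverless task.
grid = [(n, k) for n in (100, 1000, 10000) for k in (5, 50, 500)]

pwex = pywren.default_executor()           # executor backed by AWS Lambda
futures = pwex.map(run_trial, grid)        # fan out one task per grid point
results = pywren.get_all_results(futures)  # harvest results once tasks finish
```

The other stacks mentioned here follow the same pattern at different granularities: CodaLab tracks experiment definitions and their outputs as reproducible bundles, while ElastiCluster-ClusterJob scripts the provisioning of a cluster and the submission and harvesting of batch jobs.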

5 citations