
Showing papers by "David L. Donoho published in 2019"


Posted Content
TL;DR: This paper analyzes the generalization risk, with its bias and variance components, for recurrent unrolled networks using Stein's Unbiased Risk Estimator (SURE), and proves that the degrees of freedom (DOF) are well approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights.
Abstract: Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared error) and its link to network design and training sample size remain mysterious. Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk, with its bias and variance components, for recurrent unrolled networks. We particularly investigate the degrees-of-freedom (DOF) component of SURE, the trace of the end-to-end network Jacobian, to quantify the prediction variance. We prove that the DOF is well approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights. Empirically, we examine the SURE components as a function of training sample size for both recurrent and non-recurrent (with many more parameters) unrolled networks. Our key observations indicate that: 1) the DOF increases with training sample size and converges to the generalization risk for both recurrent and non-recurrent schemes; 2) the recurrent network converges significantly faster (with fewer training samples) than the non-recurrent scheme, hence recurrence serves as a regularizer in the low-sample-size regime.
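The DOF term referenced here is the divergence of the end-to-end map, i.e. the trace of its Jacobian, which for a black-box reconstruction map is commonly estimated by Monte Carlo probing. Below is a minimal NumPy sketch of that idea, using a soft-thresholding denoiser as a hypothetical stand-in for an unrolled network; the function names, probe count, and test setup are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sure_estimate(f, y, sigma, eps=1e-3, n_probes=8, seed=None):
    """Monte Carlo SURE for a reconstruction map f, using the classical decomposition
    SURE = ||f(y) - y||^2 - n*sigma^2 + 2*sigma^2 * DOF,
    where DOF = trace of the Jacobian of f at y (the divergence term)."""
    rng = np.random.default_rng(seed)
    n = y.size
    fy = f(y)
    residual = np.sum((fy - y) ** 2)
    # Hutchinson-style probing: b^T (f(y + eps*b) - f(y)) / eps ~ b^T J b,
    # whose expectation over unit-covariance probes b is trace(J) = DOF.
    dof = 0.0
    for _ in range(n_probes):
        b = rng.standard_normal(y.shape)
        dof += b.ravel() @ (f(y + eps * b) - fy).ravel() / eps
    dof /= n_probes
    return residual - n * sigma**2 + 2 * sigma**2 * dof, dof

if __name__ == "__main__":
    # Hypothetical test: a sparse signal in Gaussian noise, denoised by soft thresholding.
    rng = np.random.default_rng(0)
    x = np.zeros(1000); x[:20] = 5.0
    sigma = 1.0
    y = x + sigma * rng.standard_normal(1000)
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 2 * sigma, 0.0)
    sure, dof = sure_estimate(soft, y, sigma)
    mse = np.sum((soft(y) - x) ** 2)
    print(f"SURE ~ {sure:.1f}, true squared error = {mse:.1f}, DOF ~ {dof:.1f}")
```

For soft thresholding the estimated DOF should land near the number of coefficients surviving the threshold, which is the classical closed-form divergence for that estimator; for an unrolled network the same probing scheme applies with f replaced by the trained end-to-end map.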

6 citations


Journal ArticleDOI
TL;DR: The ability to recover a sparse object decreases with an increasing number of exhaustively sampled dimensions; novel exact formulas are developed for the sparsity/undersampling tradeoffs in such measurement systems, assuming uniform sparsity fractions in each column.
Abstract: We study anisotropic undersampling schemes like those used in multi-dimensional NMR spectroscopy and MR imaging, which sample exhaustively in certain time dimensions and randomly in others. Our analysis shows that anisotropic undersampling schemes are equivalent to certain block-diagonal measurement systems. We develop novel exact formulas for the sparsity/undersampling tradeoffs in such measurement systems. Our formulas predict finite-N phase transition behavior differing substantially from the well-known asymptotic phase transitions for classical Gaussian undersampling. Extensive empirical work shows that our formulas accurately describe observed finite-N behavior, while the usual formulas based on universality are substantially inaccurate. We also vary the anisotropy, keeping the total number of samples fixed, and for each variation we determine the precise sparsity/undersampling tradeoff (phase transition). We show that, other things being equal, the ability to recover a sparse object decreases with an increasing number of exhaustively sampled dimensions.
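The claimed equivalence between anisotropic undersampling and a block-diagonal measurement system can be illustrated with a small synthetic sketch; the sizes, the Fourier sensing model along the undersampled dimension, and all variable names below are illustrative assumptions rather than details from the paper. Measuring every exhaustively sampled column with the same randomly row-subsampled operator is the same as applying a block-diagonal operator (a Kronecker product with the identity) to the column-stacked object.

```python
import numpy as np

# Hypothetical sizes: N1 is the exhaustively sampled (direct) dimension,
# N2 the randomly undersampled (indirect) dimension, keeping n2 of N2 samples.
N1, N2, n2 = 8, 64, 16
rng = np.random.default_rng(1)

# Random row selection in the indirect dimension, shared by all columns.
rows = np.sort(rng.choice(N2, size=n2, replace=False))
F2 = np.fft.fft(np.eye(N2)) / np.sqrt(N2)   # unitary Fourier matrix along dim 2
A_block = F2[rows, :]                       # one n2 x N2 measurement block

# Anisotropic measurement of a sparse 2-D object X (one column per direct-dimension index):
# exhaustive in dimension 1, undersampled Fourier in dimension 2.
X = np.zeros((N2, N1), dtype=complex)
X[rng.choice(N2, 5, replace=False), rng.choice(N1, 5)] = 1.0
Y = A_block @ X                             # every column measured with the SAME block

# Equivalent block-diagonal view: the full operator is kron(I_{N1}, A_block)
# acting on the column-stacked object.
A_full = np.kron(np.eye(N1), A_block)
assert np.allclose(A_full @ X.flatten(order="F"), Y.flatten(order="F"))
```

Because the blocks are identical and much smaller than a dense Gaussian matrix of the same total size, the finite-N recovery behavior differs from the classical Gaussian phase-transition predictions, which is the regime the paper's formulas address.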

6 citations


Posted Content
TL;DR: This article discusses several painless computing stacks that abstract away the difficulties of massive experimentation, thereby allowing a proliferation of ambitious experiments for scientific discovery.
Abstract: Modern data science research can involve massive computational experimentation; an ambitious PhD in computational fields may do experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops or shared campus-resident resources, are inadequate for experiments at the massive scale and varied scope that we now see in data science. On the other hand, modern cloud computing promises seemingly unlimited computational resources that can be custom configured, and seems to offer a powerful new venue for ambitious data-driven science. Exploiting the cloud fully, the amount of work that could be completed in a fixed amount of time can expand by several orders of magnitude. As potentially powerful as cloud-based experimentation may be in the abstract, it has not yet become a standard option for researchers in many academic disciplines. The prospect of actually conducting massive computational experiments in today's cloud systems confronts the potential user with daunting challenges. Leading considerations include: (i) the seeming complexity of today's cloud computing interface, (ii) the difficulty of executing an overwhelmingly large number of jobs, and (iii) the difficulty of monitoring and combining a massive collection of separate results. Starting a massive experiment 'bare-handed' therefore seems highly problematic and prone to rapid 'researcher burnout'. New software stacks are emerging that render massive cloud experiments relatively painless. Such stacks simplify experimentation by systematizing experiment definition, automating distribution and management of tasks, and allowing easy harvesting of results and documentation. In this article, we discuss several painless computing stacks that abstract away the difficulties of massive experimentation, thereby allowing a proliferation of ambitious experiments for scientific discovery.

5 citations


Journal ArticleDOI
01 Jul 2019
TL;DR: In this article, the authors discuss three such painless computing stacks, CodaLab, PyWren, and ElastiCluster-ClusterJob, which simplify experimentation by systematizing experiment definition, automating distribution and management of all tasks, and allowing easy harvesting of results and documentation.
Abstract: Modern data science research, at the cutting edge, can involve massive computational experimentation; an ambitious PhD in computational fields may conduct experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops, PCs, or campus-resident resources with shared policies, are awkward or inadequate for experiments at the massive scale and varied scope that we now see in the most ambitious data science. On the other hand, modern cloud computing promises seemingly unlimited computational resources that can be custom configured, and seems to offer a powerful new venue for ambitious data-driven science. Exploiting the cloud fully, the amount of raw experimental work that could be completed in a fixed amount of calendar time ought to expand by several orders of magnitude. Still, at the moment, starting a massive experiment using cloud resources from scratch is commonly perceived as cumbersome, problematic, and prone to rapid 'researcher burnout.' New software stacks are emerging that render massive cloud-based experiments relatively painless, thereby allowing a proliferation of ambitious experiments for scientific discovery. Such stacks simplify experimentation by systematizing experiment definition, automating distribution and management of all tasks, and allowing easy harvesting of results and documentation. In this article, we discuss three such painless computing stacks. These include CodaLab, from Percy Liang's lab in Stanford Computer Science; PyWren, developed by Eric Jonas in the RISELab at UC Berkeley; and the ElastiCluster-ClusterJob stack developed at David Donoho's research lab in Stanford Statistics in collaboration with the University of Zurich.
Keywords: ambitious data science, painless computing stacks, cloud computing, experiment management system, massive computational experiments
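As a flavor of how such stacks collapse a massive experiment into a single map over a function, here is a minimal sketch in the style of PyWren's map interface; the experiment function, parameter grid, and returned metric are hypothetical placeholders, and running it requires a configured AWS account.

```python
import pywren

def run_trial(params):
    """One cell of a hypothetical experiment grid (placeholder computation)."""
    n, k = params
    return params, k / n  # stand-in for the metric a real experiment would compute

# Hypothetical parameter grid; each tuple becomes one serverless task.
grid = [(n, k) for n in (100, 1000, 10000) for k in (5, 50, 500)]

pwex = pywren.default_executor()           # executor backed by AWS Lambda
futures = pwex.map(run_trial, grid)        # fan out one task per grid point
results = pywren.get_all_results(futures)  # harvest results once tasks finish
```

The other stacks mentioned here follow the same pattern at different granularities: CodaLab tracks experiment definitions and their outputs as reproducible bundles, while ElastiCluster-ClusterJob scripts the provisioning of a cluster and the submission and harvesting of batch jobs.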

5 citations