Author

James Bornholt

Bio: James Bornholt is an academic researcher from the University of Washington. The author has contributed to research in topics including compilers and semantics (computer science). The author has an h-index of 16 and has co-authored 26 publications receiving 981 citations. Previous affiliations of James Bornholt include the Australian National University and the University of Texas at Austin.

Papers
Proceedings ArticleDOI
25 Mar 2016
TL;DR: An architecture for a DNA-based archival storage system is presented; it is structured as a key-value store and leverages common biochemical techniques to provide random access. A new encoding scheme is also proposed that offers controllable redundancy, trading off reliability for density.
Abstract: Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte/mm^3 (10^9 GB/mm^3), and long-lasting, with observed half-life of over 500 years. This paper presents an architecture for a DNA-based archival storage system. It is structured as a key-value store, and leverages common biochemical techniques to provide random access. We also propose a new encoding scheme that offers controllable redundancy, trading off reliability for density. We demonstrate feasibility, random access, and robustness of the proposed encoding with wet lab experiments involving 151 kB of synthesized DNA and a 42 kB random-access subset, and simulation experiments of larger sets calibrated to the wet lab experiments. Finally, we highlight trends in biotechnology that indicate the impending practicality of DNA storage for much larger datasets.

349 citations
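A toy Python sketch can illustrate the two ideas in the abstract above: mapping bits onto nucleotides, and buying reliability with controllable redundancy. This is not the paper's actual codec; the 2-bit-per-base mapping and the XOR parity grouping are illustrative assumptions.

BASES = "ACGT"  # 2 bits per nucleotide

def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, 4 bases per byte."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(b >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        b = 0
        for ch in strand[i:i + 4]:
            b = (b << 2) | BASES.index(ch)
        out.append(b)
    return bytes(out)

def add_parity(strands, k):
    """Append one XOR parity strand per group of k equal-length strands.

    Any single lost strand in a group can be recovered by XORing the
    survivors with the parity strand; smaller k means more redundancy
    (reliability) and lower density.
    """
    protected = []
    for i in range(0, len(strands), k):
        group = strands[i:i + k]
        parity = bytearray(len(group[0]))
        for s in group:
            for j, byte in enumerate(s):
                parity[j] ^= byte
        protected.extend(group)
        protected.append(bytes(parity))
    return protected

payload = [b"hello wo", b"rld!!!!!"]
print(bytes_to_dna(payload[0]))       # "CGGACGCC..." (8 bytes -> 32 bases)
assert dna_to_bytes(bytes_to_dna(payload[0])) == payload[0]
print(len(add_parity(payload, k=2)))  # 3 strands: 2 payload + 1 parity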

Proceedings ArticleDOI
24 Feb 2014
TL;DR: A Bayesian network semantics for computation and conditionals improves program correctness, and the Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives.
Abstract: Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.

121 citations
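A minimal Python sketch of the programming model described above: values carry sampling functions, arithmetic composes samplers, and a conditional runs a sampling-based test instead of comparing point estimates. The class and its methods are illustrative, not the paper's actual API, and the fixed-sample-size test stands in for the runtime's lazier hypothesis testing.

import random

class Uncertain:
    def __init__(self, sampler):
        self.sampler = sampler  # () -> float, one draw from the distribution

    def __add__(self, other):
        # Computation compounds error: sample both operands independently.
        return Uncertain(lambda: self.sampler() + other.sampler())

    def __gt__(self, threshold):
        # Ask "is Pr[value > threshold] > 0.5?" by sampling, rather than
        # comparing a single point estimate, which invites false
        # positives/negatives on noisy data.
        n = 1000
        hits = sum(self.sampler() > threshold for _ in range(n))
        return hits / n > 0.5

# A GPS-style estimate: believed speed 10.0 with Gaussian error sigma = 4.
speed = Uncertain(lambda: random.gauss(10.0, 4.0))
print(speed > 12.0)  # usually False: the evidence is too weak
print(speed > 8.0)   # usually True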

Proceedings ArticleDOI
11 Jan 2016
TL;DR: In this paper, the authors propose metasketches, a general framework for specifying and solving optimal synthesis problems, which makes the search strategy a part of the problem definition by specifying a fragmentation of the search space into an ordered set of classic sketches.
Abstract: Many advanced programming tools---for both end-users and expert developers---rely on program synthesis to automatically generate implementations from high-level specifications. These tools often need to employ tricky, custom-built synthesis algorithms because they require synthesized programs to be not only correct, but also optimal with respect to a desired cost metric, such as program size. Finding these optimal solutions efficiently requires domain-specific search strategies, but existing synthesizers hard-code the strategy, making them difficult to reuse. This paper presents metasketches, a general framework for specifying and solving optimal synthesis problems. Metasketches make the search strategy a part of the problem definition by specifying a fragmentation of the search space into an ordered set of classic sketches. We provide two cooperating search algorithms to effectively solve metasketches. A global optimizing search coordinates the activities of local searches, informing them of the costs of potentially-optimal solutions as they explore different regions of the candidate space in parallel. The local searches execute an incremental form of counterexample-guided inductive synthesis to incorporate information sent from the global search. We present Synapse, an implementation of these algorithms, and show that it effectively solves optimal synthesis problems with a variety of different cost functions. In addition, metasketches can be used to accelerate classic (non-optimal) synthesis by explicitly controlling the search strategy, and we show that Synapse solves classic synthesis problems that state-of-the-art tools cannot.

80 citations
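The search structure in the abstract above can be miniaturized in Python: an ordered fragmentation of the candidate space, a global loop that maintains the best cost found so far, and local searches that prune against that bound. The real system runs counterexample-guided inductive synthesis with an SMT solver, in parallel; this enumeration-based toy only shows the shape of the algorithm.

def spec(f):
    """Specification: f must double its input on a few test points."""
    return all(f(x) == 2 * x for x in (0, 1, 5, -3))

def sketch(k):
    """Yield (candidate, cost) pairs from the k-th fragment of the space."""
    if k == 1:
        yield (lambda x: x + x, 1)       # one addition
    elif k == 2:
        yield (lambda x: x * 2, 2)       # pretend a multiply costs 2
        yield (lambda x: x + x + 1, 2)   # wrong: fails the spec
    elif k == 3:
        yield (lambda x: (x + x) + x - x, 3)

best_cost, best = float("inf"), None
for k in (1, 2, 3):                      # global search over the ordering
    for candidate, cost in sketch(k):    # local search within one sketch
        if cost >= best_cost:
            continue                     # pruned by the global cost bound
        if spec(candidate):
            best_cost, best = cost, candidate
print(best_cost, best(7))                # 1 14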

Proceedings ArticleDOI
02 Nov 2016
TL;DR: Yggdrasil is a toolkit for writing file systems with push-button verification: it requires no manual annotations or proofs about the implementation code, and it produces a counterexample if there is a bug. Experience shows that the ease of proof and counterexample-based debugging support make Yggdrasil practical for building reliable storage applications.
Abstract: The file system is an essential operating system component for persisting data on storage devices. Writing bug-free file systems is non-trivial, as they must correctly implement and maintain complex on-disk data structures even in the presence of system crashes and reorderings of disk operations. This paper presents Yggdrasil, a toolkit for writing file systems with push-button verification: Yggdrasil requires no manual annotations or proofs about the implementation code, and it produces a counterexample if there is a bug. Yggdrasil achieves this automation through a novel definition of file system correctness called crash refinement, which requires the set of possible disk states produced by an implementation (including states produced by crashes) to be a subset of those allowed by the specification. Crash refinement is amenable to fully automated satisfiability modulo theories (SMT) reasoning, and enables developers to implement file systems in a modular way for verification. With Yggdrasil, we have implemented and verified the Yxv6 journaling file system, the Ycp file copy utility, and the Ylog persistent log. Our experience shows that the ease of proof and counterexample-based debugging support make Yggdrasil practical for building reliable storage applications.

80 citations
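Crash refinement, as defined above, is at bottom a subset check: every disk state the implementation can produce, crashes included, must be among the states the specification allows. The miniature "atomic append" example below is hypothetical and enumerates states directly, where Yggdrasil would discharge the same obligation with an SMT solver.

# Operation: append a block to a log. The spec allows exactly two outcomes.
def spec_states(disk, block):
    return {disk, disk + (block,)}

# Implementation: write the data, then bump a length field, with possible
# crashes in between. A state is the tuple of visible blocks.
def impl_states(disk, block):
    states = set()
    states.add(disk)             # crash before the data write
    states.add(disk)             # crash after the data write but before the
                                 # length bump: the new block is not yet visible
    states.add(disk + (block,))  # normal completion
    return states

disk, block = ("a", "b"), "c"
assert impl_states(disk, block) <= spec_states(disk, block)
print("crash refinement holds for this operation")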

Proceedings ArticleDOI
14 Oct 2017
TL;DR: Experience shows that Hyperkernel can avoid bugs similar to those found in xv6, and that the verification of Hyperkernel can be achieved with a low proof burden.
Abstract: This paper describes an approach to designing, implementing, and formally verifying the functional correctness of an OS kernel, named Hyperkernel, with a high degree of proof automation and low proof burden. We base the design of Hyperkernel's interface on xv6, a Unix-like teaching operating system. Hyperkernel introduces three key ideas to achieve proof automation: it finitizes the kernel interface to avoid unbounded loops or recursion; it separates kernel and user address spaces to simplify reasoning about virtual memory; and it performs verification at the LLVM intermediate representation level to avoid modeling complicated C semantics. We have verified the implementation of Hyperkernel with the Z3 SMT solver, checking a total of 50 system calls and other trap handlers. Experience shows that Hyperkernel can avoid bugs similar to those found in xv6, and that the verification of Hyperkernel can be achieved with a low proof burden.

73 citations
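A tiny Z3 example in the same spirit as the verification described above: a finitized, straight-line handler over fixed-size state, with its spec checked as an SMT validity query (requires the z3-solver package). The refcount state, the sys_incref handler, and the frame-condition spec are all illustrative, not Hyperkernel's actual interface.

from z3 import (Array, BitVecSort, BitVec, Store, Select,
                Solver, ForAll, Implies, Not, unsat)

Page = BitVecSort(2)  # page ids 0..3: the interface is finite by design
Ref = BitVecSort(8)

refcnt = Array("refcnt", Page, Ref)  # kernel state: per-page refcounts
pid = BitVec("pid", 2)               # syscall argument

# Handler: sys_incref(pid) bumps one refcount; straight-line, no loops.
new_refcnt = Store(refcnt, pid, Select(refcnt, pid) + 1)

# Spec: the handler leaves every other page's refcount unchanged.
other = BitVec("other", 2)
prop = ForAll([other],
              Implies(other != pid,
                      Select(new_refcnt, other) == Select(refcnt, other)))

s = Solver()
s.add(Not(prop))  # the spec is valid iff its negation is unsatisfiable
print("verified" if s.check() == unsat else "bug found")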


Cited by
Journal ArticleDOI
TL;DR: A survey of techniques for approximate computing (AC), which discusses strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units, processor components, memory technologies, and so forth, as well as programming frameworks for AC.
Abstract: Approximate computing trades off computation quality with effort expended, and as rising performance demands confront plateauing resource budgets, approximate computing has become not merely attractive, but even imperative. In this article, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units (e.g., CPU, GPU, and FPGA), processor components, memory technologies, and so forth, as well as programming frameworks for AC. We classify these techniques based on several key characteristics to emphasize their similarities and differences. The aim of this article is to provide insights to researchers into working of AC techniques and inspire more efforts in this area to make AC the mainstream computing approach in future systems.

890 citations
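The quality-versus-effort trade-off at the heart of the survey above fits in a few lines of Python: estimate a large sum from a sampled fraction of the data, in the spirit of the sampling and loop-perforation techniques such surveys cover. The function and its parameters are illustrative.

import random

def approx_sum(xs, effort):
    """Sample a fraction `effort` of xs and scale up the partial sum."""
    k = max(1, int(len(xs) * effort))
    sample = random.sample(xs, k)
    return sum(sample) * (len(xs) / k)

data = [random.random() for _ in range(100_000)]
exact = sum(data)
for effort in (0.01, 0.1, 0.5):
    est = approx_sum(data, effort)
    print(f"effort={effort}: relative error={abs(est - exact) / exact:.4%}")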

Journal ArticleDOI
03 Mar 2017-Science
TL;DR: A storage strategy that is highly robust and approaches the information capacity per nucleotide is reported, along with perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
Abstract: DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.

509 citations
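The fountain-code machinery behind the strategy above can be sketched schematically in Python: each synthesized "droplet" carries the XOR of a pseudo-randomly chosen subset of input segments together with the seed identifying that subset, so the data can be decoded from any sufficiently large collection of droplets. The uniform degree choice below is a stand-in for the real soliton-style degree distribution, and the actual system additionally screens droplets for biochemical constraints such as GC content and homopolymers.

import random

def make_droplet(segments, seed):
    """Build one droplet: (seed, XOR of a seed-chosen subset of segments)."""
    rng = random.Random(seed)
    degree = rng.randint(1, 3)  # toy degree distribution
    idxs = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for i in idxs:
        for j, b in enumerate(segments[i]):
            payload[j] ^= b
    return seed, bytes(payload)

segments = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # equal-length input chunks
droplets = [make_droplet(segments, seed) for seed in range(8)]
print(droplets[0])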

Journal ArticleDOI
TL;DR: This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.
Abstract: As one of the most promising energy-efficient computing paradigms, approximate computing has gained a lot of research attention in the past few years. This paper presents a survey of state-of-the-art work in all aspects of approximate computing and highlights future research challenges in this field.

420 citations

Posted Content
TL;DR: A sketch-based approach is proposed in which the sketch contains a dependency graph, so that each prediction can be made by taking into consideration only the previous predictions it depends on; SQLNet is shown to outperform the prior art by 9% to 13% on the WikiSQL task.
Abstract: Synthesizing SQL queries from natural language is a long-standing open problem and has been attracting considerable interest recently. Toward solving the problem, the de facto approach is to employ a sequence-to-sequence-style model. Such an approach will necessarily require the SQL queries to be serialized. Since the same SQL query may have multiple equivalent serializations, training a sequence-to-sequence-style model is sensitive to the choice from one of them. This phenomenon is documented as the "order-matters" problem. Existing state-of-the-art approaches rely on reinforcement learning to reward the decoder when it generates any of the equivalent serializations. However, we observe that the improvement from reinforcement learning is limited. In this paper, we propose a novel approach, i.e., SQLNet, to fundamentally solve this problem by avoiding the sequence-to-sequence structure when the order does not matter. In particular, we employ a sketch-based approach where the sketch contains a dependency graph so that one prediction can be done by taking into consideration only the previous predictions that it depends on. In addition, we propose a sequence-to-set model as well as the column attention mechanism to synthesize the query based on the sketch. By combining all these novel techniques, we show that SQLNet can outperform the prior art by 9% to 13% on the WikiSQL task.

304 citations
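The column attention mechanism mentioned above can be sketched in a few lines of numpy: each column-name embedding scores the question tokens, and a softmax over those scores yields a per-column summary of the question, so each column's prediction conditions on the question parts relevant to it. The dimensions and the bilinear weight matrix are illustrative assumptions.

import numpy as np

d = 8                            # embedding size
rng = np.random.default_rng(0)
H_q = rng.normal(size=(5, d))    # 5 question-token embeddings
E_col = rng.normal(size=(3, d))  # 3 column-name embeddings
W = rng.normal(size=(d, d))      # learned bilinear weights (random here)

scores = E_col @ W @ H_q.T       # (3, 5) column-to-token affinities
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # softmax over question tokens
ctx = attn @ H_q                 # (3, d) per-column question context
print(ctx.shape)                 # (3, 8)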

Proceedings ArticleDOI
01 Aug 2019
TL;DR: The proposed IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the difficulty of predicting columns caused by the large number of out-of-domain words.
Abstract: We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard.

290 citations
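The schema linking phase described above can be approximated with a short Python sketch that matches question n-grams against table and column names; the real system also handles partial matches and consults external knowledge, so the exact-lookup rule here is a simplifying assumption.

def schema_link(question_tokens, schema_names, max_n=3):
    """Return (start, end, name) spans of question n-grams naming schema items."""
    names = {tuple(n.lower().split()) for n in schema_names}
    links = []
    for n in range(max_n, 0, -1):  # prefer longer matches first
        for i in range(len(question_tokens) - n + 1):
            gram = tuple(t.lower() for t in question_tokens[i:i + n])
            if gram in names:
                links.append((i, i + n, " ".join(gram)))
    return links

q = "show the singer name and song release year".split()
schema = ["singer name", "song release year", "album"]
print(schema_link(q, schema))
# [(5, 8, 'song release year'), (2, 4, 'singer name')]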