Author

Sean Dorward

Bio: Sean Dorward is an academic researcher from Google. The author has contributed to research in topics such as web query classification and procedural programming. The author has an h-index of 3 and has co-authored 5 publications receiving 883 citations.

Papers
Journal ArticleDOI
TL;DR: The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.
Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.
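
The two-phase shape described above is easy to sketch. The following Go program is a minimal, illustrative analogue (not the paper's actual query language): parallel workers run the filtering query over their shard of the records and emit values, and a single aggregation phase folds the emitted values into a table. The record type and the toy query are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Hypothetical record type standing in for one log entry.
type record struct{ domain string }

// Filtering phase: each worker applies the query to its shard of
// records and emits values to the aggregation phase.
func filter(shard []record, emit chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, r := range shard {
		if strings.HasSuffix(r.domain, ".com") { // the toy "query"
			emit <- r.domain
		}
	}
}

func main() {
	shards := [][]record{
		{{"a.com"}, {"b.org"}},
		{{"a.com"}, {"c.com"}},
	}
	emit := make(chan string)
	var wg sync.WaitGroup
	for _, s := range shards {
		wg.Add(1)
		go filter(s, emit, &wg)
	}
	go func() { wg.Wait(); close(emit) }()

	// Aggregation phase: a sum table. Because addition is commutative
	// and associative, the order in which shards emit cannot change
	// the result -- the property the paper's aggregators rely on.
	counts := map[string]int{}
	for d := range emit {
		counts[d]++
	}
	fmt.Println(counts) // map[a.com:2 c.com:1]
}
```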

718 citations

Patent
Rob Pike, Sean Quinlan, Sean Dorward, Jeffrey Dean, Sanjay Ghemawat
28 Feb 2012
TL;DR: In this paper, a method and system for analyzing data records allocates groups of records to respective processes of a first plurality of processes executing in parallel; for each record in the group of records allocated to a respective process, a query is applied to the record so as to produce zero or more values.
Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
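
As a rough sketch of the claim's structure (hypothetical names throughout, not the patented implementation), the Go program below applies a query that yields zero or more values per record and passes each value through a set of emit operators that update an intermediate data structure:

```go
package main

import "fmt"

// An emit operator folds a produced value into an intermediate
// data structure (here, a running sum).
type emitter interface{ Emit(v int) }

type sumTable struct{ total int }

func (s *sumTable) Emit(v int) { s.total += v }

// Apply the query to each record; it may produce zero or more
// values, and each value passes through every emit operator.
func process(records []string, query func(string) []int, ops []emitter) {
	for _, r := range records {
		for _, v := range query(r) {
			for _, op := range ops {
				op.Emit(v)
			}
		}
	}
}

func main() {
	lengths := func(r string) []int {
		if r == "" {
			return nil // zero values for an empty record
		}
		return []int{len(r)}
	}
	agg := &sumTable{}
	process([]string{"abc", "", "de"}, lengths, []emitter{agg})
	fmt.Println(agg.total) // 5
}
```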

115 citations

Patent
31 Jan 2007
TL;DR: In this article, a method for generating search results based on a user's search query is proposed, in which the search results, together with information identifying at least one of a telephone number or an address associated with a first one of the search results, are provided to the user.
Abstract: A method includes receiving a search query from a user and generating search results based on the search query. The method may also include providing the search results and information identifying at least one of a telephone number or an address associated with a first one of the search results to the user. The method may further include providing a link to a map associated with at least the first search result to the user.
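
A minimal sketch of what one such enriched result might carry, with entirely hypothetical field names and values (the patent does not specify a format):

```go
package main

import "fmt"

// Hypothetical shape of one enriched search result as the claim
// describes it: the result itself, an associated phone number or
// address, and a link to a map.
type searchResult struct {
	Title   string
	URL     string
	Phone   string // at least one of Phone or Address is present
	Address string
	MapLink string
}

func main() {
	r := searchResult{
		Title:   "Example Pizza",
		URL:     "https://example.com",
		Phone:   "+1-555-0100",
		Address: "1 Main St",
		MapLink: "https://maps.example.com?q=1+Main+St",
	}
	fmt.Printf("%+v\n", r)
}
```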

54 citations

Patent
15 Jul 2004
TL;DR: In this article, the computational efficiency of decoding block-sorted compressed data is improved by ensuring that more than one set of operations, corresponding to a plurality of paths through a mapping array T, is being handled by the processor.
Abstract: In an embodiment of the present invention, the computational efficiency of decoding block-sorted compressed data is improved by ensuring that more than one set of operations, corresponding to a plurality of paths through a mapping array T, is being handled by a processor. This sequence of operations, including instructions from the plurality of sets of operations, ensures that there is another operation in the pipeline if a cache miss on any given lookup operation in the mapping array results in a slower main memory access. In this way, processor utilization is improved. While the sets of operations in the sequence are independent of one another, a plurality of the main memory access operations will overlap because of the long time required for main memory access.
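
The core idea, interleaving several independent walks through the mapping array so that memory latency on one path overlaps with work on the others, can be sketched as below. This is an illustrative Go analogue with made-up array contents, not the patented decoder:

```go
package main

import "fmt"

// Follow k independent paths through the mapping array T in
// lockstep. If one T lookup misses the cache, the processor can
// still issue the loads for the other paths, overlapping the
// main-memory latency instead of stalling on a single chain of
// dependent lookups.
func decodeInterleaved(T []int, L []byte, starts []int, n int) [][]byte {
	k := len(starts)
	out := make([][]byte, k)
	pos := append([]int(nil), starts...)
	for i := 0; i < n; i++ {
		for j := 0; j < k; j++ { // round-robin over the k paths
			out[j] = append(out[j], L[pos[j]])
			pos[j] = T[pos[j]]
		}
	}
	return out
}

func main() {
	// Toy inputs, chosen only to exercise the traversal.
	T := []int{2, 0, 3, 1}
	L := []byte("abcd")
	for _, p := range decodeInterleaved(T, L, []int{0, 1}, 4) {
		fmt.Println(string(p))
	}
}
```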

2 citations

Patent
31 Jan 2007
TL;DR: The authors describe a method comprising receiving a search query from a user and generating search results based on that query.
Abstract: The invention relates to a method comprising receiving a search query from a user and generating search results based on that query. The method may also comprise providing the user with the search results and with information identifying at least one of a telephone number or an address associated with a first one of the search results. The method may further comprise providing the user with a link to a map associated with at least the first search result.

Cited by
Proceedings Article
01 Jan 2006
TL;DR: Bigtable, as described in this paper, is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Projects that store data in it include web indexing, Google Earth, and Google Finance.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
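
The data model the abstract refers to is, per the paper, a sparse, distributed, persistent multidimensional sorted map indexed by row key, column key, and timestamp. A minimal Go sketch of that key structure follows (a hash map, so the sorted layout control the paper describes is not captured):

```go
package main

import "fmt"

// One cell is addressed by (row, column, timestamp); the value is
// an uninterpreted byte string. Row and column names below follow
// the paper's running Webtable example.
type cellKey struct {
	Row, Column string
	Timestamp   int64
}

func main() {
	t := map[cellKey][]byte{
		{"com.cnn.www", "contents:", 3}:        []byte("<html>..."),
		{"com.cnn.www", "anchor:cnnsi.com", 9}: []byte("CNN"),
	}
	fmt.Println(string(t[cellKey{"com.cnn.www", "contents:", 3}]))
}
```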

4,843 citations

Proceedings ArticleDOI
06 Jun 2010
TL;DR: A model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers, whose implied synchronicity makes reasoning about programs easier.
Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
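
A minimal single-machine sketch of this vertex-centric model in Go, using maximum-value propagation as the example algorithm; voting to halt, combiners, and distribution across machines are all omitted:

```go
package main

import "fmt"

// In each superstep a vertex reads the messages sent to it in the
// previous superstep, updates its state, and sends messages along
// its out-edges.
type vertex struct {
	value int
	out   []int // indices of out-neighbors
}

func superstep(vs []vertex, inbox [][]int) [][]int {
	next := make([][]int, len(vs))
	for i := range vs {
		for _, m := range inbox[i] {
			if m > vs[i].value {
				vs[i].value = m
			}
		}
		for _, n := range vs[i].out {
			next[n] = append(next[n], vs[i].value)
		}
	}
	return next
}

func main() {
	// A 3-vertex ring: 0 -> 1 -> 2 -> 0.
	vs := []vertex{{3, []int{1}}, {6, []int{2}}, {1, []int{0}}}
	inbox := make([][]int, len(vs))
	for step := 0; step < 3; step++ { // enough supersteps for this ring
		inbox = superstep(vs, inbox)
	}
	for i, v := range vs {
		fmt.Printf("vertex %d: %d\n", i, v.value) // all converge to 6
	}
}
```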

3,840 citations

Journal ArticleDOI
TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

3,259 citations

Proceedings ArticleDOI
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly
21 Mar 2007
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Abstract: Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through files, TCP pipes, and shared-memory FIFOs. The vertices provided by the application developer are quite simple and are usually written as sequential programs with no thread creation or locking. Concurrency arises from Dryad scheduling vertices to run simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources. Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
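
As a toy analogue of the vertex-and-channel structure (not Dryad's actual API), the Go sketch below wires two sequential "vertices" together with a channel standing in for a shared-memory FIFO; real Dryad channels may also be files or TCP pipes, and vertices run on many machines:

```go
package main

import "fmt"

// Upstream vertex: a plain sequential program that writes to its
// output channel and closes it when done.
func producer(out chan<- int) {
	for i := 1; i <= 3; i++ {
		out <- i * i
	}
	close(out)
}

// Downstream vertex: reads everything from its input channel and
// reports a single aggregate.
func consumer(in <-chan int, done chan<- int) {
	sum := 0
	for v := range in {
		sum += v
	}
	done <- sum
}

func main() {
	ch := make(chan int)  // the "channel" edge of the dataflow graph
	done := make(chan int)
	go producer(ch)       // vertex 1
	go consumer(ch, done) // vertex 2
	fmt.Println(<-done)   // 14
}
```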

2,867 citations

Journal ArticleDOI
TL;DR: The background and state of the art of big data are reviewed, including related technologies and representative applications such as enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state of the art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide readers with a comprehensive overview and big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.

2,303 citations