Author

Saul Schleimer

Other affiliations: Rutgers University, University of California, University of California, Berkeley ...

Bio: Saul Schleimer is an academic researcher from University of Warwick. The author has contributed to research in topics: Surface (mathematics) & Genus (mathematics). The author has an h-index of 24 and has co-authored 101 publications receiving 2,726 citations. Previous affiliations of Saul Schleimer include Rutgers University & University of California.

Papers published on a yearly basis (chart; years shown range from 1994 to 2023)

Papers

Proceedings Article•DOI•

Winnowing: local algorithms for document fingerprinting

Saul Schleimer¹, Daniel Shawcross Wilkerson², Alex Aiken²•Institutions (2)

University of Illinois at Chicago¹, University of California, Berkeley²

09 Jun 2003

TL;DR: The class of local document fingerprinting algorithms is introduced, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies, and a novel lower bound on the performance of any local algorithm is proved.

Abstract: Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a widely-used plagiarism detection service.

1,220 citations

Journal Article•DOI•

The geometry of the disk complex

Howard Masur¹, Saul Schleimer²•Institutions (2)

University of Chicago¹, University of Warwick²

22 Aug 2012-Journal of the American Mathematical Society

TL;DR: The authors give a distance estimate for the disk complex, use it to prove that the disk complex is Gromov hyperbolic, and find an algorithm that computes the Hempel distance of a Heegaard splitting up to an error depending only on the genus.

Abstract: We give a distance estimate for the disk complex. We use the distance estimate to prove that the disk complex is Gromov hyperbolic. As another application of our techniques, we find an algorithm which computes the Hempel distance of a Heegaard splitting, up to an error depending only on the genus.

141 citations

Posted Content•

The geometry of the disk complex

Howard Masur¹, Saul Schleimer²•Institutions (2)

University of Chicago¹, University of Warwick²

15 Oct 2010-arXiv: Geometric Topology

TL;DR: The authors give a distance estimate for the metric on the disk complex, show that it is Gromov hyperbolic, and find an algorithm that computes the Hempel distance of a Heegaard splitting up to an error depending only on the genus.

Abstract: We give a distance estimate for the metric on the disk complex and show that it is Gromov hyperbolic. As another application of our techniques, we find an algorithm which computes the Hempel distance of a Heegaard splitting, up to an error depending only on the genus.

105 citations

Patent•

Method and apparatus for indexing document content and content comparison with World Wide Web search service

Alex Aiken¹, Saul Schleimer¹, Joel Auslander¹, Daniel Shawcross Wilkerson¹, Anthony Tomasic¹, Steve Fink¹ (+2 more)•Institutions (1)

University of California¹

12 Feb 2003

TL;DR: In this article, a method for comparing the contents of a query document to the content on the World Wide Web is presented, where the query document is indexed and compared to content from the Web which is continuously retrieved and indexed.

Abstract: Methods and related systems for indexing the contents of documents for comparison with the contents of other documents to identify matching content. A method for comparing the contents of a query document to the content on the World Wide Web is set forth. The contents of a query document are indexed and compared to content from the World Wide Web which is continuously retrieved and indexed. The method for indexing may comprise selecting substrings from the document, hashing the substrings to generate a plurality of hash values having a known range of values, selecting certain hash values to save from the generated hash values, and sorting the saved hash values. Methods for selecting certain hash values to save are set forth.
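
The indexing steps listed in this abstract (select substrings, hash them into a known range, keep certain hash values, sort the saved values) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the function names `index_document` and `shared_hashes`, the window length `k`, the hash range, the use of SHA-1, and the "keep hashes divisible by `modulus`" selection rule are all hypothetical choices, since the abstract does not specify them.

```python
import hashlib


def index_document(text: str, k: int = 5, hash_range: int = 2**16,
                   modulus: int = 4) -> list[int]:
    """Return a sorted list of selected substring hashes for `text`.

    All parameters are illustrative assumptions, not values from the source.
    """
    # Step 1: select substrings. Here: every contiguous window of k characters.
    substrings = [text[i:i + k] for i in range(len(text) - k + 1)]

    # Step 2: hash each substring into the known range [0, hash_range).
    hashes = [
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        % hash_range
        for s in substrings
    ]

    # Step 3: select which hash values to save. Assumed rule: keep those
    # divisible by `modulus` (duplicates are collapsed by the set).
    saved = {h for h in hashes if h % modulus == 0}

    # Step 4: sort the saved hash values.
    return sorted(saved)


def shared_hashes(a: list[int], b: list[int]) -> list[int]:
    """Hash values present in both indexes, as a sorted list."""
    return sorted(set(a) & set(b))
```

Because the selection rule in this sketch depends only on each hash value, any passage copied verbatim from one document into another contributes the same saved hashes to both indexes, so `shared_hashes` on a query index and a stored index surfaces candidate matching content.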

103 citations

Journal Article•DOI•

Distance and bridge position

David Bachman, Saul Schleimer

01 Apr 2005-Pacific Journal of Mathematics

TL;DR: In this paper, it was shown that the distance of a knot in bridge position is bounded above by twice the genus, plus the number of boundary components, of an essential surface in the knot complement.

Abstract: J. Hempel's definition of the distance of a Heegaard surface generalizes to a notion of complexity for any knot that is in bridge position with respect to a Heegaard surface. Our main result is that the distance of a knot in bridge position is bounded above by twice the genus, plus the number of boundary components, of an essential surface in the knot complement. As a consequence knots constructed via sufficiently high powers of pseudo-Anosov maps have minimal bridge presentations which are thin.

79 citations

Cited by

Journal Article•DOI•

Phd by thesis

Richard Lathe¹•Institutions (1)

French Institute of Health and Medical Research¹

01 Apr 1988-Nature

TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.

Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Book•

Parallel Computer Architecture: A Hardware/Software Approach

David E. Culler, Anoop Gupta, Jaswinder Pal Singh

15 Aug 1998

TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case-studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs

Table of Contents
1 Introduction
2 Parallel Programs
3 Programming for Performance
4 Workload-Driven Evaluation
5 Shared Memory Multiprocessors
6 Snoop-based Multiprocessor Design
7 Scalable Multiprocessors
8 Directory-based Cache Coherence
9 Hardware-Software Tradeoffs
10 Interconnection Network Design
11 Latency Tolerance
12 Future Directions
Appendix A Parallel Benchmark Suites

1,571 citations

Journal Article•DOI•

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

Heng Li¹•Institutions (1)

Broad Institute¹

15 Jul 2016-Bioinformatics

TL;DR: A new mapper, minimap, and a de novo assembler, miniasm, are presented for efficiently mapping and assembling SMRT and ONT reads without an error correction stage.

Abstract: Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10–15%. Complex and computationally intensive pipelines are required to assemble such reads. Results: We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm Contact: hengli@broadinstitute.org Supplementary information: Supplementary data are available at Bioinformatics online.

1,060 citations

Proceedings Article•DOI•

DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

Lingxiao Jiang¹, Ghassan Misherghi¹, Zhendong Su¹, Stéphane Glondu•Institutions (1)

University of California, Davis¹

24 May 2007

TL;DR: This paper presents an efficient algorithm for identifying similar subtrees and applies it to tree representations of source code; the algorithm is implemented as a clone detection tool called DECKARD and evaluated on large code bases written in C and Java, including the Linux kernel and JDK.

Abstract: Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space \mathbb{R}^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.

1,008 citations

Journal Article•DOI•

Scalable statistical bug isolation

Ben Liblit¹, Mayur Naik², Alice X. Zheng³, Alex Aiken², Michael I. Jordan³ (+1 more)•Institutions (3)

University of Wisconsin-Madison¹, Stanford University², University of California, Berkeley³

12 Jun 2005

TL;DR: A statistical debugging algorithm that isolates bugs in programs containing multiple undiagnosed bugs and identifies predictors associated with individual bugs; these predictors reveal both the circumstances under which bugs occur and the frequencies of failure modes, making it easier to prioritize debugging efforts.

Abstract: We present a statistical debugging algorithm that isolates bugs in programs containing multiple undiagnosed bugs. Earlier statistical algorithms that focus solely on identifying predictors that correlate with program failure perform poorly when there are multiple bugs. Our new technique separates the effects of different bugs and identifies predictors that are associated with individual bugs. These predictors reveal both the circumstances under which bugs occur as well as the frequencies of failure modes, making it easier to prioritize debugging efforts. Our algorithm is validated using several case studies, including examples in which the algorithm identified previously unknown, significant crashing bugs in widely used systems.

851 citations
