Author

Saul Schleimer

Bio: Saul Schleimer is an academic researcher from the University of Warwick. The author has contributed to research in topics: Surface (mathematics) & Genus (mathematics). The author has an h-index of 24 and has co-authored 101 publications receiving 2726 citations. Previous affiliations of Saul Schleimer include Rutgers University & University of California.


Papers
Proceedings ArticleDOI
09 Jun 2003
TL;DR: The class of local document fingerprinting algorithms is introduced, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies, and a novel lower bound on the performance of any local algorithm is proved.
Abstract: Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we give experimental results on Web data, and report experience with MOSS, a widely used plagiarism detection service.

1,220 citations

Journal ArticleDOI
TL;DR: In this paper, the authors give a distance estimate for the disk complex, use it to prove that the disk complex is Gromov hyperbolic, and find an algorithm that computes the Hempel distance of a Heegaard splitting up to an error depending only on the genus.
Abstract: We give a distance estimate for the disk complex. We use the distance estimate to prove that the disk complex is Gromov hyperbolic. As another application of our techniques, we find an algorithm which computes the Hempel distance of a Heegaard splitting, up to an error depending only on the genus.

141 citations

Posted Content
TL;DR: In this article, the authors give a distance estimate for the metric on the disk complex, show that it is Gromov hyperbolic, and find an algorithm that computes the Hempel distance of a Heegaard splitting up to an error depending only on the genus.
Abstract: We give a distance estimate for the metric on the disk complex and show that it is Gromov hyperbolic. As another application of our techniques, we find an algorithm which computes the Hempel distance of a Heegaard splitting, up to an error depending only on the genus.

105 citations

Patent
12 Feb 2003
TL;DR: In this article, a method for comparing the contents of a query document to the content on the World Wide Web is presented, where the query document is indexed and compared to content from the Web which is continuously retrieved and indexed.
Abstract: Methods and related systems for indexing the contents of documents for comparison with the contents of other documents to identify matching content. A method for comparing the contents of a query document to the content on the World Wide Web is set forth. The contents of a query document are indexed and compared to content from the World Wide Web which is continuously retrieved and indexed. The method for indexing may comprise selecting substrings from the document, hashing the substrings to generate a plurality of hash values having a known range of values, selecting certain hash values to save from the generated hash values, and sorting the saved hash values. Methods for selecting certain hash values to save are set forth.

103 citations
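
The indexing steps listed in the patent abstract above (select substrings, hash them into a known range, keep a subset of the hashes, sort what was kept) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how substrings are chosen or which hash values are saved, so the fixed-length substrings, the CRC-32 hash, the "divisible by `modulus`" selection rule, and the function names `index_document` and `shared_fingerprints` are all hypothetical choices, not the patented method.

```python
import zlib


def index_document(text, k=5, modulus=4):
    """Return a sorted list of selected substring hashes for `text`.

    Assumptions (not taken from the patent): substrings are all
    contiguous k-character windows, the hash is CRC-32, and a hash
    is saved when it is divisible by `modulus`.
    """
    # Step 1: select substrings (every contiguous window of k characters).
    substrings = [text[i:i + k] for i in range(len(text) - k + 1)]
    # Step 2: hash each substring; CRC-32 values lie in the known range [0, 2**32).
    hashes = [zlib.crc32(s.encode("utf-8")) for s in substrings]
    # Step 3: select certain hash values to save (assumed rule: h % modulus == 0).
    saved = {h for h in hashes if h % modulus == 0}
    # Step 4: sort the saved hash values.
    return sorted(saved)


def shared_fingerprints(query, other, k=5, modulus=4):
    """Return the sorted hash values that two documents' indexes have in common."""
    a = set(index_document(query, k, modulus))
    b = set(index_document(other, k, modulus))
    return sorted(a & b)
```

With `modulus=1` every hash is saved, so any two documents sharing a `k`-character substring share at least one value; larger values of `modulus` trade recall for a smaller index.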

Journal ArticleDOI
TL;DR: In this paper, it was shown that the distance of a knot in bridge position is bounded above by twice the genus, plus the number of boundary components, of an essential surface in the knot complement.
Abstract: J. Hempel's definition of the distance of a Heegaard surface generalizes to a notion of complexity for any knot that is in bridge position with respect to a Heegaard surface. Our main result is that the distance of a knot in bridge position is bounded above by twice the genus, plus the number of boundary components, of an essential surface in the knot complement. As a consequence, knots constructed via sufficiently high powers of pseudo-Anosov maps have minimal bridge presentations which are thin.

79 citations


Cited by
Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Book
15 Aug 1998
TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case-studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs
Table of Contents: 1 Introduction; 2 Parallel Programs; 3 Programming for Performance; 4 Workload-Driven Evaluation; 5 Shared Memory Multiprocessors; 6 Snoop-based Multiprocessor Design; 7 Scalable Multiprocessors; 8 Directory-based Cache Coherence; 9 Hardware-Software Tradeoffs; 10 Interconnection Network Design; 11 Latency Tolerance; 12 Future Directions; Appendix A Parallel Benchmark Suites

1,571 citations

Journal ArticleDOI
Heng Li
TL;DR: A new mapper, minimap, and a de novo assembler, miniasm, are presented for efficiently mapping and assembling SMRT and ONT reads without an error correction stage.
Abstract: Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10–15%. Complex and computationally intensive pipelines are required to assemble such reads. Results: We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm Contact: hengli@broadinstitute.org Supplementary information: Supplementary data are available at Bioinformatics online.

1,060 citations

Proceedings ArticleDOI
24 May 2007
TL;DR: This paper presents an efficient algorithm for identifying similar subtrees and applies it to tree representations of source code. The algorithm is implemented as a clone detection tool called DECKARD and evaluated on large code bases written in C and Java, including the Linux kernel and JDK.
Abstract: Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space \mathbb{R}^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.

1,008 citations
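
The idea in the DECKARD abstract above, characterizing subtrees by numerical vectors and treating vectors that are close in Euclidean distance as similar, can be illustrated with a small sketch. It makes several assumptions that are not from the paper: it uses Python's own `ast` module rather than C or Java grammars, it counts node types per function as the vector, and it replaces the paper's efficient clustering with a plain pairwise distance check. The names `characteristic_vector`, `euclidean`, and `similar_functions` are hypothetical.

```python
import ast
import math
from collections import Counter


def characteristic_vector(node):
    """Count how often each AST node type occurs in the subtree rooted at `node`."""
    return Counter(type(n).__name__ for n in ast.walk(node))


def euclidean(u, v):
    """Euclidean distance between two sparse count vectors (missing keys count as 0)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[key] - v[key]) ** 2 for key in keys))


def similar_functions(source, threshold=1.0):
    """Return (name_a, name_b, distance) for function pairs within `threshold`.

    Simplification: every pair is compared directly, which is quadratic
    in the number of functions; this stands in for a real clustering step.
    """
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    vectors = [(f.name, characteristic_vector(f)) for f in funcs]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dist = euclidean(vectors[i][1], vectors[j][1])
            if dist <= threshold:
                pairs.append((vectors[i][0], vectors[j][0], dist))
    return pairs
```

Because the vectors record only node-type counts, two functions that differ just in identifier names map to the same vector, which is what makes this kind of representation tolerant of minor edits.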

Journal ArticleDOI
12 Jun 2005
TL;DR: A statistical debugging algorithm is presented that isolates bugs in programs containing multiple undiagnosed bugs. It identifies predictors associated with individual bugs; these reveal both the circumstances under which bugs occur and the frequencies of failure modes, making it easier to prioritize debugging efforts.
Abstract: We present a statistical debugging algorithm that isolates bugs in programs containing multiple undiagnosed bugs. Earlier statistical algorithms that focus solely on identifying predictors that correlate with program failure perform poorly when there are multiple bugs. Our new technique separates the effects of different bugs and identifies predictors that are associated with individual bugs. These predictors reveal both the circumstances under which bugs occur as well as the frequencies of failure modes, making it easier to prioritize debugging efforts. Our algorithm is validated using several case studies, including examples in which the algorithm identified previously unknown, significant crashing bugs in widely used systems.

851 citations