Author

Laurent Mouchard

Other affiliations: King's College London, Curtin University, Celera Corporation

Bio: Laurent Mouchard is an academic researcher from the University of Rouen. The author has contributed to research on topics including pattern matching and suffix arrays. The author has an h-index of 19 and has co-authored 64 publications receiving 1,099 citations. Previous affiliations of Laurent Mouchard include King's College London and Curtin University.

Papers published on a yearly basis (chart omitted; years shown: 1968, 1992, 1998 to 2010, 2012 to 2014, 2016 to 2020)

Papers


Journal Article

Whole-genome shotgun assembly and comparison of human genome assemblies

Sorin Istrail¹, Granger G. Sutton¹, Liliana Florea¹, Aaron L. Halpern, Clark M. Mobarry¹, Ross A. Lippert¹, Brian P. Walenz¹, Hagit Shatkay¹,², Ian Dew¹, Jason R. Miller¹, Michael J. Flanigan¹, Nathan Edwards¹, Randall Bolanos¹, Daniel Fasulo¹, Bjarni V. Halldorsson¹, Sridhar Hannenhalli¹,³, Russell Turner¹, Shibu Yooseph¹, Fu Lu⁴, Deborah R. Nusskern⁴, Bixiong Chris Shue⁴, Xiangqun Holly Zheng⁴, Fei Zhong⁴, Arthur L. Delcher⁵, Daniel H. Huson⁴,⁶, Saul A. Kravitz, Laurent Mouchard⁴,⁷, Knut Reinert⁴,⁸, Karin A. Remington, Andrew G. Clark⁹, Michael S. Waterman¹⁰, Evan E. Eichler¹¹, Mark Raymond Adams⁴,¹¹, Michael W. Hunkapiller¹, Eugene W. Myers¹², J. Craig Venter

Applied Biosystems¹, Queen's University², University of Pennsylvania³, Celera Corporation⁴, J. Craig Venter Institute⁵, University of Tübingen⁶, University of Rouen⁷, Free University of Berlin⁸, Cornell University⁹, University of Southern California¹⁰, Case Western Reserve University¹¹, University of California, Berkeley¹²

17 Feb 2004-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves.


Abstract: We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.


189 citations

Journal Article

Algorithms for computing approximate repetitions in musical sequences

Emilios Cambouropoulos¹, Maxime Crochemore², Costas S. Iliopoulos³,⁴, Laurent Mouchard⁴,⁵, Yoan J. Pinzón³,⁴

Austrian Research Institute for Artificial Intelligence¹, Institut Gaspard Monge², King's College London³, Curtin University⁴, University of Rouen⁵

01 Jan 2002-International Journal of Computer Mathematics

TL;DR: In this paper, the authors introduce two new notions of approximate matching with application in computer assisted music analysis, and present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.


Abstract: Here we introduce two new notions of approximate matching with application in computer assisted music analysis. We present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.


102 citations

Book Chapter

Parallelising the Computation of Minimal Absent Words

Carl Barton¹, Alice Héliou²,³, Laurent Mouchard⁴, Solon P. Pissis⁵

Queen Mary University of London¹, French Institute for Research in Computer Science and Automation², École Polytechnique³, University of Rouen⁴, King's College London⁵

01 Jan 2016

TL;DR: Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches, and it is shown that the implementation achieves near-optimal speed-ups.


Abstract: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an \(\mathcal{O}(n)\)-time and \(\mathcal{O}(n)\)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of a suffix array (Barton et al., 2014). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we present a new \(\mathcal{O}(n)\)-time and \(\mathcal{O}(n)\)-space algorithm for computing all minimal absent words; it has the desirable property that, given the indexing data structure at hand, the computation of minimal absent words can be executed in parallel. Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches. By excluding the indexing data structure construction time, we show that the implementation achieves near-optimal speed-ups.


65 citations
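
The definition of a minimal absent word given in the abstract above can be illustrated with a small brute-force sketch. This is not the linear-time suffix-array algorithm the chapter describes, and it is not parallel; it only enumerates candidates directly from the definition. The function names and the optional `alphabet` parameter are assumptions made for this illustration.

```python
def factors(y):
    """Return the set of all non-empty factors (substrings) of y."""
    n = len(y)
    return {y[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def minimal_absent_words(y, alphabet=None):
    """Brute-force enumeration of the minimal absent words of y.

    A word w of length >= 2 is minimal absent when w does not occur in y
    but both w[:-1] and w[1:] do. Checking these two longest proper
    factors is enough, because every proper factor of w is a factor of
    one of them. A single letter missing from y is also minimal absent.
    This takes far more than linear time; it is a reference sketch only.
    """
    letters = sorted(set(alphabet if alphabet is not None else y))
    occ = factors(y)
    # Letters of the alphabet that never occur in y.
    maws = {a for a in letters if a not in occ}
    # Every longer candidate extends an occurring factor by one letter.
    for left in occ:
        for b in letters:
            w = left + b
            if w not in occ and w[1:] in occ:
                maws.add(w)
    return sorted(maws)
```

For example, `minimal_absent_words("abaab")` lists the words over {a, b} that are absent from `abaab` while all of their proper factors are present.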

Proceedings Article

String regularities with don't cares

Costas S. Iliopoulos¹, Manal Mohamed¹, Laurent Mouchard², William F. Smyth³, Katerina Perdikuri⁴, Athanasios K. Tsakalidis⁴

King's College London¹, University of Rouen², McMaster University³, Research Academic Computer Technology Institute⁴

01 Mar 2003

TL;DR: Two simple practical algorithms that compute all the periods of every prefix of x are presented, which require quadratic worst-case time but only linear time in the average case.


Abstract: We describe algorithms for computing typical regularities in strings x = x[1..n] that contain don't care symbols. For such strings on alphabet Σ, an O(n log n log |Σ|) worst-case time algorithm for computing the period is known, but the algorithm is impractical due to a large constant of proportionality. We present instead two simple practical algorithms that compute all the periods of every prefix of x; our algorithms require quadratic worst-case time but only linear time in the average case. We then show how our algorithms can be used to compute other string regularities, specifically the covers of both ordinary and circular strings.


41 citations
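
To make the notion of periods in strings with don't care symbols concrete, here is a minimal sketch of a direct scan. It is not claimed to be either of the two algorithms from the paper. It assumes one common definition: p is a period of a prefix of length j when x[i] matches x[i+p] for every i with i+p < j, where a don't care symbol matches anything. The wildcard character `*` and the function names are choices made for this illustration.

```python
def matches(a, b, wildcard="*"):
    """Two symbols match if they are equal or either is a don't care."""
    return a == b or a == wildcard or b == wildcard


def period_reach(x, wildcard="*"):
    """For each p in 1..n, return the longest prefix length having period p.

    reach[p] = j means p is a period of every prefix of length p..j and
    of no longer prefix. Index 0 is unused. Each scan stops at the first
    mismatch; the worst case is quadratic (for instance when every
    symbol is a wildcard).
    """
    n = len(x)
    reach = [0] * (n + 1)
    for p in range(1, n + 1):
        i = 0
        while i + p < n and matches(x[i], x[i + p], wildcard):
            i += 1
        reach[p] = i + p
    return reach


def periods_of_prefix(reach, j):
    """List all periods of the prefix of length j, using period_reach output."""
    return [p for p in range(1, j + 1) if reach[p] >= j]
```

Storing one reach value per candidate period, rather than a full list of periods per prefix, keeps the output linear in size while still answering "is p a period of the prefix of length j" with a single comparison.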

Journal Article

Dynamic extended suffix arrays

Mikaël Salson¹, Thierry Lecroq¹, Martine Léonard¹, Laurent Mouchard²

University of Rouen¹, King's College London²

01 Jun 2010-Journal of Discrete Algorithms

TL;DR: This article presents an algorithm that modifies the suffix array and the Longest Common Prefix (LCP) array when the text is edited, based on a recent four-stage algorithm developed for dynamic Burrows-Wheeler Transforms (BWT).


39 citations
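
For readers unfamiliar with the two structures named in the summary above, the following sketch builds a suffix array and its LCP array naively and rebuilds both from scratch after an edit. It is only the baseline that a dynamic algorithm is meant to improve on, not the four-stage update method from the article. The function names, and the convention that the first LCP entry is 0, are assumptions made for this illustration.

```python
def suffix_array(text):
    """Return the starting positions of all suffixes in lexicographic order.

    Naive construction by sorting the suffixes themselves.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Return lcp where lcp[r] is the longest common prefix length of the
    suffixes ranked r-1 and r in sa; lcp[0] is set to 0 by convention."""
    def common_prefix(i, j):
        k = 0
        while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
            k += 1
        return k

    return [0] + [common_prefix(sa[r - 1], sa[r]) for r in range(1, len(sa))]


# Baseline handling of an edit: rebuild everything from scratch.
text = "banana"
sa = suffix_array(text)
lcp = lcp_array(text, sa)

edited = text[:3] + "d" + text[3:]   # insert "d" at position 3 -> "bandana"
sa_edited = suffix_array(edited)
lcp_edited = lcp_array(edited, sa_edited)
```

Comparing `sa` with `sa_edited` shows that a single insertion shifts positions and reorders part of the array, which is the cost a dynamic structure tries to confine to the affected entries.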


Cited by


Journal Article

A haplotype map of the human genome

John W. Belmont, Andrew Boudreau, Suzanne M. Leal, Paul Hardenbol, +229 more authors (40 institutions)

27 Oct 2005

TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.


Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.


5,479 citations

On Publishing the “Bioinformatics” Special Issue


장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.


Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.


4,833 citations

Journal Article

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Sergey Koren¹, Brian P. Walenz¹, Konstantin Berlin², Jason R. Miller³, Nicholas H. Bergman, Adam M. Phillippy¹

National Institutes of Health¹, Invincea², J. Craig Venter Institute³

15 Mar 2017-Genome Research

TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.


Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.


4,806 citations

Journal Article

Finishing the euchromatic sequence of the human genome


Chris P. Ponting, Daniel Barker

21 Oct 2004-Nature

TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.


Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.


3,989 citations

Journal Article

The Diploid Genome Sequence of an Individual Human

Samuel Levy¹, Granger G. Sutton¹, Pauline C. Ng¹, Lars Feuk², Aaron L. Halpern¹, Brian P. Walenz¹, Nelson Axelrod¹, Jiaqi Huang¹, Ewen F. Kirkness¹, Gennady Denisov¹, Yuan Lin¹, Jeffrey R. MacDonald², Andy Wing Chun Pang², Mary Shago², Timothy B. Stockwell¹, Alexia Tsiamouri¹, Vineet Bafna³, Vikas Bansal³, Saul A. Kravitz¹, Dana A. Busam¹, Karen Beeson¹, Tina C McIntosh¹, Karin A. Remington¹, Josep F. Abril⁴, John Gill¹, Jon Borman¹, Yu-Hui Rogers¹, Marvin Frazier¹, Stephen W. Scherer², Robert L. Strausberg¹, J. Craig Venter¹

J. Craig Venter Institute¹, University of Toronto², University of California, San Diego³, University of Barcelona⁴

04 Sep 2007-PLOS Biology

TL;DR: A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy spans 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.


Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.


1,843 citations
