Author

Emily M LeProust

Bio: Emily M LeProust is an academic researcher from Agilent Technologies. The author has contributed to research in topics: Genome & Oligonucleotide. The author has an h-index of 30 and has co-authored 44 publications receiving 10,431 citations. Previous affiliations of Emily M LeProust include University of Iowa.

Topics: Genome, Oligonucleotide, Population, Human genome, Gene

Papers

Journal Article

Solution Hybrid Selection with Ultra-long Oligonucleotides for Massively Parallel Targeted Sequencing

Andreas Gnirke¹, Alexandre Melnikov¹, Jared Maguire¹, Peter Rogov¹, Emily M LeProust², William Brockman¹,³, Timothy Fennell¹, Georgia Giannoukos¹, Sheila Fisher¹, Carsten Russ¹, Stacey Gabriel¹, David B. Jaffe¹, Eric S. Lander¹,⁴,⁵, Chad Nusbaum¹

Broad Institute¹, Agilent Technologies², Google³, Harvard University⁴, Massachusetts Institute of Technology⁵

01 Feb 2009-Nature Biotechnology

TL;DR: A capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments; uniformity was such that ∼60% of target bases in the exonic 'catch', and ∼80% in the regional catch, had at least half the mean coverage.

Abstract: Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.

1,444 citations

Journal Article

The DNA-encoded nucleosome organization of a eukaryotic genome

Noam Kaplan¹, Irene K. Moore², Yvonne N. Fondufe-Mittendorf², Andrea J. Gossett³, Desiree Tillo⁴, Yair Field, Emily M LeProust⁵, Timothy P. Hughes⁴, Jason D. Lieb³, Jonathan Widom², Eran Segal¹

Weizmann Institute of Science¹, Northwestern University², University of North Carolina at Chapel Hill³, University of Toronto⁴, Agilent Technologies⁵

19 Mar 2009-Nature

TL;DR: The results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

Abstract: Nucleosomes are the basic repeating units of eukaryotic chromatin, and nucleosome organization is critically important for gene regulation. Kaplan et al. tested the importance of the intrinsic DNA sequence preferences of nucleosomes by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map is remarkably similar to in vivo nucleosome maps, indicating that the organization of nucleosomes in vivo is largely governed by the underlying genomic DNA sequence. Nucleosome organization is critical for gene regulation [1]. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers [2], competition with site-specific DNA-binding proteins [3], and the DNA sequence preferences of the nucleosomes themselves [4-8]. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo [7, 9-11], because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for ∼40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

1,205 citations

Journal Article

Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells

Madeleine Ball¹, Jin Billy Li¹, Yuan Gao², Je-Hyuk Lee¹, Emily M LeProust³, In-Hyun Park¹, Bin Xie², George Q. Daley¹, George M. Church¹

Harvard University¹, Virginia Commonwealth University², Agilent Technologies³

29 Mar 2009-Nature Biotechnology

TL;DR: Two complementary approaches that use next-generation sequencing technology to detect cytosine methylation are introduced and it is confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome.

Abstract: Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed approximately 10,000 bisulfite padlock probes to profile approximately 7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for approximately 1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.

973 citations

Journal Article

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

Nick Goldman¹, Paul Bertone¹, Siyuan Chen², Christophe Dessimoz¹, Emily M LeProust², Botond Sipos¹, Ewan Birney¹

European Bioinformatics Institute¹, Agilent Technologies²

07 Feb 2013-Nature

TL;DR: Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.

Abstract: Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10⁶ bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.

900 citations

Journal Article

Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C

Borbala Mifsud¹, Filipe Tavares-Cadete², Alice N Young³, Robert Sugar², Stefan Schoenfelder³, Lauren Ferreira³, Steven W. Wingett³, Simon Andrews³, William Grey⁴, Philip Ewels³, Bram Herman⁵, Scott Happe⁵, Andy Higgs⁵, Emily M LeProust⁵, George A. Follows⁶, Peter Fraser³, Nicholas M. Luscombe⁷, Cameron S. Osborne⁴

University College London¹, Francis Crick Institute², Babraham Institute³, King's College London⁴, Agilent Technologies⁵, University of Cambridge⁶, Okinawa Institute of Science and Technology⁷

01 Jun 2015-Nature Genetics

TL;DR: In this article, the authors use Capture Hi-C (CHi-C) to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types and identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci.

Abstract: Transcriptional control in large genomes often requires looping interactions between distal DNA elements, such as enhancers and target promoters. Current chromosome conformation capture techniques do not offer sufficiently high resolution to interrogate these regulatory interactions on a genomic scale. Here we use Capture Hi-C (CHi-C), an adapted genome conformation assay, to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types. We identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci. Transcriptionally active genes contact enhancer-like elements, whereas transcriptionally inactive genes interact with previously uncharacterized elements marked by repressive features that may act as long-range silencers. Finally, we show that interacting loci are enriched for disease-associated SNPs, suggesting how distal mutations may disrupt the regulation of relevant genes. This study provides new insights and accessible tools to dissect the regulatory interactions that underlie normal and aberrant gene regulation.

869 citations

Cited by

Journal Article

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Changes in the switching rate of Plasmodium var genes lead to antigenic variation [in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

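TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.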

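Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.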

18,940 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo¹, Eric Banks¹, Ryan Poplin¹, Kiran V. Garimella¹, Jared Maguire¹, Christopher Hartl¹, Anthony A. Philippakis¹,²,³, Guillermo del Angel¹, Manuel A. Rivas¹,³, Matt Hanna¹, Aaron McKenna¹, Timothy Fennell¹, Andrew Kernytsky¹, Andrey Sivachenko¹, Kristian Cibulskis¹, Stacey Gabriel¹, David Altshuler¹,³, Mark J. Daly¹,³

Broad Institute¹, Brigham and Women's Hospital², Harvard University³

01 May 2011-Nature Genetics

TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.

Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations

Journal Article

RNA-Guided Human Genome Engineering via Cas9

Prashant Mali¹, Luhan Yang¹, Kevin M. Esvelt², John Aach¹, Marc Güell¹, James E. DiCarlo³, Julie E. Norville¹, George M. Church¹,²

Harvard University¹, Wyss Institute for Biologically Inspired Engineering², Boston University³

15 Feb 2013-Science

TL;DR: The type II bacterial CRISPR system is engineered to function with custom guide RNA (gRNA) in human cells, establishing an RNA-guided editing tool for facile, robust, and multiplexable human genome engineering.

Abstract: Bacteria and archaea have evolved adaptive immune defenses, termed clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems, that use short RNA to direct degradation of foreign nucleic acids. Here, we engineer the type II bacterial CRISPR system to function with custom guide RNA (gRNA) in human cells. For the endogenous AAVS1 locus, we obtained targeting rates of 10 to 25% in 293T cells, 13 to 8% in K562 cells, and 2 to 4% in induced pluripotent stem cells. We show that this process relies on CRISPR components; is sequence-specific; and, upon simultaneous introduction of multiple gRNAs, can effect multiplex editing of target loci. We also compute a genome-wide resource of ~190 K unique gRNAs targeting ~40.5% of human exons. Our results establish an RNA-guided editing tool for facile, robust, and multiplexable human genome engineering.

8,197 citations
