Simon G. Gregory

Other affiliations: University of Helsinki, Wellcome Trust, Imperial College London ...

Bio: Simon G. Gregory is an academic researcher at Duke University. The author has contributed to research on topics including single-nucleotide polymorphism and medicine. The author has an h-index of 54 and has co-authored 198 publications receiving 47,130 citations. Previous affiliations of Simon G. Gregory include the University of Helsinki and the Wellcome Trust.

Papers published on a yearly basis (chart not reproduced; years shown: 1992 and 1994 to 2023)

Papers

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article•DOI•

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston¹, Kerstin Lindblad-Toh², Ewan Birney, Jane Rogers³ +219 more•Institutions (26)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal Article•DOI•

Identification of the breast cancer susceptibility gene BRCA2

Richard Wooster, Graham R. Bignell, Johnathan M. Lancaster¹, Sally Swift, Sheila Seal, Jonathon Mangion, N. Collins, Simon G. Gregory², Curtis Gumbs³, Gos Micklem²

Institutions (3): National Institutes of Health¹, Wellcome Trust Sanger Institute², Duke University³

21 Dec 1995-Nature

TL;DR: A gene is identified in which six different germline mutations were detected in breast cancer families that are likely to be due to BRCA2, and the results indicate that this is the BRCA2 gene.

Abstract: In Western Europe and the United States approximately 1 in 12 women develop breast cancer. A small proportion of breast cancer cases, in particular those arising at a young age, are attributable to a highly penetrant, autosomal dominant predisposition to the disease. The breast cancer susceptibility gene, BRCA2, was recently localized to chromosome 13q12-q13. Here we report the identification of a gene in which we have detected six different germline mutations in breast cancer families that are likely to be due to BRCA2. Each mutation causes serious disruption to the open reading frame of the transcriptional unit. The results indicate that this is the BRCA2 gene.

3,333 citations

Journal Article•DOI•

Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease

H Ueda¹, J. M. M. Howson¹, Laura Esposito¹, Joanne M. Heward², Hywel Snook¹, Giselle Chamberlain¹, Dan Rainbow¹, K. M. D. Hunter¹, Anne Smith¹, G Di Genova¹, G Di Genova³, Mathias H. Herr¹, Mathias H. Herr⁴, Ingrid Dahlman⁵, Ingrid Dahlman¹, F Payne⁶, Deborah J. Smyth¹, Christopher E. Lowe¹, R. C. J. Twells¹, Sarah Howlett¹, Barry C. Healy¹, Sarah Nutland¹, Helen E. Rance¹, Vincent H. Everett¹, Luc J. Smink¹, A C Lam¹, Heather J. Cordell¹, Neil Walker¹, C Bordin¹, John S. Hulme¹, Costantino Motzo⁶, Francesco Cucca⁶, J F Hess⁷, Michael L. Metzker⁸, Michael L. Metzker⁷, Jane Rogers⁹, Simon G. Gregory¹⁰, Amit Allahabadia¹¹, Amit Allahabadia², R Nithiyananthan², Eva Tuomilehto-Wolf¹⁰, Jaakko Tuomilehto¹⁰, Polly J. Bingley¹², Kathleen M Gillespie¹², Dag E. Undlien¹³, Kjersti S. Rønningen¹⁴, Cristian Guja, Constantin Ionescu-Tirgoviste, David A. Savage¹⁵, Alexander P. Maxwell, Dennis Carson¹⁵, Christopher Patterson¹⁵, Jayne A. Franklyn², David Clayton¹, Laurence B. Peterson¹⁶, Linda S. Wicker¹, John A. Todd¹, S. C. L. Gough²

Institutions (16): University of Cambridge¹, University of Birmingham², Southampton General Hospital³, Humboldt University of Berlin⁴, Karolinska Institutet⁵, University of Cagliari⁶, United States Military Academy⁷, Baylor College of Medicine⁸, Wellcome Trust Sanger Institute⁹, University of Helsinki¹⁰, Northern General Hospital¹¹, University of Bristol¹², University of Oslo¹³, Norwegian Institute of Public Health¹⁴, Queen's University Belfast¹⁵, Merck & Co.¹⁶

29 May 2003-Nature

TL;DR: In this article, the authors identify polymorphisms of the cytotoxic T lymphocyte antigen 4 gene (CTLA4) as candidates for primary determinants of risk of the common autoimmune disorders Graves' disease, autoimmune hypothyroidism and type 1 diabetes.

Abstract: Genes and mechanisms involved in common complex diseases, such as the autoimmune disorders that affect approximately 5% of the population, remain obscure. Here we identify polymorphisms of the cytotoxic T lymphocyte antigen 4 gene (CTLA4), which encodes a vital negative regulatory molecule of the immune system, as candidates for primary determinants of risk of the common autoimmune disorders Graves' disease, autoimmune hypothyroidism and type 1 diabetes. In humans, disease susceptibility was mapped to a non-coding 6.1 kb 3′ region of CTLA4, the common allelic variation of which was correlated with lower messenger RNA levels of the soluble alternative splice form of CTLA4. In the mouse model of type 1 diabetes, susceptibility was also associated with variation in CTLA-4 gene splicing with reduced production of a splice form encoding a molecule lacking the CD80/CD86 ligand-binding domain. Genetic mapping of variants conferring a small disease risk can identify pathways in complex disorders, as exemplified by our discovery of inherited, quantitative alterations of CTLA4 contributing to autoimmune tissue destruction.

2,173 citations

Journal Article•DOI•

Risk alleles for multiple sclerosis identified by a genomewide study.

David A. Hafler¹, Alastair Compston², Stephen Sawcer², Mark J. Daly¹, Philip L. De Jager¹, Stacey Gabriel¹, Daniel B. Mirel¹, Adrian J. Ivinson¹, Margaret A. Pericak-Vance, Simon G. Gregory³, John D. Rioux¹, John D. Rioux⁴, Jacob L. McCauley⁵, Lisa F. Barcellos⁶, Lisa F. Barcellos⁷, Bruce A.C. Cree⁷, Stephen L. Hauser⁷

Institutions (7): Harvard University¹, University of Cambridge², Duke University³, Université de Montréal⁴, Vanderbilt University⁵, University of California, Berkeley⁶, University of California, San Francisco⁷

30 Aug 2007-The New England Journal of Medicine

TL;DR: Alleles of IL2RA and IL7RA and those in the HLA locus are identified as heritable risk factors for multiple sclerosis.

Abstract: Background: Multiple sclerosis has a clinically significant heritable component. We conducted a genomewide association study to identify alleles associated with the risk of multiple sclerosis. Methods: We used DNA microarray technology to identify common DNA sequence variants in 931 family trios (consisting of an affected child and both parents) and tested them for association. For replication, we genotyped another 609 family trios, 2322 case subjects, and 789 control subjects and used genotyping data from two external control data sets. A joint analysis of data from 12,360 subjects was performed to estimate the overall significance and effect size of associations between alleles and the risk of multiple sclerosis. Results: A transmission disequilibrium test of 334,923 single-nucleotide polymorphisms (SNPs) in 931 family trios revealed 49 SNPs having an association with multiple sclerosis (P<1×10⁻⁴); of these SNPs, 38 were selected for the second-stage analysis. A comparison between the 931 case subjects from the family trios and 2431 control subjects identified an additional nonoverlapping 32 SNPs (P<0.001). An additional 40 SNPs with less stringent P values (<0.01) were also selected, for a total of 110 SNPs for the second-stage analysis. Of these SNPs, two within the interleukin-2 receptor α gene (IL2RA) were strongly associated with multiple sclerosis (P = 2.96×10⁻⁸), as were a nonsynonymous SNP in the interleukin-7 receptor α gene (IL7RA) (P = 2.94×10⁻⁷) and multiple SNPs in the HLA-DRA locus (P = 8.94×10⁻⁸¹).

1,635 citations

Cited by

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

22,269 citations

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Institutions (1): Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article•DOI•

The Pfam protein families database

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹

Institutions (1): Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article•DOI•

BLAST+: architecture and applications.

Christiam Camacho¹, George Coulouris¹, Vahram Avagyan¹, Ning Ma¹, Jason S. Papadopoulos¹, Kevin Bealer¹, Thomas L. Madden¹

Institutions (1): National Institutes of Health¹

15 Dec 2009-BMC Bioinformatics

TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.

Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations

Journal Article•DOI•

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations
