Author

Katayoon Darvishi

Other affiliations: Brigham and Women's Hospital, Jawaharlal Nehru University

Bio: Katayoon Darvishi is an academic researcher from Harvard University who has contributed to research on single-nucleotide polymorphisms and copy-number variation. The author has an h-index of 13 and has co-authored 14 publications that have received 4,550 citations. Previous affiliations of Katayoon Darvishi include Brigham and Women's Hospital and Jawaharlal Nehru University.

Papers

Journal Article

Integrating common and rare genetic variation in diverse human populations

David Altshuler¹, Richard A. Gibbs², Leena Peltonen³, Emmanouil T. Dermitzakis⁴, Stephen F. Schaffner¹, Fuli Yu², Penelope E. Bonnen², Paul I. W. de Bakker⁵, Panagiotis Deloukas⁵, Stacey Gabriel¹, R. Gwilliam⁵, Sarah E. Hunt⁵, Michael Inouye⁵, Xiaoming Jia¹, Aarno Palotie, Melissa Parkin¹, Pamela Whittaker⁵, Kyle Chang², Alicia Hawes², Lora Lewis², Yanru Ren², D Wheeler², Donna M. Muzny², Chris P. Barnes⁵, Katayoon Darvishi⁶, Matthew E. Hurles⁵, Joshua M. Korn¹, K. Kristiansson⁵, Charles Lee⁶, Steven A. McCarroll¹, James Nemesh¹, Alon Keinan⁷, Stephen B. Montgomery⁴, Samuela Pollack¹, Alkes L. Price⁶, Nicole Soranzo⁵, Claudia Gonzaga-Jauregui², Verneri Anttila, Wendy Brodeur¹, Mark J. Daly⁶, Stephen Leslie⁸, Gil McVean⁸, Loukas Moutsianas⁸, Huy Nguyen¹, Qingrun Zhang⁵, M. J. R. Ghori⁵, Ralph McGinnis⁵, William M. McLaren⁵, Fumihiko Takeuchi⁵, Sharon R. Grossman⁶, Ilya Shlyakhter¹, Elizabeth Hostetter⁶, Pardis C. Sabeti⁶, Clement Adebamowo⁹, Morris W. Foster¹⁰, Deborah R. Gordon¹¹, Julio Licinio¹², M C Manca, Patricia A. Marshall¹³, Ichiro Matsuda¹⁴, D Ngare¹⁵, Vivian Ota Wang¹⁶, D Reddy¹⁷, Charles N. Rotimi¹⁶, Charmaine D.M. Royal¹⁸, Richard R. Sharp¹⁹, Changqing Zeng²⁰, Lisa D. Brooks¹⁶, Jean E. McEwen¹⁶

Broad Institute¹, Baylor College of Medicine², University of Helsinki³, University of Geneva⁴, Wellcome Trust Sanger Institute⁵, Harvard University⁶, Cornell University⁷, University of Oxford⁸, University of Maryland, Baltimore⁹, University of Oklahoma¹⁰, University of California, San Francisco¹¹, Australian National University¹², Case Western Reserve University¹³, Health Sciences University of Hokkaido¹⁴, Moi University¹⁵, National Institutes of Health¹⁶, University of Houston–Clear Lake¹⁷, Duke University¹⁸, Cleveland Clinic¹⁹, Chinese Academy of Sciences²⁰

02 Sep 2010-Nature

TL;DR: An expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

Abstract: Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

2,863 citations

Journal Article

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs.

Joshua M. Korn, Finny G Kuruvilla, Steven A. McCarroll¹, Steven A. McCarroll², Alec Wysoker¹, James Nemesh¹, Simon Cawley³, Earl Hubbell³, Jim Veitch³, Patrick J Collins³, Katayoon Darvishi², Charles Lee², Marcia M. Nizzari¹, Stacey Gabriel¹, Shaun Purcell¹, Shaun Purcell², Mark J. Daly², Mark J. Daly¹, David Altshuler

Massachusetts Institute of Technology¹, Harvard University², Affymetrix³

01 Oct 2008-Nature Genetics

TL;DR: Birdsuite is presented, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes that more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies.

Abstract: Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.

835 citations

Journal Article

Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants

Dalila Pinto¹, Katayoon Darvishi², Xinghua Shi², Diana Rajan³, Diane Rigler³, Tom Fitzgerald³, Anath C. Lionel¹, Bhooma Thiruvahindrapuram¹, Jeffrey R. MacDonald¹, Ryan E. Mills², Aparna Prasad¹, Kristin M. Noonan⁴, Kristin M. Noonan², Susan M. Gribble³, Elena Prigmore³, Patricia K. Donahoe⁴, Richard S. Smith², Ji Hyeon Park², Matthew E. Hurles³, Nigel P. Carter³, Charles Lee², Stephen W. Scherer⁵, Stephen W. Scherer¹, Lars Feuk⁶

The Centre for Applied Genomics¹, Brigham and Women's Hospital², Wellcome Trust Sanger Institute³, Harvard University⁴, University of Toronto⁵, Uppsala University⁶

01 Jun 2011-Nature Biotechnology

TL;DR: The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics.

Abstract: We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.

418 citations

Journal Article

Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing

Hansoo Park¹, Jong Il Kim¹, Jong Il Kim², Young Seok Ju¹, Omer Gokcumen³, Ryan E. Mills³, Sheehyun Kim¹, Seung-Bok Lee¹, Dongwhan Suh¹, Dongwan Hong¹, Hyunseok Peter Kang¹, Yun Joo Yoo¹, Jong Yeon Shin¹, Hyun-Jin Kim¹, Maryam Yavartanoo¹, Young Wha Chang¹, Jung Sook Ha³, Wilson W. S. Chong³, Ga-Ram Hwang³, Katayoon Darvishi³, Hyeran Kim, Song Ju Yang, Kap-Seok Yang, Hyungtae Kim, Matthew E. Hurles⁴, Stephen W. Scherer⁵, Nigel P. Carter⁴, Chris Tyler-Smith⁴, Charles Lee³, Jeong-Sun Seo¹

Seoul National University¹, New Generation University College², Harvard University³, Wellcome Trust Sanger Institute⁴, University of Toronto⁵

01 May 2010-Nature Genetics

TL;DR: A new method combines high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals; the study discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs.

Abstract: Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3× coverage) and two Asian genomes (AK1, with 27.8× coverage and AK2, with 32.0× coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.

224 citations

Journal Article

Mitochondrial DNA G10398A polymorphism imparts maternal Haplogroup N a risk for breast and esophageal cancer

Katayoon Darvishi¹, Swarkar Sharma¹, Audesh Bhat¹, Ekta Rai¹, Rameshwar N. K. Bamezai¹

Jawaharlal Nehru University¹

08 May 2007-Cancer Letters

TL;DR: This study attempts to validate the exclusive presence of the mtG10398A (Ala→Thr) polymorphism in a haplotype constituting mtDNA haplogroup N and its sublineages, conferring on this group a higher risk for breast cancer, based on re-analysis of approximately 1,000 complete human mtDNA sequences worldwide and collated information on 2,334 individuals from 18 regions in India.

161 citations

Cited by

Journal Article

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article

A Map of Human Genome Variation From Population-Scale Sequencing

Gonçalo R. Abecasis¹, David Altshuler², David Altshuler³, Adam Auton⁴, Lisa D Brooks⁵, Richard Durbin⁶, Richard A. Gibbs⁷, Matthew E. Hurles⁶, Gil McVean⁴

University of Michigan¹, Harvard University², Broad Institute³, University of Oxford⁴, Johns Hopkins University⁵, Wellcome Trust Sanger Institute⁶, Baylor College of Medicine⁷

28 Oct 2010-Nature

TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype; this paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal Article

From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline

Géraldine A. Van der Auwera¹, Mauricio O. Carneiro¹, Christopher Hartl¹, Ryan Poplin¹, Guillermo del Angel¹, Ami Levy-Moonshine¹, Tadeusz Jordan¹, Khalid Shakir¹, David Roazen¹, Joel Thibault¹, Eric Banks¹, Kiran V. Garimella², David Altshuler¹, Stacey Gabriel¹, Mark A. DePristo¹

Broad Institute¹, Wellcome Trust Centre for Human Genetics²

15 Oct 2013-Current protocols in human genetics

TL;DR: This unit describes how to use BWA and the Genome Analysis Toolkit to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses.

Abstract: This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

5,150 citations

Journal Article

Comprehensive molecular characterization of gastric adenocarcinoma

Adam J. Bass, Vesteinn Thorsson, Ilya Shmulevich, Sheila Reynolds, and 254 more authors (32 institutions)

11 Sep 2014-Nature

TL;DR: A comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project is described and a molecular classification dividing gastric cancer into four subtypes is proposed.

Abstract: Gastric cancer was the world’s third leading cause of cancer mortality in 2012, responsible for 723,000 deaths¹. The vast majority of gastric cancers are adenocarcinomas, which can be further subdivided into intestinal and diffuse types according to the Lauren classification². An alternative system, proposed by the World Health Organization, divides gastric cancer into papillary, tubular, mucinous (colloid) and poorly cohesive carcinomas³. These classification systems have little clinical utility, making the development of robust classifiers that can guide patient therapy an urgent priority. The majority of gastric cancers are associated with infectious agents, including the bacterium Helicobacter pylori⁴ and Epstein–Barr virus (EBV). The distribution of histological subtypes of gastric cancer and the frequencies of H. pylori and EBV associated gastric cancer vary across the globe⁵. A small minority of gastric cancer cases are associated with germline mutation in E-cadherin (CDH1)⁶ or mismatch repair genes⁷ (Lynch syndrome), whereas sporadic mismatch repair-deficient gastric cancers have epigenetic silencing of MLH1 in the context of a CpG island methylator phenotype (CIMP)⁸. Molecular profiling of gastric cancer has been performed using gene expression or DNA sequencing⁹⁻¹², but has not led to a clear biologic classification scheme. The goals of this study by The Cancer Genome Atlas (TCGA) were to develop a robust molecular classification of gastric cancer and to identify dysregulated pathways and candidate drivers of distinct classes of gastric cancer.

4,583 citations

Journal Article

Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

Shaun Purcell¹, Shaun Purcell², Naomi R. Wray³, Jennifer Stone², Jennifer Stone¹, Peter M. Visscher, Michael Conlon O'Donovan⁴, Patrick F. Sullivan⁵, Pamela Sklar¹, Pamela Sklar², Douglas M. Ruderfer, Andrew McQuillin, Derek W. Morris⁶, Colm O'Dushlaine⁶, Aiden Corvin⁶, Peter Holmans⁴, Stuart MacGregor³, Hugh Gurling, Douglas Blackwood⁷, Nicholas John Craddock⁵, Michael Gill⁶, Christina M. Hultman⁸, Christina M. Hultman⁹, George Kirov⁴, Paul Lichtenstein⁸, Walter J. Muir⁷, Michael John Owen⁴, Carlos N. Pato¹⁰, Edward M. Scolnick¹, Edward M. Scolnick², David St Clair, Nigel Williams⁴, Lyudmila Georgieva⁴, Ivan Nikolov⁴, Nadine Norton⁴, Hywel Williams⁴, Draga Toncheva, Vihra Milanova, Emma Flordal Thelander⁸, Patrick Sullivan¹¹, Elaine Kenny⁶, Emma M. Quinn⁶, Khalid Choudhury¹², Susmita Datta¹², Jonathan Pimm¹², Srinivasa Thirumalai¹³, Vinay Puri¹², Robert Krasucki¹², Jacob Lawrence¹², Digby Quested¹⁴, Nicholas Bass¹², Caroline Crombie¹⁵, Gillian Fraser¹⁵, Soh Leh Kuan, Nicholas Walker, Kevin A. McGhee⁷, Ben S. Pickard¹⁶, P. Malloy⁷, Alan W Maclean⁷, Margaret Van Beck⁷, Michele T. Pato¹⁰, Helena Medeiros¹⁰, Frank A. Middleton¹⁷, Célia Barreto Carvalho¹⁰, Christopher P. Morley¹⁷, Ayman H. Fanous, David V. Conti¹⁰, James A. Knowles¹⁰, Carlos Ferreira, António Macedo¹⁸, M. Helena Azevedo¹⁸, Andrew Kirby², Andrew Kirby¹, Manuel A. R. Ferreira¹, Manuel A. R. Ferreira², Mark J. Daly¹, Mark J. Daly², Kimberly Chambert¹, Finny G Kuruvilla¹, Stacey Gabriel¹, Kristin G. Ardlie¹, Jennifer L. Moran¹

Broad Institute¹, Harvard University², QIMR Berghofer Medical Research Institute³, Cardiff University⁴, North Carolina State University⁵, Trinity College, Dublin⁶, University of Edinburgh⁷, Karolinska Institutet⁸, Uppsala University⁹, University of Southern California¹⁰, University of North Carolina at Chapel Hill¹¹, University College London¹², National Health Service¹³, University of Oxford¹⁴, University of Aberdeen¹⁵, Strathclyde Institute of Pharmacy and Biomedical Sciences¹⁶, State University of New York Upstate Medical University¹⁷, University of Coimbra¹⁸

06 Aug 2009-Nature

TL;DR: Using two analytic approaches, this study shows the extent to which common genetic variation underlies the risk of schizophrenia: it implicates the major histocompatibility complex and provides evidence for a substantial polygenic component involving thousands of common alleles of very small effect.

Abstract: Schizophrenia is a severe mental disorder with a lifetime risk of about 1%, characterized by hallucinations, delusions and cognitive deficits, with heritability estimated at up to 80%(1,2). We performed a genome-wide association study of 3,322 European individuals with schizophrenia and 3,587 controls. Here we show, using two analytic approaches, the extent to which common genetic variation underlies the risk of schizophrenia. First, we implicate the major histocompatibility complex. Second, we provide molecular genetic evidence for a substantial polygenic component to the risk of schizophrenia involving thousands of common alleles of very small effect. We show that this component also contributes to the risk of bipolar disorder, but not to several non-psychiatric diseases.

4,573 citations
