Home
/
Authors
/
Irina Khrebtukova

Author

Irina Khrebtukova

Bio: Irina Khrebtukova is an academic researcher from Illumina. The author has contributed to research in topics: Gene expression profiling & Massively parallel signature sequencing. The author has an hindex of 21, co-authored 29 publications receiving 12260 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Alternative Isoform Regulation in Human Tissue Transcriptomes

[...]

Eric T. Wang¹, Rickard Sandberg², Rickard Sandberg¹, Shujun Luo³, Irina Khrebtukova³, Lu Zhang³, Christine Mayr¹, Stephen F. Kingsmore⁴, Gary P. Schroth³, Christopher B. Burge¹ - Show less +6 more•Institutions (4)

Massachusetts Institute of Technology¹, Karolinska Institutet², Illumina³, National Center for Genome Resources⁴

27 Nov 2008-Nature

TL;DR: An in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments yielding a digital inventory of gene and mRNA isoform expression suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.

...read moreread less

Abstract: Through alternative processing of pre-messenger RNAs, individual mammalian genes often produce multiple mRNA and protein isoforms that may have related, distinct or even opposing functions. Here we report an in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments, yielding a digital inventory of gene and mRNA isoform expression. Analyses in which sequence reads are mapped to exon-exon junctions indicated that 92-94% of human genes undergo alternative splicing, 86% with a minor isoform frequency of 15% or more. Differences in isoform-specific read densities indicated that most alternative splicing and alternative cleavage and polyadenylation events vary between tissues, whereas variation between individuals was approximately twofold to threefold less common. Extreme or 'switch-like' regulation of splicing between tissues was associated with increased sequence conservation in regulatory regions and with generation of full-length open reading frames. Patterns of alternative splicing and alternative cleavage and polyadenylation were strongly correlated across tissues, suggesting coordinated regulation of these processes, and sequence conservation of a subset of known regulatory motifs in both alternative introns and 3' untranslated regions suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.

...read moreread less

4,711 citations

Journal Article•DOI•

Accurate whole human genome sequencing using reversible terminator chemistry

[...]

David R. Bentley¹, Shankar Balasubramanian², Harold Swerdlow¹, Harold Swerdlow³ +198 more•Institutions (4)

06 Nov 2008-Nature

TL;DR: An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

...read moreread less

Abstract: DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

...read moreread less

3,802 citations

Journal Article•DOI•

Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells

[...]

Daniel Ramsköld¹, Shujun Luo², Yu-Chieh Wang³, Robin Li², Qiaolin Deng¹, Omid R. Faridani¹, Gregory A. Daniels, Irina Khrebtukova², Jeanne F. Loring³, Louise C. Laurent⁴, Gary P. Schroth², Rickard Sandberg⁵, Rickard Sandberg¹ - Show less +9 more•Institutions (5)

Ludwig Institute for Cancer Research¹, Illumina², Scripps Research Institute³, University of California, San Diego⁴, Karolinska Institutet⁵

01 Aug 2012-Nature Biotechnology

TL;DR: Applying Smart-Seq to circulating tumor cells from melanomas, it is found that although gene expression estimates from single cells have increased noise, hundreds of differentially expressed genes could be identified using few cells per cell type.

...read moreread less

Abstract: RNA-Seq of single cells has been limited by biases in transcript coverage and unknown technical variability. Ramskold et al. describe a protocol to reproducibly recover full-length transcripts and use it to quantitatively analyze splice isoforms in single cells.

...read moreread less

1,412 citations

Journal Article•DOI•

Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis.

[...]

Sergio E. Baranzini¹, Joann Mudge², Jennifer C. van Velkinburgh², Pouya Khankhanian¹, Irina Khrebtukova³, Neil A. Miller², Lu Zhang³, Andrew Farmer², Callum J. Bell², Ryan W. Kim², Gregory D. May², Jimmy E. Woodward², Stacy J. Caillier¹, Joseph P. McElroy¹, Refujia Gomez¹, Marcelo J. Pando⁴, Leonda E. Clendenen², Elena E. Ganusova², Faye D. Schilkey², Thiruvarangan Ramaraj², Omar Khan⁵, James Huntley³, Shujun Luo³, Pui-Yan Kwok¹, Thomas D. Wu⁶, Gary P. Schroth³, Jorge R. Oksenberg¹, Stephen L. Hauser¹, Stephen F. Kingsmore² - Show less +25 more•Institutions (6)

University of California, San Francisco¹, National Center for Genome Resources², Illumina³, Stanford University⁴, Wayne State University⁵, Genentech⁶

29 Apr 2010-Nature

TL;DR: This first systematic effort to estimate sequence variation among monozygotic co-twins did not find evidence for genetic, epigenetic or transcriptome differences that explained disease discordance, and are the first, to the authors' knowledge, female, twin and autoimmune disease individual genome sequences reported.

...read moreread less

Abstract: Monozygotic or 'identical' twins have been widely studied to dissect the relative contributions of genetics and environment in human diseases. In multiple sclerosis (MS), an autoimmune demyelinating disease and common cause of neurodegeneration and disability in young adults, disease discordance in monozygotic twins has been interpreted to indicate environmental importance in its pathogenesis. However, genetic and epigenetic differences between monozygotic twins have been described, challenging the accepted experimental model in disambiguating the effects of nature and nurture. Here we report the genome sequences of one MS-discordant monozygotic twin pair, and messenger RNA transcriptome and epigenome sequences of CD4(+) lymphocytes from three MS-discordant, monozygotic twin pairs. No reproducible differences were detected between co-twins among approximately 3.6 million single nucleotide polymorphisms (SNPs) or approximately 0.2 million insertion-deletion polymorphisms. Nor were any reproducible differences observed between siblings of the three twin pairs in HLA haplotypes, confirmed MS-susceptibility SNPs, copy number variations, mRNA and genomic SNP and insertion-deletion genotypes, or the expression of approximately 19,000 genes in CD4(+) T cells. Only 2 to 176 differences in the methylation of approximately 2 million CpG dinucleotides were detected between siblings of the three twin pairs, in contrast to approximately 800 methylation differences between T cells of unrelated individuals and several thousand differences between tissues or between normal and cancerous tissues. In the first systematic effort to estimate sequence variation among monozygotic co-twins, we did not find evidence for genetic, epigenetic or transcriptome differences that explained disease discordance. These are the first, to our knowledge, female, twin and autoimmune disease individual genome sequences reported.

...read moreread less

463 citations

Journal Article•DOI•

Chimeric transcript discovery by paired-end transcriptome sequencing

[...]

Christopher G. Maher¹, Nallasivam Palanisamy¹, J.C. Brenner¹, Xuhong Cao¹, Shanker Kalyana-Sundaram¹, Shujun Luo², Irina Khrebtukova², Terrence R. Barrette¹, Catherine S. Grasso¹, Jindan Yu¹, Robert J. Lonigro¹, Gary P. Schroth², Chandan Kumar-Sinha¹, Arul M. Chinnaiyan¹ - Show less +10 more•Institutions (2)

University of Michigan¹, Illumina²

28 Jul 2009-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy is established and is successfully used to detect previously undescribed ETS gene fusion in prostate tumors.

...read moreread less

Abstract: Recurrent gene fusions are a prevalent class of mutations arising from the juxtaposition of 2 distinct regions, which can generate novel functional transcripts that could serve as valuable therapeutic targets in cancer. Therefore, we aim to establish a sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy. Not only did a paired-end approach provide a greater dynamic range in comparison with single read based approaches, but it clearly distinguished the high-level “driving” gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level “passenger” gene fusions. Also, the comprehensiveness of a paired-end approach enabled the discovery of 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded previous approaches. Using the paired-end transcriptome sequencing approach, we observed read-through mRNA chimeras, tissue-type restricted chimeras, converging transcripts, diverging transcripts, and overlapping mRNA transcripts. Last, we successfully used paired-end transcriptome sequencing to detect previously undescribed ETS gene fusions in prostate tumors. Together, this study establishes a highly specific and sensitive approach for accurately and comprehensively cataloguing chimeras within a sample using paired-end transcriptome sequencing.

...read moreread less

353 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

[...]

Michael I. Love¹, Michael I. Love², Wolfgang Huber, Simon Anders•Institutions (2)

Max Planck Society¹, Harvard University²

05 Dec 2014-Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

...read moreread less

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

...read moreread less

47,038 citations

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

[...]

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo - Show less +7 more•Institutions (1)

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

...read moreread less

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

...read moreread less

20,557 citations

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

[...]

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹•Institutions (1)

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.

...read moreread less

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

...read moreread less

20,335 citations

Posted Content•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

[...]

Michael I. Love¹, Wolfgang Huber, Simon Anders•Institutions (1)

Harvard University¹

17 Nov 2014-bioRxiv

...read moreread less

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

...read moreread less

17,014 citations

Journal Article•DOI•

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

[...]

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹, Moran Yassour², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh¹, Kerstin Lindblad-Toh⁴, Nir Friedman², Aviv Regev¹ - Show less +19 more•Institutions (4)

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: The Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

...read moreread less

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

...read moreread less

15,665 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse