Elia Stupka

Other affiliations: Queen Mary University of London, AREA Science Park, Boehringer Ingelheim ...

Bio: Elia Stupka is an academic researcher from Vita-Salute San Raffaele University. The author has contributed to research on topics including genes and exome sequencing, has an h-index of 36, and has co-authored 83 publications that have received 35,707 citations. Previous affiliations of Elia Stupka include Queen Mary University of London and AREA Science Park.

Topics: Gene, Exome sequencing, Genome, Gene expression, Regulation of gene expression ...

Papers published on a yearly basis: chart not reproduced; years shown are 2000 through 2019 and 2022.

Papers

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

The Transcriptional Landscape of the Mammalian Genome

Piero Carninci, Takeya Kasukawa, Shintaro Katayama, Julian Gough, and 194 more authors (36 institutions)

02 Sep 2005-Science

TL;DR: Detailed polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

Abstract: This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

3,412 citations

Journal Article

The Bioperl Toolkit: Perl Modules for the Life Sciences

Jason E. Stajich¹, David Block²,³, Kris Boulez, Steven E. Brenner⁴, Stephen A. Chervitz⁵, Chris Dagdigian, Georg Fuellen⁶, James G. R. Gilbert⁷, Ian F Korf⁸, Hilmar Lapp², Heikki Lehväslaiho, Chad Matsalla, Christopher J. Mungall⁴, Brian I. Osborne, Matthew Pocock⁷, Peter Schattner⁹, Martin Senger, Lincoln Stein¹⁰, Elia Stupka¹¹, Mark Wilkinson³, Ewan Birney

Duke University¹, Novartis², National Research Council³, University of California, Berkeley⁴, Affymetrix⁵, University of Münster⁶, Wellcome Trust Sanger Institute⁷, Washington University in St. Louis⁸, University of California, Santa Cruz⁹, Cold Spring Harbor Laboratory¹⁰, Agency for Science, Technology and Research¹¹

01 Oct 2002-Genome Research

TL;DR: The overall architecture of the Bioperl toolkit is described, the problem domains that it addresses, and specific examples of how the toolkit can be used to solve common life-sciences problems are given.

Abstract: The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

1,694 citations

Journal Article

The Ensembl genome database project

Tim Hubbard¹, Daniel Barker, Ewan Birney, Graham Cameron, Yuan Chen, Louise Clark¹, Tony Cox¹, James Cuff¹, Val Curwen¹, Thomas A. Down¹, Richard Durbin¹, Eduardo Eyras¹, James G. R. Gilbert¹, Martin Hammond, Lukasz Huminiecki, Arek Kasprzyk, Heikki Lehväslaiho, Philip Lijnzaad, Craig Melsopp, Emmanuel Mongin, Roger Pettett¹, Matthew Pocock¹, Simon C. Potter¹, Alistair G. Rust, Esther Schmidt, Stephen M. J. Searle¹, Guy Slater, James Smith¹, William Spooner¹, Arne Stabenau, Jim Stalker¹, Elia Stupka², Abel Ureta-Vidal, Imre Vastrik, Michele Clamp¹

Wellcome Trust Sanger Institute¹, Vita-Salute San Raffaele University²

01 Jan 2002-Nucleic Acids Research

TL;DR: The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes and is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources.

Abstract: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

1,540 citations

Journal Article

Whole-Genome Shotgun Assembly and Analysis of the Genome of Fugu rubripes

Samuel Aparicio¹, Jarrod Chapman¹, Elia Stupka¹, Nik Putnam¹, Jer Ming Chia¹, Paramvir S. Dehal¹, Alan Christoffels¹, Sam Rash¹, Shawn Hoon¹, Arian F.A. Smit¹, Maarten D. Sollewijn Gelpke¹, Jared C. Roach¹, Tania Oh¹, Isaac Ho¹, Marie Wong¹, Chris Detter¹, Frans Verhoef¹, Paul Predki¹, Alice Tay¹, Susan Lucas¹, Paul G. Richardson¹, Sarah Smith¹, Melody S. Clark¹, Yvonne J. K. Edwards¹, Norman A. Doggett¹, Andrey Zharkikh¹, Sean V. Tavtigian¹, Dmitry Pruss¹, Mary Barnstead¹, Cheryl Evans¹, Holly Baden¹, Justin Powell¹, Gustavo Glusman¹, Lee Rowen¹, Leroy Hood¹, Y. H. Tan¹, Greg Elgar¹, Trevor Hawkins¹, Byrappa Venkatesh¹, Daniel S. Rokhsar¹, Sydney Brenner¹

Agency for Science, Technology and Research¹

23 Aug 2002-Science

TL;DR: The Fugu rubripes genome has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds.

Abstract: The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some “giant” genes were observed that had average coding sequence sizes but were spread over genomic lengths significantly larger than those of their human orthologs. Although three-quarters of predicted human proteins have a strong match to Fugu, approximately a quarter of the human proteins had highly diverged from or had no pufferfish homologs, highlighting the extent of protein evolution in the 450 million years since teleosts and mammals diverged. Conserved linkages between Fugu and human genes indicate the preservation of chromosomal segments from the common vertebrate ancestor, but with considerable scrambling of gene order.

1,446 citations


Cited by

Journal Article

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹,⁵

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

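Changes in the switching rate of Plasmodium var genes lead to antigenic variation [English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.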

18,940 citations

Journal Article

The Pfam protein families database

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman¹, Vincent J. Carey², Douglas M. Bates³, Benjamin M. Bolstad⁴, Marcel Dettling, Sandrine Dudoit⁴, Byron Ellis¹, Laurent Gautier⁵, Yongchao Ge⁶, Jeff Gentry¹, Kurt Hornik⁷, Torsten Hothorn⁸, Wolfgang Huber⁹, Stefano Maria Iacus¹⁰, Rafael A. Irizarry¹¹, Friedrich Leisch⁷, Cheng Li¹, Martin Maechler, A. J. Rossini¹², Günther Sawitzki, Colin A. Smith¹³, Gordon K. Smyth¹⁴, Luke Tierney¹⁵, Jean Yang, Jianhua Zhang¹

Harvard University¹, Brigham and Women's Hospital², University of Wisconsin-Madison³, University of California, Berkeley⁴, Technical University of Denmark⁵, Icahn School of Medicine at Mount Sinai⁶, Vienna University of Technology⁷, University of Erlangen-Nuremberg⁸, German Cancer Research Center⁹, University of Milan¹⁰, Johns Hopkins University¹¹, University of Washington¹², Scripps Research Institute¹³, Walter and Eliza Hall Institute of Medical Research¹⁴, University of Iowa¹⁵

15 Sep 2004-Genome Biology

TL;DR: Details of the aims and methods of Bioconductor, the collaborative creation of extensible software for computational biology and bioinformatics, and current challenges are described.

Abstract: The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

12,142 citations

Journal Article

The sequence of the human genome.

J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (12 institutions)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations
