Ning Lan

Bio: Ning Lan is an academic researcher from Yale University. The author has contributed to research in the topics Proteome and Genome. The author has an h-index of 11 and has co-authored 13 publications, which have received 4,896 citations. Previous affiliations of Ning Lan include Rutgers University.

Topics: Proteome, Genome, Functional genomics, ORFs, Gene ...

Papers

Journal Article

Functional profiling of the Saccharomyces cerevisiae genome.

Guri Giaever¹, Angela M. Chu¹, Li Ni², Carla Connelly³, Linda Riles⁴, Steeve Veronneau⁵, Sally Dow⁶, Ankuta Lucau-Danila⁷, Keith Anderson¹, Bruno André⁸, Adam P. Arkin⁹, Anna Astromoff¹, Mohamed El Bakkoury⁸, Rhonda Bangham², Rocío Benito¹⁰, Sophie Brachat¹¹, Stefano Campanaro¹², Matt Curtiss⁴, Karen Davis¹, Adam M. Deutschbauer¹, K. D. Entian¹³, Patrick Flaherty⁹, Françoise Foury⁷, David J. Garfinkel¹⁴, Mark Gerstein², Deanna Gotte¹⁴, Ulrich Güldener¹⁵, Johannes H. Hegemann¹⁵, Svenja Hempel¹³, Zelek S. Herman¹, Daniel F. Jaramillo¹, Diane E. Kelly¹⁵, Steven L. Kelly¹⁵, Peter Kötter¹³, Darlene LaBonte², David C. Lamb¹⁵, Ning Lan², Hong Liang¹, Hong Liao², Lucy Y. Liu², Chuanyun Luo², Marc Lussier⁵, Rong Mao³, Patrice Menard⁵, Siew Loon Ooi³, José L. Revuelta¹⁰, Christopher J. Roberts⁶, Matthias Rose¹³, Petra Ross-Macdonald², Bart Scherens⁸, Greg Schimmack⁶, Brenda Shafer¹⁴, Daniel D. Shoemaker¹, Sharon Sookhai-Mahadeo³, Reginald Storms¹⁶, Jeffrey N. Strathern¹⁴, Giorgio Valle¹², Marleen Voet¹⁷, Guido Volckaert¹⁷, Ching Yun Wang¹⁴, Teresa R. Ward⁶, Julie Wilhelmy⁴, Elizabeth A. Winzeler¹, Yonghong Yang¹, Grace Yen¹, Elaine M. Youngman³, Kexin Yu³, Howard Bussey⁵, Jef D. Boeke³, Michael Snyder², Peter Philippsen¹¹, Ronald W. Davis¹, Mark Johnston⁴

Stanford University¹, Yale University², Johns Hopkins University School of Medicine³, Washington University in St. Louis⁴, McGill University⁵, Merck & Co.⁶, Université catholique de Louvain⁷, Université libre de Bruxelles⁸, University of California, Berkeley⁹, University of Salamanca¹⁰, University of Basel¹¹, University of Padua¹², Goethe University Frankfurt¹³, National Institutes of Health¹⁴, Aberystwyth University¹⁵, Concordia University¹⁶, Katholieke Universiteit Leuven¹⁷

25 Jul 2002-Nature

TL;DR: It is shown that previously known and new genes are necessary for optimal growth under six well-studied conditions (high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment), and that fewer than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions.

Abstract: Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.
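The bar-code readout described above can be illustrated with a minimal sketch. Assuming each strain's bar-code hybridization signal is available for a reference pool and a treated pool (hypothetical dictionaries, not the study's data format), a strain's relative fitness can be summarized as the log2 change in its share of the total signal; negative values flag deletions that grow poorly under the condition. This is a simplified illustration, not the normalization used in the paper, and the pseudocount is an assumed parameter.

```python
import math
from typing import Dict


def log2_fitness_ratios(
    reference: Dict[str, float],
    treated: Dict[str, float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Log2 change in each strain's share of total bar-code signal.

    reference, treated: hypothetical maps from strain ID to bar-code signal.
    A pseudocount (an assumption) avoids division by zero for absent strains.
    """
    n = len(reference)
    total_ref = sum(reference.values()) + pseudocount * n
    total_trt = sum(treated.get(s, 0.0) for s in reference) + pseudocount * n
    ratios = {}
    for strain, ref_signal in reference.items():
        ref_share = (ref_signal + pseudocount) / total_ref
        trt_share = (treated.get(strain, 0.0) + pseudocount) / total_trt
        # > 0: strain enriched under treatment; < 0: strain depleted.
        ratios[strain] = math.log2(trt_share / ref_share)
    return ratios


if __name__ == "__main__":
    # Hypothetical signals for two deletion strains.
    print(log2_fitness_ratios({"strainA": 1000.0, "strainB": 1000.0},
                              {"strainA": 1500.0, "strainB": 500.0}))
```

Here strainB's share of the pool drops under treatment, so its ratio is about -1, which in this simplified picture would mark its deleted gene as contributing to fitness in that condition.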

4,328 citations

Journal Article

Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis

Chern Sing Goh¹, Ning Lan¹,², Shawn M. Douglas¹,², Baolin Wu¹, Nathaniel Echols¹,², Andrew Marcus Smith¹,², Duncan Milburn¹,², Gaetano T. Montelione, Hongyu Zhao¹, Mark Gerstein¹,²

Yale University¹, Rutgers University²

06 Feb 2004-Journal of Molecular Biology

TL;DR: This work uses tree-based analyses and random forest algorithms to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation and identifies combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets.

143 citations

Journal Article

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics

Paul Bertone¹, Yuval Kluger¹, Ning Lan¹, Deyou Zheng², Dinesh Christendat³, Adelinda Yee³, Aled M. Edwards³, Cheryl H. Arrowsmith³, Gaetano T. Montelione², Mark Gerstein¹

Yale University¹, Rutgers University², University of Toronto³

01 Jul 2001-Nucleic Acids Research

TL;DR: This work developed a comprehensive set of data mining features for each protein, including several related to experimental progress, and demonstrated in detail the application of a particular machine learning approach, decision trees, to predicting a protein's solubility and propensity to crystallize from sequence features.

Abstract: High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein's solubility and propensity to crystallize based on sequence features. 
We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (approximately 500 to 5,000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular rules.
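To make the sequence-feature idea concrete, here is a minimal sketch on hypothetical toy data. It computes two of the feature types the abstract mentions (fraction of acidic residues and number of hydrophobic stretches), fits a one-split decision tree (a "stump") on a single feature, and estimates its error by leave-one-out cross-validation. The hydrophobic residue set and the minimum stretch length of 5 are illustrative assumptions, not the SPINE definitions, and a stump is far simpler than the trees used in the paper.

```python
from typing import List, Tuple

ACIDIC = set("DE")
# Assumption: an illustrative hydrophobic set, not SPINE's definition.
HYDROPHOBIC = set("AILMFVW")

Stump = Tuple[float, int]  # (threshold, label predicted above the threshold)


def acidic_fraction(seq: str) -> float:
    """Fraction of residues that are Asp (D) or Glu (E)."""
    seq = seq.upper()
    return sum(aa in ACIDIC for aa in seq) / len(seq) if seq else 0.0


def hydrophobic_stretches(seq: str, min_len: int = 5) -> int:
    """Count runs of at least min_len consecutive hydrophobic residues."""
    count = run = 0
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            run += 1
            continue
        if run >= min_len:
            count += 1
        run = 0
    if run >= min_len:  # a run that reaches the end of the sequence
        count += 1
    return count


def fit_stump(xs: List[float], ys: List[int]) -> Stump:
    """Pick the threshold and polarity with the fewest training errors."""
    values = sorted(set(xs))
    # Midpoints between neighbouring values, plus the maximum (all below).
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] + [values[-1]]
    best_errors, best = len(xs) + 1, (values[-1], 1)
    for t in candidates:
        for above in (0, 1):
            errors = sum((above if x > t else 1 - above) != y
                         for x, y in zip(xs, ys))
            if errors < best_errors:
                best_errors, best = errors, (t, above)
    return best


def predict(stump: Stump, x: float) -> int:
    threshold, above = stump
    return above if x > threshold else 1 - above


def loo_error(xs: List[float], ys: List[int]) -> float:
    """Leave-one-out error of the stump (needs at least two samples)."""
    wrong = 0
    for i in range(len(xs)):
        stump = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        wrong += predict(stump, xs[i]) != ys[i]
    return wrong / len(xs)


if __name__ == "__main__":
    xs = [0.05, 0.08, 0.20, 0.25]  # hypothetical acidic fractions
    ys = [0, 0, 1, 1]              # hypothetical labels: 1 = soluble
    print(fit_stump(xs, ys), loo_error(xs, ys))
```

On this toy set the stump learns that a higher acidic fraction predicts solubility, echoing the rule reported above. With only a handful of points, leave-one-out is one way to use every sample for both training and testing, which is the intermediate-data-size concern the abstract raises.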

121 citations

Journal Article

An integrated approach for finding overlooked genes in yeast

Anuj Kumar¹, Paul M. Harrison¹, Kei-Hoi Cheung¹, Ning Lan¹, Nathaniel Echols¹, Paul Bertone¹, Perry L. Miller¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2002-Nature Biotechnology

TL;DR: The authors report the discovery of 137 previously unappreciated genes in yeast using a widely applicable and highly scalable approach that integrates gene-trapping, microarray-based expression analysis, and genome-wide homology searching; the approach provides an effective supplement to current gene-finding schemes.

Abstract: We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to β-galactosidase (β-gal); nonannotated open reading frames (ORFs) translated as β-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.
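The abstract notes that the approach can identify both short ORFs and antisense ORFs. As a purely illustrative sketch (a naive six-frame scan, not the gene-trapping, microarray and homology pipeline the paper actually uses), the function below finds ATG-to-stop ORFs on both strands of a DNA string. The `min_codons` cutoff is a hypothetical parameter, and minus-strand coordinates refer to positions in the reverse-complement string.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates index the scanned string for that strand (the reverse
    complement for "-"); end is exclusive and includes the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3))
                    start = None
    return results


if __name__ == "__main__":
    # Hypothetical sequence whose only ORF lies on the antisense strand.
    print(find_orfs("TCAGGGCAT"))
```

Scanning the reverse complement is what lets an antisense ORF, invisible on the annotated strand, show up as a candidate; a short `min_codons` value likewise admits the short ORFs that length-based annotation schemes tend to skip.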

109 citations

Journal Article

Integration of genomic datasets to predict protein complexes in yeast

Ronald Jansen¹, Ning Lan¹, Jiang Qian¹, Mark Gerstein¹

Yale University¹

01 Jan 2002-Journal of Structural and Functional Genomics

TL;DR: This paper focuses on predicting membership in protein complexes for individual genes. It draws on six data sources, including expression profiles, interaction data, essentiality and localization information, and shows that the prediction, which is weak for each source alone, can be improved by combining all of them.

Abstract: The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, whenever information from multiple independent sources agrees, it is likely to be more valid and reliable. Secondly, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single-gene or single-protein experiments, but it is also necessary for many of the results from genome-wide experiments, since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/.
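One simple way to combine several weakly predictive sources is a naive-Bayes-style update, which multiplies the prior odds by one likelihood ratio per source under the assumption that the sources are conditionally independent given the class. The sketch below illustrates that general idea only; it is not necessarily the procedure used in this paper, and the prior and likelihood-ratio values in the usage example are hypothetical.

```python
import math
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine evidence as: posterior odds = prior odds * product of LR_i.

    Each LR_i = P(evidence_i | in complex) / P(evidence_i | not in complex).
    Assumes the sources are conditionally independent (naive Bayes).
    Works in log-odds space so many sources do not overflow or underflow.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be in (0, 1)")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical LRs from, e.g., expression, interaction and localization data.
    print(posterior_probability(0.1, [2.0, 1.5, 3.0]))
```

With a hypothetical prior of 0.1 and likelihood ratios of 2.0, 1.5 and 3.0, the prior odds of 1/9 are multiplied by 9, giving odds of 1 and a combined probability of 0.5, whereas any one of these sources alone raises the probability to at most 0.25. This is the sense in which individually weak sources can add up to a stronger prediction.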

97 citations

Cited by

Journal Article

Machine learning

Thomas G. Dietterich¹

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article

Network biology: understanding the cell's functional organization

Albert-László Barabási¹, Zoltán N. Oltvai²

University of Notre Dame¹, Northwestern University²

01 Feb 2004-Nature Reviews Genetics

TL;DR: This work states that rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual framework that could potentially revolutionize the view of biology and disease pathologies in the twenty-first century.

Abstract: A key aim of postgenomic biomedical research is to systematically catalogue all molecules and their interactions within a living cell. There is a clear need to understand how these molecules and the interactions between them determine the function of this enormously complex machinery, both in isolation and when surrounded by other cells. Rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual framework that could potentially revolutionize our view of biology and disease pathologies in the twenty-first century.

7,475 citations

Journal Article

Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection.

Tomoya Baba¹, Takeshi Ara¹, Miki Hasegawa¹, Yuki Takai¹, Yoshiko Okumura¹, Miki Baba¹, Kirill A. Datsenko², Masaru Tomita¹, Barry L. Wanner², Hirotada Mori¹,³

Keio University¹, Purdue University², Nara Institute of Science and Technology³

01 Jan 2006-Molecular Systems Biology

TL;DR: These mutants—the ‘Keio collection’—provide a new resource not only for systematic analyses of unknown gene functions and gene regulatory networks but also for genome‐wide testing of mutational effects in a common strain background, E. coli K‐12 BW25113.

Abstract: We have systematically made a set of precisely defined, single-gene deletions of all nonessential genes in Escherichia coli K-12. Open-reading frame coding regions were replaced with a kanamycin cassette flanked by FLP recognition target sites by using a one-step method for inactivation of chromosomal genes and primers designed to create in-frame deletions upon excision of the resistance cassette. Of 4288 genes targeted, mutants were obtained for 3985. To alleviate problems encountered in high-throughput studies, two independent mutants were saved for every deleted gene. These mutants-the 'Keio collection'-provide a new resource not only for systematic analyses of unknown gene functions and gene regulatory networks but also for genome-wide testing of mutational effects in a common strain background, E. coli K-12 BW25113. We were unable to disrupt 303 genes, including 37 of unknown function, which are candidates for essential genes. Distribution is being handled via GenoBase (http://ecoli.aist-nara.ac.jp/).

7,428 citations

Journal Article

Protein Misfolding, Functional Amyloid, and Human Disease

Fabrizio Chiti¹, Christopher M. Dobson²

University of Florence¹, University of Cambridge²

06 Jun 2006-Annual Review of Biochemistry

TL;DR: The relative importance of the common main-chain and side-chain interactions in determining the propensities of proteins to aggregate is discussed and some of the evidence that the oligomeric fibril precursors are the primary origins of pathological behavior is described.

Abstract: Peptides or proteins convert under some conditions from their soluble forms into highly ordered fibrillar aggregates. Such transitions can give rise to pathological conditions ranging from neurodegenerative disorders to systemic amyloidoses. In this review, we identify the diseases known to be associated with formation of fibrillar aggregates and the specific peptides and proteins involved in each case. We describe, in addition, that living organisms can take advantage of the inherent ability of proteins to form such structures to generate novel and diverse biological functions. We review recent advances toward the elucidation of the structures of amyloid fibrils and the mechanisms of their formation at a molecular level. Finally, we discuss the relative importance of the common main-chain and side-chain interactions in determining the propensities of proteins to aggregate and describe some of the evidence that the oligomeric fibril precursors are the primary origins of pathological behavior.

5,897 citations

Journal Article

Genome-Wide Insertional Mutagenesis of Arabidopsis thaliana

Jose M. Alonso¹, Anna Stepanova¹, Thomas J. Leisse¹, Christopher J. Kim¹, Huaming Chen¹, Paul Shinn¹, Denise K. Stevenson¹, Justin Zimmerman¹, Pascual Barajas¹, Rosa Cheuk¹, Carmelita Gadrinab¹, Collen Heller¹, Albert Jeske¹, Eric Koesema¹, Cristina C. Meyers¹, Holly Parker¹, Lance Prednis¹, Yasser Ansari¹, Nathan Choy¹, Hashim Deen¹, Michael Geralt¹, Nisha Hazari¹, Emily Hom¹, Meagan Karnes¹, Celene Mulholland¹, Ral Ndubaku¹, Ian Thomas Schmidt¹, Plinio Guzmán¹, Laura Aguilar-Henonin¹, Markus Schmid¹, Detlef Weigel¹, David E. Carter², Trudy Marchand², Eddy Risseeuw², Debra Brogden², Albana Zeko², William L. Crosby², Charles C. Berry³, Joseph R. Ecker¹

Salk Institute for Biological Studies¹, National Research Council², University of California, San Diego³

01 Aug 2003-Science

TL;DR: Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels, and insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

Abstract: Over 225,000 independent Agrobacterium transferred DNA (T-DNA) insertion events in the genome of the reference plant Arabidopsis thaliana have been created that represent near saturation of the gene space. The precise locations were determined for more than 88,000 T-DNA insertions, which resulted in the identification of mutations in more than 21,700 of the approximately 29,454 predicted Arabidopsis genes. Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels. Insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

5,227 citations
