
Showing papers by "Massachusetts Institute of Technology published in 2003"


Journal ArticleDOI
S. Agostinelli1, John Allison2, K. Amako3, J. Apostolakis4, Henrique Araujo5, P. Arce4, Makoto Asai6, D. Axen4, S. Banerjee7, G. Barrand, F. Behner4, Lorenzo Bellagamba8, J. Boudreau9, L. Broglia10, A. Brunengo8, H. Burkhardt4, Stephane Chauvie, J. Chuma11, R. Chytracek4, Gene Cooperman12, G. Cosmo4, P. V. Degtyarenko13, Andrea Dell'Acqua4, G. Depaola14, D. Dietrich15, R. Enami, A. Feliciello, C. Ferguson16, H. Fesefeldt4, Gunter Folger4, Franca Foppiano, Alessandra Forti2, S. Garelli, S. Gianì4, R. Giannitrapani17, D. Gibin4, J. J. Gomez Y Cadenas4, I. González4, G. Gracia Abril4, G. Greeniaus18, Walter Greiner15, Vladimir Grichine, A. Grossheim4, Susanna Guatelli, P. Gumplinger11, R. Hamatsu19, K. Hashimoto, H. Hasui, A. Heikkinen20, A. S. Howard5, Vladimir Ivanchenko4, A. Johnson6, F.W. Jones11, J. Kallenbach, Naoko Kanaya4, M. Kawabata, Y. Kawabata, M. Kawaguti, S.R. Kelner21, Paul R. C. Kent22, A. Kimura23, T. Kodama24, R. P. Kokoulin21, M. Kossov13, Hisaya Kurashige25, E. Lamanna26, Tapio Lampén20, V. Lara4, Veronique Lefebure4, F. Lei16, M. Liendl4, W. S. Lockman, Francesco Longo27, S. Magni, M. Maire, E. Medernach4, K. Minamimoto24, P. Mora de Freitas, Yoshiyuki Morita3, K. Murakami3, M. Nagamatu24, R. Nartallo28, Petteri Nieminen28, T. Nishimura, K. Ohtsubo, M. Okamura, S. W. O'Neale29, Y. Oohata19, K. Paech15, J Perl6, Andreas Pfeiffer4, Maria Grazia Pia, F. Ranjard4, A.M. Rybin, S.S Sadilov4, E. Di Salvo8, Giovanni Santin27, Takashi Sasaki3, N. Savvas2, Y. Sawada, Stefan Scherer15, S. Sei24, V. Sirotenko4, David J. Smith6, N. Starkov, H. Stoecker15, J. Sulkimo20, M. Takahata23, Satoshi Tanaka30, E. Tcherniaev4, E. Safai Tehrani6, M. Tropeano1, P. Truscott31, H. Uno24, L. Urbán, P. Urban32, M. Verderi, A. Walkden2, W. Wander33, H. Weber15, J.P. Wellisch4, Torre Wenaus34, D.C. Williams, Douglas Wright6, T. Yamada24, H. Yoshida24, D. Zschiesche15 
TL;DR: The Geant4 toolkit, as discussed by the authors, simulates the passage of particles through matter and provides a complete range of functionality including tracking, geometry, physics models and hits.
Abstract: Geant4 is a toolkit for simulating the passage of particles through matter. It includes a complete range of functionality including tracking, geometry, physics models and hits. The physics processes offered cover a comprehensive range, including electromagnetic, hadronic and optical processes, a large set of long-lived particles, materials and elements, over a wide energy range starting, in some cases, from 250 eV and extending in others to the TeV energy range. It has been designed and constructed to expose the physics models utilised, to handle complex geometries, and to enable its easy adaptation for optimal use in different sets of applications. The toolkit is the result of a worldwide collaboration of physicists and software engineers. It has been created exploiting software engineering and object-oriented technology and implemented in the C++ programming language. It has been used in applications in particle physics, nuclear physics, accelerator design, space engineering and medical physics.

18,904 citations


Journal ArticleDOI
TL;DR: An analytical strategy is introduced, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes, which identifies a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle.
Abstract: DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1α and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.
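The core of the approach is a running-sum statistic over a ranked gene list. A minimal, unweighted sketch (the published method uses a weighted Kolmogorov-Smirnov-like statistic with permutation testing for significance; the function name here is illustrative):

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style running-sum statistic.

    Walk down the ranked list; step up when a gene is in the set,
    step down otherwise, with step sizes chosen so the sum ends at 0.
    The score is the maximum deviation of the running sum from zero.
    Assumes the set and its complement both intersect the list.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    up, down = 1.0 / n_hit, 1.0 / n_miss
    running = best = 0.0
    for h in hits:
        running += up if h else -down
        if abs(running) > abs(best):
            best = running
    return best
```

A gene set concentrated at the top of the ranking scores near +1; a set whose members sit at the bottom (like a coordinately decreased pathway in a ranking from most up- to most down-regulated) scores strongly negative.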

7,997 citations


Journal ArticleDOI
John W. Belmont1, Paul Hardenbol, Thomas D. Willis, Fuli Yu1, Huanming Yang2, Lan Yang Ch'Ang, Wei Huang3, Bin Liu2, Yan Shen3, Paul K.H. Tam4, Lap-Chee Tsui4, Mary M.Y. Waye5, Jeffrey Tze Fei Wong6, Changqing Zeng2, Qingrun Zhang2, Mark S. Chee7, Luana Galver7, Semyon Kruglyak7, Sarah S. Murray7, Arnold Oliphant7, Alexandre Montpetit8, Fanny Chagnon8, Vincent Ferretti8, Martin Leboeuf8, Michael S. Phillips8, Andrei Verner8, Shenghui Duan9, Denise L. Lind10, Raymond D. Miller9, John P. Rice9, Nancy L. Saccone9, Patricia Taillon-Miller9, Ming Xiao10, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley11, Sarah E. Hunt11, Don Powell11, Houcan Zhang12, Ichiro Matsuda13, Yoshimitsu Fukushima14, Darryl Macer15, Eiko Suda15, Charles N. Rotimi16, Clement Adebamowo17, Toyin Aniagwu17, Patricia A. Marshall18, Olayemi Matthew17, Chibuzor Nkwodimmah17, Charmaine D.M. Royal16, Mark Leppert19, Missy Dixon19, Fiona Cunningham20, Ardavan Kanani20, Gudmundur A. Thorisson20, Peter E. Chen21, David J. Cutler21, Carl S. Kashuk21, Peter Donnelly22, Jonathan Marchini22, Gilean McVean22, Simon Myers22, Lon R. Cardon22, Andrew P. Morris22, Bruce S. Weir23, James C. Mullikin24, Michael Feolo24, Mark J. Daly25, Renzong Qiu26, Alastair Kent, Georgia M. Dunston16, Kazuto Kato27, Norio Niikawa28, Jessica Watkin29, Richard A. Gibbs1, Erica Sodergren1, George M. Weinstock1, Richard K. Wilson9, Lucinda Fulton9, Jane Rogers11, Bruce W. Birren25, Hua Han2, Hongguang Wang, Martin Godbout30, John C. Wallenburg8, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins24, Lisa D. Brooks24, Jean E. McEwen24, Mark S. Guyer24, Elke Jordan31, Jane Peterson24, Jack Spiegel24, Lawrence M. Sung32, Lynn F. Zacharia24, Karen Kennedy29, Michael Dunn29, Richard Seabrook29, Mark Shillito, Barbara Skene29, John Stewart29, David Valle21, Ellen Wright Clayton33, Lynn B. Jorde19, Aravinda Chakravarti21, Mildred K. Cho34, Troy Duster35, Troy Duster36, Morris W. Foster37, Maria Jasperse38, Bartha Maria Knoppers39, Pui-Yan Kwok10, Julio Licinio40, Jeffrey C. Long41, Pilar N. Ossorio42, Vivian Ota Wang33, Charles N. Rotimi16, Patricia Spallone43, Patricia Spallone29, Sharon F. Terry44, Eric S. Lander25, Eric H. Lai45, Deborah A. Nickerson46, Gonçalo R. Abecasis41, David Altshuler47, Michael Boehnke41, Panos Deloukas11, Julie A. Douglas41, Stacey Gabriel25, Richard R. Hudson48, Thomas J. Hudson8, Leonid Kruglyak49, Yusuke Nakamura50, Robert L. Nussbaum24, Stephen F. Schaffner25, Stephen T. Sherry24, Lincoln Stein20, Toshihiro Tanaka
18 Dec 2003-Nature
TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.
Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.
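The "degree of association" between nearby variants is conventionally quantified by linkage disequilibrium statistics such as r^2. A small illustrative helper (not from the paper itself), assuming the haplotype and allele frequencies of two biallelic sites are known:

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 measure of association (linkage disequilibrium) between two
    biallelic variants, given the A-B haplotype frequency p_ab and the
    marginal allele frequencies p_a and p_b (all strictly between 0 and 1)."""
    d = p_ab - p_a * p_b          # classical LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

r^2 = 1 means genotyping one variant fully predicts the other (the idea behind choosing tag variants); r^2 = 0 means the two segregate independently.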

5,926 citations


Journal ArticleDOI
TL;DR: Advances in the understanding of the mechanism and role of DNA methylation in biological processes are reviewed, showing that epigenetic mechanisms seem to allow an organism to respond to the environment through changes in gene expression.
Abstract: Cells of a multicellular organism are genetically homogeneous but structurally and functionally heterogeneous owing to the differential expression of genes. Many of these differences in gene expression arise during development and are subsequently retained through mitosis. Stable alterations of this kind are said to be 'epigenetic', because they are heritable in the short term but do not involve mutations of the DNA itself. Research over the past few years has focused on two molecular mechanisms that mediate epigenetic phenomena: DNA methylation and histone modifications. Here, we review advances in the understanding of the mechanism and role of DNA methylation in biological processes. Epigenetic effects by means of DNA methylation have an important role in development but can also arise stochastically as animals age. Identification of proteins that mediate these effects has provided insight into this complex process and diseases that occur when it is perturbed. External influences on epigenetic processes are seen in the effects of diet on long-term diseases such as cancer. Thus, epigenetic mechanisms seem to allow an organism to respond to the environment through changes in gene expression. The extent to which environmental effects can provoke epigenetic responses represents an exciting area of future research.

5,760 citations


Journal ArticleDOI
26 Dec 2003-Cell
TL;DR: The predicted regulatory targets of mammalian miRNAs were enriched for genes involved in transcriptional regulation but also encompassed an unexpectedly broad range of other functions.

5,246 citations


Journal ArticleDOI
TL;DR: This work develops and analyzes space-time coded cooperative diversity protocols for combating multipath fading across multiple protocol layers in a wireless network and demonstrates that these protocols achieve full spatial diversity in the number of cooperating terminals, not just the number of decoding relays, and can be used effectively for higher spectral efficiencies than repetition-based schemes.
Abstract: We develop and analyze space-time coded cooperative diversity protocols for combating multipath fading across multiple protocol layers in a wireless network. The protocols exploit spatial diversity available among a collection of distributed terminals that relay messages for one another in such a manner that the destination terminal can average the fading, even though it is unknown a priori which terminals will be involved. In particular, a source initiates transmission to its destination, and many relays potentially receive the transmission. Those terminals that can fully decode the transmission utilize a space-time code to cooperatively relay to the destination. We demonstrate that these protocols achieve full spatial diversity in the number of cooperating terminals, not just the number of decoding relays, and can be used effectively for higher spectral efficiencies than repetition-based schemes. We discuss issues related to space-time code design for these protocols, emphasizing codes that readily allow for appealing distributed versions.

4,385 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine systematic differences in earnings management across 31 countries and propose an explanation for these differences based on the notion that insiders, in an attempt to protect their private control benefits, use earnings management to conceal firm performance from outsiders.

3,662 citations


Proceedings ArticleDOI
14 Sep 2003
TL;DR: Measurements taken from a 29-node 802.11b test-bed demonstrate the poor performance of minimum hop-count, illustrate the causes of that poor performance, and confirm that ETX improves performance.
Abstract: This paper presents the expected transmission count metric (ETX), which finds high-throughput paths on multi-hop wireless networks. ETX minimizes the expected total number of packet transmissions (including retransmissions) required to successfully deliver a packet to the ultimate destination. The ETX metric incorporates the effects of link loss ratios, asymmetry in the loss ratios between the two directions of each link, and interference among the successive links of a path. In contrast, the minimum hop-count metric chooses arbitrarily among the different paths of the same minimum length, regardless of the often large differences in throughput among those paths, and ignoring the possibility that a longer path might offer higher throughput. This paper describes the design and implementation of ETX as a metric for the DSDV and DSR routing protocols, as well as modifications to DSDV and DSR which allow them to use ETX. Measurements taken from a 29-node 802.11b test-bed demonstrate the poor performance of minimum hop-count, illustrate the causes of that poor performance, and confirm that ETX improves performance. For long paths the throughput improvement is often a factor of two or more, suggesting that ETX will become more useful as networks grow larger and paths become longer.
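In the paper's formulation, the ETX of a link is 1/(df x dr), where df and dr are the forward and reverse delivery ratios measured with broadcast probes, and the ETX of a path is the sum of its link metrics. A minimal sketch:

```python
def link_etx(df, dr):
    """Expected transmissions for one link: 1 / (df * dr), where df is the
    forward delivery ratio (the data packet arrives) and dr the reverse
    ratio (the ACK arrives); both are assumed nonzero."""
    return 1.0 / (df * dr)

def path_etx(links):
    """Path metric: the sum of link ETX values along the route."""
    return sum(link_etx(df, dr) for df, dr in links)
```

Under this metric, two reliable hops (ETX of about 2.47 at 90% delivery in each direction) beat a single 50%-loss hop (ETX = 4), which is exactly the distinction minimum hop-count ignores.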

3,656 citations


Journal ArticleDOI
TL;DR: Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.
Abstract: A fundamental problem that confronts peer-to-peer applications is the efficient location of the node that stores a desired data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.
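Chord's single operation, mapping a key onto a node, can be sketched with consistent hashing on an identifier circle. This is a centralized toy that ignores finger tables and O(log N) routing; the identifier width is an arbitrary choice here:

```python
import hashlib
from bisect import bisect_left

def chord_id(name, bits=16):
    """Hash a node or key name onto the 2**bits identifier circle
    (SHA-1, truncated modulo the circle size)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

def successor(node_ids, key_id):
    """A key belongs to the first node whose identifier is >= the key's,
    wrapping around the circle past the highest id to the lowest."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]
```

Storing a key/data pair then means placing it at `successor(node_ids, chord_id(key))`; when nodes join or leave, only keys in the affected arc of the circle move.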

3,518 citations


Journal ArticleDOI
TL;DR: Two complementary strategies can be used in the fabrication of molecular biomaterials, as discussed by the authors: chemical complementarity and structural compatibility, both of which confer the weak and noncovalent interactions that bind building blocks together during self-assembly.
Abstract: Two complementary strategies can be used in the fabrication of molecular biomaterials. In the 'top-down' approach, biomaterials are generated by stripping down a complex entity into its component parts (for example, paring a virus particle down to its capsid to form a viral cage). This contrasts with the 'bottom-up' approach, in which materials are assembled molecule by molecule (and in some cases even atom by atom) to produce novel supramolecular architectures. The latter approach is likely to become an integral part of nanomaterials manufacture and requires a deep understanding of individual molecular building blocks and their structures, assembly properties and dynamic behaviors. Two key elements in molecular fabrication are chemical complementarity and structural compatibility, both of which confer the weak and noncovalent interactions that bind building blocks together during self-assembly. Using natural processes as a guide, substantial advances have been achieved at the interface of nanomaterials and biology, including the fabrication of nanofiber materials for three-dimensional cell culture and tissue engineering, the assembly of peptide or protein nanotubes and helical ribbons, the creation of living microlenses, the synthesis of metal nanowires on DNA templates, the fabrication of peptide, protein and lipid scaffolds, the assembly of electronic materials by bacterial phage selection, and the use of radiofrequency to regulate molecular behaviors.

3,125 citations


ReportDOI
TL;DR: This paper found that computer capital substitutes for workers in performing cognitive and manual tasks that can be accomplished by following explicit rules, and complements workers in non-routine problem-solving and complex communications tasks.
Abstract: We apply an understanding of what computers do to study how computerization alters job skill demands. We argue that computer capital (1) substitutes for workers in performing cognitive and manual tasks that can be accomplished by following explicit rules; and (2) complements workers in performing nonroutine problem-solving and complex communications tasks. Provided these tasks are imperfect substitutes, our model implies measurable changes in the composition of job tasks, which we explore using representative data on task input for 1960 to 1998. We find that within industries, occupations and education groups, computerization is associated with reduced labor input of routine manual and routine cognitive tasks and increased labor input of nonroutine cognitive tasks. Translating task shifts into education demand, the model can explain sixty percent of the estimated relative demand shift favoring college labor during 1970 to 1998. Task changes within nominally identical occupations account for almost half of this impact.

Journal ArticleDOI
TL;DR: For the multicast setup it is proved that there exist coding strategies that provide maximally robust networks and that do not require adaptation of the network interior to the failure pattern in question.
Abstract: We take a new look at the issue of network capacity. It is shown that network coding is an essential ingredient in achieving the capacity of a network. Building on recent work by Li et al.(see Proc. 2001 IEEE Int. Symp. Information Theory, p.102), who examined the network capacity of multicast networks, we extend the network coding framework to arbitrary networks and robust networking. For networks which are restricted to using linear network codes, we find necessary and sufficient conditions for the feasibility of any given set of connections over a given network. We also consider the problem of network recovery for nonergodic link failures. For the multicast setup we prove that there exist coding strategies that provide maximally robust networks and that do not require adaptation of the network interior to the failure pattern in question. The results are derived for both delay-free networks and networks with delays.
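The classic illustration of why coding at interior nodes beats plain routing is the butterfly network: a bottleneck link forwards the XOR of two messages, and each sink recovers the message it is missing. A toy sketch (XOR is the simplest linear code over GF(2)):

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Butterfly network in miniature: sources hold m1 and m2, the shared
# bottleneck link carries their XOR, and each sink combines the coded
# packet with the message it already received on a side path.
m1, m2 = b"\x0f", b"\xf0"
coded = xor_bytes(m1, m2)            # what the interior node forwards
recovered_m2 = xor_bytes(coded, m1)  # sink that got m1 directly
recovered_m1 = xor_bytes(coded, m2)  # sink that got m2 directly
```

Both sinks obtain both messages in a single use of the bottleneck link, which no routing-only scheme can achieve on this topology.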

Journal ArticleDOI
TL;DR: Online feedback mechanisms harness the bidirectional communication capabilities of the Internet to engineer large-scale word-of-mouth networks, as discussed by the authors; their growing popularity has potentially important implications for a wide range of management activities such as brand building, customer acquisition and retention, product development and quality assurance.
Abstract: Online feedback mechanisms harness the bidirectional communication capabilities of the Internet to engineer large-scale, word-of-mouth networks. Best known so far as a technology for building trust and fostering cooperation in online marketplaces, such as eBay, these mechanisms are poised to have a much wider impact on organizations. Their growing popularity has potentially important implications for a wide range of management activities such as brand building, customer acquisition and retention, product development, and quality assurance. This paper surveys our progress in understanding the new possibilities and challenges that these mechanisms represent. It discusses some important dimensions in which Internet-based feedback mechanisms differ from traditional word-of-mouth networks and surveys the most important issues related to their design, evaluation, and use. It provides an overview of relevant work in game theory and economics on the topic of reputation. It discusses how this body of work is being extended and combined with insights from computer science, management science, sociology, and psychology to take into consideration the special properties of online environments. Finally, it identifies opportunities that this new area presents for operations research/management science (OR/MS) research.

Journal ArticleDOI
TL;DR: It is found that solid tumors carrying the gene-expression signature were most likely to be associated with metastasis and poor clinical outcome, suggesting that the metastatic potential of human tumors is encoded in the bulk of a primary tumor, thus challenging the notion that metastases arise from rare cells within a primary tumor that have the ability to metastasize.
Abstract: Metastasis is the principal event leading to death in individuals with cancer, yet its molecular basis is poorly understood. To explore the molecular differences between human primary tumors and metastases, we compared the gene-expression profiles of adenocarcinoma metastases of multiple tumor types to unmatched primary adenocarcinomas. We found a gene-expression signature that distinguished primary from metastatic adenocarcinomas. More notably, we found that a subset of primary tumors resembled metastatic tumors with respect to this gene-expression signature. We confirmed this finding by applying the expression signature to data on 279 primary solid tumors of diverse types. We found that solid tumors carrying the gene-expression signature were most likely to be associated with metastasis and poor clinical outcome (P < 0.03). These results suggest that the metastatic potential of human tumors is encoded in the bulk of a primary tumor, thus challenging the notion that metastases arise from rare cells within a primary tumor that have the ability to metastasize.


Journal ArticleDOI
TL;DR: The studies reported here establish for the first time that a region in the human temporo-parietal junction (here called the TPJ-M) is involved specifically in reasoning about the contents of another person's mind.

Journal ArticleDOI
19 Jun 2003-Nature
TL;DR: The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length, and is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic.
Abstract: The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (non-reciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.

PatentDOI
27 May 2003-Cell
TL;DR: The results reveal an unanticipated level of regulation which is superimposed on that due to gene-specific transcription factors, a novel mechanism for coordinate regulation of specific sets of genes when cells encounter limiting nutrients, and evidence that the ultimate targets of signal transduction pathways can be identified within the initiation apparatus.

Journal ArticleDOI
TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.

Journal ArticleDOI
TL;DR: It is concluded that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.
Abstract: Association studies offer a potentially powerful approach to identify genetic variants that influence susceptibility to common disease1,2,3,4, but are plagued by the impression that they are not consistently reproducible5,6. In principle, the inconsistency may be due to false positive studies, false negative studies or true variability in association among different populations4,5,6,7,8. The critical question is whether false positives overwhelmingly explain the inconsistency. We analyzed 301 published studies covering 25 different reported associations. There was a large excess of studies replicating the first positive reports, inconsistent with the hypothesis of no true positive associations (P < 10^-14). This excess of replications could not be reasonably explained by publication bias and was concentrated among 11 of the 25 associations. For 8 of these 11 associations, pooled analysis of follow-up studies yielded statistically significant replication of the first report, with modest estimated genetic effects. Thus, a sizable fraction (but under half) of reported associations have strong evidence of replication; for these, false negative, underpowered studies probably contribute to inconsistent replication. We conclude that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.

Journal ArticleDOI
15 May 2003-Nature
TL;DR: A comparative analysis of the yeast Saccharomyces cerevisiae, based on high-quality draft sequences of three related species, automatically identified genome-wide regulatory motifs, inferred a putative function for most of them, and provided insights into their combinatorial interactions.
Abstract: Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.

Journal ArticleDOI
TL;DR: A new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data is presented, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters.
Abstract: In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
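A sketch of the consensus-matrix construction, assuming numpy and a tiny k-means as the base algorithm (the methodology is agnostic to this choice; all names and parameters here are illustrative). Entry (i, j) holds the fraction of subsampled runs in which items i and j, when drawn together, ended up in the same cluster:

```python
import numpy as np

def kmeans_labels(X, k, rng, iters=20):
    """Tiny k-means (Lloyd's algorithm) returning cluster labels."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def consensus_matrix(X, k, runs=50, frac=0.8, seed=0):
    """Consensus across resampled clustering runs: for each pair of items,
    the fraction of co-sampled runs in which they shared a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(runs):
        idx = rng.choice(n, int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += (labels[:, None] == labels[None, :])
    return together / np.maximum(sampled, 1)
```

For well-separated data the matrix is nearly block-diagonal in 0s and 1s at the true cluster number; intermediate values flag unstable assignments, which is what the method uses to compare candidate cluster counts.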

Journal ArticleDOI
TL;DR: In this article, the authors present an overview of the mechanical properties of nanocrystalline metals and alloys with the objective of assessing recent advances in the experimental and computational studies of deformation, damage evolution, fracture and fatigue, and highlighting opportunities for further research.

Journal ArticleDOI
07 Mar 2003-Cell
TL;DR: It is found that activation of HIF-1alpha is essential for myeloid cell infiltration and activation in vivo through a mechanism independent of VEGF, and its direct regulation of survival and function in the inflammatory microenvironment is demonstrated.

Journal ArticleDOI
TL;DR: This work proposes a robust integer programming problem of moderately larger size that allows controlling the degree of conservatism of the solution in terms of probabilistic bounds on constraint violation, and proposes an algorithm for robust network flows that solves the robust counterpart by solving a polynomial number of nominal minimum cost flow problems in a modified network.
Abstract: We propose an approach to address data uncertainty for discrete optimization and network flow problems that allows controlling the degree of conservatism of the solution, and is computationally tractable both practically and theoretically. In particular, when both the cost coefficients and the data in the constraints of an integer programming problem are subject to uncertainty, we propose a robust integer programming problem of moderately larger size that allows controlling the degree of conservatism of the solution in terms of probabilistic bounds on constraint violation. When only the cost coefficients are subject to uncertainty and the problem is a 0−1 discrete optimization problem on n variables, then we solve the robust counterpart by solving at most n+1 instances of the original problem. Thus, the robust counterpart of a polynomially solvable 0−1 discrete optimization problem remains polynomially solvable. In particular, robust matching, spanning tree, shortest path, matroid intersection, etc. are polynomially solvable. We also show that the robust counterpart of an NP-hard α-approximable 0−1 discrete optimization problem, remains α-approximable. Finally, we propose an algorithm for robust network flows that solves the robust counterpart by solving a polynomial number of nominal minimum cost flow problems in a modified network.
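The n+1 reduction for cost uncertainty can be made concrete: sort the deviations in decreasing order with d_{n+1} = 0 and, for each threshold in that list, solve the nominal problem with each cost inflated by its deviation's excess over the threshold, adding a budget term for the threshold itself. A toy sketch (the "choose exactly k items" nominal problem and all names are illustrative; any nominal 0-1 solver could be plugged in):

```python
def nominal_min(costs, k):
    """Nominal solver for a toy 0-1 problem: choose exactly k items
    minimizing total cost (sorting is optimal for this toy problem)."""
    order = sorted(range(len(costs)), key=lambda j: costs[j])
    chosen = set(order[:k])
    return sum(costs[j] for j in chosen), chosen

def robust_min(c, d, gamma, k):
    """Robust counterpart via n+1 nominal solves: an adversary may
    inflate the cost of up to `gamma` chosen items j by d[j]."""
    n = len(c)
    ds = sorted(d, reverse=True) + [0.0]   # d_1 >= ... >= d_n >= d_{n+1} = 0
    best, best_x = float("inf"), None
    for l in range(n + 1):
        # inflate each cost by its deviation's excess over the threshold
        mod = [c[j] + max(d[j] - ds[l], 0.0) for j in range(n)]
        val, x = nominal_min(mod, k)
        val += gamma * ds[l]               # budget term for the threshold
        if val < best:
            best, best_x = val, x
    return best, best_x
```

With c = [1, 2, 3, 4], d = [5, 0.5, 0.2, 0.1], gamma = 1 and k = 2, the nominal optimum {0, 1} is abandoned for the robust choice {1, 2}: paying 2 more in nominal cost avoids the item whose cost can jump by 5.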

Journal ArticleDOI
TL;DR: It is shown that lentivirus-delivered shRNAs are capable of specific, highly stable and functional silencing of gene expression in a variety of cell types and also in transgenic mice.
Abstract: RNA interference (RNAi) has recently emerged as a specific and efficient method to silence gene expression in mammalian cells either by transfection of short interfering RNAs (siRNAs; ref. 1) or, more recently, by transcription of short hairpin RNAs (shRNAs) from expression vectors and retroviruses. But the resistance of important cell types to transduction by these approaches, both in vitro and in vivo, has limited the use of RNAi. Here we describe a lentiviral system for delivery of shRNAs into cycling and non-cycling mammalian cells, stem cells, zygotes and their differentiated progeny. We show that lentivirus-delivered shRNAs are capable of specific, highly stable and functional silencing of gene expression in a variety of cell types and also in transgenic mice. Our lentiviral vectors should permit rapid and efficient analysis of gene function in primary human and animal cells and tissues and generation of animals that show reduced expression of specific genes. They may also provide new approaches for gene therapy.

Journal ArticleDOI
24 Apr 2003-Nature
TL;DR: A high-quality draft sequence of the N. crassa genome is reported, suggesting that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.
Abstract: Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes—more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis of the gene set yields insights into unexpected aspects of Neurospora biology including the identification of genes potentially associated with red light photobiology, genes implicated in secondary metabolism, and important differences in Ca2+ signalling as compared with plants and animals. Neurospora possesses the widest array of genome defence mechanisms known for any eukaryotic organism, including a process unique to fungi called repeat-induced point mutation (RIP). Genome analysis suggests that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.

Book
01 Jan 2003
TL;DR: It is shown that BP can only converge to a fixed point that is also a stationary point of the Bethe approximation to the free energy, which enables connections to be made with variational approaches to approximate inference.
Abstract: "Inference" problems arise in statistical physics, computer vision, error-correcting coding theory, and AI. We explain the principles behind the belief propagation (BP) algorithm, which is an efficient way to solve inference problems based on passing local messages. We develop a unified approach, with examples, notation, and graphical models borrowed from the relevant disciplines. We explain the close connection between the BP algorithm and the Bethe approximation of statistical physics. In particular, we show that BP can only converge to a fixed point that is also a stationary point of the Bethe approximation to the free energy. This result helps explain the successes of the BP algorithm and enables connections to be made with variational approaches to approximate inference. The connection of BP with the Bethe approximation also suggests a way to construct new message-passing algorithms based on improvements to Bethe's approximation introduced by Kikuchi and others. The new generalized belief propagation (GBP) algorithms are significantly more accurate than ordinary BP for some problems. We illustrate how to construct GBP algorithms with a detailed example.
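The local message passing described above can be sketched on the smallest interesting case: a 3-node chain of binary variables. On a tree like this, sum-product BP is exact, so its beliefs match the brute-force marginals. The potentials below are arbitrary illustrative numbers, not from the paper.

```python
# Sum-product BP on the chain 0 - 1 - 2 (binary variables, a tree => exact).
phi = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]   # unary potentials phi_i(x_i)
psi = [[2.0, 1.0], [1.0, 2.0]]               # shared pairwise potential psi(x_s, x_t)

def send(msg_in, phi_src):
    """Message from a source node to its neighbor: sum out the source state."""
    return [sum(phi_src[xs] * msg_in[xs] * psi[xs][xt] for xs in (0, 1))
            for xt in (0, 1)]

# Forward pass (0 -> 1 -> 2) and backward pass (2 -> 1 -> 0).
m01 = send([1.0, 1.0], phi[0])
m12 = send(m01, phi[1])
m21 = send([1.0, 1.0], phi[2])
m10 = send(m21, phi[1])

def normalize(b):
    z = sum(b)
    return [v / z for v in b]

# Beliefs: local potential times all incoming messages.
b0 = normalize([phi[0][x] * m10[x] for x in (0, 1)])
b1 = normalize([phi[1][x] * m01[x] * m21[x] for x in (0, 1)])
b2 = normalize([phi[2][x] * m12[x] for x in (0, 1)])
```

On graphs with cycles the same message updates are iterated to a fixed point, and the abstract's result says any such fixed point is a stationary point of the Bethe free energy; GBP replaces these single-node messages with messages between larger regions of nodes.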

Journal ArticleDOI
TL;DR: In this paper, the authors propose that open source software development is an exemplar of a compound "private-collective" model of innovation that contains elements of both the private investment and the collective action models and can offer society the best of both worlds under many conditions.
Abstract: Currently, two models of innovation are prevalent in organization science. The "private investment" model assumes returns to the innovator result from private goods and efficient regimes of intellectual property protection. The "collective action" model assumes that under conditions of market failure, innovators collaborate in order to produce a public good. The phenomenon of open source software development shows that users program to solve their own as well as shared technical problems, and freely reveal their innovations without appropriating private returns from selling the software. In this paper, we propose that open source software development is an exemplar of a compound "private-collective" model of innovation that contains elements of both the private investment and the collective action models and can offer society the "best of both worlds" under many conditions. We describe a new set of research questions this model raises for scholars in organization science. We offer some details regarding the types of data available for open source projects in order to ease access for researchers who are unfamiliar with these, and also offer some advice on conducting empirical studies on open source software development processes.

Journal ArticleDOI
TL;DR: Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, are phosphorylated by protein Ser/Thr- or Tyr-kinases, or mediate specific interactions with protein or phospholipid ligands, allowing segments of biological pathways to be constructed in silico.
Abstract: Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, are phosphorylated by protein Ser/Thr- or Tyr-kinases, or mediate specific interactions with protein or phospholipid ligands. Each sequence motif is represented as a position-specific scoring matrix (PSSM) based on results from oriented peptide library and phage display experiments. Predicted domain-motif interactions from Scansite can be sequentially combined, allowing segments of biological pathways to be constructed in silico. The current release of Scansite, version 2.0, includes 62 motifs characterizing the binding and/or substrate specificities of many families of Ser/Thr- or Tyr-kinases, SH2, SH3, PDZ, 14-3-3 and PTB domains, together with signature motifs for PtdIns(3,4,5)P3-specific PH domains. Scansite 2.0 contains significant improvements to its original interface, including a number of new generalized user features and significantly enhanced performance. Searches of all SWISS-PROT, TrEMBL, Genpept and Ensembl protein database entries are now possible with run times reduced by approximately 60% when compared with Scansite version 1.0. Scansite 2.0 allows restricted searching of species-specific proteins, as well as isoelectric point and molecular weight sorting to facilitate comparison of predictions with results from two-dimensional gel electrophoresis experiments. Support for user-defined motifs has been increased, allowing easier input of user-defined matrices and permitting user-defined motifs to be combined with pre-compiled Scansite motifs for dual motif searching. In addition, a new series of Sequence Match programs for non-quantitative user-defined motifs has been implemented. Scansite is available via the World Wide Web at http://scansite.mit.edu.
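The core PSSM-scanning idea can be sketched as follows. This is a toy illustration of scoring sequence windows against a position-specific matrix, not Scansite's actual algorithm or data: the matrix values, the "R-x-S" toy motif, and the function names are all assumptions made for the example.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def scan(sequence, pssm):
    """Slide a window over the sequence and score it against the PSSM.
    `pssm` is one {amino_acid: log-odds score} dict per motif position.
    Returns (start_index, score) for every window."""
    w = len(pssm)
    hits = []
    for i in range(len(sequence) - w + 1):
        window = sequence[i:i + w]
        # Unknown characters get a penalty (illustrative choice).
        score = sum(pssm[p].get(aa, -2.0) for p, aa in enumerate(window))
        hits.append((i, score))
    return hits

# Toy 3-position motif preferring "R-x-S" (loosely kinase-like, purely
# illustrative): position 0 rewards R/K, position 2 rewards S/T.
background = {aa: 0.0 for aa in AA}
pssm = [
    {**background, "R": 2.0, "K": 1.5},
    dict(background),
    {**background, "S": 2.0, "T": 1.0},
]

hits = scan("MARLSGRKSA", pssm)
best = max(hits, key=lambda h: h[1])  # highest-scoring window
```

A real deployment would derive the matrix entries from oriented peptide library or phage display data as the abstract describes, and report only windows scoring above a stringency percentile rather than the single best hit.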