
Showing papers by "Santa Fe Institute" published in 2009


Journal Article
TL;DR: This work proposes a principled statistical framework for discerning and quantifying power-law behavior in empirical data by combining maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios.
Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

8,753 citations
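
The fitting and goodness-of-fit steps named in the abstract above can be illustrated with a minimal sketch. It assumes a continuous power law with a known lower cutoff `xmin`, for which the maximum-likelihood exponent has the standard closed form alpha = 1 + n / sum(ln(x_i / xmin)). The function names and the synthetic sampler are illustrative assumptions, not the authors' code, and the sketch omits both the choice of `xmin` and the likelihood-ratio comparisons.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent of a continuous power law on x >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    return alpha, n


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, seed=0):
    """Synthetic power-law data by inverse-transform sampling (for testing only)."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    data = sample_power_law(20000, xmin=1.0, alpha=2.5)
    alpha_hat, n_tail = fit_alpha(data, xmin=1.0)
    print(alpha_hat, n_tail, ks_distance(data, 1.0, alpha_hat))
```

A small KS distance alone does not establish a power law; in the framework described above it is one ingredient of a goodness-of-fit test.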


Journal Article
TL;DR: It is demonstrated that 95% of friendships can be accurately inferred from observational mobile phone data alone, because friend dyads show distinctive temporal and spatial patterns in their physical proximity and calling; these behavioral patterns also allow the prediction of individual-level outcomes such as job satisfaction.
Abstract: Data collected from mobile phones have the potential to provide insight into the relational dynamics of individuals. This paper compares observational data from mobile phones with standard self-report survey data. We find that the information from these two data sources is overlapping but distinct. For example, self-reports of physical proximity deviate from mobile phone records depending on the recency and salience of the interactions. We also demonstrate that it is possible to accurately infer 95% of friendships based on the observational data alone, where friend dyads demonstrate distinctive temporal and spatial patterns in their physical proximity and calling patterns. These behavioral patterns, in turn, allow the prediction of individual-level outcomes such as job satisfaction.

1,921 citations


Journal Article
06 Aug 2009-Nature
TL;DR: The leaders of the world are flying the economy by the seat of their pants, say J. Doyne Farmer and Duncan Foley, and there is a better way to help guide financial policies.
Abstract: The leaders of the world are flying the economy by the seat of their pants, say J. Doyne Farmer and Duncan Foley. There is, however, a better way to help guide financial policies.

1,109 citations


Journal Article
TL;DR: The authors argue that equilibrium determines leverage, not just interest rates, that variations in leverage cause fluctuations in asset prices, and that the resulting leverage cycle can be damaging to the economy and should be regulated.
Abstract: Equilibrium determines leverage, not just interest rates. Variations in leverage cause fluctuations in asset prices. This leverage cycle can be damaging to the economy, and should be regulated.

905 citations


Journal Article
TL;DR: The authors develop a model of friendship formation that sheds light on segregation patterns observed in social and economic networks. Individuals have types and see type-dependent benefits from friendships, and the authors examine the properties of a steady-state equilibrium of a matching process of friendship formation.
Abstract: We develop a model of friendship formation that sheds light on segregation patterns observed in social and economic networks. Individuals have types and see type-dependent benefits from friendships. We examine the properties of a steady-state equilibrium of a matching process of friendship formation. We use the model to understand three empirical patterns of friendship formation: (i) larger groups tend to form more same-type ties and fewer other-type ties than small groups, (ii) larger groups form more ties per capita, and (iii) all groups are biased towards same-type relative to demographics, with the most extreme bias coming from middle-sized groups. We show how these empirical observations can be generated by biases in preferences and biases in meetings. We also illustrate some welfare implications of the model.

853 citations


Journal Article
TL;DR: Viral properties associated with mucosal HIV-1 transmission and a limited set of rapidly evolving adaptive mutations driven primarily, but not exclusively, by early cytotoxic T cell responses are revealed.
Abstract: Identification of full-length transmitted HIV-1 genomes could be instrumental in HIV-1 pathogenesis, microbicide, and vaccine research by enabling the direct analysis of those viruses actually responsible for productive clinical infection. We show in 12 acutely infected subjects (9 clade B and 3 clade C) that complete HIV-1 genomes of transmitted/founder viruses can be inferred by single genome amplification and sequencing of plasma virion RNA. This allowed for the molecular cloning and biological analysis of transmitted/founder viruses and a comprehensive genome-wide assessment of the genetic imprint left on the evolving virus quasispecies by a composite of host selection pressures. Transmitted viruses encoded intact canonical genes ( gag-pol-vif-vpr-tat-rev-vpu-env-nef ) and replicated efficiently in primary human CD4+ T lymphocytes but much less so in monocyte-derived macrophages. Transmitted viruses were CD4 and CCR5 tropic and demonstrated concealment of coreceptor binding surfaces of the envelope bridging sheet and variable loop 3. 2 mo after infection, transmitted/founder viruses in three subjects were nearly completely replaced by viruses differing at two to five highly selected genomic loci; by 12–20 mo, viruses exhibited concentrated mutations at 17–34 discrete locations. These findings reveal viral properties associated with mucosal HIV-1 transmission and a limited set of rapidly evolving adaptive mutations driven primarily, but not exclusively, by early cytotoxic T cell responses.

777 citations


Journal Article
TL;DR: This article analyzed how professional values and practices influence the character of nonprofit organizations, with data from a random sample of 501(c)(3) operating charities in the San Francisco Bay Area collected between 2003 and 2004.
Abstract: This paper analyzes how professional values and practices influence the character of nonprofit organizations, with data from a random sample of 501(c)(3) operating charities in the San Francisco Bay Area collected between 2003 and 2004. Expanded professionalism in the nonprofit world involves not only paid, full-time careers and credentialed expertise but also the integration of professional ideals into the everyday world of charitable work. We develop key indicators of professionalism and measure organizational rationalization as expressed in the use of strategic planning, independent financial audits, quantitative program evaluation, and consultants. As hypothesized, charities operated by paid personnel and full-time management show higher levels of rationalization. While traditional professionals (doctors, lawyers, and the clergy) do not differ significantly from executives with no credentialed background in eschewing business-like practices, managerial professionals champion such efforts actively, as...

708 citations


Journal Article
TL;DR: Kinetic analysis and mathematical modeling of virus immune escape showed that the contribution of CD8 T cell–mediated killing of productively infected cells was earlier and much greater than previously recognized and that it contributed to the initial decline of plasma virus in acute infection.
Abstract: Identification of the transmitted/founder virus makes possible, for the first time, a genome-wide analysis of host immune responses against the infecting HIV-1 proteome. A complete dissection was made of the primary HIV-1–specific T cell response induced in three acutely infected patients. Cellular assays, together with new algorithms which identify sites of positive selection in the virus genome, showed that primary HIV-1–specific T cells rapidly select escape mutations concurrent with falling virus load in acute infection. Kinetic analysis and mathematical modeling of virus immune escape showed that the contribution of CD8 T cell–mediated killing of productively infected cells was earlier and much greater than previously recognized and that it contributed to the initial decline of plasma virus in acute infection. After virus escape, these first T cell responses often rapidly waned, leaving or being succeeded by T cell responses to epitopes which escaped more slowly or were invariant. These latter responses are likely to be important in maintaining the already established virus set point. In addition to mutations selected by T cells, there were other selected regions that accrued mutations more gradually but were not associated with a T cell response. These included clusters of mutations in envelope that were targeted by NAbs, a few isolated sites that reverted to the consensus sequence, and bystander mutations in linkage with T cell–driven escape.

670 citations


Journal Article
Samuel Bowles
05 Jun 2009-Science
TL;DR: A model of the evolutionary impact of between-group competition and a new data set that combines archaeological evidence on causes of death during the Late Pleistocene and early Holocene with ethnographic and historical reports on hunter-gatherer populations finds that the estimated level of mortality in intergroup conflicts would have had substantial effects, allowing the proliferation of group-beneficial behaviors that were quite costly to the individual altruist.
Abstract: Since Darwin, intergroup hostilities have figured prominently in explanations of the evolution of human social behavior. Yet whether ancestral humans were largely "peaceful" or "warlike" remains controversial. I ask a more precise question: If more cooperative groups were more likely to prevail in conflicts with other groups, was the level of intergroup violence sufficient to influence the evolution of human social behavior? Using a model of the evolutionary impact of between-group competition and a new data set that combines archaeological evidence on causes of death during the Late Pleistocene and early Holocene with ethnographic and historical reports on hunter-gatherer populations, I find that the estimated level of mortality in intergroup conflicts would have had substantial effects, allowing the proliferation of group-beneficial behaviors that were quite costly to the individual altruist.

592 citations


Journal Article
13 Mar 2009-Science
TL;DR: It is shown that incorporating a limited amount of choice in the classic Erdős-Rényi network formation model causes its percolation transition to become discontinuous.
Abstract: Networks in which the formation of connections is governed by a random process often undergo a percolation transition, wherein around a critical point, the addition of a small number of connections causes a sizable fraction of the network to suddenly become linked together. Typically such transitions are continuous, so that the percentage of the network linked together tends to zero right above the transition point. Whether percolation transitions could be discontinuous has been an open question. Here, we show that incorporating a limited amount of choice in the classic Erdős-Rényi network formation model causes its percolation transition to become discontinuous.

562 citations
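
To make "a limited amount of choice" concrete, the sketch below compares purely random edge addition with one possible selection rule: draw two candidate edges and keep the one whose endpoint components have the smaller product of sizes. The abstract above does not spell out the rule, so this product rule, the function names, and the allowance for self-loops and repeated edges are assumptions made for illustration.

```python
import random


class DisjointSet:
    """Union-find structure that tracks component sizes and the largest component."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_component_fraction(n, n_edges, use_choice, seed=0):
    """Add n_edges edges to n nodes; return the largest component's share of nodes."""
    rng = random.Random(seed)
    ds = DisjointSet(n)

    def random_edge():
        return rng.randrange(n), rng.randrange(n)

    def size_product(edge):
        return ds.size[ds.find(edge[0])] * ds.size[ds.find(edge[1])]

    for _ in range(n_edges):
        edge = random_edge()
        if use_choice:
            # Assumed rule: of two candidates, keep the smaller size product.
            other = random_edge()
            if size_product(other) < size_product(edge):
                edge = other
        ds.union(*edge)
    return ds.largest / n


if __name__ == "__main__":
    n = 4000
    for use_choice in (False, True):
        print(use_choice, largest_component_fraction(n, int(0.7 * n), use_choice))
```

At the same edge density, the run with choice should show a much smaller largest component than the purely random run, which is the delay that precedes the abrupt transition described in the abstract.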


Book Chapter
TL;DR: In this chapter, a new approach to the classic problem of tâtonnement is presented, based on several empirical observations about financial markets, the most important of which is long memory in the fluctuations of supply and demand.
Abstract (publisher summary): This chapter discusses a new approach to the classic problem of tâtonnement, the dynamic process through which markets seek to reach equilibrium. The approach is founded on several empirical observations about financial markets, the most important of which is long memory in the fluctuations of supply and demand. This is exhibited in the placement of trading orders and corresponds to long-term, slowly decaying positive correlations in the initiation of buying versus selling. It is observed in all the stock markets studied so far at very high levels of statistical significance. It appears that the primary cause of this long memory is the incremental execution of large hidden trading orders. The fact that the long memory of order flow must coexist with market efficiency has a profound influence on price formation, causing dynamic adjustments of liquidity that are strongly asymmetric between buyers and sellers. This has important consequences for market impact. This work also has important consequences for the interpretation and effect of information in financial markets. In particular, the explanation for market impact is that the shape of the impact function is determined by differences in the information content of trades.

Journal Article
TL;DR: This work identifies the structure inherent in daily behavior with models that can accurately analyze, predict, and cluster multimodal data from individuals and communities within the social network of a population, and demonstrates the potential for this dimensionality reduction technique to infer community affiliations within the subjects’ social network.
Abstract: Longitudinal behavioral data generally contains a significant amount of structure. In this work, we identify the structure inherent in daily behavior with models that can accurately analyze, predict, and cluster multimodal data from individuals and communities within the social network of a population. We represent this behavioral structure by the principal components of the complete behavioral dataset, a set of characteristic vectors we have termed eigenbehaviors. In our model, an individual’s behavior over a specific day can be approximated by a weighted sum of his or her primary eigenbehaviors. When these weights are calculated halfway through a day, they can be used to predict the day’s remaining behaviors with 79% accuracy for our test subjects. Additionally, we demonstrate the potential for this dimensionality reduction technique to infer community affiliations within the subjects’ social network by clustering individuals into a “behavior space” spanned by a set of their aggregate eigenbehaviors. These behavior spaces make it possible to determine the behavioral similarity between both individuals and groups, enabling 96% classification accuracy of community affiliations within the population-level social network. Additionally, the distance between individuals in the behavior space can be used as an estimate for relational ties such as friendship, suggesting strong behavioral homophily amongst the subjects. This approach capitalizes on the large amount of rich data previously captured during the Reality Mining study from mobile phones continuously logging location, proximate phones, and communication of 100 subjects at MIT over the course of 9 months. As wearable sensors continue to generate these types of rich, longitudinal datasets, dimensionality reduction techniques such as eigenbehaviors will play an increasingly important role in behavioral research.
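
The idea of approximating a day by a weighted sum of principal components can be sketched in a few lines. The example below is a simplified illustration, not the study's pipeline: it assumes each day is encoded as a numeric vector, extracts only the leading principal component by power iteration, and reconstructs a day from the mean plus one weighted component. All names and the toy data are hypothetical.

```python
import random


def leading_component(rows, iterations=200, seed=0):
    """Return (mean, unit vector) for the first principal component of rows."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    rng = random.Random(seed)
    v = [rng.random() for _ in mean]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(v))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def approximate(day, mean, component):
    """Approximate a day as the mean plus its weight on one component."""
    weight = sum((x - m) * c for x, m, c in zip(day, mean, component))
    return [m + weight * c for m, c in zip(mean, component)]


if __name__ == "__main__":
    # Toy data: every day is a scaled copy of one pattern over four time slots.
    pattern = [0.0, 1.0, 1.0, 0.0]
    days = [[k * p for p in pattern] for k in range(1, 6)]
    mean, component = leading_component(days)
    print(approximate(days[0], mean, component))
```

With real behavioral data several components would be retained, and a library routine for singular value decomposition would normally replace the hand-written power iteration.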

Journal Article
TL;DR: In this article, the authors consider the joint effects of geographic propinquity and network position on organizational innovation using negative binomial count models of patenting activity for U.S.-based life science firms in industrial districts and regional clusters across a 12-year time period, 1988-1999.
Abstract: Industrial districts and regional clusters depend on the networks that arise from reciprocal linkages among co-located organizations, while physical proximity among firms can alter the nature of information and resource flows through networks. We consider the joint effects of geographic propinquity and network position on organizational innovation using negative binomial count models of patenting activity for U.S.-based life science firms in industrial districts and regional clusters across a 12-year time period, 1988–1999. We find evidence that regional agglomeration and network centrality exert complementary, but contingent, influences on organizational innovation. Results show that in the high-velocity, research-intensive field of biotechnology, geographic and network positions have both independent and contingent effects on organizational innovation. The influence of centrality in local, physically co-located partner networks depends on the extent to which firms are also embedded in a global network c...

Journal Article
TL;DR: In a combined analysis of 171 subtype B and C transmission events, it is found that infection with more than one variant does not follow a Poisson distribution, indicating that transmission of individual virions cannot be seen as independent events, each occurring with low probability.
Abstract: Identifying the specific genetic characteristics of successfully transmitted variants may prove central to the development of effective vaccine and microbicide interventions. Although human immunodeficiency virus transmission is associated with a population bottleneck, the extent to which different factors influence the diversity of transmitted viruses is unclear. We estimate here the number of transmitted variants in 69 heterosexual men and women with primary subtype C infections. From 1,505 env sequences obtained using a single genome amplification approach we show that 78% of infections involved single variant transmission and 22% involved multiple variant transmissions (median of 3). We found evidence for mutations selected for cytotoxic-T-lymphocyte or antibody escape and a high prevalence of recombination in individuals infected with multiple variants representing another potential escape pathway in these individuals. In a combined analysis of 171 subtype B and C transmission events, we found that infection with more than one variant does not follow a Poisson distribution, indicating that transmission of individual virions cannot be seen as independent events, each occurring with low probability. While most transmissions resulted from a single infectious unit, multiple variant transmissions represent a significant fraction of transmission events, suggesting that there may be important mechanistic differences between these groups that are not yet understood.

Journal Article
TL;DR: In this article, a particular pattern of wartime violence, the relative absence of sexual violence on the part of many armed groups, has been explored, which has important policy implications: If s...
Abstract: This article explores a particular pattern of wartime violence, the relative absence of sexual violence on the part of many armed groups. This neglected fact has important policy implications: If s...

Journal Article
TL;DR: It is found that the interaction strength between a pair of species is predicted well by simple functions of the two species' biomasses and the body mass of the species removed, and prediction accuracy increases with network size, suggesting that greater web complexity simplifies predicting interaction strengths.
Abstract: Darwin's classic image of an “entangled bank” of interdependencies among species has long suggested that it is difficult to predict how the loss of one species affects the abundance of others. We show that for dynamical models of realistically structured ecological networks in which pair-wise consumer-resource interactions allometrically scale to the ¾ power—as suggested by metabolic theory—the effect of losing one species on another can be predicted well by simple functions of variables easily observed in nature. By systematically removing individual species from 600 networks ranging from 10–30 species, we analyzed how the strength of 254,032 possible pair-wise species interactions depended on 90 stochastically varied species, link, and network attributes. We found that the interaction strength between a pair of species is predicted well by simple functions of the two species' biomasses and the body mass of the species removed. On average, prediction accuracy increases with network size, suggesting that greater web complexity simplifies predicting interaction strengths. Applied to field data, our model successfully predicts interactions dominated by trophic effects and illuminates the sign and magnitude of important nontrophic interactions.

Journal Article
TL;DR: The molecular features of simian immunodeficiency virus (SIV) transmission in 18 experimentally infected Indian rhesus macaques are determined to validate the SIV–macaque mucosal infection model for HIV-1 vaccine and microbicide research.
Abstract: We recently developed a novel strategy to identify transmitted HIV-1 genomes in acutely infected humans using single-genome amplification and a model of random virus evolution. Here, we used this approach to determine the molecular features of simian immunodeficiency virus (SIV) transmission in 18 experimentally infected Indian rhesus macaques. Animals were inoculated intrarectally (i.r.) or intravenously (i.v.) with stocks of SIVmac251 or SIVsmE660 that exhibited sequence diversity typical of early-chronic HIV-1 infection. 987 full-length SIV env sequences (median of 48 per animal) were determined from plasma virion RNA 1–5 wk after infection. i.r. inoculation was followed by productive infection by one or a few viruses (median 1; range 1–5) that diversified randomly with near starlike phylogeny and a Poisson distribution of mutations. Consensus viral sequences from ramp-up and peak viremia were identical to viruses found in the inocula or differed from them by only one or a few nucleotides, providing direct evidence that early plasma viral sequences coalesce to transmitted/founder viruses. i.v. infection was >2,000-fold more efficient than i.r. infection, and viruses transmitted by either route represented the full genetic spectra of the inocula. These findings identify key similarities in mucosal transmission and early diversification between SIV and HIV-1, and thus validate the SIV–macaque mucosal infection model for HIV-1 vaccine and microbicide research.

Journal Article
18 Dec 2009-Science
TL;DR: It is demonstrated that disruptive ecological selection favors the evolution of sexual preferences for ornaments that signal local adaptation, and thus natural and sexual selection work in concert to achieve local adaptation and reproductive isolation, even in the presence of substantial gene flow.
Abstract: Ecological speciation is considered an adaptive response to selection for local adaptation. However, besides suitable ecological conditions, the process requires assortative mating to protect the nascent species from homogenization by gene flow. By means of a simple model, we demonstrate that disruptive ecological selection favors the evolution of sexual preferences for ornaments that signal local adaptation. Such preferences induce assortative mating with respect to ecological characters and enhance the strength of disruptive selection. Natural and sexual selection thus work in concert to achieve local adaptation and reproductive isolation, even in the presence of substantial gene flow. The resulting speciation process ensues without the divergence of mating preferences, avoiding problems that have plagued previous models of speciation by sexual selection.

Journal Article
TL;DR: The first part of a quantitative theory for the structure and dynamics of forests at demographic and resource steady state uses allometric scaling relations, based on metabolism and biomechanics, to quantify how trees use resources, fill space, and grow.
Abstract: We present the first part of a quantitative theory for the structure and dynamics of forests at demographic and resource steady state. The theory uses allometric scaling relations, based on metabolism and biomechanics, to quantify how trees use resources, fill space, and grow. These individual-level traits and properties scale up to predict emergent properties of forest stands, including size–frequency distributions, spacing relations, resource flux rates, and canopy configurations. Two insights emerge from this analysis: (i) The size structure and spatial arrangement of trees in the entire forest are emergent manifestations of the way that functionally invariant xylem elements are bundled together to conduct water and nutrients up from the trunks, through the branches, to the leaves of individual trees. (ii) Geometric and dynamic properties of trees in a forest and branches in trees scale identically, so that the entire forest can be described mathematically and behaves structurally and functionally like a scaled version of the branching networks in the largest tree. This quantitative framework uses a small number of parameters to predict numerous structural and dynamical properties of idealized forests.

Journal Article
TL;DR: A mathematical model is proposed for folksonomies, tripartite structures of users, resources, and tags (labels collaboratively applied by the users to the resources in order to impart meaningful structure on an otherwise undifferentiated database), that represents them as random hypergraphs.
Abstract: In the last few years we have witnessed the emergence, primarily in online communities, of new types of social networks that require for their representation more complex graph structures than have been employed in the past. One example is the folksonomy, a tripartite structure of users, resources, and tags (labels collaboratively applied by the users to the resources in order to impart meaningful structure on an otherwise undifferentiated database). Here we propose a mathematical model of such tripartite structures that represents them as random hypergraphs. We show that it is possible to calculate many properties of this model exactly in the limit of large network size and we compare the results against observations of a real folksonomy, that of the online photography website Flickr. We show that in some cases the model matches the properties of the observed network well, while in others there are significant differences, which we find to be attributable to the practice of multiple tagging, i.e., the application by a single user of many tags to one resource or one tag to many resources.

Journal Article
TL;DR: This work compares levels of secondary extinctions in communities generated by four structural food-web models and a fifth null model in response to sequential primary species removals and finds increased robustness and decreased levels of web collapse are associated with increased diversity and increased complexity.
Abstract: Species loss in ecosystems can lead to secondary extinctions as a result of consumer-resource relationships and other species interactions. We compare levels of secondary extinctions in communities ...

Journal Article
TL;DR: In this paper, an extension of the theory of symmetry-breaking phase transitions which applies to phases with topological excitations described by quantum groups or modular tensor categories is presented.
Abstract: We investigate transitions between topologically ordered phases in two spatial dimensions induced by the condensation of a bosonic quasiparticle. To this end, we formulate an extension of the theory of symmetry-breaking phase transitions which applies to phases with topological excitations described by quantum groups or modular tensor categories. This enables us to deal with phases whose quasiparticles have noninteger quantum dimensions and obey braid statistics. Many examples of such phases can be constructed from two-dimensional rational conformal field theories, and we find that there is a beautiful connection between quantum group symmetry breaking and certain well-known constructions in conformal field theory, notably the coset construction, the construction of orbifold models, and more general conformal extensions. Besides the general framework, many representative examples are worked out in detail.

Journal Article
TL;DR: The mechanistic theory, based on allometric scaling relations, is complementary to “demographic theory,” but is fundamentally different in approach and provides a quantitative baseline for understanding deviations from predictions due to other factors.
Abstract: Here, we present the second part of a quantitative theory for the structure and dynamics of forests under demographic and resource steady state. The theory is based on individual-level allometric scaling relations for how trees use resources, fill space, and grow. These scale up to determine emergent properties of diverse forests, including size-frequency distributions, spacing relations, canopy configurations, mortality rates, population dynamics, successional dynamics, and resource flux rates. The theory uniquely makes quantitative predictions for both stand-level scaling exponents and normalizations. We evaluate these predictions by compiling and analyzing macroecological datasets from several tropical forests. The close match between theoretical predictions and data suggests that forests are organized by a set of very general scaling rules. Our mechanistic theory, based on allometric scaling relations, is complementary to "demographic theory" but is fundamentally different in approach. It provides a quantitative baseline for understanding deviations from predictions due to other factors, including disturbance, variation in branching architecture, asymmetric competition, resource limitation, and other sources of mortality, which are not included in the deliberately simplified theory. The theory should apply to a wide range of forests despite large differences in abiotic environment, species diversity, and taxonomic and functional composition.

Journal Article
25 May 2009-PLOS ONE
TL;DR: This work applies high-throughput sequencing to the V3 loop-coding region of env in samples collected from 4 chronically HIV-infected subjects in whom CCR5 antagonist (vicriviroc [VVC]) therapy failed, and observes greater V3 diversity post-selection.
Abstract: High-throughput sequencing platforms provide an approach for detecting rare HIV-1 variants and documenting more fully quasispecies diversity. We applied this technology to the V3 loop-coding region of env in samples collected from 4 chronically HIV-infected subjects in whom CCR5 antagonist (vicriviroc [VVC]) therapy failed. Between 25,000–140,000 amplified sequences were obtained per sample. Profound baseline V3 loop sequence heterogeneity existed; predicted CXCR4-using populations were identified in a largely CCR5-using population. The V3 loop forms associated with subsequent virologic failure, either through CXCR4 use or the emergence of high-level VVC resistance, were present as minor variants at 0.8–2.8% of baseline samples. Extreme, rapid shifts in population frequencies toward these forms occurred, and deep sequencing provided a detailed view of the rapid evolutionary impact of VVC selection. Greater V3 diversity was observed post-selection. This previously unreported degree of V3 loop sequence diversity has implications for viral pathogenesis, vaccine design, and the optimal use of HIV-1 CCR5 antagonists.

Journal ArticleDOI
TL;DR: It is found that market impact is strongly concave, approximately increasing as the square root of order size, and as a given order is executed, the impact grows in time according to a power law.
Abstract: We empirically study the market impact of trading orders. We are specifically interested in large trading orders that are executed incrementally, which we call hidden orders. These are statistically reconstructed based on information about market member codes using data from the Spanish Stock Market and the London Stock Exchange. We find that market impact is strongly concave, approximately increasing as the square root of order size. Furthermore, as a given order is executed, the impact grows in time according to a power law; after the order is finished, it reverts to a level of about 0.5–0.7 of its value at its peak. We observe that hidden orders are executed at a rate that more or less matches trading in the overall market, except for small deviations at the beginning and end of the order.
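As a rough illustration of what "strongly concave, approximately increasing as the square root of order size" implies, the relation can be written as below. The symbols $I$ (impact), $Q$ (order size) and the constant $c$ are notation assumed here for illustration; they are not taken from the abstract, and the exponent is only approximate.

```latex
I(Q) \approx c\,Q^{1/2}, \qquad c > 0
\quad\Longrightarrow\quad
\frac{I(2Q)}{I(Q)} \approx \sqrt{2} \approx 1.41,
\qquad
\frac{d^{2}I}{dQ^{2}} \approx -\frac{c}{4}\,Q^{-3/2} < 0 .
```

Under this idealized form, doubling the order size raises the impact by about 41% rather than 100%, and the negative second derivative is what makes the curve concave.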

Journal ArticleDOI
11 Mar 2009-PLOS ONE
TL;DR: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
Abstract: Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science. Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute’s Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences. Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
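The "journal clickstream model, i.e. a first-order Markov chain" mentioned in the methodology can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `transition_probabilities` and the toy sessions are hypothetical, and the estimate is simply the relative frequency of each journal-to-journal transition.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    Each session is a list of journal names in the order a user visited them.
    Returns a dict: source journal -> {destination journal: probability}.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (source, destination) pair in the session.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


# Hypothetical toy sessions, invented for illustration only.
sessions = [
    ["Nature", "Science", "PNAS"],
    ["Nature", "Science"],
    ["Nature", "PNAS"],
]
P = transition_probabilities(sessions)
```

In this toy data two of the three transitions out of "Nature" lead to "Science", so `P["Nature"]["Science"]` is 2/3. A journal network like the one described could then be drawn by treating the larger probabilities as weighted edges.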

Journal ArticleDOI
TL;DR: It is shown that an algorithm adapted from the one Google uses to rank web-pages can order species according to their importance for coextinctions, providing the sequence of losses that results in the fastest collapse of the network.
Abstract: A major challenge in ecology is forecasting the effects of species' extinctions, a pressing problem given current human impacts on the planet. Consequences of species losses such as secondary extinctions are difficult to forecast because species are not isolated, but interact instead in a complex network of ecological relationships. Because of their mutual dependence, the loss of a single species can cascade in multiple coextinctions. Here we show that an algorithm adapted from the one Google uses to rank web-pages can order species according to their importance for coextinctions, providing the sequence of losses that results in the fastest collapse of the network. Moreover, we use the algorithm to bridge the gap between qualitative (who eats whom) and quantitative (at what rate) descriptions of food webs. We show that our simple algorithm finds the best possible solution for the problem of assigning importance from the perspective of secondary extinctions in all analyzed networks. Our approach relies on network structure, but applies regardless of the specific dynamical model of species' interactions, because it identifies the subset of coextinctions common to all possible models, those that will happen with certainty given the complete loss of prey of a given predator. Results show that previous measures of importance based on the concept of “hubs” or number of connections, as well as centrality measures, do not identify the most effective extinction sequence. The proposed algorithm provides a basis for further developments in the analysis of extinction risk in ecosystems.
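For readers unfamiliar with the ranking idea the abstract refers to, here is a minimal sketch of generic PageRank by power iteration, applied to a toy food web. It is not the authors' adapted algorithm, whose details the abstract does not give. The damping factor, the handling of nodes without outgoing links, the consumer-to-resource edge direction, and the species names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Generic PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    Returns a dict: node -> rank, with ranks summing to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts with the uniform "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Node with no outgoing links: spread its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy food web; edges point from consumer to resource (assumed).
food_web = {
    "fox": ["rabbit"],
    "hawk": ["rabbit", "mouse"],
    "rabbit": ["grass"],
    "mouse": ["grass"],
    "grass": [],
}
ranks = pagerank(food_web)
order = sorted(ranks, key=ranks.get, reverse=True)
```

With edges drawn this way, rank accumulates on the species that others depend on, so the basal resource ("grass") ends up first in `order`, followed by "rabbit", which supports two consumers.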

Journal ArticleDOI
TL;DR: Using different input data and methodology, the CVTree approach is a good complement to the standard methods and brings about more confidence to the current understanding of the fungal branch of TOL.
Abstract: Molecular phylogenetics and phylogenomics have greatly revised and enriched the fungal systematics in the last two decades. Most of the analyses have been performed by comparing single or multiple orthologous gene regions. Sequence alignment has always been an essential element in tree construction. These alignment-based methods (to be called the standard methods hereafter) need independent verification in order to put the fungal Tree of Life (TOL) on a secure footing. The ever-increasing number of sequenced fungal genomes and the recent success of our newly proposed alignment-free composition vector tree (CVTree, see Methods) approach have made the verification feasible. In all, 82 fungal genomes covering 5 phyla were obtained from the relevant genome sequencing centers. An unscaled phylogenetic tree with 3 outgroup species was constructed by using the CVTree method. Overall, the resultant phylogeny infers all major groups in accordance with standard methods. Furthermore, the CVTree provides information on the placement of several currently unsettled groups. Within the sub-phylum Pezizomycotina, our phylogeny places the Dothideomycetes and Eurotiomycetes as sister taxa. Within the Sordariomycetes, it infers that Magnaporthe grisea and the Plectosphaerellaceae are closely related to the Sordariales and Hypocreales, respectively. Within the Eurotiales, it supports that Aspergillus nidulans is the early-branching species among the 8 aspergilli. Within the Onygenales, it groups Histoplasma and Paracoccidioides together, supporting that the Ajellomycetaceae is a distinct clade from Onygenaceae. Within the sub-phylum Saccharomycotina, the CVTree clearly resolves two clades: (1) species that translate CTG as serine instead of leucine (the CTG clade) and (2) species that have undergone whole-genome duplication (the WGD clade). It places Candida glabrata at the base of the WGD clade. 
Using different input data and methodology, the CVTree approach is a good complement to the standard methods. The remarkable consistency between them has brought about more confidence to the current understanding of the fungal branch of TOL.
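To give a feel for what "alignment-free" comparison means, the sketch below measures the distance between two sequences from their k-mer counts alone. It is a deliberately simplified illustration, not the CVTree method: the abstract defers the actual procedure to the paper's Methods, and the function names, the choice of k, the cosine-based distance, and the toy sequences are all assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_distance(seq_a, seq_b, k=2):
    """Distance in [0, 0.5] derived from the cosine of two k-mer count vectors.

    Assumes both sequences are at least k characters long.
    """
    a = kmer_counts(seq_a, k)
    b = kmer_counts(seq_b, k)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


# Hypothetical toy sequences, invented for illustration only.
same = composition_distance("ACGTACGT", "ACGTACGT")
disjoint = composition_distance("ACGTACGT", "AAAAAA")
```

Identical sequences give a distance of essentially zero, and sequences sharing no k-mers give the maximum of 0.5. No alignment is computed at any point, and a matrix of such pairwise distances could be passed to a standard distance-based tree-building method.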

Journal ArticleDOI
TL;DR: The application of network theory to the analysis of interaction data reveals an unexpectedly complex picture of drug-target interactions, and confirms that the topology of drug-target networks depends implicitly on data completeness, drug properties, and target families.
Abstract: The availability of interaction data between small molecule drugs and protein targets has increased substantially in recent years. Using seven different databases, we were able to assemble a total of 4767 unique interactions between 802 drugs and 480 targets, which means that on average every drug is currently acknowledged to interact with 6 targets. The application of network theory to the analysis of these data reveals an unexpectedly complex picture of drug–target interactions. The results confirm that the topology of drug–target networks depends implicitly on data completeness, drug properties, and target families. The implications for drug discovery are discussed.
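The "on average every drug ... 6 targets" figure follows directly from the counts quoted in the abstract. A quick check, using hypothetical variable names, might look like this; the drugs-per-target value is derived here from the same counts and is not stated in the abstract.

```python
# Counts quoted in the abstract.
interactions = 4767
drugs = 802
targets = 480

# In a bipartite drug-target network, the mean degree of each side is
# the number of edges divided by the number of nodes on that side.
mean_targets_per_drug = interactions / drugs
mean_drugs_per_target = interactions / targets

print(f"targets per drug: {mean_targets_per_drug:.2f}")
print(f"drugs per target: {mean_drugs_per_target:.2f}")
```

The first ratio is about 5.94, which rounds to the 6 targets per drug reported. By the same arithmetic, each target is linked to roughly 9.93 drugs on average.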

Journal ArticleDOI
TL;DR: This work investigates the diversification of HIV-1 env coding sequences in 81 very early B subtype infections previously shown to have resulted from transmission or expansion of single viruses or two closely related viruses, and highlights the role of CTL escape and hypermutation in shaping viral evolution during the establishment of new infections.
Abstract: The pattern of viral diversification in newly infected individuals provides information about the host environment and immune responses typically experienced by the newly transmitted virus. For example, sites that tend to evolve rapidly across multiple early-infection patients could be involved in enabling escape from common early immune responses, could represent adaptation for rapid growth in a newly infected host, or could represent reversion from less fit forms of the virus that were selected for immune escape in previous hosts. Here we investigated the diversification of HIV-1 env coding sequences in 81 very early B subtype infections previously shown to have resulted from transmission or expansion of single viruses (n = 78) or two closely related viruses (n = 3). In these cases, the sequence of the infecting virus can be estimated accurately, enabling inference of both the direction of substitutions as well as distinction between insertion and deletion events. By integrating information across multiple acutely infected hosts, we find evidence of adaptive evolution of HIV-1 env and identify a subset of codon sites that diversified more rapidly than can be explained by a model of neutral evolution. Of 24 such rapidly diversifying sites, 14 were either i) clustered and embedded in CTL epitopes that were verified experimentally or predicted based on the individual's HLA or ii) in a nucleotide context indicative of APOBEC-mediated G-to-A substitutions, despite having excluded heavily hypermutated sequences prior to the analysis. In several cases, a rapidly evolving site was embedded both in an APOBEC motif and in a CTL epitope, suggesting that APOBEC may facilitate early immune escape. Ten rapidly diversifying sites could not be explained by CTL escape or APOBEC hypermutation, including the most frequently mutated site, in the fusion peptide of gp41. 
We also examined the distribution, extent, and sequence context of insertions and deletions, and we provide evidence that the length variation seen in hypervariable loop regions of the envelope glycoprotein is a consequence of selection and not of mutational hotspots. Our results provide a detailed view of the process of diversification of HIV-1 following transmission, highlighting the role of CTL escape and hypermutation in shaping viral evolution during the establishment of new infections.