Institution

Santa Fe Institute

Nonprofit · Santa Fe, New Mexico, United States
About: Santa Fe Institute is a nonprofit organization based in Santa Fe, New Mexico, United States. It is known for research contributions in the topics Population and Complex network. The organization has 558 authors who have published 4558 publications receiving 396015 citations. The organization is also known as SFI.


Papers
03 Jul 1997
TL;DR: This paper treats regression under Gaussian assumptions, elucidates the relationship between Bayesian prediction, regularization and smoothing, and shows that the ideal regression is the posterior mean, whose computation scales as O(n^3).
Abstract: The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.

114 citations
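
The O(n^3) cost mentioned in the abstract above can be made concrete with a small sketch. This is not the paper's code: the squared-exponential kernel, the `noise_var` parameter, and all function names are assumptions chosen for illustration. It computes the posterior mean of a zero-mean Gaussian process at one test point, with the Cholesky factorization written out by hand so the cubic loop structure is visible.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed prior)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a; the nested loops cost O(n^3)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def posterior_mean(xs, ys, x_star, noise_var=1e-2):
    """Posterior mean at x_star: k_*^T (K + noise_var * I)^{-1} y."""
    n = len(xs)
    gram = [[rbf_kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(gram), ys)
    return sum(rbf_kernel(x_star, xs[i]) * alpha[i] for i in range(n))
```

For example, `posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.5], 1.0, noise_var=1e-6)` returns a value very close to the observed 1.0, since with almost no noise the posterior mean nearly interpolates the data. The factorization dominates the cost, which is why the computation grows cubically with the sample size n.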

Journal Article
09 Apr 2014 - JAMA
TL;DR: Web data are potentially the only source for real-time insights into behavioral medicine, where web data can be available almost immediately compared to a 365-day lag time between annual surveys, and can be an important source for identifying new hypotheses.
Abstract: Digital footprints left on search engines, social media, and social networking sites can be aggregated and analyzed as health proxies, yielding anonymous and instantaneous insights. On the one hand, nearly all the existing work has focused on acute diseases. This means the value-added from web surveillance is reduced, because the effectiveness of even high profile systems, such as Google Flu Trends, has been found inferior to already strong traditional surveillance [1]. On the other hand, the future of web surveillance is promising in an area where traditional surveillance is largely incomplete: behavioral medicine, a multidisciplinary field incorporating medicine, social science, and public health and focusing on health behaviors and mental health. The proportion of illness (or death) attributable to health behaviors or psychological well-being has steadily increased over the last half century, while surveillance of these outcomes has remained largely unchanged. Investigators simply ask people about their health on surveys. However, surveys have well-known limitations, such as respondents’ reluctance to participate, social desirability biases, difficulty in accurately reporting behaviors, long lags between data collection and availability, and provisions (sometimes legal) curtailing the inclusion of politically sensitive topics like gun violence. Most importantly, the expense of surveys means many topics are either not covered or covered restrictively (e.g., clinical depression screeners are included in the Behavioral Risk Factor Surveillance System just every other year). Given the current budget climate, survey capacity will likely worsen before it improves. To overcome these limits, behavioral medicine should now embrace web data.
First, behavioral medicine requires observing behavior or the manifestation of mental health problems. Doing so online is easier, more comprehensive, and more effective than with surveys, because many outcomes are passively exhibited there. For example, one study showed how precise health concerns changed during the United States recession of December 2008 through 2011, by systematically selecting Google search queries and using the content of each query to describe the concern and the change in volume to describe concern prevalence. “Stomach ulcer symptoms,” for example, were 228% (95% CI, 35–363) higher than expected during the recession, with queries thematically related to arrhythmia, congestion, and pain (including many foci like head, tooth and back) also elevated [2]. This approach highlights how web data can reveal largely assumption-free insights, via systematic data generation of hundreds of possible outcomes rather than arbitrary a priori selection of a few outcomes by investigators.
Second, web data reflect more than the individual, because social context can also be captured online. Online networks can reveal how mechanistic drivers such as social norms spread and influence population health. For example, social patterns in obesity promotion and suppression have been described by pooling Facebook posts that encourage television watching or going outdoors, which ultimately explained variability in neighborhood obesity rates [3]. Moreover, social support concepts are often expressed in web data, like observing specific instances of caregiving and confidence on Twitter. As a result, online behavioral medicine can move away from understanding aggregation based purely on location and towards understanding health in the context of our human interconnectedness.
Third, web data are potentially the only source for real-time insights into behavioral medicine, where web data can be available almost immediately compared to a 365-day lag time between annual surveys. By harnessing these data around social events or interventions, programs can be evaluated as they are implemented, hypothetically generating real-time feedback to maximize their effectiveness. Web data in this vein also hold promise for guiding investigator resources. In 2011, when tobacco journals were debating snus (a smokeless tobacco product), and funders were soliciting proposals to understand the snus pandemic, electronic cigarettes already attracted more searches on Google than any other smoking alternative, snus included [4]. In this same way, web data can guide traditional surveillance, like vetting the inclusion of questions on surveys using online proxies.
Fourth, given all hypotheses are based on some data, web data can be an important source for identifying new hypotheses. Many hypotheses in behavioral medicine can be traced directly to data availability and can appear ad hoc to lay audiences. Many studies have explored birthdate seasonality in mental health problems. Why? Birthdates are routinely found in traditional surveillance, while some mental health problems are too rare to assess seasonality in their incidence or increased severity. As a result, obvious questions are never explored, until now. Is schizophrenia seasonal? Online interest in schizophrenia and its symptoms, as well as 8 other outcomes, peaks in the winter [5]. What is the healthiest day? Online interest in quitting smoking across the globe is highest on Monday [6]. Behavioral medicine needs to escape the confines of limited data to more fully specify the next frontier of research questions, and going online is one such escape.
Fifth, it is beyond present scientific limits for a hypothetical arm to reach out of the screen to inoculate against infection. In behavioral medicine, however, substantial resources have been used to develop online interventions that treat or prevent illness with effectiveness equivalent to their offline counterparts. For example, as early as the mid-1990s, investigators implemented online programs to promote behavioral health. A meta-analysis found these programs increased quitting smoking by a relative 44% [7], yet a research agenda for harnessing the surveillance potential of the web has not been articulated. Improving the online surveillance capacity means online interventions can be better disseminated via online screening or linking subjects to existing online treatments (e.g., what advertisements for an online program are most effective?).
Sixth, some of the most effective interventions in behavioral medicine involve changes in public policy. Web data can identify alerts for policy changes and pathways for health advocacy. For instance, by archiving online media, places considering policy changes can be identified, and this information can then be passed on to advocacy groups. Case in point, Brazilian President Lula’s laryngeal cancer prompted broad changes in media coverage of tobacco control, and soon after, Brazil became the largest smoke-free nation to date [8]. By prospectively analyzing news media content, advocacy resources may be spent more cost-effectively during opportune times, including events like Lula’s diagnosis.
A major criticism is that web data have sampling biases. However, such biases are increasingly eroding at the population level as more people go online. In addition, several studies have demonstrated that valid trends reflecting the entire population, and even subsets of the population, can be extracted from online data. For example, computer science has already developed approaches for identifying the gender, ethnicity or education associated with a Twitter account using the content of a user’s Tweets. Going forward, the research community may mimic these studies and validate methods for obtaining high quality, actionable information in behavioral medicine, thereby further realizing the comparative value of web data to traditional data.
Billions of digital footprints from nearly all parts of the United States and from countries around the world provide a powerful opportunity to expand the evidence-base across medicine. However, for the above reasons and more related reasons yet to be expressed, behavioral medicine potentially has the most to gain from web data and could be essential to the broader web data revolution.

113 citations

Journal Article
30 Dec 2009 - PLOS ONE
TL;DR: This study demonstrates significant contrasts in the population structure of P. falciparum vaccine candidates that are consistent with the merozoite antigens being under stronger balancing selection than non-merozoite antigens and suggest that unique approaches to vaccine design will be required.
Abstract: The extensive diversity of Plasmodium falciparum antigens is a major obstacle to a broadly effective malaria vaccine but population genetics has rarely been used to guide vaccine design. We have completed a meta-population genetic analysis of the genes encoding ten leading P. falciparum vaccine antigens, including the pre-erythrocytic antigens csp, trap, lsa1 and glurp; the merozoite antigens eba175, ama1, msp's 1, 3 and 4, and the gametocyte antigen pfs48/45. A total of 4553 antigen sequences were assembled from published data and we estimated the range and distribution of diversity worldwide using traditional population genetics, Bayesian clustering and network analysis. Although a large number of distinct haplotypes were identified for each antigen, they were organized into a limited number of discrete subgroups. While the non-merozoite antigens showed geographically variable levels of diversity and geographic restriction of specific subgroups, the merozoite antigens had high levels of diversity globally, and a worldwide distribution of each subgroup. This shows that the diversity of the non-merozoite antigens is organized by physical or other location-specific barriers to gene flow and that of merozoite antigens by features intrinsic to all populations, one important possibility being the immune response of the human host. We also show that current malaria vaccine formulations are based upon low prevalence haplotypes from a single subgroup and thus may represent only a small proportion of the global parasite population. This study demonstrates significant contrasts in the population structure of P. falciparum vaccine candidates that are consistent with the merozoite antigens being under stronger balancing selection than non-merozoite antigens and suggesting that unique approaches to vaccine design will be required. 
The results of this study also provide a realistic framework for the diversity of these antigens to be incorporated into the design of next-generation malaria vaccines.

112 citations

Journal Article
TL;DR: This work generalizes a previous trait-based framework to incorporate aspects of frequency dependence, functional complementarity, and the dynamics of systems composed of species that are defined by multiple traits that are tied to multiple environmental drivers, and constructs simple models to investigate two ecological problems.

112 citations

Journal Article
TL;DR: In this paper, the first few levels of a hierarchy of complexity for two-or-more-dimensional patterns are studied, and several definitions of "regular language" or "local rule" that are equivalent in d = 1 lead to distinct classes in d ≥ 2.
Abstract: In dynamical systems such as cellular automata and iterated maps, it is often useful to look at a language or set of symbol sequences produced by the system. There are well-established classification schemes, such as the Chomsky hierarchy, with which we can measure the complexity of these sets of sequences, and thus the complexity of the systems which produce them. In this paper, we look at the first few levels of a hierarchy of complexity for two-or-more-dimensional patterns. We show that several definitions of “regular language” or “local rule” that are equivalent in d=1 lead to distinct classes in d≥2. We explore the closure properties and computational complexity of these classes, including undecidability and L, NL, and NP-completeness results. We apply these classes to cellular automata, in particular to their sets of fixed and periodic points, finite-time images, and limit sets. We show that it is undecidable whether a CA in d≥2 has a periodic point of a given period, and that certain “local lattice languages” are not finite-time images or limit sets of any CA. We also show that the entropy of a d-dimensional CA's finite-time image cannot decrease faster than t^{-d} unless it maps every initial condition to a single homogeneous state.

112 citations


Authors


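| Name | H-index | Papers | Citations |
|---|---|---|---|
| James Hone | 127 | 637 | 108193 |
| James H. Brown | 125 | 423 | 72040 |
| Alan S. Perelson | 118 | 632 | 66767 |
| Mark Newman | 117 | 348 | 168598 |
| Bette T. Korber | 117 | 392 | 49526 |
| Marten Scheffer | 111 | 350 | 73789 |
| Peter F. Stadler | 103 | 901 | 56813 |
| Sanjay Jain | 103 | 881 | 46880 |
| Henrik Jeldtoft Jensen | 102 | 1286 | 48138 |
| Dirk Helbing | 101 | 642 | 56810 |
| Oliver G. Pybus | 100 | 447 | 45313 |
| Andrew P. Dobson | 98 | 322 | 44211 |
| Carel P. van Schaik | 94 | 329 | 26908 |
| Seth Lloyd | 92 | 490 | 50159 |
| Andrew W. Lo | 85 | 378 | 51440 |
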
Network Information
Related Institutions (5)
Massachusetts Institute of Technology: 268K papers, 18.2M citations, 90% related
University of Oxford: 258.1K papers, 12.9M citations, 90% related
Princeton University: 146.7K papers, 9.1M citations, 89% related
Max Planck Society: 406.2K papers, 19.5M citations, 89% related
University of California, Berkeley: 265.6K papers, 16.8M citations, 89% related

Performance
Metrics
No. of papers from the Institution in previous years
| Year | Papers |
|---|---|
| 2023 | 41 |
| 2022 | 41 |
| 2021 | 297 |
| 2020 | 309 |
| 2019 | 263 |
| 2018 | 231 |