Author
Erkan Ozge Buzbas
Other affiliations: University of Michigan, Stanford University
Bio: Erkan Ozge Buzbas is an academic researcher at the University of Idaho. The author has contributed to research on topics including Population and Approximate Bayesian computation, has an h-index of 9, and has co-authored 25 publications receiving 562 citations. Previous affiliations of Erkan Ozge Buzbas include the University of Michigan and Stanford University.
Papers
TL;DR: A statistical test based on a measure of haplotype homozygosity (H12) is developed that detects both hard and soft sweeps with similar power; H12 is then used to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the DGRP.
Abstract: Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
394 citations
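The abstract above names the statistics H12 and H2/H1 but does not spell out their formulas. The sketch below assumes the commonly cited definitions: H1 is the sum of squared haplotype frequencies in a window, H12 is H1 computed after pooling the two most frequent haplotypes, and H2 is H1 without the contribution of the most frequent haplotype. These definitions should be checked against the paper itself; the function name and the toy haplotype samples are hypothetical and chosen only for illustration.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    Assumed definitions (not stated in the abstract itself):
      H1    = sum_i p_i^2
      H12   = (p_1 + p_2)^2 + sum_{i>2} p_i^2
      H2/H1 = (H1 - p_1^2) / H1
    where p_1 >= p_2 >= ... are the sample haplotype frequencies.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Sample frequencies sorted from most to least common.
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


# Toy samples (invented): one dominant haplotype versus two common haplotypes.
hard_like = ["AAAA"] * 8 + ["ABAA"] + ["BBAB"]
soft_like = ["AAAA"] * 5 + ["ABAA"] * 4 + ["BBAB"]

hard_h1, hard_h12, hard_ratio = haplotype_homozygosity(hard_like)
soft_h1, soft_h12, soft_ratio = haplotype_homozygosity(soft_like)
print(f"hard-like: H12={hard_h12:.3f}, H2/H1={hard_ratio:.3f}")
print(f"soft-like: H12={soft_h12:.3f}, H2/H1={soft_ratio:.3f}")
```

In this toy case both samples give the same H12, while H2/H1 is much larger for the sample with two common haplotypes, which is the kind of contrast the abstract describes when it pairs H12 with H2/H1 to separate hard from soft sweeps.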
TL;DR: It is shown that the scientific process may not converge to truth even if scientific results are reproducible and that irreproducible results do not necessarily imply untrue results.
Abstract: Consistent confirmations obtained independently of each other lend credibility to a scientific result. We refer to results satisfying this consistency as reproducible and assume that reproducibility is a desirable property of scientific discovery. Yet seemingly science also progresses despite irreproducible results, indicating that the relationship between reproducibility and other desirable properties of scientific discovery is not well understood. These properties include early discovery of truth, persistence on truth once it is discovered, and time spent on truth in a long-term scientific inquiry. We build a mathematical model of scientific discovery that presents a viable framework to study its desirable properties including reproducibility. In this framework, we assume that scientists adopt a model-centric approach to discover the true model generating data in a stochastic process of scientific discovery. We analyze the properties of this process using Markov chain theory, Monte Carlo methods, and agent-based modeling. We show that the scientific process may not converge to truth even if scientific results are reproducible and that irreproducible results do not necessarily imply untrue results. The proportion of different research strategies represented in the scientific population, scientists’ choice of methodology, the complexity of truth, and the strength of signal contribute to this counter-intuitive finding. Important insights include that innovative research speeds up the discovery of scientific truth by facilitating the exploration of model space and epistemic diversity optimizes across desirable properties of scientific discovery.
51 citations
TL;DR: A formal statistical analysis of three popular claims in the metascientific literature is presented, showing how the use and benefits of such formalism can inform and shape debates about such methodological claims.
Abstract: Current attempts at methodological reform in sciences come in response to an overall lack of rigor in methodological and scientific practices in experimental sciences. However, most methodological reform attempts suffer from similar mistakes and over-generalizations to the ones they aim to address. We argue that this can be attributed in part to lack of formalism and first principles. Considering the costs of allowing false claims to become canonized, we argue for formal statistical rigor and scientific nuance in methodological reform. To attain this rigor and nuance, we propose a five-step formal approach for solving methodological problems. To illustrate the use and benefits of such formalism, we present a formal statistical analysis of three popular claims in the metascientific literature: (a) that reproducibility is the cornerstone of science; (b) that data must not be used twice in any analysis; and (c) that exploratory projects imply poor statistical practice. We show how our formal approach can inform and shape debates about such methodological claims.
51 citations
TL;DR: To the Editor: Human embryonic stem-cell research may lead to new methods of drug discovery, insights into mechanisms of disease, and eventually, cellular therapies, but investigators have been unable to target their research to diverse subgroups of existing lines or to ensure the inclusion of lines from the human populations most relevant to their diseases of interest.
Abstract: To the Editor: Human embryonic stem-cell research may lead to new methods of drug discovery, insights into mechanisms of disease, and eventually, cellular therapies. The potential benefit to patient populations may depend partially on the diversity of the stem-cell lines that are available for research and clinical use. However, investigators have been unable to target their research to diverse subgroups of existing lines or to ensure the inclusion of lines from the human populations most relevant to their diseases of interest, because almost no information has been available on the human population origin of existing stem-cell lines. Therefore, with the . . .
46 citations
TL;DR: This work presents "approximate approximate Bayesian computation" (AABC), a class of computationally fast inference methods that extends ABC to models in which simulating data is expensive, and demonstrates the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations.
36 citations
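For context on what AABC extends, the following is a minimal sketch of plain ABC rejection sampling, not of AABC itself. It draws a parameter from the prior, simulates data, and keeps the draw when the simulated summary lies within a tolerance of the observed one. The coin-flip model, the function names, and all numeric settings are hypothetical and chosen only to keep the example small.

```python
import random


def abc_rejection(observed, prior_sample, simulate, distance, epsilon, n_draws, rng):
    """Plain ABC rejection: keep prior draws whose simulated data fall near the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)       # candidate parameter from the prior
        simulated = simulate(theta, rng)  # one simulated data summary
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: number of successes in 50 Bernoulli trials.
N_TRIALS = 50
OBSERVED_SUCCESSES = 35


def prior_sample(rng):
    return rng.random()  # Uniform(0, 1) prior on the success probability


def simulate(theta, rng):
    return sum(rng.random() < theta for _ in range(N_TRIALS))


def distance(a, b):
    return abs(a - b)


rng = random.Random(0)
posterior_draws = abc_rejection(
    OBSERVED_SUCCESSES, prior_sample, simulate, distance,
    epsilon=1, n_draws=20000, rng=rng,
)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
print(f"accepted {len(posterior_draws)} draws, approximate posterior mean {posterior_mean:.3f}")
```

Every candidate requires a fresh call to `simulate`, which is why this scheme becomes impractical when simulating data is expensive, the setting the TL;DR says AABC targets.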
Cited by
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in a population.
11,521 citations
TL;DR: The ability to restore pluripotency to somatic cells through the ectopic co-expression of reprogramming factors has created powerful new opportunities for modelling human diseases and offers hope for personalized regenerative cell therapies.
Abstract: The field of stem-cell biology has been catapulted forward by the startling development of reprogramming technology. The ability to restore pluripotency to somatic cells through the ectopic co-expression of reprogramming factors has created powerful new opportunities for modelling human diseases and offers hope for personalized regenerative cell therapies. While the field is racing ahead, some researchers are pausing to evaluate whether induced pluripotent stem cells are indeed the true equivalents of embryonic stem cells and whether subtle differences between these types of cell might affect their research applications and therapeutic potential.
1,064 citations
TL;DR: To study the operational behaviour of λ-terms, this work uses the denotational (mathematical) approach, which is based on choosing a space of semantic values, or denotations, in which terms are interpreted.
Abstract: To study the operational behaviour of λ-terms, we will use the denotational (mathematical) approach. A denotational semantics for a language is based on the choice of a space of semantics values, or denotations, where terms are to be interpreted. Choosing a space with nice mathematical properties can help in proving the semantic properties of terms, since to this aim standard mathematical techniques can be used.
880 citations
01 Jan 2013
TL;DR: Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: to reproduce or to verify research, to make results of publicly funded research available to the public, to enable others to ask new questions of extant data, and to advance the state of research and innovation.
Abstract: We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation. (Hanson, Sugden, & Alberts)
Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available as sharing is common in only a few fields such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice. © 2012 Wiley Periodicals, Inc.
634 citations
University of Sheffield; Newcastle upon Tyne Hospitals NHS Foundation Trust; Agency for Science, Technology and Research; Royan Institute; Stanford University; Boston Children's Hospital; University of Nottingham; University of Southern California; Hebrew University of Jerusalem; Tel Aviv Sourasky Medical Center; Cedars-Sinai Medical Center; University of Geneva; Manchester Academic Health Science Centre; University of Manchester; Genome Institute of Singapore; Seoul National University; Harvard University; University of Edinburgh; Masaryk University; WiCell; University of São Paulo; Central South University; University College London; Karolinska Institutet; Jawaharlal Nehru Centre for Advanced Scientific Research; Kyoto University; Shanghai Jiao Tong University; Kurchatov Institute; Russian Academy of Sciences; Vrije Universiteit Brussel; King's College London; Leiden University; University of Helsinki; Yale University; Hospital for Sick Children; University of New South Wales; University of Tampere; Commonwealth Scientific and Industrial Research Organisation
TL;DR: An analysis of 125 human ES cell lines and 11 iPS cell lines from 38 laboratories found that a minimal amplicon in chromosome 20q11.21, including the genes ID1, BCL2L1 and HM13, occurred in >20% of the lines; of these genes, BCL2L1 is a strong candidate for driving culture adaptation of ES cells.
Abstract: The International Stem Cell Initiative analyzed 125 human embryonic stem (ES) cell lines and 11 induced pluripotent stem (iPS) cell lines, from 38 laboratories worldwide, for genetic changes occurring during culture. Most lines were analyzed at an early and late passage. Single-nucleotide polymorphism (SNP) analysis revealed that they included representatives of most major ethnic groups. Most lines remained karyotypically normal, but there was a progressive tendency to acquire changes on prolonged culture, commonly affecting chromosomes 1, 12, 17 and 20. DNA methylation patterns changed haphazardly with no link to time in culture. Structural variants, determined from the SNP arrays, also appeared sporadically. No common variants related to culture were observed on chromosomes 1, 12 and 17, but a minimal amplicon in chromosome 20q11.21, including three genes expressed in human ES cells, ID1, BCL2L1 and HM13, occurred in >20% of the lines. Of these genes, BCL2L1 is a strong candidate for driving culture adaptation of ES cells.
506 citations