SciSpace
Author

Vikas Bansal

Bio: Vikas Bansal is an academic researcher at Mayo Clinic. He has contributed to research on topics including Medicine and Odds ratio, has an h-index of 43, and has co-authored 184 publications receiving 23,455 citations. His previous affiliations include CERN and Washington University in St. Louis.
Topics: Medicine, Odds ratio, KEKB, Branching fraction, Genome


Papers
Journal ArticleDOI
Georges Aad, T. Abajyan, Brad Abbott, Jalal Abdallah, +2,964 more (200 institutions)
TL;DR: In this article, a search for the Standard Model Higgs boson in proton-proton collisions with the ATLAS detector at the LHC is presented; the observed excess has a significance of 5.9 standard deviations, corresponding to a background fluctuation probability of 1.7×10−9.
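As a quick cross-check of the quoted numbers, the sketch below converts the background fluctuation probability back into a significance, assuming the one-sided Gaussian-tail convention commonly used in particle physics (that convention is an assumption here, not stated in the listing):

# Convert the quoted background fluctuation probability into a significance in
# standard deviations, assuming a one-sided Gaussian tail (an assumption).
from scipy.stats import norm

p_background = 1.7e-9                  # background fluctuation probability from the TL;DR
significance = norm.isf(p_background)  # inverse survival function of the standard normal
print(f"{significance:.1f} standard deviations")  # ~5.9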

9,282 citations

Book
Georges Aad, E. Abat, Jalal Abdallah, +3,029 more (164 institutions)
23 Feb 2020
TL;DR: The ATLAS detector as installed in its experimental cavern at point 1 at CERN is described in this paper, where a brief overview of the expected performance of the detector when the Large Hadron Collider begins operation is also presented.
Abstract: The ATLAS detector as installed in its experimental cavern at point 1 at CERN is described in this paper. A brief overview of the expected performance of the detector when the Large Hadron Collider begins operation is also presented.

3,111 citations

Journal ArticleDOI
TL;DR: A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy is used to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.
Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
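The 22% figure can be reproduced directly from the per-class event counts quoted in the abstract; a minimal sketch (using only the numbers above, and ignoring the segmental duplications and copy-number regions, for which no counts are given):

# Reproduce the "non-SNP variation accounts for 22% of all events" figure
# from the per-class counts quoted in the abstract.
snps       = 3_213_401
block_subs =    53_823
het_indels =   292_102
hom_indels =   559_473
inversions =        90

total_events   = snps + block_subs + het_indels + hom_indels + inversions
non_snp_events = total_events - snps
print(f"non-SNP share of events: {non_snp_events / total_events:.1%}")  # ~22.0%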

1,843 citations

Journal ArticleDOI
Georges Aad, Brad Abbott, Jalal Abdallah, A. A. Abdelalim, +2,582 more (23 institutions)
TL;DR: The simulation software for the ATLAS Experiment at the Large Hadron Collider is used for large-scale production of events on the LHC Computing Grid; its infrastructure supports the detector description, interfaces the event generation, and combines the GEANT4 simulation of the response of the individual detectors.
Abstract: The simulation software for the ATLAS Experiment at the Large Hadron Collider is being used for large-scale production of events on the LHC Computing Grid. This simulation requires many components, from the generators that simulate particle collisions, through packages simulating the response of the various detectors and triggers. All of these components come together under the ATLAS simulation infrastructure. In this paper, that infrastructure is discussed, including that supporting the detector description, interfacing the event generation, and combining the GEANT4 simulation of the response of the individual detectors. Also described are the tools allowing the software validation, performance testing, and the validation of the simulated output against known physics processes.
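The staged structure described above (event generation, detector-response simulation, validation against known physics) can be pictured as a simple pipeline; the sketch below is purely illustrative and does not reflect the real ATLAS/Athena interfaces (all names in it are hypothetical):

# Purely illustrative sketch of a staged simulation pipeline: generation,
# detector-response simulation, then validation. All names are hypothetical
# and do not correspond to the actual ATLAS software.
from typing import Callable, List

Stage = Callable[[dict], dict]

def run_pipeline(event: dict, stages: List[Stage]) -> dict:
    """Pass one event record through each processing stage in order."""
    for stage in stages:
        event = stage(event)
    return event

def generate(event: dict) -> dict:           # stand-in for an event generator
    event["particles"] = ["mu+", "mu-"]
    return event

def simulate_detector(event: dict) -> dict:  # stand-in for the detector-response step
    event["hits"] = len(event["particles"]) * 10
    return event

def validate(event: dict) -> dict:           # stand-in for a validation check
    assert event["hits"] > 0, "no detector hits produced"
    return event

print(run_pipeline({}, [generate, simulate_detector, validate]))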

1,514 citations

Journal ArticleDOI
M. Huschle, T. Kuhr, M. Heck, P. Goldenzweig, +218 more (64 institutions)
TL;DR: In this paper, the branching fraction ratios R(D(*)) of B̄ → D(*) τ⁻ ν̄τ relative to B̄ → D(*) ℓ⁻ ν̄ℓ (where ℓ = e or μ) were measured using the full Belle data sample.
Abstract: We report a measurement of the branching fraction ratios R(D(*)) of B̄ → D(*) τ⁻ ν̄τ relative to B̄ → D(*) ℓ⁻ ν̄ℓ (where ℓ = e or μ) using the full Belle data sample of 772 × 10⁶ BB̄ pairs collected at the Υ(4S) resonance with the Belle detector at the KEKB asymmetric-energy e⁺e⁻ collider. The measured values are R(D) = 0.375 ± 0.064(stat) ± 0.026(syst) and R(D*) = 0.293 ± 0.038(stat) ± 0.015(syst). The analysis uses hadronic reconstruction of the tag-side B meson and purely leptonic τ decays. The results are consistent with earlier measurements and do not show a significant deviation from the standard model prediction.
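As a small numerical aid, the sketch below combines the quoted statistical and systematic uncertainties in quadrature to give a single total uncertainty for each ratio (quadrature addition is a common convention, assumed here rather than taken from the paper):

# Combine the quoted statistical and systematic uncertainties in quadrature.
# Quadrature addition is an assumption here, not a prescription from the paper.
import math

measurements = {
    "R(D)":  (0.375, 0.064, 0.026),   # (central value, stat, syst) from the abstract
    "R(D*)": (0.293, 0.038, 0.015),
}

for name, (value, stat, syst) in measurements.items():
    total = math.hypot(stat, syst)
    print(f"{name} = {value} +/- {total:.3f} (stat and syst combined)")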

652 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
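The mail-filtering example in the abstract (a system that learns which messages a user rejects) can be illustrated with a very small learned filter; the sketch below uses a bag-of-words naive Bayes model and is only a toy illustration, not any particular production system:

# Toy illustration of the mail-filtering example: learn, from messages a user
# has kept or rejected, which new messages to filter out.
from collections import Counter
import math

def train(examples):
    """examples: list of (text, rejected) pairs. Returns word counts per class and class counts."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, rejected in examples:
        class_counts[rejected] += 1
        word_counts[rejected].update(text.lower().split())
    return word_counts, class_counts

def predict_rejected(text, word_counts, class_counts):
    """Naive Bayes with add-one smoothing over the training vocabulary."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    scores = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

history = [
    ("win a free prize now", True),               # rejected by the user
    ("cheap prize offer click now", True),        # rejected by the user
    ("meeting agenda for tomorrow", False),       # kept
    ("project status and meeting notes", False),  # kept
]
word_counts, class_counts = train(history)
print(predict_rejected("free prize offer", word_counts, class_counts))        # True  -> filter
print(predict_rejected("notes from the meeting", word_counts, class_counts))  # False -> keep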

13,246 citations

Journal ArticleDOI
TL;DR: TopHat2, which incorporates many significant enhancements to TopHat, is described; it combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.
Abstract: TopHat is a popular spliced aligner for RNA-sequencing (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
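Spliced alignments like those TopHat2 produces are typically reported as SAM/BAM records, where introns appear as 'N' operations in the CIGAR string; the sketch below extracts splice-junction coordinates from a CIGAR string under that assumption (the example record is made up):

# Extract splice-junction coordinates from a SAM-style CIGAR string, where 'N'
# operations denote introns spanned by a spliced read. The example values are
# made up; real records would come from an aligner's SAM/BAM output.
import re

def splice_junctions(pos, cigar):
    """Return (last base before intron, first base after intron) for each 'N' gap.

    pos is the 1-based leftmost mapping position of the read.
    """
    junctions = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":                  # intron: record its boundaries
            junctions.append((ref - 1, ref + length))
        if op in "MDN=X":              # operations that consume the reference
            ref += length
    return junctions

# A hypothetical spliced read: 50 matched bases, a 1,000 bp intron, 25 more matches.
print(splice_junctions(10_000, "50M1000N25M"))  # [(10049, 11050)]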

11,380 citations

01 Jun 2012
TL;DR: SPAdes is presented as a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.
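SPAdes, like most short-read assemblers of its generation, is built around de Bruijn graphs of k-mers (stated here from general background rather than from the abstract above); the toy sketch below builds such a graph from a handful of made-up reads:

# Toy de Bruijn graph construction from short reads: nodes are (k-1)-mers and
# each k-mer contributes an edge from its prefix to its suffix. This only
# illustrates the general data structure behind assemblers such as SPAdes; it
# is not a real assembler (no error correction, no graph simplification).
from collections import defaultdict

def de_bruijn_graph(reads, k):
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # prefix -> suffix edge
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]          # made-up overlapping reads
for node, neighbours in de_bruijn_graph(reads, k=4).items():
    print(node, "->", neighbours)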

10,124 citations

Journal ArticleDOI
Georges Aad, T. Abajyan, Brad Abbott, Jalal Abdallah, +2,964 more (200 institutions)
TL;DR: In this article, a search for the Standard Model Higgs boson in proton-proton collisions with the ATLAS detector at the LHC is presented; the observed excess has a significance of 5.9 standard deviations, corresponding to a background fluctuation probability of 1.7×10−9.

9,282 citations

Journal ArticleDOI
28 Oct 2010 - Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype; this paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
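The quoted de novo mutation rate of roughly 10⁻⁸ per base pair per generation translates into an expected number of new mutations per child; the sketch below does that arithmetic, assuming a diploid genome size of about 6×10⁹ bp (the genome size is an assumption from general background, not a number from the abstract):

# Turn the quoted de novo germline mutation rate (~1e-8 per base pair per
# generation) into an expected count of new mutations per child. The diploid
# genome size of ~6e9 bp is an assumption, not taken from the abstract above.
mutation_rate = 1e-8        # per base pair per generation (from the abstract)
diploid_genome_bp = 6e9     # assumed approximate diploid genome size

expected_new_mutations = mutation_rate * diploid_genome_bp
print(f"expected de novo mutations per generation: ~{expected_new_mutations:.0f}")  # ~60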

7,538 citations