Showing papers by "Fereydoun Hormozdiari" published in 2010

A map of human genome variation from population-scale sequencing

Richard Durbin, David Altshuler, Gonçalo R. Abecasis, David R. Bentley +358 more

01 Oct 2010

TL;DR: The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.

599 citations

Journal Article

mrsFAST: a cache-oblivious algorithm for short-read mapping

Faraz Hach¹, Fereydoun Hormozdiari¹, Can Alkan²,³, Farhad Hormozdiari¹, Inanc Birol⁴,¹, Evan E. Eichler²,³, S. Cenk Sahinalp²,¹

Simon Fraser University¹, University of Washington², Howard Hughes Medical Institute³, BC Cancer Agency⁴

01 Aug 2010-Nature Methods

TL;DR: In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step. The accuracy of such an SVD study is directly correlated with the accuracy of this mapping step, which is also the main computational bottleneck of the SVD study.

Abstract: In addition to single-nucleotide variations and small insertions-deletions (indels), larger-sized structural variations (for example, insertions, deletions, inversions, segmental duplications and copy-number polymorphisms) contribute to human genetic diversity. In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step. The accuracy of such an SVD study is directly correlated with the accuracy of this mapping step, which is also the main computational bottleneck of the SVD study.

326 citations

Journal Article

Next-generation VariationHunter

Fereydoun Hormozdiari¹, Iman Hajirasouliha¹, Phuong Dao¹, Faraz Hach¹, Deniz Yorukoglu¹, Can Alkan², Evan E. Eichler², S. Cenk Sahinalp¹

Simon Fraser University¹, Howard Hughes Medical Institute²

01 Jun 2010-Bioinformatics

TL;DR: This article provides a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies, and demonstrates that the conflict resolution algorithm outperforms current state-of-the-art algorithms when tested on the genome of the Yoruba African individual.

Abstract: Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a very important class of SVs for the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507). Availability: The implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: eee@gs.washington.edu; cenk@cs.sfu.ca. Supplementary information: Supplementary data are available at Bioinformatics online.

238 citations

Journal Article

Detection and characterization of novel sequence insertions using paired-end next-generation sequencing

Iman Hajirasouliha¹, Fereydoun Hormozdiari¹, Can Alkan²,³, Jeffrey M. Kidd⁴, Inanc Birol⁵,¹, Evan E. Eichler²,³, S. Cenk Sahinalp¹

Simon Fraser University¹, Howard Hughes Medical Institute², University of Washington³, Stanford University⁴, University of British Columbia⁵

01 May 2010-Bioinformatics

TL;DR: The NovelSeq framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.); it therefore requires significantly fewer computational resources than de novo sequence assembly.

Abstract: Motivation: In the past few years, human genome structural variation discovery has enjoyed increased attention from the genomics research community. Many studies were published to characterize short insertions, deletions, duplications and inversions, and to associate copy number variants (CNVs) with disease. Detection of new sequence insertions requires sequence data; however, the ‘detectable’ sequence length with read-pair analysis is limited by the insert size. Thus, longer sequence insertions that contribute to our genetic makeup are not extensively researched. Results: We present NovelSeq: a computational framework to discover the content and location of long novel sequence insertions using paired-end sequencing data generated by next-generation sequencing platforms. Our framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.); it therefore requires significantly fewer computational resources than de novo sequence assembly. We apply our methods to detect novel sequence insertions in the genome of an anonymous donor and validate our results by comparing them with the insertions discovered in the same genome using various sources of sequence data. Availability: The implementation of the NovelSeq pipeline is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: eee@gs.washington.edu; cenk@cs.sfu.ca

124 citations

Journal Article

Protein-protein interaction network evaluation for identifying potential drug targets.

Fereydoun Hormozdiari¹, Raheleh Salari¹, Vineet Bafna², S. Cenk Sahinalp¹

Simon Fraser University¹, University of California, San Diego²

25 May 2010-Journal of Computational Biology

TL;DR: This work proposes novel strategies for identifying potential multiple-drug targets in pathogenic protein-protein interaction (PPI) networks with the goal of disrupting known pathways/complexes and describes two polynomial time algorithms with respective approximation factors.

Abstract: As pathogens evolve effective schemes to overcome the effect of antibiotics, the prevalent “one drug and one drug target” approach is falling behind. We propose novel strategies for identifying potential multiple-drug targets in pathogenic protein-protein interaction (PPI) networks with the goal of disrupting known pathways/complexes. Given a set S of pathogenic pathways/complexes, we first consider computing the minimum number of proteins (with no human orthologs) whose removal from the PPI network disrupts all pathways/complexes. Unfortunately, even the best approximation algorithms for this (NP-hard) problem return too many targets to be practical. Thus, we focus on computing the optimal tradeoff (i.e., maximum ratio) between the number of disrupted essential pathways/complexes and the protein targets. For this “sparsest cut” problem, we describe two polynomial time algorithms with respective approximation factors of |S| and ...
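
The first problem in the abstract above (the fewest targetable proteins whose removal disrupts every pathway) can be illustrated with a small greedy sketch. This is not the authors' algorithm. It assumes a simplified model in which a pathway counts as disrupted as soon as any one of its proteins is removed, which turns the task into a hitting-set problem. The function name `greedy_targets` and the protein labels are hypothetical.

```python
def greedy_targets(pathways, targetable):
    """Greedy hitting-set heuristic (illustrative only, not the paper's method).

    pathways:   iterable of sets of protein names
    targetable: set of proteins assumed to have no human ortholog
    Assumption: removing any single member protein disrupts a pathway.
    """
    targetable = set(targetable)
    # Only targetable proteins may be chosen, so restrict each pathway to them.
    remaining = [set(p) & targetable for p in pathways]
    if any(not p for p in remaining):
        raise ValueError("some pathway contains no targetable protein")

    chosen = []
    while remaining:
        # Count how many still-intact pathways each protein belongs to.
        counts = {}
        for pathway in remaining:
            for protein in pathway:
                counts[protein] = counts.get(protein, 0) + 1
        # Pick the protein hitting the most pathways; ties break alphabetically.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [p for p in remaining if best not in p]
    return chosen


if __name__ == "__main__":
    demo = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
    print(greedy_targets(demo, {"A", "B", "C", "D"}))
```

Greedy selection gives only a logarithmic-factor guarantee for hitting set in general, which is consistent with the abstract's remark that approximate solutions to this formulation can return too many targets.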

31 citations

Journal Article

Towards improved assessment of functional similarity in large-scale screens: a study on indel length.

Alexander Schönhuth¹, Raheleh Salari¹, Fereydoun Hormozdiari¹, Artem Cherkasov², S. Cenk Sahinalp¹

Simon Fraser University¹, University of British Columbia²

18 Jan 2010-Journal of Computational Biology

TL;DR: It is demonstrated for the first time, in a formally sound manner, that insertions and deletions cause more severe functional changes between proteins than substitutions, as measured with respect to Gene Ontology term similarity.

Abstract: Although insertions and deletions are a common type of evolutionary sequence variation, their origins and their functional consequences have not been comprehensively understood. Most alignment algorithms/programs only roughly reflect the evolutionary processes that result in gaps, which typically require further evaluation. Interestingly, it is widely believed that gaps are the predominant form of sequence variation resulting in structural and functional changes. Thus it is desirable to distinguish between gaps that reflect true point mutations and alignment artifacts when it comes to assessing the functional similarity of proteins based on computational alignments. Here we introduce pair hidden Markov model-based solutions to rapidly assess the statistical significance of gaps in alignments resulting from classical Needleman-Wunsch-like alignment procedures which implement affine gap penalty scoring schemes. Surprisingly, although it has a natural formulation, the emanating Markov chain problem had no known efficient solution thus far. In this article, we present the first efficient algorithm to solve it. We demonstrate that, when comparing paralogous protein pairs (from Escherichia coli) of equal alignment identity and similarity, alignments that contain gaps of significant length are significantly less similar in terms of functionality, as measured with respect to Gene Ontology (GO) term similarity. This demonstrates for the first time, in a formally sound manner, that insertions and deletions cause more severe functional changes between proteins than substitutions. Our method can be reliably employed to quickly filter alignment outputs for protein pairs that are more likely to be functionally similar and/or divergent and establishes a sound and useful add-on for large-scale alignment studies.
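
The quantities this abstract reasons about, gap lengths in an alignment and their affine penalties, can be made concrete with a minimal sketch. It does not implement the paper's pair hidden Markov model or its significance test. The function names and the default penalty values are illustrative assumptions, and the cost formula follows one common convention (open cost for the first gap position, extension cost for each further one); other tools define it slightly differently.

```python
import re


def gap_lengths(aligned_row):
    """Return the length of every maximal run of '-' in one aligned sequence row."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned_row)]


def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Affine penalty for a single gap of the given length.

    Assumed convention: gap_open covers the first gapped position,
    gap_extend is charged for each additional position.
    The default values are placeholders, not taken from the paper.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_extend


if __name__ == "__main__":
    row = "AC--GT-A---"
    for n in gap_lengths(row):
        print(n, affine_gap_cost(n))
```

Under an affine scheme a single long gap costs less than several short gaps of the same total length, which is why the length distribution of gaps, rather than just their count, is the natural object to test for significance.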

13 citations