Showing papers by "Fereydoun Hormozdiari" published in 2010


01 Oct 2010
TL;DR: The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.

599 citations


Journal Article
TL;DR: In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step, and the accuracy of such an SVD study is directly correlated to the accuracy of this mapping step, which also provides the main computational bottleneck of the SVD study.
Abstract: In addition to single-nucleotide variations and small insertions-deletions (indels), larger-sized structural variations (for example, insertions, deletions, inversions, segmental duplications and copy-number polymorphisms) contribute to human genetic diversity. In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step. The accuracy of such an SVD study is directly correlated to the accuracy of this mapping step, which also provides the main computational bottleneck of the SVD study.

326 citations


Journal Article
TL;DR: This article provides a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies and demonstrates that the conflict resolution algorithm outperforms current state-of-the-art algorithms when tested on the genome of the Yoruba African individual.
Abstract: Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a very important class of SVs for the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we also present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507). Availability: The implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: eee@gs.washington.edu; cenk@cs.sfu.ca Supplementary information: Supplementary data are available at Bioinformatics online.

238 citations


Journal Article
TL;DR: The NovelSeq framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.), and thus requires significantly fewer computational resources than de novo sequence assembly.
Abstract: Motivation: In the past few years, human genome structural variation discovery has enjoyed increased attention from the genomics research community. Many studies were published to characterize short insertions, deletions, duplications and inversions, and associate copy number variants (CNVs) with disease. Detection of new sequence insertions requires sequence data; however, the ‘detectable’ sequence length with read-pair analysis is limited by the insert size. Thus, longer sequence insertions that contribute to our genetic makeup are not extensively researched. Results: We present NovelSeq: a computational framework to discover the content and location of long novel sequence insertions using paired-end sequencing data generated by next-generation sequencing platforms. Our framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.), and thus requires significantly fewer computational resources than de novo sequence assembly. We apply our methods to detect novel sequence insertions in the genome of an anonymous donor and validate our results by comparing with the insertions discovered in the same genome using various sources of sequence data. Availability: The implementation of the NovelSeq pipeline is available at http://compbio.cs.sfu.ca/strvar.htm Contact: eee@gs.washington.edu; cenk@cs.sfu.ca
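The insert-size limit mentioned in this abstract can be illustrated with a minimal sketch of generic read-pair analysis. This is not the NovelSeq pipeline: the function name, the parameters and the 3-standard-deviation cutoff are assumptions chosen for illustration. The idea is that a pair whose ends map closer together on the reference than the library's insert size suggests extra sequence in the donor, so the implied insertion length can never exceed the insert size itself.

```python
def classify_pair(mapped_span: float, mean_insert: float, sd_insert: float,
                  cutoff: float = 3.0) -> str:
    """Classify one paired-end mapping by its span on the reference.

    mapped_span: distance between the outer ends of the two mapped reads.
    mean_insert, sd_insert: insert-size mean and standard deviation of the
    sequencing library (hypothetical inputs, estimated elsewhere).
    cutoff: number of standard deviations tolerated (an assumed default).
    """
    lower = mean_insert - cutoff * sd_insert
    upper = mean_insert + cutoff * sd_insert
    if mapped_span < lower:
        # Ends map too close together: the donor carries extra sequence.
        return "insertion"
    if mapped_span > upper:
        # Ends map too far apart: the donor is missing reference sequence.
        return "deletion"
    return "concordant"


def implied_insertion_length(mapped_span: float, mean_insert: float) -> float:
    """Rough insertion size implied by a shortened span (never negative)."""
    return max(0.0, mean_insert - mapped_span)
```

Because the mapped span cannot drop below zero, `implied_insertion_length` is bounded by `mean_insert`, which is the limitation that motivates handling longer insertions separately.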

124 citations


Journal Article
TL;DR: This work proposes novel strategies for identifying potential multiple-drug targets in pathogenic protein-protein interaction (PPI) networks with the goal of disrupting known pathways/complexes and describes two polynomial time algorithms with respective approximation factors.
Abstract: As pathogens evolve effective schemes to overcome the effect of antibiotics, the prevalent “one drug and one drug target” approach is falling behind. We propose novel strategies for identifying potential multiple-drug targets in pathogenic protein-protein interaction (PPI) networks with the goal of disrupting known pathways/complexes. Given a set S of pathogenic pathways/complexes, we first consider computing the minimum number of proteins (with no human orthologs) whose removal from the PPI network disrupts all pathways/complexes. Unfortunately, even the best approximation algorithms for this (NP-hard) problem return too many targets to be practical. Thus, we focus on computing the optimal tradeoff (i.e., maximum ratio) between the number of disrupted essential pathways/complexes and the protein targets. For this “sparsest cut” problem, we describe two polynomial time algorithms with respective approximation factors of |S| and ...
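The two objectives in this abstract can be made concrete with a small sketch. It uses the textbook greedy heuristic for hitting set and a direct evaluation of the disrupted-pathways-per-target ratio; it is not either of the paper's two approximation algorithms. The names `pathways`, `targetable`, `greedy_hitting_set` and `disruption_ratio` are hypothetical, and `targetable` stands in for the proteins with no human orthologs.

```python
from typing import Dict, Iterable, List, Set


def greedy_hitting_set(pathways: Dict[str, Set[str]],
                       targetable: Set[str]) -> List[str]:
    """Greedily choose targetable proteins until every hittable pathway
    contains at least one chosen protein (i.e. is disrupted)."""
    # Only pathways with at least one targetable member can be disrupted.
    remaining = {name for name, members in pathways.items()
                 if members & targetable}
    chosen: List[str] = []
    while remaining:
        # Pick the protein that appears in the most still-intact pathways;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(
            sorted(targetable),
            key=lambda p: sum(1 for name in remaining if p in pathways[name]),
        )
        chosen.append(best)
        remaining = {name for name in remaining
                     if best not in pathways[name]}
    return chosen


def disruption_ratio(pathways: Dict[str, Set[str]],
                     targets: Iterable[str]) -> float:
    """Number of disrupted pathways divided by the number of targets."""
    target_set = set(targets)
    if not target_set:
        return 0.0
    disrupted = sum(1 for members in pathways.values() if members & target_set)
    return disrupted / len(target_set)
```

On a toy network with pathways A = {p1, p2}, B = {p2, p3} and C = {p4}, the greedy cover needs two targets (ratio 1.5), while targeting p2 alone disrupts two pathways with one target (ratio 2.0), which is the kind of tradeoff the ratio objective favors.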

31 citations


Journal Article
TL;DR: It is demonstrated for the first time, in a formally sound manner, that insertions and deletions cause more severe functional changes between proteins than substitutions, as measured with respect to Gene Ontology term similarity.
Abstract: Although insertions and deletions are a common type of evolutionary sequence variation, their origins and their functional consequences have not been comprehensively understood. Most alignment algorithms/programs only roughly reflect the evolutionary processes that result in gaps--which typically require further evaluation. Interestingly, it is widely believed that gaps are the predominant form of sequence variation resulting in structural and functional changes. Thus it is desirable to distinguish between gaps that reflect true point mutations and alignment artifacts when it comes to assessing the functional similarity of proteins based on computational alignments. Here we introduce pair hidden Markov model-based solutions to rapidly assess the statistical significance of gaps in alignments resulting from classical Needleman-Wunsch-like alignment procedures which implement affine gap penalty scoring schemes. Surprisingly, although it has a natural formulation, the emanating Markov chain problem had no known efficient solution thus far. In this article, we present the first efficient algorithm to solve it. We demonstrate that, when comparing paralogous protein pairs (from Escherichia coli) of equal alignment identity and similarity, alignments that contain gaps of significant length are significantly less similar in terms of functionality, as measured with respect to Gene Ontology (GO) term similarity. This demonstrates for the first time, in a formally sound manner, that insertions and deletions cause more severe functional changes between proteins than substitutions. Our method can be reliably employed to quickly filter alignment outputs for protein pairs that are more likely to be functionally similar and/or divergent and establishes a sound and useful add-on for large-scale alignment studies.
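For readers unfamiliar with the affine gap penalty scoring schemes this abstract refers to, the following is a minimal sketch of the standard three-state (Gotoh-style) recurrence for global alignment. It computes only the optimal score and does not implement the paper's pair hidden Markov model significance test. The function name and all scoring values are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, match: float = 1.0,
                        mismatch: float = -1.0, gap_open: float = -2.0,
                        gap_extend: float = -0.5) -> float:
    """Optimal global alignment score where a gap of length k costs
    gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

Under these assumed scores, aligning "ACGT" to "AT" is best served by one gap of length two rather than two separate gaps, which is exactly the behavior that makes long gaps under affine penalties worth assessing for significance.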

13 citations