scispace - formally typeset
Search or ask a question
Author

Cara Woodwark

Bio: Cara Woodwark is an academic researcher from Wellcome Trust Sanger Institute. The author has contributed to research in topics: Copy-number variation & Single-nucleotide polymorphism. The author has an hindex of 4, co-authored 4 publications receiving 4200 citations.

Papers
More filters
Journal ArticleDOI
23 Nov 2006-Nature
TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.
Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

4,275 citations

Journal ArticleDOI
01 Nov 2009-Genetics
TL;DR: In humans, high population differentiation is an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.
Abstract: We have evaluated the extent to which SNPs identified by genomewide surveys as showing unusually high levels of population differentiation in humans have experienced recent positive selection, starting from a set of 32 nonsynonymous SNPs in 27 genes highlighted by the HapMap1 project. These SNPs were genotyped again in the HapMap samples and in the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel of 52 populations representing worldwide diversity; extended haplotype homozygosity was investigated around all of them, and full resequence data were examined for 9 genes (5 from public sources and 4 from new data sets). For 7 of the genes, genotyping errors were responsible for an artifactual signal of high population differentiation and for 2, the population differentiation did not exceed our significance threshold. For the 18 genes with confirmed high population differentiation, 3 showed evidence of positive selection as measured by unusually extended haplotypes within a population, and 7 more did in between-population analyses. The 9 genes with resequence data included 7 with high population differentiation, and 5 showed evidence of positive selection on the haplotype carrying the nonsynonymous SNP from skewed allele frequency spectra; in addition, 2 showed evidence of positive selection on unrelated haplotypes. Thus, in humans, high population differentiation is (apart from technical artifacts) an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.

55 citations

Journal ArticleDOI
26 May 2011-PLOS ONE
TL;DR: Looking into possible mechanisms for the regulation of gene expression, it is found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes.
Abstract: Background Due to the increased accuracy of Copy Number Variable region (CNV) break point mapping, it is now possible to say with a reasonable degree of confidence whether a gene (i) falls entirely within a CNV; (ii) overlaps the CNV or (iii) actually contains the CNV. We classify these as type I, II and III CNV genes respectively. Principal Findings Here we show that although type I genes vary in copy number along with the CNV, most of these type I genes have the same expression levels as wild type copy numbers of the gene. These genes must, therefore, be under homeostatic dosage compensation control. Looking into possible mechanisms for the regulation of gene expression we found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes. Type III genes, on the other hand, have a significant excess of genes regulated by miRNAs and are enriched for genes that are monoallelically expressed. Significance Many diseases and genomic disorders are associated with CNVs so a better understanding of the different ways genes are associated with normal CNVs will help focus on candidate genes in genome wide association studies.

24 citations

Journal ArticleDOI
TL;DR: The data indicate that even genes that are strong biological candidates for positive selection and show reproducible signatures of positive selection in SNP scans do not always show the same replicability of selection signals in other tests, which should be considered in future studies on detectingpositive selection in genetic data.
Abstract: Background Numerous genome-wide scans conducted by genotyping previously ascertained single-nucleotide polymorphisms (SNPs) have provided candidate signatures for positive selection in various regions of the human genome, including in genes involved in pigmentation traits. However, it is unclear how well the signatures discovered by such haplotype-based test statistics can be reproduced in tests based on full resequencing data. Four genes (oculocutaneous albinism II (OCA2), tyrosinase-related protein 1 (TYRP1), dopachrome tautomerase (DCT), and KIT ligand (KITLG)) implicated in human skin-color variation, have shown evidence for positive selection in Europeans and East Asians in previous SNP-scan data. In the current study, we resequenced 4.7 to 6.7 kb of DNA from each of these genes in Africans, Europeans, East Asians, and South Asians.

23 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.

13,008 citations

Journal Article
Fumio Tajima1
30 Oct 1989-Genomics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal ArticleDOI
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

Journal ArticleDOI
06 Nov 2008-Nature
TL;DR: An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Abstract: DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

3,802 citations

Journal ArticleDOI
TL;DR: ABySS (Assembly By Short Sequences), a parallelized sequence assembler, was developed and assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc, representing 68% of the reference human genome.
Abstract: Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs > or =100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.

3,483 citations