scispace - formally typeset
Search or ask a question
Journal ArticleDOI

In the name of the father: surnames and genetics

01 Jun 2001-Trends in Genetics (Elsevier)-Vol. 17, Iss: 6, pp 353-357
TL;DR: Recent studies involving Y-chromosomal haplotyping and surname analysis are promising and indicate that genealogists of the future could be turning to records written in DNA, as well as in paper archives, to solve their problems.
About: This article is published in Trends in Genetics.The article was published on 2001-06-01. It has received 191 citations till now. The article focuses on the topics: Patronymic surname.
Citations
More filters
Journal ArticleDOI
TL;DR: A large number of new, highly polymorphic Y-chromosomal microsatellites are now available for population-genetic, evolutionary, genealogical, and forensic investigations.
Abstract: We have screened the nearly complete DNA sequence of the human Y chromosome for microsatellites (short tandem repeats) that meet the criteria of having a repeat-unit size of ⩾3 and a repeat count of ⩾8 and thus are likely to be easy to genotype accurately and to be polymorphic. Candidate loci were tested in silico for novelty and for probable Y specificity, and then they were tested experimentally to identify Y-specific loci and to assess their polymorphism. This yielded 166 useful new Y-chromosomal microsatellites, 139 of which were polymorphic, in a sample of eight diverse Y chromosomes representing eight Y-SNP haplogroups. This large sample of microsatellites, together with 28 previously known markers analyzed here—all sharing a common evolutionary history—allowed us to investigate the factors influencing their variation. For simple microsatellites, the average repeat count accounted for the highest proportion of repeat variance (∼34%). For complex microsatellites, the largest proportion of the variance (again, ∼34%) was explained by the average repeat count of the longest homogeneous array, which normally is variable. In these complex microsatellites, the additional repeats outside the longest homogeneous array significantly increased the variance, but this was lower than the variance of a simple microsatellite with the same total repeat count. As a result of this work, a large number of new, highly polymorphic Y-chromosomal microsatellites are now available for population-genetic, evolutionary, genealogical, and forensic investigations.

240 citations


Cites background from "In the name of the father: surnames..."

  • ...1997); they are also used increasingly in genealogical research (Jobling 2001) and find a major application in evolutionary studies as markers for male lineages (Kayser et al....

    [...]

  • ...0002-9297/2004/7406-0012$15.00 et al. 1997; Kayser et al. 1997); they are also used increasingly in genealogical research (Jobling 2001) and find a major application in evolutionary studies as markers for male lineages (Kayser et al. 2001; Stumpf and Goldstein 2001; Jobling and Tyler-Smith 2003)....

    [...]

Journal ArticleDOI
TL;DR: This paper presents a meta-analyses of eight major databases of genetic information across human populations that show clear trends in the growth and development of genetic ancestry testing in the United States.
Abstract: Public demand and the development of large public and private databases of genetic information across human populations has encouraged the development of the new and rapidly growing field of genetic ancestry testing. Both the promise of the science that underlies this field and a lack of a full understanding of its limitations have fuelled the increased public interest in genetic ancestry testing.

226 citations

Journal ArticleDOI
TL;DR: An enumeration of the challenges for genome data privacy is enumerated and a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward is presented.
Abstract: Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

207 citations

Journal ArticleDOI
TL;DR: Several approaches have been proposed to classify populations into ethnic groups using people's names, as an alternative to ethnicity self-identification information when this is not available as discussed by the authors, and a systematic review has been conducted of the most representative studies that develop new name-based ethnicity classifications, extracting methodological commonalities, achievements and shortcomings.
Abstract: Several approaches have been proposed to classify populations into ethnic groups using people's names, as an alternative to ethnicity self-identification information when this is not available. These methodologies have been developed, primarily in the public health and population genetics literature in different countries, in isolation from and with little participation from demographers or social scientists. The objective of this paper is to bring together these isolated efforts and provide a coherent comparison, a common methodology and terminology in order to foster new research and applications in this promising and multidisciplinary field. A systematic review has been conducted of the most representative studies that develop new name-based ethnicity classifications, extracting methodological commonalities, achievements and shortcomings; 13 studies met the inclusion criteria and all followed a very similar methodology to create a name reference list with which to classify populations into a few most common ethnic groups. The different classifications' sensitivity varies between 0.67 and 0.95, their specificity between 0.80 and 1, their positive predicted value between 0.70 and 0.96, and their negative predicted value between 0.96 and 1. Name-based ethnicity classification systems have a great potential to overcome data scarcity issues in a wide variety of key topics in population studies, as is proved by the 13 papers analysed. Their current limitations are mainly due to a restricted number of names and a partial spatio-temporal coverage of the reference population data-sets used to produce name reference lists. Improved classifications with extensive population coverage and higher classification accuracy levels will be achieved by using population registers with wider spatio-temporal coverage. Furthermore, there is a requirement for such new classifications to include all of the potential ethnic groups present in a society, and not just one or a few of them. Copyright (c) 2007 John Wiley & Sons, Ltd.

172 citations


Additional excerpts

  • ...The 13 studies list a series of issues and limitations, many of them common between them, and that have been summarised here complementing them with other studies (Jobling, 2001, Senior and Bhopal, 1994) under the following eight major themes: a) Temporal differences in name distribution between the reference and the target populations; because of different migration waves and variations in the geographic distribution patterns through time, which introduces misclassification and low coverage in the classification....

    [...]

  • ...Today, surnames have been demonstrated to correlate well with Y-chromosomes, since both are patrilinearly inherited (Jobling, 2001, McEvoy and Bradley, 2006), and this is opening up a new era of genetic genealogy (Shriver and Kittles, 2004)....

    [...]

  • ...Population, Space and Place 11(2): 69-74 Jobling MA. 2001....

    [...]

Journal ArticleDOI
TL;DR: Combining molecular genetics and surname analysis illuminates population structure and history, has potential applications in forensic studies and, in the form of 'genetic genealogy', is an area of rapidly growing interest for the public.

163 citations


Cites background from "In the name of the father: surnames..."

  • ...However, as the number of STRs increases, so too does the probability of detecting an STR mutation between close relatives [ 29 ], and this needs to be taken into account....

    [...]

  • ...The link between surname and Y-chromosomal haplotype suggests the idea of predicting a surname in forensic investigations [ 29 ]....

    [...]

References
More filters
Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
J. Craig Venter1, Mark Raymond Adams1, Eugene W. Myers1, Peter W. Li1  +269 moreInstitutions (12)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems are indicated.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal ArticleDOI
TL;DR: A method for constructing networks from recombination-free population data that combines features of Kruskal's algorithm for finding minimum spanning trees by favoring short connections, and Farris's maximum-parsimony (MP) heuristic algorithm, which sequentially adds new vertices called "median vectors", except that the MJ method does not resolve ties.
Abstract: Reconstructing phylogenies from intraspecific data (such as human mitochondrial DNA variation) is often a challenging task because of large sample sizes and small genetic distances between individuals. The resulting multitude of plausible trees is best expressed by a network which displays alternative potential evolutionary paths in the form of cycles. We present a method ("median joining" [MJ]) for constructing networks from recombination-free population data that combines features of Kruskal's algorithm for finding minimum spanning trees by favoring short connections, and Farris's maximum-parsimony (MP) heuristic algorithm, which sequentially adds new vertices called "median vectors", except that our MJ method does not resolve ties. The MJ method is hence closely related to the earlier approach of Foulds, Hendy, and Penny for estimating MP trees but can be adjusted to the level of homoplasy by setting a parameter epsilon. Unlike our earlier reduced median (RM) network method, MJ is applicable to multistate characters (e.g., amino acid sequences). An additional feature is the speed of the implemented algorithm: a sample of 800 worldwide mtDNA hypervariable segment I sequences requires less than 3 h on a Pentium 120 PC. The MJ method is demonstrated on a Tibetan mitochondrial DNA RFLP data set.

9,937 citations

Journal ArticleDOI
TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.
Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm’s speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.

6,577 citations

Journal ArticleDOI
TL;DR: This book aims to provide a history of Chinese modern art from 17th Century to the present day through the lens of 20th Century critics, practitioners, journalists, and mediaeval and modern-day critics.
Abstract: J. Craig Venter,* Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gangadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder, Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Randall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E. Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. Zhu, Shaying Zhao, Dennis Gilbert, Suzanna Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali, Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen Beeson, Dana Busam, Amy Carver, Angela Center, Ming Lai Cheng, Liz Curry, Steve Danaher, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven Ferriera, Neha Garg, Andres Gluecksmann, Brit Hart, Jason Haynes, Charles Haynes, Cheryl Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam, Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David May, Steven McCawley, Tina McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy, Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon, Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard Scott, Cynthia Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams, Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen V. Sjolander, Brian Karlak, Anish Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer, Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell Schwartz, Brian Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham, Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov, Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover, Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl Kraft, Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William Majoros, Joe McDaniel, Sean Murphy, Matthew Newman, Trung Nguyen, Ngoc Nguyen, Marc Nodell, Sue Pan, Jim Peck, Marshall Peterson, William Rowe, Robert Sanders, John Scott, Michael Simpson, Thomas Smith, Arlan Sprague, Timothy Stockwell, Russell Turner, Eli Venter, Mei Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, Xiaohong Zhu T H E H U M A N G E N O M E

5,205 citations