Author

Mathew W. Wright

Other affiliations: Wellcome Trust, University College London, National Institutes of Health

Bio: Mathew W. Wright is an academic researcher from the European Bioinformatics Institute. The author has contributed to research on gene nomenclature and the HUGO Gene Nomenclature Committee. The author has an h-index of 23 and has co-authored 28 publications receiving 8221 citations. Previous affiliations of Mathew W. Wright include Wellcome Trust & University College London.

Papers

Journal Article•DOI•

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations

Journal Article•DOI•

Gene map of the extended human MHC

Roger Horton¹, Laurens G. Wilming¹, Vikki Rand¹, Ruth C. Lovering², Elspeth A. Bruford², Varsha K. Khodiyar², Michael J. Lush², Sue Povey², C. Conover Talbot³, Mathew W. Wright², H Wain², John Trowsdale⁴, Andreas Ziegler⁵, Stephan Beck¹

Wellcome Trust Sanger Institute¹, University College London², Johns Hopkins University³, University of Cambridge⁴, Humboldt University of Berlin⁵

01 Dec 2004-Nature Reviews Genetics

TL;DR: A gene map of the xMHC is presented and its content in relation to paralogy, polymorphism, immune function and disease is reviewed.

Abstract: The major histocompatibility complex (MHC) is the most important region in the vertebrate genome with respect to infection and autoimmunity, and is crucial in adaptive and innate immunity. Decades of biomedical research have revealed many MHC genes that are duplicated, polymorphic and associated with more diseases than any other region of the human genome. The recent completion of several large-scale studies offers the opportunity to assimilate the latest data into an integrated gene map of the extended human MHC. Here, we present this map and review its content in relation to paralogy, polymorphism, immune function and disease.

1,047 citations

Journal Article•DOI•

New consensus nomenclature for mammalian keratins

Jürgen Schweizer¹, Paul Edward Bowden², Pierre A. Coulombe³, Lutz Langbein¹, E. Birgitte Lane, Thomas M. Magin⁴, Lois J. Maltais, M. Bishr Omary⁵, David A.D. Parry⁶, Michael A. Rogers, Mathew W. Wright⁷

German Cancer Research Center¹, Cardiff University², Johns Hopkins University³, University of Bonn⁴, Stanford University⁵, Massey University⁶, University College London⁷

17 Jul 2006-Journal of Cell Biology

TL;DR: This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratin proteins from other mammalian species.

Abstract: Keratins are intermediate filament–forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species.

677 citations

Journal Article•DOI•

Genenames.org: the HGNC resources in 2015

Kristian Gray¹, Bethan Yates¹, Ruth L. Seal¹, Mathew W. Wright¹, Elspeth A. Bruford¹

European Bioinformatics Institute¹

28 Jan 2015-Nucleic Acids Research

TL;DR: An overview of the HUGO Gene Nomenclature Committee's current online data and resources is provided, and the changes made in recent years are highlighted.

Abstract: The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. To date the HGNC have assigned over 39 000 gene names, representing an increase of over 5000 entries in the past two years. As well as increasing the size of our database, we have continued redesigning our website http://www.genenames.org and have modified, updated and improved many aspects of the site including a faster and more powerful search, a vastly improved HCOP tool and a REST service to increase the number of ways users can retrieve our data. This article provides an overview of our current online data and resources, and highlights the changes we have made in recent years.

589 citations

Journal Article•DOI•

Guidelines for human gene nomenclature.

H Wain¹, Elspeth A. Bruford¹, Ruth C. Lovering¹, Michael J. Lush¹, Mathew W. Wright¹, Sue Povey¹

University College London¹

01 Apr 2002-Genomics

TL;DR: With the recent publications of the complete human genome sequence there is an estimated total of 26,000–40,000 genes, as suggested by the International Human Genome Sequencing Consortium and Venter.

551 citations

Cited by

Journal Article•DOI•

The Molecular Signatures Database Hallmark Gene Set Collection

Arthur Liberzon¹, Chet Birger¹, Helga Thorvaldsdottir¹, Mahmoud Ghandi¹, Jill P. Mesirov², Pablo Tamayo²

Broad Institute¹, University of California, San Diego²

23 Dec 2015-Cell systems

TL;DR: A combination of automated approaches and expert curation is used to develop a collection of "hallmark" gene sets, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression in MSigDB.

Abstract: The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of “hallmark” gene sets as part of MSigDB. Each hallmark in this collection consists of a “refined” gene set, derived from multiple “founder” sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.

6,062 citations

Journal Article•DOI•

KEGG: new perspectives on genomes, pathways, diseases and drugs

Minoru Kanehisa¹, Miho Furumichi¹, Mao Tanabe¹, Yoko Sato², Kanae Morishima¹

Kyoto University¹, Fujitsu²

04 Jan 2017-Nucleic Acids Research

TL;DR: The content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases, and the newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined.

Abstract: KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.

5,741 citations

Introducing the Special Issue on “Bioinformatics”

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article•DOI•

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow¹, Adam Frankish¹, José M. González¹, Electra Tapanari¹, Mark Diekhans², Felix Kokocinski¹, Bronwen Aken¹, Daniel Barrell¹, Amonida Zadissa¹, Stephen M. J. Searle¹, If H. A. Barnes¹, Alexandra Bignell¹, Veronika Boychenko¹, Toby Hunt¹, M. Kay¹, Gaurab Mukherjee¹, Jeena Rajan¹, Gloria Despacio-Reyes¹, Gary Saunders¹, Charles A. Steward¹, Rachel A. Harte², Michael F. Lin³, Cédric Howald⁴, Andrea Tanzer, Thomas Derrien⁴, Jacqueline Chrast⁴, Nathalie Walters⁴, Suganthi Balasubramanian⁵, Baikang Pei⁵, Michael L. Tress, Jose Manuel Rodriguez, Iakes Ezkurdia, Jeltje Van Baren, Michael R. Brent, David Haussler², Manolis Kellis³, Alfonso Valencia, Alexandre Reymond⁴, Mark Gerstein⁵, Roderic Guigó, Tim Hubbard¹

Wellcome Trust Sanger Institute¹, University of California, Santa Cruz², Massachusetts Institute of Technology³, University of Lausanne⁴, Yale University⁵

01 Sep 2012-Genome Research

TL;DR: This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.

Abstract: The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

4,281 citations
