Author

Rachel A. Harte

Bio: Rachel A. Harte is an academic researcher from Stanford University. The author has contributed to research on the topics of Human genome and Proto-oncogene tyrosine-protein kinase Src. The author has an h-index of 13 and has co-authored 14 publications receiving 7,474 citations.

Papers

An integrated encyclopedia of DNA elements in the human genome

Ian Dunham, Anshul Kundaje, Shelley Force Aldred, Patrick J. Collins +439 more

01 Sep 2012

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

2,767 citations

Journal Article

Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution

LaDeana W. Hillier¹, Webb Miller², Ewan Birney, Wesley C. Warren¹ +171 more•Institutions (39)

09 Dec 2004-Nature

TL;DR: A draft genome sequence of the red jungle fowl, Gallus gallus, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes.

Abstract: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

2,579 citations

Journal Article

The UCSC Genome Browser database: 2014 update

Donna Karolchik¹, Galt P. Barber¹, Jonathan Casper¹, Hiram Clawson¹, Melissa S. Cline¹, Mark Diekhans¹, Timothy R. Dreszer¹, Pauline A. Fujita¹, Luvina Guruvadoo¹, Maximilian Haeussler¹, Rachel A. Harte¹, Steven G. Heitner¹, Angie S. Hinrichs¹, Katrina Learned¹, Brian T. Lee¹, Chin H. Li¹, Brian J. Raney¹, Brooke Rhead¹, Kate R. Rosenbloom¹, Cricket A. Sloan¹, Matthew L. Speir¹, Ann S. Zweig¹, David Haussler¹, Robert M. Kuhn¹, W. James Kent¹•Institutions (1)

University of California, Santa Cruz¹

01 Jan 2014-Nucleic Acids Research

TL;DR: New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data.

Abstract: The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.

709 citations

Journal Article

The UCSC Genome Browser database: 2016 update.

Matthew L. Speir¹, Ann S. Zweig¹, Kate R. Rosenbloom¹, Brian J. Raney¹, Benedict Paten¹, Parisa Nejad¹, Brian T. Lee¹, Katrina Learned¹, Donna Karolchik¹, Angie S. Hinrichs¹, Steve Heitner¹, Rachel A. Harte, Maximilian Haeussler¹, Luvina Guruvadoo¹, Pauline A. Fujita², Christopher Eisenhart¹, Mark Diekhans¹, Hiram Clawson¹, Jonathan Casper¹, Galt P. Barber¹, David Haussler¹, Robert M. Kuhn¹, W. James Kent¹•Institutions (2)

University of California, Santa Cruz¹, University of California, San Francisco²

04 Jan 2016-Nucleic Acids Research

TL;DR: The UCSC Genome Browser has greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

Abstract: For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

618 citations

Journal Article

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Kim D. Pruitt¹, Jennifer Harrow, Rachel A. Harte, Craig Wallin, Mark Diekhans, Donna Maglott, Steve Searle, Catherine M. Farrell, Jane E. Loveland, Barbara J. Ruef, Elizabeth M. Hart, Marie-Marthe Suner, Melissa J. Landrum, Bronwen Aken, Sarah Ayling, Robert Baertsch, Julio Fernandez-Banet, Joshua L. Cherry, Val Curwen, Michael DiCuccio, Manolis Kellis, Jennifer M. Lee, Michael F. Lin, Michael Schuster, Andrew Shkeda, Clara Amid, Garth Brown, Oksana Dukhanina, Adam Frankish, Jennifer Hart, Bonnie L. Maidak, Jonathan M. Mudge, Michael R. Murphy, Terence Murphy, Jeena Rajan, Bhanu Rajput, Lillian D. Riddick, Catherine E. Snow, Charles A. Steward, David Webb, Janet Weber, Laurens G. Wilming, Wenyu Wu, Ewan Birney, David Haussler, Tim Hubbard, James Ostell, Richard Durbin, David J. Lipman•Institutions (1)

National Institutes of Health¹

01 Jul 2009-Genome Research

TL;DR: The CCDS database centralizes the function of identifying well-supported, identically-annotated, protein-coding regions and indicates that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS.

Abstract: Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

575 citations

Cited by

Journal Article

Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets

Benjamin P. Lewis¹, Christopher B. Burge¹, David P. Bartel¹•Institutions (1)

Massachusetts Institute of Technology¹

14 Jan 2005-Cell

TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal Article

Tissue-based map of the human proteome

Mathias Uhlén¹, Mathias Uhlén², Linn Fagerberg¹, Björn M. Hallström¹, Cecilia Lindskog³, Per Oksvold¹, Adil Mardinoglu⁴, Åsa Sivertsson¹, Caroline Kampf³, Evelina Sjöstedt¹, Evelina Sjöstedt³, Anna Asplund³, IngMarie Olsson³, Karolina Edlund, Emma Lundberg¹, Sanjay Navani, Cristina Al-Khalili Szigyarto¹, Jacob Odeberg¹, Dijana Djureinovic³, Jenny Ottosson Takanen¹, Sophia Hober¹, Tove Alm¹, Per-Henrik Edqvist³, Holger Berling¹, Hanna Tegel¹, Jan Mulder³, Johan Rockberg¹, Peter Nilsson¹, Jochen M. Schwenk¹, Marica Hamsten¹, Kalle von Feilitzen¹, Mattias Forsberg¹, Lukas Persson¹, Fredric Johansson¹, Martin Zwahlen¹, Gunnar von Heijne⁵, Jens Nielsen⁴, Jens Nielsen², Fredrik Pontén³•Institutions (5)

Royal Institute of Technology¹, Technical University of Denmark², Science for Life Laboratory³, Chalmers University of Technology⁴, Stockholm University⁵

23 Jan 2015-Science

TL;DR: This paper presents a map of the human tissue proteome based on an integrated omics approach that combines quantitative transcriptomics at the tissue and organ level with tissue microarray-based immunohistochemistry to achieve spatial localization of proteins down to the single-cell level.

Abstract: Resolving the molecular details of proteome variation in the different tissues and organs of the human body will greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level. Our tissue-based analysis detected more than 90% of the putative protein-coding genes. We used this approach to explore the human secretome, the membrane proteome, the druggable proteome, the cancer proteome, and the metabolic functions in 32 different tissues and organs. All the data are integrated in an interactive Web-based database that allows exploration of individual proteins, as well as navigation of global expression patterns, in all major tissues and organs in the human body.

9,745 citations

Journal Article

NCBI GEO: archive for functional genomics data sets—update

Tanya Barrett¹, Stephen E. Wilhite¹, Pierre Ledoux¹, Carlos Evangelista¹, Irene F. Kim¹, Maxim Tomashevsky¹, Kimberly A. Marshall¹, Katherine Phillippy¹, Patti M. Sherman¹, Michelle Holko¹, Andrey Yefanov¹, Hye Seung Lee¹, Naigong Zhang¹, Cynthia L. Robertson¹, Nadezhda Serova¹, Sean Davis¹, Alexandra Soboleva¹•Institutions (1)

National Institutes of Health¹

27 Nov 2012-Nucleic Acids Research

TL;DR: The Gene Expression Omnibus is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community and supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable.

Abstract: The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

6,683 citations

Journal Article

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea¹, Geo Pertea¹, Corina Antonescu¹, Tsung Cheng Chang², Joshua T. Mendell², Steven L. Salzberg¹•Institutions (2)

Johns Hopkins University¹, University of Texas Southwestern Medical Center²

01 Mar 2015-Nature Biotechnology

TL;DR: StringTie is a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble RNA-seq reads into transcripts. It produces more complete and accurate reconstructions of genes and better estimates of expression levels than other leading transcript assembly programs.

Abstract: Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

6,594 citations

Journal Article

Predicting effective microRNA target sites in mammalian mRNAs

Vikram Agarwal¹, George W. Bell¹, Jin Wu Nam², Jin Wu Nam¹, David P. Bartel¹•Institutions (2)

Massachusetts Institute of Technology¹, UPRRP College of Natural Sciences²

12 Aug 2015-eLife

TL;DR: It is shown that recently reported non-canonical sites do not mediate repression despite binding the miRNA, which indicates that the vast majority of functional sites are canonical.

Abstract: Proteins are built by using the information contained in molecules of messenger RNA (mRNA). Cells have several ways of controlling the amounts of different proteins they make. For example, a so-called ‘microRNA’ molecule can bind to an mRNA molecule to cause it to be more rapidly degraded and less efficiently used, thereby reducing the amount of protein built from that mRNA. Indeed, microRNAs are thought to help control the amount of protein made from most human genes, and biologists are working to predict the amount of control imparted by each microRNA on each of its mRNA targets. All RNA molecules are made up of a sequence of bases, each commonly known by a single letter—‘A’, ‘U’, ‘C’ or ‘G’. These bases can each pair up with one specific other base—‘A’ pairs with ‘U’, and ‘C’ pairs with ‘G’. To direct the repression of an mRNA molecule, a region of the microRNA known as a ‘seed’ binds to a complementary sequence in the target mRNA. ‘Canonical sites’ are regions in the mRNA that contain the exact sequence of partner bases for the bases in the microRNA seed. Some canonical sites are more effective at mRNA control than others. ‘Non-canonical sites’ also exist in which the pairing between the microRNA seed and mRNA does not completely match. Previous work has suggested that many non-canonical sites can also control mRNA degradation and usage. Agarwal et al. first used large experimental datasets from many sources to investigate microRNA activity in more detail. As expected, when mRNAs had canonical sites that matched the microRNA, mRNA levels and usage tended to drop. However, no effect was observed when the mRNAs only had recently identified non-canonical sites. This suggests that microRNAs primarily bind to canonical sites to control protein production. Based on these results, Agarwal et al. further developed a statistical model that predicts the effects of microRNAs binding to canonical sites. 
The updated model considers 14 different features of the microRNA, microRNA site, or mRNA—including the mRNA sequence around the site—to predict which sites within mRNAs are most effectively targeted by microRNAs. Tests showed that Agarwal et al.'s model was as good as experimental approaches at identifying the effective target sites, and was better than existing computational models. The model has been used to power the latest version of a freely available resource called TargetScan, and so could prove a valuable resource for researchers investigating the many important roles of microRNAs in controlling protein production.
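
The pairing rule described in this abstract ('A' with 'U', 'C' with 'G') and the idea of a canonical site can be illustrated with a minimal Python sketch. It is not the TargetScan model and does not reproduce any of its 14 features; it only locates exact seed matches. The function names are hypothetical, and treating the seed as miRNA positions 2 to 7 is an assumption based on a common convention, not something stated in the text above.

```python
from typing import List

# Watson-Crick pairing for RNA bases, as described in the abstract.
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def seed_site(mirna: str, start: int = 1, end: int = 7) -> str:
    """Return the mRNA sequence (5'->3') that pairs exactly with the miRNA seed.

    start/end are 0-based slice bounds into the miRNA. The default selects
    positions 2-7, which is an assumption made for illustration only.
    """
    seed = mirna[start:end]
    # The two strands pair antiparallel, so take the reverse complement.
    return "".join(PAIR[base] for base in reversed(seed))


def find_canonical_sites(mrna: str, mirna: str) -> List[int]:
    """Return 0-based start indices of exact seed-match sites in the mRNA."""
    site = seed_site(mirna)
    last_start = len(mrna) - len(site)
    return [i for i in range(last_start + 1) if mrna[i:i + len(site)] == site]
```

Calling `find_canonical_sites(mrna, mirna)` with two RNA strings written 5' to 3' returns the offsets of every exact seed match; a site with a mismatch inside the seed (a 'non-canonical site' in the abstract's terms) is deliberately not reported.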

5,365 citations
