Home
/
Authors
/
Jason G. McHugh

Author

Jason G. McHugh

Other affiliations: Stanford University, Facebook

Bio: Jason G. McHugh is an academic researcher from Amazon.com. The author has contributed to research in topics: Query language & Object (computer science). The author has an hindex of 21, co-authored 48 publications receiving 4175 citations. Previous affiliations of Jason G. McHugh include Stanford University & Facebook.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The Lorel Query Language for Semistructured Data

[...]

Serge Abiteboul¹, Dallan Quass¹, Jason G. McHugh¹, Jennifer Widom¹, Janet L. Wiener¹ - Show less +1 more•Institutions (1)

Stanford University¹

01 Apr 1997-International Journal on Digital Libraries

TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.

...read moreread less

Abstract: language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.

...read moreread less

1,257 citations

Journal Article•DOI•

Lore: a database management system for semistructured data

[...]

Jason G. McHugh¹, Serge Abiteboul¹, Roy Goldman¹, Dallas Quass¹, Jennifer Widom¹ - Show less +1 more•Institutions (1)

Stanford University¹

01 Sep 1997

TL;DR: This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.

...read moreread less

Abstract: Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.

...read moreread less

692 citations

Proceedings Article•

Query Optimization for XML

[...]

Jason G. McHugh¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

07 Sep 1999

TL;DR: This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language and focuses primarily on Lore's cost-based query optimizer, including heuristics for reducing the large search space.

...read moreread less

Abstract: XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured: the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported. This is a short version of the paper Query Optimization for Semistructured Data which is available at: http://www-db.stanford.edu/~mchughj/publications/qo.ps

...read moreread less

419 citations

From Semistructured Data to XML: Migrating the Lore Data Model and Query Language

[...]

Roy Goldman, Jason G. McHugh, Jennifer Widom

01 Jan 1999

TL;DR: This paper describes the experiences migrating the Lore database management system for semistructured data to work with XML, and presents a modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language.

...read moreread less

Abstract: Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled as some form of labeled, directed graph. The recent emergence of eXtensible Markup Language (XML) as a new standard for data representation and exchange on the World-Wide Web has drawn significant attention. Researchers have casually observed a striking similarity between semistructured data models and XML. While similarities do abound, some key differences dictate changes to any existing data model, query language, or DBMS for semistructured data in order to fully support XML. This paper describes our experiences migrating the Lore database management system for semistructured data to work with XML. We present our modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language. Based on this model, we describe changes to Lorel, Lore's query language. We also briefly discuss changes to Lore's dynamic structural summaries (DataGuides) and the relationship of DataGuides to XML's Document Type Definitions (DTDs).

...read moreread less

309 citations

Patent•

Method, system and computer program product for the delivery of a chat message in a 3D multi-user environment

[...]

William D. Harvey, Jason G. McHugh, Fernando J. Paiz, Jeffrey J. Ventrella

31 Aug 2000

TL;DR: In this paper, a chat system, method and computer program product for delivering a message between a sender and a recipient in a three-dimensional (3D) multi-user environment, wherein the 3D multiuser environment maintains respective digital representations of the sender and the recipient.

...read moreread less

Abstract: A chat system, method and computer program product for delivering a message between a sender and a recipient in a three-dimensional (3D) multi-user environment, wherein the 3D multi-user environment maintains respective digital representations of the sender and the recipient, uses a recipient interface to receive a message, map the message to a texture to generate a textured message, and render the textured message in the 3D multi-user environment so as to permit the recipient to visually ascertain the location of the digital representation of the sender in the 3D world. Received messages are mantained as two-dimensional elements on a recipient viewport.

...read moreread less

299 citations

1
2
3
4
…
5
6
7
8
9
10

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Data integration: a theoretical perspective

[...]

Maurizio Lenzerini¹•Institutions (1)

Sapienza University of Rome¹

03 Jun 2002

TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

...read moreread less

Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

...read moreread less

2,716 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

XQuery 1.0 : An XML Query Language

[...]

Scott Boag

01 Jan 2007

TL;DR: XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories.

...read moreread less

Abstract: XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the

...read moreread less

2,066 citations

Journal Article•DOI•

Data fusion

[...]

Jens Bleiholder¹, Felix Naumann¹•Institutions (1)

Hasso Plattner Institute¹

15 Jan 2009-ACM Computing Surveys

TL;DR: This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data Fusion.

...read moreread less

Abstract: The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation.This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.

...read moreread less

1,797 citations

Journal Article•DOI•

Web mining research: a survey

[...]

Raymond Kosala¹, Hendrik Blockeel¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Jun 2000-Sigkdd Explorations

TL;DR: This paper surveys the research in the area of Web mining, point out some confusions regarded the usage of the term Web mining and suggest three Web mining categories, which are then situate some of the research with respect to these three categories.

...read moreread less

Abstract: With the huge amount of information available online, the World Wide Web is a fertile area for data mining research. The Web mining research is at the cross road of research from several research communities, such as database, information retrieval, and within AI, especially the sub-areas of machine learning and natural language processing. However, there is a lot of confusions when comparing research efforts from different point of views. In this paper, we survey the research in the area of Web mining, point out some confusions regarded the usage of the term Web mining and suggest three Web mining categories. Then we situate some of the research with respect to these three categories. We also explore the connection between the Web mining categories and the related agent paradigm. For the survey, we focus on representation issues, on the process, on the learning algorithm, and on the application of the recent works as the criteria. We conclude the paper with some research issues.

...read moreread less

1,699 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse