Author

Jun Yang

Bio: Jun Yang is an academic researcher from Duke University. The author has contributed to research in topics including Tuple and Wireless sensor network. The author has an h-index of 37 and has co-authored 167 publications receiving 5,195 citations. Previous affiliations of Jun Yang include University of California, Berkeley and Durham University.


Papers
Proceedings Article
11 Jun 2007
TL;DR: BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search, and offers orders-of-magnitude performance improvement over existing approaches.
Abstract: Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: The bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.
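
To make the bi-level idea concrete, here is a minimal sketch of a two-level inverted index: a block-level index mapping each keyword to the blocks that contain it, and a per-block index mapping the keyword to matching nodes. The partitioning, the pruning rule, and all names are illustrative simplifications, not BLINKS' actual data structures or search algorithm.

```python
# Toy two-level keyword index (hypothetical simplification of BLINKS'
# bi-level design, not the paper's actual structures or algorithm).
from collections import defaultdict

# A small data graph: node -> keywords, partitioned into blocks.
node_keywords = {
    "a": {"db"}, "b": {"query"}, "c": {"db", "graph"},
    "d": {"graph"}, "e": {"query"},
}
blocks = {0: {"a", "b"}, 1: {"c", "d", "e"}}  # node partition

# Block-level index: keyword -> blocks containing it (initiates and
# guides search among blocks).
block_index = defaultdict(set)
# Intra-block index: (block, keyword) -> nodes (accelerates search
# within a block).
intra_index = defaultdict(set)
for blk, nodes in blocks.items():
    for n in nodes:
        for kw in node_keywords[n]:
            block_index[kw].add(blk)
            intra_index[(blk, kw)].add(n)

def candidate_blocks(query):
    """Prune: only blocks containing every keyword can host a purely
    local answer (cross-block answers need inter-block expansion,
    omitted here)."""
    return set.intersection(*(block_index[kw] for kw in query))

query = ["db", "graph"]
for blk in sorted(candidate_blocks(query)):
    hits = {kw: sorted(intra_index[(blk, kw)]) for kw in query}
    print(f"block {blk}: {hits}")
```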

601 citations

Proceedings Article
25 Aug 1997
TL;DR: This work presents the design and implementation of a query optimizer for Garlic, a middleware system designed to integrate data from a broad range of data sources with very different query capabilities, and illustrates its actions through an example.
Abstract: Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Garlic [C 95], a middleware system designed to integrate data from a broad range of data sources with very different query capabilities. Garlic’s optimizer extends the rule-based approach of [Loh88] to work in a heterogeneous environment, by defining generic rules for the middleware and using wrapper-provided rules to encapsulate the capabilities of each data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rules for new sources, and the ability to express the capabilities of a diverse set of sources. We describe the design and implementation of this optimizer, and illustrate its actions through an example.
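
The split between generic middleware rules and wrapper-provided capability rules might look like the following sketch; the Wrapper class, its capability flags, and the plan strings are invented for illustration and are not Garlic's actual interfaces.

```python
# Hypothetical sketch of wrapper-encapsulated capabilities in a
# Garlic-style middleware optimizer (all names invented).

class Wrapper:
    """Each data source advertises what it can evaluate natively."""
    def __init__(self, name, can_filter, can_join):
        self.name, self.can_filter, self.can_join = name, can_filter, can_join

def plan_scan(wrapper, predicate):
    """Generic middleware rule: push a filter down when the wrapper's
    capability rules say the source can evaluate it, else filter in
    the middleware on top of a plain scan."""
    if wrapper.can_filter:
        return f"PushedScan({wrapper.name}, {predicate})"
    return f"Filter({predicate}, Scan({wrapper.name}))"

def plan_join(left, right, w_left, w_right):
    """Join at a capable source if both inputs live there; otherwise
    the middleware performs the join over the two sub-plans."""
    if w_left is w_right and w_left.can_join:
        return f"PushedJoin({w_left.name})"
    return f"MiddlewareJoin({left}, {right})"

rdbms = Wrapper("rdbms", can_filter=True, can_join=True)
files = Wrapper("files", can_filter=False, can_join=False)
lhs = plan_scan(rdbms, "price > 10")
rhs = plan_scan(files, "year = 1997")
print(plan_join(lhs, rhs, rdbms, files))
```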

537 citations

Proceedings Article
03 Apr 2006
TL;DR: This paper proposes a novel labeling scheme for sparse graphs that ensures that graph reachability queries can be answered in constant time, and provides an alternative scheme to tradeoff query time for label space, which further benefits applications that use tree-like graphs.
Abstract: Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve huge graphs and require fast answering of reachability queries. Several reachability labeling methods have been proposed for this purpose. They assign labels to the vertices, such that the reachability between any two vertices may be decided using their labels only. For sparse graphs, 2-hop based reachability labeling schemes answer reachability queries efficiently using relatively small label space. However, the labeling process itself is often too time-consuming to be practical for large graphs. In this paper, we propose a novel labeling scheme for sparse graphs. Our scheme ensures that graph reachability queries can be answered in constant time. Furthermore, for sparse graphs, the complexity of the labeling process is almost linear, which makes our algorithm applicable to massive datasets. Analytical and experimental results show that our approach is much more efficient than state-of-the-art approaches. Furthermore, our labeling method also provides an alternative scheme to trade off query time for label space, which further benefits applications that use tree-like graphs.
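
For the special case of a tree, classic DFS interval labels already give constant-time, label-only reachability checks, which conveys the flavor of the approach; the paper's contribution is a scheme for general sparse graphs, which this sketch does not attempt.

```python
# Interval labeling on a tree: reachable(u, v) is decided from labels
# alone in O(1). This only illustrates label-based reachability; the
# paper's scheme handles general sparse graphs.

def label_tree(root, children):
    """Assign [start, end) DFS intervals; v is a descendant of u iff
    u.start <= v.start < u.end."""
    labels, counter = {}, [0]
    def dfs(u):
        start = counter[0]; counter[0] += 1
        for c in children.get(u, []):
            dfs(c)
        labels[u] = (start, counter[0])
    dfs(root)
    return labels

children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
labels = label_tree("r", children)

def reachable(u, v):
    (us, ue), (vs, _) = labels[u], labels[v]
    return us <= vs < ue  # constant time, labels only

print(reachable("a", "d"), reachable("b", "c"))  # True False
```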

258 citations

Proceedings Article
27 Jun 2006
TL;DR: This work adds enhancements to CONCH to build in redundant constraints and provide a method to interpret the resulting reports in case of uncertainty, and experimentally evaluates CONCH's effectiveness against competing schemes in a number of interesting scenarios.
Abstract: Wireless sensor networks have created new opportunities for data collection in a variety of scenarios, such as environmental and industrial monitoring, where we expect data to be temporally and spatially correlated. Researchers may want to continuously collect all sensor data from the network for later analysis. Suppression, both temporal and spatial, provides opportunities for reducing the energy cost of sensor data collection, and we demonstrate how both types can be combined for maximal benefit. We frame the problem as one of monitoring node and edge constraints. A monitored node triggers a report if its value changes. A monitored edge triggers a report if the difference between its nodes' values changes. The set of reports collected at the base station is used to derive all node values. We fully exploit the potential of this global inference in our algorithm, CONCH, short for constraint chaining. Constraint chaining builds a network of constraints that are maintained locally, but allow a global view of values to be maintained with minimal cost. Network failure complicates the use of suppression, since failure and suppression both result in an absence of reports and cannot be told apart by silence alone. We add enhancements to CONCH that build in redundant constraints, and provide a method to interpret the resulting reports in case of uncertainty. Using simulation, we experimentally evaluate CONCH's effectiveness against competing schemes in a number of interesting scenarios.
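
A toy version of constraint chaining, assuming a spanning set of edge constraints chosen by hand: silence means "unchanged", and the base station chains the monitored node's value across edge differences to rederive every node. The data layout below is a hypothetical simplification, not CONCH's actual protocol.

```python
# Toy constraint chaining (hypothetical simplification of CONCH):
# monitor one node plus a spanning set of edges; suppressed reports
# mean "unchanged", and the base station chains known values across
# edge constraints to derive the rest.

monitored_node = "a"
monitored_edges = [("a", "b"), ("b", "c")]  # spanning tree of constraints

# Base station's last-known state.
known_value = {"a": 10.0}
known_diff = {("a", "b"): 2.0, ("b", "c"): -1.0}  # value[v] - value[u]

def apply_reports(node_reports, edge_reports):
    """Reports carry only *changed* constraints; silence = no change."""
    known_value.update(node_reports)
    known_diff.update(edge_reports)

def derive_all():
    """Global inference: chain values along the edge constraints."""
    values = dict(known_value)
    for (u, v), d in known_diff.items():
        values[v] = values[u] + d
    return values

print(derive_all())                   # {'a': 10.0, 'b': 12.0, 'c': 11.0}
apply_reports({}, {("a", "b"): 3.0})  # only edge (a,b)'s difference changed
print(derive_all())                   # {'a': 10.0, 'b': 13.0, 'c': 12.0}
```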

175 citations

Book
16 Nov 2012
TL;DR: This monograph provides an accessible introduction and reference to materialized views, explains their core ideas, highlights their recent developments, and points out their sometimes subtle connections to other research topics in databases.
Abstract: Materialized views are a natural embodiment of the ideas of precomputation and caching in databases. Instead of computing a query from scratch, a system can use results that have already been computed, stored, and kept in sync with database updates. The ability of materialized views to speed up queries benefits most database applications, ranging from traditional querying and reporting to web database caching, online analytical processing, and data mining. By reducing dependency on the availability of base data, materialized views have also laid much of the foundation for information integration and data warehousing systems. The database tradition of declarative querying distinguishes materialized views from generic applications of precomputation and caching in other contexts, and makes materialized views especially interesting, powerful, and challenging at the same time. Study of materialized views has generated a rich research literature and mature commercial implementations, aimed at providing efficient, effective, automated, and general solutions to the selection, use, and maintenance of materialized views. This monograph provides an accessible introduction and reference to materialized views, explains their core ideas, highlights their recent developments, and points out their sometimes subtle connections to other research topics in databases.
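
The precomputation-and-maintenance idea can be shown with a tiny incrementally maintained aggregate; this dict-based sketch stands in for a real view manager and query rewriter, and the table and view names are illustrative.

```python
# Minimal sketch of a materialized aggregate view: answer queries from
# precomputed state and keep it in sync with base-table updates via
# deltas, instead of recomputing from scratch.

from collections import defaultdict

sales = []                           # base table: (region, amount)
total_by_region = defaultdict(float) # view: SUM(amount) GROUP BY region

def insert_sale(region, amount):
    sales.append((region, amount))
    total_by_region[region] += amount  # incremental view maintenance

def delete_sale(region, amount):
    sales.remove((region, amount))
    total_by_region[region] -= amount  # apply the delta, no recomputation

insert_sale("east", 100.0)
insert_sale("east", 50.0)
insert_sale("west", 70.0)
print(total_by_region["east"])  # 150.0, answered from the view
delete_sale("east", 50.0)
print(total_by_region["east"])  # 100.0, kept in sync by the delta
```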

172 citations


Cited by
Proceedings Article
03 Jun 2002
TL;DR: This paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but rather arrives in multiple, continuous, rapid, time-varying data streams.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.
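
A continuous query such as a sliding-window average illustrates how this model departs from one-shot queries over persistent relations; the count-based window semantics below are an assumption for the sketch.

```python
# A continuous query over an unbounded stream: tuples arrive one at a
# time and the answer is maintained incrementally over a sliding
# window (a simple count-based window is assumed here).

from collections import deque

def windowed_avg(stream, window=3):
    """Emit the average of the last `window` items after each arrival."""
    buf, total = deque(), 0.0
    for x in stream:
        buf.append(x); total += x
        if len(buf) > window:
            total -= buf.popleft()   # expire the oldest tuple
        yield total / len(buf)

for avg in windowed_avg([10, 20, 60, 20, 0]):
    print(round(avg, 2))   # 10.0 15.0 30.0 33.33 26.67
```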

2,933 citations

Journal Article
TL;DR: This survey presents a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, and evaluation metrics and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

1,891 citations

Book
02 Jan 1991

1,377 citations

Proceedings Article
01 Jan 2003
TL;DR: The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams and leverages the PostgreSQL open source code base.
Abstract: Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary. The Telegraph project has developed a suite of novel technologies for continuously adaptive query processing. The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams. In this paper, we describe the system architecture and its underlying technology, and report on our ongoing implementation effort, which leverages the PostgreSQL open source code base. We also discuss open issues and our research agenda.
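
The inversion at the heart of such systems, many standing queries evaluated against each arriving tuple rather than one query run against stored tables, might be sketched as follows; the StreamEngine class and its registration API are invented for illustration and say nothing about TelegraphCQ's actual interfaces or its adaptive dataflow machinery.

```python
# Sketch of continuous queries sharing one dataflow: each arriving
# tuple is routed through every registered standing query. The API is
# invented for illustration, not TelegraphCQ's actual interface.

class StreamEngine:
    def __init__(self):
        self.queries = []   # (name, predicate, on_match) triples

    def register(self, name, predicate, on_match):
        """Install a standing query that lives across many tuples."""
        self.queries.append((name, predicate, on_match))

    def push(self, tup):
        """One pass over the shared queries per arriving tuple."""
        for name, predicate, on_match in self.queries:
            if predicate(tup):
                on_match(name, tup)

engine = StreamEngine()
engine.register("hot", lambda t: t["temp"] > 30,
                lambda q, t: print(q, "->", t))
engine.register("sensor7", lambda t: t["id"] == 7,
                lambda q, t: print(q, "->", t))

for tup in [{"id": 3, "temp": 35}, {"id": 7, "temp": 20}]:
    engine.push(tup)   # tuples flow through queries, not queries over tables
```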

1,248 citations

Journal Article
TL;DR: The paper presents the “textbook” architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems, and discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems and shows how query processing works in these systems.
Abstract: Distributed data processing is becoming a reality. Businesses want to do it for many reasons, and they often must do it in order to stay competitive. While much of the infrastructure for distributed data processing is already there (e.g., modern network technology), a number of issues make distributed data processing still a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the “textbook” architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intraquery parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems, and shows how query processing works in these systems.
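
One classic communication-reducing join technique in the distributed query processing literature is the semijoin: ship only the join-column values to the remote site, reduce the remote table there, and ship back only the matching tuples. The sites and tables below are illustrative, not taken from the paper.

```python
# Semijoin sketch: reduce communication in a distributed join by
# shipping join keys first, then only the matching remote tuples
# (site and table names are illustrative).

site1_orders = [(1, "ok"), (2, "late"), (4, "ok")]     # (cust_id, status)
site2_customers = [(1, "Ada"), (2, "Bob"), (3, "Cy")]  # (cust_id, name)

# Step 1: site 1 ships just the join keys (small) to site 2.
shipped_keys = {cid for cid, _ in site1_orders}

# Step 2: site 2 reduces its table to matching tuples and ships those back.
reduced = [(cid, name) for cid, name in site2_customers
           if cid in shipped_keys]

# Step 3: site 1 completes the join locally.
names = dict(reduced)
result = [(cid, status, names[cid]) for cid, status in site1_orders
          if cid in names]
print(result)   # [(1, 'ok', 'Ada'), (2, 'late', 'Bob')]
```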

980 citations