Ye Yuan

Other affiliations: Northeastern University, Northeastern University (China)

Bio: Ye Yuan is an academic researcher from Beijing Institute of Technology who has contributed to research in computer science and artificial intelligence. The author has an h-index of 13 and has co-authored 102 publications that have received 791 citations. Previous affiliations of Ye Yuan include Northeastern University and Northeastern University (China).

Papers published on a yearly basis (chart, 2009 to 2023)

Papers

Journal Article

The Dynamic Bloom Filters

Deke Guo¹, Jie Wu², Honghui Chen¹, Ye Yuan³, Xueshan Luo¹•Institutions (3)

National University of Defense Technology¹, Temple University², Northeastern University (China)³

01 Jan 2010-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes dynamic Bloom filters to represent dynamic sets as well as static sets, and designs the necessary item insertion, membership query, item deletion, and filter union algorithms.

Abstract: A Bloom filter is an effective, space-efficient data structure for concisely representing a set and supporting approximate membership queries. Traditionally, the Bloom filter and its variants focus only on how to represent a static set and how to decrease the false positive probability to a sufficiently low level. By investigating mainstream applications based on the Bloom filter, we reveal that dynamic data sets are more common and important than static sets. However, existing variants of the Bloom filter cannot support dynamic data sets well. To address this issue, we propose dynamic Bloom filters to represent dynamic sets as well as static sets, and we design the necessary item insertion, membership query, item deletion, and filter union algorithms. The dynamic Bloom filter can keep the false positive probability at a low level by expanding its capacity as the set cardinality increases. Through comprehensive mathematical analysis, we show that the dynamic Bloom filter uses less expected memory than the Bloom filter when representing dynamic sets with an upper bound on set cardinality, and that it is more stable than the Bloom filter, owing to infrequent reconstruction, when handling dynamic sets without an upper bound on set cardinality. Moreover, the analysis results hold for stand-alone as well as distributed applications.
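
A minimal Python sketch of the idea in the abstract, for illustration only: the set is held in a growing list of fixed-capacity component filters, new items go into the newest component, and a new component is appended once the newest one holds `capacity` items. The class names, the `capacity`/`m`/`k` parameters, the SHA-256 double hashing, the use of counting filters, the "delete only when exactly one component matches" rule, and the concatenating union are all assumptions of this sketch, not the paper's published algorithms.

```python
import copy
import hashlib


class CountingBloomFilter:
    """Bloom filter with counters instead of bits, so items can be removed."""

    def __init__(self, m, k):
        self.m = m              # number of counters
        self.k = k              # number of hash functions
        self.counts = [0] * m
        self.n = 0              # items currently stored in this component

    def _indices(self, item):
        # Assumed hashing scheme: double hashing g_i(x) = h1(x) + i*h2(x) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indices(item):
            self.counts[idx] += 1
        self.n += 1

    def __contains__(self, item):
        return all(self.counts[idx] > 0 for idx in self._indices(item))

    def remove(self, item):
        # Caller must only remove items that were actually inserted here;
        # removing a false positive would corrupt the counters.
        for idx in self._indices(item):
            self.counts[idx] -= 1
        self.n -= 1


class DynamicBloomFilter:
    def __init__(self, capacity, m, k):
        self.capacity, self.m, self.k = capacity, m, k
        self.filters = [CountingBloomFilter(m, k)]

    def insert(self, item):
        active = self.filters[-1]
        if active.n >= self.capacity:
            # Grow instead of overfilling, so each component's error rate stays bounded.
            active = CountingBloomFilter(self.m, self.k)
            self.filters.append(active)
        active.add(item)

    def query(self, item):
        # Member if any component reports it (no false negatives).
        return any(item in f for f in self.filters)

    def delete(self, item):
        # Assumed rule: delete only when exactly one component claims the item;
        # otherwise deleting could cause false negatives for other items.
        holders = [f for f in self.filters if item in f]
        if len(holders) != 1:
            return False
        holders[0].remove(item)
        return True

    def union(self, other):
        # Simplest possible union: concatenate the components of both operands.
        if (self.capacity, self.m, self.k) != (other.capacity, other.m, other.k):
            raise ValueError("union requires identical filter parameters")
        merged = DynamicBloomFilter(self.capacity, self.m, self.k)
        merged.filters = copy.deepcopy(self.filters) + copy.deepcopy(other.filters)
        return merged
```

With s components whose individual false positive rates are f_1 through f_s, a query is a false positive if any component reports one, so the overall rate is roughly 1 - (1 - f_1)(1 - f_2)...(1 - f_s); capping each component at `capacity` items is what keeps every f_i bounded as the set grows.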

181 citations

Journal Article

An OS-ELM based distributed ensemble classification framework in P2P networks

Yongjiao Sun¹, Ye Yuan¹, Guoren Wang¹•Institutions (1)

Northeastern University (China)¹

01 Sep 2011-Neurocomputing

TL;DR: An OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network is proposed, together with a data space coverage based peer selection approach that reduces the high communication cost and large delay.

79 citations

Book Chapter

Efficiently answering probability threshold-based shortest path queries over uncertain graphs

Ye Yuan¹, Lei Chen², Guoren Wang¹•Institutions (2)

Northeastern University (China)¹, Hong Kong University of Science and Technology²

01 Apr 2010

TL;DR: A new SP definition based on the possible world semantics that has been widely adopted for probabilistic data management is proposed, and efficient methods for answering threshold-based SP queries over an uncertain graph are developed.

Abstract: Efficiently processing shortest path (SP) queries over stochastic networks has attracted considerable research attention, as such queries are very popular in emerging real-world applications, such as Intelligent Transportation Systems and communication networks, whose edge weights can be modeled as random variables. Some previous works aim at finding the most likely SP (the path with the largest probability of being the SP), and others search for the least-expected-weight path. In all these works, the definitions of the shortest path query are based on simple probabilistic models that can be converted into multi-objective optimization problems on a weighted graph. However, these simple definitions miss important information about the internal structure of the probabilistic paths and the interplay among all the uncertain paths. Thus, in this paper, we propose a new SP definition based on the possible world semantics that has been widely adopted for probabilistic data management, and we develop efficient methods to answer threshold-based SP queries over an uncertain graph. Extensive experiments on real data sets verified the effectiveness of the proposed methods.
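
To make "possible world semantics" concrete, the sketch below samples possible worlds of an uncertain graph, runs Dijkstra in each, counts how often each path is the shortest path, and keeps the paths whose estimated probability meets a threshold. It assumes a simplified, hypothetical model in which each directed edge has a fixed weight and exists independently with a given probability (the abstract speaks of random edge weights instead). This naive Monte Carlo baseline is a stand-in for illustration, not the efficient methods the paper develops; `threshold_sp` and the edge-tuple format are hypothetical names.

```python
import heapq
import random
from collections import Counter


def shortest_path(adj, source, target):
    """Dijkstra on one sampled world; returns the path as a tuple, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None  # target unreachable in this world
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return tuple(reversed(path))


def threshold_sp(edges, source, target, threshold, samples=2000, seed=0):
    """edges: list of hypothetical (u, v, weight, existence_probability) tuples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        # Sample one possible world: keep each edge independently with probability p.
        adj = {}
        for u, v, w, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append((v, w))
        path = shortest_path(adj, source, target)
        if path is not None:
            counts[path] += 1
    estimates = {path: c / samples for path, c in counts.items()}
    return {path: pr for path, pr in estimates.items() if pr >= threshold}
```

The number of possible worlds is exponential in the number of uncertain edges, which is why exact evaluation by enumeration is impractical and why efficient pruning, as pursued in the paper, matters.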

59 citations

Journal Article

Time-Dependent Graphs: Definitions, Applications, and Algorithms

Yishu Wang¹, Ye Yuan¹, Yuliang Ma¹, Guoren Wang²•Institutions (2)

Northeastern University (China)¹, Beijing Institute of Technology²

25 Sep 2019-Data Science and Engineering

TL;DR: The definition and topological structure of time-dependent graphs, as well as models for their relationship to dynamic systems, are discussed, and some classic problems on time-dependent graphs are reviewed, e.g., route planning, social analysis, and subgraph problems (including matching and mining).

Abstract: A time-dependent graph is, informally speaking, a graph whose structure changes dynamically over time. In such graphs, the weights associated with edges change over time; that is, the edges are activated by sequences of time-dependent elements. Many real-life scenarios, such as bioinformatics networks, transportation networks, and social networks, can be better modeled by time-dependent graphs. In particular, the time-dependent graph is a very broad concept, which is reflected in related research under many names, including temporal graphs, evolving graphs, time-varying graphs, and historical graphs. Although static graphs have been studied extensively, we are still far from a complete and mature theory of models and algorithms for their time-dependent generalizations. In this paper, we discuss the definition and topological structure of time-dependent graphs, as well as models for their relationship to dynamic systems. In addition, we review some classic problems on time-dependent graphs, e.g., route planning, social analysis, and subgraph problems (including matching and mining). We also introduce existing time-dependent systems and summarize their advantages and limitations. We try to keep the descriptions as consistent as possible, and we hope the survey helps practitioners understand existing time-dependent techniques.
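
Route planning is one of the classic problems mentioned above. As a hedged illustration (a standard technique, not code from the survey), the sketch below computes earliest arrival times from a source in a single pass over temporal edges sorted by departure time. The edge tuple `(u, v, t, duration)`, meaning "leave u at time t and reach v at t + duration", and the function name are assumptions; the single pass is correct only when every duration is positive.

```python
import math


def earliest_arrival(edges, source, start_time):
    """edges: iterable of (u, v, t, duration) with duration > 0 (assumed)."""
    arrival = {source: start_time}
    # Processing edges in departure order guarantees that any arrival which
    # could enable an edge departing at time t has already been recorded.
    for u, v, t, duration in sorted(edges, key=lambda e: e[2]):
        if t >= arrival.get(u, math.inf) and t + duration < arrival.get(v, math.inf):
            arrival[v] = t + duration
    return arrival
```

For example, with edges `("a","b",1,2)`, `("b","c",2,1)`, `("b","c",4,1)`, and `("a","c",0,10)`, the route a, b, c is best, but only via the later b-to-c departure at time 4, because the traveller reaches b at time 3; a static shortest-path computation that ignores departure times would wrongly use the departure at time 2.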

57 citations

Journal Article

Efficient Keyword Search on Uncertain Graph Data

Ye Yuan¹, Guoren Wang¹, Lei Chen², Haixun Wang³•Institutions (3)

Northeastern University (China)¹, Hong Kong University of Science and Technology², Microsoft³

01 Dec 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A filtering-and-verification strategy based on a probabilistic keyword index, PKIndex, which computes path-based top-k probabilities offline and attaches these values to PKIndex in an optimal, compressed way, is used to improve search efficiency.

Abstract: As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, there has so far been no work on keyword search over uncertain graph data, even though uncertain graphs have been widely used in many real applications, such as modeling road networks, influence detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following the answer definition used for keyword search over deterministic graphs, we consider a subtree in the uncertain graph an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem, as stated in [1], [2], [3]. Because of the uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we compute path-based top-k probabilities offline and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based, and tree-based probabilistic pruning, which filters out most false subtrees. In the verification phase, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.

48 citations

Cited by

Journal Article

Machine learning

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Social Network Analysis

Tom A. B. Snijders

01 Jan 2012

3,692 citations

Journal Article

Extreme learning machines: a survey

Guang-Bin Huang¹, Dianhui Wang², Yuan Lan¹•Institutions (2)

Nanyang Technological University¹, La Trobe University²

25 May 2011-International Journal of Machine Learning and Cybernetics

TL;DR: A survey of extreme learning machine (ELM) and its variants, especially (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensemble of ELM.

Abstract: Computational intelligence techniques have been used in a wide range of applications. Among the numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have played the dominant roles. However, it is known that both neural networks and SVMs face some challenging issues, such as (1) slow learning speed, (2) extensive human intervention, and/or (3) poor computational scalability. Extreme learning machine (ELM), an emerging technology that overcomes some of the challenges faced by other techniques, has recently attracted attention from more and more researchers. ELM works for generalized single-hidden layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with the least human intervention. This paper gives a survey of ELM and its variants, especially (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensemble of ELM.
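
The statement that the hidden layer "need not be tuned" can be shown in a few lines: hidden weights and biases are drawn once at random and left fixed, and only the output weights beta are fitted in closed form. The pure-Python sketch below assumes sigmoid hidden units, uniform(-4, 4) initialization, and a small ridge term, solving (H^T H + lambda I) beta = H^T y by Gaussian elimination. The basic ELM formulation uses the Moore-Penrose pseudoinverse of H instead; the ridge system is a swapped-in, numerically safer variant, and the class and function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a x = b (a is n x n) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    def __init__(self, n_inputs, n_hidden, ridge=1e-4, seed=0):
        rng = random.Random(seed)
        # Random hidden layer: drawn once, never tuned afterwards.
        self.w = [[rng.uniform(-4, 4) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-4, 4) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x):
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]  # hidden-layer output matrix H
        n_hidden = len(self.b)
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(n_hidden)] for i in range(n_hidden)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
        self.beta = solve(hth, hty)  # only the output weights are learned
        return self

    def predict(self, x):
        return sum(hv * bv for hv, bv in zip(self._hidden(x), self.beta))
```

Because training reduces to one linear solve, there is no iterative backpropagation, which is the source of the speed advantage the survey describes.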

1,767 citations

Journal Article

Trends in extreme learning machines

Gao Huang¹, Guang-Bin Huang², Shiji Song¹, Keyou You¹•Institutions (2)

Tsinghua University¹, Nanyang Technological University²

01 Jan 2015-Neural Networks

TL;DR: In this paper, the authors report the current state of the theoretical research and practical advances on this subject and provide a comprehensive view of these advances in ELM together with its future perspectives.

1,289 citations

Journal Article

Theory and Practice of Bloom Filters for Distributed Systems

Sasu Tarkoma¹, Christian Esteve Rothenberg², Eemil Lagerspetz¹•Institutions (2)

Helsinki Institute for Information Technology¹, State University of Campinas²

21 Jan 2012-IEEE Communications Surveys and Tutorials

TL;DR: An overview of the basic and advanced probabilistic techniques is given, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

Abstract: Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

480 citations
