scispace - formally typeset
Search or ask a question
Author

Ye Yuan

Bio: Ye Yuan is an academic researcher from Beijing Institute of Technology. The author has contributed to research in topics: Computer science & Artificial intelligence. The author has an hindex of 13, co-authored 102 publications receiving 791 citations. Previous affiliations of Ye Yuan include Northeastern University & Northeastern University (China).


Papers
More filters
Journal ArticleDOI
TL;DR: This work proposes dynamic Bloom filters to represent dynamic sets, as well as static sets and design necessary item insertion, membership query, item deletion, and filter union algorithms.
Abstract: A Bloom filter is an effective, space-efficient data structure for concisely representing a set, and supporting approximate membership queries. Traditionally, the Bloom filter and its variants just focus on how to represent a static set and decrease the false positive probability to a sufficiently low level. By investigating mainstream applications based on the Bloom filter, we reveal that dynamic data sets are more common and important than static sets. However, existing variants of the Bloom filter cannot support dynamic data sets well. To address this issue, we propose dynamic Bloom filters to represent dynamic sets, as well as static sets and design necessary item insertion, membership query, item deletion, and filter union algorithms. The dynamic Bloom filter can control the false positive probability at a low level by expanding its capacity as the set cardinality increases. Through comprehensive mathematical analysis, we show that the dynamic Bloom filter uses less expected memory than the Bloom filter when representing dynamic sets with an upper bound on set cardinality, and also that the dynamic Bloom filter is more stable than the Bloom filter due to infrequent reconstruction when addressing dynamic sets without an upper bound on set cardinality. Moreover, the analysis results hold in stand-alone applications, as well as distributed applications.

181 citations

Journal ArticleDOI
TL;DR: An OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network is proposed and a data space coverage based peer selection approach is proposed to reduce high the communication cost and large delay.

79 citations

Book ChapterDOI
01 Apr 2010
TL;DR: A new SP definition based on the possible world semantics that has been widely adopted for probabilistic data management is proposed, and efficient methods to find threshold-based SP path queries over an uncertain graph are developed.
Abstract: Efficiently processing shortest path (SP) queries over stochastic networks attracted a lot of research attention as such queries are very popular in the emerging real world applications such as Intelligent Transportation Systems and communication networks whose edge weights can be modeled as a random variable. Some pervious works aim at finding the most likely SP (the path with largest probability to be SP), and others search the least-expected-weight path. In all these works, the definitions of the shortest path query are based on simple probabilistic models which can be converted into the multi-objective optimal issues on a weighted graph. However, these simple definitions miss important information about the internal structure of the probabilistic paths and the interplay among all the uncertain paths. Thus, in this paper, we propose a new SP definition based on the possible world semantics that has been widely adopted for probabilistic data management, and develop efficient methods to find threshold-based SP path queries over an uncertain graph. Extensive experiments based on real data sets verified the effectiveness of the proposed methods.

59 citations

Journal ArticleDOI
TL;DR: The definition and topological structure of time-dependent graphs, as well as models for their relationship to dynamic systems, are discussed and some classic problems on time- dependent graphs are reviewed, e.g., route planning, social analysis, and subgraph problem (including matching and mining).
Abstract: A time-dependent graph is, informally speaking, a graph structure dynamically changes with time. In such graphs, the weights associated with edges dynamically change over time, that is, the edges in such graphs are activated by sequences of time-dependent elements. Many real-life scenarios can be better modeled by time-dependent graphs, such as bioinformatics networks, transportation networks, and social networks. In particular, the time-dependent graph is a very broad concept, which is reflected in the related research with many names, including temporal graphs, evolving graphs, time-varying graphs, historical graphs, and so on. Though static graphs have been extensively studied, for their time-dependent generalizations, we are still far from a complete and mature theory of models and algorithms. In this paper, we discuss the definition and topological structure of time-dependent graphs, as well as models for their relationship to dynamic systems. In addition, we review some classic problems on time-dependent graphs, e.g., route planning, social analysis, and subgraph problem (including matching and mining). We also introduce existing time-dependent systems and summarize their advantages and limitations. We try to keep the descriptions consistent as much as possible and we hope the survey can help practitioners to understand existing time-dependent techniques.

57 citations

Journal ArticleDOI
TL;DR: A filtering-and-verification strategy based on a probabilistic keyword index, PKIndex, which offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way to improve the search efficiency.
Abstract: As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far, there is no work on keyword search over uncertain graph data even though the uncertain graphs have been widely used in many real applications, such as modeling road networks, influential detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following the similar answer definition for keyword search over deterministic graphs, we consider a subtree in the uncertain graph as an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem as stated in [1], [2], [3]. Due to the existence of uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve the search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based and tree-based probabilistic pruning phases, which filter out most false subtrees. In the verification, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.

48 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Jan 2012

3,692 citations

Journal ArticleDOI
TL;DR: A survey on Extreme learning machine (ELM) and its variants, especially on (1) batch learning mode of ELM, (2) fully complex ELm, (3) online sequential ELM; and (4) incremental ELM and (5) ensemble ofELM.
Abstract: Computational intelligence techniques have been used in wide applications. Out of numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have been playing the dominant roles. However, it is known that both neural networks and SVMs face some challenging issues such as: (1) slow learning speed, (2) trivial human intervene, and/or (3) poor computational scalability. Extreme learning machine (ELM) as emergent technology which overcomes some challenges faced by other techniques has recently attracted the attention from more and more researchers. ELM works for generalized single-hidden layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with those traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with least human intervene. This paper gives a survey on ELM and its variants, especially on (1) batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensemble of ELM.

1,767 citations

Journal ArticleDOI
TL;DR: In this paper, the authors report the current state of the theoretical research and practical advances on this subject and provide a comprehensive view of these advances in ELM together with its future perspectives.

1,289 citations

Journal ArticleDOI
TL;DR: An overview of the basic and advanced probabilistic techniques is given, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.
Abstract: Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

480 citations