
Showing papers in "Knowledge and Information Systems in 2013"


Journal ArticleDOI
TL;DR: Several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features.
Abstract: With the advent of high dimensionality, adequate identification of relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt and different methods have been developed. However, with such a vast body of algorithms available, choosing the adequate feature selection method is not an easy question to resolve, and it is necessary to check their effectiveness in different situations. Nevertheless, the assessment of relevant features is difficult in real datasets, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets, tested by four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.

637 citations


Journal ArticleDOI
TL;DR: The literature is surveyed to highlight recent advances in transfer learning for activity recognition, and existing approaches to transfer-based activity recognition are characterized by sensor modality, by differences between source and target environments, by data availability, and by type of information that is transferred.
Abstract: Many intelligent systems that focus on the needs of a human require information about the activities being performed by the human. At the core of this capability is activity recognition, which is a challenging and well-researched problem. Activity recognition algorithms require substantial amounts of labeled training data yet need to perform well under very diverse circumstances. As a result, researchers have been designing methods to identify and utilize subtle connections between activity recognition datasets, or to perform transfer-based activity recognition. In this paper, we survey the literature to highlight recent advances in transfer learning for activity recognition. We characterize existing approaches to transfer-based activity recognition by sensor modality, by differences between source and target environments, by data availability, and by type of information that is transferred. Finally, we present some grand challenges for the community to consider as this field is further developed.

395 citations


Journal ArticleDOI
TL;DR: This survey intends to provide a high-level summarization for active learning and to motivate interested readers to consider instance-selection approaches for designing effective active learning solutions.
Abstract: Active learning aims to train an accurate prediction model with minimum cost by labeling the most informative instances. In this paper, we survey existing works on active learning from an instance-selection perspective and classify them into two categories with a progressive relationship: (1) active learning merely based on uncertainty of independent and identically distributed (IID) instances, and (2) active learning by further taking into account instance correlations. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/weaknesses, followed by a simple runtime performance comparison, and discussion about emerging active learning applications and instance-selection challenges therein. This survey intends to provide a high-level summarization for active learning and to motivate interested readers to consider instance-selection approaches for designing effective active learning solutions.

302 citations
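To make the first category concrete, the sketch below is a minimal, assumed illustration of uncertainty sampling over IID instances (not code from the survey): a scikit-learn classifier repeatedly queries the pool instance with the lowest top-class probability.

```python
# Minimal uncertainty-sampling sketch (illustrative, not from the survey).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                          # 20 querying rounds
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    confidence = proba.max(axis=1)           # low top-class probability
    query = pool[int(np.argmin(confidence))] # = most informative instance
    labeled.append(query)                    # the oracle labels it
    pool.remove(query)

print("accuracy after querying:", model.score(X, y))
```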


Journal ArticleDOI
TL;DR: Novel topic-aware influence-driven propagation models that are more accurate in describing real-world cascades than the standard (i.e., topic-blind) propagation models studied in the literature are introduced.
Abstract: The study of influence-driven propagations in social networks and its exploitation for viral marketing purposes has recently received a great deal of attention. However, regardless of the fact that users' authoritativeness, expertise, trust and influence are evidently topic-dependent, the research on social influence has surprisingly largely overlooked this aspect. In this article, we study social influence from a topic modeling perspective. We introduce novel topic-aware influence-driven propagation models that, as we show in our experiments, are more accurate in describing real-world cascades than the standard (i.e., topic-blind) propagation models studied in the literature. In particular, we first propose simple topic-aware extensions of the well-known Independent Cascade and Linear Threshold models. However, these propagation models have a very large number of parameters, which could lead to overfitting. Therefore, we propose a different approach explicitly modeling authoritativeness, influence and relevance under a topic-aware perspective. Instead of considering user-to-user influence, the proposed model focuses on user authoritativeness and interests in a topic, leading to a drastic reduction in the number of parameters of the model. We devise methods to learn the parameters of the models from a data set of past propagations. Our experimentation confirms the high accuracy of the proposed models and learning schemes.

257 citations
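To illustrate the topic-aware idea, here is a minimal sketch, under assumed data structures, of a topic-aware Independent Cascade step: each edge carries per-topic influence probabilities, mixed by the propagating item's topic distribution. The graph, probabilities, and mixture are invented for illustration.

```python
# Sketch of a topic-aware Independent Cascade (illustrative only).
import random

# Per-topic influence probabilities on each edge: p[(u, v)][z].
p = {("a", "b"): [0.8, 0.1], ("b", "c"): [0.2, 0.7], ("a", "c"): [0.1, 0.1]}
gamma = [0.3, 0.7]  # topic mixture of the item being propagated (assumed)

def edge_prob(u, v):
    # Topic-blind IC uses a single p_uv; the topic-aware variant mixes
    # per-topic probabilities with the item's topic distribution.
    return sum(g * pz for g, pz in zip(gamma, p[(u, v)]))

def cascade(seeds):
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for (s, t) in p:
            if s == u and t not in active and random.random() < edge_prob(s, t):
                active.add(t)
                frontier.append(t)
    return active

random.seed(0)
print(cascade({"a"}))
```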


Journal ArticleDOI
TL;DR: In this article, an unsupervised outlier detection approach for wireless sensor networks is proposed, which is flexible with respect to the outlier definition and uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms.
Abstract: To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy consumption, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance by simulation, using real sensor data streams. Our results demonstrate that our approach is accurate and imposes reasonable communication and power consumption demands.

217 citations


Journal ArticleDOI
TL;DR: A new SVDD-based approach to detect outliers on uncertain data that outperforms state-of-the-art outlier detection techniques and reduces the contribution of the examples with the lowest confidence scores to the construction of the decision boundary.
Abstract: Outlier detection is an important problem that has been studied within diverse research areas and application domains. Most existing methods are based on the assumption that an example can be exactly categorized as either a normal class or an outlier. However, in many real-life applications, data are uncertain in nature due to various errors or partial completeness. This data uncertainty makes the detection of outliers far more difficult than it is from clearly separable data. The key challenge of handling uncertain data in outlier detection is how to reduce the impact of uncertain data on the learned distinctive classifier. This paper proposes a new SVDD-based approach to detect outliers on uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is generated by assigning a confidence score to each input example, which indicates the likelihood of an example tending toward the normal class. In the second step, the generated confidence score is incorporated into the support vector data description training phase to construct a global decision boundary that reduces the contribution of the least confident examples.

127 citations
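A hedged sketch of the two-step idea follows. scikit-learn's OneClassSVM, a close relative of SVDD, stands in for the paper's model because its fit method accepts per-example sample weights; the k-nearest-neighbor confidence heuristic is an assumption, not the paper's scoring formula.

```python
# Two-step sketch: confidence scoring, then weighted one-class training.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # mostly normal points
               rng.uniform(-6, 6, (10, 2))])  # a few uncertain/outlying ones

# Step 1: pseudo-training set - score each example by how tightly packed
# its neighborhood is (closer neighbors => more likely a normal example).
dists, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
conf = 1.0 / (1.0 + dists[:, 1:].mean(axis=1))

# Step 2: low-confidence examples contribute less to the decision boundary.
model = OneClassSVM(gamma=0.2, nu=0.1).fit(X, sample_weight=conf)
print("flagged outliers:", int((model.predict(X) == -1).sum()))
```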


Journal ArticleDOI
TL;DR: Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap, and empirical evidence supports the usage of co-citation algorithms—SimRank showing the highest plausibility—to compute concept similarity in a crowdsourced semantic network.
Abstract: In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called 'tags') describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often results in noisy and ambiguous data, limiting its usability for analysis in information retrieval, recommender systems and data mining. Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap. The contribution of this paper is twofold. It consists of (1) the development of the OSM Semantic Network by means of a web crawler tailored to the OSM Wiki website; this semantic network can be used to compute semantic similarity through co-citation measures, providing a novel semantic tool for OSM and GIS communities; and (2) a study of the cognitive plausibility (i.e., the ability to replicate human judgement) of co-citation algorithms when applied to the computation of semantic similarity of geographic concepts. Empirical evidence supports the usage of co-citation algorithms—SimRank showing the highest plausibility—to compute concept similarity in a crowdsourced semantic network.

126 citations
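As an illustration of the co-citation approach, the sketch below runs NetworkX's SimRank implementation over a toy stand-in for the OSM Semantic Network; the nodes and wiki links are invented, and the real network would come from the paper's crawler.

```python
# SimRank over a toy stand-in for the OSM Semantic Network (illustrative).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Map_Features", "Key:highway"),
    ("Map_Features", "Key:building"),
    ("Key:highway", "highway=footway"),
    ("Key:highway", "highway=cycleway"),
    ("Key:building", "building=church"),
])

# Two classes are similar if they are linked from similar pages
# (the co-citation intuition behind SimRank).
sim = nx.simrank_similarity(G, importance_factor=0.8)
print(sim["highway=footway"]["highway=cycleway"])  # high: shared in-link
print(sim["highway=footway"]["building=church"])   # lower
```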


Journal ArticleDOI
TL;DR: This paper presents and analyzes several typical uncertain queries, such as skyline queries, top-k queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data, and summarizes the main features of uncertain queries.
Abstract: Uncertain data now exist widely in many practical applications, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data, as an important aspect of uncertain data management, has received increasing attention in the database field. Uncertain query processing poses inherent challenges and demands non-traditional techniques, due to the data uncertainty. This paper surveys this interesting and still evolving research area in the current database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize the main features of uncertain queries. In particular, we present and analyze several typical uncertain queries in detail, such as skyline queries, top-k queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.

114 citations


Journal ArticleDOI
TL;DR: A travel route recommendation method that makes use of the photographers’ histories as held by social photo-sharing sites and outputs a set of personalized travel plans that match the user’s preference, present location, spare time and transportation means.
Abstract: We propose a travel route recommendation method that makes use of the photographers' histories as held by social photo-sharing sites. Assuming that the collection of each photographer's geotagged photos is a sequence of visited locations, photo-sharing sites are important sources for gathering the location histories of tourists. By following their location sequences, we can find representative and diverse travel routes that link key landmarks. Recommendations are performed by our photographer behavior model, which estimates the probability of a photographer visiting a landmark. We incorporate user preference and present location information into the probabilistic behavior model by combining topic models and Markov models. Based on the photographer behavior model, the proposed route recommendation method outputs a set of personalized travel plans that match the user's preference, present location, spare time and transportation means. We demonstrate the effectiveness of the proposed method using an actual large-scale geotag dataset held by Flickr in terms of the prediction accuracy of travel behavior.

111 citations
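The Markov component of such a behavior model is easy to sketch; the toy visit sequences below stand in for geotagged photo streams, and the topic-model side, spare-time and transportation constraints are omitted.

```python
# First-order Markov model of landmark transitions (Markov component only).
from collections import Counter, defaultdict

trips = [  # toy visit sequences extracted from geotagged photos
    ["tower", "museum", "park"],
    ["tower", "park", "cathedral"],
    ["museum", "park", "cathedral"],
]

counts = defaultdict(Counter)
for seq in trips:
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1  # count observed landmark-to-landmark moves

def next_landmark_probs(here):
    total = sum(counts[here].values())
    return {b: c / total for b, c in counts[here].items()}

print(next_landmark_probs("park"))  # {'cathedral': 1.0} on this toy data
```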


Journal ArticleDOI
TL;DR: The refined notion of conditional non-discrimination in classifier design is introduced and it is shown that some of the differences in decisions across the sensitive groups can be explainable and are hence tolerable.
Abstract: Recently, the following discrimination-aware classification problem was introduced. Historical data used for supervised learning may contain discrimination, for instance, with respect to gender. The question addressed by discrimination-aware techniques is, given a sensitive attribute, how to train classifiers on such historical data so that they are discrimination-free with respect to that sensitive attribute. Existing techniques that deal with this problem aim at removing all discrimination and do not take into account that part of the discrimination may be explainable by other attributes. For example, in a job application, the education level of a job candidate could be such an explainable attribute. If the data contain many highly educated male candidates and only a few highly educated women, a difference in acceptance rates between women and men does not necessarily reflect gender discrimination, as it could be explained by the different levels of education. Even though selecting on education level would result in more males being accepted, a difference with respect to such a criterion would not be considered to be undesirable, nor illegal. Current state-of-the-art techniques, however, do not take such gender-neutral explanations into account and tend to overreact and actually start reverse discriminating, as we will show in this paper. Therefore, we introduce and analyze the refined notion of conditional non-discrimination in classifier design. We show that some of the differences in decisions across the sensitive groups can be explainable and are hence tolerable. We therefore develop methodology for quantifying the explainable discrimination and algorithmic techniques for removing the illegal discrimination when one or more attributes are considered as explanatory. Experimental evaluation on synthetic and real-world classification datasets demonstrates that the new techniques are superior to the old ones in this new context, as they succeed in removing almost exclusively the undesirable discrimination, while leaving the explainable differences unchanged, allowing for differences in decisions as long as they are explainable.

104 citations
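The split between explainable and illegal discrimination can be illustrated with a toy calculation in the spirit of the paper (the exact decomposition below is an assumption, not the authors' formula): compare the overall acceptance-rate gap with the gap expected if acceptance depended on the explanatory attribute alone.

```python
# Toy decomposition: overall gap vs. gap explainable by education.
import pandas as pd

df = pd.DataFrame({
    "gender":   ["m"] * 6 + ["f"] * 6,
    "educ":     ["hi", "hi", "hi", "hi", "lo", "lo",
                 "hi", "hi", "lo", "lo", "lo", "lo"],
    "accepted": [1, 1, 1, 0, 0, 0,
                 1, 0, 0, 0, 0, 0],
})

overall = (df[df.gender == "m"].accepted.mean()
           - df[df.gender == "f"].accepted.mean())

# Explainable part: difference expected if acceptance depended on
# education alone, weighted by each group's education distribution.
rate_by_educ = df.groupby("educ").accepted.mean()
explainable = sum(
    (df[df.gender == "m"].educ.value_counts(normalize=True).get(e, 0)
     - df[df.gender == "f"].educ.value_counts(normalize=True).get(e, 0))
    * rate_by_educ[e]
    for e in rate_by_educ.index
)
print(f"overall={overall:.2f} explainable={explainable:.2f} "
      f"illegal={overall - explainable:.2f}")
```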


Journal ArticleDOI
TL;DR: A cube model is designed to explicitly describe the relationship among providers, consumers and Web services, along with a Standard Deviation based Hybrid Collaborative Filtering (SD-HCF) for Web Service Recommendation (WSRec) and an Inverse consumer Frequency based User Collaborative Filtering (IF-UCF), whose comparison against plain UCF indicates the effectiveness of adding inverse consumer frequency.
Abstract: Web service recommendation has become a hot yet fundamental research topic in service computing. The most popular technique is Collaborative Filtering (CF) based on a user-item matrix. However, it cannot well capture the relationship between Web services and providers. To address this issue, we first design a cube model to explicitly describe the relationship among providers, consumers and Web services. We then present a Standard Deviation based Hybrid Collaborative Filtering (SD-HCF) for Web Service Recommendation (WSRec) and an Inverse consumer Frequency based User Collaborative Filtering (IF-UCF) for Potential Consumers Recommendation (PCRec). Finally, the decision-making process of bidirectional recommendation is provided for both providers and consumers. Sets of experiments are conducted on real-world data provided by PlanetLab. In the experiment phase, we show how the parameters of SD-HCF impact the prediction quality and demonstrate that SD-HCF achieves much better recommendation quality than existing methods, including user-based CF, item-based CF and general HCF. Experimental comparison between IF-UCF and UCF indicates the effectiveness of adding inverse consumer frequency to UCF.

Journal ArticleDOI
TL;DR: This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner and presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far.
Abstract: Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.
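A naive sketch of the greedy criterion follows: at each step, add the feature whose inclusion most reduces the least-squares reconstruction error of the full data matrix. The paper's efficient recursive update for this error is not reproduced; this version simply recomputes it from scratch.

```python
# Greedy unsupervised feature selection by reconstruction error (naive).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))  # rank-3 data

def recon_error(A, cols):
    S = A[:, cols]                                # selected feature columns
    coef, *_ = np.linalg.lstsq(S, A, rcond=None)  # project A onto span(S)
    return np.linalg.norm(A - S @ coef) ** 2

selected = []
for _ in range(3):
    rest = [j for j in range(A.shape[1]) if j not in selected]
    best = min(rest, key=lambda j: recon_error(A, selected + [j]))
    selected.append(best)

print("selected:", selected, "error:", round(recon_error(A, selected), 8))
```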

Journal ArticleDOI
TL;DR: The formalization of a semantic-enriched KDD process for supporting meaningful pattern interpretations of human behavior, based on the integration of inductive reasoning and deductive reasoning, is described.
Abstract: The widespread use of mobile devices is producing a huge amount of trajectory data, making the discovery of movement patterns possible, which are crucial for understanding human behavior. Significant advances have been made with regard to knowledge discovery, but the process now needs to be extended bearing in mind the emerging field of behavior informatics. This paper describes the formalization of a semantic-enriched KDD process for supporting meaningful pattern interpretations of human behavior. Our approach is based on the integration of inductive reasoning (movement pattern discovery) and deductive reasoning (human behavior inference). We describe the implemented Athena system, which supports such a process, along with the experimental results on two different application domains related to traffic and recreation management.

Journal ArticleDOI
TL;DR: UMCourt is described, a project built around two sub-fields of AI research: Multi-agent Systems and Case-Based Reasoning, aimed at fostering the development of tools for ODR, to develop autonomous tools that can increase the effectiveness of the dispute resolution processes.
Abstract: The growing use of Information Technology in the commercial arena leads to an urgent need to find alternatives to traditional dispute resolution. New tools from fields such as artificial intelligence (AI) should be considered in the process of developing novel online dispute resolution (ODR) platforms, in order to make the litigation process simpler and faster, and suited to the new virtual environments. In this work, we describe UMCourt, a project built around two sub-fields of AI research: Multi-agent Systems and Case-Based Reasoning, aimed at fostering the development of tools for ODR. This is then used to accomplish several objectives, from suggesting solutions to new disputes based on the observation of past similar disputes, to the improvement of the negotiation and mediation processes that may follow. The main objective of this work is to develop autonomous tools that can increase the effectiveness of the dispute resolution processes, namely by increasing the amount of meaningful information that is available for the parties.

Journal ArticleDOI
TL;DR: This paper defines a novel D-core framework and devises a wealth of novel metrics used to evaluate the collaboration features of directed graphs, extending the classic graph-theoretic notion of k-cores for undirected graphs to directed ones.
Abstract: Community detection and evaluation is an important task in graph mining. In many cases, a community is defined as a subgraph characterized by dense connections or interactions between its nodes. A variety of measures have been proposed to evaluate different quality aspects of such communities—in most cases ignoring the directed nature of edges. In this paper, we introduce novel metrics for evaluating the collaborative nature of directed graphs—a property not captured by single-node metrics or by other established community evaluation metrics. In order to accomplish this objective, we capitalize on the concept of graph degeneracy and define a novel D-core framework, extending the classic graph-theoretic notion of k-cores for undirected graphs to directed ones. Based on the D-core, which essentially can be seen as a measure of the robustness of a community under degeneracy, we devise a wealth of novel metrics used to evaluate graph collaboration features of directed graphs. We applied the D-core approach to large synthetic and real-world graphs such as Wikipedia, DBLP, and ArXiv and report interesting results at the graph as well as at the node level.
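Read as a directed analogue of k-cores, a (k, l) D-core keeps only nodes with in-degree at least k and out-degree at least l, and can be computed by iterative peeling. The sketch below is the standard peeling construction, not the paper's exact implementation.

```python
# (k, l)-core ("D-core") extraction by iterative peeling.
import networkx as nx

def d_core(G, k, l):
    H = G.copy()
    while True:
        drop = [n for n in H
                if H.in_degree(n) < k or H.out_degree(n) < l]
        if not drop:
            return H
        H.remove_nodes_from(drop)  # removal may expose new violators

G = nx.gnp_random_graph(60, 0.12, directed=True, seed=1)
core = d_core(G, k=3, l=3)
print(f"(3,3)-core has {core.number_of_nodes()} nodes")
```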

Journal ArticleDOI
TL;DR: This paper presents a signed-distance-based method for determining the objective importance of criteria and handling fuzzy, multiple criteria group decision-making problems in a flexible and intelligent way using interval type-2 trapezoidal fuzzy numbers.
Abstract: Interval type-2 fuzzy sets are associated with greater imprecision and more ambiguities than ordinary fuzzy sets. This paper presents a signed-distance-based method for determining the objective importance of criteria and handling fuzzy, multiple criteria group decision-making problems in a flexible and intelligent way. These advantages arise from the method’s use of interval type-2 trapezoidal fuzzy numbers to represent alternative ratings and the importance of various criteria. An integrated approach to determine the overall importance of the criteria is also developed using the subjective information provided by decision-makers and the objective information delivered by the decision matrix. In addition, a linear programming model is developed to estimate criterion weights and to extend the proposed multiple criteria decision analysis method. Finally, the feasibility and effectiveness of the proposed methods are illustrated by a group decision-making problem of patient-centered medicine in basilar artery occlusion.

Journal ArticleDOI
TL;DR: A new approach for finding overlapping clusters given pairwise similarities of objects is introduced, which relaxes the problem of correlation clustering by allowing an object to be assigned to more than one cluster.
Abstract: We introduce a new approach for finding overlapping clusters given pairwise similarities of objects. In particular, we relax the problem of correlation clustering by allowing an object to be assigned to more than one cluster. At the core of our approach is an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the given similarities between objects agree as much as possible with similarities taken over their label sets. The number of labels can vary across objects. To define a similarity between label sets, we consider two measures: (i) a 0–1 function indicating whether the two label sets have non-zero intersection and (ii) the Jaccard coefficient between the two label sets. The algorithm we propose is an iterative local-search method. The definitions of label set similarity give rise to two non-trivial optimization problems, which, for the measures of set-intersection and Jaccard, we solve using a greedy strategy and non-negative least squares, respectively. We also develop a distributed version of our algorithm based on the BSP model and implement it using a Pregel framework. Our algorithm uses as input pairwise similarities of objects and can thus be applied when clustering structured objects for which feature vectors are not available. As a proof of concept, we apply our algorithms on three different and complex application domains: trajectories, amino-acid sequences, and textual documents.
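A minimal sketch of the local search with the 0-1 set-intersection measure is given below; the pairwise similarities, label universe, and the cap on candidate label sets are invented for illustration, and the sweep may of course stop at a local optimum.

```python
# Local-search sketch for overlapping correlation clustering with the
# 0/1 set-intersection similarity between label sets.
from itertools import combinations

sims = {  # given pairwise similarities (1 = should share a label)
    (0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 2): 0, (0, 3): 0, (1, 3): 0,
}
labels = {i: {0} for i in range(4)}  # start: everyone in cluster 0
universe = [0, 1, 2]                 # assumed number of available labels

def cost(i, L):
    c = 0
    for (a, b), s in sims.items():
        if i in (a, b):
            other = labels[b if a == i else a]
            agree = int(bool(L & other))  # set-intersection similarity
            c += abs(s - agree)
    return c

for _ in range(5):  # a few greedy reassignment sweeps
    for i in labels:
        candidates = [set(c) for r in (1, 2)
                      for c in combinations(universe, r)]
        labels[i] = min(candidates, key=lambda L: cost(i, L))
print(labels)
```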

Journal ArticleDOI
TL;DR: This paper proposes a novel document clustering framework that is designed to induce a document organization from the identification of cohesive groups of segment-based portions of the original documents.
Abstract: Document clustering has been recognized as a central problem in text data management. Such a problem becomes particularly challenging when document contents are characterized by subtopical discussions that are not necessarily relevant to each other. Existing methods for document clustering have traditionally assumed that a document is an indivisible unit for text representation and similarity computation, which may not be appropriate to handle documents with multiple topics. In this paper, we address the problem of multi-topic document clustering by leveraging the natural composition of documents in text segments that are coherent with respect to the underlying subtopics. We propose a novel document clustering framework that is designed to induce a document organization from the identification of cohesive groups of segment-based portions of the original documents. We empirically give evidence of the significance of our segment-based approach on large collections of multi-topic documents, and we compare it to conventional methods for document clustering.

Journal ArticleDOI
TL;DR: This paper develops and evaluates an automatic keyphrase extraction system for scientific documents and shows the efficiency and effectiveness of the refined candidate set and demonstrates that the new features improve the accuracy of the system.
Abstract: Automatic keyphrase extraction techniques play an important role for many tasks including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location for potential keyphrases: a new candidate phrase generation method is proposed based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced for selecting the proper granularity. Additional new features are added for phrase weighting. Experiments based on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.

Journal ArticleDOI
TL;DR: This paper introduces the concept of distance graph representations of text data that preserve information about the relative ordering and distance between the words in the graphs and provide a much richer representation in terms of sentence structure of the underlying data.
Abstract: The rapid proliferation of the World Wide Web has increased the importance and prevalence of text as a medium for dissemination of information. A variety of text mining and management algorithms have been developed in recent years, such as clustering, classification, indexing, and similarity search. Almost all these applications use the well-known vector-space model for text representation and analysis. While the vector-space model has proven itself to be an effective and efficient representation for mining purposes, it does not preserve information about the ordering of the words in the representation. In this paper, we will introduce the concept of distance graph representations of text data. Such representations preserve information about the relative ordering and distance between the words in the graphs and provide a much richer representation in terms of sentence structure of the underlying data. Recent advances in graph mining and hardware capabilities of modern computers enable us to process more complex representations of text. We will see that such an approach has clear advantages from a qualitative perspective. It enables knowledge discovery from text that is not possible with a pure vector-space representation, because the distance graph loses much less information about the ordering of the underlying words. Furthermore, this representation does not require the development of new mining and management techniques. This is because the technique can also be converted into a structural version of the vector-space representation, which allows the use of all existing tools for text. In addition, existing techniques for graph and XML data can be directly leveraged with this new representation. Thus, a much wider spectrum of algorithms is available for processing this representation. We will apply this technique to a variety of mining and management applications and show its advantages and richness in exploring the structure of the underlying text documents.
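The construction itself is simple to sketch: in a distance graph of order k, each word links to the words that follow it within k positions, with edge weights counting co-occurrences. The version below is an assumed minimal one that ignores tokenization and stop-word details.

```python
# Build an order-k distance graph: edge (u, v) counts how often v
# follows u within k word positions.
from collections import Counter

def distance_graph(text, k=2):
    words = text.lower().split()
    edges = Counter()
    for i, u in enumerate(words):
        for j in range(i + 1, min(i + k + 1, len(words))):
            edges[(u, words[j])] += 1
    return edges

g = distance_graph("the cat sat on the mat the cat slept", k=2)
print(g[("the", "cat")], g[("cat", "sat")])  # 2 1
```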

Journal ArticleDOI
TL;DR: Two types of algorithms, namely level-wise and tree-based methods, are proposed for mining high-utility mobile sequential patterns, and the results show that the proposed algorithms outperform the state-of-the-art mobile sequential pattern algorithms and that the tree-based algorithms deliver better performance than the level-wise ones under various conditions.
Abstract: Mining user behavior patterns in mobile environments is an emerging topic in data mining fields with wide applications. By integrating moving paths with purchasing transactions, one can find the sequential purchasing patterns with the moving paths, which are called mobile sequential patterns of the mobile users. Mobile sequential patterns can be applied not only for planning mobile commerce environments but also for analyzing and managing online shopping websites. However, unit profits and purchased numbers of the items are not considered in traditional framework of mobile sequential pattern mining. Thus, the patterns with high utility (i.e., profit here) cannot be found. In view of this, we aim at integrating mobile data mining with utility mining for finding high-utility mobile sequential patterns in this study. Two types of algorithms, namely level-wise and tree-based methods, are proposed for mining high-utility mobile sequential patterns. A series of analyses and comparisons on the performance of the two different types of algorithms are conducted through experimental evaluations. The results show that the proposed algorithms outperform the state-of-the-art mobile sequential pattern algorithms and that the tree-based algorithms deliver better performance than the level-wise ones under various conditions.

Journal ArticleDOI
TL;DR: A rule-based privacy model is introduced that allows data publishers to express fine-grained protection requirements for both identity and sensitive information disclosure, and two anonymization algorithms are developed that significantly outperform the state-of-the-art in terms of retaining data utility, while achieving good protection and scalability.
Abstract: Transaction data are increasingly used in applications, such as marketing research and biomedical studies. Publishing these data, however, may risk privacy breaches, as they often contain personal information about individuals. Approaches to anonymizing transaction data have been proposed recently, but they may produce excessively distorted and inadequately protected solutions. This is because these approaches do not consider privacy requirements that are common in real-world applications in a realistic and flexible manner, and attempt to safeguard the data only against either identity disclosure or sensitive information inference. In this paper, we propose a new approach that overcomes these limitations. We introduce a rule-based privacy model that allows data publishers to express fine-grained protection requirements for both identity and sensitive information disclosure. Based on this model, we also develop two anonymization algorithms. Our first algorithm works in a top-down fashion, employing an efficient strategy to recursively generalize data with low information loss. Our second algorithm uses sampling and a combination of top-down and bottom-up generalization heuristics, which greatly improves scalability while maintaining low information loss. Extensive experiments show that our algorithms significantly outperform the state-of-the-art in terms of retaining data utility, while achieving good protection and scalability.

Journal ArticleDOI
TL;DR: This work develops an efficient algorithm to extract a hierarchy of overlapping communities and demonstrates the promising potential of the proposed approach in real-world applications.
Abstract: A recent surge of participatory web and social media has created a new laboratory for studying human relations and collective behavior on an unprecedented scale. In this work, we study the predictive power of social connections to determine the preferences or behaviors of individuals such as whether a user supports a certain political view, whether one likes a product, whether she would like to vote for a presidential candidate, etc. Since an actor is likely to participate in multiple different communities with each regulating the actor’s behavior in varying degrees, and a natural hierarchy might exist between these communities, we propose to zoom into a network at multiple different resolutions and determine which communities reflect a targeted behavior. We develop an efficient algorithm to extract a hierarchy of overlapping communities. Empirical results on social media networks demonstrate the promising potential of the proposed approach in real-world applications.

Journal ArticleDOI
TL;DR: This paper presents an innovative mathematical model for improving the accuracy of RTLSs, focusing on the mitigation of the ground reflection effect by using multilayer perceptron artificial neural networks.
Abstract: Wireless sensor networks (WSNs) have become much more relevant in recent years, mainly because they can be used in a wide diversity of applications. Real-time locating systems (RTLSs) are one of the most promising applications based on WSNs and represent a currently growing market. Specifically, WSNs are an ideal alternative to develop RTLSs aimed at indoor environments where existing global navigation satellite systems, such as the global positioning system, do not work correctly due to the blockage of the satellite signals. However, accuracy in indoor RTLSs is still a problem requiring novel solutions. One of the main challenges is to deal with the problems that arise from the effects of the propagation of radiofrequency waves, such as attenuation, diffraction, reflection and scattering. These effects can lead to other undesired problems, such as multipath. When the ground is responsible for wave reflections, multipath can be modeled as the ground reflection effect. This paper presents an innovative mathematical model for improving the accuracy of RTLSs, focusing on the mitigation of the ground reflection effect by using multilayer perceptron artificial neural networks.
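As a hedged illustration of the learning setup (not the paper's model), the sketch below trains a scikit-learn multilayer perceptron to map a reflection-distorted range measurement back to the true distance; the synthetic two-ray-like distortion is an invented stand-in for the ground reflection effect.

```python
# MLP correction of a reflection-distorted range estimate (illustrative).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
true_dist = rng.uniform(1, 30, 2000)
# Measurement = true distance plus a distance-dependent ripple (mimicking
# constructive/destructive ground reflection) and noise.
measured = true_dist + 1.5 * np.sin(true_dist) + rng.normal(0, 0.3, 2000)

mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                   random_state=0).fit(measured.reshape(-1, 1), true_dist)

print(mlp.predict(np.array([[5.0], [12.0], [25.0]])))  # corrected estimates
```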

Journal ArticleDOI
TL;DR: A new method is developed for multiple attribute group decision-making problems under an uncertain environment, in which the information about attribute weights is incompletely known or completely unknown, and each decision maker's information is expressed by an interval-valued fuzzy soft set.
Abstract: In this paper, we develop a new method for multiple attribute group decision-making problems under an uncertain environment, in which the information about attribute weights is incompletely known or completely unknown, and each decision maker's information is expressed by an interval-valued fuzzy soft set. Moreover, this paper takes account of the decision makers' attitude toward risk. In order to get the weight vector of the attributes, we construct the score matrix of the final fuzzy soft set. From the score matrix and the given attribute weight information, we establish an optimization model to determine the weights of attributes. For the special situations where the information about attribute weights is completely unknown, we establish another optimization model. By solving this model, we get a simple and exact formula, which can be used to determine the attribute weights. According to these models, a method based on interval-valued fuzzy soft sets, which considers the decision makers' risk attitude under an uncertain environment, is given to rank the alternatives. Finally, a numerical example is used to illustrate the applicability of the proposed approach.

Journal ArticleDOI
TL;DR: The results show that the approach can not only find the same optimal solution as the champion system of the WS-Challenge competition but also provide more alternative solutions with optimal QoS for users.
Abstract: This paper proposes a novel approach based on the planning-graph to solve the top-k QoS-aware automatic composition problem of semantic Web services. The approach includes three sequential stages: a forward search stage to generate a planning-graph, which greatly reduces the search space of the following two stages; an optimal local QoS calculating stage to compute all the optimal local QoS values of services in the planning-graph; and a backward search stage to find the top-k composed services with optimal QoS values according to the planning-graph and the optimal QoS values. In order to validate the approach, experiments are carried out based on the test sets offered by the WS-Challenge competition 2009. The results show that the approach can not only find the same optimal solution as the champion system of the competition but also provide more alternative solutions with optimal QoS for users.

Journal ArticleDOI
TL;DR: This paper proposes a new approach called MaxSegment for a two-dimensional space when the L2-norm is used and extends this algorithm to other variations of the MaxBRNN problem, such as the MaxBRNN problem with other metric spaces and a three-dimensional space.
Abstract: Maximizing bichromatic reverse nearest neighbor (MaxBRNN) is a variant of bichromatic reverse nearest neighbor (BRNN). The purpose of the MaxBRNN problem is to find an optimal region that maximizes the size of BRNNs. This problem has many real applications, such as location planning and profile-based marketing. The best-known algorithm for the MaxBRNN problem is called MaxOverlap. In this paper, we study the MaxBRNN problem and propose a new approach called MaxSegment for a two-dimensional space when the L2-norm is used. Then, we extend our algorithm to other variations of the MaxBRNN problem, such as the MaxBRNN problem with other metric spaces and a three-dimensional space. Finally, we conducted experiments on real and synthetic datasets to compare our proposed algorithm with existing algorithms. The experimental results verify the efficiency of our proposed approach.

Journal ArticleDOI
TL;DR: This paper considers inductive conformal prediction in the context of random tree ensembles such as random forests, which have been noted to perform favorably across problems.
Abstract: Obtaining an indication of confidence of predictions is desirable for many data mining applications. Predictions complemented with confidence levels can inform on the certainty or extent of reliability that may be associated with the prediction. This can be useful in varied application contexts where model outputs form the basis for potentially costly decisions, and in general across risk sensitive applications. The conformal prediction framework presents a novel approach for obtaining valid confidence measures associated with predictions from machine learning algorithms. Confidence levels are obtained from the underlying algorithm, using a non-conformity measure which indicates how ‘atypical’ a given example set is. The non-conformity measure is a key to determining the usefulness and efficiency of the approach. This paper considers inductive conformal prediction in the context of random tree ensembles like random forests, which have been noted to perform favorably across problems. Focusing on classification tasks, and considering realistic data contexts including class imbalance, we develop non-conformity measures for assessing the confidence of predicted class labels from random forests. We examine the performance of these measures on multiple data sets. Results demonstrate the usefulness and validity of the measures, their relative differences, and highlight the effectiveness of conformal prediction random forests for obtaining predictions with associated confidence.
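A minimal sketch of inductive conformal prediction with a random forest follows, using one minus the forest's predicted probability of a class as the non-conformity measure; this is one plausible choice and not necessarily the exact measure developed in the paper.

```python
# Inductive conformal prediction with a random forest (sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# Non-conformity on the calibration set: 1 - probability of the true class.
cal = 1 - rf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

eps = 0.1  # significance level: prediction sets are valid ~90% of the time
for x, true in zip(X_te[:5], y_te[:5]):
    proba = rf.predict_proba([x])[0]
    region = [c for c, p in enumerate(proba)
              if (np.sum(cal >= 1 - p) + 1) / (len(cal) + 1) > eps]
    print(true, region)
```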

Journal ArticleDOI
TL;DR: A weight-based semantics is proposed for querying an inconsistent KB, which defines an answer of a conjunctive query as a tuple of individuals whose substitution for the variables in the query head makes the query body entailed by any subbase of the KB consisting of the intensional knowledge and a weight-maximally consistent subset of the extensional knowledge.
Abstract: Non-standard query mechanisms that work under inconsistency are required in some important description logic (DL)-based applications, including those involving an inconsistent DL knowledge base (KB) whose intensional knowledge is consistent but is violated by its extensional knowledge. This paper proposes a weight-based semantics for querying such an inconsistent KB. This semantics defines an answer of a conjunctive query posed upon an inconsistent KB as a tuple of individuals whose substitution for the variables in the query head makes the query body entailed by any subbase of the KB consisting of the intensional knowledge and a weight-maximally consistent subset of the extensional knowledge. A novel computational method for this semantics is proposed, which works for extensionally reduced SHIQ KBs and conjunctive queries without non-distinguished variables. The method first compiles the given KB to a propositional program; then, for any given conjunctive query, it reduces the problem of computing all answers of the given query to a set of propositional satisfiability (SAT) problems with PB-constraints, which are then solved by SAT solvers. A decomposition-based framework for optimizing the method is also proposed. The feasibility of this method is demonstrated in our experiments.

Journal ArticleDOI
TL;DR: This work develops a novel pattern mining approach to mine a set of pairs of communities that behave in opposite ways with one another, and focuses on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities.
Abstract: Antagonistic communities refer to groups of people with opposite tastes, opinions, and factions within a community. Given a set of interactions among people in a community, we develop a novel pattern mining approach to mine a set of antagonistic communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of communities that behave in opposite ways with one another. We focus on extracting a compact lossless representation based on the concept of closed patterns to prevent an explosion in the number of mined antagonistic communities. We also present a variation of the algorithm using a divide-and-conquer strategy to handle large datasets when main memory is inadequate. The scalability of our approach is tested on synthetic datasets of various sizes, mined using various parameters. Case studies on Amazon, Epinions, and Slashdot datasets further show the efficiency and the utility of our approach in extracting antagonistic communities from social interactions.