
Showing papers on "Knowledge extraction published in 2013"


Journal ArticleDOI
TL;DR: Key milestones and the current state of affairs in the field of EDM are reviewed, together with specific applications, tools, and future insights.
Abstract: Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments. Its goal is to better understand how students learn and identify the settings in which they learn, to improve educational outcomes and to gain insights into and explain educational phenomena. Educational information systems can store a huge amount of potential data from multiple sources, coming in different formats and at different granularity levels. Each particular educational problem has a specific objective with special characteristics that require a different treatment of the mining problem. These issues mean that traditional DM techniques cannot be applied directly to these types of data and problems. As a consequence, the knowledge discovery process has to be adapted and some specific DM techniques are needed. This paper introduces and reviews key milestones and the current state of affairs in the field of EDM, together with specific applications, tools, and future insights. © 2012 Wiley Periodicals, Inc.

885 citations


Journal ArticleDOI
04 Jun 2013 - Entropy
TL;DR: An unsupervised and incremental learning approach to the extraction of maritime movement patterns is presented here to convert from raw data to information supporting decisions, and is a basis for automatically detecting anomalies and projecting current trajectories and patterns into the future.
Abstract: Understanding maritime traffic patterns is key to Maritime Situational Awareness applications, in particular, to classify and predict activities. Facilitated by the recent build-up of terrestrial networks and satellite constellations of Automatic Identification System (AIS) receivers, ship movement information is becoming increasingly available, both in coastal areas and open waters. The resulting amount of information is increasingly overwhelming to human operators, requiring the aid of automatic processing to synthesize the behaviors of interest in a clear and effective way. Although AIS data are only legally required for larger vessels, their use is growing, and they can be effectively used to infer different levels of contextual information, from the characterization of ports and off-shore platforms to spatial and temporal distributions of routes. An unsupervised and incremental learning approach to the extraction of maritime movement patterns is presented here to convert from raw data to information supporting decisions. This is a basis for automatically detecting anomalies and projecting current trajectories and patterns into the future. The proposed methodology, called TREAD (Traffic Route Extraction and Anomaly Detection), was developed for different levels of intermittency (i.e., sensor coverage and performance), persistence (i.e., time lag between subsequent observations) and data sources (i.e., ground-based and space-based receivers).
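
The paper itself does not include code, but the incremental flavor of route extraction can be illustrated with a toy online clusterer that absorbs each AIS position report into the nearest waypoint cluster or starts a new one. This is a hedged sketch with invented names and thresholds, not the TREAD algorithm:

```python
import math

class IncrementalWaypointClusterer:
    """Toy online clustering of AIS positions into waypoint clusters.

    Each new (lat, lon) report is absorbed by the nearest cluster if it
    lies within `radius_km`; otherwise it seeds a new cluster. This only
    illustrates the incremental idea, not TREAD itself.
    """

    def __init__(self, radius_km=5.0):
        self.radius_km = radius_km
        self.clusters = []  # each cluster is [lat, lon, count]

    @staticmethod
    def _haversine_km(lat1, lon1, lat2, lon2):
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def add(self, lat, lon):
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = self._haversine_km(lat, lon, c[0], c[1])
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius_km:
            # running-mean update of the cluster centroid
            best[2] += 1
            best[0] += (lat - best[0]) / best[2]
            best[1] += (lon - best[1]) / best[2]
        else:
            self.clusters.append([lat, lon, 1])

clusterer = IncrementalWaypointClusterer()
for lat, lon in [(43.10, 9.80), (43.11, 9.81), (36.50, 14.20)]:
    clusterer.add(lat, lon)
print(len(clusterer.clusters))  # -> 2 waypoint clusters
```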

522 citations


Journal ArticleDOI
TL;DR: This paper surveys discretization methods, whose main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data.
Abstract: Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones, by associating categorical values with intervals and thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied over continuous data and the representation of information is simplified, making it more concise and specific. The literature provides numerous proposals of discretization, and some attempts to categorize them into a taxonomy can be found. However, in previous papers there is a lack of consensus in the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers have been widely considered, while many other methods have gone unnoticed. With the intention of alleviating these problems, this paper provides a survey of discretization methods proposed in the literature from a theoretical and empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, the best-performing discretizers are highlighted.
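
As a concrete illustration, the two simplest discretizers covered by such taxonomies, equal-width and equal-frequency binning, can be sketched in a few lines (illustrative code, not taken from the survey):

```python
import numpy as np

def equal_width(values, k):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    edges = [lo + i * (hi - lo) / k for i in range(1, k)]
    return [sum(v > e for e in edges) for v in values]  # bin index per value

def equal_frequency(values, k):
    """Choose cut points so each interval holds ~len(values)/k points."""
    edges = np.quantile(values, [i / k for i in range(1, k)])
    return [int(np.searchsorted(edges, v, side="right")) for v in values]

data = [0.1, 0.2, 0.3, 0.4, 5.0, 9.9]
print(equal_width(data, 3))      # [0, 0, 0, 0, 1, 2]: skewed data crowds bin 0
print(equal_frequency(data, 3))  # [0, 0, 1, 1, 2, 2]: balanced bin populations
```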

419 citations


Journal ArticleDOI
TL;DR: This article introduces the KnowRob knowledge processing system, a system specifically designed to provide autonomous robots with the knowledge needed for performing everyday manipulation tasks, evaluates the system’s scalability, and presents different integrated experiments that show its versatility and comprehensiveness.
Abstract: Autonomous service robots will have to understand vaguely described tasks, such as “set the table” or “clean up”. Performing such tasks as intended requires robots to fully, precisely, and appropriately parameterize their low-level control programs. We propose knowledge processing as a computational resource for enabling robots to bridge the gap between vague task descriptions and the detailed information needed to actually perform those tasks in the intended way. In this article, we introduce the KnowRob knowledge processing system that is specifically designed to provide autonomous robots with the knowledge needed for performing everyday manipulation tasks. The system allows the realization of “virtual knowledge bases”: collections of knowledge pieces that are not explicitly represented but computed on demand from the robot's internal data structures, its perception system, or external sources of information. This article gives an overview of the different kinds of knowledge, the different inference mechanisms, and interfaces for acquiring knowledge from external sources, such as the robot's perception system, observations of human activities, Web sites on the Internet, as well as Web-based knowledge bases for information exchange between robots. We evaluate the system's scalability and present different integrated experiments that show its versatility and comprehensiveness.
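
The idea of a "virtual knowledge base", knowledge computed on demand rather than stored explicitly, can be illustrated with a toy sketch. KnowRob itself is built around Prolog-based reasoning and is far richer; the class and predicate names below are hypothetical:

```python
class VirtualKB:
    """Toy 'virtual knowledge base': facts are not stored but computed
    on demand by registered providers (stand-ins for a robot's internal
    data structures or perception system). Illustrative only."""

    def __init__(self):
        self._providers = {}

    def register(self, predicate, fn):
        self._providers[predicate] = fn

    def query(self, predicate, *args):
        # Compute the answer at query time instead of looking up a stored fact.
        return self._providers[predicate](*args)

# Hypothetical provider: object poses come from a (fake) perception module.
perceived = {"cup1": (0.4, 1.2, 0.8)}

kb = VirtualKB()
kb.register("pose_of", lambda obj: perceived.get(obj))
kb.register("graspable", lambda obj: obj in perceived and perceived[obj][2] < 1.5)

print(kb.query("pose_of", "cup1"))    # (0.4, 1.2, 0.8)
print(kb.query("graspable", "cup1"))  # True
```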

373 citations


Journal ArticleDOI
TL;DR: A novel method for building the approximate concept lattice of an incomplete context, the notion of an approximate decision rule, and an approach for extracting non-redundant approximate decision rules from an incomplete decision context are presented.

239 citations


Journal ArticleDOI
TL;DR: A semantic model and a computation and annotation platform for developing a semantic approach that progressively transforms the raw mobility data into semantic trajectories enriched with segmentations and annotations are presented.
Abstract: With the large-scale adoption of GPS-equipped mobile sensing devices, positional data generated by moving objects (e.g., vehicles, people, animals) are being easily collected. Such data are typically modeled as streams of spatio-temporal (x,y,t) points, called trajectories. In recent years, trajectory management research has progressed significantly towards efficient storage and indexing techniques, as well as suitable knowledge discovery. These works have focused on the geometric aspect of the raw mobility data. We are now witnessing a growing demand in several application sectors (e.g., from shipment tracking to geo-social networks) for understanding the semantic behavior of moving objects. Semantic behavior refers to the use of semantic abstractions of the raw mobility data, including not only geometric patterns but also knowledge extracted jointly from the mobility data and the underlying geographic and application domain information. The core contribution of this article lies in a semantic model and a computation and annotation platform for developing a semantic approach that progressively transforms the raw mobility data into semantic trajectories enriched with segmentations and annotations. We also report on a number of experiments conducted with semantic trajectories in different domains.
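
A typical first step of such a pipeline is segmenting the raw (x, y, t) stream into stop and move episodes, which are then annotated. The sketch below uses a simple speed threshold; it illustrates the general idea only and is not the authors' platform:

```python
import math

def stop_move_segments(points, speed_threshold=0.5):
    """Split a trajectory of (x, y, t) points into 'stop'/'move' episodes
    based on instantaneous speed. A toy stand-in for the segmentation
    stage of a semantic-trajectory pipeline; the threshold is arbitrary.
    """
    segments, current = [], None
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        label = "move" if speed > speed_threshold else "stop"
        if current is not None and current[0] == label:
            current[1].append((x1, y1, t1))  # extend the running episode
        else:
            current = [label, [(x0, y0, t0), (x1, y1, t1)]]
            segments.append(current)
    return segments

traj = [(0, 0, 0), (0.1, 0, 10), (6, 0, 20), (12, 0, 30)]
for label, pts in stop_move_segments(traj):
    print(label, len(pts))  # stop 2, then move 3; annotation would follow
```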

232 citations


Journal ArticleDOI
TL;DR: This second part of a large survey paper analyzes recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA and uses the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community.
Abstract: This is the second part of a large survey paper in which we analyze recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and keywords. We developed a knowledge browsing environment to support our literature analysis process. We use the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community. In this second part, we zoom in on and give an extensive overview of the papers published between 2003 and 2011 which applied FCA-based methods for knowledge discovery and ontology engineering in various application domains. These domains include software mining, web analytics, medicine, biology and chemistry data.

223 citations


Proceedings ArticleDOI
27 Oct 2013
TL;DR: This paper questions the idea that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals.
Abstract: Much work in knowledge extraction from text tacitly assumes that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals. In this paper, we question this idea, examining the phenomenon of reporting bias and the challenge it poses for knowledge extraction. We conclude with discussion of approaches to learning commonsense knowledge from text despite this distortion.

208 citations


Journal ArticleDOI
TL;DR: A stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data is described.

199 citations


Book ChapterDOI
22 Apr 2013
TL;DR: This paper provides an overview of big data mining and discusses the related challenges and the new opportunities, including a review of state-of-the-art frameworks and platforms for processing and managing big data as well as the efforts expected on big data mining.
Abstract: While "big data" has become a highlighted buzzword since last year, "big data mining", i.e., mining from big data, has almost immediately followed up as an emerging, interrelated research area. This paper provides an overview of big data mining and discusses the related challenges and the new opportunities. The discussion includes a review of state-of-the-art frameworks and platforms for processing and managing big data as well as the efforts expected on big data mining. We address broad issues related to big data and/or big data mining, and point out opportunities and research topics as they shall duly flesh out. We hope our effort will help reshape the subject area of today's data mining technology toward solving tomorrow's bigger challenges emerging in accordance with big data.

184 citations


Journal ArticleDOI
TL;DR: In an assessment using four complex, real-life event logs, it is shown that this technique significantly outperforms currently available trace clustering techniques.
Abstract: Process discovery is the learning task that entails the construction of process models from event logs of information systems. Typically, these event logs are large data sets that record process executions by registering which activity has taken place at which moment in time. By far the most arduous challenge for process discovery algorithms consists of tackling the problem of accurate and comprehensible knowledge discovery from highly flexible environments. Event logs from such flexible systems often contain a large variety of process executions, which makes the application of process mining most interesting. However, simply applying existing process discovery techniques will often yield highly incomprehensible process models because of their inaccuracy and complexity. With respect to resolving this problem, trace clustering is one very interesting approach, since it makes it possible to split up an existing event log so as to facilitate the knowledge discovery process. In this paper, we propose a novel trace clustering technique that differs significantly from previous approaches. Above all, it starts from the observation that currently available techniques suffer from a large divergence between the clustering bias and the evaluation bias. By employing an active-learning-inspired approach, this bias divergence is resolved. In an assessment using four complex, real-life event logs, it is shown that our technique significantly outperforms currently available trace clustering techniques.
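
The paper's active-learning-inspired technique cannot be reproduced from the abstract, but what trace clustering operates on can be shown with a baseline that groups traces (activity sequences) by edit distance. The greedy scheme and threshold below are illustrative assumptions, not the authors' method:

```python
def edit_distance(a, b):
    """Levenshtein distance between two traces (activity sequences)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def cluster_traces(traces, max_dist=1):
    """Greedy baseline: a trace joins the first cluster whose
    representative is within max_dist edits, else starts a new cluster."""
    clusters = []
    for trace in traces:
        for rep, members in clusters:
            if edit_distance(trace, rep) <= max_dist:
                members.append(trace)
                break
        else:
            clusters.append((trace, [trace]))
    return [members for _, members in clusters]

log = [("a", "b", "c"), ("a", "b", "c", "d"), ("x", "y"), ("x", "z")]
print(cluster_traces(log))  # two clusters of similar process variants
```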

Journal ArticleDOI
TL;DR: This work studies an approach to the underlying multi-relational data mining (MRDM) problem, which relies on formal concept analysis (FCA) as a framework for clustering and classification, describes implementations of RCA, and lists applications to problems from software and knowledge engineering.
Abstract: The processing of complex data is admittedly among the major concerns of knowledge discovery from data (KDD). Indeed, a major part of the data worth analyzing is stored in relational databases and, more recently, on the Web of Data. This clearly underscores the need for Entity-Relationship and RDF compliant data mining (DM) tools. We are studying an approach to the underlying multi-relational data mining (MRDM) problem, which relies on formal concept analysis (FCA) as a framework for clustering and classification. Our relational concept analysis (RCA) extends FCA to the processing of multi-relational datasets, i.e., with multiple sorts of individuals, each provided with its own set of attributes, and relationships among those. Given such a dataset, RCA constructs a set of concept lattices, one per object sort, through an iterative analysis process that converges to a fixed point. In doing so, it abstracts the links between objects into attributes akin to role restrictions from description logics (DLs). We address here key aspects of the iterative calculation, such as the evolution of data descriptions along the iterations and process termination. We describe implementations of RCA and list applications to problems from software and knowledge engineering.
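
RCA builds on plain FCA, whose core construction, the set of all formal concepts of a binary context, can be enumerated by closing the family of object intents under intersection. A minimal sketch of that FCA building block (the iterative, multi-relational part of RCA is not shown):

```python
def formal_concepts(context):
    """Enumerate formal concepts of a binary context given as
    {object: set_of_attributes}. The intents are exactly the intersections
    of object intents (plus the full attribute set); each intent's extent
    is the set of objects possessing all of its attributes.
    """
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attrs}
    changed = True
    while changed:  # close the intent family under intersection
        changed = False
        for g_attrs in map(frozenset, context.values()):
            for intent in list(intents):
                new = intent & g_attrs
                if new not in intents:
                    intents.add(new)
                    changed = True
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

ctx = {"dove": {"flies", "feathered"},
       "ostrich": {"feathered"},
       "bat": {"flies"}}
for extent, intent in sorted(formal_concepts(ctx), key=lambda c: len(c[1])):
    print(set(extent) or "{}", set(intent) or "{}")
```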

Proceedings ArticleDOI
27 Apr 2013
TL;DR: The principles driving design mining, the implementation of the Webzeitgeist architecture, and the new class of data-driven design applications it enables are described.
Abstract: Advances in data mining and knowledge discovery have transformed the way Web sites are designed. However, while visual presentation is an intrinsic part of the Web, traditional data mining techniques ignore render-time page structures and their attributes. This paper introduces design mining for the Web: using knowledge discovery techniques to understand design demographics, automate design curation, and support data-driven design tools. This idea is manifest in Webzeitgeist, a platform for large-scale design mining comprising a repository of over 100,000 Web pages and 100 million design elements. This paper describes the principles driving design mining, the implementation of the Webzeitgeist architecture, and the new class of data-driven design applications it enables.

Journal ArticleDOI
TL;DR: Results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions; some new and interesting results with data mining and knowledge discovery techniques applied to a drill production process are reported.
Abstract: Academics and practitioners have a common interest in the continuing development of methods and computer applications that support or perform knowledge-intensive engineering tasks. Operations management dysfunctions and lost production time are problems of enormous magnitude that impact the performance and quality of industrial systems as well as their cost of production. Association rule mining is a data mining technique used to extract useful information from huge databases. This work develops a better conceptual base for improving the application of association rule mining methods to extract knowledge on operations and information management. The emphasis of the paper is on the improvement of the operations processes. The application example details an industrial experiment in which association rule mining is used to analyze the manufacturing process of a fully integrated provider of drilling products. The study reports some new and interesting results with data mining and knowledge discovery techniques applied to a drill production process. Experimental results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions.
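
For readers unfamiliar with the technique, a minimal support/confidence rule miner over toy transactions looks as follows; the naive candidate enumeration and the event names are illustrative, not the paper's setup:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive frequent-itemset miner: checks every candidate combination.
    Fine for toy data; real systems use Apriori/FP-growth pruning."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= min_support:
                frequent[frozenset(cand)] = support
                found = True
        if not found:  # no frequent sets of this size implies none larger
            break
    return frequent

def rules(frequent, min_confidence):
    """Emit X -> Y rules with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                if lhs in frequent and supp / frequent[lhs] >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp / frequent[lhs]))
    return out

# Toy stand-in for process logs: each transaction lists observed events.
log = [{"tool_worn", "vibration", "defect"},
       {"tool_worn", "defect"},
       {"vibration"},
       {"tool_worn", "vibration", "defect"}]
for lhs, rhs, conf in rules(frequent_itemsets(log, 0.5), 0.8):
    print(lhs, "->", rhs, f"(conf={conf:.2f})")
```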

Journal ArticleDOI
TL;DR: Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects and are necessary for rapid access to, and sharing of knowledge through data mining and knowledge discovery approaches.
Abstract: Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects. The application of advanced computer models enabling the simulation of complex biological processes generates hypotheses and suggests experiments. Appropriately interfaced with biomedical databases, models are necessary for rapid access to, and sharing of knowledge through data mining and knowledge discovery approaches.

Journal ArticleDOI
TL;DR: The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods that allow a more efficient passage from (unstructured) textual information to machine-processable data, in potentially any domain.
Abstract: The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.

Journal ArticleDOI
TL;DR: This work proposes a new structure–activity relationship (SAR) approach to mine molecular fragments that act as structural alerts for biological activity, and has been tested on the mutagenicity endpoint, showing marked prediction skills and bringing to the surface much of the knowledge already collected in the literature as well as new evidence.
Abstract: This work proposes a new structure–activity relationship (SAR) approach to mine molecular fragments that act as structural alerts for biological activity. The entire process is designed to fit with human reasoning, not only to make the predictions more reliable but also to permit clear control by the user in order to meet customized requirements. This approach has been tested on the mutagenicity endpoint, showing marked prediction skills and, more interestingly, bringing to the surface much of the knowledge already collected in the literature as well as new evidence.
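
A drastically simplified sketch of the fragment-mining idea: given molecules represented as sets of precomputed fragment identifiers with activity labels, flag fragments enriched among actives. Real structural-alert mining matches substructures with a cheminformatics toolkit; the thresholds and fragment names here are invented:

```python
def structural_alerts(molecules, min_occurrences=3, min_active_ratio=0.8):
    """Flag fragments over-represented in active (e.g., mutagenic) molecules.
    `molecules` is a list of (set_of_fragment_ids, is_active) pairs; the
    fragment sets stand in for real substructure matches (RDKit etc.).
    Thresholds are illustrative, not from the paper.
    """
    counts = {}
    for fragments, is_active in molecules:
        for f in fragments:
            total, active = counts.get(f, (0, 0))
            counts[f] = (total + 1, active + int(is_active))
    return {f: active / total
            for f, (total, active) in counts.items()
            if total >= min_occurrences and active / total >= min_active_ratio}

data = [({"nitro", "ring"}, True), ({"nitro"}, True),
        ({"nitro", "amine"}, True), ({"ring"}, False), ({"amine"}, False)]
print(structural_alerts(data))  # {'nitro': 1.0} -> candidate alert
```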

Book ChapterDOI
02 Sep 2013
TL;DR: A novel approach is to combine HCI & KDD in order to enhance human intelligence by computational intelligence in the life sciences domain.
Abstract: A major challenge in our networked world is the increasing amount of data, which require efficient and user-friendly solutions. A timely example is the biomedical domain: the trend towards personalized medicine has resulted in a sheer mass of the generated (-omics) data. In the life sciences domain, most data models are characterized by complexity, which makes manual analysis very time-consuming and frequently practically impossible. Computational methods may help; however, we must acknowledge that the problem-solving knowledge is located in the human mind and - not in machines. A strategic aim to find solutions for data intensive problems could lay in the combination of two areas, which bring ideal pre-conditions: Human-Computer Interaction (HCI) and Knowledge Discovery (KDD). HCI deals with questions of human perception, cognition, intelligence, decision-making and interactive techniques of visualization, so it centers mainly on supervised methods. KDD deals mainly with questions of machine intelligence and data mining, in particular with the development of scalable algorithms for finding previously unknown relationships in data, thus centers on automatic computational methods. A proverb attributed perhaps incorrectly to Albert Einstein illustrates this perfectly: “Computers are incredibly fast, accurate, but stupid. Humans are incredibly slow, inaccurate, but brilliant. Together they may be powerful beyond imagination”. Consequently, a novel approach is to combine HCI & KDD in order to enhance human intelligence by computational intelligence.

Journal ArticleDOI
TL;DR: A knowledge acquisition and representation approach using the fuzzy evidential reasoning approach and dynamic adaptive FPNs is presented to capture domain experts' diverse experience and reason over rule-based knowledge more intelligently.
Abstract: The two most important issues of expert systems are the acquisition of domain experts' professional knowledge and the representation and reasoning of the knowledge rules that have been identified. First, during expert knowledge acquisition processes, the members of the domain expert panel often differ from one another in experience and knowledge and, owing to the panel's cross-functional and multidisciplinary nature, produce different types of knowledge information: complete and incomplete, precise and imprecise, and known and unknown. Second, as a promising tool for knowledge representation and reasoning, fuzzy Petri nets (FPNs) still suffer from a couple of deficiencies. The parameters in current FPN models cannot accurately represent increasingly complex knowledge-based systems, and the rules in most existing knowledge inference frameworks cannot be dynamically adjusted according to propositions' variation, as human cognition and thinking can. In this paper, we present a knowledge acquisition and representation approach using the fuzzy evidential reasoning approach and dynamic adaptive FPNs to solve the problems mentioned above. As illustrated by the numerical example, the proposed approach can well capture experts' diverse experience, enhance the knowledge representation power, and reason over rule-based knowledge more intelligently.
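
The basic FPN-style firing rule (a transition fires with the minimum of its input truth degrees scaled by the rule's certainty factor, and an output place keeps the maximum over incoming rules) can be sketched as follows. This toy omits the paper's fuzzy evidential reasoning and dynamic adaptation:

```python
def fire_rules(truth, rules, passes=3):
    """Tiny fuzzy-Petri-net-style inference sketch.

    truth: {proposition: degree in [0, 1]}
    rules: list of (antecedents, consequent, certainty_factor); a rule
    contributes min(antecedent degrees) * cf to its consequent, and the
    consequent keeps the max over all contributions. Repeated passes
    propagate chained rules. Illustrative only.
    """
    for _ in range(passes):
        for antecedents, consequent, cf in rules:
            degree = min(truth.get(a, 0.0) for a in antecedents) * cf
            if degree > truth.get(consequent, 0.0):
                truth[consequent] = degree
    return truth

truth = {"high_temp": 0.9, "high_pressure": 0.7}
rules = [({"high_temp", "high_pressure"}, "valve_fault", 0.8),
         ({"valve_fault"}, "shutdown_advised", 0.9)]
print(fire_rules(truth, rules))
# valve_fault = min(0.9, 0.7) * 0.8 = 0.56; shutdown_advised = 0.504
```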

Proceedings ArticleDOI
27 Oct 2013
TL;DR: A framework to leverage general knowledge in topic models, called GK-LDA, is proposed; it is able to effectively exploit the knowledge of lexical relations in dictionaries and is the first such model that can incorporate domain-independent knowledge.
Abstract: Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed to incorporate prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case because in most real-life applications, the user wants to find what they do not know. In this paper, we propose a framework to leverage general knowledge in topic models. Such knowledge is domain independent. Specifically, we use one form of general knowledge, i.e., lexical semantic relations of words such as synonyms, antonyms and adjective attributes, to help produce more coherent topics. However, there is a major obstacle: a word can have multiple meanings/senses, and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain. Wrong knowledge can result in poor quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.
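
GK-LDA itself is a custom model, but the plain (knowledge-free) LDA it augments with lexical-relation knowledge can be run with scikit-learn out of the box; the toy corpus below is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy review snippets; a real run would use a corpus of product reviews.
docs = ["battery life is great and charging is fast",
        "screen is sharp but battery drains quickly",
        "shipping was slow and the box arrived damaged",
        "fast delivery, well packaged box"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Plain LDA baseline: no lexical-relation knowledge, unlike GK-LDA.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]  # four highest-weight terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```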

Journal ArticleDOI
TL;DR: A data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions is developed and demonstrated by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein (LDL) cholesterol test result during the measurement year was <100 mg/dL.

Proceedings Article
03 Aug 2013
TL;DR: This paper proposes a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains, and its evaluation results will demonstrate its effectiveness.
Abstract: Topic models have been widely used to identify topics in text corpora. It is also known that purely unsupervised models often result in topics that are not comprehensible in applications. In recent years, a number of knowledge-based models have been proposed, which allow the user to input prior knowledge of the domain to produce more coherent and meaningful topics. In this paper, we go one step further to study how prior knowledge from other domains can be exploited to help topic modeling in the new domain. This problem setting is important from both the application and the learning perspectives because knowledge is inherently accumulative. We human beings gain knowledge gradually and use old knowledge to help solve new problems. Existing models, however, have major difficulties achieving this objective. In this paper, we propose a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains. Our evaluation results demonstrate its effectiveness.

Proceedings ArticleDOI
11 Aug 2013
TL;DR: This paper translates the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community, namely social network analysis, text mining, temporal analysis, and higher-order feature construction, and describes how advances within each of these areas can be leveraged to understand the domain of healthcare.
Abstract: The role of big data in addressing the needs of the present healthcare system in the US and the rest of the world has been echoed by government, private, and academic sectors. There has been a growing emphasis on exploring the promise of big data analytics in tapping the potential of the massive healthcare data emanating from private and government health insurance providers. While the domain implications of such collaboration are well known, this type of data has been explored to a limited extent in the data mining community. The objective of this paper is twofold: first, we introduce the emerging domain of "big" healthcare claims data to the KDD community, and second, we describe the successes and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data. Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community, namely social network analysis, text mining, temporal analysis, and higher-order feature construction, and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. Each case study illustrates a unique intersection of data mining and healthcare with the common objective of improving the cost-care ratio by mining for opportunities to improve healthcare operations and reduce what seems to fall under fraud, waste, and abuse.

Journal ArticleDOI
TL;DR: This paper formulates data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode.
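
The coupling idea can be sketched in its simplest form, two matrices sharing the factor of their common mode, fitted by alternating least squares. The paper's CMTF handles higher-order tensors as well; this matrix-only sketch merely illustrates the shared latent structure:

```python
import numpy as np

def coupled_mf(X1, X2, rank, iters=200):
    """Jointly factorize X1 ~ A @ B.T and X2 ~ A @ C.T with a shared A
    (alternating least squares). A matrix-only simplification of coupled
    matrix-tensor factorization; A carries the common latent structure.
    """
    rng = np.random.default_rng(0)
    B = rng.standard_normal((X1.shape[1], rank))
    C = rng.standard_normal((X2.shape[1], rank))
    for _ in range(iters):
        # A minimizes ||X1 - A B^T||^2 + ||X2 - A C^T||^2
        A = np.linalg.solve(B.T @ B + C.T @ C, (X1 @ B + X2 @ C).T).T
        B = np.linalg.solve(A.T @ A, (X1.T @ A).T).T
        C = np.linalg.solve(A.T @ A, (X2.T @ A).T).T
    return A, B, C

# Toy data built from a shared latent factor A_true.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((20, 2))
X1 = A_true @ rng.standard_normal((2, 8))
X2 = A_true @ rng.standard_normal((2, 6))
A, B, C = coupled_mf(X1, X2, rank=2)
print(np.linalg.norm(X1 - A @ B.T), np.linalg.norm(X2 - A @ C.T))  # both ~ 0
```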

Journal ArticleDOI
TL;DR: This work proposes a novel in-network knowledge discovery approach that provides outlier detection and data clustering simultaneously and shows that the proposed algorithm outperforms other techniques in both effectiveness and efficiency.

Proceedings Article
01 Oct 2013
TL;DR: A more advanced topic model, called MC-LDA (LDA with m-set and c-set), is proposed, which is based on an Extended generalized Polya urn (E-GPU) model (which is also proposed in this paper).
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Polya urn (E-GPU) model (also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

Proceedings ArticleDOI
16 Jun 2013
TL;DR: The proposed methodology has performed better than the existing system in terms of tweet classification and sentiment analysis, and the increase in information gain has enabled the proposed system to better summarize the Twitter data for user sentiments regarding a keyword from a particular category.
Abstract: The rise of social media over the past couple of years has changed the general perspective on networking, socialization, and personalization. The use of data from social networks for different purposes, such as election prediction, sentiment analysis, marketing, communication, business, and education, is increasing day by day. Precise extraction of valuable information from short text messages posted on social media (Twitter) is a challenging task. In this paper, we analyze tweets to classify data and sentiments from Twitter more precisely. The information from tweets is extracted using keyword-based knowledge extraction. Moreover, the extracted knowledge is further enhanced using a domain-specific seed-based enrichment technique. The proposed methodology facilitates the extraction of keywords, entities, synonyms, and parts of speech from tweets, which are then used for tweet classification and sentiment analysis. The proposed system is tested on a collection of 40,000 tweets. The proposed methodology has performed better than the existing system in terms of tweet classification and sentiment analysis. By applying the Knowledge Enhancer and Synonym Binder modules to the extracted information, we have achieved an increase in information gain ranging from 0.1% to 55%. The increase in information gain has enabled our proposed system to better summarize the Twitter data for user sentiments regarding a keyword from a particular category.
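
A toy version of the keyword-plus-enrichment pipeline: tokens are expanded with known synonyms before being matched against per-category seed keywords. The mapping to the paper's Knowledge Enhancer and Synonym Binder modules is loose, and the seed lists are invented:

```python
def classify_tweet(tweet, category_seeds, synonyms):
    """Toy keyword-based tweet classifier with seed enrichment.

    category_seeds: {category: set of seed keywords}; synonyms maps a
    keyword to equivalent surface forms. Stand-ins for keyword extraction
    and synonym-based enrichment; illustrative only.
    """
    tokens = set(tweet.lower().split())
    # enrich tokens with known synonyms (domain-specific seed enrichment)
    enriched = set(tokens)
    for word, syns in synonyms.items():
        if tokens & set(syns):
            enriched.add(word)
    scores = {cat: len(enriched & seeds) for cat, seeds in category_seeds.items()}
    return max(scores, key=scores.get), scores

seeds = {"sports": {"match", "goal", "team"}, "politics": {"vote", "election"}}
syns = {"election": ["polls", "ballot"], "goal": ["scored"]}
print(classify_tweet("What a game, they scored twice!", seeds, syns))
# ('sports', {'sports': 1, 'politics': 0})
```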

Journal ArticleDOI
TL;DR: Two incremental algorithms for updating the approximations in disjunctive/conjunctive set-valued information systems are proposed; results indicate that the incremental approaches significantly outperform non-incremental approaches, with a dramatic reduction in computation time.
Abstract: Incremental learning is an efficient technique for knowledge discovery in a dynamic database, which enables acquiring additional knowledge from new data without forgetting prior knowledge. Rough set theory has been successfully used in information systems for classification analysis. Set-valued information systems are generalized models of single-valued information systems, which can be classified into two categories: disjunctive and conjunctive. Approximations are fundamental concepts of rough set theory, which need to be updated incrementally while the object set varies over time in set-valued information systems. In this paper, we analyze the updating mechanisms for computing approximations with the variation of the object set. Two incremental algorithms for updating the approximations in disjunctive/conjunctive set-valued information systems are proposed, respectively. Furthermore, extensive experiments are carried out on several data sets to verify the performance of the proposed algorithms. The results indicate that the incremental approaches significantly outperform non-incremental approaches, with a dramatic reduction in computation time.
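
The quantities these incremental algorithms keep up to date are the classical lower and upper approximations. A non-incremental sketch of their definition (equivalence classes under an indiscernibility relation), with invented attribute data:

```python
def approximations(universe, attrs, target):
    """Rough-set lower/upper approximations of `target` (a set of objects)
    under the indiscernibility relation induced by `attrs`: two objects
    are indiscernible iff they agree on every listed attribute.
    """
    blocks = {}
    for obj, desc in universe.items():
        blocks.setdefault(tuple(desc[a] for a in attrs), set()).add(obj)
    lower = set().union(*[b for b in blocks.values() if b <= target])
    upper = set().union(*[b for b in blocks.values() if b & target])
    return lower, upper

universe = {1: {"color": "red", "size": "s"},
            2: {"color": "red", "size": "s"},
            3: {"color": "blue", "size": "l"},
            4: {"color": "blue", "size": "s"}}
print(approximations(universe, ["color", "size"], {1, 3}))
# lower = {3}: only block {3} fits inside the target; upper = {1, 2, 3}
```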

Journal ArticleDOI
01 Apr 2013
TL;DR: A data mining service addressed to non-expert data miners, which can be delivered as Software-as-a-Service, whose main advantage is that by simply indicating where the data file is, the service itself is able to perform the entire process.
Abstract: In today's competitive market, companies need to use knowledge discovery techniques to make better, more informed decisions. But these techniques are out of the reach of most users, as the knowledge discovery process requires an incredible amount of expertise. Additionally, business intelligence vendors are moving their systems to the cloud in order to provide services which offer companies cost savings, better performance, and faster access to new applications. This work joins both facets. It describes a data mining service addressed to non-expert data miners which can be delivered as Software-as-a-Service. Its main advantage is that by simply indicating where the data file is, the service itself is able to perform the entire process.

Journal ArticleDOI
TL;DR: A novel way is proposed to identify the "best" features efficiently and accurately by first combining the results of some well-known FS techniques to find consistent features, and then using the proposed concept of support to select a smallest set of features that covers the data optimally.
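
The "consistent features" step can be illustrated with a simple vote across the outputs of several FS techniques. The paper's support-based covering step is not reproduced here, and the feature names are hypothetical:

```python
from collections import Counter

def consensus_features(rankings, min_votes=2):
    """Toy ensemble feature selection: a feature is 'consistent' if it
    appears in at least min_votes of the individual FS results. A rough
    stand-in for combining well-known FS techniques."""
    votes = Counter(f for selected in rankings for f in selected)
    return [f for f, v in votes.items() if v >= min_votes]

# Hypothetical outputs of three FS techniques (e.g., IG, chi2, ReliefF).
rankings = [{"f1", "f3", "f7"}, {"f1", "f2", "f3"}, {"f3", "f7", "f9"}]
print(consensus_features(rankings))  # ['f1', 'f3', 'f7'] (order may vary)
```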