
Showing papers on "Knowledge extraction published in 2013"


Journal ArticleDOI
TL;DR: Key milestones and the current state of affairs in the field of EDM are reviewed, together with specific applications, tools, and future insights.
Abstract: Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments. Its goal is to better understand how students learn and identify the settings in which they learn, to improve educational outcomes and to gain insights into and explain educational phenomena. Educational information systems can store a huge amount of potential data from multiple sources, coming in different formats and at different granularity levels. Each particular educational problem has a specific objective with special characteristics that require a different treatment of the mining problem. These issues mean that traditional DM techniques cannot be applied directly to these types of data and problems. As a consequence, the knowledge discovery process has to be adapted and some specific DM techniques are needed. This paper introduces and reviews key milestones and the current state of affairs in the field of EDM, together with specific applications, tools, and future insights. © 2012 Wiley Periodicals, Inc.

885 citations


Journal ArticleDOI
04 Jun 2013 - Entropy
TL;DR: An unsupervised and incremental learning approach to the extraction of maritime movement patterns is presented here to convert from raw data to information supporting decisions, and is a basis for automatically detecting anomalies and projecting current trajectories and patterns into the future.
Abstract: Understanding maritime traffic patterns is key to Maritime Situational Awareness applications, in particular, to classify and predict activities. Facilitated by the recent build-up of terrestrial networks and satellite constellations of Automatic Identification System (AIS) receivers, ship movement information is becoming increasingly available, both in coastal areas and open waters. The resulting amount of information is increasingly overwhelming to human operators, requiring the aid of automatic processing to synthesize the behaviors of interest in a clear and effective way. Although AIS data are only legally required for larger vessels, their use is growing, and they can be effectively used to infer different levels of contextual information, from the characterization of ports and off-shore platforms to spatial and temporal distributions of routes. An unsupervised and incremental learning approach to the extraction of maritime movement patterns is presented here to convert from raw data to information supporting decisions. This is a basis for automatically detecting anomalies and projecting current trajectories and patterns into the future. The proposed methodology, called TREAD (Traffic Route Extraction and Anomaly Detection), was developed for different levels of intermittency (i.e., sensor coverage and performance), persistence (i.e., time lag between subsequent observations) and data sources (i.e., ground-based and space-based receivers).
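
The paper itself does not include code, but the incremental flavor of route extraction can be illustrated with a toy online clusterer that absorbs each AIS position report into the nearest waypoint cluster or starts a new one. This is a hedged sketch with invented names and thresholds, not the TREAD algorithm:

```python
import math

class IncrementalWaypointClusterer:
    """Toy online clustering of AIS positions into waypoint clusters.

    Each new (lat, lon) report is absorbed by the nearest cluster if it
    lies within `radius_km`; otherwise it seeds a new cluster. This only
    illustrates the incremental idea, not TREAD itself.
    """

    def __init__(self, radius_km=5.0):
        self.radius_km = radius_km
        self.clusters = []  # each cluster is [lat, lon, count]

    @staticmethod
    def _haversine_km(lat1, lon1, lat2, lon2):
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def add(self, lat, lon):
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = self._haversine_km(lat, lon, c[0], c[1])
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius_km:
            # running-mean update of the cluster centroid
            best[2] += 1
            best[0] += (lat - best[0]) / best[2]
            best[1] += (lon - best[1]) / best[2]
        else:
            self.clusters.append([lat, lon, 1])

clusterer = IncrementalWaypointClusterer()
for lat, lon in [(43.10, 9.80), (43.11, 9.81), (36.50, 14.20)]:
    clusterer.add(lat, lon)
print(len(clusterer.clusters))  # -> 2 waypoint clusters
```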

522 citations


Journal ArticleDOI
TL;DR: This paper surveys discretization methods, whose main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data.
Abstract: Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones, by associating categorical values with intervals and thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied over continuous data and the representation of information is simplified, making it more concise and specific. The literature provides numerous proposals of discretization, and some attempts to categorize them into a taxonomy can be found. However, in previous papers there is a lack of consensus in the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers have been widely considered, while many other methods have gone unnoticed. With the intention of alleviating these problems, this paper provides a survey of discretization methods proposed in the literature from a theoretical and empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, the best-performing discretizers are highlighted.
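
As a concrete illustration, the two simplest discretizers covered by such taxonomies, equal-width and equal-frequency binning, can be sketched in a few lines (illustrative code, not taken from the survey):

```python
import numpy as np

def equal_width(values, k):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    edges = [lo + i * (hi - lo) / k for i in range(1, k)]
    return [sum(v > e for e in edges) for v in values]  # bin index per value

def equal_frequency(values, k):
    """Choose cut points so each interval holds ~len(values)/k points."""
    edges = np.quantile(values, [i / k for i in range(1, k)])
    return [int(np.searchsorted(edges, v, side="right")) for v in values]

data = [0.1, 0.2, 0.3, 0.4, 5.0, 9.9]
print(equal_width(data, 3))      # [0, 0, 0, 0, 1, 2]: skewed data crowds bin 0
print(equal_frequency(data, 3))  # [0, 0, 1, 1, 2, 2]: balanced bin populations
```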

419 citations


Journal ArticleDOI
TL;DR: This article introduces the KnowRob knowledge processing system, a system specifically designed to provide autonomous robots with the knowledge needed for performing everyday manipulation tasks, evaluates the system’s scalability, and presents different integrated experiments that show its versatility and comprehensiveness.
Abstract: Autonomous service robots will have to understand vaguely described tasks, such as “set the table” or “clean up”. Performing such tasks as intended requires robots to fully, precisely, and appropriately parameterize their low-level control programs. We propose knowledge processing as a computational resource for enabling robots to bridge the gap between vague task descriptions and the detailed information needed to actually perform those tasks in the intended way. In this article, we introduce the KnowRob knowledge processing system that is specifically designed to provide autonomous robots with the knowledge needed for performing everyday manipulation tasks. The system allows the realization of “virtual knowledge bases”: collections of knowledge pieces that are not explicitly represented but computed on demand from the robot's internal data structures, its perception system, or external sources of information. This article gives an overview of the different kinds of knowledge, the different inference mechanisms, and interfaces for acquiring knowledge from external sources, such as the robot's perception system, observations of human activities, Web sites on the Internet, as well as Web-based knowledge bases for information exchange between robots. We evaluate the system's scalability and present different integrated experiments that show its versatility and comprehensiveness.
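
The idea of a "virtual knowledge base", knowledge computed on demand rather than stored explicitly, can be illustrated with a toy sketch. KnowRob itself is built around Prolog-based reasoning and is far richer; the class and predicate names below are hypothetical:

```python
class VirtualKB:
    """Toy 'virtual knowledge base': facts are not stored but computed
    on demand by registered providers (stand-ins for a robot's internal
    data structures or perception system). Illustrative only."""

    def __init__(self):
        self._providers = {}

    def register(self, predicate, fn):
        self._providers[predicate] = fn

    def query(self, predicate, *args):
        # Compute the answer at query time instead of looking up a stored fact.
        return self._providers[predicate](*args)

# Hypothetical provider: object poses come from a (fake) perception module.
perceived = {"cup1": (0.4, 1.2, 0.8)}

kb = VirtualKB()
kb.register("pose_of", lambda obj: perceived.get(obj))
kb.register("graspable", lambda obj: obj in perceived and perceived[obj][2] < 1.5)

print(kb.query("pose_of", "cup1"))    # (0.4, 1.2, 0.8)
print(kb.query("graspable", "cup1"))  # True
```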

373 citations


Journal ArticleDOI
TL;DR: A novel method for building the approximate concept lattice of an incomplete context, the notion of an approximate decision rule, and an approach for extracting non-redundant approximate decision rules from an incomplete decision context are presented.

239 citations


Journal ArticleDOI
TL;DR: A semantic model and a computation and annotation platform for developing a semantic approach that progressively transforms the raw mobility data into semantic trajectories enriched with segmentations and annotations are presented.
Abstract: With the large-scale adoption of GPS-equipped mobile sensing devices, positional data generated by moving objects (e.g., vehicles, people, animals) are being easily collected. Such data are typically modeled as streams of spatio-temporal (x,y,t) points, called trajectories. In recent years, trajectory management research has progressed significantly towards efficient storage and indexing techniques, as well as suitable knowledge discovery. These works have focused on the geometric aspect of the raw mobility data. We are now witnessing a growing demand in several application sectors (e.g., from shipment tracking to geo-social networks) for understanding the semantic behavior of moving objects. Semantic behavior refers to the use of semantic abstractions of the raw mobility data, including not only geometric patterns but also knowledge extracted jointly from the mobility data and the underlying geographic and application domain information. The core contribution of this article lies in a semantic model and a computation and annotation platform for developing a semantic approach that progressively transforms the raw mobility data into semantic trajectories enriched with segmentations and annotations. We also report on a number of experiments conducted with semantic trajectories in different domains.
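
A typical first step of such a pipeline is segmenting the raw (x, y, t) stream into stop and move episodes, which are then annotated. The sketch below uses a simple speed threshold; it illustrates the general idea only and is not the authors' platform:

```python
import math

def stop_move_segments(points, speed_threshold=0.5):
    """Split a trajectory of (x, y, t) points into 'stop'/'move' episodes
    based on instantaneous speed. A toy stand-in for the segmentation
    stage of a semantic-trajectory pipeline; the threshold is arbitrary.
    """
    segments, current = [], None
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        label = "move" if speed > speed_threshold else "stop"
        if current is not None and current[0] == label:
            current[1].append((x1, y1, t1))  # extend the running episode
        else:
            current = [label, [(x0, y0, t0), (x1, y1, t1)]]
            segments.append(current)
    return segments

traj = [(0, 0, 0), (0.1, 0, 10), (6, 0, 20), (12, 0, 30)]
for label, pts in stop_move_segments(traj):
    print(label, len(pts))  # stop 2, then move 3; annotation would follow
```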

232 citations


Journal ArticleDOI
TL;DR: This second part of a large survey paper analyzes recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA and uses the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community.
Abstract: This is the second part of a large survey paper in which we analyze recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and keywords. We developed a knowledge browsing environment to support our literature analysis process. We use the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community. In this second part, we zoom in on and give an extensive overview of the papers published between 2003 and 2011 which applied FCA-based methods for knowledge discovery and ontology engineering in various application domains. These domains include software mining, web analytics, medicine, biology and chemistry data.

223 citations


Proceedings ArticleDOI
27 Oct 2013
TL;DR: This paper questions the idea that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals.
Abstract: Much work in knowledge extraction from text tacitly assumes that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals. In this paper, we question this idea, examining the phenomenon of reporting bias and the challenge it poses for knowledge extraction. We conclude with discussion of approaches to learning commonsense knowledge from text despite this distortion.

208 citations


Journal ArticleDOI
TL;DR: A stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data is described.

199 citations


Book ChapterDOI
22 Apr 2013
TL;DR: This paper provides an overview of big data mining and discusses the related challenges and the new opportunities, including a review of state-of-the-art frameworks and platforms for processing and managing big data as well as the efforts expected on big data mining.
Abstract: While "big data" has become a highlighted buzzword since last year, "big data mining", i.e., mining from big data, has almost immediately followed up as an emerging, interrelated research area. This paper provides an overview of big data mining and discusses the related challenges and the new opportunities. The discussion includes a review of state-of-the-art frameworks and platforms for processing and managing big data as well as the efforts expected on big data mining. We address broad issues related to big data and/or big data mining, and point out opportunities and research topics as they shall duly flesh out. We hope our effort will help reshape the subject area of today's data mining technology toward solving tomorrow's bigger challenges emerging in accordance with big data.

184 citations


Journal ArticleDOI
TL;DR: In an assessment using four complex, real-life event logs, it is shown that this technique significantly outperforms currently available trace clustering techniques.
Abstract: Process discovery is the learning task that entails the construction of process models from event logs of information systems. Typically, these event logs are large data sets that record process executions by registering which activity has taken place at which moment in time. By far the most arduous challenge for process discovery algorithms consists of tackling the problem of accurate and comprehensible knowledge discovery from highly flexible environments. Event logs from such flexible systems often contain a large variety of process executions, which makes the application of process mining most interesting. However, simply applying existing process discovery techniques will often yield highly incomprehensible process models because of their inaccuracy and complexity. With respect to resolving this problem, trace clustering is one very interesting approach, since it makes it possible to split up an existing event log so as to facilitate the knowledge discovery process. In this paper, we propose a novel trace clustering technique that differs significantly from previous approaches. Above all, it starts from the observation that currently available techniques suffer from a large divergence between the clustering bias and the evaluation bias. By employing an active-learning-inspired approach, this bias divergence is resolved. In an assessment using four complex, real-life event logs, it is shown that our technique significantly outperforms currently available trace clustering techniques.
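
The paper's active-learning-inspired technique cannot be reproduced from the abstract, but what trace clustering operates on can be shown with a baseline that groups traces (activity sequences) by edit distance. The greedy scheme and threshold below are illustrative assumptions, not the authors' method:

```python
def edit_distance(a, b):
    """Levenshtein distance between two traces (activity sequences)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def cluster_traces(traces, max_dist=1):
    """Greedy baseline: a trace joins the first cluster whose
    representative is within max_dist edits, else starts a new cluster."""
    clusters = []
    for trace in traces:
        for rep, members in clusters:
            if edit_distance(trace, rep) <= max_dist:
                members.append(trace)
                break
        else:
            clusters.append((trace, [trace]))
    return [members for _, members in clusters]

log = [("a", "b", "c"), ("a", "b", "c", "d"), ("x", "y"), ("x", "z")]
print(cluster_traces(log))  # two clusters of similar process variants
```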

Journal ArticleDOI
TL;DR: This work studies an approach to the underlying multi-relational data mining (MRDM) problem, which relies on formal concept analysis (FCA) as a framework for clustering and classification, describes implementations of RCA, and lists applications to problems from software and knowledge engineering.
Abstract: The processing of complex data is admittedly among the major concerns of knowledge discovery from data (KDD). Indeed, a major part of the data worth analyzing is stored in relational databases and, more recently, on the Web of Data. This clearly underscores the need for Entity-Relationship and RDF compliant data mining (DM) tools. We are studying an approach to the underlying multi-relational data mining (MRDM) problem, which relies on formal concept analysis (FCA) as a framework for clustering and classification. Our relational concept analysis (RCA) extends FCA to the processing of multi-relational datasets, i.e., with multiple sorts of individuals, each provided with its own set of attributes, and relationships among those. Given such a dataset, RCA constructs a set of concept lattices, one per object sort, through an iterative analysis process that converges to a fixed point. In doing so, it abstracts the links between objects into attributes akin to role restrictions from description logics (DLs). We address here key aspects of the iterative calculation, such as the evolution of data descriptions along the iterations and process termination. We describe implementations of RCA and list applications to problems from software and knowledge engineering.
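
RCA builds on plain FCA, whose core construction, the set of all formal concepts of a binary context, can be enumerated by closing the family of object intents under intersection. A minimal sketch of that FCA building block (the iterative, multi-relational part of RCA is not shown):

```python
def formal_concepts(context):
    """Enumerate formal concepts of a binary context given as
    {object: set_of_attributes}. The intents are exactly the intersections
    of object intents (plus the full attribute set); each intent's extent
    is the set of objects possessing all of its attributes.
    """
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attrs}
    changed = True
    while changed:  # close the intent family under intersection
        changed = False
        for g_attrs in map(frozenset, context.values()):
            for intent in list(intents):
                new = intent & g_attrs
                if new not in intents:
                    intents.add(new)
                    changed = True
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

ctx = {"dove": {"flies", "feathered"},
       "ostrich": {"feathered"},
       "bat": {"flies"}}
for extent, intent in sorted(formal_concepts(ctx), key=lambda c: len(c[1])):
    print(set(extent) or "{}", set(intent) or "{}")
```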

Proceedings ArticleDOI
27 Apr 2013
TL;DR: The principles driving design mining, the implementation of the Webzeitgeist architecture, and the new class of data-driven design applications it enables are described.
Abstract: Advances in data mining and knowledge discovery have transformed the way Web sites are designed. However, while visual presentation is an intrinsic part of the Web, traditional data mining techniques ignore render-time page structures and their attributes. This paper introduces design mining for the Web: using knowledge discovery techniques to understand design demographics, automate design curation, and support data-driven design tools. This idea is manifest in Webzeitgeist, a platform for large-scale design mining comprising a repository of over 100,000 Web pages and 100 million design elements. This paper describes the principles driving design mining, the implementation of the Webzeitgeist architecture, and the new class of data-driven design applications it enables.

Journal ArticleDOI
TL;DR: Results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions; some new and interesting results with data mining and knowledge discovery techniques applied to a drill production process are reported.
Abstract: Academics and practitioners have a common interest in the continuing development of methods and computer applications that support or perform knowledge-intensive engineering tasks. Operations management dysfunctions and lost production time are problems of enormous magnitude that impact the performance and quality of industrial systems as well as their cost of production. Association rule mining is a data mining technique used to extract useful information from huge databases. This work develops a better conceptual base for improving the application of association rule mining methods to extract knowledge on operations and information management. The emphasis of the paper is on the improvement of the operations processes. The application example details an industrial experiment in which association rule mining is used to analyze the manufacturing process of a fully integrated provider of drilling products. The study reports some new and interesting results with data mining and knowledge discovery techniques applied to a drill production process. Experimental results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions.
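
For readers unfamiliar with the technique, a minimal support/confidence rule miner over toy transactions looks as follows; the naive candidate enumeration and the event names are illustrative, not the paper's setup:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive frequent-itemset miner: checks every candidate combination.
    Fine for toy data; real systems use Apriori/FP-growth pruning."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= min_support:
                frequent[frozenset(cand)] = support
                found = True
        if not found:  # no frequent sets of this size implies none larger
            break
    return frequent

def rules(frequent, min_confidence):
    """Emit X -> Y rules with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                if lhs in frequent and supp / frequent[lhs] >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp / frequent[lhs]))
    return out

# Toy stand-in for process logs: each transaction lists observed events.
log = [{"tool_worn", "vibration", "defect"},
       {"tool_worn", "defect"},
       {"vibration"},
       {"tool_worn", "vibration", "defect"}]
for lhs, rhs, conf in rules(frequent_itemsets(log, 0.5), 0.8):
    print(lhs, "->", rhs, f"(conf={conf:.2f})")
```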

Journal ArticleDOI
TL;DR: Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects and are necessary for rapid access to, and sharing of knowledge through data mining and knowledge discovery approaches.
Abstract: Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects. The application of advanced computer models enabling the simulation of complex biological processes generates hypotheses and suggests experiments. Appropriately interfaced with biomedical databases, models are necessary for rapid access to, and sharing of knowledge through data mining and knowledge discovery approaches.

Journal ArticleDOI
TL;DR: The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods that allow a more efficient passage from (unstructured) textual information to machine-processable data, in potentially any domain.
Abstract: The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.

Journal ArticleDOI
TL;DR: This work proposes a new structure–activity relationship (SAR) approach to mine molecular fragments that act as structural alerts for biological activity, and has been tested on the mutagenicity endpoint, showing marked prediction skills and bringing to the surface much of the knowledge already collected in the literature as well as new evidence.
Abstract: This work proposes a new structure–activity relationship (SAR) approach to mine molecular fragments that act as structural alerts for biological activity. The entire process is designed to fit with human reasoning, not only to make the predictions more reliable but also to permit clear control by the user in order to meet customized requirements. This approach has been tested on the mutagenicity endpoint, showing marked prediction skills and, more interestingly, bringing to the surface much of the knowledge already collected in the literature as well as new evidence.
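
A drastically simplified sketch of the fragment-mining idea: given molecules represented as sets of precomputed fragment identifiers with activity labels, flag fragments enriched among actives. Real structural-alert mining matches substructures with a cheminformatics toolkit; the thresholds and fragment names here are invented:

```python
def structural_alerts(molecules, min_occurrences=3, min_active_ratio=0.8):
    """Flag fragments over-represented in active (e.g., mutagenic) molecules.
    `molecules` is a list of (set_of_fragment_ids, is_active) pairs; the
    fragment sets stand in for real substructure matches (RDKit etc.).
    Thresholds are illustrative, not from the paper.
    """
    counts = {}
    for fragments, is_active in molecules:
        for f in fragments:
            total, active = counts.get(f, (0, 0))
            counts[f] = (total + 1, active + int(is_active))
    return {f: active / total
            for f, (total, active) in counts.items()
            if total >= min_occurrences and active / total >= min_active_ratio}

data = [({"nitro", "ring"}, True), ({"nitro"}, True),
        ({"nitro", "amine"}, True), ({"ring"}, False), ({"amine"}, False)]
print(structural_alerts(data))  # {'nitro': 1.0} -> candidate alert
```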

Book ChapterDOI
02 Sep 2013
TL;DR: A novel approach is to combine HCI & KDD in order to enhance human intelligence by computational intelligence in the life sciences domain.
Abstract: A major challenge in our networked world is the increasing amount of data, which require efficient and user-friendly solutions. A timely example is the biomedical domain: the trend towards personalized medicine has resulted in a sheer mass of the generated (-omics) data. In the life sciences domain, most data models are characterized by complexity, which makes manual analysis very time-consuming and frequently practically impossible. Computational methods may help; however, we must acknowledge that the problem-solving knowledge is located in the human mind and - not in machines. A strategic aim to find solutions for data intensive problems could lay in the combination of two areas, which bring ideal pre-conditions: Human-Computer Interaction (HCI) and Knowledge Discovery (KDD). HCI deals with questions of human perception, cognition, intelligence, decision-making and interactive techniques of visualization, so it centers mainly on supervised methods. KDD deals mainly with questions of machine intelligence and data mining, in particular with the development of scalable algorithms for finding previously unknown relationships in data, thus centers on automatic computational methods. A proverb attributed perhaps incorrectly to Albert Einstein illustrates this perfectly: “Computers are incredibly fast, accurate, but stupid. Humans are incredibly slow, inaccurate, but brilliant. Together they may be powerful beyond imagination”. Consequently, a novel approach is to combine HCI & KDD in order to enhance human intelligence by computational intelligence.

Journal ArticleDOI
TL;DR: A knowledge acquisition and representation approach using the fuzzy evidential reasoning approach and dynamic adaptive FPNs is presented to capture domain experts' diverse experience and reason over rule-based knowledge more intelligently.
Abstract: The two most important issues of expert systems are the acquisition of domain experts' professional knowledge and the representation and reasoning of the knowledge rules that have been identified. First, during expert knowledge acquisition processes, the members of the domain expert panel often differ from one another in experience and knowledge and, owing to the panel's cross-functional and multidisciplinary nature, produce different types of knowledge information: complete and incomplete, precise and imprecise, and known and unknown. Second, as a promising tool for knowledge representation and reasoning, fuzzy Petri nets (FPNs) still suffer from a couple of deficiencies. The parameters in current FPN models cannot accurately represent increasingly complex knowledge-based systems, and the rules in most existing knowledge inference frameworks cannot be dynamically adjusted according to propositions' variation, as human cognition and thinking can. In this paper, we present a knowledge acquisition and representation approach using the fuzzy evidential reasoning approach and dynamic adaptive FPNs to solve the problems mentioned above. As illustrated by the numerical example, the proposed approach can well capture experts' diverse experience, enhance the knowledge representation power, and reason over rule-based knowledge more intelligently.
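
The basic FPN-style firing rule (a transition fires with the minimum of its input truth degrees scaled by the rule's certainty factor, and an output place keeps the maximum over incoming rules) can be sketched as follows. This toy omits the paper's fuzzy evidential reasoning and dynamic adaptation:

```python
def fire_rules(truth, rules, passes=3):
    """Tiny fuzzy-Petri-net-style inference sketch.

    truth: {proposition: degree in [0, 1]}
    rules: list of (antecedents, consequent, certainty_factor); a rule
    contributes min(antecedent degrees) * cf to its consequent, and the
    consequent keeps the max over all contributions. Repeated passes
    propagate chained rules. Illustrative only.
    """
    for _ in range(passes):
        for antecedents, consequent, cf in rules:
            degree = min(truth.get(a, 0.0) for a in antecedents) * cf
            if degree > truth.get(consequent, 0.0):
                truth[consequent] = degree
    return truth

truth = {"high_temp": 0.9, "high_pressure": 0.7}
rules = [({"high_temp", "high_pressure"}, "valve_fault", 0.8),
         ({"valve_fault"}, "shutdown_advised", 0.9)]
print(fire_rules(truth, rules))
# valve_fault = min(0.9, 0.7) * 0.8 = 0.56; shutdown_advised = 0.504
```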

Proceedings ArticleDOI
27 Oct 2013
TL;DR: A framework to leverage general knowledge in topic models, called GK-LDA, is proposed; it is able to effectively exploit the knowledge of lexical relations in dictionaries and is the first such model that can incorporate domain-independent knowledge.
Abstract: Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed to incorporate prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case because in most real-life applications, the user wants to find what they do not know. In this paper, we propose a framework to leverage general knowledge in topic models. Such knowledge is domain independent. Specifically, we use one form of general knowledge, i.e., lexical semantic relations of words such as synonyms, antonyms and adjective attributes, to help produce more coherent topics. However, there is a major obstacle: a word can have multiple meanings/senses, and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain. Wrong knowledge can result in poor quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.
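
GK-LDA itself is a custom model, but the plain (knowledge-free) LDA it augments with lexical-relation knowledge can be run with scikit-learn out of the box; the toy corpus below is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy review snippets; a real run would use a corpus of product reviews.
docs = ["battery life is great and charging is fast",
        "screen is sharp but battery drains quickly",
        "shipping was slow and the box arrived damaged",
        "fast delivery, well packaged box"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Plain LDA baseline: no lexical-relation knowledge, unlike GK-LDA.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]  # four highest-weight terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```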

Journal ArticleDOI
TL;DR: A data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions is developed and demonstrated by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein (LDL) cholesterol test result during the measurement year was <100 mg/dL.

Proceedings Article
03 Aug 2013
TL;DR: This paper proposes a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains, and its evaluation results will demonstrate its effectiveness.
Abstract: Topic models have been widely used to identify topics in text corpora. It is also known that purely unsupervised models often result in topics that are not comprehensible in applications. In recent years, a number of knowledge-based models have been proposed, which allow the user to input prior knowledge of the domain to produce more coherent and meaningful topics. In this paper, we go one step further to study how prior knowledge from other domains can be exploited to help topic modeling in the new domain. This problem setting is important from both the application and the learning perspectives because knowledge is inherently accumulative. We human beings gain knowledge gradually and use old knowledge to help solve new problems. Existing models, however, have major difficulties achieving this objective. In this paper, we propose a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains. Our evaluation results demonstrate its effectiveness.

Proceedings ArticleDOI
11 Aug 2013
TL;DR: This paper translates the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community, namely social network analysis, text mining, temporal analysis, and higher-order feature construction, and describes how advances within each of these areas can be leveraged to understand the domain of healthcare.
Abstract: The role of big data in addressing the needs of the present healthcare system in the US and the rest of the world has been echoed by government, private, and academic sectors. There has been a growing emphasis on exploring the promise of big data analytics in tapping the potential of the massive healthcare data emanating from private and government health insurance providers. While the domain implications of such collaboration are well known, this type of data has been explored to a limited extent in the data mining community. The objective of this paper is twofold: first, we introduce the emerging domain of "big" healthcare claims data to the KDD community, and second, we describe the successes and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data. Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community, namely social network analysis, text mining, temporal analysis, and higher-order feature construction, and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. Each case study illustrates a unique intersection of data mining and healthcare with the common objective of improving the cost-care ratio by mining for opportunities to improve healthcare operations and reduce what seems to fall under fraud, waste, and abuse.

Journal ArticleDOI
TL;DR: This paper formulates data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode.
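
The coupling idea can be sketched in its simplest form, two matrices sharing the factor of their common mode, fitted by alternating least squares. The paper's CMTF handles higher-order tensors as well; this matrix-only sketch merely illustrates the shared latent structure:

```python
import numpy as np

def coupled_mf(X1, X2, rank, iters=200):
    """Jointly factorize X1 ~ A @ B.T and X2 ~ A @ C.T with a shared A
    (alternating least squares). A matrix-only simplification of coupled
    matrix-tensor factorization; A carries the common latent structure.
    """
    rng = np.random.default_rng(0)
    B = rng.standard_normal((X1.shape[1], rank))
    C = rng.standard_normal((X2.shape[1], rank))
    for _ in range(iters):
        # A minimizes ||X1 - A B^T||^2 + ||X2 - A C^T||^2
        A = np.linalg.solve(B.T @ B + C.T @ C, (X1 @ B + X2 @ C).T).T
        B = np.linalg.solve(A.T @ A, (X1.T @ A).T).T
        C = np.linalg.solve(A.T @ A, (X2.T @ A).T).T
    return A, B, C

# Toy data built from a shared latent factor A_true.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((20, 2))
X1 = A_true @ rng.standard_normal((2, 8))
X2 = A_true @ rng.standard_normal((2, 6))
A, B, C = coupled_mf(X1, X2, rank=2)
print(np.linalg.norm(X1 - A @ B.T), np.linalg.norm(X2 - A @ C.T))  # both ~ 0
```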

Journal ArticleDOI
TL;DR: This work proposes a novel in-network knowledge discovery approach that provides outlier detection and data clustering simultaneously and shows that the proposed algorithm outperforms other techniques in both effectiveness and efficiency.

Proceedings Article
01 Oct 2013
TL;DR: A more advanced topic model, called MC-LDA (LDA with m-set and c-set), is proposed, which is based on an Extended generalized Polya urn (E-GPU) model (which is also proposed in this paper).
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Polya urn (E-GPU) model (also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

Proceedings ArticleDOI
16 Jun 2013
TL;DR: The proposed methodology has performed better than the existing system in terms of tweet classification and sentiment analysis, and the increase in information gain has enabled the proposed system to better summarize the Twitter data for user sentiments regarding a keyword from a particular category.
Abstract: The rise of social media over the past couple of years has changed the general perspective on networking, socialization, and personalization. The use of data from social networks for different purposes, such as election prediction, sentiment analysis, marketing, communication, business, and education, is increasing day by day. Precise extraction of valuable information from short text messages posted on social media (Twitter) is a challenging task. In this paper, we analyze tweets to classify data and sentiments from Twitter more precisely. The information from tweets is extracted using keyword-based knowledge extraction. Moreover, the extracted knowledge is further enhanced using a domain-specific seed-based enrichment technique. The proposed methodology facilitates the extraction of keywords, entities, synonyms, and parts of speech from tweets, which are then used for tweet classification and sentiment analysis. The proposed system is tested on a collection of 40,000 tweets. The proposed methodology has performed better than the existing system in terms of tweet classification and sentiment analysis. By applying the Knowledge Enhancer and Synonym Binder modules to the extracted information, we have achieved an increase in information gain ranging from 0.1% to 55%. The increase in information gain has enabled our proposed system to better summarize the Twitter data for user sentiments regarding a keyword from a particular category.
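
A toy version of the keyword-plus-enrichment pipeline: tokens are expanded with known synonyms before being matched against per-category seed keywords. The mapping to the paper's Knowledge Enhancer and Synonym Binder modules is loose, and the seed lists are invented:

```python
def classify_tweet(tweet, category_seeds, synonyms):
    """Toy keyword-based tweet classifier with seed enrichment.

    category_seeds: {category: set of seed keywords}; synonyms maps a
    keyword to equivalent surface forms. Stand-ins for keyword extraction
    and synonym-based enrichment; illustrative only.
    """
    tokens = set(tweet.lower().split())
    # enrich tokens with known synonyms (domain-specific seed enrichment)
    enriched = set(tokens)
    for word, syns in synonyms.items():
        if tokens & set(syns):
            enriched.add(word)
    scores = {cat: len(enriched & seeds) for cat, seeds in category_seeds.items()}
    return max(scores, key=scores.get), scores

seeds = {"sports": {"match", "goal", "team"}, "politics": {"vote", "election"}}
syns = {"election": ["polls", "ballot"], "goal": ["scored"]}
print(classify_tweet("What a game, they scored twice!", seeds, syns))
# ('sports', {'sports': 1, 'politics': 0})
```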

Journal ArticleDOI
TL;DR: Two incremental algorithms for updating the approximations in disjunctive/conjunctive set-valued information systems are proposed; results indicate that the incremental approaches significantly outperform non-incremental approaches, with a dramatic reduction in computation time.
Abstract: Incremental learning is an efficient technique for knowledge discovery in a dynamic database, which enables acquiring additional knowledge from new data without forgetting prior knowledge. Rough set theory has been successfully used in information systems for classification analysis. Set-valued information systems are generalized models of single-valued information systems, which can be classified into two categories: disjunctive and conjunctive. Approximations are fundamental concepts of rough set theory, which need to be updated incrementally while the object set varies over time in set-valued information systems. In this paper, we analyze the updating mechanisms for computing approximations with the variation of the object set. Two incremental algorithms for updating the approximations in disjunctive/conjunctive set-valued information systems are proposed, respectively. Furthermore, extensive experiments are carried out on several data sets to verify the performance of the proposed algorithms. The results indicate that the incremental approaches significantly outperform non-incremental approaches, with a dramatic reduction in computation time.
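
The quantities these incremental algorithms keep up to date are the classical lower and upper approximations. A non-incremental sketch of their definition (equivalence classes under an indiscernibility relation), with invented attribute data:

```python
def approximations(universe, attrs, target):
    """Rough-set lower/upper approximations of `target` (a set of objects)
    under the indiscernibility relation induced by `attrs`: two objects
    are indiscernible iff they agree on every listed attribute.
    """
    blocks = {}
    for obj, desc in universe.items():
        blocks.setdefault(tuple(desc[a] for a in attrs), set()).add(obj)
    lower = set().union(*[b for b in blocks.values() if b <= target])
    upper = set().union(*[b for b in blocks.values() if b & target])
    return lower, upper

universe = {1: {"color": "red", "size": "s"},
            2: {"color": "red", "size": "s"},
            3: {"color": "blue", "size": "l"},
            4: {"color": "blue", "size": "s"}}
print(approximations(universe, ["color", "size"], {1, 3}))
# lower = {3}: only block {3} fits inside the target; upper = {1, 2, 3}
```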

Journal ArticleDOI
01 Apr 2013
TL;DR: A data mining service addressed to non-expert data miners, which can be delivered as Software-as-a-Service, whose main advantage is that by simply indicating where the data file is, the service itself is able to perform the entire process.
Abstract: In today's competitive market, companies need to use knowledge discovery techniques to make better, more informed decisions. But these techniques are out of the reach of most users, as the knowledge discovery process requires an incredible amount of expertise. Additionally, business intelligence vendors are moving their systems to the cloud in order to provide services which offer companies cost savings, better performance, and faster access to new applications. This work joins both facets. It describes a data mining service addressed to non-expert data miners which can be delivered as Software-as-a-Service. Its main advantage is that by simply indicating where the data file is, the service itself is able to perform the entire process.

Journal ArticleDOI
TL;DR: A novel way is proposed to identify the "best" features efficiently and accurately by first combining the results of some well-known FS techniques to find consistent features, and then using the proposed concept of support to select a smallest set of features that covers the data optimally.
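
The "consistent features" step can be illustrated with a simple vote across the outputs of several FS techniques. The paper's support-based covering step is not reproduced here, and the feature names are hypothetical:

```python
from collections import Counter

def consensus_features(rankings, min_votes=2):
    """Toy ensemble feature selection: a feature is 'consistent' if it
    appears in at least min_votes of the individual FS results. A rough
    stand-in for combining well-known FS techniques."""
    votes = Counter(f for selected in rankings for f in selected)
    return [f for f, v in votes.items() if v >= min_votes]

# Hypothetical outputs of three FS techniques (e.g., IG, chi2, ReliefF).
rankings = [{"f1", "f3", "f7"}, {"f1", "f2", "f3"}, {"f3", "f7", "f9"}]
print(consensus_features(rankings))  # ['f1', 'f3', 'f7'] (order may vary)
```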