
Showing papers on "Knowledge extraction published in 2015"


Journal ArticleDOI
TL;DR: A thorough overview and analysis of the main approaches to entity linking is presented, and various applications, the evaluation of entity linking systems, and future directions are discussed.
Abstract: The large number of potential applications from bridging web data with knowledge bases has led to an increase in entity linking research. Entity linking is the task of linking entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.

702 citations
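The core task summarized in the entry above, resolving an ambiguous mention against a knowledge base despite name variation, can be illustrated with a minimal sketch: generate candidates from an alias dictionary, then rank them by prior popularity plus context overlap. The alias table, priors, and mention below are invented for illustration and are not from the surveyed systems.

```python
# Minimal entity-linking sketch: dictionary-based candidate generation
# followed by ranking on prior popularity + context word overlap.
# The alias table and priors below are invented for illustration only.

ALIASES = {
    "jaguar": [
        {"entity": "Jaguar_(animal)", "prior": 0.35,
         "context": {"cat", "wild", "amazon", "predator"}},
        {"entity": "Jaguar_Cars", "prior": 0.55,
         "context": {"car", "vehicle", "british", "luxury"}},
        {"entity": "Jacksonville_Jaguars", "prior": 0.10,
         "context": {"nfl", "football", "team"}},
    ],
}

def link(mention: str, context_words: set[str]) -> str | None:
    """Return the best-scoring candidate entity for a mention, or None."""
    candidates = ALIASES.get(mention.lower(), [])
    best, best_score = None, float("-inf")
    for cand in candidates:
        overlap = len(context_words & cand["context"])
        score = cand["prior"] + 0.5 * overlap   # simple linear combination
        if score > best_score:
            best, best_score = cand["entity"], score
    return best

print(link("Jaguar", {"the", "new", "luxury", "car", "model"}))
# -> Jaguar_Cars
```

Real systems replace the hand-built dictionary with mention-entity statistics mined from large corpora and use far richer disambiguation features, but the candidate-then-rank structure is the same.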


Journal ArticleDOI
TL;DR: This paper surveys self-labeled methods for semi-supervised classification, proposes a taxonomy based on their main characteristics, and empirically measures their performance in terms of transductive and inductive classification capabilities.
Abstract: Semi-supervised classification methods are suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. This problem has been addressed by several approaches with different assumptions about the characteristics of the input data. Among them, self-labeled techniques follow an iterative procedure that aims to obtain an enlarged labeled data set by accepting that their own predictions tend to be correct. In this paper, we provide a survey of self-labeled methods for semi-supervised classification. From a theoretical point of view, we propose a taxonomy based on their main characteristics. Empirically, we conduct an exhaustive study that involves a large number of data sets, with different ratios of labeled data, aiming to measure their performance in terms of transductive and inductive classification capabilities. The results are contrasted with nonparametric statistical tests, and the best-performing self-labeled models are identified. Moreover, a semi-supervised learning module has been developed for the Knowledge Extraction based on Evolutionary Learning (KEEL) software, integrating the analyzed methods and data sets.

457 citations
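The iterative procedure described in the abstract above, enlarging the labeled set with the model's own confident predictions, corresponds to the classic self-training family. Below is a generic self-training sketch using scikit-learn on synthetic data; it is an illustration of the idea, not an implementation of any specific method from the taxonomy or of the KEEL module.

```python
# Generic self-training sketch (one family of self-labeled methods):
# iteratively accept the classifier's most confident predictions as labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=30, replace=False)       # small labeled set
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

X_l, y_l = X[labeled], y[labeled]
X_u = X[unlabeled]

for _ in range(10):                       # self-training iterations
    clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    if len(X_u) == 0:
        break
    proba = clf.predict_proba(X_u)
    conf = proba.max(axis=1)
    keep = conf >= 0.95                   # accept only confident predictions
    if not keep.any():
        break
    X_l = np.vstack([X_l, X_u[keep]])
    y_l = np.concatenate([y_l, proba.argmax(axis=1)[keep]])
    X_u = X_u[~keep]

print("final labeled-set size:", len(X_l))
```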


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes an architecture to create a flexible and scalable machine learning as a service, using real-world sensor and weather data by running different algorithms at the same time.
Abstract: The demand for knowledge extraction has been increasing. With the growing amount of data being generated by global data sources (e.g., social media and mobile apps) and the popularization of context-specific data (e.g., the Internet of Things), companies and researchers need to connect all these data and extract valuable information. Machine learning has been gaining much attention in data mining, driving the emergence of new solutions. This paper proposes an architecture to create flexible and scalable machine learning as a service. An open source implementation is presented. As a case study, a forecast of electricity demand was generated using real-world sensor and weather data by running different algorithms at the same time.

281 citations
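The case study above runs several algorithms on the same sensor and weather features and compares their forecasts. A minimal sketch of that "different algorithms at the same time" idea follows, using scikit-learn regressors on synthetic stand-in data; the service layer, data sources, and algorithm choices of the paper's architecture are not reproduced.

```python
# Minimal sketch of running several forecasting algorithms side by side
# on the same feature set (synthetic stand-in for sensor/weather data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
temp = rng.normal(20, 8, 1000)             # temperature (synthetic)
hour = rng.integers(0, 24, 1000)           # hour of day
demand = (300 + 5 * np.abs(temp - 18)
          + 20 * np.sin(hour / 24 * 2 * np.pi)
          + rng.normal(0, 10, 1000))       # synthetic electricity demand

X = np.column_stack([temp, hour])
X_tr, X_te, y_tr, y_te = train_test_split(X, demand, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name:18s} MAE = {mae:.1f}")
```

In a service setting the models would typically be trained concurrently (separate workers or jobs) rather than in a loop, with the comparison step selecting the model to serve.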


Journal ArticleDOI
TL;DR: This paper focuses on identifying slow learners among students and highlighting them with a predictive data mining model based on classification algorithms; a knowledge flow model spanning all five classifiers is also shown.

212 citations


Journal ArticleDOI
TL;DR: This review consolidates the surveyed papers according to the disciplines, models, tasks, and methods involved in data mining, comparing them in terms of methods, algorithms, and results.

209 citations


Journal ArticleDOI
TL;DR: This study investigates the use of visualization techniques reported between 1996 and 2013 and evaluates innovative approaches to information visualization of electronic health record (EHR) data for knowledge discovery.

202 citations


Journal ArticleDOI
TL;DR: A generic framework for knowledge discovery in massive BAS (building automation system) data using DM techniques is presented; it is specifically designed considering the low quality and complexity of BAS data, the diversity of advanced DM techniques, and the integration of knowledge discovered by DM techniques with domain knowledge in the building field.

176 citations


Proceedings ArticleDOI
01 Feb 2015
TL;DR: This survey paper investigates why ontology has the potential to help semantic data mining and how formal semantics in ontologies can be incorporated into the data mining process.
Abstract: Semantic Data Mining refers to the data mining tasks that systematically incorporate domain knowledge, especially formal semantics, into the process. In the past, many research efforts have attested the benefits of incorporating domain knowledge in data mining. At the same time, the proliferation of knowledge engineering has enriched the family of domain knowledge, especially formal semantics and Semantic Web ontologies. An ontology is an explicit specification of a conceptualization and a formal way to define the semantics of knowledge and data. The formal structure of ontologies makes them a natural way to encode domain knowledge for data mining. In this survey paper, we introduce general concepts of semantic data mining. We investigate why ontology has the potential to help semantic data mining and how formal semantics in ontologies can be incorporated into the data mining process. We provide detailed discussions of the advances and state of the art of ontology-based approaches, and an introduction to approaches based on other forms of knowledge representation.

157 citations
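One common way formal semantics enters the mining process is to generalize raw items along a class hierarchy before pattern mining, so that patterns emerge at the concept level rather than the instance level. The tiny taxonomy and transactions below are invented for illustration; real semantic data mining approaches work over OWL/RDF ontologies rather than a hand-written dictionary.

```python
# Sketch: use a class hierarchy (a stand-in for an ontology) to generalize
# transaction items before counting frequent patterns.
from collections import Counter
from itertools import combinations

# hypothetical subclass-of relations
PARENT = {"espresso": "coffee", "latte": "coffee",
          "croissant": "pastry", "muffin": "pastry"}

transactions = [
    {"espresso", "croissant"},
    {"latte", "muffin"},
    {"espresso", "muffin"},
]

def generalize(items: set[str]) -> set[str]:
    """Lift each item to its parent class when one is defined."""
    return {PARENT.get(item, item) for item in items}

pair_counts = Counter()
for t in transactions:
    g = generalize(t)
    pair_counts.update(combinations(sorted(g), 2))

print(pair_counts.most_common(1))
# -> [(('coffee', 'pastry'), 3)]  the generalized pattern appears in every basket
```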


Journal ArticleDOI
TL;DR: A definition of big data in healthcare is proposed: big data is defined by volume, as datasets with log(n*p) ≥ 7, and is further characterized by great variety and high velocity.
Abstract: Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals and the number of variables for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with log(n*p) ≥ 7, where n is the number of statistical individuals and p the number of variables. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Inversely, data can be reused without being necessarily big, for example, secondary use of Electronic Medical Records (EMR) data.

152 citations
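The volume criterion above, datasets with log(n*p) ≥ 7 for n statistical individuals and p variables, is straightforward to check; the logarithm is read here as base 10, matching the order-of-magnitude framing, and the example sizes below are illustrative.

```python
# Check the volume criterion log10(n * p) >= 7 from the definition above.
import math

def is_big_data(n_individuals: int, n_variables: int, threshold: float = 7.0) -> bool:
    return math.log10(n_individuals * n_variables) >= threshold

# e.g., an omics dataset: 5,000 patients x 20,000 gene-expression variables
print(is_big_data(5_000, 20_000))      # log10(1e8) = 8.0  -> True
# a typical EMR extract: 10,000 patients x 50 variables
print(is_big_data(10_000, 50))         # log10(5e5) ~ 5.7  -> False
```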


Posted Content
TL;DR: The evolution of big data computing, differences between traditional data warehousing and big data, taxonomy of big data computing and underpinning technologies, integrated platform of big data and clouds known as big data clouds, layered architecture and components of big data cloud, and finally open technical challenges and future directions are discussed.
Abstract: Advances in information technology and its widespread growth in several areas of business, engineering, medical and scientific studies are resulting in an information/data explosion. Knowledge discovery and decision making from such rapidly growing voluminous data is a challenging task in terms of data organization and processing, an emerging trend known as Big Data Computing: a new paradigm that combines large-scale compute, new data-intensive techniques and mathematical models to build data analytics. Big Data computing demands huge storage and computing resources for data curation and processing, which could be delivered from on-premise or cloud infrastructures. This paper discusses the evolution of Big Data computing, differences between traditional data warehousing and Big Data, a taxonomy of Big Data computing and underpinning technologies, an integrated platform of Big Data and Clouds known as Big Data Clouds, the layered architecture and components of the Big Data Cloud, and finally open technical challenges and future directions.

148 citations


Journal ArticleDOI
TL;DR: In this paper, a day-typing process that uses Symbolic Aggregate approXimation (SAX), motif and discord extraction, and clustering to detect the underlying structure of building performance data is presented.
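Symbolic Aggregate approXimation (SAX), named in the entry above, converts a z-normalized time series into a short symbol string via piecewise aggregate approximation (PAA) and Gaussian breakpoints; motifs and discords are then found over these words. A minimal SAX sketch for a 4-symbol alphabet follows; it is illustrative and not the authors' day-typing pipeline.

```python
# Minimal SAX sketch: z-normalize, reduce with PAA, map segment means
# to symbols using breakpoints of the standard normal distribution.
import numpy as np

# Breakpoints that split N(0,1) into 4 equiprobable regions (alphabet a-d).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax(series: np.ndarray, n_segments: int) -> str:
    z = (series - series.mean()) / series.std()          # z-normalize
    segments = np.array_split(z, n_segments)             # PAA segments
    means = np.array([seg.mean() for seg in segments])
    symbols = np.searchsorted(BREAKPOINTS, means)         # map to alphabet index
    return "".join(ALPHABET[i] for i in symbols)

t = np.linspace(0, 2 * np.pi, 96)          # e.g., one day at 15-min resolution
daily_load = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=96)
print(sax(daily_load, n_segments=8))       # prints an 8-symbol SAX word
```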

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work introduces the problem of visual verification of relation phrases and develops a Visual Knowledge Extraction system called VisKE, which has been used not only to enrich existing textual knowledge bases by improving their recall, but also to augment open-domain question-answer reasoning.
Abstract: How can we know whether a statement about our world is valid? For example, given a relationship between a pair of entities, e.g., 'eat(horse, hay)', how can we know whether this relationship is true or false in general? Gathering such knowledge about entities and their relationships is one of the fundamental challenges in knowledge extraction. Most previous work on knowledge extraction has focused purely on text-driven reasoning for verifying relation phrases. In this work, we introduce the problem of visual verification of relation phrases and develop a Visual Knowledge Extraction system called VisKE. Given a verb-based relation phrase between common nouns, our approach assesses its validity by jointly analyzing text and images and reasoning about the spatial consistency of the relative configurations of the entities and the relation involved. Our approach involves no explicit human supervision, thereby enabling large-scale analysis. Using our approach, we have already verified over 12000 relation phrases. Our approach has been used not only to enrich existing textual knowledge bases by improving their recall, but also to augment open-domain question-answer reasoning.

Journal ArticleDOI
TL;DR: Through the use of the accelerator, three representative heuristic fuzzy-rough feature selection algorithms have been enhanced and it is shown that these modified algorithms are much faster than their original counterparts.

Journal ArticleDOI
TL;DR: A time series data mining methodology for temporal knowledge discovery in big BAS data to identify dynamics, patterns and anomalies in building operations, derive temporal association rules within and between subsystems, assess building system performance and spot opportunities in energy conservation.

Journal ArticleDOI
TL;DR: A systematic study of the rough set-based discretization (RSBD) techniques found in the literature, which categorizes them into a taxonomy that provides a useful roadmap for new researchers in the area.
Abstract: The extraction of knowledge from a huge volume of data using rough set methods requires the transformation of continuous value attributes to discrete intervals. This paper presents a systematic study of the rough set-based discretization (RSBD) techniques found in the literature and categorizes them into a taxonomy. In the literature, no review is solely based on RSBD. Only a few rough set discretizers have been studied, while many new developments have been overlooked and need to be highlighted. Therefore, this study presents a formal taxonomy that provides a useful roadmap for new researchers in the area of RSBD. The review also elaborates the process of RSBD with the help of a case study. The study of the existing literature focuses on the techniques adapted in each article, the comparison of these with other similar approaches, the number of discrete intervals they produce as output, their effects on classification and the application of these techniques in a domain. The techniques adopted in each article have been considered as the foundation for the taxonomy. Moreover, a detailed analysis of the existing discretization techniques has been conducted while keeping the concept of RSBD applications in mind. The findings are summarized and presented in this paper.
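The transformation the abstract above describes, turning a continuous attribute into discrete intervals before rough set analysis, can be illustrated with a deliberately simple equal-frequency cut-point scheme. This is a generic illustration of the continuous-to-interval step only, not one of the rough set-based discretizers surveyed in the paper, which choose cut points using rough set criteria; the sample values are invented.

```python
# Generic equal-frequency discretization of one continuous attribute
# (illustrates the interval transformation that precedes rough set analysis;
#  RSBD methods select cuts with rough set criteria instead).
import numpy as np

def equal_frequency_cuts(values: np.ndarray, n_intervals: int) -> np.ndarray:
    """Cut points placed at quantiles so each interval holds ~equal counts."""
    qs = np.linspace(0, 1, n_intervals + 1)[1:-1]
    return np.quantile(values, qs)

def discretize(values: np.ndarray, cuts: np.ndarray) -> np.ndarray:
    """Map each value to the index of the interval it falls into."""
    return np.searchsorted(cuts, values)

ages = np.array([19, 23, 25, 31, 38, 42, 47, 55, 61, 70])
cuts = equal_frequency_cuts(ages, n_intervals=3)
print("cut points:", cuts)
print("interval labels:", discretize(ages, cuts))
```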

Journal ArticleDOI
Min Song1, Won Chul Kim1, Dahee Lee1, Go Eun Heo1, Keun Young Kang1 
TL;DR: In this paper, a comprehensive text-mining system is presented that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework; it shows fairly good accuracy as well as the ability to configure text-processing components.

Journal ArticleDOI
TL;DR: A rule-based system capturing the conditions that trigger emotions, grounded in an emotional model and Bayesian probability, is proposed; the experimental results validate the feasibility of the approach.
Abstract: Highlights: we develop a rule-based system that triggers emotions based on the emotional model; we extract the corresponding cause events for fine-grained emotions; we obtain the proportions of different cause components under different emotions; linguistic features and Bayesian probability are used in this paper. Emotion analysis and emotion cause extraction are key research tasks in natural language processing and public opinion mining. This paper presents a rule-based approach to emotion cause component detection for Chinese micro-blogs. Our research has important scientific value for social network knowledge discovery and data mining. It also has great potential for analyzing the psychological processes of consumers. First, this paper proposes a rule-based system capturing the conditions that trigger emotions, based on an emotional model. Second, this paper extracts the corresponding cause events for fine-grained emotions from the results of events, actions of agents, and aspects of objects. The proportions of different cause components under different emotions are obtained by constructing an emotional lexicon and identifying different linguistic features, and the proposed approach is based on Bayesian probability. Finally, this paper presents experiments on an emotion corpus of Chinese micro-blogs. The experimental results validate the feasibility of the approach. Existing problems and further work are also presented at the end.
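The proportion estimates described above amount to conditional probabilities of a cause component given an emotion, estimated from annotated counts. A hedged sketch of that counting step with add-one smoothing follows; the component categories and counts are invented, and the paper's linguistic rules and lexicon are not reproduced.

```python
# Sketch: estimate P(cause component | emotion) from annotated counts,
# with add-one smoothing. Counts and category names are invented.
from collections import defaultdict

# (emotion, cause_component) annotation counts -- hypothetical
counts = {
    ("joy", "event"): 40, ("joy", "agent_action"): 25, ("joy", "object_aspect"): 10,
    ("anger", "event"): 18, ("anger", "agent_action"): 50, ("anger", "object_aspect"): 7,
}
COMPONENTS = ["event", "agent_action", "object_aspect"]

def component_distribution(emotion: str) -> dict[str, float]:
    totals = defaultdict(int)
    for (emo, comp), c in counts.items():
        if emo == emotion:
            totals[comp] += c
    denom = sum(totals.values()) + len(COMPONENTS)        # add-one smoothing
    return {c: (totals[c] + 1) / denom for c in COMPONENTS}

print(component_distribution("anger"))
# agent actions dominate anger causes in this toy data
```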

Journal ArticleDOI
TL;DR: This paper aims to encourage research scientists who do not have extensive programming and data mining knowledge to take advantage of existing data mining tools and to embrace classical data mining and LOD approaches, in support of gaining more insight and recognizing patterns in highly complex data sets.

Journal ArticleDOI
TL;DR: Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets.
Abstract: Algorithmic scalability is a major concern for any machine learning strategy in this age of ‘big data’. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
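The multiplexer problems used as benchmarks above are pure Boolean functions: the first k bits address one of the remaining 2^k bits, and the addressed bit is the class label. A small generator for the 6-multiplexer (k = 2) is sketched below to make the benchmark concrete; it is not part of ExSTraCS itself.

```python
# The 6-multiplexer benchmark: 2 address bits select one of 4 data bits,
# and the selected bit is the class label. Used to test classifier systems.
from itertools import product

def multiplexer(bits: tuple[int, ...], address_bits: int = 2) -> int:
    address = int("".join(map(str, bits[:address_bits])), 2)
    return bits[address_bits + address]

# enumerate the full 6-bit truth table (2^6 = 64 instances)
dataset = [(bits, multiplexer(bits)) for bits in product((0, 1), repeat=6)]
print(len(dataset), "instances; example:", dataset[37])
```

Larger variants (11, 20, 37, 70, 135 bits) follow the same pattern with more address bits, which is what makes them a convenient scalability ladder.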

Journal ArticleDOI
TL;DR: Three different parallel matrix-based methods are introduced to process large-scale, incomplete data; they are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system.
Abstract: As the volume of data grows at an unprecedented rate, large-scale data mining and knowledge discovery present a tremendous challenge. Rough set theory, which has been used successfully in solving problems in pattern recognition, machine learning, and data mining, centers on the idea that a set of distinct objects may be approximated via a lower and upper bound. In order to obtain the benefits that rough sets can provide for data mining and related tasks, efficient computation of these approximations is vital. The recently introduced cloud computing model, MapReduce, has gained a lot of attention from the scientific community for its applicability to large-scale data analysis. In previous research, we proposed a MapReduce-based method for computing approximations in parallel, which can efficiently process complete data but fails in the case of missing (incomplete) data. To address this shortcoming, three different parallel matrix-based methods are introduced to process large-scale, incomplete data. All of them are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system. The proposed parallel methods are then experimentally shown to be efficient for processing large-scale data.
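The lower and upper approximations that the parallel methods above compute can be shown on a tiny, single-machine example: the lower approximation of a target set X collects the equivalence classes fully contained in X, the upper approximation collects those that intersect X. The decision table below is invented, and the MapReduce/Twister parallelization itself is not reproduced.

```python
# Single-machine sketch of rough set lower/upper approximations.
# Objects are grouped into equivalence classes by their attribute values.
from collections import defaultdict

# object id -> condition attribute values (hypothetical decision table)
table = {1: ("red", "s"), 2: ("red", "s"), 3: ("blue", "m"),
         4: ("blue", "m"), 5: ("red", "l")}
target = {1, 2, 3}          # X: objects with some decision value

# build equivalence classes of the indiscernibility relation
classes = defaultdict(set)
for obj, attrs in table.items():
    classes[attrs].add(obj)

lower = {o for c in classes.values() if c <= target for o in c}
upper = {o for c in classes.values() if c & target for o in c}

print("lower approximation:", lower)   # {1, 2}
print("upper approximation:", upper)   # {1, 2, 3, 4}
```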

Journal ArticleDOI
TL;DR: Mass-Up brings knowledge discovery within reach of MALDI-TOF-MS researchers by allowing data preprocessing, as well as subsequent analysis including biomarker discovery, clustering, biclustering and three-dimensional PCA visualization.
Abstract: Mass spectrometry is one of the most important techniques in the field of proteomics. MALDI-TOF mass spectrometry has become popular during the last decade due to its high speed and sensitivity for detecting proteins and peptides. MALDI-TOF-MS can also be used in combination with machine learning techniques and statistical methods for knowledge discovery. Although there are many software libraries and tools that can be combined for this kind of analysis, there is still a need for all-in-one solutions with user-friendly graphical interfaces that avoid the need for programming skills. Mass-Up, an open multiplatform software application for MALDI-TOF-MS knowledge discovery, is herein presented. The Mass-Up software allows data preprocessing, as well as subsequent analysis including (i) biomarker discovery, (ii) clustering, (iii) biclustering, (iv) three-dimensional PCA visualization and (v) classification of large sets of spectra data. Mass-Up brings knowledge discovery within reach of MALDI-TOF-MS researchers. Mass-Up is distributed under license GPLv3 and it is open and free to all users at http://sing.ei.uvigo.es/mass-up .

Journal ArticleDOI
TL;DR: The objective of this paper is to investigate knowledge reduction in formal concept analysis (FCA) and to propose a method based on Non-Negative Matrix Factorization (NMF) for addressing the issue.

Book ChapterDOI
01 Jan 2015
TL;DR: This chapter offers a first exploration of the general potential of Artificial Intelligence Techniques in Human Resource Management; a brief foundation elaborates on the central functionalities of Artificial Intelligence Techniques and the central requirements of Human Resource Management based on the task-technology fit approach.
Abstract: Artificial Intelligence Techniques and its subset, Computational Intelligence Techniques, are not new to Human Resource Management, and since their introduction, a heterogeneous set of suggestions on how to use Artificial Intelligence and Computational Intelligence in Human Resource Management has accumulated. While such contributions offer detailed insights into specific application possibilities, an overview of the general potential is missing. Therefore, this chapter offers a first exploration of the general potential of Artificial Intelligence Techniques in Human Resource Management. To this end, a brief foundation elaborates on the central functionalities of Artificial Intelligence Techniques and the central requirements of Human Resource Management based on the task-technology fit approach. Based on this, the potential of Artificial Intelligence in Human Resource Management is explored in six selected scenarios (turnover prediction with artificial neural networks, candidate search with knowledge-based search engines, staff rostering with genetic algorithms, HR sentiment analysis with text mining, resume data acquisition with information extraction and employee self-service with interactive voice response). The insights gained based on the foundation and exploration are discussed and summarized.

Book ChapterDOI
11 Oct 2015
TL;DR: This paper presents an approach to building knowledge graphs by exploiting semantic technologies to reconcile the data continuously crawled from diverse sources, to scale to billions of triples extracted from the crawled content, and to support interactive queries on the data.
Abstract: There is a huge amount of data spread across the web and stored in databases that we can use to build knowledge graphs. However, exploiting this data to build knowledge graphs is difficult due to the heterogeneity of the sources, scale of the amount of data, and noise in the data. In this paper we present an approach to building knowledge graphs by exploiting semantic technologies to reconcile the data continuously crawled from diverse sources, to scale to billions of triples extracted from the crawled content, and to support interactive queries on the data. We applied our approach, implemented in the DIG system, to the problem of combating human trafficking and deployed it to six law enforcement agencies and several non-governmental organizations to assist them with finding traffickers and helping victims.
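The knowledge-graph construction described above reconciles extractions into RDF triples and supports queries over them. A tiny illustration of that representation using rdflib follows; the namespace, facts, and reconciliation query are invented examples for illustration, not the DIG schema or pipeline.

```python
# Tiny illustration of representing extractions as RDF triples and querying
# them with SPARQL (rdflib). Namespace and facts are invented examples.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/")
g = Graph()

ad = URIRef(EX["ad/123"])
g.add((ad, RDF.type, EX.Advertisement))
g.add((ad, EX.phoneNumber, Literal("555-0100")))
g.add((ad, EX.location, Literal("CityA")))

ad2 = URIRef(EX["ad/456"])
g.add((ad2, RDF.type, EX.Advertisement))
g.add((ad2, EX.phoneNumber, Literal("555-0100")))   # same phone -> linkable
g.add((ad2, EX.location, Literal("CityB")))

# find ads that share a phone number (a simple reconciliation signal)
q = """
PREFIX ex: <http://example.org/>
SELECT ?a ?b WHERE {
  ?a ex:phoneNumber ?p .
  ?b ex:phoneNumber ?p .
  FILTER(STR(?a) < STR(?b))
}
"""
for row in g.query(q):
    print(row.a, "shares a phone number with", row.b)
```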

Journal ArticleDOI
TL;DR: Experimental studies prove the effectiveness of RRW against malicious attacks and show that the proposed technique outperforms existing ones.
Abstract: Advancement in information technology is playing an increasing role in the use of information systems comprising relational databases. These databases are used effectively in collaborative environments for information extraction; consequently, they are vulnerable to security threats concerning ownership rights and data tampering. Watermarking is advocated to enforce ownership rights over shared relational data and to provide a means for tackling data tampering. When ownership rights are enforced using watermarking, the underlying data undergoes certain modifications, as a result of which the data quality gets compromised. Reversible watermarking is employed to ensure data quality along with data recovery. However, such techniques are usually not robust against malicious attacks and do not provide any mechanism to selectively watermark a particular attribute by taking into account its role in knowledge discovery. Therefore, reversible watermarking is required that ensures: (i) watermark encoding and decoding by accounting for the role of all the features in knowledge discovery; and (ii) original data recovery in the presence of active malicious attacks. In this paper, a robust and semi-blind reversible watermarking (RRW) technique for numerical relational data is proposed that addresses the above objectives. Experimental studies prove the effectiveness of RRW against malicious attacks and show that the proposed technique outperforms existing ones.

Journal ArticleDOI
TL;DR: These findings show that introducing simple practices, such as optimal clutch, engine rotation, and engine running in idle, can reduce fuel consumption on average by 3 to 5 l/100 km, meaning a saving of 30 l per bus per day.
Abstract: This paper discusses the results of applied research on the eco-driving domain based on a huge data set produced from a fleet of Lisbon's public transportation buses over a three-year period. This data set is based on events automatically extracted from the controller area network (CAN) bus and enriched with GPS coordinates, weather conditions, and road information. We apply online analytical processing (OLAP) and knowledge discovery (KD) techniques to deal with the high volume of this data set, to determine the major factors that influence the average fuel consumption, and then to classify the drivers involved according to their driving efficiency. Consequently, we identify the most appropriate driving practices and styles. Our findings show that introducing simple practices, such as optimal clutch, engine rotation, and engine running in idle, can reduce fuel consumption on average by 3 to 5 l/100 km, meaning a saving of 30 l per bus per day. These findings have been strongly considered in the drivers' training sessions.

Journal ArticleDOI
TL;DR: An electricity medium voltage (MV) customer characterization framework supported by knowledge discovery in databases (KDD) is presented and a rule set for the automatic classification of new consumers is developed.

Book ChapterDOI
11 Oct 2015
TL;DR: The notion of relatedness explanation is formalized and different criteria are introduced to build explanations based on information theory, diversity, and their combinations, to harness knowledge available in a variety of KGs.
Abstract: Knowledge graphs (KGs) are a key ingredient for searching, browsing and knowledge discovery activities. Motivated by the need to harness knowledge available in a variety of KGs, we face the following two problems. First, given a pair of entities defined in some KG, find an explanation of their relatedness. We formalize the notion of relatedness explanation and introduce different criteria to build explanations based on information theory, diversity and their combinations. Second, given a pair of entities, find other pairs of entities sharing a similar relatedness perspective. We describe an implementation of our ideas in a tool, called RECAP, which is based on RDF and SPARQL. We provide an evaluation of RECAP and a comparison with related systems on real-world data.
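RECAP, as described above, builds relatedness explanations from the connections between two entities in an RDF knowledge graph queried via SPARQL. A hedged sketch of the simplest building block, retrieving the direct predicates linking two DBpedia entities, follows; it is not the RECAP ranking or explanation-selection logic, and the chosen entities are just an example.

```python
# Simplest building block of a relatedness explanation: direct predicates
# linking two entities in DBpedia, retrieved over SPARQL.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
SELECT DISTINCT ?p WHERE {
  { <http://dbpedia.org/resource/Barack_Obama> ?p
      <http://dbpedia.org/resource/United_States> . }
  UNION
  { <http://dbpedia.org/resource/United_States> ?p
      <http://dbpedia.org/resource/Barack_Obama> . }
}
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["p"]["value"])
```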

Journal ArticleDOI
TL;DR: This paper presents the updating properties for dynamic maintenance of approximations when the criteria values in the set-valued decision system evolve with time, and proposes two incremental algorithms corresponding to the addition and removal of criteria values.

Journal ArticleDOI
Francesco Gullo1
TL;DR: A high-level overview of the most prominent tasks and methods that form the basis of data mining is provided.