
Showing papers on "Knowledge extraction published in 2017"


Journal ArticleDOI
TL;DR: This article provides a systematic review of existing knowledge graph embedding techniques, covering not only the state of the art but also the latest trends, organized by the type of information used in the embedding task.
Abstract: Knowledge graph (KG) embedding embeds the components of a KG, including entities and relations, into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state of the art but also the latest trends. In particular, we organize the review by the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.
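Models in the fact-only family typically score a triple with a simple vector operation. Below is a minimal sketch of a translation-based scoring function in the spirit of TransE; the entities, relation, and dimensionality are invented for illustration, and real systems learn the embeddings by minimizing a ranking loss rather than using random vectors.

```python
import numpy as np

# Toy vocabulary of entities and relations (hypothetical examples).
entities = ["Tokyo", "Japan", "Paris", "France"]
relations = ["capital_of"]

rng = np.random.default_rng(0)
dim = 16
ent_emb = {e: rng.normal(size=dim) for e in entities}
rel_emb = {r: rng.normal(size=dim) for r in relations}

def transe_score(head, relation, tail):
    """Translation-based plausibility: higher (less negative) = more plausible.
    Implements f(h, r, t) = -||h + r - t||_2 as in TransE."""
    h, r, t = ent_emb[head], rel_emb[relation], ent_emb[tail]
    return -np.linalg.norm(h + r - t)

# During training, parameters would be adjusted so that observed facts
# score higher than corrupted ones (e.g. via a margin ranking loss).
print(transe_score("Tokyo", "capital_of", "Japan"))
print(transe_score("Paris", "capital_of", "Japan"))
```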

1,905 citations


Journal ArticleDOI
TL;DR: The state of the art of data mining and analytics is reviewed through eight unsupervised learning and ten supervised learning algorithms, as well as the application status of semi-supervised learning algorithms.
Abstract: Data mining and analytics have played an important role in knowledge discovery and decision making/support in the process industry over the past several decades. As the computational engine of data mining and analytics, machine learning serves as a basic tool for information extraction, data pattern recognition and prediction. From the perspective of machine learning, this paper provides a review of existing data mining and analytics applications in the process industry over the past several decades. The state of the art of data mining and analytics is reviewed through eight unsupervised learning and ten supervised learning algorithms, as well as the application status of semi-supervised learning algorithms. Several perspectives are highlighted and discussed for future research on data mining and analytics in the process industry.

657 citations


Journal ArticleDOI
TL;DR: This survey summarizes, categorizes and analyzes those contributions on data preprocessing that cope with streaming data, and takes into account the existing relationships between the different families of methods (feature and instance selection, and discretization).

342 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: This paper reviews data mining techniques and their applications in areas such as educational data mining (EDM), finance, commerce, the life sciences and medicine, and groups existing approaches to determine how data mining can be used in different fields.
Abstract: Data mining is also known as Knowledge Discovery in Databases (KDD). It is defined as the process of extracting interesting, interpretable and useful information from raw data. Many different sources generate raw data in very large amounts, which is the main reason the applications of data mining are increasing rapidly. This paper reviews data mining techniques and their applications in areas such as educational data mining (EDM), finance, commerce, the life sciences and medicine. We group existing approaches to determine how data mining can be used in different fields. Our categorization specifically focuses on the research that has been published over the period 2007–2017. With this categorization, we present an easy and concise view of the different models adapted in data mining.

235 citations


Journal ArticleDOI
TL;DR: The most relevant PPDM techniques from the literature and the metrics used to evaluate such techniques are surveyed, and typical applications of PPDM methods in relevant fields are presented.
Abstract: The collection and analysis of data are continuously growing due to the pervasiveness of computing devices. The analysis of such information is fostering businesses and contributing beneficially to the society in many different fields. However, this storage and flow of possibly sensitive data poses serious privacy concerns. Methods that allow the knowledge extraction from data, while preserving privacy, are known as privacy-preserving data mining (PPDM) techniques. This paper surveys the most relevant PPDM techniques from the literature and the metrics used to evaluate such techniques and presents typical applications of PPDM methods in relevant fields. Furthermore, the current challenges and open issues in PPDM are discussed.
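One family of techniques covered in such surveys perturbs data or query answers before they are mined. The sketch below shows an illustrative perturbation method, adding Laplace noise to a count query in the style of differential privacy; the data, the privacy parameter, and the scenario are assumptions, not a specific method from this survey.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sensitive attribute: 1 = has condition, 0 = does not.
records = rng.integers(0, 2, size=1000)

def noisy_count(values, epsilon=0.5):
    """Release a count perturbed with Laplace noise (sensitivity 1),
    so the exact contribution of any single record is masked."""
    true_count = int(values.sum())
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

print("true count:", int(records.sum()))
print("privacy-preserving count:", round(noisy_count(records), 1))
```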

223 citations


Book ChapterDOI
Bo Xu1, Yong Xu1, Jiaqing Liang1, Chenhao Xie1, Bin Liang1, Wanyun Cui1, Yanghua Xiao1 
27 Jun 2017
TL;DR: A never-ending Chinese knowledge extraction system, CN-DBpedia, which can automatically generate a knowledge base that is ever-increasing in size and constantly updated, and reduces the human costs by reusing the ontology of existing knowledge bases and building an end-to-end fact extraction model.
Abstract: Great efforts have been dedicated to harvesting knowledge bases from online encyclopedias. These knowledge bases play important roles in enabling machines to understand texts. However, most current knowledge bases are in English, and non-English knowledge bases, especially Chinese ones, are still very rare. Many previous systems that extract knowledge from online encyclopedias, although applicable for building a Chinese knowledge base, still suffer from two challenges. The first is that it requires great human effort to construct an ontology and build a supervised knowledge extraction model. The second is that the update frequency of knowledge bases is very slow. To solve these challenges, we propose a never-ending Chinese knowledge extraction system, CN-DBpedia, which can automatically generate a knowledge base that is ever-increasing in size and constantly updated. Specifically, we reduce the human costs by reusing the ontology of existing knowledge bases and building an end-to-end fact extraction model. We further propose a smart active update strategy to keep the freshness of our knowledge base with little human cost. The 164 million API calls of the published services justify the success of our system.
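At its simplest, harvesting an encyclopedia page means turning semi-structured content such as infoboxes into (subject, predicate, object) facts. The sketch below illustrates that step on a hypothetical infobox; the page layout and field names are invented and do not reflect CN-DBpedia's actual ontology or extraction model.

```python
# Hypothetical infobox text scraped from an encyclopedia article.
page_title = "Shanghai"
infobox_lines = [
    "Country = China",
    "Population = 24,870,895",
    "Area = 6,340.5 km2",
]

def extract_facts(subject, lines):
    """Turn 'key = value' infobox lines into (subject, predicate, object)
    triples; real systems map keys to an ontology and normalise values."""
    facts = []
    for line in lines:
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        facts.append((subject, key, value))
    return facts

for fact in extract_facts(page_title, infobox_lines):
    print(fact)
```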

197 citations


Journal ArticleDOI
01 Mar 2017
TL;DR: The five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as well as new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives.
Abstract: The era of big data has resulted in the development and application of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery activities. In this paper, the five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as well as new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives. The role of conceptual modeling for big data is then analyzed, and suggestions are made for effective conceptual modeling efforts with respect to big data.

197 citations


Book ChapterDOI
01 Jan 2017
TL;DR: The objective of this background paper is to describe emerging sources of Big Data, their use in urban research, and the challenges that arise with their use.
Abstract: Big Data is the term being used to describe a wide spectrum of observational or “naturally-occurring” data generated through transactional, operational, planning and social activities that are not specifically designed for research. Due to the structure and access conditions associated with such data, their use for research and analysis becomes significantly complicated. New sources of Big Data are rapidly emerging as a result of technological, institutional, social, and business innovations. The objective of this background paper is to describe emerging sources of Big Data, their use in urban research, and the challenges that arise with their use. To a certain extent, Big Data in the urban context has become narrowly associated with sensor (e.g., Internet of Things) or socially generated (e.g., social media or citizen science) data. However, there are many other sources of observational data that are meaningful to different groups of urban researchers and user communities. Examples include privately held transactions data, confidential administrative micro-data, data from arts and humanities collections, and hybrid data consisting of synthetic or linked data.

168 citations


Journal ArticleDOI
TL;DR: The first outcomes for imbalanced classification in Big Data problems are presented, introducing the current research state of this area and analyzing the behavior of standard pre-processing techniques in this particular framework.
Abstract: Big Data applications have been emerging during the last years, and researchers from many disciplines are aware of the high advantages related to knowledge extraction from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome these issues, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way to adapt to commodity hardware. Being still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The reasons behind this are mainly the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, inner problems of imbalanced data, namely lack of data and small disjuncts, are accentuated during the data partitioning to fit the MapReduce programming style. This paper is designed under three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current research state of this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we carry out a discussion on the challenges and future directions for the topic.
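Among the standard pre-processing techniques for imbalanced data that studies like this evaluate, one of the simplest is random oversampling of the minority class before the data are partitioned. The sketch below shows that idea in plain Python/NumPy on synthetic labels; it is not a MapReduce job and not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical imbalanced labels: 950 majority (0) vs 50 minority (1).
X = rng.normal(size=(1000, 5))
y = np.array([0] * 950 + [1] * 50)

def random_oversample(X, y):
    """Duplicate minority-class rows until both classes are equal in size,
    a common pre-processing step before splitting data across mappers."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

X_bal, y_bal = random_oversample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_bal))
```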

165 citations


Journal ArticleDOI
TL;DR: Overall, the unsupervised rules generated by flexible pattern mining are found to be the most consistent, whereas the supervised rules from classification trees are the most sensitive to user-preferences.
Abstract: Four methods are developed for data mining discrete multi-objective optimization datasets. Two of the methods are unsupervised, one is supervised and the other is hybrid. Knowledge is represented as patterns in one method, and as rules in the other methods. The methods are applied to three real-world production system optimization problems. Extracted knowledge is compared across methods and provides new insights. The first part of this paper served as a comprehensive survey of data mining methods that have been used to extract knowledge from solutions generated during multi-objective optimization. The current paper addresses three major shortcomings of existing methods, namely, lack of interactiveness in the objective space, inability to handle discrete variables and inability to generate explicit knowledge. Four data mining methods are developed that can discover knowledge in the decision space and visualize it in the objective space. These methods are (i) sequential pattern mining, (ii) clustering-based classification trees, (iii) hybrid learning, and (iv) flexible pattern mining. Each method uses a unique learning strategy to generate explicit knowledge in the form of patterns, decision rules and unsupervised rules. The methods are also capable of taking the decision makers' preferences into account to generate knowledge unique to preferred regions of the objective space. Three realistic production systems involving different types of discrete variables are chosen as application studies. A multi-objective optimization problem is formulated for each system and solved using NSGA-II to generate the optimization datasets. Next, all four methods are applied to each dataset. In each application, the methods discover similar knowledge for specified regions of the objective space. Overall, the unsupervised rules generated by flexible pattern mining are found to be the most consistent, whereas the supervised rules from classification trees are the most sensitive to user-preferences.
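The clustering-based classification tree idea can be illustrated in a few lines: cluster the optimization archive in the objective space, then learn explicit rules over the decision variables that predict cluster membership. The sketch below uses scikit-learn on synthetic data; the variables, cluster count and tree depth are invented, and this is not the paper's implementation or its production-system datasets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)

# Synthetic stand-in for an NSGA-II archive: decision variables and objectives.
decisions = rng.integers(0, 5, size=(200, 3))          # discrete variables
objectives = np.column_stack([decisions.sum(axis=1) + rng.normal(0, .5, 200),
                              5 - decisions[:, 0] + rng.normal(0, .5, 200)])

# 1) Unsupervised step: group solutions by where they lie in objective space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(objectives)

# 2) Supervised step: explain the clusters with rules over decision variables.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(decisions, labels)
print(export_text(tree, feature_names=["x1", "x2", "x3"]))
```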

145 citations


Journal ArticleDOI
TL;DR: This paper introduces ontologies, a systematic method for articulating a “controlled vocabulary” of agreed-upon terms and their inter-relationships, provides a review of current efforts to create ontologies related to behavior change interventions, and suggests future work.
Abstract: A central goal of behavioral medicine is the creation of evidence-based interventions for promoting behavior change. Scientific knowledge about behavior change could be more effectively accumulated using "ontologies." In information science, an ontology is a systematic method for articulating a "controlled vocabulary" of agreed-upon terms and their inter-relationships. It involves three core elements: (1) a controlled vocabulary specifying and defining existing classes; (2) specification of the inter-relationships between classes; and (3) codification in a computer-readable format to enable knowledge generation, organization, reuse, integration, and analysis. This paper introduces ontologies, provides a review of current efforts to create ontologies related to behavior change interventions and suggests future work. This paper was written by behavioral medicine and information science experts and was developed in partnership between the Society of Behavioral Medicine's Technology Special Interest Group (SIG) and the Theories and Techniques of Behavior Change Interventions SIG. In recent years significant progress has been made in the foundational work needed to develop ontologies of behavior change. Ontologies of behavior change could facilitate a transformation of behavioral science from a field in which data from different experiments are siloed into one in which data across experiments could be compared and/or integrated. This could facilitate new approaches to hypothesis generation and knowledge discovery in behavioral science.

Journal ArticleDOI
TL;DR: This work develops a novel method for feature learning on biological knowledge graphs that combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs.
Abstract: Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online.
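The released implementation is called walking-rdf-and-owl, which suggests a walk-based embedding step. The sketch below shows that general recipe, random walks over a toy graph fed to a gensim skip-gram model, and omits the symbolic-logic and reasoning components the method actually adds; the node names and hyperparameters are invented.

```python
import random
from gensim.models import Word2Vec

# Toy knowledge graph as an adjacency list (hypothetical biological entities).
graph = {
    "GeneA": ["Disease1", "ProteinX"],
    "ProteinX": ["GeneA", "ProteinY"],
    "ProteinY": ["ProteinX", "Disease1"],
    "Disease1": ["GeneA", "ProteinY"],
}

def random_walks(graph, walks_per_node=20, walk_length=8, seed=0):
    """Treat each walk as a 'sentence' of node tokens for skip-gram training."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length - 1):
                node = rng.choice(graph[node])
                walk.append(node)
            walks.append(walk)
    return walks

model = Word2Vec(sentences=random_walks(graph), vector_size=32,
                 window=3, min_count=1, sg=1, epochs=20, seed=0)
print(model.wv.most_similar("GeneA", topn=2))
```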

Journal ArticleDOI
TL;DR: Granular computing and the acquisition of IF-THEN rules are two basic issues in knowledge representation and data mining; a rough set approach to knowledge discovery in incomplete multi-scale decision tables is proposed from the perspective of granular computing.
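As background for readers unfamiliar with rough sets, the sketch below computes the lower (certain) and upper (possible) approximations of a decision class under an indiscernibility relation, on a toy, complete, single-scale decision table; the incomplete multi-scale machinery of the paper is not reproduced, and the table is invented.

```python
from collections import defaultdict

# Toy decision table: (condition attributes) -> decision.
table = [
    ({"temp": "high", "humid": "low"},  "yes"),
    ({"temp": "high", "humid": "low"},  "no"),
    ({"temp": "low",  "humid": "low"},  "yes"),
    ({"temp": "low",  "humid": "high"}, "no"),
]

def approximations(table, target_decision):
    """Group objects that are indiscernible on condition attributes, then
    compute the lower (certain) and upper (possible) approximations."""
    blocks = defaultdict(list)
    for i, (cond, _) in enumerate(table):
        blocks[tuple(sorted(cond.items()))].append(i)
    target = {i for i, (_, d) in enumerate(table) if d == target_decision}
    lower, upper = set(), set()
    for block in blocks.values():
        if set(block) <= target:
            lower |= set(block)
        if set(block) & target:
            upper |= set(block)
    return lower, upper

print(approximations(table, "yes"))  # certain vs possible 'yes' objects
```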

Journal ArticleDOI
TL;DR: A large-scale knowledge graph is constructed, which integrates terms, documents, databases and other knowledge resources and can facilitate various knowledge services such as knowledge visualization, knowledge retrieval, and knowledge recommendation, and helps the sharing, interpretation, and utilization of TCM health care knowledge.

Journal ArticleDOI
TL;DR: A knowledge discovery-based approach that allows the context-aware system to adapt its behaviour in runtime by analysing large amounts of data generated in Ambient assisted living systems and stored in cloud repositories is proposed.
Abstract: Context-aware monitoring is an emerging technology that provides real-time personalised health-care services and a rich area of big data application. In this paper, we propose a knowledge discovery-based approach that allows the context-aware system to adapt its behaviour in runtime by analysing large amounts of data generated in ambient assisted living (AAL) systems and stored in cloud repositories. The proposed BDCaM model facilitates analysis of big data inside a cloud environment. It first mines the trends and patterns in the data of an individual patient with associated probabilities and utilizes that knowledge to learn proper abnormal conditions. The outcomes of this learning method are then applied in context-aware decision-making processes for the patient. A use case is implemented to illustrate the applicability of the framework that discovers the knowledge of classification to identify the true abnormal conditions of patients having variations in blood pressure (BP) and heart rate (HR). The evaluation shows a much better estimate of detecting proper anomalous situations for different types of patients. The accuracy and efficiency obtained for the implemented case study demonstrate the effectiveness of the proposed model.

Proceedings ArticleDOI
Ingo Scholtes1
13 Aug 2017
TL;DR: This work develops a model selection technique to infer the optimal number of layers of such a model and shows that it outperforms baseline Markov order detection techniques and makes it possible to infer graphical models that capture both topological and temporal characteristics of such data.
Abstract: We introduce a framework for the modeling of sequential data capturing pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in the Web, travel patterns in transportation systems, information cascades in social networks, biological pathways, or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: When is a network abstraction of sequential data justified? Addressing this open question, we propose a framework that combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms baseline Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it makes it possible to infer graphical models that capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.
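A naive baseline for the order-detection problem the paper improves upon is to fit Markov chains of increasing order to the observed pathways and compare them with a penalised likelihood score. The sketch below does this with AIC on invented pathways; it glosses over subtleties (such as comparing models on different numbers of transitions) that the paper's multi-order model selection handles properly.

```python
import math
from collections import Counter, defaultdict

# Hypothetical pathways observed in a network (e.g. click streams).
paths = [list("abcd"), list("abcd"), list("abce"), list("xbcd"),
         list("xbce"), list("abce"), list("xbcd"), list("abcd")]

def fit_markov(paths, k):
    """Fit a k-th order Markov chain by maximum likelihood; return its
    log-likelihood and the number of free transition probabilities."""
    trans = defaultdict(Counter)
    for p in paths:
        for i in range(k, len(p)):
            trans[tuple(p[i - k:i])][p[i]] += 1
    ll = 0.0
    for nxt in trans.values():
        total = sum(nxt.values())
        for count in nxt.values():
            ll += count * math.log(count / total)
    n_params = sum(len(nxt) - 1 for nxt in trans.values())
    return ll, n_params

for order in (1, 2, 3):
    ll, n_params = fit_markov(paths, order)
    print(f"order {order}: AIC = {2 * n_params - 2 * ll:.2f}")
```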

Journal ArticleDOI
23 Jun 2017-PLOS ONE
TL;DR: Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations and is the first attempt at extracting dietary concepts.
Abstract: Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians follow the new knowledge that arrives daily with newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER that consists of two phases. The first one involves the detection and determination of entity mentions, and the second one involves the selection and extraction of the entities. We evaluate the method by using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations.
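To give a feel for what a rule-based dietary NER looks like, the sketch below combines a tiny lexicon with a quantity regex; the lexicon, patterns, and two-phase split are invented for illustration and are not drNER's actual rules.

```python
import re

# Invented mini-lexicon of dietary entities and a unit pattern.
FOOD_TERMS = {"vitamin d", "calcium", "whole grains", "saturated fat"}
QUANTITY = re.compile(r"\b\d+(\.\d+)?\s*(mg|g|iu)\b", re.IGNORECASE)

def extract_mentions(sentence):
    """Phase 1: detect candidate mentions; phase 2: keep lexicon/regex hits."""
    text = sentence.lower()
    mentions = [term for term in FOOD_TERMS if term in text]
    mentions += [m.group(0) for m in QUANTITY.finditer(sentence)]
    return mentions

print(extract_mentions("Adults should take 600 IU of vitamin D daily."))
```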

Book ChapterDOI
18 Sep 2017
TL;DR: This work focuses on corpus-based set expansion, which is a critical task in knowledge discovery and may facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search.
Abstract: Corpus-based set expansion (i.e., finding the “complete” set of entities belonging to the same semantic class, based on a given corpus and a tiny set of seeds) is a critical task in knowledge discovery. It may facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search.
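A toy version of corpus-based set expansion scores each candidate entity by how similar its co-occurrence profile is to that of the seed set. The sketch below uses a hand-made corpus, seeds, and candidates; real systems work at corpus scale with much richer context features.

```python
from collections import Counter

# Tiny stand-in corpus: each "document" is a list of tokens.
corpus = [
    ["paris", "france", "capital", "europe"],
    ["berlin", "germany", "capital", "europe"],
    ["tokyo", "japan", "capital", "asia"],
    ["python", "programming", "language"],
]
seeds = {"paris", "berlin"}
candidates = {"tokyo", "python"}

def context_profile(entity):
    """Bag of tokens co-occurring with the entity across the corpus."""
    profile = Counter()
    for doc in corpus:
        if entity in doc:
            profile.update(t for t in doc if t != entity)
    return profile

def overlap(p, q):
    """Similarity as the overlap between two co-occurrence bags."""
    return sum(min(p[t], q[t]) for t in p)

seed_profile = Counter()
for s in seeds:
    seed_profile.update(context_profile(s))

ranked = sorted(candidates, key=lambda c: -overlap(context_profile(c), seed_profile))
print(ranked)  # entities most similar to the seeds come first
```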

Journal ArticleDOI
09 Aug 2017
TL;DR: A semi-random data partitioning framework is proposed to solve class imbalance and sample representativeness issues in granular computing, and shows that avoiding class imbalance results in better model performance.
Abstract: Due to the vast and rapid increase in the size of data, machine learning has become an increasingly more popular approach for the purpose of knowledge discovery and predictive modelling. For both of the above purposes, it is essential to have a data set partitioned into a training set and a test set. In particular, the training set is used towards learning a model and the test set is then used towards evaluating the performance of the model learned from the training set. The split of the data into the two sets, however, and the influence on model performance, has only been investigated with respect to the optimal proportion for the two sets, with no attention paid to the characteristics of the data within the training and test sets. Thus, the current practice is to randomly split the data into approximately 70% for training and 30% for testing. In this paper, we show that this way of partitioning the data leads to two major issues: (a) class imbalance and (b) sample representativeness issues. Class imbalance is known to affect the performance of many classifiers by introducing a bias towards the majority class; the representativeness of the training set affects a model’s performance through the lack of opportunity for the algorithm to learn, by not presenting it with relevant examples—similar to testing a student on material that was not taught. To solve the above two issues, we propose a semi-random data partitioning framework, in the setting of granular computing. While we discuss how the framework can address both issues, in this paper, we focus on avoiding class imbalance when partitioning the data, through the proposed approach. The results show that avoiding class imbalance results in better model performance.
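The class-imbalance part of the problem can be approximated with stratified sampling, which keeps class proportions equal across the training and test sets. The sketch below contrasts a purely random split with a stratified one using scikit-learn on synthetic data; it does not reproduce the paper's granular-computing framework or its representativeness handling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Synthetic imbalanced data set: 90% class 0, 10% class 1.
X = rng.normal(size=(500, 4))
y = np.array([0] * 450 + [1] * 50)

# Purely random split: class proportions in train/test can drift.
_, _, _, y_test_rand = train_test_split(X, y, test_size=0.3, random_state=0)

# Stratified split: both sets keep (approximately) the original class ratio.
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.3, random_state=0,
                                          stratify=y)

print("random split test ratio:    ", y_test_rand.mean().round(3))
print("stratified split test ratio:", y_test_strat.mean().round(3))
```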

Journal ArticleDOI
TL;DR: A novel approach for estimating temporal association pattern prevalence values is proposed, along with a novel temporal fuzzy similarity measure that satisfies monotonicity for finding the similarity between any two temporal patterns.

Journal ArticleDOI
TL;DR: The data index and search engine DataMed is designed to be, for data, what PubMed has been for the scientific literature; it supports the findability and accessibility of data sets.
Abstract: The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics—along with interoperability and reusability—compose the four FAIR principles to facilitate knowledge discovery in today's big data–intensive science landscape.

Journal ArticleDOI
TL;DR: Using the three-stage-based knowledge transfer, the beneficial knowledge from the source domain can be extensively, self-adaptively leveraged in the target domain.
Abstract: We study a novel fuzzy clustering method to improve the segmentation performance on the target texture image by leveraging the knowledge from a prior texture image. Two knowledge transfer mechanisms, i.e. knowledge-leveraged prototype transfer (KL-PT) and knowledge-leveraged prototype matching (KL-PM) are first introduced as the bases. Applying them, the knowledge-leveraged transfer fuzzy C-means (KL-TFCM) method and its three-stage-interlinked framework, including knowledge extraction, knowledge matching, and knowledge utilization, are developed. There are two specific versions: KL-TFCM-c and KL-TFCM-f, i.e. the so-called crisp and flexible forms, which use the strategies of maximum matching degree and weighted sum, respectively. The significance of our work is fourfold: 1) Owing to the adjustability of referable degree between the source and target domains, KL-PT is capable of appropriately learning the insightful knowledge, i.e. the cluster prototypes, from the source domain; 2) KL-PM is able to self-adaptively determine the reasonable pairwise relationships of cluster prototypes between the source and target domains, even if the numbers of clusters differ in the two domains; 3) The joint action of KL-PM and KL-PT can effectively resolve the data inconsistency and heterogeneity between the source and target domains, e.g. the data distribution diversity and cluster number difference. Thus, using the three-stage-based knowledge transfer, the beneficial knowledge from the source domain can be extensively, self-adaptively leveraged in the target domain. As evidence of this, both KL-TFCM-c and KL-TFCM-f surpass many existing clustering methods in texture image segmentation; and 4) In the case of different cluster numbers between the source and target domains, KL-TFCM-f proves higher clustering effectiveness and segmentation performance than does KL-TFCM-c.

Journal ArticleDOI
01 Jan 2017
TL;DR: It is argued that the knowledge graph is a suitable candidate for this data model, and current research is described and some of the promises and challenges of this approach are discussed.
Abstract: In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer the data in their raw form. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. However, these models are often domain specific and tailored to the task at hand, and therefore unsuited for learning on heterogeneous knowledge: information of different types and from different domains. If we can develop methods that operate on this form of knowledge, we can dispense with a great deal of ad-hoc feature engineering and train deep models end-to-end in many more domains. To accomplish this, we first need a data model capable of expressing heterogeneous knowledge naturally in various domains, in as usable a form as possible, and satisfying as many use cases as possible. In this position paper, we argue that the knowledge graph is a suitable candidate for this data model. This paper describes current research and discusses some of the promises and challenges of this approach.

Journal ArticleDOI
TL;DR: This paper summarizes privacy-preserving techniques, focusing on graph-modification methods which alter the graph’s structure and release the entire anonymous network, allowing researchers and third parties to apply all graph-mining processes on anonymous data, from local to global knowledge extraction.
Abstract: Recently, a huge amount of social networks have been made publicly available. In parallel, several definitions and methods have been proposed to protect users' privacy when publicly releasing these data. Some of them were drawn from relational dataset anonymization techniques, which are more mature than network anonymization techniques. In this paper we summarize privacy-preserving techniques, focusing on graph-modification methods which alter the graph's structure and release the entire anonymous network. These methods allow researchers and third parties to apply all graph-mining processes on anonymous data, from local to global knowledge extraction.
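One graph-modification family covered by such surveys is random perturbation, where some true edges are removed and the same number of false ones are added before the whole network is released. The sketch below shows this with networkx on the Zachary karate club graph as a stand-in social network; the perturbation rate is arbitrary and this is not a specific method from the survey.

```python
import random
import networkx as nx

rng = random.Random(0)
G = nx.karate_club_graph()  # stand-in for a social network to be released

def perturb_edges(G, fraction=0.1):
    """Delete a fraction of edges and add the same number of random non-edges,
    so the released graph keeps global structure but hides exact links."""
    H = G.copy()
    n_swap = int(fraction * H.number_of_edges())
    removed = rng.sample(list(H.edges()), n_swap)
    H.remove_edges_from(removed)
    nodes = list(H.nodes())
    added = 0
    while added < n_swap:
        u, v = rng.sample(nodes, 2)
        if not H.has_edge(u, v):
            H.add_edge(u, v)
            added += 1
    return H

anon = perturb_edges(G)
print(G.number_of_edges(), anon.number_of_edges())
```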

Journal ArticleDOI
TL;DR: It has been found that, despite more efficient alternative approaches, the Apriori algorithm is still a widely used frequent itemset generation technique in applications of association rule mining for health informatics.
Abstract: Association rule mining is an effective data mining technique which has been used widely in health informatics research right from its introduction. Since health informatics has received a lot of attention from researchers in the last decade, and has developed various sub-domains, it is interesting as well as essential to review the state of the art of health informatics research. As knowledge discovery researchers and practitioners have applied an array of data mining techniques for knowledge extraction from health data, the application of association rule mining techniques to the health informatics domain is the focus of this survey and is studied in detail. Through critical analysis of the literature on applications of association rule mining for health informatics from 2005 to 2014, it has been found that, despite more efficient alternative approaches, the Apriori algorithm is still a widely used frequent itemset generation technique in applications of association rule mining for health informatics. Moreover, other limitations related to applications of association rule mining for health informatics have also been identified, and recommendations have been made to mitigate those limitations. Furthermore, the algorithms and tools utilized for applications of association rule mining have also been identified, conclusions have been drawn from the literature surveyed, and future research directions have been presented.
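For readers unfamiliar with the algorithm the survey keeps encountering, the sketch below is a compact Apriori-style frequent itemset miner: candidates are generated level by level and kept only if they meet a minimum support. The transactions and threshold are invented, not health-informatics data.

```python
from itertools import combinations

# Toy transactions (invented item codes, not health-informatics data).
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
min_support = 3  # absolute support threshold

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining: a (k+1)-itemset can only be
    frequent if its k-subsets are frequent, so candidates are built from
    the survivors of the previous level and pruned by support counting."""
    items = {i for t in transactions for i in t}
    frequent = []
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        counts = {s: sum(s <= t for t in transactions) for s in k_sets}
        survivors = [s for s, c in counts.items() if c >= min_support]
        frequent += [(set(s), counts[s]) for s in survivors]
        k_sets = list({a | b for a, b in combinations(survivors, 2)
                       if len(a | b) == len(a) + 1})
    return frequent

for itemset, support in apriori(transactions, min_support):
    print(itemset, "support =", support)
```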

Journal ArticleDOI
TL;DR: A graph-based approach to knowledge reuse for supporting knowledge-driven decision-making in new product development and the feasibility and effectiveness of the proposed approach are demonstrated.
Abstract: Pre-existing knowledge buried in manufacturing enterprises can be reused to help decision-makers develop good judgements to make decisions about the problems in new product development, which in turn speeds up and improves the quality of product innovation. This paper presents a graph-based approach to knowledge reuse for supporting knowledge-driven decision-making in new product development. The paper first illustrates the iterative process of knowledge-driven decision-making in new product development. Then, a novel framework is proposed to facilitate this process, where knowledge maps and knowledge navigation are involved. Here, OWL ontologies are employed to construct knowledge maps, which appropriately capture and organise knowledge resources generated at various stages of the product lifecycle; the Personalised PageRank algorithm is used to perform knowledge navigation, which finds the most relevant knowledge in knowledge maps for a given problem in new product development. Finally, the feasibility and effectiveness of the proposed approach are demonstrated.
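The knowledge-navigation step relies on the Personalised PageRank algorithm, which ranks nodes by relevance to a restart node rather than by global centrality. The sketch below runs it with networkx on an invented knowledge map; the node names, edges, and damping factor are assumptions, and a real knowledge map would be built from OWL ontologies.

```python
import networkx as nx

# Invented knowledge map: nodes are knowledge resources, edges are links
# (e.g. "references" or "produced in the same lifecycle stage").
G = nx.Graph()
G.add_edges_from([
    ("problem:vibration", "report:gearbox-test"),
    ("report:gearbox-test", "cad:gearbox-v2"),
    ("cad:gearbox-v2", "spec:material-steel"),
    ("report:paint-defects", "spec:coating"),
])

# Personalised PageRank: restart at the node describing the current problem,
# so scores measure relevance to that problem rather than global centrality.
scores = nx.pagerank(G, alpha=0.85, personalization={"problem:vibration": 1.0})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {node}")
```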

Journal ArticleDOI
Hailun Lin1, Yong Liu1, Weiping Wang1, Yinliang Yue1, Zheng Lin1 
01 Jan 2017
TL;DR: This work proposes ETransR, a method which automatically learns entity and relation feature representations in continuous vector spaces, in order to measure the semantic relatedness of knowledge mentions for knowledge resolution.
Abstract: Knowledge resolution is the task of clustering knowledge mentions, e.g., entity and relation mentions into several disjoint groups with each group representing a unique entity or relation. Such resolution is a central step in constructing high-quality knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a semantic dictionary or a knowledge graph. This may lead to poor performance on knowledge mentions with poor or not well-known contexts. In addition, it is also limited by the coverage of the semantic dictionary or knowledge graph. In this work, we propose ETransR, a method which automatically learns entity and relation feature representations in continuous vector spaces, in order to measure the semantic relatedness of knowledge mentions for knowledge resolution. Experimental results on two benchmark datasets show that our proposed method delivers significant improvements compared with the state-of-the-art baselines on the task of knowledge resolution.

Journal ArticleDOI
TL;DR: Two case studies, on an industrial-scale separation tower and the Tennessee Eastman process simulation, demonstrate that data clustering and feature extraction effectively reveal significant process trends in high-dimensional, multivariate data.
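A generic version of the clustering-plus-feature-extraction combination mentioned in the TL;DR is principal component analysis followed by k-means. The sketch below applies that pair to synthetic two-regime sensor data with scikit-learn; the data, dimensionality, and cluster count are invented and unrelated to the tower or Tennessee Eastman datasets.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)

# Synthetic stand-in for high-dimensional multivariate process measurements:
# two operating regimes with different means across 20 sensors.
regime_a = rng.normal(loc=0.0, scale=1.0, size=(150, 20))
regime_b = rng.normal(loc=3.0, scale=1.0, size=(150, 20))
X = np.vstack([regime_a, regime_b])

# Feature extraction: project onto the first two principal components.
scores = PCA(n_components=2).fit_transform(X)

# Clustering in the reduced space reveals the operating regimes.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))
```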

Journal ArticleDOI
TL;DR: This work has created a domain-targeted, high precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high precisionknowledge targeted to a particular domain - in this case, elementary science.
Abstract: Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject,predicate,object) statements about the world, in support of a downstream question-answering (QA) application. Despite recent advances in information extraction (IE) techniques, no suitable resource for our task already exists; existing resources are either too noisy, too named-entity centric, or too incomplete, and typically have not been constructed with a clear scope or purpose. To address these, we have created a domain-targeted, high precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high precision knowledge targeted to a particular domain - in our case, elementary science. To measure the KB’s coverage of the target domain’s knowledge (its "comprehensiveness" with respect to science) we measure recall with respect to an independent corpus of domain text, and show that our pipeline produces output with over 80% precision and 23% recall with respect to that target, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We have made the KB publicly available at http://data.allenai.org/tuple-kb .

Journal ArticleDOI
TL;DR: The impact that data sharing has in science and society is reviewed and guidelines to improve the efficient sharing of research data are presented.
Abstract: Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new views from multiple analysis of the same data set. For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community. These challenges are of ethical, cultural, legal, financial, or technical nature. This article reviews the impact that data sharing has in science and society and presents guidelines to improve the efficient sharing of research data.