Showing papers on "Knowledge extraction published in 2022"


Journal ArticleDOI
Yuansheng Liu
TL;DR: In this article, a review of knowledge graph-based works that implement drug repurposing and adverse drug reaction prediction for drug discovery is presented, and several representative embedding models are introduced to provide a comprehensive understanding of knowledge representation learning.

61 citations



Journal ArticleDOI
TL;DR: An edge-cloud-assisted federated learning framework is introduced for communication-efficient and privacy-preserving energy data sharing of users in smart grids, and a two-layer deep-reinforcement-learning-based incentive algorithm is developed to promote EDOs’ participation and high-quality model contribution.
Abstract: With the prevalence of smart appliances, smart meters, and Internet of Things (IoT) devices in smart grids, artificial intelligence (AI) built on the rich IoT big data enables various energy data analysis applications and brings intelligent and personalized energy services for users. In conventional AI of Things (AIoT) paradigms, a wealth of individual energy data distributed across users’ IoT devices needs to be migrated to a central storage (e.g., cloud or edge device) for knowledge extraction, which may impose severe privacy violation and data misuse risks. Federated learning, as an appealing privacy-preserving AI paradigm, enables energy data owners (EDOs) to cooperatively train a shared AI model without revealing the local energy data. Nevertheless, potential security and efficiency concerns still impede the deployment of federated-learning-based AIoT services in smart grids due to the low-quality shared local models, non-independently and identically distributed (non-IID) data distributions, and unpredictable communication delays. In this article, we propose a secure and efficient federated-learning-enabled AIoT scheme for private energy data sharing in smart grids with edge-cloud collaboration. Specifically, we first introduce an edge-cloud-assisted federated learning framework for communication-efficient and privacy-preserving energy data sharing of users in smart grids. Then, by considering non-IID effects, we design a local data evaluation mechanism in federated learning and formulate two optimization problems for EDOs and energy service providers. Furthermore, due to the lack of knowledge of multidimensional user private information in practical scenarios, a two-layer deep reinforcement-learning-based incentive algorithm is developed to promote EDOs’ participation and high-quality model contribution. Extensive simulation results show that the proposed scheme can effectively stimulate EDOs to share high-quality local model updates and improve the communication efficiency.

36 citations
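
The workflow in the abstract above follows the generic federated averaging pattern: clients train locally and only share model weights. The sketch below illustrates that baseline pattern only; it is not the paper's incentive-aware edge-cloud scheme, and every name in it (local_update, the EDO count, the linear model) is a hypothetical illustration.

```python
# Minimal federated averaging (FedAvg) sketch: each energy data owner (EDO)
# trains locally and shares only model weights; the server averages them,
# so raw energy data never leaves the device. Hypothetical illustration,
# not the paper's incentive-aware edge-cloud scheme.
import numpy as np

def local_update(weights, X, y, lr=0.05, epochs=10):
    """One EDO's local training: gradient steps on a least-squares loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, client_data):
    """One server round: average client updates weighted by data size."""
    updates = [local_update(global_w, X, y) for X, y in client_data]
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):  # four EDOs with (mildly) non-IID local datasets
    X = rng.normal(size=(50, 2)) + rng.normal(scale=0.5, size=2)
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

w = np.zeros(2)
for _ in range(30):  # communication rounds
    w = fedavg_round(w, clients)
print("learned weights:", w)  # approaches true_w without pooling raw data
```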


Journal ArticleDOI
TL;DR: An automatic construction framework for the process knowledge base in the field of machining, based on knowledge graph (KG), is introduced, and a hybrid algorithm based on an improved edit distance and attribute weighting is built to overcome redundancy in the knowledge fusion stage.
Abstract: The process knowledge base is the key module in intelligent process design; it determines the degree of intelligence of the design system and affects the quality of product design. However, traditional process knowledge base construction is non-automated, time-consuming, and requires much manual work, which is not sufficient to meet the demands of the modern manufacturing mode. Moreover, the knowledge base often adopts a single knowledge representation, which may lead to ambiguity in the meaning of some knowledge and affect the quality of the process knowledge base. To overcome these problems, an automatic construction framework for the process knowledge base in the field of machining, based on knowledge graph (KG), is introduced. First, the knowledge is classified and annotated based on the function-behavior-states (FBS) design method. Second, a knowledge extraction framework based on BERT-BiLSTM-CRF is established to perform automatic knowledge extraction from process text. Third, a knowledge representation method based on fuzzy comprehensive evaluation is established, forming three types of knowledge representation with the KG as the main form and production rules and two-dimensional data linked lists as supplements. In addition, to overcome redundancy in the knowledge fusion stage, a hybrid algorithm based on an improved edit distance and attribute weighting is built. Finally, a prototype system is developed, and a quality analysis is carried out. The F value of the proposed extraction method in the machining domain exceeds those of BiLSTM-CRF and CNN-BiLSTM-CRF by 7.35% and 3.87%, respectively.

32 citations
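
The knowledge-fusion step described above combines an edit-distance score with attribute weighting; here is a minimal sketch of that general idea. The weights, merge threshold, and entity fields are illustrative assumptions, not the paper's improved algorithm.

```python
# Hybrid entity-deduplication sketch: normalized Levenshtein similarity on
# names blended with weighted attribute agreement. Illustrative only; the
# paper's improved edit distance and learned weights are not reproduced.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(e1, e2, name_w=0.6, attr_w=0.4):
    """Blend name edit-distance similarity with attribute agreement."""
    d = levenshtein(e1["name"], e2["name"])
    name_sim = 1 - d / max(len(e1["name"]), len(e2["name"]), 1)
    keys = set(e1["attrs"]) & set(e2["attrs"])
    attr_sim = (sum(e1["attrs"][k] == e2["attrs"][k] for k in keys) / len(keys)
                if keys else 0.0)
    return name_w * name_sim + attr_w * attr_sim

a = {"name": "face milling", "attrs": {"tool": "end mill", "op": "milling"}}
b = {"name": "face-milling", "attrs": {"tool": "end mill", "op": "milling"}}
if similarity(a, b) > 0.8:   # merge threshold (illustrative)
    print("merge candidates:", a["name"], "/", b["name"])
```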


Journal ArticleDOI
TL;DR: In this article, an automatic construction framework for the process knowledge base in the field of machining based on knowledge graph (KG) is introduced, and a knowledge extraction framework based on BERT-BiLSTM-CRF is established to perform the automatic knowledge extraction of process text.
Abstract:
• A framework for automatically constructing a knowledge base is developed.
• The extraction effect of this framework is better than that of other frameworks.
• An evaluation algorithm is proposed to judge the optimal expression of knowledge.
• Semantic and attribute weighting factors among knowledge entities are considered.

26 citations


Proceedings ArticleDOI
27 Jan 2022
TL;DR: This study develops an ontology transformation based on the external knowledge graph to address the knowledge-missing issue, and proposes ontology-enhanced prompt-tuning (OntoPrompt), which completes structured knowledge and converts it into text.
Abstract: Few-shot learning (FSL) aims to make predictions based on a limited number of samples. Structured data such as knowledge graphs and ontology libraries have been leveraged to benefit the few-shot setting in various tasks. However, the priors adopted by existing methods suffer from knowledge missing, knowledge noise, and knowledge heterogeneity, which hinder performance in few-shot learning. In this study, we explore knowledge injection for FSL with pre-trained language models and propose ontology-enhanced prompt-tuning (OntoPrompt). Specifically, we develop an ontology transformation based on the external knowledge graph to address the knowledge-missing issue; it completes structured knowledge and converts it into text. We further introduce span-sensitive knowledge injection via a visible matrix to select informative knowledge and handle the knowledge-noise issue. To bridge the gap between knowledge and text, we propose a collective training algorithm to optimize representations jointly. We evaluate the proposed OntoPrompt on three tasks, including relation extraction, event extraction, and knowledge graph completion, with eight datasets. Experimental results demonstrate that our approach obtains better few-shot performance than baselines.

24 citations
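
As a rough illustration of the ontology-transformation idea described above (verbalizing structured knowledge and placing it next to the input), consider the following sketch. The template wording, relation descriptions, and [MASK] convention are assumptions, not OntoPrompt's actual implementation.

```python
# Sketch of ontology-enhanced prompt construction for few-shot relation
# extraction: ontology descriptions are verbalized into auxiliary prompt
# text placed before the input sentence. Hypothetical illustration.
ONTOLOGY = {  # hypothetical verbalized ontology fragments
    "founder": "founder: a person who establishes an organization.",
    "birthplace": "birthplace: the place where a person was born.",
}

def build_prompt(sentence, head, tail, candidate_relations):
    onto_text = " ".join(ONTOLOGY[r] for r in candidate_relations)
    return (f"{onto_text} "
            f"Sentence: {sentence} "
            f"The relation between {head} and {tail} is [MASK].")

prompt = build_prompt(
    "Steve Jobs co-founded Apple in 1976.",
    head="Steve Jobs", tail="Apple",
    candidate_relations=["founder", "birthplace"],
)
print(prompt)  # fed to a masked language model, which fills in [MASK]
```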


Journal ArticleDOI
TL;DR: In this article, the authors introduce multiple perspectives on event detection in the big social data era, and thoroughly investigate and summarize the significant progress in social event detection and visualization techniques, emphasizing crucial challenges ranging from the management, fusion, and mining of big data to the applicability of these methods across different platforms, across multiple languages and dialects rather than a single language, and across multiple modalities.

20 citations



Journal ArticleDOI
TL;DR: Wang et al. developed a novel unsupervised methodology for feature extraction and knowledge discovery based on automatic identification system (AIS) data, allowing seamless knowledge transfer to support trajectory data mining.
Abstract: Owing to space–air–ground integrated networks (SAGIN), seaborne shipping has attracted increasing research interest in motion-behavior knowledge extraction and navigation-pattern mining in the era of maritime big data, with the goal of improving maritime traffic safety management. This study aims to develop a novel unsupervised methodology for feature extraction and knowledge discovery based on automatic identification system (AIS) data, allowing for seamless knowledge transfer to support trajectory data mining. The unsupervised hierarchical methodology is constructed from three parts: trajectory compression, trajectory similarity measurement, and trajectory clustering. In the first part, an adaptive Douglas–Peucker with speed (ADPS) algorithm is created to preserve critical features, obtain useful information, and simplify trajectory information. Then, dynamic time warping (DTW) is utilized to measure the similarity between trajectories as the critical indicator in trajectory clustering. Finally, improved spectral clustering with mapping (ISCM) is presented to extract vessel traffic behavior characteristics and mine movement patterns for enhancing marine safety and situational awareness. Comprehensive experiments are conducted in the Chengshan Jiao Promontory in China to verify the feasibility and effectiveness of the novel methodology. Experimental results show that the proposed methodology can effectively compress the trajectories, determine the number of clusters in advance, guarantee clustering accuracy, and extract useful navigation knowledge while significantly reducing the computational cost. The clustering results follow a Gaussian mixture distribution, which can help provide new discriminant criteria for trajectory clustering.

16 citations
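
The similarity stage above relies on dynamic time warping; a minimal DTW sketch over AIS-style (lat, lon) points follows. The coordinates are made up, and the paper's ADPS compression and ISCM clustering stages are not reproduced here.

```python
# Minimal dynamic time warping (DTW) sketch for trajectory similarity:
# cost of optimally aligning two trajectories given as (lat, lon) points.
import math

def dtw(traj_a, traj_b):
    n, m = len(traj_a), len(traj_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])  # point distance
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Hypothetical AIS tracks near the Chengshan Jiao Promontory
a = [(37.40, 122.60), (37.50, 122.70), (37.60, 122.90)]
b = [(37.40, 122.60), (37.45, 122.65), (37.50, 122.70), (37.60, 122.90)]
print("DTW distance:", dtw(a, b))  # small value -> similar routes
```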


Proceedings ArticleDOI
14 Aug 2022
TL;DR: The 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2022) is held between August 14th and 18th, 2022 at the Walter E. Washington Convention Center in Washington, DC, USA.
Abstract: It is our great pleasure to welcome you back in person after a 2-year hiatus to the 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2022), which is held between August 14th and 18th, 2022 at the Walter E. Washington Convention Center in Washington, DC, USA. This year's KDD continues its tradition of being the premier forum for presentation of research results and experience reports on leading edge issues of data science, machine learning, big data and artificial intelligence. The KDD 2022 program promises to be the most robust and diverse to date, with keynote presentations, industry-led sessions, workshops, and tutorials spanning a wide range of topics - from data-driven humanitarian mapping and applied data science in healthcare to the uses of artificial intelligence (AI) for climate mitigation and decision intelligence for online marketplaces.

11 citations


Journal ArticleDOI
28 Jun 2022
TL;DR: PICa prompts GPT-3 with image captions for knowledge-based visual question answering (VQA) in a few-shot manner, converting the image into captions (or tags) that GPT-3 can understand.
Abstract: Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and question for answer prediction. However, this two-step approach could lead to mismatches that potentially limit the VQA performance. For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT-3 via the use of Image Captions, for knowledge-based VQA. Inspired by GPT-3’s power in knowledge retrieval and question answering, instead of using structured KBs as in previous work, we treat GPT-3 as an implicit and unstructured KB that can jointly acquire and process relevant knowledge. Specifically, we first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner by just providing a few in-context VQA examples. We further boost performance by carefully investigating: (i) what text formats best describe the image content, and (ii) how in-context examples can be better selected and used. PICa unlocks the first use of GPT-3 for multimodal tasks. By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset. We also benchmark PICa on VQAv2, where PICa also shows a decent few-shot performance.
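
A minimal sketch of PICa-style prompt assembly follows, assuming captions are already available from an image captioner. The header wording and in-context examples are hypothetical, though the caption-plus-QA-shots format follows the recipe described above.

```python
# Sketch of PICa-style prompt assembly: the image becomes a caption, and a
# few in-context VQA examples precede the test question. Illustrative only.
def build_vqa_prompt(in_context, test_caption, test_question):
    header = "Answer the question based on the context.\n\n"
    shots = "".join(
        f"Context: {c}\nQ: {q}\nA: {a}\n\n" for c, q, a in in_context
    )
    return header + shots + f"Context: {test_caption}\nQ: {test_question}\nA:"

examples = [  # hypothetical in-context examples (caption, question, answer)
    ("A man riding a red motorcycle on a street.",
     "What company makes this vehicle?", "honda"),
    ("A bowl of oranges on a wooden table.",
     "What vitamin are these rich in?", "vitamin c"),
]
prompt = build_vqa_prompt(
    examples,
    "A group of people skiing down a snowy mountain.",
    "What country hosted the first Winter Olympics?",
)
print(prompt)  # sent to GPT-3 as a plain text-completion request
```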

Journal ArticleDOI
TL;DR: In this article, a model called the ontology-based knowledge map is proposed to represent and store the results (knowledge) of data mining in crop farming, in order to build, maintain, and enrich the process of knowledge discovery.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new paradigm of predictive modeling combined with the classical approach of parameter-estimation regression to produce improved models that unite explanation and prediction, revealing valid and significant hidden patterns in the data and identifying nonlinear and non-additive effects.

Journal ArticleDOI
TL;DR: In this paper, the authors present a literature review of process discovery approaches exploiting domain knowledge, define a taxonomy that systematically classifies and compares existing approaches, and identify remaining challenges for future work.

Journal ArticleDOI
TL;DR: Based on an equivalence analysis of the self-attention mechanism (SAM) and the graph convolution (GC) operation, the spatial SAM is adopted for knowledge discovery directly from data, and the GC layer, which considers the relationships between process variables, utilizes this knowledge to construct soft sensor models.
Abstract: In industrial processes, data-driven soft sensors have played an important role in effective process control, optimization, and monitoring. Deep learning techniques have been widely used in the soft sensor field in recent years for their excellent feature representation capability at spatial and temporal scales. However, their shortcomings seriously hinder application in industrial processes: knowledge cannot be added to the model, and the model predictions cannot be well explained. To solve these problems, a graph mining, convolution, and explanation framework is proposed for knowledge automation in this article. Based on an equivalence analysis of the self-attention mechanism (SAM) and the graph convolution (GC) operation, the spatial SAM is adopted for knowledge discovery directly from data. After that, the GC layer, which considers the relationships between process variables, utilizes this knowledge to construct soft sensor models. Besides, to explain which knowledge contributes to the final model prediction, a graph neural network explainer is designed to explain the model output. Finally, the effectiveness and feasibility of the framework are evaluated on an industrial process, in which the knowledge discovered from the data is highly consistent with prior knowledge, and the final explanation indicates that most of the knowledge contributing to the prediction is consistent with prior knowledge.
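
The equivalence the abstract builds on can be written compactly. In standard notation (not necessarily the paper's symbols), both operations multiply the feature matrix X by a row-normalized, adjacency-like matrix:

```latex
% Self-attention vs. graph convolution, in standard notation
% (not necessarily the paper's symbols).
\[
\mathrm{Attn}(X) \;=\; \underbrace{\mathrm{softmax}\!\left(\tfrac{QK^{\top}}{\sqrt{d_k}}\right)}_{\text{learned ``adjacency''}} V,
\qquad Q = XW_Q,\quad K = XW_K,\quad V = XW_V,
\]
\[
\mathrm{GC}(X) \;=\; \hat{A}\,X\,W,
\qquad \hat{A} \;=\; \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}.
\]
```

Reading the softmax attention weights as a data-driven graph over process variables is what lets the GC layer reuse the relationships discovered by the spatial SAM.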

Journal ArticleDOI
TL;DR: Wang et al. proposed a Knowledge Guided Distance Supervision (KGDS) model for the biomedical relation extraction task in Chinese electronic medical records, which achieved the best performance compared to other state-of-the-art models.
Abstract: The goal of biomedical relation extraction is to obtain structured information from electronic medical records by identifying relations among clinical entities. By integrating the advantages of unsupervised and semi-supervised learning, the distant supervision approach has achieved significant success on relation extraction tasks without a large amount of labeled corpora. However, in many cases, the entities recognized from Chinese clinical text are not defined in the semantic knowledge base, which limits the application of distant supervision for biomedical relation extraction. This work proposes a Knowledge Guided Distance Supervision (KGDS) model for the biomedical relation extraction task in Chinese electronic medical records. To handle unknown entities, entity-type alignment (instead of the entity alignment used in traditional distant supervision) is employed for extracting coarse-grained relations. Then, by learning relation embeddings both from the semantic knowledge base and the electronic medical record dataset as knowledge-enhanced features, this work presents a knowledge-enhanced bootstrapping learning process for fine-grained relation disambiguation. Empirical experiments on a real-world dataset of electronic medical records illustrate that the KGDS model achieves the best performance compared to other state-of-the-art models, thereby advancing the field of biomedical relation extraction from Chinese electronic medical records.

Journal ArticleDOI
TL;DR: This paper discusses the design of PSyKE, a platform providing general-purpose support for symbolic knowledge extraction from different sorts of black-box predictors via many extraction algorithms; PSyKE targets symbolic knowledge in logic form, allowing the extraction of first-order logic clauses.
Abstract: A common practice in modern explainable AI is to post-hoc explain black-box machine learning (ML) predictors – such as neural networks – by extracting symbolic knowledge out of them, in the form of either rule lists or decision trees. By acting as a surrogate model, the extracted knowledge aims at revealing the inner working of the black box, thus enabling its inspection, representation, and explanation. Various knowledge-extraction algorithms have been presented in the literature so far. Unfortunately, running implementations of most of them are currently either proofs of concept or unavailable. In any case, a unified, coherent software framework supporting them all – as well as their interchange, comparison, and exploitation in arbitrary ML workflows – is currently missing. Accordingly, in this paper we discuss the design of PSyKE, a platform providing general-purpose support to symbolic knowledge extraction from different sorts of black-box predictors via many extraction algorithms. Notably, PSyKE targets symbolic knowledge in logic form, allowing the extraction of first-order logic clauses. The extracted knowledge is thus both machine- and human-interpretable, and can be used as a starting point for further symbolic processing—e.g. automated reasoning.
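
PSyKE itself emits logic clauses through several extraction algorithms; the sketch below shows only the underlying surrogate idea using scikit-learn, i.e., imitating a black box with a shallow decision tree and reading the tree off as rules. It is not PSyKE's API or output format.

```python
# Pedagogical sketch of surrogate-based knowledge extraction: fit an opaque
# model, relabel the data with its predictions, train a shallow decision
# tree on those labels, and print the tree as human-readable rules.
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                          random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))  # imitate the black box's behavior

# Fidelity: how faithfully the readable surrogate mimics the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity to black box: {fidelity:.2%}")
print(export_text(surrogate, feature_names=list(load_iris().feature_names)))
```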

Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of causality extraction techniques, including knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches, and highlights existing open challenges with their potential directions.
Abstract: As an essential component of human cognition, cause–effect relations appear frequently in text, and curating cause–effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each method has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because of natural language processing (NLP) toolkits; however, feature engineering is labor-intensive, and toolkits may lead to error propagation. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We initially introduce the primary forms of causality found in text: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and modeling assessment methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight existing open challenges with their potential directions.

Journal ArticleDOI
TL;DR: Wang et al. proposed a multi-granularity rough set and concept lattice generation method for uncertain decision-making problems in traditional Chinese medicine (TCM) and Western medicine.

Journal ArticleDOI
TL;DR: In this article, the authors introduce and discuss several techniques and solutions used in information extraction from medical documents, and outline the challenges of information extraction in the medical field with an experimental analysis and suggestions for uncovered directions.
Abstract: In the medical field, a doctor must maintain comprehensive knowledge by reading and writing narrative documents, and is responsible for every decision taken for patients. Unfortunately, reading all the necessary information about drugs, diseases, and patients is exhausting due to the large number of documents, which grows every day. Consequently, many medical errors can happen and even cost lives. Information extraction is a field that can address this problem: its tasks extract the important and desired information from unstructured text written in natural language. The principal tasks are named entity recognition and relation extraction, since they can structure the text by extracting the relevant information. To process narrative text, natural language processing techniques are needed to extract useful information and features. In this paper, we introduce and discuss the various techniques and solutions used in these tasks. Furthermore, we outline the challenges in information extraction from medical documents. To our knowledge, this is the most comprehensive survey in the literature, with an experimental analysis and suggestions for uncovered directions.

Journal ArticleDOI
TL;DR: In this article, a data-oriented methodology was proposed, combining Data Analysis, Machine Learning, and Complex Network Analysis techniques with the Data Version Control (DVC) tool, for the extraction of implicit knowledge from scientific production bases.
Abstract: The mapping and analysis of scientific knowledge make it possible to identify the dynamics and/or growth of a particular field of research, and to support strategic decisions related to different research entities based on bibliometric and/or scientometric indicators. However, with the exponential growth of scientific production, a systematic and data-oriented approach to the analysis of this large set of productions becomes increasingly essential. Thus, in this work, a data-oriented methodology was proposed, combining Data Analysis, Machine Learning, and Complex Network Analysis techniques with the Data Version Control (DVC) tool, for the extraction of implicit knowledge from scientific production bases. The approach was validated through a case study on a COVID-19 manuscript dataset comprising 199,895 articles published in the arXiv, bioRxiv, medRxiv, PubMed, and Scopus databases. The results suggest the feasibility of the proposed methodology, indicating the most active countries and the most explored themes in each period of the pandemic. Therefore, this study has the potential to support and expand strategic decisions by the scientific community, aiming at extracting knowledge that supports the fight against the COVID-19 pandemic.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a decision support system that allows interactive knowledge discovery and knowledge visualization to support practitioners by simultaneously considering preferences in the objective space and their impact in the decision space.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel evolutionary model for event detection that uses a matrix decomposition technique and a Dirichlet Process to detect events, handle their dynamicity, and capture their evolving behavior.
Abstract: With the huge expansion of user-generated content on social networks, event detection has emerged as a major challenge and source of knowledge discovery. This knowledge is employed in different applications such as recommender systems, crisis management systems, and decision support systems. Dynamicity, overlapping, and evolutionary behavior are the most important issues in event detection. This paper proposes a novel evolutionary model for event detection that captures the dynamism and evolving behavior of events. The proposed method uses a matrix decomposition technique and a Dirichlet Process to detect events and handle their dynamicity. The model consists of two components, namely preliminary event detection and event evolvement tracking. The former extracts preliminary events from the available data using the matrix decomposition method. Then, subsequent data is fed into a non-parametric Bayesian model, namely the Dirichlet Process Mixture Model, to evolve the preliminary events. During the evolvement process, data may migrate between extracted events, or new events may be discovered. The experimental results and comparisons with several recently developed approaches show the superiority of the proposed approach and its ability to capture the evolutionary behavior of events over time.
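
The preliminary event-detection component rests on matrix decomposition; a minimal sketch of that stage using non-negative matrix factorization on a TF-IDF matrix is given below. The toy posts and the choice of NMF are illustrative assumptions; the paper's DPMM-based evolvement tracking is not reproduced.

```python
# Sketch of preliminary event detection via matrix decomposition: factorize
# a TF-IDF document-term matrix with NMF and read each component as a
# candidate event. Illustrative stand-in for the paper's decomposition step.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "earthquake hits the coastal city overnight",
    "rescue teams respond to the earthquake damage",
    "championship final ends in a dramatic penalty shootout",
    "fans celebrate the championship win downtown",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(posts)

nmf = NMF(n_components=2, random_state=0)
doc_event = nmf.fit_transform(X)      # posts x events membership
terms = tfidf.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = comp.argsort()[-3:][::-1]   # top terms per candidate event
    print(f"event {k}:", ", ".join(terms[i] for i in top))
```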

Journal ArticleDOI
TL;DR: The patterns of associations between texts in problem-solving records are extracted to generate appropriate solutions automatically, and the Apriori algorithm is used to identify pattern associations among document clusters that represent the problems, causes, and solutions.
Abstract: Prompt responses to problems/faults arising in an assembly workshop are crucial in terms of production reliability and efficiency. However, human-dependent tasks are time-consuming and prone to error. In this paper, we propose a knowledge discovery approach. We extract the patterns of associations between texts in problem-solving records to generate appropriate solutions automatically. First, we use an enhanced latent Dirichlet allocation (EnLDA) technique to explore the document-topic and topic-word distributions of a text corpus recording assembly problems, causes, and solutions. To increase accuracy, we adjust the elements of the document-term matrix, and we assign term frequency-inverse document frequencies. Second, we use the Refining Density-based Spatial Clustering of Application with Noise (Rf-DBSCAN) algorithm for text clustering. This refines the distances among topic distribution vectors and incorporates noise objects into clustering. This clusters textual documents with similar semantic information, maximizing information retention. Third, we use the Apriori algorithm to identify pattern associations among document clusters that represent the problems, causes, and solutions. We perform a case study using field data from an automobile assembly workshop. The results show that the method retrieves hidden but valuable information from textual records. The decision support knowledge facilitates assembly problem-solving.
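
The association stage above uses the Apriori algorithm; below is a tiny self-contained Apriori sketch over hypothetical (problem, cause, solution) cluster labels, limited to 1- and 2-itemsets. The thresholds and labels are illustrative, not the paper's field data.

```python
# Tiny Apriori sketch: mine frequent 1- and 2-itemsets from cluster labels
# of problem-solving cases and print rules above a confidence threshold.
from itertools import combinations

transactions = [  # each record: cluster labels of one problem-solving case
    {"loose-bolt", "vibration", "retighten"},
    {"loose-bolt", "vibration", "retighten"},
    {"misalignment", "vibration", "realign"},
    {"loose-bolt", "noise", "retighten"},
]
min_sup, min_conf = 0.5, 0.8
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

items = {i for t in transactions for i in t}
frequent1 = {frozenset([i]) for i in items if support({i}) >= min_sup}
items1 = {i for s in frequent1 for i in s}
frequent2 = {frozenset(p) for p in combinations(items1, 2)
             if support(set(p)) >= min_sup}

for pair in frequent2:
    for a in pair:
        antecedent, consequent = {a}, pair - {a}
        conf = support(set(pair)) / support(antecedent)
        if conf >= min_conf:
            print(f"{antecedent} -> {consequent}  (conf={conf:.2f})")
```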

Journal ArticleDOI
TL;DR: This article proposes an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET).
Abstract: Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that is still largely unexploited. To allow the decoding of medical knowledge included in reports, we propose an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET). Combining rule-based techniques and pre-trained ML models provides highly accurate knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques for extracting critical information from cancer reports, opening opportunities such as data mining for knowledge extraction purposes, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical and unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system providing visual explanations of the algorithmic decisions taken by SKET. SKET X is designed and developed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.

Journal ArticleDOI
TL;DR: CRIKE is a knowledge-based framework conceived to support legal knowledge extraction from a collection of legal documents, based on a reference legal ontology called LATO (Legal Abstract Term Ontology).

Proceedings ArticleDOI
TL;DR: A novel knowledge discovery algorithm is proposed, based on double evolving frequent pattern trees that trace dynamically evolving data through an incremental sliding window; it can discover new knowledge from evolving data with good performance and high accuracy.
Abstract: To understand the current situation in specific scenarios, valuable knowledge should be mined from both historical data and emerging new data. However, most existing algorithms take the historical data and the emerging data as a whole and periodically repeat the analysis of all of them, which results in heavy computation overhead. It is also challenging to accurately discover new knowledge in time, because the emerging data are usually small compared to the historical data. To address these challenges, we propose a novel knowledge discovery algorithm based on double evolving frequent pattern trees that traces dynamically evolving data through an incremental sliding window. One tree records frequent patterns from the historical data, and the other records incremental frequent items. The structures of the double frequent pattern trees and their relationships are updated periodically according to the emerging data and the sliding window. New frequent patterns are mined from the incremental data, and new knowledge can be obtained from pattern changes. Evaluations show that this algorithm can discover new knowledge from evolving data with good performance and high accuracy.
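
The core idea, maintaining patterns incrementally over a sliding window instead of re-mining everything, can be sketched with simple item counts. The paper's double FP-tree structures are not reproduced, and all names below are hypothetical.

```python
# Incremental sliding-window sketch: keep item supports over only the last
# W transactions, updating counts as new data arrives and old data expires.
from collections import Counter, deque

class SlidingWindowMiner:
    def __init__(self, window=100, min_support=0.3):
        self.window, self.min_support = window, min_support
        self.buffer = deque()
        self.counts = Counter()

    def add(self, transaction):
        self.buffer.append(transaction)
        self.counts.update(transaction)
        if len(self.buffer) > self.window:      # slide: retire oldest
            for item in self.buffer.popleft():
                self.counts[item] -= 1

    def frequent_items(self):
        threshold = self.min_support * len(self.buffer)
        return {i for i, c in self.counts.items() if c >= threshold}

miner = SlidingWindowMiner(window=3, min_support=0.5)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}, {"b", "c"}]:
    miner.add(t)
    print(sorted(miner.frequent_items()))  # shifts as old data expires
```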

Journal ArticleDOI
TL;DR: In this article, the authors developed a method for the automatic creation of metro maps of information obtained by Association Rule Mining, thus extending its applicability to other machine learning methods.
Abstract: Association Rule Mining is a machine learning method for discovering interesting relations between attributes in a huge transaction database. Typically, algorithms for Association Rule Mining generate a huge number of association rules, from which it is hard to extract structured knowledge and present it automatically in a form suitable for the user. Recently, information cartography has been proposed for creating structured summaries of information, visualized with a methodology called "metro maps". It has been applied to several problem domains where pattern mining was necessary. The aim of this study is to develop a method for the automatic creation of metro maps of information obtained by Association Rule Mining and, thus, spread its applicability to other machine learning methods. Although the proposed method consists of multiple steps, its core is metro map construction, defined in the study as an optimization problem that is solved using an evolutionary algorithm. Finally, the method was applied to four well-known UCI Machine Learning datasets and one sports dataset. Visualizing the resulting metro maps shows not only that they are a suitable tool for presenting structured knowledge hidden in data, but also that they can tell stories to users.

Journal ArticleDOI
TL;DR: In this paper , an ontology-based agro knowledge management system is proposed to capture the different kinds of knowledge associated with agriculture and attempt to obtain a single source of agro information that is usable and reusable to the users.
Abstract: The quality of agriculture depends on the quality of the yield, which is usually obtained through the well-being of the crop. The quality of any crop depends on the minerals in the soil, the type of soil, the location, and the seasons. The crop yield depends on soil fertility, availability of water, climate, and disease prevention. Although this information is abundant among expert farmers, the means of passing it on to future generations have not been much promoted. Hence, disseminated agricultural knowledge becomes scarce, affecting the entire agricultural process. Given these facts, a single-source, strong knowledge management system is proposed. The system aims to embrace the different kinds of knowledge associated with agriculture and to provide a single source of agro information that is usable and reusable for users. To ensure the maximum level of reusability, the knowledge of the domain needs to be modeled and represented in a way that is scalable and flexible. One knowledge representation technique that emphasizes reusability and scalability is ontology. Thus, this paper attempts to design an ontology-based agro knowledge management system. A rule base is constructed to improve the expressiveness of the knowledge. An incremental mining approach is adopted to extract the knowledge from multiple ontologies. To better aid decision-making, a visualization task is carried out. A multi-ontology-based knowledge mining model is developed in this research to provide better insight into agro knowledge.

Journal ArticleDOI
24 May 2022
TL;DR: The identified trends and features of the KDD market should be taken into account in further theoretical research and in the practical implementation or reengineering of KDD systems in Ukraine; they are relevant and applicable not only for local companies and organizations but also for international applications in the context of global and regional macroeconomic phenomena and the current national crisis.
Abstract: In the modern global economy, and with the emergence of new branches of economic activity in IT, the phenomenon of structured and unstructured Big Data - the use of Data Science for advanced, in-depth analysis of data and knowledge in all possible modes - leads to competitive advantages for corporations and institutions at both the regional and interstate levels, which is especially relevant in the context of the current macroeconomic and military crisis [1]. The following topical issues are systematically investigated in the article: the current status and prospects for further development of Knowledge Discovery in Databases (KDD), problems and critical issues in the theory and practice of Data Mining, and the specifics of the effective use of KDD in the current crisis in Ukraine. The identified trends and features of the KDD market should be taken into account in further theoretical research and in the practical implementation or reengineering of KDD systems in Ukraine. The results obtained are relevant and applicable not only for local companies and organizations but also for international applications in the context of global and regional macroeconomic phenomena and the current national crisis.