Journal ArticleDOI

An efficient integration and indexing method based on feature patterns and semantic analysis for big data

TL;DR: A probabilistic feature patterns (PFP) approach using a feature transformation and selection method is proposed for efficient data integration, together with a feature latent semantic analysis (F-LSA) method for indexing unsupervised clusters of multiple heterogeneous integrated data sources.
Abstract: Big Data has received much attention in the multi-domain industry. In the digital and computing world, information is generated and collected at a rate that quickly exceeds the capacity of traditional systems. Traditional data integration systems interconnect a limited number of resources and are built through relatively stable, but generally complex and time-consuming, design activities. However, the rapid growth of these large data sets makes it difficult to learn heterogeneous data structures for integration and indexing, and complicates information retrieval for varied data analysis requirements. In this paper, a probabilistic feature patterns (PFP) approach using a feature transformation and selection method is proposed for efficient data integration, and a feature latent semantic analysis (F-LSA) method is used to index unsupervised clusters of multiple heterogeneous integrated data sources. The PFP approach takes advantage of the feature transformation and selection mechanism to map and cluster the data for integration, while LSA analyzes the contextual relations of the data features to provide an appropriate index for fast and accurate data extraction. A large BibTeX dataset from different publication sources is processed and evaluated to assess the effectiveness of the proposal. The analytical study and the resulting outcomes show the improvement in integration and indexing achieved by the work.
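The following is a minimal sketch of the latent-semantic-indexing idea behind F-LSA, assuming TF-IDF vectorization and truncated SVD as stand-ins; the record texts, component count, and query are hypothetical, and the paper's PFP transformation/selection step is not reproduced here.

```python
# Minimal, illustrative LSA-style index over bibliographic records.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical BibTeX-like records reduced to plain text fields.
records = [
    "probabilistic feature patterns for big data integration",
    "latent semantic analysis for document indexing",
    "heterogeneous data source clustering and retrieval",
]

# Feature transformation: TF-IDF term-document matrix.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(records)

# Latent semantic analysis: project records into a low-rank concept space.
lsa = TruncatedSVD(n_components=2, random_state=0)
index = lsa.fit_transform(tfidf)  # one dense semantic vector per record

# Query the index: map the query into the same space and rank records by similarity.
query = lsa.transform(vectorizer.transform(["semantic indexing of big data"]))
scores = cosine_similarity(query, index)[0]
print(scores.argsort()[::-1])  # record indices, most relevant first
```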
Citations
Journal ArticleDOI
TL;DR: A conceptual framework of intelligent decision-making based on industrial big data-driven technology is proposed in this study, providing valuable insights into the severe challenges and future research directions in this field.

50 citations

Journal ArticleDOI
TL;DR: A conceptual framework of intelligent decision-making based on industrial big data-driven technology is proposed in this paper, providing valuable insights into the severe challenges and future research directions in this field.

18 citations

Journal ArticleDOI
TL;DR: A novel knowledge graph embedding model called MRotatE is proposed, which exploits triplet features from the perspective of relation and entity rotations, and can model and infer various relation patterns while handling multi-fold relations at the same time.
Abstract: Knowledge graphs are typical large-scale multi-relational structures and are useful for many artificial intelligence tasks. However, knowledge graphs often have missing facts, which limits the development of downstream tasks. To refine knowledge graphs, knowledge graph embedding models have been developed. These models aim to learn distributed representations for entities and relations and predict unknown triplets by scoring candidate triplets. Nevertheless, state-of-the-art works either aim to capture different relation patterns or to model multi-fold relations, and fail to consider both aspects simultaneously. To fill this gap, in this paper we propose a novel knowledge graph embedding model, MRotatE. It exploits triplet features from the perspective of relation and entity rotations, which can model and infer various relation patterns and handle multi-fold relations at the same time. The experimental results demonstrate that MRotatE outperforms existing approaches and attains state-of-the-art performance.

11 citations
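As a rough illustration of the rotation idea that MRotatE builds on, the sketch below scores a triplet with a RotatE-style rotation in complex embedding space; the dimension and random embeddings are hypothetical, and this is not the MRotatE model itself.

```python
# Illustrative RotatE-style triplet scoring with complex rotations (not MRotatE itself).
import numpy as np

dim = 4
rng = np.random.default_rng(0)

# Complex-valued embeddings for a head and a tail entity.
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
tail = rng.normal(size=dim) + 1j * rng.normal(size=dim)

# A relation acts as an element-wise rotation: unit-modulus complex numbers e^{i*theta}.
theta = rng.uniform(0.0, 2.0 * np.pi, size=dim)
relation = np.exp(1j * theta)

# Score of (head, relation, tail): negative distance between the rotated head and the tail.
# A higher (less negative) score means the triplet is more plausible.
score = -np.linalg.norm(head * relation - tail, ord=1)
print(score)
```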

Journal ArticleDOI
TL;DR: The proposed study presents decision-making and computational modeling of Big Data for sustaining its influential usage, organizing the currently available research for analysis.
Abstract: Big Data is data whose shape and volume keep rising with the passage of time and with innovations in technology. This increase will give rise to more uncertain and complex situations, which will then be difficult to analyze and manage properly. Various interconnected devices communicate different types of information, which is used for different purposes. A huge volume of data is produced, and storage requirements grow accordingly. Computational modeling is the tool that helps analyze, process, and manage the data to extract useful information. The modern industry's challenge is to incorporate knowledge into Big Data applications to deal with distinguishing difficulties in computational models. The techniques and models are delivered with guides to help analysts quickly fit models to information insights. The decision support system is a strong system that plays a significant role in shaping Big Data for sustaining efficiency and performance. Decision-making through computational modeling is also a powerful mechanism for supporting efficient tools for managing Big Data for influential use. Keeping in view the issues of modern-day industry, the proposed study presents decision-making and computational modeling of Big Data for sustaining its influential usage. The existing state-of-the-art literature is presented in an organized way to analyze the currently available research.

2 citations

Journal ArticleDOI
TL;DR: The evaluation and the reported statistical results indicate that the proposed BCTMP-WEABC technique achieves higher classification accuracy and feature selection rate, with lower time consumption and fewer false positives than conventional methods.

1 citation

References
Journal ArticleDOI
TL;DR: This survey performs a comprehensive study of data collection from a data management point of view, providing a research landscape of these operations and guidelines on which technique to use when, and identifying interesting research challenges.
Abstract: Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons why data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big Data and Artificial Intelligence (AI) integration and opens many opportunities for new research.

471 citations

Journal ArticleDOI
TL;DR: The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks, and indicate that treating feature selection as a multi-objective problem can generally provide better performance in terms of feature subset size and classification accuracy.
Abstract: Feature selection is an essential step in various tasks, where filter feature selection algorithms are increasingly attractive due to their simplicity and speed. A common filter uses mutual information to estimate the relationship between each feature and the class labels (mutual relevancy) and between each pair of features (mutual redundancy). This strategy has gained popularity, resulting in a variety of criteria based on mutual information. Other well-known strategies order each feature based on the nearest-neighbor distance, as in ReliefF, or based on the between-class and within-class variance, as in the Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF, and the Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest-ranked features determined by ReliefF and the Fisher Score while providing the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. The former uses the proposed criterion as a single-objective problem in a weighted manner, while the latter considers the proposed criterion in a multi-objective design. Moreover, a well-known mutual information feature selection approach (MIFS) based on maximum relevance and minimum redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks. The results also indicate that treating feature selection as a multi-objective problem can generally provide better performance in terms of feature subset size and classification accuracy.

256 citations
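The sketch below illustrates, under stated assumptions, a filter ranking that mixes a mutual-information relevance term with a Fisher score; the weighting, dataset, and normalization are example choices and do not reproduce the paper's exact criterion, its ReliefF term, or the DE search.

```python
# Illustrative filter ranking mixing mutual-information relevance with a Fisher score.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Relevance term: mutual information between each feature and the class labels.
mi = mutual_info_classif(X, y, random_state=0)

# Fisher score per feature: between-class scatter over within-class scatter.
classes = np.unique(y)
overall_mean = X.mean(axis=0)
between = sum(len(X[y == c]) * (X[y == c].mean(axis=0) - overall_mean) ** 2 for c in classes)
within = sum(len(X[y == c]) * X[y == c].var(axis=0) for c in classes)
fisher = between / (within + 1e-12)

# Weighted single-objective combination; alpha = 0.5 is an arbitrary example weight.
alpha = 0.5
criterion = alpha * (mi / mi.max()) + (1 - alpha) * (fisher / fisher.max())
print(np.argsort(criterion)[::-1])  # feature indices ranked best-first
```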

Journal ArticleDOI
TL;DR: This work presents a comprehensive overview of various feature selection methods and their inherent pros and cons, and analyzes adaptive classification systems and parallel classification systems for chronic disease prediction.

238 citations

Journal ArticleDOI
TL;DR: A taxonomy of indexing techniques is developed to provide insight and to enable researchers to understand and select a technique as a basis for designing an indexing mechanism with reduced time and space consumption for BD-MCC.
Abstract: The explosive growth in volume, velocity, and diversity of data produced by mobile devices and cloud applications has contributed to the abundance of data, or 'big data.' Available solutions for efficient data storage and management cannot fulfill the needs of such heterogeneous data, whose amount is continuously increasing. For efficient retrieval and management, existing indexing solutions become inefficient as index size and seek time grow rapidly, so an optimized index scheme is required for big data. Regarding real-world applications, the indexing issue with big data in cloud computing is widespread in healthcare, enterprises, scientific experiments, and social networks. To date, diverse soft computing, machine learning, and other artificial intelligence techniques have been utilized to satisfy indexing requirements, yet the literature contains no state-of-the-art survey investigating the performance and consequences of techniques for solving big data indexing issues as they enter cloud computing. The objective of this paper is to investigate and examine the existing indexing techniques for big data. A taxonomy of indexing techniques is developed to provide insight and to enable researchers to understand and select a technique as a basis for designing an indexing mechanism with reduced time and space consumption for BD-MCC. In this study, 48 indexing techniques are studied and compared based on 60 articles related to the topic. The indexing techniques' performance is analyzed based on their characteristics and big data indexing requirements. The main contribution of this study is a taxonomy of indexing techniques categorized by their method: non-artificial intelligence, artificial intelligence, and collaborative artificial intelligence indexing methods. In addition, the significance of the different procedures and their performance is analyzed, along with the limitations of each technique. In conclusion, several key future research topics with the potential to accelerate the progress and deployment of artificial intelligence-based cooperative indexing in BD-MCC are elaborated on.

222 citations
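To make concrete what "indexing" refers to in this survey, here is a minimal inverted-index sketch; the documents and query terms are hypothetical, and the real big-data index schemes surveyed in the paper are far more elaborate.

```python
# Minimal inverted index: each term maps to the ids of documents containing it.
from collections import defaultdict

documents = {
    1: "big data indexing in mobile cloud computing",
    2: "machine learning techniques for index selection",
    3: "cloud storage and retrieval of heterogeneous data",
}

inverted = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        inverted[term].add(doc_id)

def lookup(*terms):
    # Intersect posting lists instead of scanning every document.
    postings = [inverted[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(lookup("cloud", "data"))  # -> {1, 3}
```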

Book ChapterDOI
01 Jan 2018
TL;DR: A comprehensive overview of text mining and its current research status is presented, and experimental results indicate that the Springer database represents the main source of research articles in the field of mobile education for the medical domain.
Abstract: Nowadays, research in text mining has become one of the widespread fields in analyzing natural-language documents. The present study presents a comprehensive overview of text mining and its current research status. As indicated in the literature, there is a limitation in addressing Information Extraction from research articles using Data Mining techniques. The synergy between them helps to discover different interesting text patterns in the retrieved articles. In our study, we collected three hundred refereed journal articles in the field of mobile learning from six scientific databases, namely Springer, Wiley, Science Direct, SAGE, IEEE, and Cambridge, and textually analyzed them through various text mining techniques. The articles were selected on the criterion that they all incorporate mobile learning as the main component in a higher-education context. Experimental results indicated that the Springer database represents the main source of research articles in the field of mobile education for the medical domain. Moreover, cases where the similarity among topics could not be detected were due either to their interrelations or to ambiguity in their meaning. Furthermore, findings showed a sharp increase in the number of published articles during the years 2015 through 2016. In addition, other implications and future perspectives are presented in the study.

125 citations