
Showing papers on "Concept mining" published in 2018


Journal Article · DOI
TL;DR: Describes how text mining may extend contemporary organizational research by allowing existing or new research questions to be tested with data that are likely to be rich, contextualized, and ecologically valid.
Abstract: Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to ...

142 citations


Journal Article · DOI
TL;DR: Starting from the basic problem of wireless communication, namely the interrelationship of demand, environment, and ability, this paper investigates the concept and data model of wireless big data (WBD), wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples, in order to open up more opportunities for WBD research and development.
Abstract: With future 5G networks on the horizon, emerging technologies such as the Internet of Things, big data, cloud computing, and artificial intelligence are driving explosive growth in data traffic. With radical changes in communication theory and implementation technologies, wireless communications and wireless networks have entered a new era. Among these developments, wireless big data (WBD) has tremendous value, and artificial intelligence (AI) opens up unprecedented possibilities. However, in big data development and artificial intelligence applications, the lack of a sound theoretical foundation and mathematical methods is regarded as a real challenge that needs to be solved. Starting from the basic problem of wireless communication, namely the interrelationship of demand, environment, and ability, this paper investigates the concept and data model of WBD, wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples, in order to open up more opportunities for WBD research and development. Such research is beneficial for creating new theoretical foundations and emerging technologies for future wireless communications.

29 citations


Proceedings Article · DOI
01 Nov 2018
TL;DR: This work proposes a novel approach that mines concepts based on their occurrence contexts: it learns embedding vectors that summarize the context information of each candidate, and uses these embeddings to evaluate each candidate's global quality and its fitness to each local context.
Abstract: In this work, we study the problem of concept mining, which serves as the first step in transforming unstructured text into structured information and supports downstream analytical tasks such as information extraction, organization, recommendation, and search. Previous work mainly relies on statistical signals, existing knowledge bases, or predefined linguistic patterns. In this work, we propose a novel approach that mines concepts based on their occurrence contexts, by learning embedding vector representations that summarize the context information of each candidate and using these embeddings to evaluate each candidate's global quality and its fitness to each local context. Experiments on several real-world corpora demonstrate the superior performance of our method. A publicly available implementation is provided at https://github.com/kleeeeea/ECON.

13 citations
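The context-based scoring idea in the abstract above can be illustrated with a small sketch. This is not the ECON implementation from the linked repository. As simplifying assumptions, a bag-of-words vector over a fixed vocabulary stands in for a learned context embedding, "global quality" is approximated as the mean cosine similarity between a candidate's context vectors and their centroid, and "local fitness" as the cosine similarity between one local context and that centroid. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def bow(tokens, vocab):
    """Bag-of-words vector over a fixed vocabulary (stand-in for a learned embedding)."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def contexts_of(candidate, sentences, window=2):
    """Collect up to `window` tokens on each side of every occurrence of a (multi-word) candidate."""
    cand = candidate.split()
    k = len(cand)
    out = []
    for sent in sentences:
        toks = sent.lower().split()
        for i in range(len(toks) - k + 1):
            if toks[i:i + k] == cand:
                out.append(toks[max(0, i - window):i] + toks[i + k:i + k + window])
    return out

def global_quality(candidate, sentences, vocab):
    """Hypothetical proxy: how consistent the candidate's contexts are with each other."""
    vecs = [bow(c, vocab) for c in contexts_of(candidate, sentences)]
    if not vecs:
        return 0.0
    c = centroid(vecs)
    return sum(cosine(v, c) for v in vecs) / len(vecs)

def local_fitness(candidate, local_context, sentences, vocab):
    """Hypothetical proxy: how well one local context matches the candidate's typical contexts."""
    vecs = [bow(c, vocab) for c in contexts_of(candidate, sentences)]
    if not vecs:
        return 0.0
    return cosine(bow(local_context, vocab), centroid(vecs))

# Toy corpus (hypothetical).
sentences = [
    "we train a neural network on images",
    "the neural network learns features",
    "a deep neural network on text",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
print(global_quality("neural network", sentences, vocab))
print(local_fitness("neural network", ["on", "images"], sentences, vocab))
```

A candidate whose contexts resemble one another scores close to 1 on this global-quality proxy, while a local context that shares no vocabulary with the candidate's usual contexts gets a fitness of 0.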


Journal Article · DOI
TL;DR: Experiments indicate that the proposed cluster-based mining technique achieves promising results compared with other well-known methods, and addresses effectiveness, robustness, and efficiency for a high-dimensional multimedia database.
Abstract: With rapid innovations in digital technology and cloud computing of late, there has been a huge volume of research on web-based storage, cloud management, and mining of data from the cloud. Large volumes of data are stored and processed daily on either virtual or physical storage and processing equipment. Hence, there is a continuous need for research in these areas to minimize computational complexity and, in turn, reduce time and cost. This paper focuses on handling and mining multimedia data in a database that mixes graphic arts and pictures, hypertext, text, video, and audio. Since audio and video data generally require large amounts of storage, managing and mining such data from a multimedia database needs special attention. Experimental observations on well-known data sets of varying features and dimensions indicate that the proposed cluster-based mining technique achieves promising results compared with other well-known methods. Every attribute denoting the efficiency of the mining process has been compared component-wise with recent mining techniques. The proposed system addresses effectiveness, robustness, and efficiency for a high-dimensional multimedia database.

12 citations


Proceedings Article · DOI
01 Dec 2018
TL;DR: A model is devised that combines text mining and concept mining for large systems; it shows a significant improvement in accuracy, scalability, flexibility, and efficiency over contemporary methods in the literature.
Abstract: In agile application development environments, automatically identifying relevant components in a large, complex software system for software maintenance remains a research problem given the proliferation of software applications. Earlier, concept mining with formal concept analysis was one of the commonly applied techniques for small to medium-sized legacy software systems. Recently, text mining has been widely used for locating features or concerns in large, complex software systems. Nevertheless, the literature reveals that combining text mining with other techniques always yields better accuracy in locating features. Even though it is efficient, applying formal concept analysis to large systems is limited by the exponential time complexity of constructing concept lattices. In this research work, a model is devised to combine text mining and concept mining for large systems. The unsupervised machine learning technique Latent Dirichlet Allocation, also called topic modeling, is used to reduce the feature space; K-Means clustering is then applied to cluster related documents, and formal concept analysis is carried out on the individual clusters. Three open-source software systems, namely JEdit, ArgoUML, and JabRef, are considered for the experimental study. The empirical evaluation of the proposed model's feature location performance shows a significant improvement in accuracy, scalability, flexibility, and efficiency over contemporary methods in the literature.

3 citations
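The formal concept analysis step that this model runs on each cluster can be sketched as follows. The sketch is a brute-force enumeration chosen for readability, an assumption rather than the lattice-construction algorithm the authors use (the abstract does not name one): every subset of objects is mapped to its shared attributes and back again, which finds every formal concept but visits 2^n subsets, illustrating the exponential cost the abstract cites for large systems. The LDA and K-Means stages are omitted, and the source files and terms in the example context are hypothetical.

```python
from itertools import combinations

def intent(objs, context, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Brute force: visits 2^|objects| subsets, so it only suits small clusters."""
    objects = list(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset, context, all_attrs)
            a = extent(b, context)
            concepts.add((a, b))
    return concepts

# Hypothetical cluster: source files (objects) and the terms they contain (attributes).
context = {
    "Buffer.java": {"edit", "text", "undo"},
    "Undo.java": {"undo", "history"},
    "Search.java": {"text", "search"},
}
for ext, inte in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```

On this three-file context the enumeration yields seven concepts, for example the pair of Buffer.java and Undo.java that share only the term undo.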


Journal Article · DOI
TL;DR: The results showed that information, marketing, and strategy are the main elements of CI which, along with other prerequisites, can lead to CI and, consequently, to economic development, competitive advantage, and sustainability in the market.
Abstract: Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research aims to examine part of the CI field by investigating scientific articles on it through text mining in three interrelated steps. In the first step, a total of 1143 articles published between 1987 and 2016 were retrieved by searching for the phrase "competitive intelligence" in valid databases and search engines; then, by reviewing the topic, abstract, and main text of the articles and screening them in several rounds, the authors selected 135 relevant articles for the text mining process. In the second step, the data were pre-processed. In the third step, non-hierarchical cluster analysis (k-means) yielded 5 optimal clusters based on the Davies–Bouldin index, and a word cloud was drawn for each; then, the association rules of each cluster were extracted and analyzed using the support, confidence, and lift indices. The results indicated growing interest in CI research in recent years and a tangible difference between the strong presence of developed countries and the weak presence of developing countries in producing scientific output; further, the results showed that information, marketing, and strategy are the main elements of CI which, along with other prerequisites, can lead to CI and, consequently, to economic development, competitive advantage, and sustainability in the market.

2 citations
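The cluster-selection and rule-evaluation measures named in the abstract can be computed directly. The sketch below is illustrative only: the articles (reduced to sets of hypothetical key terms) and the 2-D points are made up. It uses the standard definitions support(X) = fraction of transactions containing X, confidence(X → Y) = support(X ∪ Y) / support(X), lift(X → Y) = confidence(X → Y) / support(Y), and the Davies–Bouldin index as the average, over clusters, of the worst-case ratio of summed within-cluster scatter to centroid distance (lower is better). scikit-learn also provides `davies_bouldin_score` in `sklearn.metrics` if a library implementation is preferred.

```python
import math

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's base rate; > 1 means positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

def davies_bouldin(points, labels):
    """Davies–Bouldin index with Euclidean distances; requires at least two clusters."""
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    centroids = {c: [sum(coord) / len(m) for coord in zip(*m)] for c, m in members.items()}
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {c: sum(math.dist(p, centroids[c]) for p in m) / len(m) for c, m in members.items()}
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)

# Hypothetical articles reduced to sets of key terms.
articles = [
    {"information", "strategy"},
    {"information", "marketing"},
    {"information", "strategy", "marketing"},
    {"marketing"},
]
X, Y = {"information"}, {"strategy"}
print(support(X | Y, articles), confidence(X, Y, articles), lift(X, Y, articles))

# Hypothetical 2-D document vectors with a candidate k-means labelling.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(davies_bouldin(points, labels))
```

With these toy articles the rule information → strategy has support 0.5, confidence 2/3, and lift 4/3, so the two terms co-occur more often than independence would predict. To choose k as the abstract describes, one would run k-means for several values of k and keep the labelling with the lowest index.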


Posted Content · DOI
TL;DR: Q-Map is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently; it is backed by a fast and configurable mining technique that is indexed on curated knowledge sources.
Abstract: Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end of the spectrum, however, there is a wide expanse of data that is intractable for direct analysis owing to its unstructured nature, such as discharge summaries, clinical notes, and procedural notes, which are written by humans in narrative form and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm that is indexed on curated knowledge sources and is both fast and configurable. The authors also briefly examine its comparative performance against MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages Q-Map displays over MetaMap.

2 citations
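A dictionary-driven concept matcher of the general kind described above (string matching indexed on a curated knowledge source) can be sketched with a token trie and greedy longest match. This is an illustration under stated assumptions, not Q-Map's actual algorithm, which the abstract does not detail: the lexicon, its made-up concept identifiers (HYP-00x), and the clinical sentence are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens; punctuation such as '/' and ';' is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

END = "$"  # marks a complete lexicon entry; tokenize() never produces this token

def build_trie(lexicon):
    """Index each lexicon phrase as a path of tokens in a nested-dict trie."""
    root = {}
    for phrase, concept_id in lexicon.items():
        node = root
        for tok in tokenize(phrase):
            node = node.setdefault(tok, {})
        node[END] = concept_id
    return root

def extract_concepts(text, trie):
    """Scan left to right, keeping the longest lexicon match at each position."""
    toks = tokenize(text)
    found, i = [], 0
    while i < len(toks):
        node, best, j = trie, None, i
        while j < len(toks) and toks[j] in node:
            node = node[toks[j]]
            j += 1
            if END in node:
                best = (j, node[END])  # remember the longest complete match so far
        if best is not None:
            end, concept_id = best
            found.append((" ".join(toks[i:end]), concept_id))
            i = end  # skip past the matched span
        else:
            i += 1
    return found

# Hypothetical curated lexicon: phrase -> made-up concept identifier.
lexicon = {
    "myocardial infarction": "HYP-001",
    "infarction": "HYP-002",
    "chest pain": "HYP-003",
    "aspirin": "HYP-004",
}
trie = build_trie(lexicon)
note = "Pt admitted w/ chest pain; r/o myocardial infarction. Started aspirin."
print(extract_concepts(note, trie))
```

Because matching prefers the longest lexicon entry, "myocardial infarction" is reported as a single concept and the shorter "infarction" entry is not emitted separately. Lookup cost grows with the length of the matched phrase rather than with the size of the lexicon, which is one reason trie indexing is a common choice for this kind of matcher.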


Posted Content
30 Apr 2018
TL;DR: Q-Map is presented: a simple yet powerful system that can sift through datasets to retrieve structured information aggressively and efficiently, backed by a fast and configurable mining algorithm based on curated knowledge sources.
Abstract: Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. There are also various ongoing research efforts in the operational and financial fields using techniques such as demand forecasting and convex optimization. Most of the data used in these research applications are well structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end, there is a wide expanse of data that is intractable for direct analysis owing to its unstructured nature. It can be found in the form of discharge summaries, clinical notes, and procedural notes, which are written by humans in free-text format and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. The unregulated format, coupled with the massive size of the datasets, makes mining a monumental task that requires robust algorithms supported by ample hardware resources and computing power. In this paper, we present Q-Map, a simple yet powerful system that can sift through these datasets to retrieve structured information aggressively and efficiently. It is backed by an effective mining algorithm based on curated knowledge sources that is both fast and configurable. We also present its comparative performance against MetaMap, one of the most reputed tools for medical concept retrieval.

2 citations


01 Aug 2018
TL;DR: The authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently, backed by an effective mining technique based on a string matching algorithm that is indexed on curated knowledge sources and is both fast and configurable.
Abstract: Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end of the spectrum, however, there is a wide expanse of data that is intractable for direct analysis owing to its unstructured nature, such as discharge summaries, clinical notes, and procedural notes, which are written by humans in narrative form and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm that is indexed on curated knowledge sources and is both fast and configurable. The authors also briefly examine its comparative performance against MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages Q-Map displays over MetaMap.