Dissimilarity algorithm on conceptual graphs to mine text outliers

doi:10.1109/DMO.2009.5341910

Citations

Anna Formica

01 Jan 2006

TL;DR: Presents a method for measuring the similarity of FCA concepts that refines the author's previous proposal and achieves a higher correlation with human judgement than other proposals in the literature for evaluating concept similarity in a taxonomy.

Abstract: Formal Concept Analysis (FCA) is proving useful in supporting difficult activities that are becoming fundamental to the development of the Semantic Web. Assessing concept similarity is one such activity, since it allows the identification of different concepts that are semantically close. In this paper, a method for measuring the similarity of FCA concepts is presented, which is a refinement of a previous proposal of the author. The refinement consists in determining the similarity of concept descriptors (attributes) by using the information content approach, rather than relying on human domain expertise. The adopted information content approach allows a higher correlation with human judgement than other proposals in the literature for evaluating concept similarity in a taxonomy.

124 citations

Journal Article

Emerging directions in predictive text mining

Nitin Indurkhya¹

University of New South Wales¹

01 Jul 2015, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

TL;DR: Six main directions are identified where research in text mining is heading: Deep Learning, Topic Models, Graphical Modeling, Summarization, Sentiment Analysis, and Learning from Unlabeled Text. These data-centric directions are also likely to influence future research in Natural Language Processing.

Abstract: In recent years, Text Mining has seen a tremendous spurt of growth as data scientists focus their attention on analyzing unstructured data. The main drivers for this growth have been big data as well as complex applications where the information in the text is often combined with other kinds of information in building predictive models. These applications require highly efficient and scalable algorithms to meet the overall performance demands. In this context, six main directions are identified where research in text mining is heading: Deep Learning, Topic Models, Graphical Modeling, Summarization, Sentiment Analysis, Learning from Unlabeled Text. Each direction has its own motivations and goals. There is some overlap of concepts because of the common themes of text and prediction. The predictive models involved are typically ones that involve meta-information or tags that could be added to the text. These tags can then be used in other text processing tasks such as information extraction. While the boundary between the fields of Text Mining and Natural Language Processing is becoming increasingly blurry, the importance of predictive models for various applications involving text means there is still substantial growth potential within the traditional sub-fields of text mining. These data-centric directions are also likely to influence future research in Natural Language Processing, especially in resource-poor languages and in multilingual texts. WIREs Data Mining Knowl Discov 2015, 5:155-164. doi: 10.1002/widm.1154

20 citations

Proceedings Article

Bess or xbest: Mining the Malaysian online reviews

Norlela Samsudin¹, Mazidah Puteh¹, Abdul Razak Hamdan²

Universiti Teknologi MARA¹, National University of Malaysia²

28 Jun 2011

TL;DR: An exploratory study of opinion mining on online movie reviews, collected from several forums and blogs and written by Malaysian reviewers, shows that the performance of machine learning techniques without any preprocessing of the micro-texts or feature selection is quite low.

Abstract: Advances in information technology, especially the Internet, have changed the way we communicate and express opinions or sentiments on the services or products that we consume. Opinion mining aims to automate the classification of opinions into positive or negative views. It benefits both customers and sellers in identifying the best product or service. Although researchers have explored new techniques for identifying sentiment polarization, little work has been done on mining opinions written by Malaysian reviewers. The same is true of micro-text. Therefore, in this study, we conduct exploratory research on opinion mining of online movie reviews collected from several forums and blogs written by Malaysians. The experimental data are tested using machine learning classifiers, i.e. Support Vector Machine, Naive Bayes and k-Nearest Neighbor. The results illustrate that the performance of these machine learning techniques without any preprocessing of the micro-texts or feature selection is quite low. Therefore, additional steps are required in order to mine the opinions from these data.

13 citations

Journal Article

Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

Siti Sakira Kamaruddin¹, Abdul Razak Hamdan², Azuraliza Abu Bakar², Fauzias Mat Nor²

Universiti Utara Malaysia¹, National University of Malaysia²

01 May 2012

TL;DR: This paper focuses on a graph-based approach for text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. The method identifies deviating sentences and correlates strongly with expert judgments.

Abstract: The rapid increase in the amount of textual data has brought forward a growing research interest in mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach for text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems, i.e. semantic representation of text and the complexity of graph matching. We employ the conceptual graph interchange format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences. We propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method in the context of analyzing real-world financial statements to identify deviating performance indicators. We show that our method performs better than two related text-based graph similarity measuring methods. Our proposed method identifies deviating sentences and correlates strongly with expert judgments. Furthermore, it offers error tolerance matching of CGIFs and retains linear complexity as the number of CGIFs increases.

6 citations

Cites background from "Dissimilarity algorithm on conceptu..."

...Among these methods, CG has gained considerable attention due to various reasons: i.e. firstly, it simplifies the representation of relations of any arity compared to other network language that uses labelled arc....
[...]
...Thirdly, they are adequate to represent accurate and highly structured information beyond the keyword approach [12] and fourthly, both semantic and episodic association between words can be represented using CGs [13]....
[...]

Journal Article

Research on Technique of Extracting Knowledge from Maintenance Experiences Based on Conceptual Graph

Yanbin Liu¹, Yanling Qian¹, Long Wang¹, Tengfei Xu¹

National University of Defense Technology¹

01 Jan 2015

TL;DR: The conceptual graph method is introduced first and then applied to expressing knowledge extracted from experiences accumulated in maintenance actions, so that maintenance staff can find the most similar case when a new fault appears.

Abstract: Experience knowledge is extremely important in maintenance domain. However, it is difficult to express and extract this kind of knowledge. Conceptual Graph is a new and powerful visual knowledge representation method. This paper proposes one technique based on conceptual graph to extract knowledge from experiences accumulated in maintenance actions. This technique introduces conceptual graph to maintenance domain. With VE distribution pump as an example, the method of conceptual graph is introduced first, and then it is applied to expressing knowledge extracted from experiences accumulated in maintenance actions. Finally, the similarity between graphs of new fault case and base cases is computed, through which maintenance staff could find out the most similar case when new fault appears. The causes and solutions of the most similar case could assist maintenance staff to resolve new faults.

2 citations
