Knowledge Discovery from Semi-Structured Data for Conceptual Organization

doi:10.1109/WI-IATW.2006.86

Citations

Proceedings Article•DOI•

Notice of Violation of IEEE Publication Principles Knowledge Discovery from Virtual Enterprise Model Based on Semantic Annotation

Chengzhu Sun¹, Xiaofei Xu¹, Xiangyang Li, Shengchun Deng•Institutions (1)

Harbin Institute of Technology¹

18 Oct 2008

TL;DR: A graphical ontology representation schema for virtual enterprise models, called the ontology structure graph (OSG), is described, and the process of knowledge discovery in the virtual enterprise model base, based on applications of semantic annotation, is given.

Abstract: Discovering knowledge from virtual enterprise model is becoming increasingly important, as numerical models established for virtual enterprise are difficult to support interoperability of virtual enterprise. To solve this problem, a knowledge discovery method based on semantic annotation is put forward in this paper. A graphic ontology representation schema for virtual enterprise model is described, which is called ontology structure graph (OSG). Based on applications of semantic annotation, the process of knowledge discovery in the virtual enterprise model base is given and activities such as model selection, semantic annotation, data transformation, knowledge extraction and ontology interoperation in knowledge discovery process are illustrated in detail. Then, several critical issues influencing knowledge discovery are explained, including the organization of domain vocabulary, the definition of semantic annotation rules and semantic affinity function, the formulation of reference ontology. Finally, an instance is given to demonstrate the knowledge discovery method and results of knowledge discovery are presented.

5 citations

Cites background from "Knowledge Discovery from Semi-Struc..."

...[6] proposed a knowledge discovery schema for conceptual organization of semi-structured data like emails, bibliographic databases, customer complaints, video descriptions etc....

References

Journal Article•DOI•

A mathematical theory of communication

Claude E. Shannon

01 Jul 1948-Bell System Technical Journal

TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.

Abstract: In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases.
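The limiting process mentioned in this abstract can be illustrated with a small numerical sketch. It is not taken from Shannon's paper: the function names, the Gaussian test density, and the interval [-10, 10] are assumptions chosen for illustration. The sketch splits the real line into bins of width Δ, computes the discrete entropy of the binned distribution, and adds log2(Δ). The discrete entropy itself grows without bound as the bins shrink, but the corrected value settles near the differential entropy of the density.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Density of a zero-mean Gaussian (illustrative test density)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def discretized_entropy_bits(pdf, lo, hi, n_bins):
    """Discrete entropy (bits) after binning [lo, hi] into n_bins equal bins.

    Each bin probability is approximated by pdf(midpoint) * width.
    Returns the entropy together with the bin width.
    """
    width = (hi - lo) / n_bins
    entropy = 0.0
    for i in range(n_bins):
        mid = lo + (i + 0.5) * width
        p = pdf(mid) * width
        if p > 0:  # skip empty bins to avoid log2(0)
            entropy -= p * math.log2(p)
    return entropy, width

def differential_entropy_estimate(pdf, lo, hi, n_bins):
    """Discrete entropy plus log2(bin width): an estimate of differential entropy."""
    entropy, width = discretized_entropy_bits(pdf, lo, hi, n_bins)
    return entropy + math.log2(width)

if __name__ == "__main__":
    for n in (200, 2000, 20000):
        print(n, differential_entropy_estimate(gaussian_pdf, -10.0, 10.0, n))
```

For a unit-variance Gaussian the estimate can be compared against the closed form 0.5 * log2(2πe); halving the bin width adds roughly one bit to the uncorrected discrete entropy.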

65,425 citations

Journal Article•DOI•

Machine learning in automated text categorization

Fabrizio Sebastiani

01 Mar 2002-ACM Computing Surveys

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations

Journal Article•DOI•

Photobook: content-based manipulation of image databases

Alex Pentland¹, Rosalind W. Picard¹, Stan Sclaroff¹,²•Institutions (2)

Massachusetts Institute of Technology¹, Boston University²

01 Jun 1996-International Journal of Computer Vision

TL;DR: The Photobook system is described, which is a set of interactive tools for browsing and searching images and image sequences that make direct use of the image content rather than relying on text annotations to provide a sophisticated browsing and search capability.

Abstract: We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These query tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on text annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually-significant coefficients. We discuss three types of Photobook descriptions in detail: one that allows search based on appearance, one that uses 2-D shape, and a third that allows search based on textural properties. These image content descriptions can be combined with each other and with text-based descriptions to provide a sophisticated browsing and search capability. In this paper we demonstrate Photobook on databases containing images of people, video keyframes, hand tools, fish, texture swatches, and 3-D medical data.

1,748 citations

Journal Article•DOI•

Concept Decompositions for Large Sparse Text Data Using Clustering

Inderjit S. Dhillon¹, Dharmendra S. Modha²•Institutions (2)

University of Texas at Austin¹, IBM²

01 Jan 2001-Machine Learning

TL;DR: The concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets and are localized in the word space, are sparse, and tend towards orthonormality.

Abstract: Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors–a few thousand dimensions and a sparsity of 95 to 99% is typical. In this paper, we study a certain spherical k-means algorithm for clustering such document vectors. The algorithm outputs k disjoint clusters each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. As our first contribution, we empirically demonstrate that, owing to the high-dimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain “fractal-like” and “self-similar” behavior. As our second contribution, we introduce concept decompositions to approximate the matrix of document vectors; these decompositions are obtained by taking the least-squares approximation onto the linear subspace spanned by all the concept vectors. We empirically establish that the approximation errors of the concept decompositions are close to the best possible, namely, to truncated singular value decompositions. As our third contribution, we show that the concept vectors are localized in the word space, are sparse, and tend towards orthonormality. In contrast, the singular vectors are global in the word space and are dense. Nonetheless, we observe the surprising fact that the linear subspaces spanned by the concept vectors and the leading singular vectors are quite close in the sense of small principal angles between them. In conclusion, the concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets.
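The clustering step described in this abstract (k disjoint clusters, each with a concept vector equal to the cluster centroid normalized to unit Euclidean norm) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the random initialization from sampled documents, the fixed iteration count, and the dense list representation are all assumptions. Real text data would call for sparse vectors.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Inner product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster document vectors by cosine similarity.

    Returns (labels, concepts), where concepts[j] is the normalized centroid
    of cluster j. Initialization and stopping rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    concepts = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assign each document to the concept vector it is most similar to.
        labels = [max(range(k), key=lambda j: dot(d, concepts[j])) for d in docs]
        # Recompute each concept vector as the normalized sum of its members.
        for j in range(k):
            members = [d for d, label in zip(docs, labels) if label == j]
            if members:  # keep the old concept vector if a cluster empties
                concepts[j] = normalize([sum(col) for col in zip(*members)])
    return labels, concepts

if __name__ == "__main__":
    toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
    print(spherical_kmeans(toy_docs, k=2))
```

Because every document and concept vector has unit length, maximizing the inner product is the same as maximizing cosine similarity, which is why the assignment step needs no explicit distance computation.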

1,398 citations

Journal Article•DOI•

Concept mapping: A useful tool for science education

Joseph D. Novak¹•Institutions (1)

Cornell University¹

01 Dec 1990-Journal of Research in Science Teaching

TL;DR: The author describes the genesis and development of concept mapping as a useful tool for science education, offers an overview of the contents of this special issue, and comments on the current state of knowledge representation.

Abstract: This article describes the genesis and development of concept mapping as a useful tool for science education. It also offers an overview of the contents of this special issue and comments on the current state of knowledge representation. Suggestions for further research are made throughout the article.

995 citations
