
Showing papers by "Geun-Sik Jo" published in 2008


Book ChapterDOI
03 Sep 2008
TL;DR: A methodology of WordNet-based distance measures is proposed, and the meanings of concepts from upper ontologies are applied to an ontology integration process by providing a semantic network called OnConceptSNet.
Abstract: While there is a large body of previous work on WordNet-based methods for finding the semantic similarity of concepts and words, the application of these word-oriented methods to ontology integration tasks has not yet been explored. In this paper, we propose a methodology of WordNet-based distance measures, and we apply the meanings of concepts from upper ontologies to an ontology integration process by providing a semantic network called OnConceptSNet. It is a semantic network of ontology concepts in which the relations between concepts are derived from the upper ontology WordNet. We also describe a methodology for handling conflicts in the ontology integration process.
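
The abstract gives no implementation details; as a rough illustration of the WordNet-based distance idea (not the paper's OnConceptSNet construction), the similarity of two ontology concept labels might be computed with NLTK's WordNet interface as sketched below. The function name and the choice of Wu-Palmer similarity are assumptions for illustration.

```python
# Sketch: WordNet-based similarity between two ontology concept labels.
# Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def concept_similarity(label_a, label_b):
    """Return the best Wu-Palmer similarity between any noun senses
    of the two concept labels (0.0 if no senses are found)."""
    senses_a = wn.synsets(label_a, pos=wn.NOUN)
    senses_b = wn.synsets(label_b, pos=wn.NOUN)
    best = 0.0
    for sa in senses_a:
        for sb in senses_b:
            score = sa.wup_similarity(sb) or 0.0
            best = max(best, score)
    return best

if __name__ == "__main__":
    # Two concept labels that might appear in ontologies to be integrated.
    print(concept_similarity("car", "automobile"))   # high similarity
    print(concept_similarity("car", "professor"))    # low similarity
```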

37 citations


Proceedings Article
01 Jan 2008
TL;DR: This paper applies word-oriented methods to ontology integration tasks in which a noun phrase is analyzed to identify its head noun, which helps avoid incorrect relations between entities, and proposes a collaborative acquisition algorithm that combines WordNet-based measures with a text corpus.
Abstract: Most information in the world exists in the form of text, such as news articles and web pages. Different lines of research have been conducted to discover, understand and access knowledge about real-world entities and relations from text. However, the application of these word-oriented methods to ontology integration tasks has not yet been explored. In this paper, we apply these word-oriented methods to ontology integration tasks in which we analyze a noun phrase (NP) to identify its head noun, which helps avoid incorrect relations between entities. We also propose a collaborative acquisition algorithm that combines WordNet-based measures with a text corpus.
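
As a rough sketch of the head-noun step (not the authors' collaborative acquisition algorithm), the head of a simple English noun phrase can often be approximated as its rightmost noun after part-of-speech tagging; the heuristic below, using NLTK, is an illustrative assumption.

```python
# Sketch: approximate the head noun of a simple English noun phrase as its
# rightmost noun token. Requires NLTK tokenizer and tagger models
# (nltk.download('punkt'), nltk.download('averaged_perceptron_tagger')).
import nltk

def head_noun(noun_phrase):
    tokens = nltk.word_tokenize(noun_phrase)
    tagged = nltk.pos_tag(tokens)
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    return nouns[-1] if nouns else None

if __name__ == "__main__":
    print(head_noun("semantic web service technology"))  # -> 'technology'
    print(head_noun("ontology integration task"))        # -> 'task'
```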

16 citations


Proceedings ArticleDOI
09 Dec 2008
TL;DR: In this paper, the authors apply word-oriented methods to ontology integration tasks in which they analyze a noun phrase (NP) to identify its head noun, which helps avoid incorrect relations between entities.
Abstract: Most information in the world exists in the form of text, such as news articles and web pages. Different lines of research have been conducted to discover, understand and access knowledge about real-world entities and relations from text. However, the application of these word-oriented methods to ontology integration tasks has not yet been explored. In this paper, we apply these word-oriented methods to ontology integration tasks in which we analyze a noun phrase (NP) to identify its head noun, which helps avoid incorrect relations between entities. We also propose a collaborative acquisition algorithm that combines WordNet-based measures with a text corpus.

15 citations


Proceedings ArticleDOI
10 Jul 2008
TL;DR: An adaptive learning system that filters spam emails based on the user's action patterns over time, considering relationships between actions such as which action follows another and how long it takes.
Abstract: With the continuous increase of spam, 92.6% of all recent email is known to be spam. In this research, we present an adaptive learning system that filters spam emails based on the user's action patterns over time. We consider relationships between a user's actions, such as which action is taken after another and how long it takes. We analyze how much meaning each action carries and how it affects the filtering of spam emails; this in turn determines a weight for each email. In our experiments, we compare the results of the proposed system with a weighted Bayesian classifier using a real email data set. We also show how to handle personalization with respect to concept drift and adaptive learning.
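
The abstract describes weighting each email by the user's actions and their timing but gives no formula; the following sketch shows one way such an action-based weight could be computed. The action names, base weights, and time decay are assumptions for illustration, not the authors' model.

```python
# Sketch: derive a spam weight for an email from user actions and their timing.
# Action names, base weights, and the time decay are illustrative assumptions.
import math

# Positive values push toward "spam", negative toward "ham".
ACTION_WEIGHTS = {
    "delete_without_reading": 1.0,
    "mark_as_spam": 2.0,
    "reply": -2.0,
    "move_to_folder": -1.0,
}

def spam_weight(actions):
    """actions: list of (action_name, seconds_until_action) pairs.
    Quicker reactions are assumed to carry more meaning, so each
    base weight is scaled by an exponential time decay."""
    total = 0.0
    for name, seconds in actions:
        base = ACTION_WEIGHTS.get(name, 0.0)
        decay = math.exp(-seconds / 3600.0)  # one-hour decay constant (assumed)
        total += base * decay
    return total

if __name__ == "__main__":
    print(spam_weight([("delete_without_reading", 30)]))  # strongly spam-like
    print(spam_weight([("reply", 600)]))                   # ham-like
```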

10 citations


Proceedings ArticleDOI
10 Jul 2008
TL;DR: This paper presents a framework for displaying synchronized text around a speaker in video, which identifies speakers using face detection technologies and subsequently detects a subtitle region, and adapts DFXP, the interoperable timed-text format of the W3C, to support interchange with existing legacy systems.
Abstract: With the increasing popularity of online video, efficient captioning and displaying of the captioned text (subtitles) have also become accessibility issues. However, in most cases, subtitles are shown on a separate display below the screen. As a result, some viewers miss condensed information about the contents of the video. To improve readability and visibility for viewers, in this paper we present a framework for displaying synchronized text around a speaker in video. The proposed approach first identifies speakers using face detection technologies and subsequently detects a subtitle region. In addition, we adapt DFXP, the interoperable timed-text format of the W3C, to support interchange with existing legacy systems. In order to achieve smooth playback of multimedia presentations, such as SMIL and DFXP, a prototype system, namely MoNaPlayer, has been implemented. Our case studies show that the proposed system is feasible for several multimedia applications.
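
The face-detection step is only named in the abstract; a minimal sketch of detecting a speaker's face with OpenCV's Haar cascade and choosing a subtitle region just below it might look like the following. The cascade file and placement rule are assumptions, and the DFXP handling and MoNaPlayer player are not reproduced here.

```python
# Sketch: detect a face in a video frame and place a subtitle region below it.
# Uses OpenCV's bundled Haar cascade; the placement rule is an assumption.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def subtitle_region(frame):
    """Return (x, y, w, h) of a region just below the first detected face,
    or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    frame_h = frame.shape[0]
    region_y = min(y + h + 10, frame_h - 40)  # keep the region inside the frame
    return (x, region_y, w, 40)

if __name__ == "__main__":
    cap = cv2.VideoCapture("example.mp4")  # hypothetical input video
    ok, frame = cap.read()
    if ok:
        print(subtitle_region(frame))
    cap.release()
```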

8 citations


Proceedings ArticleDOI
Taeho Jo, Geun-Sik Jo
10 Jul 2008
TL;DR: The goal of this research is to improve the performance of the single pass algorithm for text clustering by modifying it into a specialized version in which documents are encoded not as numerical vectors but in alternative forms.
Abstract: This research proposes a modified version of the single pass algorithm that is specialized for text clustering. Encoding documents into numerical vectors for the traditional version of the single pass algorithm causes two main problems: huge dimensionality and sparse distribution. Therefore, in order to address these two problems, this research modifies the single pass algorithm into a version in which documents are encoded not as numerical vectors but in alternative forms. In the proposed version, documents are mapped into tables, and the similarity of two documents is computed by comparing their tables with each other. The goal of this research is to improve the performance of the single pass algorithm for text clustering through this specialized version.
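
The abstract does not spell out the table encoding or the similarity; the sketch below shows single pass clustering in which each document is represented as a word-frequency table and similarity is the overlap of two tables. The encoding, the overlap measure, and the threshold are illustrative assumptions rather than the paper's exact definitions.

```python
# Sketch: single pass clustering over documents encoded as word-frequency tables.
# The table encoding, overlap similarity, and threshold are assumptions.
from collections import Counter

def to_table(text):
    """Encode a document as a word-frequency table (instead of a numerical vector)."""
    return Counter(text.lower().split())

def table_similarity(t1, t2):
    """Overlap of shared words, normalised by the smaller table size."""
    shared = sum(min(t1[w], t2[w]) for w in t1 if w in t2)
    return shared / max(1, min(sum(t1.values()), sum(t2.values())))

def single_pass(documents, threshold=0.3):
    clusters = []  # each cluster: {'table': merged table, 'members': [indices]}
    for i, doc in enumerate(documents):
        table = to_table(doc)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = table_similarity(table, cluster["table"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["table"] += table  # merge tables as the cluster representative
        else:
            clusters.append({"table": table, "members": [i]})
    return [c["members"] for c in clusters]

if __name__ == "__main__":
    docs = ["stocks fell on wall street",
            "wall street stocks rallied",
            "the team won the football match"]
    print(single_pass(docs))  # e.g. [[0, 1], [2]]
```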

7 citations


Proceedings ArticleDOI
10 Jul 2008
TL;DR: In this article, the authors apply semantic web service technology, which provides a promising common interoperable framework in which information is given well-defined meaning in unambiguous and machine-interpretable form by using ontology such that data and services can be used for more effective discovery, automation, integration, and reuse across various applications.
Abstract: In logistics, there is a variety of available data formats. This makes it difficult to quickly implement a system that communicates with other application systems. Furthermore, once a system that can handle mutually agreed data formats has been implemented, a considerable amount of effort is still required to reformat the data for use in other services, such as those that allow shippers to monitor and track their freight on the Web. To overcome these problems, we apply semantic Web service technology, which provides a promising common interoperable framework in which information is given well-defined meaning in an unambiguous and machine-interpretable form by using ontologies, such that data and services can be used for more effective discovery, automation, integration, and reuse across various applications. Finally, we show the reasonableness of adopting semantic Web services through a case study.
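
The approach is described only at a high level; as a small illustration of giving logistics data well-defined, machine-interpretable meaning with an ontology, the snippet below builds and queries a tiny RDF graph with rdflib. The namespace, class, and property names are invented for illustration and are not the paper's ontology.

```python
# Sketch: describe a freight shipment with an ontology-style RDF graph and query it.
# The namespace and property names are illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF

LOG = Namespace("http://example.org/logistics#")  # hypothetical namespace

g = Graph()
g.add((LOG.shipment42, RDF.type, LOG.Shipment))
g.add((LOG.shipment42, LOG.origin, Literal("Incheon")))
g.add((LOG.shipment42, LOG.destination, Literal("Rotterdam")))
g.add((LOG.shipment42, LOG.status, Literal("in transit")))

# A tracking service can query the shared model instead of parsing ad-hoc formats.
query = """
PREFIX log: <http://example.org/logistics#>
SELECT ?shipment ?status WHERE {
    ?shipment a log:Shipment ;
              log:status ?status .
}
"""
for shipment, status in g.query(query):
    print(shipment, status)
```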

3 citations


Proceedings ArticleDOI
10 Jul 2008
TL;DR: A new automatic Web information extractor called the 'catch crawler', which uses style sheets to extract interesting data from a target site and achieves over 90% accuracy on average.
Abstract: Datasets should be free from noise in order to carry out Web mining tasks well. Commercial Web pages generally contain a lot of noise that is not relevant to the main content, such as navigation panels, advertisements, copyright notices, or other service links. In this paper, we present a new automatic Web information extractor called the 'catch crawler', which uses style sheets to extract interesting data from a target site. Style sheets are generally used for the uniform presentation of Web pages in a commercial Web site. To run the catch crawler, a user indicates the interesting data area by clicking the data on a Web page. The catch crawler automatically identifies the style-sheet class of that data and generates a dataset from the whole Web site by following the same style-sheet class. Experimental results show that our approach for extracting noiseless Web data achieves over 90% accuracy on average.
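
The catch crawler itself is not available; the sketch below illustrates the underlying idea of extracting every element that shares the style-sheet (CSS) class of a user-selected element, using BeautifulSoup. The URL and class name are placeholders, not from the paper.

```python
# Sketch: extract all page elements that share the CSS class of a selected element.
# The URL and class name are placeholders, not from the paper.
import requests
from bs4 import BeautifulSoup

def extract_by_class(url, css_class):
    """Return the text of every element carrying the given style-sheet class."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(class_=css_class)]

if __name__ == "__main__":
    # In the paper's scenario the class would be learned from the element the
    # user clicked; here it is hard-coded for illustration.
    items = extract_by_class("https://example.com/products", "product-title")
    for item in items:
        print(item)
```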

2 citations


Book ChapterDOI
02 Dec 2008
TL;DR: This work proposes a collaborative approach to user modeling for generating personalized recommendations; the approach first discovers useful and meaningful user patterns, and then enriches a personal model through collaboration with other similar users.
Abstract: Recommender systems, which have emerged in response to the problem of information overload, provide users with recommendations of content that is likely to fit their needs. One notable challenge in a recommender system is the cold start problem. To address this issue, we propose a collaborative approach to user modeling for generating personalized recommendations for users. Our approach first discovers useful and meaningful user patterns, and then enriches a personal model with collaboration from other similar users. In order to evaluate the performance of our approach, we compare experimental results with those of a probabilistic learning model, user-based collaborative filtering, and a vector space model. We present experimental results that show how our model performs better than existing work.
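
As a rough illustration of enriching a sparse (cold start) user model with similar users' patterns (not the authors' exact model), the snippet below blends a new user's preference vector with the profiles of the most similar existing users by cosine similarity. The neighbourhood size and blending weight are assumptions.

```python
# Sketch: enrich a sparse user profile with the profiles of similar users.
# The neighbourhood size and blending weight are illustrative assumptions.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def enrich_profile(target, others, k=2, alpha=0.5):
    """target: item-preference vector of the new user (mostly zeros).
    others: matrix of existing users' preference vectors.
    Returns a blend of the target profile and its k nearest neighbours."""
    sims = np.array([cosine(target, u) for u in others])
    nearest = others[np.argsort(sims)[::-1][:k]]
    neighbour_model = nearest.mean(axis=0)
    return alpha * target + (1 - alpha) * neighbour_model

if __name__ == "__main__":
    users = np.array([[5, 4, 0, 1],
                      [4, 5, 1, 0],
                      [0, 1, 5, 4]], dtype=float)
    new_user = np.array([5, 0, 0, 0], dtype=float)  # almost no history
    print(enrich_profile(new_user, users))
```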

2 citations


01 Jan 2008
TL;DR: A new data structure, called a Frequent Pattern Network (FPN), is proposed; it represents items as vertices and 2-itemsets as edges of the network and generates association rules based on clusters.
Abstract: Data mining is defined as the process of discovering meaningful and useful patterns in large volumes of data. In particular, finding association rules between items in a database of customer transactions has become important. Since the Apriori algorithm, several data structures and algorithms have been proposed for storing meaningful information compressed from the original database in order to find frequent itemsets. Although existing methods find all association rules, analyzing them requires a great deal of effort because there are too many rules. In this paper, we propose a new data structure, called a Frequent Pattern Network (FPN), which represents items as vertices and 2-itemsets as edges of the network. To utilize the FPN, we construct it using item frequencies. We then use a clustering method to group the vertices of the network into clusters so that intra-cluster similarity is maximized and inter-cluster similarity is minimized, and we generate association rules based on the clusters. Our experiments evaluate the accuracy of clustering items on the network using confidence, correlation, and edge-weight similarity measures, and we compare the association rules generated from the clusters with those of the traditional method. From the results, the confidence similarity had a stronger influence than the others on the Frequent Pattern Network, and the FPN was flexible with respect to the minimum support value.
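
The FPN construction is described only in outline; the sketch below builds a small item graph with networkx in which vertices are items and edges are 2-itemsets weighted by co-occurrence counts (confidence or correlation could be substituted, as in the paper). The thresholded connected-component grouping stands in for the paper's clustering step and is an assumption.

```python
# Sketch: build a Frequent Pattern Network-style graph from transactions.
# Edge weights are raw 2-itemset counts; the component grouping is an assumption.
from itertools import combinations
import networkx as nx

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
]

G = nx.Graph()
for items in transactions:
    for a, b in combinations(sorted(items), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Keep only sufficiently frequent 2-itemsets, then group items.
min_support = 2
strong = nx.Graph()
strong.add_edges_from(
    (u, v, d) for u, v, d in G.edges(data=True) if d["weight"] >= min_support)
clusters = list(nx.connected_components(strong))
print(clusters)  # e.g. [{'bread', 'milk'}, {'beer', 'diapers'}]
```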

1 citation


Proceedings ArticleDOI
Taeho Jo, Geun-Sik Jo
10 Jul 2008
TL;DR: The goal of the research is to improve the performance of text categorization by solving the two problems of huge dimensionality and sparse distribution.
Abstract: This research proposes an alternative to machine learning based approaches for categorizing news articles given as plain texts. In order to use a machine learning based approach for this task, documents must be encoded into numerical vectors; this causes two problems: huge dimensionality and sparse distribution. The proposed approach is intended to address these two problems. In other words, the two problems are avoided by encoding a document or documents into a table instead of numerical vectors. Therefore, the goal of the research is to improve the performance of text categorization by solving the two problems.
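
Similar in spirit to the table-based clustering sketch shown earlier in this listing, the following illustrates the categorization side: each category keeps a merged word-frequency table built from its training documents, and a new article is assigned to the category whose table overlaps it most. The encoding, overlap measure, and example labels are illustrative assumptions, not the paper's exact definitions.

```python
# Sketch: categorize news articles using table (word-frequency) encodings.
# Category labels, training texts, and the overlap similarity are illustrative.
from collections import Counter

def to_table(text):
    return Counter(text.lower().split())

def table_similarity(t1, t2):
    shared = sum(min(t1[w], t2[w]) for w in t1 if w in t2)
    return shared / max(1, min(sum(t1.values()), sum(t2.values())))

def train(labeled_docs):
    """Merge each category's training documents into one representative table."""
    categories = {}
    for label, text in labeled_docs:
        categories.setdefault(label, Counter()).update(to_table(text))
    return categories

def categorize(text, categories):
    table = to_table(text)
    return max(categories, key=lambda c: table_similarity(table, categories[c]))

if __name__ == "__main__":
    training = [("sports", "the team won the football match"),
                ("finance", "stocks fell on wall street")]
    model = train(training)
    print(categorize("wall street stocks rallied today", model))  # -> 'finance'
```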

01 Mar 2008
TL;DR: In this article, a novel method is presented that uses a diversity metric to select dissimilar items among the items recommended by collaborative filtering; these items, together with the input, are fed into the content space to improve the recommendation and to include new items.
Abstract: Combining collaborative filtering with some other technique is most common in hybrid recommender systems. As many of the items recommended by collaborative filtering tend to be similar with respect to content, the collaborative-content hybrid system suffers in terms of recommendation quality and in recommending new items. To alleviate this problem, we have developed a novel method that uses a diversity metric to select dissimilar items among the items recommended by collaborative filtering; these items, together with the input, are fed into the content space, which lets us improve the recommendation and include new items. We present experimental results on the MovieLens dataset that show how our approach performs better than a simple content-based system and a naive hybrid system.
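
The diversity metric itself is not given in the abstract; as one illustration of the selection step, a greedy procedure can pick items from a collaborative filtering list that are maximally dissimilar (by cosine distance over content features) to those already chosen. The feature vectors and the greedy rule are assumptions, not the authors' exact metric.

```python
# Sketch: greedily pick dissimilar items from a collaborative-filtering list.
# Content feature vectors and the greedy rule are illustrative assumptions.
import numpy as np

def cosine_distance(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(a @ b / denom) if denom else 1.0

def diverse_subset(candidates, features, k=2):
    """candidates: item ids ranked by collaborative filtering.
    features: dict of item id -> content feature vector.
    Greedily keeps the item farthest from everything already selected."""
    selected = [candidates[0]]  # always keep the top CF recommendation
    while len(selected) < k:
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: min(cosine_distance(features[c], features[s])
                              for s in selected))
        selected.append(best)
    return selected

if __name__ == "__main__":
    feats = {"A": np.array([1.0, 0.0]), "B": np.array([0.9, 0.1]),
             "C": np.array([0.0, 1.0])}
    print(diverse_subset(["A", "B", "C"], feats, k=2))  # -> ['A', 'C']
```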