Author

Sang-Jo Lee

Bio: Sang-Jo Lee is an academic researcher from Kyungpook National University. The author has contributed to research in topics: Ontology (information science) & Ontology-based data integration. The author has an h-index of 11 and has co-authored 57 publications receiving 470 citations.


Papers
Journal ArticleDOI
TL;DR: The advantage and effectiveness of the proposed criteria are demonstrated through four numerical examples that compare the maximum delay bounds with results from recently published papers.

173 citations

Proceedings ArticleDOI
15 Dec 2005
TL;DR: Web pages are classified in real time not with experimental data or a learning process, but through similarity calculations between the terminology information extracted from Web pages and ontology categories, which results in more accurate document classification.
Abstract: The use of ontology to provide a mechanism for machine reasoning has continuously increased during the last few years. This paper suggests an automated method for document classification using an ontology, which expresses the terminology information and vocabulary contained in Web documents as a hierarchical structure. Ontology-based document classification involves determining the document features that represent Web documents most accurately, and classifying documents into the most appropriate categories after analyzing their contents against at least two predefined categories per document feature. In this paper, Web pages are classified in real time not with experimental data or a learning process, but through similarity calculations between the terminology information extracted from Web pages and ontology categories. This results in more accurate document classification, since the meanings and relationships unique to each document are taken into account.
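
As a rough illustration of the similarity-based assignment the abstract describes, the sketch below scores a page's extracted term vector against each ontology category's term vector and picks the best match. The cosine measure, the toy ontology, and all term weights are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: classify a page by cosine similarity between its
# extracted term vector and each ontology category's term vector.
# The category weights and the extraction step are illustrative.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(page_terms: Counter, ontology: dict[str, Counter]) -> str:
    # Pick the category whose terminology best matches the page.
    return max(ontology, key=lambda cat: cosine(page_terms, ontology[cat]))

ontology = {
    "sports":  Counter({"match": 3, "team": 2, "score": 2}),
    "finance": Counter({"stock": 3, "market": 2, "rate": 1}),
}
page = Counter({"team": 4, "score": 1, "coach": 2})
print(classify(page, ontology))  # -> "sports"
```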

40 citations

Journal ArticleDOI
TL;DR: A new indexing formalism is developed that considers not only the terms in a document, but also the concepts, and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document.
Abstract: Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.
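
A minimal sketch of the concept-augmented weighting idea: assuming each index term can be mapped to a concept cluster, a term's weight is its frequency boosted by how strongly its cluster is represented in the document. The term-to-concept mapping, the additive boost, and all names are illustrative, not the paper's exact formulation.

```python
# Sketch of concept-augmented index weighting: a term's weight is its TF
# plus a boost proportional to the document-level frequency of the
# concept cluster it belongs to. Mapping and boost factor are illustrative.
from collections import Counter

CONCEPTS = {"bank": "finance", "loan": "finance", "river": "geography"}

def concept_weights(doc_terms: list[str], boost: float = 0.5) -> dict[str, float]:
    tf = Counter(doc_terms)
    # Concept importance: total frequency of its member terms in the doc.
    concept_tf = Counter(CONCEPTS[t] for t in doc_terms if t in CONCEPTS)
    weights = {}
    for term, freq in tf.items():
        concept = CONCEPTS.get(term)
        weights[term] = freq + boost * concept_tf.get(concept, 0)
    return weights

print(concept_weights(["bank", "loan", "bank", "rate"]))
# -> {'bank': 3.5, 'loan': 2.5, 'rate': 1.0}
```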

40 citations

Journal ArticleDOI
TL;DR: Through experiments on the TREC-2 collection of Wall Street Journal documents, it is shown that the proposed indexing formalism outperforms an indexing method based on term frequency (TF), especially in regard to the highest-ranked documents.

37 citations

Proceedings ArticleDOI
19 Jul 2009
TL;DR: A novel method to translate tags attached to multimedia contents for cross-language retrieval, selecting the optimal translation from possible candidates based on network similarity, even when neither textual contexts nor sophisticated language resources are available.
Abstract: This paper proposes a novel method to translate tags attached to multimedia contents for cross-language retrieval. The main issue in this problem is the sense disambiguation of tags given with few textual contexts. In order to solve this problem, the proposed method represents both a tag and its translation candidates as networks of co-occurring tags, since a network allows a richer expression of context than alternatives such as co-occurrence vectors. The method translates a tag by selecting the optimal one from the possible candidates based on network similarity, even when neither textual contexts nor sophisticated language resources are available. Experiments on the MIR Flickr-2008 test set show that the proposed method achieves 90.44% accuracy in translating tags from English into German, which is significantly higher than the baseline methods of frequency-based translation and co-occurrence-based translation.
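
The following sketch illustrates the network-based candidate selection the abstract describes, using Jaccard overlap between sets of co-occurring tags as a stand-in for the paper's network similarity. The toy networks, and the assumption that both languages' co-occurring tags have already been mapped into a shared space, are illustrative.

```python
# Sketch: choose the translation candidate whose co-occurrence network
# (set of neighboring tags) best overlaps the source tag's network.
# Jaccard overlap stands in for the paper's network similarity measure,
# and the toy networks below are invented for illustration.

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def translate(tag: str, en_net: dict[str, set[str]],
              candidates: dict[str, set[str]]) -> str:
    # candidates maps each possible translation to its own network of
    # co-occurring tags (assumed mapped into a shared vocabulary).
    src = en_net[tag]
    return max(candidates, key=lambda c: jaccard(src, candidates[c]))

en_net = {"bank": {"money", "finance", "atm"}}
candidates = {
    "Bank (Geldinstitut)": {"money", "finance", "credit"},
    "Bank (Sitzbank)": {"park", "wood", "seat"},
}
print(translate("bank", en_net, candidates))  # -> "Bank (Geldinstitut)"
```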

24 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories.

First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules.

Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs.

Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.

Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically.

Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: This paper provides a review of the theory and methods of document classification and text mining, focusing mainly on text representation and machine learning techniques.
Abstract: With the increasing availability of electronic documents and the rapid growth of the World Wide Web, automatic categorization of documents has become the key method for organizing information and for knowledge discovery. Proper classification of e-documents, online news, blogs, e-mails and digital libraries needs text mining, machine learning and natural language processing techniques to extract meaningful knowledge. The aim of this paper is to highlight the important techniques and methodologies that are employed in text document classification, while at the same time raising awareness of some of the interesting challenges that remain to be solved, focused mainly on text representation and machine learning techniques. This paper provides a review of the theory and methods of document classification and text mining, focusing on the existing literature.

546 citations

Proceedings ArticleDOI
29 Mar 2010
TL;DR: This paper provides an overview of the various strategies that were devised for automatic visual concept detection using the MIR Flickr collection, and discusses results from various experiments in combining social data and low-level content-based descriptors to improve the accuracy of visual concept classifiers.
Abstract: The MIR Flickr collection consists of 25000 high-quality photographic images from thousands of Flickr users, made available under the Creative Commons license. The database includes all the original user tags and EXIF metadata. Additionally, detailed and accurate annotations are provided for topics corresponding to the most prominent visual concepts in the user tag data. The rich metadata allow for a wide variety of image retrieval benchmarking scenarios. In this paper, we provide an overview of the various strategies that were devised for automatic visual concept detection using the MIR Flickr collection. In particular, we discuss results from various experiments in combining social data and low-level content-based descriptors to improve the accuracy of visual concept classifiers. Additionally, we present retrieval results obtained by relevance feedback methods, demonstrating (i) how their performance can be enhanced using features based on visual concept classifiers, and (ii) how their performance, based on small samples, can be measured relative to their large-sample classifier counterparts. Finally, we identify a number of promising trends and ideas in visual concept detection. To keep the MIR Flickr collection up to date with these developments, we have formulated two new initiatives to extend the original image collection. First, the collection will be extended to one million Creative Commons Flickr images. Second, a number of state-of-the-art content-based descriptors will be made available for the entire collection.

374 citations

Proceedings ArticleDOI
Susan Brewer
01 Sep 1959
TL;DR: The letter and/or sound combinations that make up a human language are limited by the human's ability to pronounce these sounds. Therefore, the standard library search, which as a rule looks for all possible combinations of letters to find a word, is wasteful.
Abstract: The letter and/or sound combinations that make up a human language are limited by the human's ability to pronounce these sounds. Therefore, the standard library search, which as a rule looks for all possible combinations of letters to find a word, is wasteful. Certain letters simply cannot be followed by certain other letters, and a search for them is senseless. Following this same line of reasoning, letters very frequently occur in the combinations that are germane to the particular language. The growing amount of alphanumeric information presently being stored on magnetic tape presents increasingly difficult problems in both the number of tape reels used and the time necessary to search this mass of information in order to extract pertinent literature. At the present time most of this literature on tape utilizes the standard IBM 6-bit code to express alphanumeric symbols. It is entirely feasible to record standard English literature on tape, be it professional abstracts or novels, using only approximately two-thirds of the binary bits required to represent the same piece of written material in the conventional code. This can be accomplished by setting up, in a 9-bit code, the 400-odd letter combinations occurring most frequently. A 9-bit representation allows the programmer to set up as many as 512 symbols, thus leaving sufficient leeway to assign symbols to the most frequently used words, mathematical symbols, and professional expressions that are expected to be encountered in the literature to be recorded. In addition, these relatively short 9-bit symbols can be assigned to all key words that it may be necessary to look for later, thereby accelerating any future library search.
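
A small sketch of the bit accounting behind the abstract's claim: frequent letter groups each get one 9-bit codeword (512 slots available), while the baseline spends 6 bits per character. The tiny dictionary and simplified cost model below are illustrative; the paper proposes roughly 400 frequent combinations plus codewords for frequent words and symbols.

```python
# Sketch of the 9-bit dictionary coding idea: frequent letter groups get
# one 9-bit codeword each, while the baseline 6-bit code spends 6 bits
# per character. The dictionary here is a toy stand-in for the paper's
# ~400 most frequent combinations.

DICTIONARY = ["the ", "tion", "and ", "ing ", "of "]  # codeword = index

def encode_cost_bits(text: str) -> tuple[int, int]:
    baseline = 6 * len(text)          # 6-bit code: one symbol per char
    i, packed = 0, 0
    while i < len(text):
        for entry in DICTIONARY:
            if text.startswith(entry, i):
                packed += 9           # one 9-bit codeword for the group
                i += len(entry)
                break
        else:
            packed += 9               # a lone character also costs 9 bits
            i += 1
    return baseline, packed

base, packed = encode_cost_bits("the nation and the region")
print(base, packed)  # 150 117: packed wins when frequent groups dominate
```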

298 citations

Journal ArticleDOI
TL;DR: A collaborative filtering (CF) based recommendation methodology built on both implicit ratings and less ambitious ordinal scales is proposed, and a specific consensus model typically used in multi-criteria decision-making (MCDM) is employed to generate an ordinal scale-based customer profile.

222 citations